Is there a dataset that contains duplicate documents and is categorized by label?
I have built a model that takes new documents and predicts which category each one belongs to. It also detects duplicates when the same document keeps arriving.
To evaluate my model, I need a large document dataset that contains duplicates and is also categorized by label. Does anyone know where I can find such a dataset?
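For illustration only, here is one way to frame the duplicate-detection evaluation. It is a minimal sketch, not the model described above. It assumes you already have labeled `(text, label)` pairs from some categorized corpus. It injects random exact copies to get a stream with known duplicates, then scores a simple hash-set baseline by precision and recall. The function names `fingerprint`, `make_stream`, and `evaluate_dedup` are hypothetical, and the baseline detector is a stand-in you would replace with your own model. Labels are carried through the stream, so the same data could also score the classifier.

```python
import hashlib
import random
import re

def fingerprint(text):
    """Exact-duplicate key: SHA-256 of case- and whitespace-normalized text."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def make_stream(docs, dup_rate=0.2, seed=0):
    """Append random exact copies of labeled (text, label) pairs, then shuffle."""
    rng = random.Random(seed)
    stream = list(docs)
    stream.extend(rng.choices(docs, k=int(len(docs) * dup_rate)))
    rng.shuffle(stream)
    return stream

def evaluate_dedup(stream):
    """Score a hash-set baseline detector against ground truth.

    Ground truth: an item is a duplicate if the identical text already
    appeared earlier in the stream.
    """
    seen_text, seen_fp = set(), set()
    tp = fp = fn = 0
    for text, _label in stream:
        truth = text in seen_text
        key = fingerprint(text)
        predicted = key in seen_fp  # hypothetical baseline: swap in your model here
        seen_text.add(text)
        seen_fp.add(key)
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
    # Treat an empty denominator as a perfect score (nothing to get wrong).
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

docs = [("cheap flights to paris", "travel"),
        ("stock markets fell today", "finance"),
        ("new phone released", "tech")]
stream = make_stream(docs, dup_rate=1.0, seed=42)
print(evaluate_dedup(stream))  # (1.0, 1.0)
```

Because the baseline normalizes case and whitespace, it reports near-duplicates such as "Hi" and "hi" that the exact-text ground truth does not count. Those reports show up as false positives, which is one way the choice of duplicate definition affects the score.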