Skip to main content


eCommons@Cornell >
Faculty of Computing and Information Science >
Computing and Information Science >
Computing and Information Science Technical Reports >

Please use this identifier to cite or link to this item:
Title: Plagiarism Detection in arXiv
Authors: Sorokina, Daria
Gehrke, Johannes
Warner, Simeon
Ginsparg, Paul
Keywords: computer science
technical report
Issue Date: 28-Sep-2006
Publisher: Cornell University
Abstract: We describe a large-scale application of methods for finding plagiarism and self-plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by over a 14 year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
Appears in Collections:Computing and Information Science Technical Reports

Files in This Item:

File Description SizeFormat
TR2006-2046.pdf171.46 kBAdobe PDFView/Open

Refworks Export

Items in eCommons are protected by copyright, with all rights reserved, unless otherwise indicated.


© 2014 Cornell University Library Contact Us