|
eCommons@Cornell >
Faculty of Computing and Information Science >
Computing and Information Science >
Computing and Information Science Technical Reports >
Please use this identifier to cite or link to this item:
http://hdl.handle.net/1813/5743
| Title: | Plagiarism Detection in arXiv |
| Authors: | Sorokina, Daria Gehrke, Johannes Warner, Simeon Ginsparg, Paul |
| Keywords: | computer science technical report |
| Issue Date: | 28-Sep-2006 |
| Publisher: | Cornell University |
| Citation: | http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cis/TR2006-2046 |
| Abstract: | We describe a large-scale application of methods for finding
plagiarism and self-plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14 year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger. |
| URI: | http://hdl.handle.net/1813/5743 |
| Appears in Collections: | Computing and Information Science Technical Reports
|
Items in eCommons are protected by copyright, with all rights reserved, unless otherwise indicated.
|