


Title: Contribution to the Theory of Indexing
Authors: Salton, Gerard
Yang, C. S.
Yu, C. T.
Keywords: computer science
technical report
Issue Date: Nov-1973
Publisher: Cornell University
Abstract: An attempt is made to characterize the usefulness of terms occurring in stored documents and user queries as a function of their frequency characteristics across the documents of a collection. It is found that the best terms are those having medium frequency in the collection and skewed frequency distributions. Correspondingly, terms exhibiting either very high or very low document frequency are not as useful. To improve the indexing vocabulary, it becomes necessary to group low frequency terms into classes, and to break up high frequency terms by forming phrases. An indexing theory is described based on term frequency considerations, and a new phrase generation method is introduced. The resulting improvements in the indexing vocabulary are evaluated.
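The frequency-based distinction drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the function names, the whitespace tokenizer, and the cutoff values `low_cutoff` and `high_cutoff` are assumptions chosen for illustration, and the skewness of the frequency distribution mentioned in the abstract is not modeled. The sketch only counts how many documents contain each term and sorts terms into low, medium, and high document-frequency groups.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        # set() ensures a term is counted once per document.
        df.update(set(doc.lower().split()))
    return df


def classify_terms(docs, low_cutoff=0.1, high_cutoff=0.5):
    """Group terms by the fraction of documents in which they occur.

    The cutoffs are hypothetical; the abstract does not state thresholds.
    """
    n = len(docs)
    df = document_frequency(docs)
    classes = {"low": [], "medium": [], "high": []}
    for term, count in sorted(df.items()):
        fraction = count / n
        if fraction <= low_cutoff:
            classes["low"].append(term)     # candidates for grouping into classes
        elif fraction >= high_cutoff:
            classes["high"].append(term)    # candidates for breaking up into phrases
        else:
            classes["medium"].append(term)  # usable as index terms directly
    return classes
```

Under these assumptions, the "low" group corresponds to the terms the abstract proposes to group into classes, and the "high" group to the terms it proposes to break up by forming phrases.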
Appears in Collections: Computer Science Technical Reports

Files in This Item:

File        Size       Format
73-188.pdf  1 MB       Adobe PDF
73-188.ps   398.55 kB  PostScript


Items in eCommons are protected by copyright, with all rights reserved, unless otherwise indicated.

