


Title: Automatic Structuring and Retrieval of Large Text Files
Authors: Salton, Gerard; Allan, James; Buckley, Chris
Keywords: computer science; technical report
Issue Date: Jun-1992
Publisher: Cornell University
Abstract: In many operational environments, large text files covering a wide variety of topic areas must be processed. The user must then be given aids that permit collection browsing and make it possible to locate particular items on demand. Conventional text analysis methods based on preconstructed knowledge bases and other vocabulary-control tools are difficult to apply when the subject coverage is unrestricted. An alternative approach, applicable to text collections in any subject area, is introduced; it uses the document collections themselves as a basis for the text analysis, together with sophisticated text matching operations carried out at several levels of detail. Methods are described for relating semantically similar pieces of text and for using the resulting hypertext structures for collection browsing and information retrieval.
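The abstract does not spell out the matching procedure. The following is a minimal sketch under stated assumptions, not the report's method: it assumes tf x idf term weighting and cosine similarity (standard vector-space choices), and it reads "several levels of detail" as a two-level test in which two texts are linked only when both their whole-text similarity and their best sentence-pair similarity clear a threshold. Term weights come from the collection itself, accepted pairs form hypertext links, and the same vectors rank texts against a query. All function names, the example texts, and the thresholds `global_min` and `local_min` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; no stemming or stop-word removal (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def idf_weights(texts):
    """Inverse document frequency computed from the collection itself."""
    n = len(texts)
    df = Counter()
    for t in texts:
        df.update(set(tokenize(t)))
    return {term: math.log(n / count) for term, count in df.items()}


def vector(text, idf):
    """tf x idf weighted term vector for one piece of text."""
    tf = Counter(tokenize(text))
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def sentences(text):
    """Naive sentence split on terminal punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_links(docs, global_min=0.2, local_min=0.3):
    """Hypothetical two-level linking: a pair of texts is linked only if the
    whole texts match (global) AND at least one sentence pair matches (local)."""
    idf = idf_weights(docs)
    vecs = [vector(d, idf) for d in docs]
    links = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            g = cosine(vecs[i], vecs[j])
            if g < global_min:
                continue  # fails the global (whole-text) test
            local = max(
                (cosine(vector(a, idf), vector(b, idf))
                 for a in sentences(docs[i]) for b in sentences(docs[j])),
                default=0.0,
            )
            if local >= local_min:
                links.append((i, j, round(g, 3)))
    return links


def retrieve(query, docs, k=3):
    """Rank texts by cosine similarity to the query; drop zero scores."""
    idf = idf_weights(docs)
    q = vector(query, idf)
    scores = [(cosine(q, vector(d, idf)), i) for i, d in enumerate(docs)]
    return [i for s, i in sorted(scores, reverse=True)[:k] if s > 0]


# Hypothetical example collection.
EXAMPLE_DOCS = [
    "Text retrieval ranks documents by similarity. Vector models weight terms.",
    "Vector models weight terms for retrieval. Documents are ranked by similarity.",
    "Cooking pasta requires boiling water. Add salt to the water.",
]

if __name__ == "__main__":
    print(build_links(EXAMPLE_DOCS))
    print(retrieve("vector retrieval", EXAMPLE_DOCS))
```

On the example collection, only the two retrieval texts are linked, and the query retrieves them while skipping the unrelated one. Deriving the idf weights from the collection, rather than from a preconstructed vocabulary, is what lets a sketch like this run on text in any subject area; a practical system would also add stemming and stop-word removal.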
Appears in Collections: Computer Science Technical Reports

Files in This Item:

File          Size       Format
92-1286.pdf   3.75 MB    Adobe PDF
92-1286.ps    826.88 kB  PostScript


Items in eCommons are protected by copyright, with all rights reserved, unless otherwise indicated.


© 2014 Cornell University Library