Title: Implementing Agglomerative Hierarchic Clustering Algorithms for Use in Document Retrieval
Authors: Voorhees, Ellen M.
Keywords: computer science; technical report
Issue Date: Jul-1986
Publisher: Cornell University
Abstract: Searching hierarchically clustered document collections can be effective, but creating the cluster hierarchies is expensive since there are both many documents and many terms. However, the information in the document-term matrix is sparse: documents are usually indexed by relatively few terms. This paper describes the implementations of three agglomerative hierarchic clustering algorithms that exploit this sparsity so that collections much larger than the algorithms' worst case running times would suggest can be clustered. The implementations described in the paper have been used to cluster a collection of 12,000 documents.
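The sparsity argument in the abstract can be illustrated with a minimal sketch. It is not taken from the report: the abstract does not name its three algorithms or describe their data structures, so the choice of single-link clustering, the cosine similarity measure, the inverted-index layout, and the names `cosine_pairs`, `single_link`, and `docs` are all assumptions made for illustration. The point it shows is that similarities only need to be computed for document pairs sharing at least one term, which is usually far fewer than all n(n-1)/2 pairs.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cosine_pairs(docs):
    """Return {(i, j): cosine} only for document pairs sharing a term.

    docs is a list of sparse vectors, each a dict mapping term -> positive
    weight (a hypothetical representation, not the report's).
    """
    # Inverted index: term -> [(doc_id, weight), ...] in doc_id order.
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))

    # Accumulate dot products term by term; pairs with no shared term
    # are never visited, so they cost nothing.
    dots = defaultdict(float)
    for postings in index.values():
        for (i, wi), (j, wj) in combinations(postings, 2):
            dots[(i, j)] += wi * wj

    norms = [sqrt(sum(w * w for w in vec.values())) for vec in docs]
    return {(i, j): d / (norms[i] * norms[j]) for (i, j), d in dots.items()}


def single_link(docs):
    """Single-link agglomeration over the nonzero similarities.

    Processes pairs from most to least similar and merges clusters with a
    union-find structure. Returns a list of (similarity, cluster_a, cluster_b).
    """
    parent = list(range(len(docs)))
    members = {i: frozenset([i]) for i in range(len(docs))}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    pairs = cosine_pairs(docs)
    for (i, j), sim in sorted(pairs.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            merges.append((sim, sorted(members[ri]), sorted(members[rj])))
            parent[rj] = ri
            members[ri] = members[ri] | members.pop(rj)
    return merges


# Made-up toy collection: four documents, five terms.
docs = [
    {"cluster": 1.0, "search": 1.0},
    {"cluster": 1.0, "search": 1.0, "index": 1.0},
    {"sparse": 1.0, "matrix": 1.0},
    {"sparse": 1.0, "index": 1.0},
]

for sim, a, b in single_link(docs):
    print(f"{sim:.3f}  merge {a} with {b}")
```

In this toy collection only three of the six possible document pairs share a term, so only three similarities are ever computed. Groups of documents that share no terms with the rest are never merged by this sketch, so the result may be a forest rather than a single hierarchy.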
Appears in Collections: Computer Science Technical Reports

Files in This Item:

File         Size        Format
86-765.pdf   1.58 MB     Adobe PDF
86-765.ps    446.29 kB   Postscript

Items in eCommons are protected by copyright, with all rights reserved, unless otherwise indicated.