Document Length Normalization
Abstract
In the TREC collection -- a large full-text experimental text collection with widely varying document lengths -- we observe that the likelihood of a document being judged relevant by a user increases with the document length. We show that a retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal probability, will not optimally retrieve useful documents from such a collection. We present a modified technique that attempts to match the likelihood of retrieving a document of a certain length to the likelihood of documents of that length being judged relevant, and show that this technique yields significant improvements in retrieval effectiveness.
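The length-invariance of cosine matching described in the abstract can be illustrated with a small sketch. The abstract does not give the report's actual formula, so the length-aware normalizer below uses pivoted normalization as a stand-in technique, not necessarily the authors' method. The function names, toy vectors, pivot, and slope are all illustrative assumptions.

```python
import math

def cosine_norm(doc):
    """Euclidean length of a document's term-weight vector."""
    return math.sqrt(sum(w * w for w in doc.values()))

def score(query, doc, norm):
    """Query-document inner product divided by a document normalizer.
    The query is left unnormalized; that factor is the same for every document."""
    dot = sum(qw * doc.get(term, 0.0) for term, qw in query.items())
    return dot / norm

def pivoted_norm(old_norm, pivot, slope):
    """Assumed stand-in normalizer: tilts the old norm around `pivot`.
    With slope < 1, documents longer than the pivot are penalized less."""
    return (1.0 - slope) * pivot + slope * old_norm

# Toy data (hypothetical): the long document is the short one repeated twice.
query = {"retrieval": 1.0, "length": 1.0}
short_doc = {"retrieval": 1.0, "length": 1.0, "cosine": 1.0}
long_doc = {term: 2.0 * w for term, w in short_doc.items()}

short_norm = cosine_norm(short_doc)
long_norm = cosine_norm(long_doc)

# Cosine normalization cancels length entirely: both documents score the same.
cos_short = score(query, short_doc, short_norm)
cos_long = score(query, long_doc, long_norm)

# Length-aware normalization: pivot at the mean norm, slope chosen arbitrarily.
pivot = (short_norm + long_norm) / 2.0
slope = 0.5
piv_short = score(query, short_doc, pivoted_norm(short_norm, pivot, slope))
piv_long = score(query, long_doc, pivoted_norm(long_norm, pivot, slope))

print(f"cosine:  short={cos_short:.4f} long={cos_long:.4f}")
print(f"pivoted: short={piv_short:.4f} long={piv_long:.4f}")
```

Under these assumptions the two cosine scores coincide, while the tilted normalizer ranks the longer document higher. That is the direction the abstract argues for when relevance likelihood grows with document length. In practice the pivot and slope would need to be fitted to relevance judgments rather than picked by hand.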
Date Issued
1995-07
Publisher
Cornell University
Keywords
computer science; technical report
Previously Published As
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR95-1529
Types
technical report