Please use this identifier to cite or link to this item: http://hdl.handle.net/1813/6956
 Title: Directory Design and Record Allocation for List and Cluster Files Authors: Yang, C. S. Keywords: computer sciencetechnical report Issue Date: Mar-1976 Publisher: Cornell University Citation: http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR76-272 Abstract: Most file organizations for on-line econdary key retrieval consist of two subcomponents - a structural file and a data file. The structural file provides, a physical access path in the data base for each query, so that searches can be restricted to small portions of the data file. The data file contains the data records where information is stored. Hence, the file design problem consists of (1) an efficient design of the structural file, and (2) an allocation of data records in the data file so that a given set of data records can be jointly retrieved at mimimum cost. List structure organizations and clustered organizations are shown to be important structures for secondary key retrieval. This thesis studies the design aspects of the structural and data files for both of the above organizations. The most popular list structure files are the inverted list and the multilist organizations. It is shown that either organization is a special case of a new class of hybrid list organizations. File elements in this class are characterized by a list length parameter $\cal l_{th}$, and a list is stored as an inverted list or as a multilist depending on whether its length is larger than $\cal l_{th}$ or not. Analytical and simulation results indicate that neither a pure inverted list organization nor a pure multilist organization is normally the best choice for all elements in the class. A new method is also introduced for the structure of combined indices for list structure files. A combined index is created only if its component keys co-occur frequently in the queries. Experimental results show that such combined indices do improve the search performances for both inverted list and multilist organizations. Search methods and physical implementations of clustered organizations are discussed. The balancing of cluster trees is shown to be an important concept. A search model is established to obtain optimal branching ratios for cluster trees. The optimal branching ratio is seen to be dependent on certain statistical characteristics of the data base. The last part of the thesis concerns itself with the arrangement of records in the dataa file. The data records are assigned to blocks in the disk like devices. The goal is to minimize the average number of block accesses in processing the query set in the system. This problem turns out to be a polynomial complete problem. Heuristic algorithms are therefore used for the record block assignment. Experimental results show that a record organization produced by the heuristic algorithms is more efficient than a random assignment of records to the blocks. URI: http://hdl.handle.net/1813/6956 Appears in Collections: Computer Science Technical Reports

Files in This Item:

File Description SizeFormat