Title: Modeling Additive Structure and Detecting Interactions with Groves of Trees
Authors: Sorokina, Daria
Keywords: Computer Science
Issue Date: 30-Jul-2008
Abstract: Discovering additive structure is an important step toward understanding a complex multidimensional function, because it allows the function to be expressed as a sum of lower-dimensional or otherwise simpler components. Modeling additive structure also makes it possible to learn more accurate regression models.
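A compact way to state this, in our own notation rather than necessarily the dissertation's: a function of n variables has additive structure when it decomposes into components f_k, each depending only on a subset S_k of the variables. The example shows a three-variable function that splits into a one-variable term and a two-variable term.

```latex
F(\mathbf{x}) = \sum_{k=1}^{K} f_k(\mathbf{x}_{S_k}),
\qquad S_k \subseteq \{1, \dots, n\}, \quad |S_k| < n

% Example with n = 3 and K = 2:
F(x_1, x_2, x_3) = \sin x_1 + x_2 x_3,
\qquad S_1 = \{1\}, \; S_2 = \{2, 3\}
```

In the example, x_2 and x_3 must be modeled together, while x_1 can be studied on its own.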
The term statistical interaction describes the presence of non-additive effects among two or more variables in a function. When variables interact, their effects must be modeled and interpreted simultaneously. Detecting statistical interactions can therefore be critical for domain researchers seeking to understand the underlying processes.
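For a function of two variables, x1 and x2 do not interact exactly when f(x1, x2) = g(x1) + h(x2) for some functions g and h. Equivalently, the mixed difference f(a, b) - f(a', b) - f(a, b') + f(a', b') is zero for every choice of a, a', b, b'. The sketch below, whose helper names are hypothetical, checks this on a finite grid for a known formula. A nonzero value proves an interaction, but zeros on a grid cannot prove its absence, and in practice the function is unknown, so interactions have to be detected from data.

```python
def mixed_difference(f, a, a2, b, b2):
    """Second-order finite difference of f(x1, x2); zero everywhere iff f is additive."""
    return f(a, b) - f(a2, b) - f(a, b2) + f(a2, b2)


def interacts(f, grid, tol=1e-9):
    """Report an interaction if any mixed difference on the grid is nonzero."""
    return any(
        abs(mixed_difference(f, a, a2, b, b2)) > tol
        for a in grid for a2 in grid for b in grid for b2 in grid
    )


def additive(x1, x2):
    return x1 ** 2 + 3 * x2  # sum of one-variable terms: no interaction


def product(x1, x2):
    return x1 * x2  # the effect of x1 depends on x2: interaction


grid = [0.0, 0.5, 1.0, 2.0]
print(interacts(additive, grid))  # False
print(interacts(product, grid))   # True
```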
This dissertation analyzes the benefits of modeling additive structure for prediction and interaction detection. It describes a new learning algorithm, Groves, an ensemble of additive regression trees. Groves builds on existing techniques, namely bagging and additive models; combining them makes it possible to use large trees in the ensemble while still modeling the additive structure of the response function. The regression version of the algorithm, Additive Groves, and its classification counterpart, Gradient Groves, perform consistently well across a variety of problems and, on average, outperform a large number of other algorithms.
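A minimal sketch of the idea, under our own simplifying assumptions: each grove is an additive model of a few depth-limited least-squares regression trees trained by backfitting (every tree is repeatedly refit to the residual left by the other trees), and the final model averages groves trained on bootstrap samples. The helper names (`fit_tree`, `fit_grove`, `fit_bagged_groves`), the data, and all parameter values are hypothetical. The dissertation's actual training procedure, including how tree size and the number of trees are chosen, is not reproduced here.

```python
import random


def fit_tree(X, r, features, max_depth, min_leaf=5):
    """Greedy least-squares regression tree fit to targets r, splitting only on `features`."""

    def build(idx, depth):
        mean = sum(r[i] for i in idx) / len(idx)
        best = None  # (score, feature, threshold, sorted_idx, split_position)
        if depth < max_depth and len(idx) >= 2 * min_leaf:
            for f in features:
                order = sorted(idx, key=lambda i: X[i][f])
                total, left = sum(r[i] for i in order), 0.0
                for k in range(1, len(order)):
                    left += r[order[k - 1]]
                    a, b = X[order[k - 1]][f], X[order[k]][f]
                    if k < min_leaf or len(order) - k < min_leaf or a == b:
                        continue
                    # Maximizing this score minimizes the children's summed squared error.
                    score = left ** 2 / k + (total - left) ** 2 / (len(order) - k)
                    if best is None or score > best[0]:
                        best = (score, f, (a + b) / 2, order, k)
        if best is None:
            return mean  # leaf: predict the mean target
        _, f, thr, order, k = best
        return (f, thr, build(order[:k], depth + 1), build(order[k:], depth + 1))

    root = build(list(range(len(r))), 0)

    def predict(x):
        node = root
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, thr, left, right = node
            node = left if x[f] <= thr else right
        return node

    return predict


def fit_grove(X, y, n_trees, max_depth, passes):
    """Additive model of n_trees trees, trained by backfitting."""
    n, features = len(y), range(len(X[0]))
    trees = [lambda x: 0.0] * n_trees
    preds = [[0.0] * n for _ in range(n_trees)]
    for _ in range(passes):
        for t in range(n_trees):
            # Refit tree t to the part of y that the other trees do not explain.
            resid = [y[i] - sum(preds[s][i] for s in range(n_trees) if s != t) for i in range(n)]
            trees[t] = fit_tree(X, resid, features, max_depth)
            preds[t] = [trees[t](x) for x in X]
    return lambda x: sum(tr(x) for tr in trees)


def fit_bagged_groves(X, y, n_bags=8, n_trees=3, max_depth=2, passes=3, seed=0):
    """Average of groves, each trained on a bootstrap sample of the data (bagging)."""
    rng = random.Random(seed)
    groves = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        groves.append(fit_grove([X[i] for i in idx], [y[i] for i in idx], n_trees, max_depth, passes))
    return lambda x: sum(g(x) for g in groves) / len(groves)


data = random.Random(1)
X = [[data.uniform(-1, 1), data.uniform(-1, 1)] for _ in range(200)]
y = [x1 ** 2 + 2 * x2 for x1, x2 in X]  # purely additive target
model = fit_bagged_groves(X, y)
mse = sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)
print(f"training MSE: {mse:.3f}")
```

Because each tree is fit to what the others leave unexplained, the trees tend to divide the work among additive components, while bagging averages away much of the variance of the individual groves.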
The additive nature of Groves makes it particularly useful for interaction detection. This dissertation introduces a new approach to interaction detection based on comparing the performance of restricted and unrestricted predictive models. Because Groves of trees allow variable interactions to be carefully controlled, they are especially well suited to this framework.
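The restricted-versus-unrestricted comparison can be illustrated with a small self-contained sketch. The assumptions are ours and not necessarily the dissertation's exact procedure: the restricted model for a pair (x_i, x_j) is a grove in which no single tree may use both variables, enforced by growing one candidate tree without x_i and one without x_j at every backfitting step and keeping whichever fits the residual better. Bagging is omitted, and the test function `4*x0*x1 + x2`, the helper names, and all parameters are hypothetical. If the restricted model's held-out error is clearly larger than the unrestricted model's, that suggests x_i and x_j interact.

```python
import random


def fit_tree(X, r, features, max_depth, min_leaf=5):
    """Greedy least-squares regression tree fit to targets r, splitting only on `features`."""

    def build(idx, depth):
        mean = sum(r[i] for i in idx) / len(idx)
        best = None  # (score, feature, threshold, sorted_idx, split_position)
        if depth < max_depth and len(idx) >= 2 * min_leaf:
            for f in features:
                order = sorted(idx, key=lambda i: X[i][f])
                total, left = sum(r[i] for i in order), 0.0
                for k in range(1, len(order)):
                    left += r[order[k - 1]]
                    a, b = X[order[k - 1]][f], X[order[k]][f]
                    if k < min_leaf or len(order) - k < min_leaf or a == b:
                        continue
                    score = left ** 2 / k + (total - left) ** 2 / (len(order) - k)
                    if best is None or score > best[0]:
                        best = (score, f, (a + b) / 2, order, k)
        if best is None:
            return mean  # leaf: predict the mean target
        _, f, thr, order, k = best
        return (f, thr, build(order[:k], depth + 1), build(order[k:], depth + 1))

    root = build(list(range(len(r))), 0)

    def predict(x):
        node = root
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, thr, left, right = node
            node = left if x[f] <= thr else right
        return node

    return predict


def fit_grove(X, y, feature_sets, n_trees=4, max_depth=3, passes=3):
    """Backfit an additive model of trees; each tree uses one allowed feature set.

    At each step one candidate tree is grown per feature set, and the candidate
    with the smallest squared error on the current residual is kept.
    """
    n = len(y)
    trees = [lambda x: 0.0] * n_trees
    preds = [[0.0] * n for _ in range(n_trees)]
    for _ in range(passes):
        for t in range(n_trees):
            resid = [y[i] - sum(preds[s][i] for s in range(n_trees) if s != t) for i in range(n)]
            best = None
            for feats in feature_sets:
                cand = fit_tree(X, resid, feats, max_depth)
                p = [cand(x) for x in X]
                sse = sum((a - b) ** 2 for a, b in zip(resid, p))
                if best is None or sse < best[0]:
                    best = (sse, cand, p)
            _, trees[t], preds[t] = best
    return lambda x: sum(tr(x) for tr in trees)


rng = random.Random(0)


def make_data(n):
    # Hypothetical test function: x0 and x1 interact, x2 enters additively.
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [4 * x[0] * x[1] + x[2] + rng.gauss(0, 0.1) for x in X]
    return X, y


X_tr, y_tr = make_data(300)
X_te, y_te = make_data(300)
FEATURES = [0, 1, 2]


def held_out_error(feature_sets):
    model = fit_grove(X_tr, y_tr, feature_sets)
    return sum((model(x) - t) ** 2 for x, t in zip(X_te, y_te)) / len(y_te)


def restricted(i, j):
    # Each tree may use every feature except x_i, or every feature except x_j.
    return [[f for f in FEATURES if f != i], [f for f in FEATURES if f != j]]


err_full = held_out_error([FEATURES])
err_01 = held_out_error(restricted(0, 1))
err_02 = held_out_error(restricted(0, 2))
print(f"unrestricted: {err_full:.3f}")
print(f"restricted on (x0, x1): {err_01:.3f}")
print(f"restricted on (x0, x2): {err_02:.3f}")
```

Restricting the interacting pair (x0, x1) should raise the held-out error noticeably, while restricting (x0, x2) should cost little. Judging whether a given difference is meaningful on real data would still need repeated runs or a more careful statistical comparison than this sketch makes.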
The proposed practical approach to interaction detection analysis is demonstrated in detail on real data describing the abundance of different bird species in the prairies east of the southern Rocky Mountains.
Appears in Collections: Cornell Theses and Dissertations
Items in eCommons are protected by copyright, with all rights reserved, unless otherwise indicated.