eCommons

 

A Statistical Model Of Transcription Factor Binding Site Gain And Loss

Abstract

Gene regulation is a critical determinant of an organism's phenotype. Furthermore, there is mounting evidence that gradual changes in gene regulation, arising from regulatory sequence turnover, have played an important role in evolution and speciation. Given the remarkable phenotypic diversity among mammals, it is surprising that all share roughly the same set of around 20,000 genes, most of which are highly conserved. Indeed, the degree of divergence at the gene level fails to explain the diversity observed among mammals, suggesting that the balance of the phenotypic changes is explained not by differences in the genes themselves but by differences in when and where genes are used in different species. The spatiotemporal expression patterns of genes are intricately controlled through transcriptional regulation, a multifactorial process involving interactions among a host of regulatory sequences, DNA-binding proteins and cofactors, signaling pathways, and epigenetic factors. Transcription factor binding sites (TFBSs) are an important class of regulatory sequences, and they are known to be frequently gained and lost in mammalian genomes, which is consistent with an important role for TFBSs in gene regulatory evolution. However, little is known about the TFBS turnover process and its relationship to gene regulatory evolution and, by extension, to phenotypic change and adaptive evolution. To gain insight into TFBS turnover, it is necessary to reliably identify TFBSs that have been gained or lost in a lineage-specific manner. Here I present a phylogenetic hidden Markov model (phylo-HMM) that describes the process of lineage-specific TFBS gain and loss, and I test its performance on simulated and biological datasets using two inference methods: a Viterbi algorithm implementation and a Gibbs sampler.
With both methods, the model performs well on simulated data but does not appear robust to the violations of its assumptions that are present in biological datasets. With further refinement, the model and methods may yield better performance on real data. However, key limitations include large memory and computational requirements and the need to simplify the model and restrict the dataset size to ensure tractability. These shortcomings increase the user input required to apply the methods and complicate the interpretation and generalization of results, thus limiting the utility of the methods.
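For readers unfamiliar with the decoding step named in the abstract, the following is a minimal sketch of Viterbi decoding on an ordinary two-state HMM. It is not the phylo-HMM from the dissertation: a phylo-HMM would score alignment columns under per-state phylogenetic models, whereas this toy scores single nucleotides. The state names (`background`, `site`) and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def viterbi(observations, states, log_init, log_trans, log_emit):
    """Return (most probable state path, its joint log probability)."""
    # best[t][s]: log probability of the best path that ends in state s at position t
    best = [{s: log_init[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at position t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(probs):
    """Convert a dict of probabilities to log space."""
    return {k: math.log(v) for k, v in probs.items()}


# Hypothetical toy parameters (assumptions, not taken from the dissertation).
STATES = ("background", "site")
LOG_INIT = logs({"background": 0.9, "site": 0.1})
LOG_TRANS = {
    "background": logs({"background": 0.9, "site": 0.1}),
    "site": logs({"background": 0.2, "site": 0.8}),
}
LOG_EMIT = {
    "background": logs({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}),
    "site": logs({"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05}),  # GC-rich toy motif state
}

if __name__ == "__main__":
    path, score = viterbi("ATATGCGCGCATAT", STATES, LOG_INIT, LOG_TRANS, LOG_EMIT)
    print(path, score)
```

Working in log space avoids numerical underflow on long sequences. A Gibbs sampler would instead draw state paths from the posterior distribution rather than returning the single best path.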

Date Issued

2010-10-20

Types

dissertation or thesis
