Please use this identifier to cite or link to this item:
Title: Efficient Algorithms for Protein Sequence Design and the Analysis of Certain Evolutionary Fitness Landscapes
Authors: Kleinberg, Jon
Keywords: computer science; technical report
Issue Date: Oct-1998
Publisher: Cornell University
Abstract: Protein sequence design is a natural inverse problem to protein structure prediction: given a target structure in three dimensions, we wish to design an amino acid sequence that is likely to fold to it. A model of Sun, Brem, Chan, and Dill casts this problem as an optimization on a space of sequences of hydrophobic (H) and polar (P) monomers; the goal is to find a sequence that achieves a dense hydrophobic core with few solvent-exposed hydrophobic residues. Sun et al. developed a heuristic method to search the space of sequences, without a guarantee of optimality or near-optimality; Hart subsequently raised the computational tractability of constructing an optimal sequence in this model as an open question. Here we resolve this question by providing an efficient algorithm to construct optimal sequences; our algorithm has a polynomial running time, and performs very efficiently in practice. We illustrate the implementation of our method on structures drawn from the Protein Data Bank. We also consider extensions of the model to larger amino acid alphabets, as a way to overcome the limitations of the binary H/P alphabet. We show that for a natural class of arbitrarily large alphabets, it remains possible to design optimal sequences efficiently. Finally, we analyze some of the consequences of this sequence design model for the study of evolutionary fitness landscapes. A given target structure may have many sequences that are optimal in the model of Sun et al.; following a notion raised by the work of J. Maynard Smith, we can ask whether these optimal sequences are "connected" by successive point mutations. We provide a polynomial-time algorithm to decide this connectedness property, relative to a given target structure. We develop the algorithm by first solving an analogous problem expressed in terms of submodular functions, a fundamental object of study in combinatorial optimization.
Appears in Collections: Computer Science Technical Reports

Files in This Item:

File          Description   Size        Format
98-1714.pdf                 187.32 kB   Adobe PDF
98-1714.ps                  376.65 kB   PostScript

Items in eCommons are protected by copyright, with all rights reserved, unless otherwise indicated.


© 2014 Cornell University Library