Domain Fusion Analysis

Finding Functionally-Related Proteins by Analysis of Domain Fusions


We use a computational method for inferring protein function and interactions from genome sequences. It is based on the observation that some pairs of interacting proteins have homologs in another organism that are fused into a single protein chain. Comparing sequence homologs from multiple organisms can reveal these fused sequences, called Rosetta Stone sequences because they decipher the interactions between the protein pairs.

For example:

Here, A-B is the Rosetta Stone protein that suggests that proteins A and B are functionally related and have a better-than-random chance of interacting.
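The search for Rosetta Stone proteins can be sketched in a few lines of Python. This is a minimal illustration and not the published pipeline: it assumes homology hits have already been computed by some alignment tool, and the Hit record, the rosetta_stone_pairs function, the max_overlap parameter, and the toy protein names are all hypothetical. It also omits the check that the two linked proteins are not homologous to each other.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """One homology match (hypothetical record, not a real tool's output format)."""
    query: str   # protein in the genome of interest, e.g. "A"
    target: str  # protein in another genome (candidate Rosetta Stone)
    start: int   # first aligned residue in the target (1-based, inclusive)
    end: int     # last aligned residue in the target (inclusive)


def overlap(a: Hit, b: Hit) -> int:
    """Number of target residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def rosetta_stone_pairs(hits, max_overlap=0):
    """Map each linked query pair to the set of targets that link it.

    Two query proteins are linked when both align to the same target
    protein and their aligned regions share at most max_overlap residues.
    """
    by_target = defaultdict(list)
    for hit in hits:
        by_target[hit.target].append(hit)

    links = defaultdict(set)
    for target, target_hits in by_target.items():
        for a, b in combinations(target_hits, 2):
            if a.query == b.query:
                continue  # the same protein matching twice is not a pair
            if overlap(a, b) <= max_overlap:
                pair = tuple(sorted((a.query, b.query)))
                links[pair].add(target)
    return dict(links)


# Toy data: A and B match separate regions of AB_fused,
# while C and D match overlapping regions of another protein.
toy_hits = [
    Hit("A", "AB_fused", 1, 200),
    Hit("B", "AB_fused", 210, 450),
    Hit("C", "other", 1, 150),
    Hit("D", "other", 100, 220),
]

print(rosetta_stone_pairs(toy_hits))  # {('A', 'B'): {'AB_fused'}}
```

In this toy run only A and B are linked, because C and D align to the same stretch of their shared target and are therefore more likely homologs of each other than fusion partners.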

A total of 6,809 such putative protein-protein interactions were found in E. coli and 45,502 were found in yeast. Members of these protein pairs are generally functionally related, as determined by automatic comparisons of their SwissProt annotation, when known. The method is described in "Detecting Protein Function and Protein-Protein Interactions from Genome Sequences", by Marcotte, E. M., Pellegrini, M., Ng, H.-L., Rice, D. W., Yeates, T., & Eisenberg, D. (Science 285:751-753 (1999), link to Science abstract or Medline abstract).

Because some domains (e.g., SH3 domains) pair with large numbers of other domain types, we can filter out predictions involving these "promiscuous" domains to improve the signal-to-noise ratio of the method. Predictions are available both with and without this filter.
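One way to picture this filter is sketched below in Python. It is an illustration under stated assumptions, not the procedure actually used here: the cutoff max_partners is a made-up parameter (no threshold is given on this page), the function names are hypothetical, and apart from SH3 the domain names are placeholders.

```python
from collections import defaultdict


def partner_counts(domain_pairs):
    """Count the distinct partner domains seen for each domain."""
    partners = defaultdict(set)
    for d1, d2 in domain_pairs:
        partners[d1].add(d2)
        partners[d2].add(d1)
    return {domain: len(p) for domain, p in partners.items()}


def filter_promiscuous(domain_pairs, max_partners):
    """Drop every pair that involves a domain with more than max_partners partners.

    max_partners is an assumed, user-chosen cutoff.
    Returns (kept_pairs, promiscuous_domains).
    """
    counts = partner_counts(domain_pairs)
    promiscuous = {d for d, n in counts.items() if n > max_partners}
    kept = [
        (d1, d2)
        for d1, d2 in domain_pairs
        if d1 not in promiscuous and d2 not in promiscuous
    ]
    return kept, promiscuous


# Toy data: SH3 pairs with three different domains, the others with at most two.
toy_pairs = [
    ("SH3", "domA"),
    ("SH3", "domB"),
    ("SH3", "domC"),
    ("domA", "domB"),
]

kept, promiscuous = filter_promiscuous(toy_pairs, max_partners=2)
print(promiscuous)  # {'SH3'}
print(kept)         # [('domA', 'domB')]
```

A stricter cutoff removes more predictions and should lower the false-positive rate, at the cost of discarding any genuine interactions that involve the flagged domains.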

Together with another method, that of protein phylogenetic profiles, the Rosetta Stone analysis has been crucial for predicting the function of large numbers of genes about which little else was known. Phylogenetic profiles are described in "Assigning Protein Functions by Comparative Genome Analysis: Protein Phylogenetic Profiles", Pellegrini, M., Marcotte, E. M., Thompson, M. J., Eisenberg, D. and Yeates, T. O. (PNAS 96:4285-4288 (1999), link to the MEDLINE citation).
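For readers unfamiliar with the companion method, a phylogenetic profile records in which genomes a protein has a homolog, and proteins sharing a profile are candidates for a functional link. The Python sketch below illustrates only that basic idea; the function names, genome labels, and protein names are hypothetical, and it groups strictly identical profiles, which is a simplification.

```python
from collections import defaultdict


def phylogenetic_profile(protein, homolog_presence, genomes):
    """Return a tuple of 0/1 flags, one per genome, in the order given."""
    present = homolog_presence.get(protein, set())
    return tuple(int(genome in present) for genome in genomes)


def group_by_profile(homolog_presence, genomes):
    """Group proteins with identical profiles; keep groups of two or more."""
    groups = defaultdict(list)
    for protein in sorted(homolog_presence):
        profile = phylogenetic_profile(protein, homolog_presence, genomes)
        groups[profile].append(protein)
    return {p: members for p, members in groups.items() if len(members) > 1}


# Toy data: which genomes contain a homolog of each protein.
toy_genomes = ["g1", "g2", "g3", "g4"]
toy_presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2"},
}

print(group_by_profile(toy_presence, toy_genomes))  # {(1, 0, 1, 0): ['P1', 'P2']}
```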


List of nonhomologous E. coli protein pairs linked by Rosetta Stone proteins

List of 749 nonhomologous E. coli protein pairs linked by Rosetta Stone proteins following removal of promiscuous domain-dependent predictions

List of protein domain pairs linked by Rosetta Stone proteins as calculated from ProDom domains

List of protein domain pairs linked by Rosetta Stone proteins following removal of promiscuous domain-dependent predictions (again, as calculated using ProDom domains)

List of promiscuous protein domains.

To identify domains unambiguously, each domain in these lists is accompanied by its consensus sequence as calculated in the ProDom domain database. This is included only as a convenience, and we recommend that you check the ProDom web page for their most recent analysis.

Experimentally-observed interacting protein pairs for which a Rosetta Stone protein can be identified will be included in the upcoming release of the Database of Interacting Proteins.

Coming soon: the list of Rosetta Stone proteins and the domains they link, and some estimate of confidence in the pairwise predictions.


Please address comments to:
Edward Marcotte
Last modified: Fri Aug 27 10:35:20 PST 1999
Copyright © 1999 Edward Marcotte & David Eisenberg