Classification of Transmembrane Protein Families in the Caenorhabditis elegans Genome and Identification of Human Orthologs

  Maido Remm1,2 and Erik Sonnhammer1,3
  1Center for Genomics Research, Karolinska Institute, Stockholm, 17177 Sweden; 2Estonian Biocentre, Tartu, 51010 Estonia

Abstract

The complete genome sequence of the nematode Caenorhabditis elegans provides an excellent basis for studying the distribution and evolution of protein families in higher eukaryotes. Three fundamental questions are as follows: How many paralog clusters exist in one species, how many of these are shared with other species, and how many proteins can be assigned a functional counterpart in other species? We have addressed these questions in a detailed study of predicted membrane proteins in C. elegans and their mammalian homologs. All worm proteins predicted to contain at least two transmembrane segments were clustered on the basis of sequence similarity. This resulted in 189 groups with two or more sequences, containing, in total, 2647 worm proteins. Hidden Markov models (HMMs) were created for each family, and were used to retrieve mammalian homologs from the SWISSPROT, TREMBL, and VTS databases. About one-half of these clusters had mammalian homologs. Putative worm-mammalian orthologs were extracted by use of nine different phylogenetic methods and BLAST. Eight clusters initially thought to be worm-specific were assigned mammalian homologs after searching EST and genomic sequences. A compilation of 174 orthology assignments made with high confidence is presented. [Tables describing transmembrane protein families and orthology assignments are available from ftp.cgr.ki.se/pub/data/worm.]
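The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or similarity cutoff was used, so the single-linkage approach, the `cluster_by_similarity` function, the score threshold, and the toy protein names below are all assumptions made for illustration only. The sketch links any two proteins whose pairwise similarity score meets a cutoff and keeps only groups with two or more sequences.

```python
from collections import defaultdict


def cluster_by_similarity(proteins, scored_pairs, threshold):
    """Single-linkage clustering (an assumed method, not necessarily the authors').

    proteins:     iterable of protein identifiers
    scored_pairs: iterable of (protein_a, protein_b, similarity_score)
    threshold:    hypothetical minimum score for two proteins to be linked
    """
    # Union-find structure: every protein starts as its own cluster root.
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of any pair scoring at or above the threshold.
    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect members under their cluster root.
    groups = defaultdict(list)
    for p in proteins:
        groups[find(p)].append(p)

    # Keep only groups with two or more sequences, largest first.
    multi = [sorted(g) for g in groups.values() if len(g) >= 2]
    return sorted(multi, key=lambda g: (-len(g), g))


# Toy input: identifiers and scores are invented for illustration.
proteins = ["P1", "P2", "P3", "P4", "P5"]
scored_pairs = [("P1", "P2", 120.0), ("P2", "P3", 95.0), ("P4", "P5", 30.0)]
clusters = cluster_by_similarity(proteins, scored_pairs, threshold=50.0)
print(clusters)
```

In this toy input, P1 and P3 end up in the same group only because both are similar to P2, which is the defining behaviour of single linkage. P4 and P5 score below the cutoff, remain singletons, and are dropped.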

Footnotes

  • 3 Corresponding author.

  • E-MAIL erik.sonnhammer{at}cgr.ki.se; FAX 46-8-337983.

  • Article published online before print: Genome Res., 10.1101/gr.149100.

  • Article and publication are at www.genome.org/cgi/doi/10.1101/gr.149100.

    • Received May 24, 2000.
    • Accepted August 11, 2000.