'Unknown' proteins and 'orphan' enzymes: the missing half of the engineering parts list--and how to find it

Biochem J. 2009 Dec 14;425(1):1-11. doi: 10.1042/BJ20091328.

Abstract

Like other forms of engineering, metabolic engineering requires knowledge of the components (the 'parts list') of the target system. Lack of such knowledge impairs both rational engineering design and diagnosis of the reasons for failures; it also poses problems for the related field of metabolic reconstruction, which uses a cell's parts list to recreate its metabolic activities in silico. Despite spectacular progress in genome sequencing, the parts lists for most organisms that we seek to manipulate remain highly incomplete, due to the dual problem of 'unknown' proteins and 'orphan' enzymes. The former are all the proteins deduced from genome sequence that have no known function, and the latter are all the enzymes described in the literature (and often catalogued in the EC database) for which no corresponding gene has been reported. Unknown proteins constitute up to about half of the proteins in prokaryotic genomes, and much more than this in higher plants and animals. Orphan enzymes make up more than a third of the EC database. Attacking the 'missing parts list' problem is accordingly one of the great challenges for post-genomic biology, and a tremendous opportunity to discover new facets of life's machinery. Success will require a co-ordinated community-wide attack, sustained over years. In this attack, comparative genomics is probably the single most effective strategy, for it can reliably predict functions for unknown proteins and genes for orphan enzymes. Furthermore, it is cost-efficient and increasingly straightforward to deploy owing to a proliferation of databases and associated tools.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Animals
  • Databases, Protein*
  • Enzymes / classification
  • Enzymes / genetics*
  • Enzymes / metabolism
  • Genomics / methods
  • Humans
  • Open Reading Frames / genetics*
  • Phylogeny
  • Proteins / classification
  • Proteins / genetics*
  • Proteins / metabolism
  • Proteomics / methods

Substances

  • Enzymes
  • Proteins