Abstract
NC-IUPHAR (International Union of Pharmacology Committee on Receptor Nomenclature and Drug Classification) and its subcommittees provide authoritative reports on the nomenclature and pharmacology of G protein-coupled receptors (GPCRs) that summarize their structure, pharmacology, and roles in physiology and pathology. These reports are published in Pharmacological Reviews (http://www.iuphar.org/nciuphar_arti.html) and through the International Union of Pharmacology (IUPHAR) Receptor Database web site (http://www.iuphar-db.org/iuphar-rd). The essentially complete sequencing of the human genome has allowed the cataloging of all of the human gene sequences potentially encoding GPCRs. The IUPHAR Receptor List (http://www.iuphar-db.org/iuphar-rd/list/index.htm) presents this catalog giving IUPHAR-approved nomenclature (where available), known ligands, and gene names for all of these potential receptors (excluding sensory receptors and pseudogenes) together with links to curated sequence, descriptive information, and additional links in the Entrez Gene database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene). This list is a major new initiative of NC-IUPHAR that, through continuing curation, defines the target of our ongoing receptor classification and invites further input from the scientific community.
I. Introduction
Although NC-IUPHAR has well-advanced projects to cover nuclear receptors, voltage-gated ion channels, and ligand-gated ion channels, its efforts have focused on GPCRs, primarily because they represent a very large family of proteins that control many major physiological processes and are the targets of many effective drugs. The recent completion of the human genome sequence at 99% coverage (International Human Genome Sequencing Consortium, 2004) allows the identification of essentially all the GPCR genes that should be included in the IUPHAR receptor classification. Many of these genes are potential GPCRs in the sense that their sequences look like known GPCRs, but their activating ligands and signaling mechanisms are unknown. The characterization of these orphan receptors will be a major focus of pharmacology in the near future, and a well-curated public list should be a very valuable resource.
The characteristic feature of all known GPCR proteins is that they have seven α-helical transmembrane domains. There are also extensive amino acid sequence similarities that divide them into several classes, each with characteristic highly conserved residues distributed throughout the molecule. These residues define identifying motifs, such as the DRY motif at the cytoplasmic end of the third transmembrane domain and prolines at specific positions in helices 5, 6, and 7 of the very large class related to rhodopsin. Analysis of the human genome sequence, using techniques that range from simple BLAST searches for genes with sequences similar to known GPCRs to gene prediction algorithms whose output is then filtered for the presence of GPCR motifs, has generated well over 1000 candidate genes. A critical but not uniformly successful step in the analysis is to remove the false positives, i.e., the nonfunctional or incomplete genes known as pseudogenes. In some cases in the literature, these predictions are based on inaccurate early versions of the genomic sequence, whereas in others, the predictions do not encode full seven transmembrane domain proteins but retain enough of a GPCR motif for the prediction to be labeled a GPCR.
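The last point, that a fragment can retain a GPCR motif without encoding a full seven transmembrane domain protein, can be illustrated with a deliberately simplified screen. This is a sketch only: the function name `screen_candidate`, the strict three-letter `DRY` pattern, the 250-residue length cutoff, and both test sequences are assumptions made for illustration. It does not represent the methods used in the analyses cited here, which rely on far richer sequence comparisons.

```python
import re

# Assumption: the strict three-residue form of the motif named in the text.
DRY_MOTIF = re.compile(r"DRY")

# Hypothetical cutoff, illustrative only: a protein much shorter than this
# is unlikely to contain seven transmembrane helices plus connecting loops.
MIN_LENGTH = 250


def screen_candidate(protein: str, min_length: int = MIN_LENGTH) -> dict:
    """Report whether a predicted protein has the motif and a plausible length."""
    seq = protein.upper()
    has_motif = DRY_MOTIF.search(seq) is not None
    long_enough = len(seq) >= min_length
    return {
        "has_dry_motif": has_motif,
        "long_enough": long_enough,
        "keep": has_motif and long_enough,
    }


# Synthetic sequences, not real receptors.
fragment = "L" * 40 + "DRY" + "L" * 40        # 83 residues, motif present
full_length = "L" * 120 + "DRY" + "L" * 180   # 303 residues, motif present

print(screen_candidate(fragment))
print(screen_candidate(full_length))
```

A motif-only test would accept both sequences; adding even a crude length requirement rejects the fragment, which is the kind of false positive described above.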
Pseudogenes are genes that, in the absence of selective pressure to maintain the gene on an evolutionary time scale, have accumulated disabling mutations (Harrison and Gerstein, 2002; Harrison et al., 2002). Most arise from a continuing evolutionary process of gene duplication in which a small fraction of the duplicated genes find new functions and are maintained, while the vast majority decay through accumulated mutations. Severely disabled genes are easy to recognize because they have accumulated multiple disruptions of the coding sequence. At the other end of the spectrum are genes in which a single nucleotide change creates a frameshift or in-frame termination codon leading to a truncated protein. In the latter cases, one must be concerned whether the reference human genomic sequence is truly representative of the human population or instead reflects a polymorphism or sequencing error. A known example of such a polymorphism is the trace amine family receptor TRAR3, which has a polymorphic premature stop codon with an allele frequency of about 20% (Vanti et al., 2003). It is inevitable that some pseudogenes will not be excluded because their disabling mutations are too subtle to be easily recognized. In addition, we have chosen to include some pseudogenes with appropriate annotation when their omission could cause confusion. Some pseudogenes are included because their DNA sequence is so close to that of a functional GPCR that they might confound assays of mRNA expression. More commonly, confusion could arise as to whether a gene is functional or a pseudogene if the basis for calling it a pseudogene is relatively subtle. We hope to document these issues for each of these pseudogenes in the IUPHAR receptor database as it progresses.
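The subtle case described above, a single in-frame termination codon in an otherwise intact coding sequence, can be sketched as a simple scan. The function name `first_premature_stop` and both example sequences are hypothetical and synthetic; the sketch assumes the input is a coding sequence that already begins in frame, and it says nothing about whether a detected stop is a true disabling mutation, a polymorphism, or a sequencing error.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds: str) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon that occurs
    before the final codon, or None if no such codon is found."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    # The last complete codon is skipped: it is the expected terminator.
    for i in range(n_codons - 1):
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


# Synthetic examples, not real gene sequences.
intact = "ATGGCTGATCGTTACTAA"     # ATG GCT GAT CGT TAC TAA
nonsense = "ATGGCTTGACGTTACTAA"   # ATG GCT TGA CGT TAC TAA

print(first_premature_stop(intact))
print(first_premature_stop(nonsense))
```

A scan of this kind flags a candidate pseudogene but cannot settle the question raised in the text; that requires comparing the reference sequence against other individuals or independent sequencing reads.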
II. The Scope of the International Union of Pharmacology Receptor List
On the basis of a number of phylogenetic analyses, the GPCRs are divided by protein sequence similarity into three main classes, i.e., classes 1, 2, and 3, whose prototypes are rhodopsin, the secretin receptor, and the metabotropic glutamate receptors, respectively (Bockaert and Pin, 1999; Josefsson, 1999; Graul and Sadee, 2001; Joost and Methner, 2002; Fredriksson et al., 2003). About half of the class 1 receptors are presumed to be involved in the detection of odor, taste, or light (Adler et al., 2000; Zozulya et al., 2001; Niimura and Nei, 2003; Zhang et al., 2003). Relatively few of these “sensory receptors” have been shown experimentally to respond to sensory stimuli or to be expressed in sensory organs. The vast majority have been classified as sensory solely because they share significant sequence identity with known sensory receptors. In our first iteration of the list, we have included only “nonsensory” GPCRs. This list omits 7 opsin-like receptors, 39 members of the taste receptor family, and roughly 400 potentially functional olfactory receptors. In the next version of the list, we expect to add the opsins, the taste receptors, and those olfactory family receptors with well-documented expression in nonolfactory tissues.
After extensive curation, our current list includes 276 functional genes from class 1, 53 from class 2, and 19 from class 3 (Tables 1, 2, 3). We have also listed 11 frizzled and smoothened receptors as a separate class (Table 4). The frizzled receptors have been extensively studied, and G protein coupling appears to be a feature of some, but not all, family members (Winklbauer et al., 2001). They most closely resemble class 2. In each table, receptors are listed alphabetically, in families, according to the descriptions of their ligands in common usage or, in the case of orphans, according to the phylogenetic clustering of Vassilatis et al. (2003). From left to right are listed the IUPHAR receptor code (Humphrey and Barnard, 1998); the endogenous ligands associated with each receptor; the official or proposed IUPHAR receptor nomenclature; the human gene symbol assigned by the HUGO Gene Nomenclature Committee (http://www.gene.ucl.ac.uk/nomenclature/); and the unique identifiers (GeneIDs) assigned to the human, mouse, and rat genes in the Entrez Gene (formerly LocusLink) database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) where more detailed information about each gene can be found.
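The column layout just described, together with the class counts, can be summarized in a small data structure. This is an illustrative sketch, not the database's actual schema: the class name `ReceptorEntry`, its field names, and the placeholder entry are assumptions, and only the four counts are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReceptorEntry:
    """One table row, with fields in the left-to-right order given in the text."""
    iuphar_code: str
    endogenous_ligands: Tuple[str, ...]
    iuphar_name: Optional[str]       # None where no nomenclature is approved yet
    hugo_symbol: str
    human_gene_id: Optional[int]     # Entrez Gene GeneIDs
    mouse_gene_id: Optional[int]
    rat_gene_id: Optional[int]

    @property
    def is_orphan(self) -> bool:
        # Treated here as "no known endogenous ligand".
        return not self.endogenous_ligands


# Counts of functional genes as stated in the text.
CLASS_COUNTS = {
    "class 1": 276,
    "class 2": 53,
    "class 3": 19,
    "frizzled/smoothened": 11,
}

classes_1_to_3 = sum(CLASS_COUNTS[c] for c in ("class 1", "class 2", "class 3"))
all_listed = sum(CLASS_COUNTS.values())

# Placeholder values only; not a real receptor.
placeholder = ReceptorEntry(
    iuphar_code="CODE_PLACEHOLDER",
    endogenous_ligands=(),
    iuphar_name=None,
    hugo_symbol="SYMBOL_PLACEHOLDER",
    human_gene_id=None,
    mouse_gene_id=None,
    rat_gene_id=None,
)

print(classes_1_to_3, all_listed, placeholder.is_orphan)
```

Summing the stated counts gives 348 genes across classes 1 to 3, and 359 once the frizzled and smoothened receptors are included.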
A regularly updated, hyperlinked version of the receptor list can be found at http://www.iuphar-db.org/iuphar-rd/list/index.htm. The receptor list is also available for download, either as tab-delimited text files or as an Excel spreadsheet, from http://www.iuphar-db.org/iuphar-rd/list/downloads.htm. In addition to the information displayed on the Web, these files include the human chromosomal location and two sequence standards. The RefSeq identifier, supplied and curated by the National Center for Biotechnology Information (NCBI: http://www.ncbi.nih.gov/RefSeq/), represents the nucleotide sequence, and the SwissProt identifier (http://www.ebi.ac.uk/swissprot/index.html) represents the protein sequence. The SwissProt Protein Knowledgebase is a highly curated and annotated protein sequence database established in 1986. It is maintained collaboratively by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute.
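A tab-delimited download of this kind can be read with standard tools. The sketch below assumes a header row, and its column names (`hugo_symbol`, `chromosome`, `refseq`, `swissprot`) and sample values are hypothetical placeholders; the actual headers in the downloadable files should be checked before adapting it.

```python
import csv
import io

# Placeholder content standing in for a downloaded file; not real data.
SAMPLE_TSV = (
    "hugo_symbol\tchromosome\trefseq\tswissprot\n"
    "GENE_A\tLOCATION_A\tREFSEQ_A\tSWISSPROT_A\n"
    "GENE_B\tLOCATION_B\t\tSWISSPROT_B\n"
)


def load_receptor_list(handle):
    """Read tab-delimited rows into dictionaries, mapping empty cells to None."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        rows.append({key: (value or None) for key, value in row.items()})
    return rows


rows = load_receptor_list(io.StringIO(SAMPLE_TSV))
missing_refseq = [r["hugo_symbol"] for r in rows if r["refseq"] is None]
print(len(rows), missing_refseq)
```

For a real file, the `io.StringIO` object would be replaced by `open(path, newline="")`. Mapping empty cells to `None` makes it easy to find entries that lack one of the two sequence standards.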
There are a number of caveats with regard to this list. First, it is complete only to the extent that the human genomic sequence is complete and adequately annotated. The current version, NCBI build 35, dates from July 2004, and much of the sequence analysis predates its release, which filled quite a few gaps, including one that contained the well-known vasopressin V1B receptor. Thus, there may be some genes in the new regions that have not been incorporated in the list. Furthermore, the newest version still contains 308 gaps, comprising 28 Mb of euchromatic sequence, that could easily contain a number of relatively compact GPCR genes. Second, the receptor list is human-centric, partially reflecting a bias in favor of human pharmacology but also reflecting the incomplete sequencing of other mammalian genomes. There are quite a few examples where the mouse or rat genomes contain genes without a functional human counterpart (Vassilatis et al., 2003). Many of these genes are missing in human because of recent expansions in the rodent lineage of clustered gene families such as the trace amine receptor family. Many of the remainder are cases where the human ortholog is a pseudogene. Neither of these categories is listed unless the human pseudogene is potentially confusing. At the time of writing, the gaps in the mouse and rat lists have more to do with the completeness of the genomic sequences than with a lack of mouse and rat orthologs for human genes. Third, not all 7TM proteins need be GPCRs, and we have included only 7TM proteins where at least one member of a phylogenetic cluster is known to be G protein-coupled. Fourth, as annotated in the list, some GPCRs are composed of multiple distinct protein subunits, not all of which have a 7TM structure (Bockaert and Pin, 1999; Kniazeff et al., 2002; Poyner et al., 2002; Zhang et al., 2003).
III. Maintenance of the List
Over the past 10 years, the ligands for orphan receptors have been discovered at a steady rate of about six per year. Such discoveries are generally supported by other laboratories and become seminal publications in the pharmacological literature. When first published, however, each of these reports is necessarily an isolated one. NC-IUPHAR, through the receptor list, will maintain the current consensus as perceived by its members and correspondents, yet also reflect the invaluable efforts of the teams associated with the HUGO, NCBI, and SwissProt nomenclature groups. The list will be monitored by an “evolving pharmacology committee” consisting of the authors.
It is in everyone's interest that the receptor list is public, that the names are consistent, and that the entries are displayed in an organized and recognizable way. By definition, this list is a “moving target” and will be modified based on feedback from scientists, who may address comments to comments@iupharbb.org. The receptor list will be updated every 6 months.
Footnotes
2 Abbreviations: NC-IUPHAR, International Union of Pharmacology Committee on Receptor Nomenclature and Drug Classification; GPCR, G protein-coupled receptor; HUGO, The Human Genome Organization; NCBI, National Center for Biotechnology Information; 7TM, seven transmembrane.
Article, publication date, and citation information can be found at http://pharmrev.aspetjournals.org.
doi:10.1124/pr.57.2.5.
1 This article was written in a personal capacity and does not represent the opinions of the National Institutes of Health, Department of Health and Human Services, or the Federal Government.
The American Society for Pharmacology and Experimental Therapeutics