  • Review Article
  • Published:

Unravelling the human genome–phenome relationship using phenome-wide association studies

Key Points

  • Cross-phenotype associations and pleiotropy have been identified in human studies.

  • Phenome-wide association studies (PheWAS) are an emerging method to identify cross-phenotype associations.

  • Phenomes have been characterized using electronic health records, which provide a real-time clinical representation of an individual's health conditions.

  • Phenomes have also been developed by aggregating data from traditional epidemiological studies, which provide a snapshot of a participant's health, lifestyle and environmental exposures.

  • PheWAS have been performed using data from clinical health records or from epidemiological studies; given that neither approach is designed to fully capture all aspects of the human phenome, these approaches should be considered complementary.

  • Active areas of research for PheWAS include the addition of diverse populations, establishment of 'genome–phenome-wide' significance, and development of methods for the analysis and visualization of these complex associations.

Abstract

Advances in genotyping technology have, over the past decade, enabled the focused search for common genetic variation associated with human diseases and traits. With the recently increased availability of detailed phenotypic data from electronic health records and epidemiological studies, the impact of one or more genetic variants on the phenome is starting to be characterized both in clinical and population-based settings using phenome-wide association studies (PheWAS). These studies reveal a number of challenges that will need to be overcome to unlock the full potential of PheWAS for the characterization of the complex human genome–phenome relationship.

Figure 1: From GWAS in EHRs and epidemiological studies to PheWAS.
Figure 2: Anatomy of the ICD-CM codes.
Figure 3: An example of PheWAS phenotyping using the electronic health record.
Figure 4: Possible interpretations of PheWAS findings.

References

  1. Sturtevant, A. J. The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. J. Exp. Zool. 14, 59 (1913).

    Article  Google Scholar 

  2. Gough, S. C. & Simmonds, M. J. The HLA rgion and autoimmune disease: associations and mechanisms of action. Curr. Genom. 8, 453–465 (2007).

    Article  CAS  Google Scholar 

  3. Ueda, H. et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature 423, 506–511 (2003).

    Article  CAS  PubMed  Google Scholar 

  4. Criswell, L. A. et al. Analysis of families in the multiple autoimmune disease genetics consortium (MADGC) collection: the PTPN22 620W allele associates with multiple autoimmune phenotypes. Am. J. Hum. Genet. 76, 561–571 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Zhernakova, A., van Diemen, C. C. & Wijmenga, C. Detecting shared pathogenesis from the shared genetics of immune-related diseases. Nat. Rev. Genet. 10, 43–55 (2009). This review highlights the shared influence of genetic variants for autoimmune diseases.

    Article  CAS  PubMed  Google Scholar 

  6. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316, 1331–1336 (2007).

  7. McPherson, R. et al. A common allele on chromosome 9 associated with coronary heart disease. Science 316, 1488–1491 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Helgadottir, A. et al. A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science 316, 1491–1493 (2007).

    Article  CAS  PubMed  Google Scholar 

  9. Samani, N. J. et al. Genomewide association analysis of coronary artery disease. N. Engl. J. Med. 357, 443–453 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009). This is the first comprehensive characterization of GWAS-identified variants from the literature.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Sivakumaran, S. et al. Abundant pleiotropy in human complex diseases and traits. Am. J. Hum. Genet. 89, 607–618 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Solovieff, N., Cotsapas, C., Lee, P. H., Purcell, S. M. & Smoller, J. W. Pleiotropy in complex traits: challenges and strategies. Nat. Rev. Genet. 14, 483–495 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Stearns, F. W. One hundred years of pleiotropy: a retrospective. Genetics 186, 767–773 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Wagner, G. P. & Zhang, J. The pleiotropic structure of the genotype-phenotype map: the evolvability of complex organisms. Nat. Rev. Genet. 12, 204–213 (2011). This is an excellent review of pleiotropy.

    Article  CAS  PubMed  Google Scholar 

  15. Tyler, A. L., Crawford, D. C. & Pendergrass, S. A. The detection and characterization of pleiotropy. discovery, progress, and promise. Brief. Bioinform. 17, 13–22 (2016).

    Article  CAS  PubMed  Google Scholar 

  16. Rastegar-Mojarad, M., Ye, Z., Kolesar, J. M., Hebbring, S. J. & Lin, S. M. Opportunities for drug repositioning from phenome-wide association studies. Nat. Biotechnol. 33, 342–345 (2015).

    Article  CAS  PubMed  Google Scholar 

  17. Collins, F. S. & Varmus, H. A. New initiative on precision medicine. N. Engl. J. Med. 372, 793–795 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Pendergrass, S. A. & Ritchie, M. Phenome-wide association studies: leveraging comprehensive phenotypic and genotypic data for discovery. Curr. Genet. Med. Rep. 3, 92–100 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Hebbring, S. J. The challenges, advantages and future of phenome-wide association studies. Immunology 141, 157–165 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Pendergrass, S. A. et al. Phenome-wide association studies: embracing complexity for discovery. Hum. Hered. 3–4, 111–123 (2015).

    Article  CAS  Google Scholar 

  21. Stranger, B. E. et al. Patterns of cis regulatory variation in diverse human populations. PLoS Genet. 8, e1002639 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Veyrieras, J. B. et al. High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet. 4, e1000214 (2008).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  23. Pai, A. A. et al. The contribution of RNA decay quantitative trait loci to inter-individual variation in steady-state gene expression levels. PLoS Genet. 8, e1003000 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Gaffney, D. J. et al. Controls of nucleosome positioning in the human genome. PLoS Genet. 8, e1003036 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Battle, A. et al. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015). This is a systematic study of the ways in which genetic variants influence the expression of transcripts and proteins.

    Article  CAS  PubMed  Google Scholar 

  27. Wu, L. et al. Variation and genetic control of protein abundance in humans. Nature 499, 79–82 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Hause, R. et al. Identification and validation of genetic variants that influence transcription factor and cell signaling protein levels. Am. J. Hum. Genet. 95, 194–208 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  30. Cookson, W., Liang, L., Abecasis, G., Moffatt, M. & Lathrop, M. Mapping complex disease traits with global gene expression. Nat. Rev. Genet. 10, 184–194 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

    Article  CAS  PubMed  Google Scholar 

  32. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

    Article  CAS  PubMed  Google Scholar 

  33. Lander, E. S. Initial impact of the sequencing of the human genome. Nature 470, 187–197 (2011).

    Article  CAS  PubMed  Google Scholar 

  34. Bush, W. S. & Moore, J. H. Chapter 11: genome-wide association studies. PLoS Comput. Biol. 8, e1002822 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Witte, J. S. Genome-wide association studies and beyond. Annu. Rev. Publ. Health 31, 9–20 (2010).

    Article  Google Scholar 

  36. Altshuler, D., Daly, M. J. & Lander, E. S. Genetic mapping in human disease. Science 322, 881–888 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).

    Article  CAS  PubMed  Google Scholar 

  38. Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13, 395–405 (2012).

    Article  CAS  PubMed  Google Scholar 

  39. Kohane, I. S. Using electronic health records to drive discovery in disease genomics. Nat. Rev. Genet. 12, 417–428 (2011). This review is an excellent overview of existing and potential uses of EHRs in the context of genomics.

    Article  CAS  PubMed  Google Scholar 

  40. Banda, Y. et al. Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. Genetics 200, 1285–1295 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  41. Kvale, M. N. et al. Genotyping informatics and quality control for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. Genetics 200, 1051–1060 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  42. Gaziano, J. M. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 70, 214–223 (2016).

    Article  PubMed  Google Scholar 

  43. McCarty, C. et al. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Medical Genomics 4, 13 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  44. Ritchie, M. D. et al. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation 127, 1377–1385 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Denny, J. C. et al. Identification of genomic predictors of atrioventricular conduction. Circulation 122, 2016–2021 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  46. Ritchie, M. D. et al. Electronic medical records and genomics (eMERGE) network exploration in cataract: several new potential susceptbility loci. Mol. Vis. 20, 1281–1295 (2014).

    PubMed  PubMed Central  Google Scholar 

  47. McDavid, A. et al. Enhancing the power of genetic association studies through the use of silver standard cases derived from electronic medical records. PLoS ONE 8, e63481 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Turner, S. D. et al. Knowledge-driven multi-locus analysis reveals gene–gene interactions influencing HDL cholesterol level in two independent EMR-linked biobanks. PLoS ONE 6, e19586 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Kullo, I. J. et al. Leveraging informatics for genetic studies: use of the electronic medical record to enable a genome-wide association study of peripheral arterial disease. J. Am. Med. Inform. Assoc. 17, 568–574 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  50. Kho, A. N. et al. Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. J. Am. Med. Inform. Assoc. 19, 212–218 (2012).

    Article  PubMed  Google Scholar 

  51. Ober, C. & Vercelli, D. Gene-environment interactions in human disease: nuisance or opportunity? Trends Genet. 27, 107–115 (2011). This is an excellent review of the role of gene–environment interactions in the context of human disease.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Jones, R., Pembrey, M., Golding, J. & Herrick, D. The search for genenotype/phenotype associations and the phenome scan. Paediatr. Perinatal Epidemiol. 19, 264–275 (2005).

    Article  Google Scholar 

  53. Freimer, N. & Sabatti, C. The human phenome project. Nat. Genet. 34, 15–21 (2003).

    Article  CAS  PubMed  Google Scholar 

  54. Denny, J. C. et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26, 1205–1210 (2010). This is the first published PheWAS performed in a biorepository linked to EHRs.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. International Multiple Sclerosis Genetics Consortium et al. Risk alleles for multiple sclerosis identified by a genomewide study. N. Engl. J. Med. 357, 851–862 (2007).

  56. De Jager, P. L. et al. Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat. Genet. 41, 776–782 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. WTCCC Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).

  58. Gudbjartsson, D. F. et al. Variants conferring risk of atrial fibrillation on chromosome 4q25. Nature 448, 353–357 (2007).

    Article  CAS  PubMed  Google Scholar 

  59. Gudbjartsson, D. F. et al. A sequence variant in ZFHX3 on 16q22 associates with atrial fibrillation and ischemic stroke. Nat. Genet. 41, 876–878 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Raychaudhuri, S. et al. Common variants at CD40 and other loci confer risk of rheumatoid arthritis. Nat. Genet. 40, 1216–1223 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Denny, J. C. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat. Biotechnol. 31, 1102–1111 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Cheng, I. et al. Pleiotropic effects of genetic risk variants for other cancers on colorectal cancer risk: PAGE, GECCO and CCFR consortia. Gut 63, 800–807 (2014).

    Article  CAS  PubMed  Google Scholar 

  63. Park, S. L. et al. Pleiotropic associations of risk variants identified for other cancers with lung cancer risk: the PAGE and TRICL consortia. J. Natl Cancer Inst. 106, dju061 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  64. Setiawan, V. W. et al. Cross-cancer pleiotropic analysis of endometrial cancer: PAGE and E2C2 consortia. Carcinogenesis 35, 2068–2073 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Park, S. L. et al. Association of cancer susceptibility variants with risk of multiple primary cancers: the Population Architecture using Genomics and Epidemiology study. Cancer Epidemiol. Biomarkers Prev. 23, 2568–2578 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  66. Kocarnik, J. M. et al. Pleiotropic and sex-specific effects of cancer GWAS SNPs on melanoma risk in the Population Architecture Using Genomics and Epidemiology (PAGE) study. PLoS ONE 10, e0120491 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  67. Pierce, B. L. & Ahsan, H. Genome-wide pleiotropy scan identifies HNF1A region as a novel pancreatic cancer susceptibility locus. Cancer Res. 71, 4352–4358 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Campa, D. et al. A genome-wide pleiotropy scan does not identify new susceptibility for estrogen receptor negative breast cancer. PLoS ONE 9, e85955 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  69. Panagiotou, O. A. et al. A genome-wide pleiotropy scan for prostate cancer risk. Eur. Urol. 67, 649–657 (2015).

    Article  PubMed  Google Scholar 

  70. Cotsapas, C. et al. Pervasive sharing of genetic effects in autoimmune disease. PLoS Genet. 7, e1002254 (2011). This study highlights the shared complex architecture of genetic factors influencing autoimmune diseases.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Pendergrass, S. A. et al. The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery. Genet. Epidemiol. 35, 410–422 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  72. Carroll, R. J., Bastarache, L. & Denny, J. C. R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics 30, 2375–2376 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Millard, L. A. C. et al. MR-PheWAS: hypothesis prioritization among potential causal effects of body mass index on many outcomes, using Mendelian randomization. Sci. Rep. 5, 16645 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Matise, T. C. et al. The next PAGE in understanding complex traits: design for the analysis of population architecture using genetics and epidemiology (PAGE) study. Am. J. Epidemiol. 174, 849–859 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  75. Zeggini, E. & Ioannidis, J. P. Meta-analysis in genome-wide association studies. Pharmacogenomics 10, 191–201 (2009).

    Article  PubMed  Google Scholar 

  76. Evangelou, E. & Ioannidis, J. P. A. Meta-analysis methods for genome-wide association studies and beyond. Nat. Rev. Genet. 14, 379–389 (2013).

    Article  CAS  PubMed  Google Scholar 

  77. DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium et al. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat. Genet. 46, 234–244 (2014).

  78. Dumitrescu, L. et al. Genetic determinants of lipid traits in diverse populations from the Population Architecture using Genomics and Epidemiology (PAGE) study. PLoS Genet. 7, e1002138 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Kathiresan, S. et al. Common variants at 30 loci contribute to polygenic dyslipidemia. Nat. Genet. 41, 56–65 (2009).

    Article  CAS  PubMed  Google Scholar 

  80. Hall, M. A. et al. Detection of pleiotropy through a phenome-wide association study (PheWAS) of epidemiologic data as part of the Environmental Architecture for Genes Linked to Environment (EAGLE) Study. PLoS Genet. 10, e1004678 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  81. Mitchell, S. et al. Investigating the relationship between mitochondrial genetic variation and cardiovascular-related traits to develop a framework for mitochondrial phenome-wide association studies. BioData Min. 7, 6 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  82. Pendergrass, S., Dudek, S., Crawford, D. & Ritchie, M. Visually integrating and exploring high throughput phenome-wide association study (PheWAS) results using PheWAS-View. BioData Min. 5, 5 (2014).

    Article  Google Scholar 

  83. Xing, E. P. et al. GWAS in a box: statistical and visual analytics of structured associations via GenAMap. PLoS ONE 9, e97524 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  84. Moore, C. B., Wallace, J. R., Frase, A. T., Pendergrass, S. A. & Ritchie, M. D. BioBin: a bioinformatics tools for automating the binning of rare variants using publicly available biological knowledge. BMC Med Genomics 6, S6 (2013).

    PubMed  PubMed Central  Google Scholar 

  85. Kraja, A. T. et al. Pleiotropic genes for metabolic syndrome and inflammation. Mol. Genet. Metab. 112, 317–338 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Pendergrass, S. A. et al. Phenome-wide association study (PheWAS) for detection of pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network. PLoS Genet. 9, e1003087 (2013). This study is the first epidemiologically based PheWAS.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Dumitrescu, L. et al. Towards a phenome-wide catalog of human clinical traits impacted by genetic ancestry. BioData Min. 8, 35 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  88. Rosenberg, N. A. et al. Genome-wide association studies in diverse populations. Nat. Rev. Genet. 11, 356–366 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Jaffe, S. Planning for US Precision Medicine Initiative underway. Lancet 385, 2448–2449 (2015).

    Article  PubMed  Google Scholar 

  90. Flohil, S. C. et al. Prevalence of actinic keratosis and its risk factors in the general population: The Rotterdam Study. J. Invest. Dermatol. 133, 1971–1978 (2013).

    Article  CAS  PubMed  Google Scholar 

  91. Han, J. et al. A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation. PLoS Genet. 4, e1000074 (2008).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  92. Eriksson, N. et al. Web-based, participant-driven studies yield novel genetic associations for common traits. PLoS Genet. 6, e1000993 (2010). This study explores the potential of commercial web-based surveys for study participants.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  93. Zhang, M. et al. Genome-wide association studies identify several new loci associated with pigmentation traits and skin cancer risk in European Americans. Hum. Mol. Genet. 22, 2948–2959 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Jacobs, L. C. et al. IRF4, MC1R and TYR genes are risk factors for actinic keratosis independent of skin color. Hum. Mol. Genet. 24, 3296–3303 (2015).

    Article  CAS  PubMed  Google Scholar 

  95. Cooper, G. M. & Shendure, J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat. Rev. Genet. 12, 628–640 (2011).

    Article  CAS  PubMed  Google Scholar 

  96. Namjou, B. et al. A GWAS study on liver function test using eMERGE network participants. PLoS ONE 10, e0138677 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  97. Denny, J. C. et al. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am. J. Hum. Genet. 89, 529–542 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Hebbring, S. J. et al. PheWAS approach in studying HLA-DRB1*1501. Genes Immun. 14, 187–191 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  99. Cronin, R. M. et al. Phenome wide association studies demonstrating pleiotropy of genetic variants within FTO with and without adjustment for body mass index. Front. Genet. 5, 250 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  100. Shameer, K. et al. A genome- and phenome-wide association study to identify genetic variants influencing platelet count and volume and their pleiotropic effects. Hum. Genet. 133, 95–109 (2014).

    Article  PubMed  Google Scholar 

  101. Namjou, B. et al. Phenome-wide association study (PheWAS) in EMR-linked pediatric cohorts, genetically links PLCL1 to speech language development and IL5-IL13 to eosinophilic esophagitis. Front. Genet. 5, 401 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  102. Ye, Z. et al. Phenome-wide association studies (PheWASs) for functional variants. Eur. J. Hum. Genet. 23, 523–529 (2015).

    Article  CAS  PubMed  Google Scholar 

  103. Liao, K. P. et al. Associations of autoantibodies, autoimmune risk alleles, and clinical diagnoses from the electronic medical records in rheumatoid arthritis cases and non-rheumatoid arthritis controls. Arthritis Rheum. 65, 571–581 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Neuraz, A. et al. Phenome-wide association studies on a quantitative trait: application to TPMT enzyme activity and thiopurine therapy in pharmacogenomics. PLoS Comput. Biol. 9, e1003405 (2013).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  105. Boyd, A. D. et al. Metrics and tools for consistent cohort discovery and financial analyses post-transition to ICD-10-CM. J. Am. Med. Inform. Assoc. 22, 730–737 (2015).

    PubMed  PubMed Central  Google Scholar 

  106. Turer, R. W., Zuckowsky, T. D., Causey, H. J. & Rosenbloom, S. T. ICD-10-CM Crosswalks in the primary care setting: assessing reliability of the GEMs and reimbursement mappings. J. Am. Med. Inform. Assoc. 22, 417–425 (2015).

    Article  PubMed  Google Scholar 

  107. Hebbring, S. J. et al. Application of clinical text data for phenome-wide association studies (PheWASs). Bioinformatics 31, 1981–1987 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. Rhodes, E. T., Laffel, L. M. B., Gonzalez, T. V. & Ludwig, D. S. Accuracy of administrative coding for type 2 diabetes in children, adolescents, and young adults. Diabetes Care 30, 141–143 (2007).

    Article  PubMed  Google Scholar 

  109. Richesson, R. L. et al. A comparison of phenotype definitions for diabetes mellitus. J. Am. Med. Inform. Assoc. 20, e319–e326 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  110. Ritchie, M. D. et al. Robust replication of genotype-phenotype associations across multiple diseases in an electronic medical record. Am. J. Hum. Genet. 86, 560–572 (2010). This study demonstrates that the phenotypes defined by billing codes in the EHRs can replicate known genotype–phenotype associations, suggesting that EHRs in general can be used for genomic discovery.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. Dumitrescu, L., Diggins, K. E., Goodloe, R. & Crawford, D. C. Testing population-specific quantitative trait associations for clinical outcome relevance in a biorepository linked to electronic health records: LPA and myocardial infarction in African Americans. Pac. Symp. Biocomput. 21, 96–107 (2016).

    PubMed  PubMed Central  Google Scholar 

  112. Moriyama, I. M., Loy, R. M. & Robb-Smith, A. H. T. History of the Statistical Classification of Diseases and Causes of Death [online] (CDC — National Center for Health Statistics, 2011).

    Google Scholar 

  113. Wiley, L. K., Shah, A., Xu, H. & Bush, W. S. ICD-9 tobacco use codes are effective identifiers of smoking status. J. Am. Med. Inform. Assoc. 20, 652–658 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  114. Oetjens, M. et al. Utilization of an EMR-biorepository to identify the genetic predictors of calcineurin-inhibitor toxicity in heart transplant recipients. Pac. Symp. Biocomput 2014, 253–264 (2014).

    Google Scholar 

  115. Restrepo, N. A., Farber-Eger, E., Goodloe, R., Haines, J. L. & Crawford, D. C. Extracting primary open-angle glaucoma from electronic medical records for genetic association studies. PLoS ONE 10, e0127817 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  116. Davis, M. F. Sriram, S., Bush, W. S., Denny, J. C. & Haines, J. L. Automated extraction of clinical traits of multiple sclerosis in electronic medical records. J. Am. Med. Inform. Assoc. 20, e334–e340 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  117. Peissig, P. et al. Construction of atorvastatin dose-response relationships using data from a large population-based DNA biobank. Bas. Clin. Pharmacol. Toxicol. 100, 286–288 (2007).

    Article  CAS  Google Scholar 

  118. Warner, J. L., Denny, J. C., Kreda, D. A. & Alterovitz, G. Seeing the forest through the trees: uncovering phenomic complexity through interactive network visualization. J. Am. Med. Inform. Assoc. 22, 324–329 (2015).

    Article  PubMed  Google Scholar 

  119. Yu, S. et al. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources. J. Am. Med. Inform. Assoc. 22, 993–1000 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  120. Lasko, T. A., Denny, J. C. & Levy, M. A. Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PLoS ONE 8, e66341 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  121. Deans, A. R. et al. Finding our way through phenotypes. PLoS Biol. 13, e1002033 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  122. Bennett, S. N. et al. Phenotype harmonization and cross-study collaboration in GWAS consortia: the GENEVA experience. Genet. Epidemiol. 35, 159–173 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  123. Doiron, D., Raina, P., Ferretti, V., L' Heureux, F. & Fortier, I. Facilitating collaborative research: implementing a platform supporting data harmonization and pooling. Nor. Epidemiol. 21, 221–224 (2012).

    Google Scholar 

  124. Wells, B. J., Chagin, K. M., Nowacki, A. S. & Kattan, M. W. Strategies for handling missing data in electronic health record derived data. EGEMS (Wash. DC) 1, 1035 (2013).

    Google Scholar 

  125. Avery, C. L. et al. A phenomics-based strategy identifies loci on APOC1, BRAP, and PLCG1 associated with metabolic syndrome phenotype domains. PLoS Genet. 7, e1002322 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  126. Plomin, R., Haworth, C. M. A. & Davis, O. S. P. Common disorders are quantitative traits. Nat. Rev. Genet. 10, 872–878 (2009).

    Article  CAS  PubMed  Google Scholar 

  127. Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  128. Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Muthalagu, A. et al. A rigorous algorithm to detect and clean inaccurate adult height records within EHR systems. Appl. Clin. Inform. 5, 118–126 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Wells, Q., Farber-Eger, E. & Crawford, D. Extraction of echocardiographic data from the electronic medical record is a rapid and efficient method for study of cardiac structure and function. J. Clin. Bioinforma. 4, 12 (2014).

  131. National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) Final Report. Circulation 106, 3143–3421 (2002).

  132. Uzuner, O., Goldstein, I., Luo, Y. & Kohane, I. Identifying patient smoking status from medical discharge records. J. Am. Med. Inform. Assoc. 15, 14–24 (2008).

  133. Kravets, N. & Parker, J. D. Linkage of the Third National Health and Nutrition Examination Survey to air quality data. Vital Health Stat 2 149, 1–16 (2008).

  134. Parker, J. D., Kravets, N., Nachman, K. & Sapkota, A. Linkage of the 1999–2008 National Health and Nutrition Examination Surveys to traffic indicators from the National Highway Planning Network. Natl Health Stat. Rep. 45, 1–16 (2012).

  135. McCarty, C. et al. Validation of PhenX measures in the personalized medicine research project for use in gene/environment studies. BMC Med. Genomics 7, 3 (2014).

  136. Strobush, L. et al. Dietary intake in the Personalized Medicine Research Project: a resource for studies of gene-diet interaction. Nutr. J. 10, 13 (2011).

  137. Roth, C., Foraker, R., Payne, P. & Embi, P. Community-level determinants of obesity: harnessing the power of electronic health records for retrospective data analysis. BMC Med. Inform. Decis. Mak. 14, 36 (2014).

  138. Schwartz, B. S. et al. Body mass index and the built and social environments in children and adolescents using electronic health records. Am. J. Prev. Med. 41, e17–e28 (2011).

  139. Hall, M. A. et al. Environment-wide association study (EWAS) for type 2 diabetes in the Marshfield Personalized Medicine Research Project Biobank. Pac. Symp. Biocomput. 2014, 200–211 (2014).

  140. Patel, C. J., Bhattacharya, J. & Butte, A. J. An environment-wide association study (EWAS) on type 2 diabetes mellitus. PLoS ONE 5, e10746 (2010).

  141. Patel, C., Chen, R., Kodama, K., Ioannidis, J. & Butte, A. Systematic identification of interaction effects between genome- and environment-wide associations in type 2 diabetes mellitus. Hum. Genet. 132, 495–508 (2013).

  142. Patel, C. J. & Manrai, A. K. Development of exposome correlation globes to map out environment-wide associations. Pac. Symp. Biocomput. 2015, 231–242 (2015).

  143. Chen, R. et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148, 1293–1307 (2012).

  144. Singh, A. et al. Incorporating temporal EHR data in predictive models for risk stratification of renal function deterioration. J. Biomed. Inform. 53, 220–228 (2015).

  145. Sitlani, C. M. et al. Generalized estimating equations for genome-wide association studies using longitudinal phenotype data. Stat. Med. 34, 118–130 (2015).

  146. Moore, C. B. et al. Phenome-wide association study relating pretreatment laboratory parameters with human genetic variants in AIDS Clinical Trials Group protocols. Open Forum Infect. Dis. 2, ofu113 (2015).

  147. Xu, H. et al. MedEx: a medication information extraction system for clinical narratives. J. Am. Med. Inform. Assoc. 17, 19–24 (2010).

  148. Sohn, S. et al. MedXN: an open source medication extraction and normalization tool for clinical text. J. Am. Med. Inform. Assoc. 21, 858–865 (2014).

  149. Nelson, S. J., Zeng, K., Kilbourne, J., Powell, T. & Moore, R. Normalized names for clinical drugs: RxNorm at 6 years. J. Am. Med. Inform. Assoc. 18, 441–448 (2011).

  150. McCarty, C. A., Garber, A., Reeser, J. C., Fost, N. C. & Personalized Medicine Research Project Community Advisory Group and Ethics and Security Advisory Board. Study newsletters, community and ethics advisory boards, and focus group discussions provide ongoing feedback for a large biobank. Am. J. Med. Genet. 155, 737–741 (2011).

  151. Hayden, E. C. Informed consent: a broken contract. Nature 486, 312–314 (2012).

  152. Emanuel, E. J. Reform of clinical research regulations, finally. N. Engl. J. Med. 373, 2296–2299 (2015).

  153. Hazin, R. et al. Ethical, legal, and social implications of incorporating genomic information into electronic health records. Genet. Med. 15, 810–816 (2013).

  154. Malin, B., Loukides, G., Benitez, K. & Clayton, E. Identifiability in biobanks: models, measures, and mitigation strategies. Hum. Genet. 130, 383–392 (2011).

  155. Gymrek, M., McGuire, A. L., Golan, D., Halperin, E. & Erlich, Y. Identifying personal genomes by surname inference. Science 339, 321–324 (2013).

  156. Jarvik, G. P. et al. Return of genomic results to research participants: the floor, the ceiling, and the choices in between. Am. J. Hum. Genet. 94, 818–826 (2014).

  157. Fullerton, S. M. et al. Return of individual research results from genome-wide association studies: experience of the Electronic Medical Records and Genomics (eMERGE) Network. Genet. Med. 14, 424–431 (2012).

  158. Alipanah, N., Kim, H. & Ohno-Machado, L. Building an ontology of phenotypes for existing GWAS studies. AMIA Jt Summits Transl. Sci. Proc. 2013, 4–8 (2013).

  159. Hsu, C.-N. et al. Learning phenotype mapping for integrating large genetic data. Proceedings of BioNLP 2011 Workshop [online], (2011).

  160. Kohler, S. et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 42, D966–D974 (2014).

  161. Groza, T. et al. The Human Phenotype Ontology: semantic unification of common and rare disease. Am. J. Hum. Genet. 97, 111–124 (2015).

  162. Mailman, M. D. et al. The NCBI dbGaP database of genotypes and phenotypes. Nat. Genet. 39, 1181–1186 (2007).

  163. Tryka, K. A. et al. NCBI's Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Res. 42, D975–D979 (2014).

  164. Hamilton, C. M. et al. The PhenX Toolkit: get the most from your measures. Am. J. Epidemiol. 174, 253–260 (2011).

  165. Pan, H. et al. Using PhenX measures to identify opportunities for cross-study analysis. Hum. Mutat. 33, 849–857 (2012).

  166. O'Reilly, P. F. et al. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS ONE 7, e34861 (2012).

  167. Ferreira, M. A. R. & Purcell, S. M. A multivariate test of association. Bioinformatics 25, 132–133 (2009).

  168. Stephens, M. A unified framework for association analysis with multiple related phenotypes. PLoS ONE 8, e65245 (2013).

  169. Klei, L., Luca, D., Devlin, B. & Roeder, K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet. Epidemiol. 32, 9–19 (2008).

  170. van der Sluis, S., Posthuma, D. & Dolan, C. V. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet. 9, e1003235 (2013).

  171. Galesloot, T. E., van Steen, K., Kiemeney, L. A. L. M., Janss, L. L. & Vermeulen, S. H. A. Comparison of multivariate genome-wide association methods. PLoS ONE 9, e95923 (2014).

  172. Liu, J., Pei, Y., Chris, J. & Deng, H. W. Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet. Epidemiol. 33, 217–227 (2009).

  173. Precision Medicine Initiative (PMI) Working Group. The precision medicine initiative cohort program — building a research foundation for 21st century medicine. National Institutes of Health [online], (2015).

  174. Riley, W. T., Nilsen, W. J., Manolio, T. A., Masys, D. R. & Lauer, M. News from the NIH: potential contributions of the behavioral and social sciences to the precision medicine initiative. Transl. Behav. Med. 5, 243–246 (2015).

  175. Collins, R. What makes UK Biobank special? Lancet 379, 1173–1174 (2012).

  176. Crawford, D. C. et al. eMERGEing progress in genomics — the first seven years. Front. Genet. 5, 184 (2014).

  177. Hudson, K. L. & Collins, F. S. Bringing the Common Rule into the 21st Century. N. Engl. J. Med. 373, 2293–2296 (2015).

Author information

Corresponding author

Correspondence to Dana C. Crawford.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Genotype frequency

Humans are diploid and therefore have two copies of each chromosome, representing maternal and paternal contributions. When a gene or locus is polymorphic (having more than one allele in a population), the genotype represents the maternal and paternal allele at that locus. For example, for a diallelic locus with alleles 'A' and 'a', the possible genotypes at that site are 'AA', 'Aa' and 'aa'. The genotype frequency is therefore the frequency of these combinations in the population.
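
As a minimal sketch of this definition, the following Python snippet tallies genotype frequencies, and the related allele frequencies, from a list of diploid genotypes. The sample of ten individuals and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def genotype_frequencies(genotypes):
    """Return the fraction of individuals carrying each genotype."""
    counts = Counter(genotypes)
    total = len(genotypes)
    return {genotype: n / total for genotype, n in counts.items()}


def allele_frequencies(genotypes):
    """Return the fraction of chromosomes carrying each allele.

    Each diploid genotype string (for example 'Aa') contributes two alleles.
    """
    alleles = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(alleles.values())
    return {allele: n / total for allele, n in alleles.items()}


# Hypothetical sample: 5 'AA', 4 'Aa' and 1 'aa' individuals.
sample = ["AA"] * 5 + ["Aa"] * 4 + ["aa"] * 1
genotype_freqs = genotype_frequencies(sample)
allele_freqs = allele_frequencies(sample)
```

In this made-up sample, half of the individuals are 'AA', and 14 of the 20 chromosomes carry allele 'A'.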

Genome-wide association studies

(GWAS). Studies wherein common genetic variants across the genome (hundreds of thousands to millions) are each tested for an association with one or a handful of common human diseases or traits.

Cross-phenotype associations

A phenomenon whereby multiple phenotypes are associated with the same gene or genetic variant. Cross-phenotype associations may be due to pleiotropy or other underlying causes.

Pleiotropy

The phenomenon whereby a single genetic variant or gene affects more than one distinct phenotype.

Precision medicine

Often described as prescribing the right drug at the right dose for the first time in an individual patient. In general, precision medicine is the use of multiple data types (genomics, electronic health records, environmental exposures, and so on) to determine the best prevention or treatment options for an individual patient.

Phenome-wide association study

(PheWAS). A study wherein a single genetic variant or a set of genetic variants are tested for an association with an assemblage of human diseases and/or traits (the phenome). The genetic variants often considered in PheWAS already have a known statistical association with a phenotype or are otherwise functional.

Phenome

The set of all phenotypes expressed by a cell, tissue, organ, organism or species.

Regression modelling

A statistical approach to assess the relationship between variables. In genetic association studies, the relationship between genetic polymorphisms (the independent variable) and disease status (the dependent variable) is often assessed using logistic regression, and association with a continuous value (such as blood lipid levels) is often assessed using linear regression.

Multiple testing

The statistical assessment of the potential association of a single variant is usually formulated as a hypothesis test with a specified false positive rate (usually 5%). As tens of thousands of such tests may be performed in the analysis of genetic data, the P values resulting from the assessment of individual variants must be adjusted to avoid numerous false positive results, a procedure known as multiple testing correction.
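
One simple and widely used adjustment is the Bonferroni correction, which divides the false positive rate by the number of tests performed. The sketch below is illustrative only: the glossary does not prescribe a particular correction, and the P values and function name are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply a Bonferroni correction to a list of P values.

    Returns the per-test significance threshold, the adjusted P values
    (capped at 1.0) and a flag for each test that remains significant.
    """
    m = len(p_values)
    threshold = alpha / m
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p <= threshold for p in p_values]
    return threshold, adjusted, significant


# Hypothetical P values from four association tests.
p_values = [0.00001, 0.004, 0.03, 0.2]
threshold, adjusted, significant = bonferroni(p_values)
```

With four tests and a 5% false positive rate, the per-test threshold becomes 0.0125, so a P value of 0.03 that would pass an uncorrected test no longer counts as significant.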

Electronic health record

(EHR). A digital version of a patient's paper medical chart. An EHR can be distinguished from an electronic medical record (EMR) in that EHRs also include information relevant to the total health of the patient as opposed to being limited mostly to diagnosis and treatment of the patient.

Biorepository

A biological materials repository that collects, processes, stores and distributes biospecimens to support future scientific investigation.

Effect sizes

The percentages of genetic variance or risk explained by a specific locus, ranging from less than 1% for many common traits up to 100% for some Mendelian diseases.

eMERGE Network

The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration in the United States of biobanks linked to electronic health records (EHRs). eMERGE was established in 2007 to explore the utility of EHRs in genomic research through funding from the US National Human Genome Research Institute with five biorepositories. The eMERGE Network is now in its third cycle with nine biobanks linked to EHRs, and the research scope has expanded to include domains in genomic medicine implementation.

Billing codes

Codes that are assigned to services rendered in the clinic for reimbursement purposes. Billing codes include diagnostic codes, procedure codes and pharmaceutical codes, among others.

Chi-squared tests

Statistical tests that are used to determine whether the observed frequencies are different to those expected. For typical case–control genetic association studies, the frequency of alleles or genotypes at each locus in cases with disease is compared with the frequency of alleles or genotypes at that same locus in controls without disease.
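
As a hedged illustration, the following sketch computes a Pearson chi-squared statistic for a 2 × 2 table of allele counts in cases and controls. The counts are invented, no continuity correction is applied, and the P value uses the closed form for one degree of freedom.

```python
import math


def chi_squared_2x2(case_counts, control_counts):
    """Pearson chi-squared test for a 2 x 2 table of allele counts.

    case_counts and control_counts are (allele_A, allele_a) counts.
    Returns the test statistic and its P value (1 degree of freedom).
    """
    observed = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and disease status were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 60 'A' and 40 'a' in cases, 40 'A' and 60 'a' in controls.
chi2, p_value = chi_squared_2x2((60, 40), (40, 60))
```

For these counts every expected cell is 50, giving a statistic of 8.0, which corresponds to a P value below 0.05.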

Population stratification

The presence of a systematic difference in allele frequencies between subpopulations from a larger population, possibly owing to different ancestry, especially in the context of association studies. (Population stratification is also referred to as population structure in this context.) If not properly accounted for in association studies, population stratification can lead to spurious associations.

CPT codes

Current Procedural Terminology (CPT) codes represent medical, surgical and diagnostic services rendered in the clinic for reimbursement purposes. CPT codes differ from International Classification of Diseases (ICD) codes in that they are assigned when the service is performed as opposed to being assigned as part of a diagnosis.

PAGE

Population Architecture using Genomics and Epidemiology (PAGE) is a collaborative network of large epidemiological and clinic-based studies with an emphasis on racially or ethnically diverse populations. The PAGE I study was funded in 2008 by the US National Human Genome Research Institute and consisted of four study sites with access to seven epidemiological studies and one clinic-based study. The research focus of PAGE I was the generalization of genome-wide association findings to diverse populations and the identification of environmental modifiers. PAGE is currently in its second cycle with a research focus on genomic discovery in diverse populations for common human diseases including cancers (for example, breast cancer, melanoma or prostate cancer) and cardiovascular disease.

Comorbidity

The co-occurrence of two or more chronic diseases or conditions in a patient.

Electronic phenotyping

The secondary use of electronic health records (EHRs) to define cases, controls, environmental exposures and other covariates for genetic association studies. Electronic phenotyping typically requires semi-automated or automated algorithms for accessing structured and unstructured data in the EHR.

Positive predictive value

(PPV). The probability that a patient identified as a case is truly a case. The PPV is often calculated as a performance metric for diagnostic tests. In electronic phenotyping, the PPV is used to assess the performance of algorithms designed to use EHR data to identify cases and controls for downstream genetic association studies.
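
In practice the PPV is estimated as the number of true positives divided by all records flagged as cases. A minimal sketch follows, using hypothetical counts from a manual chart review of algorithm-identified cases.

```python
def positive_predictive_value(true_positives, false_positives):
    """Return TP / (TP + FP), the fraction of flagged cases that are true cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no records are flagged as cases")
    return true_positives / flagged


# Hypothetical review: 100 flagged records, of which 92 are confirmed as cases.
ppv = positive_predictive_value(true_positives=92, false_positives=8)
```

Here the estimated PPV is 0.92, meaning that 92% of the records flagged by the hypothetical algorithm were confirmed as cases.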

Allele frequency

A gene or locus can have different forms, termed alleles. Genes or loci with more than one form are said to be polymorphic. The allele frequency is the frequency at which a particular form of the gene or locus is found in the population. If only one form of the gene is found in the population, the locus is said to be monomorphic.

About this article

Cite this article

Bush, W., Oetjens, M. & Crawford, D. Unravelling the human genome–phenome relationship using phenome-wide association studies. Nat Rev Genet 17, 129–145 (2016). https://doi.org/10.1038/nrg.2015.36
