Abstract
In addition to the differences between populations in transcriptional and translational regulation of genes, alternative pre-mRNA splicing (AS) is likely to play an important role in regulating gene expression and generating variation in mRNA and protein isoforms. Recently, the genetic contribution to transcript isoform variation has been reported in individuals of recent European descent. We report here results of an investigation of the differences in AS patterns between human populations. AS patterns in 176 HapMap lymphoblastoid cell lines derived from individuals of European and African ancestry were evaluated using the Affymetrix GeneChip® Human Exon 1.0 ST Array. A variety of biological processes such as response to stimulus and transcription were found to be enriched among the differentially spliced genes. The differentially spliced genes also include some involved in human diseases that have different prevalence or susceptibility between populations. The genetic contribution to the population differences in transcript isoform variation was then evaluated by genome-wide association analysis using the HapMap genotypic data on single nucleotide polymorphisms (SNPs). The results suggest that local and distant genetic variants account for a substantial fraction of the observed transcript isoform variation between human populations. Our findings provide new insights into the complexity of the human genome as well as the health disparities between the two populations.
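As a rough illustration of how exon-array data can be compared between two populations, the sketch below computes a per-sample splicing index and a two-group test statistic. It assumes the commonly used definition of the splicing index (log2 of exon-level signal divided by gene-level signal) and uses Welch's t statistic as a stand-in for the comparison; the abstract does not specify the exact normalization or test, and the signal values and function names are hypothetical, not data or code from the study.

```python
import math
from statistics import mean, variance


def splicing_index(exon_signal, gene_signal):
    """Assumed definition: log2 ratio of exon-level to gene-level signal for one sample."""
    return math.log2(exon_signal / gene_signal)


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical (exon, gene) signal pairs on the linear scale; not data from the study.
ceu = [(180.0, 400.0), (200.0, 410.0), (190.0, 395.0), (175.0, 405.0)]
yri = [(260.0, 400.0), (250.0, 390.0), (270.0, 415.0), (255.0, 402.0)]

si_ceu = [splicing_index(e, g) for e, g in ceu]
si_yri = [splicing_index(e, g) for e, g in yri]
t_stat = welch_t(si_ceu, si_yri)

print(f"mean SI CEU={mean(si_ceu):.3f}, YRI={mean(si_yri):.3f}, t={t_stat:.2f}")
```

In a genome-wide setting this comparison would be repeated for every probeset, so the resulting p-values would need a multiple-testing correction before any probeset is called differentially spliced.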
Acknowledgments
This Pharmacogenetics of Anticancer Agents Research (PAAR) Group (http://www.pharmacogenetics.org) study was supported by NIH/NIGMS grants U01 GM61393 and U01 GM61374. We are grateful to Dr. Jeong-Ah Kang for maintaining cell lines, Cheryl A. Roe for reviewing the manuscript and Drs. James Fackenthal and Emily Kistner for helpful discussion. T.A.C., T.X.C., A.C.S., and J.E.B. are employees of Affymetrix, Inc.
Additional information
W. Zhang and S. Duan contributed equally to this work.
Electronic supplementary material
Below is the link to the electronic supplementary material.
439_2008_601_MOESM1_ESM.xls
Supplemental Table S1. Differentially spliced probesets between the CEU and YRI samples. Supplemental Table S2. SNPs associated with the differentially spliced probesets. Supplemental Table S3. Primers used in the validation of differentially spliced probesets (XLS 746 kb)
439_2008_601_MOESM2_ESM.tif
Supplemental Fig. 1 The workflow chart. Exon-level expression of the CEU and YRI trios was measured using the Affymetrix Human Exon 1.0 ST array. Splicing index values were compared between the two populations to identify differential probesets. Genetic variants associated with the differential probesets and enriched pathways and/or Gene Ontology terms were then identified. 254x190mm (72 x 72 DPI) (TIFF 110 kb)
439_2008_601_MOESM3_ESM.tif
Supplemental Fig. 2 PS3527423 (PARP2) is differentially spliced among 54 unrelated CEU samples. Blue indicates the presence of both spliced and unspliced isoforms (300 bp and 400 bp); red indicates unspliced isoforms (400 bp); black indicates spliced isoforms (300 bp). 254x190mm (72 x 72 DPI) (TIFF 116 kb)
439_2008_601_MOESM4_ESM.tif
Supplemental Fig. 3 Validation of transcript isoform variation between populations using quantitative real-time PCR. The relative expression between the exon of interest (EOI) and the neighboring, housekeeping exon (HKE) is shown. (A) MPRL43 (probeset: 3303658) has a lower ratio (one-tailed t test, p = 0.02) in CEU, consistent with the trend of SI (CEU SI mean = -0.02; YRI SI mean = 0.06). (B) MTMR4 (probeset: 3764493) has a lower ratio (one-tailed t test, p = 0.05) in CEU, consistent with the trend of SI (CEU SI mean = 0.004; YRI SI mean = 0.024). 254x190mm (72 x 72 DPI) (TIFF 64 kb)
Cite this article
Zhang, W., Duan, S., Bleibel, W.K. et al. Identification of common genetic variants that account for transcript isoform variation between human populations. Hum Genet 125, 81–93 (2009). https://doi.org/10.1007/s00439-008-0601-x
DOI: https://doi.org/10.1007/s00439-008-0601-x