Related Articles

1.  An excess of rare genetic variation in ABCE1 among Yorubans and African-American individuals with HIV-1 
Genes and immunity  2009;10(8):715-721.
Signatures of natural selection occur throughout the human genome and can be detected at the sequence level. We have re-sequenced ABCE1, a host candidate gene essential for HIV-1 capsid assembly, in European- (n=23) and African-descent (Yoruban; n=24) reference populations for genetic variation discovery. We identified an excess of rare genetic variation in Yoruban samples, and the resulting Tajima’s D was low (−2.27). The trend of excess rare variation persisted in flanking candidate genes ANAPC10 and OTUD4, suggesting that this pattern of positive selection can be detected across the 184.5kb examined on chromosome 4. Because of ABCE1’s role in HIV-1 replication, we re-sequenced the candidate gene in three small cohorts of HIV-1-infected or resistant individuals. We were able to confirm the excess of rare genetic variation among HIV-1 positive African-American individuals (n=53; Tajima’s D = −2.34). These results highlight the potential importance of ABCE1’s role in infectious diseases such as HIV-1.
PMCID: PMC2829431  PMID: 19657357
ABCE1; African-Americans; single nucleotide polymorphisms; HIV-1
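The Tajima's D values quoted in the abstract above (−2.27 and −2.34) can be illustrated with a minimal sketch of the standard Tajima (1989) statistic. This is not the authors' pipeline: the function name, the input format (aligned haplotype strings of equal length), and the toy sequences are assumptions made for illustration. It shows why an excess of rare variants pushes D negative, while variants at intermediate frequency push it positive.

```python
import math
from collections import Counter


def tajimas_d(haplotypes):
    """Tajima's D for a list of aligned, equal-length haplotype strings.

    Hypothetical helper for illustration; input format is an assumption.
    """
    n = len(haplotypes)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined.
        raise ValueError("need at least 4 sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")

    seg_sites = 0   # S: number of segregating sites
    pi = 0.0        # mean number of pairwise differences
    for column in zip(*haplotypes):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Fraction of sequence pairs that differ at this site.
            pi += (n * n - sum(c * c for c in counts.values())) / (n * (n - 1))
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # D compares pi with Watterson's estimator S / a1.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy data (made up): every variant is a singleton, i.e. rare.
rare_excess = ["AAA", "AAA", "AAA", "TAA", "ATA", "AAT"]
# Toy data (made up): every variant sits at 50% frequency.
intermediate = ["AAA", "AAA", "AAA", "TTT", "TTT", "TTT"]

print(tajimas_d(rare_excess) < 0)    # rare variants give a negative D
print(tajimas_d(intermediate) > 0)   # intermediate frequencies give a positive D
```

Working from per-site allele counts rather than enumerating all sequence pairs keeps the sketch linear in the number of sites. Assessing significance, as the abstracts do, needs a null distribution and is outside this sketch.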
2.  Sequencing the IL4 locus in African Americans implicates rare noncoding variants in asthma susceptibility 
Common genetic variations in the IL4 gene have been associated with asthma and atopy in European and Asian populations, but not in African Americans.
Because populations of African descent have increased levels of genetic variation compared to other populations, particularly with respect to low frequency or rare variants, we hypothesized that rare variants in the IL4 gene contribute to the development of asthma in African Americans.
To test this hypothesis, we sequenced the IL4 locus in 72 African Americans with asthma and 70 African American non-asthmatic controls to identify novel and rare polymorphisms in the IL4 gene that may be contributing to asthma susceptibility.
We report an excess of private non-coding SNPs in the subjects with asthma compared to non-asthmatic control subjects (P=0.031). Tajima’s D is significantly more negative in cases (−0.375) compared to controls (−0.073) (P=0.04), reflecting an excess of rare variants in the cases.
Our findings indicate that SNPs at the IL4 locus that are potentially exclusive to African Americans are associated with susceptibility to asthma. Only three of the 26 private SNPs (i.e., SNPs present only in the cases or only in the controls) are tagged by single SNPs on one of the common genotyping platforms used in genome-wide association studies. We also find that most of the private SNPs cannot be reliably imputed, highlighting the importance of sequencing to identify genetic variants contributing to common diseases in African Americans.
PMCID: PMC3984460  PMID: 19910025
Rare variants; Private alleles; Asthma; IL4; IgE; African Americans
3.  Trans-Ethnic Fine-Mapping of Lipid Loci Identifies Population-Specific Signals and Allelic Heterogeneity That Increases the Trait Variance Explained 
Wu, Ying | Waite, Lindsay L. | Jackson, Anne U. | Sheu, Wayne H-H. | Buyske, Steven | Absher, Devin | Arnett, Donna K. | Boerwinkle, Eric | Bonnycastle, Lori L. | Carty, Cara L. | Cheng, Iona | Cochran, Barbara | Croteau-Chonka, Damien C. | Dumitrescu, Logan | Eaton, Charles B. | Franceschini, Nora | Guo, Xiuqing | Henderson, Brian E. | Hindorff, Lucia A. | Kim, Eric | Kinnunen, Leena | Komulainen, Pirjo | Lee, Wen-Jane | Le Marchand, Loic | Lin, Yi | Lindström, Jaana | Lingaas-Holmen, Oddgeir | Mitchell, Sabrina L. | Narisu, Narisu | Robinson, Jennifer G. | Schumacher, Fred | Stančáková, Alena | Sundvall, Jouko | Sung, Yun-Ju | Swift, Amy J. | Wang, Wen-Chang | Wilkens, Lynne | Wilsgaard, Tom | Young, Alicia M. | Adair, Linda S. | Ballantyne, Christie M. | Bůžková, Petra | Chakravarti, Aravinda | Collins, Francis S. | Duggan, David | Feranil, Alan B. | Ho, Low-Tone | Hung, Yi-Jen | Hunt, Steven C. | Hveem, Kristian | Juang, Jyh-Ming J. | Kesäniemi, Antero Y. | Kuusisto, Johanna | Laakso, Markku | Lakka, Timo A. | Lee, I-Te | Leppert, Mark F. | Matise, Tara C. | Moilanen, Leena | Njølstad, Inger | Peters, Ulrike | Quertermous, Thomas | Rauramaa, Rainer | Rotter, Jerome I. | Saramies, Jouko | Tuomilehto, Jaakko | Uusitupa, Matti | Wang, Tzung-Dau | Boehnke, Michael | Haiman, Christopher A. | Chen, Yii-Der I. | Kooperberg, Charles | Assimes, Themistocles L. | Crawford, Dana C. | Hsiung, Chao A. | North, Kari E. | Mohlke, Karen L.
PLoS Genetics  2013;9(3):e1003379.
Genome-wide association studies (GWAS) have identified ∼100 loci associated with blood lipid levels, but much of the trait heritability remains unexplained, and at most loci the identities of the trait-influencing variants remain unknown. We conducted a trans-ethnic fine-mapping study at 18, 22, and 18 GWAS loci on the Metabochip for their association with triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C), respectively, in individuals of African American (n = 6,832), East Asian (n = 9,449), and European (n = 10,829) ancestry. We aimed to identify the variants with strongest association at each locus, identify additional and population-specific signals, refine association signals, and assess the relative significance of previously described functional variants. Among the 58 loci, 33 exhibited evidence of association at P<1×10−4 in at least one ancestry group. Sequential conditional analyses revealed that ten, nine, and four loci in African Americans, Europeans, and East Asians, respectively, exhibited two or more signals. At these loci, accounting for all signals led to a 1.3- to 1.8-fold increase in the explained phenotypic variance compared to the strongest signals. Distinct signals across ancestry groups were identified at PCSK9 and APOA5. Trans-ethnic analyses narrowed the signals to smaller sets of variants at GCKR, PPP1R3B, ABO, LCAT, and ABCA1. Of 27 variants reported previously to have functional effects, 74% exhibited the strongest association at the respective signal. In conclusion, trans-ethnic high-density genotyping and analysis confirm the presence of allelic heterogeneity, allow the identification of population-specific variants, and limit the number of candidate SNPs for functional studies.
Author Summary
Lipid traits are heritable, but many of the DNA variants that influence lipid levels remain unknown. In a genomic region, more than one variant may affect gene expression or function, and the frequencies of these variants can differ across populations. Genotyping densely spaced variants in individuals with different ancestries may increase the chance of identifying variants that affect gene expression or function. We analyzed high-density genotyped variants for association with TG, HDL-C, and LDL-C in African Americans, East Asians, and Europeans. At several genomic regions, we provide evidence that two or more variants can influence lipid traits; across loci, these additional signals increase the proportion of trait variation that can be explained by genes. At some association signals shared across populations, combining data from individuals of different ancestries narrowed the set of likely functional variants. At PCSK9 and APOA5, the data suggest that different variants influence trait levels in different populations. Variants previously reported to alter gene expression or function frequently exhibited the strongest association at those signals. The multiple signals and population-specific characteristics of the loci described here may be shared by genetic loci for other complex traits.
PMCID: PMC3605054  PMID: 23555291
4.  Genetics of Ischaemic Stroke among Persons of Non-European Descent: A Meta-Analysis of Eight Genes Involving ∼ 32,500 Individuals 
PLoS Medicine  2007;4(4):e131.
Ischaemic stroke in persons of European descent has a genetic basis, but whether the stroke-susceptibility alleles, the strength of any association, and the extent of their attributable risks are the same in persons of non-European descent remains unanswered. Whether ethnicity itself makes a relevant or substantial contribution to those effect estimates is controversial. Comparative analyses between the ethnic groups may allow general conclusions to be drawn about polygenic disorders.
Methods and Findings
We performed a literature-based systematic review of genetic association studies in stroke in persons of non-European descent. Odds ratios (ORs) and 95% confidence intervals (CIs) were determined for each gene–disease association using fixed and random effect models. We further performed a comparative genetic analysis across the different ethnic groups (including persons of European descent derived from our previous meta-analysis) to determine if genetic risks varied by ethnicity. Following a review of 500 manuscripts, eight candidate gene variants were analysed among 32,431 individuals (12,883 cases and 19,548 controls), comprising mainly Chinese, Japanese, and Korean individuals. Of the eight candidate genes studied, three were associated with ischaemic stroke: the angiotensin I converting enzyme (ACE) insertion/deletion (I/D) polymorphism with a mean OR of 1.90 (95% CI 1.23–2.93) in the Chinese and 1.74 (95% CI 0.88–3.42) in the Japanese; the summary OR for the C677T variant of 5,10-methylenetetrahydrofolate reductase (MTHFR) was 1.18 (95% CI 0.90–1.56) in Chinese and 1.34 (95% CI 0.87–2.06) in Koreans; and the pooled OR for the apolipoprotein E (APOE) gene was 2.18 (95% CI 1.52–3.13) in Chinese and 1.51 (95% CI 0.93–2.45) in Japanese. Comparing the commonly investigated stroke genes among the Asian groups against studies in persons of European descent, we found an absence of any substantial qualitative or quantitative interaction for ORs by ethnicity. However, the number of individuals recruited per study in the studies of persons of non-European descent was significantly smaller compared to studies of persons of European descent, despite a similar number of studies conducted per gene.
These data suggest that genetic associations studied to date for ischaemic stroke among persons of non-European descent are similar to those for persons of European descent. Claims of differences in genetic effects among different ethnic populations for complex disorders such as stroke may be overstated. However, due to the limited number of gene variants evaluated, the relatively smaller number of individuals included in the meta-analyses of persons of non-European descent in stroke, and the possibility of publication bias, the existence of allele variants with differential effects by ethnicity cannot be excluded.
This meta-analysis found that genetic associations so far studied for ischemic stroke among non-Europeans are similar to those found for persons of European descent.
Editors' Summary
A stroke occurs when the blood supply to part of the brain is interrupted, either because a blood vessel supplying the brain becomes blocked or because one ruptures. Strokes are a substantial cause of death and disability worldwide, with most of the burden affecting people living in developed countries. Most strokes fall into a category termed ischemic stroke. This type is caused by blockages in the blood vessels supplying the brain, which can happen when there is a buildup of fatty deposits or clots within the blood vessels. Many of the risk factors for this particular type of stroke are affected by an individual's behavior, including for example smoking, high blood pressure, diabetes, inactivity, and so on. In addition, variations in an individual's genetic makeup might affect his or her chance of having a stroke. Previous research studies have shown that variants in many different genes are likely to be involved in determining the overall risk of having a stroke, each variant contributing in a small way to the risk.
Why Was This Study Done?
The group performing this study had previously carried out a systematic review of existing research, looking specifically at the genetics of ischemic stroke among people of European origin (often called “Caucasians”). However, it was not obvious whether the genetic risk factors for stroke they found would be the same for people from a different ethnic background. Therefore the research group wanted to find out what the genetic risk factors were for stroke among people of non-European origin and to compare these findings with those of their previous systematic review. This research might help to find out whether the genetic risk factors for stroke were different in people from different parts of the world.
What Did the Researchers Do and Find?
As a starting point, these researchers wanted to find all the different studies that had already been carried out examining the effect of genetic risk factors on stroke among people of non-European origin. To do this, searches of electronic databases were carried out using a particular set of terms. All resulting studies that involved genetic research in people of non-European origin and in which strokes were confirmed by brain scanning were then evaluated in more detail. The findings of different studies were combined if at least three studies were available for the same genetic variant. Eventually 60 studies were found that looked at the association between eight specific gene variants and stroke. The only data that could be included in a combined analysis came from Chinese, Japanese, and Korean populations. Three of the eight gene variants were associated with an increased risk of stroke. Those three gene variants were ACE I/D (a variant in the gene coding for angiotensin 1-converting enzyme, which is involved in controlling blood pressure); a variant in MTHFR (which codes for the enzyme methylenetetrahydrofolate reductase, and which converts certain amino acids within cells); and a variant in the gene APOE, which codes for a protein that plays a role in breaking down fats. The researchers then compared their findings from this study with the findings of a previous systematic review they had carried out among people of European origin. Overall, each gene studied seemed to have a similar effect in the different populations, with the exception of APOE, which seemed to be associated with stroke in the Asian studies but not in the studies from people of European origin. The researchers also found that generally the Asian studies suggested a slightly greater effect of each gene variant than the studies in people of European origin did.
What Do These Findings Mean?
These findings suggest that, with the possible exception of APOE, similar gene variants play a role in determining stroke risk in people of European origin and Asian populations. Although generally the studies examined here suggested a slightly greater effect of these gene variants in Asian populations, this is not necessarily a real finding. This greater effect may just be due to small-study bias. Small-study bias describes the observation that small research studies are more likely to produce a false positive result than are large research studies. Therefore, future studies that examine the genetic basis of stroke should recruit much larger numbers of participants from populations made up of people of non-European origin than has previously been the case.
Additional Information.
Please access these Web sites via the online version of this summary at
Health Encyclopedia entry on stroke from NHS Direct (UK National Health Service patient information)
Stroke Information page from the National Institute of Neurological Disorders and Stroke (provided by the US National Institutes of Health)
The Stroke Association, a UK charity funding this study
Information from the World Health Organization on the distribution and burden of stroke worldwide
The WHO has a world atlas of heart disease and stroke
PMCID: PMC1876409  PMID: 17455988
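The abstract above reports pooled odds ratios with 95% confidence intervals from fixed and random effect models. As a rough illustration of the fixed-effect case only, the sketch below applies standard inverse-variance pooling on the log-odds scale, recovering each study's standard error from its reported confidence interval. The function names and the input numbers are hypothetical and are not taken from the meta-analysis. The random-effects model is not shown.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Standard error of log(OR), recovered from a reported CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def fixed_effect_pooled_or(studies):
    """Inverse-variance fixed-effect pooling.

    `studies` is a list of (odds_ratio, ci_lower, ci_upper) tuples.
    Returns (pooled_or, ci_lower, ci_upper).
    """
    if not studies:
        raise ValueError("need at least one study")
    weights = []
    log_ors = []
    for odds_ratio, lower, upper in studies:
        se = se_from_ci(lower, upper)
        weights.append(1.0 / (se * se))   # weight = 1 / variance
        log_ors.append(math.log(odds_ratio))

    total_weight = sum(weights)
    pooled_log = sum(w * x for w, x in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


# Made-up inputs for illustration: two identical hypothetical studies.
pooled, low, high = fixed_effect_pooled_or([(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)])
print(round(pooled, 2), round(low, 2), round(high, 2))
```

Pooling two identical studies leaves the point estimate unchanged and narrows the interval, which is the behaviour the small-study discussion in the summary relies on: larger total sample size gives tighter estimates.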
5.  Signatures of Natural Selection at the FTO (Fat Mass and Obesity Associated) Locus in Human Populations 
PLoS ONE  2015;10(2):e0117093.
Background and aims
Polymorphisms in the first intron of FTO have been robustly replicated for associations with obesity. In the Sorbs, a Slavic population resident in Germany, the strongest effect on body mass index (BMI) was found for a variant in the third intron of FTO (rs17818902). Since this may indicate population specific effects of FTO variants, we initiated studies testing FTO for signatures of selection in vertebrate species and human populations.
First, we analyzed the coding region of 35 vertebrate FTO orthologs with Phylogenetic Analysis by Maximum Likelihood (PAML, ω = dN/dS) to screen for signatures of selection among species. Second, we investigated human population (Europeans/CEU, Yoruba/YRI, Chinese/CHB, Japanese/JPT, Sorbs) SNP data for footprints of selection using DnaSP version 4.5 and the Haplotter/PhaseII. Finally, using ConSite we compared transcription factor (TF) binding sites at sequences harbouring FTO SNPs in intron three.
PAML analyses revealed strong conservation in the coding region of FTO (ω<1). Sliding-window results from population genetic analyses provided highly significant (p<0.001) signatures for balancing selection specifically in the third intron (e.g. Tajima’s D in Sorbs = 2.77). We observed several alterations in TF binding sites, e.g. a TCF3 binding site introduced by the rs17818902 minor allele.
Population genetic analysis revealed signatures of balancing selection at the FTO locus with a prominent signal in intron three, a genomic region with strong association with BMI in the Sorbs. Our data support the hypothesis that genes associated with obesity may have been under evolutionary selective pressure.
PMCID: PMC4315420  PMID: 25647475
6.  A Neutrality Test for Detecting Selection on DNA Methylation Using Single Methylation Polymorphism Frequency Spectrum 
Genome Biology and Evolution  2014;7(1):154-171.
Inheritable epigenetic mutations (epimutations) can contribute to transmittable phenotypic variation. Thus, epimutations can be subject to natural selection and impact the fitness and evolution of organisms. Based on the framework of the modified Tajima’s D test for DNA mutations, we developed a neutrality test with the statistic “Dm” to detect selection forces on DNA methylation mutations using single methylation polymorphisms. With computer simulation and empirical data analysis, we compared the Dm test with the original and modified Tajima’s D tests and demonstrated that the Dm test is suitable for detecting selection on epimutations and outperforms original/modified Tajima’s D tests. Due to the higher resetting rate of epimutations, the interpretation of Dm on epimutations and Tajima’s D test on DNA mutations could be different in inferring natural selection. Analyses using simulated and empirical genome-wide polymorphism data suggested that genes under genetic and epigenetic selections behaved differently. We applied the Dm test to recently originated Arabidopsis and human genes, and showed that newly evolved genes contain a higher level of rare epialleles, suggesting that epimutation may play a role in the origination and evolution of genes and genomes. Overall, we demonstrate the utility of the Dm test to detect whether the loci are under selection regarding DNA methylation. Our analytical metrics and methodology could contribute to our understanding of evolutionary processes of genes and genomes in the field of epigenetics. The Perl script for the “Dm” test is available at (last accessed December 18, 2014).
PMCID: PMC4316624  PMID: 25539727
epigenetics; epimutation; neutrality test; single methylation polymorphism; site frequency; Tajima’s D
7.  Balancing Selection on a Regulatory Region Exhibiting Ancient Variation That Predates Human–Neandertal Divergence 
PLoS Genetics  2013;9(4):e1003404.
Ancient population structure shaping contemporary genetic variation has been recently appreciated and has important implications regarding our understanding of the structure of modern human genomes. We identified a ∼36-kb DNA segment in the human genome that displays an ancient substructure. The variation at this locus exists primarily as two highly divergent haplogroups. One of these haplogroups (the NE1 haplogroup) aligns with the Neandertal haplotype and contains a 4.6-kb deletion polymorphism in perfect linkage disequilibrium with 12 single nucleotide polymorphisms (SNPs) across diverse populations. The other haplogroup, which does not contain the 4.6-kb deletion, aligns with the chimpanzee haplotype and is likely ancestral. Africans have higher overall pairwise differences with the Neandertal haplotype than Eurasians do for this NE1 locus (p<10−15). Moreover, the nucleotide diversity at this locus is higher in Eurasians than in Africans. These results mimic signatures of recent Neandertal admixture contributing to this locus. However, an in-depth assessment of the variation in this region across multiple populations reveals that African NE1 haplotypes, albeit rare, harbor more sequence variation than NE1 haplotypes found in Europeans, indicating an ancient African origin of this haplogroup and refuting recent Neandertal admixture. Population genetic analyses of the SNPs within each of these haplogroups, along with genome-wide comparisons revealed significant FST (p = 0.00003) and positive Tajima's D (p = 0.00285) statistics, pointing to non-neutral evolution of this locus. The NE1 locus harbors no protein-coding genes, but contains transcribed sequences as well as sequences with putative regulatory function based on bioinformatic predictions and in vitro experiments. We postulate that the variation observed at this locus predates Human–Neandertal divergence and is evolving under balancing selection, especially among European populations.
Author Summary
Natural selection shapes the genome in a non-random way, as an allele that contributes more to the reproductive fitness of a species increases in frequency within the population. Under balancing selection, a particular kind of natural selection, more than one allele increases in frequency in the population, likely due to a reproductive advantage of individuals carrying both alleles. Only a handful of loci have been well documented to evolve under balancing selection, with the HBB gene (sickle cell locus) being the best studied. Here, we report a non-coding (but putatively functional) locus that has maintained two divergent alleles in the human population since before the Human–Neandertal divergence and is therefore likely to be under balancing selection. These findings also provide a clear example for ancient African substructure.
PMCID: PMC3623772  PMID: 23593015
8.  Efficient Utilization of Rare Variants for Detection of Disease-Related Genomic Regions 
PLoS ONE  2010;5(12):e14288.
When testing association between rare variants and diseases, an efficient analytical approach involves considering a set of variants in a genomic region as the unit of analysis. One factor complicating this approach is that the vast majority of rare variants in practical applications are believed to represent background neutral variation. As a result, analyzing a single set with all variants may not represent a powerful approach. Here, we propose two alternative strategies. In the first, we analyze the subsets of rare variants exhaustively. In the second, we categorize variants selectively into two subsets: one in which variants are overrepresented in cases, and the other in which variants are overrepresented in controls. When the proportion of neutral variants is moderate to large we show, by simulations, that both proposed strategies improve the statistical power over methods analyzing a single set containing all variants. When applied to a real sequencing association study, the proposed methods consistently produce smaller p-values than their competitors. When applied to another real sequencing dataset to study the difference of rare allele distributions between ethnic populations, the proposed methods detect the overrepresentation of variants between the CHB (Chinese Han in Beijing) and YRI (Yoruba people of Ibadan) populations with small p-values. Additional analyses suggest that there is no difference between the CHB and CHD (Chinese Han in Denver) datasets, as expected. Finally, when applied to the CHB and JPT (Japanese people in Tokyo) populations, existing methods fail to detect any difference, while it is detected by the proposed methods in several regions.
PMCID: PMC3000820  PMID: 21170328
9.  Genetic Differences between the Determinants of Lipid Profile Phenotypes in African and European Americans: The Jackson Heart Study 
PLoS Genetics  2009;5(1):e1000342.
Genome-wide association analysis in populations of European descent has recently found more than a hundred genetic variants affecting risk for common disease. An open question, however, is how relevant the variants discovered in Europeans are to other populations. To address this problem for cardiovascular phenotypes, we studied a cohort of 4,464 African Americans from the Jackson Heart Study (JHS), in whom we genotyped both a panel of 12 recently discovered genetic variants known to predict lipid profile levels in Europeans and a panel of up to 1,447 ancestry informative markers allowing us to determine the African ancestry proportion of each individual at each position in the genome. Focusing on lipid profiles—HDL-cholesterol (HDL-C), LDL-cholesterol (LDL-C), and triglycerides (TG)—we identified the lipoprotein lipase (LPL) locus as harboring variants that account for interethnic variation in HDL-C and TG. In particular, we identified a novel common variant within LPL that is strongly associated with TG (p = 2.7×10−6) and explains nearly 1% of the variability in this phenotype, the most of any variant in African Americans to date. Strikingly, the extensively studied “gain-of-function” S447X mutation at LPL, which has been hypothesized to be the major determinant of the LPL-TG genetic association and is in trials for human gene therapy, has a significantly diminished strength of biological effect when it is found on a background of African rather than European ancestry. These results suggest that there are other, yet undiscovered variants at the locus that are truly causal (and are in linkage disequilibrium with S447X) or that work synergistically with S447X to modulate TG levels. Finally, we find systematically lower effect sizes for the 12 risk variants discovered in European populations on the African local ancestry background in JHS, highlighting the need for caution in the use of genetic variants for risk assessment across different populations.
Author Summary
Single-base changes in DNA can affect biochemical measures, such as blood cholesterol or lipid levels. Such changes or “variants” can be associated with a trait either because they cause the trait or because they are linked to other causal variants. In either case, the associated variant(s) may be useful in predicting the trait. The chromosomes in which DNA is packaged cross over and recombine with each other in each generation, so that in historically separate populations, such as Africans and Europeans, the patterns of genetic linkage between variants differ. In the current study, we analyzed a large group of African Americans, testing genetic variants that had been associated with cholesterol and lipid levels in European-derived populations to assess their predictive value on two different genetic backgrounds within the same cohort. The ability of some variants to predict cholesterol or lipid traits was strongly dependent on genetic background, indicating that they may be tightly linked to other causal variant(s) in European populations and may not, themselves, be directly responsible for trait variability. We conclude that the predictive value of specific variants for risk assessment can differ critically across populations.
PMCID: PMC2613537  PMID: 19148283
10.  From DNA to Fitness Differences: Sequences and Structures of Adaptive Variants of Colias Phosphoglucose Isomerase (PGI) 
Molecular biology and evolution  2005;23(3):499-512.
Colias eurytheme butterflies display extensive allozyme polymorphism in the enzyme phosphoglucose isomerase (PGI). Earlier studies on biochemical and fitness effects of these genotypes found evidence of strong natural selection maintaining this polymorphism in the wild. Here we analyze the molecular features of this polymorphism by sequencing multiple alleles and modeling their structures. PGI is a dimer with rotational symmetry. Each monomer provides a critical residue to the other monomer’s catalytic center. Sequenced alleles differ at multiple amino acid positions, including cryptic charge-neutral variation, but most consistent differences among the electromorph alleles are at the charge-changing amino acid sites. Principal candidate sites of selection, identified by structural and functional analyses and by their variants’ population frequencies, occur in interpenetrating loops across the interface between monomers, where they may alter subunit interactions and catalytic center geometry. Comparison to a second (and basal) species, Colias meadii, also polymorphic for PGI under natural selection, reveals one fixed amino acid difference between their PGIs, which is located in the interpenetrating loop and accompanies functional differences among their variants. We also study nucleotide variability among the PGI alleles, comparing these data to similar data from another glycolytic enzyme gene, glyceraldehyde-3-phosphate dehydrogenase. Despite extensive nonsynonymous and synonymous polymorphism at PGI in each species, the only base changes fixed between species are the two causing the amino acid replacement; this absence of synonymous fixation yields a significant McDonald-Kreitman test. Analyses of these data suggest historical population expansion. Positive peaks of Tajima’s D statistic, representing regions of neutral “hitchhiking,” are found around the principal candidate sites of selection. 
This study provides novel views of molecular-structural mechanisms, and beginnings of historical evidence, for a long-persistent balanced enzyme polymorphism at PGI in these and perhaps other species.
PMCID: PMC2943955  PMID: 16292000
adaptive evolution; G3PD; balancing selection; dimeric enzyme evolution; molecular tests of selection; structural basis of heterosis
11.  Sequence variation in human succinate dehydrogenase genes: evidence for long-term balancing selection on SDHA 
BMC Biology  2007;5:12.
Balancing selection operating for long evolutionary periods at a locus is characterized by the maintenance of distinct alleles because of a heterozygote or rare-allele advantage. The loci under balancing selection are distinguished by their unusually high polymorphism levels. In this report, we provide statistical and comparative genetic evidence suggesting that the SDHA gene is under long-term balancing selection. SDHA encodes the major catalytical subunit (flavoprotein, Fp) of the succinate dehydrogenase enzyme complex (SDH; mitochondrial complex II). The inhibition of Fp by homozygous SDHA mutations or by 3-nitropropionic acid poisoning causes central nervous system pathologies. In contrast, heterozygous mutations in SDHB, SDHC, and SDHD, the other SDH subunit genes, cause hereditary paraganglioma (PGL) tumors, which show constitutive activation of pathways induced by oxygen deprivation (hypoxia).
We sequenced the four SDH subunit genes (10.8 kb) in 24 African American and 24 European American samples. We also sequenced the SDHA gene (2.8 kb) in 18 chimpanzees. Increased nucleotide diversity distinguished the human SDHA gene from its chimpanzee ortholog and from the PGL genes. Sequence analysis uncovered two common SDHA missense variants and refuted the previous suggestions that these variants originate from different genetic loci. Two highly dissimilar SDHA haplotype clusters were present in intermediate frequencies in both racial groups. The SDHA variation pattern showed statistically significant deviations from neutrality by the Tajima, Fu and Li, Hudson-Kreitman-Aguadé, and Depaulis haplotype number tests. Empirically, the elevated values of the nucleotide diversity (% π = 0.231) and the Tajima statistics (D = 1.954) in the SDHA gene were comparable with the most outstanding cases for balancing selection in the African American population.
The SDHA gene has a strong signature of balancing selection. The SDHA variants that have increased in frequency during human evolution might, by influencing the regulation of cellular oxygen homeostasis, confer protection against certain environmental toxins or pathogens that are prevalent in Africa.
PMCID: PMC1852088  PMID: 17376234
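The SDHA abstract above relies on Tajima's D (reported as D = 1.954) as evidence of balancing selection. As a rough illustration of how that statistic is computed, the following is a minimal sketch using the standard Tajima (1989) formula. The function name `tajimas_d` and the 0/1 haplotype-matrix input are assumptions made for this example; it is not the authors' analysis pipeline.

```python
import math

def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length 0/1 haplotypes (rows = chromosomes).

    Hypothetical helper for illustration: 0 = ancestral allele, 1 = derived allele.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance term is zero below that)")
    # Derived-allele count per site; keep only segregating sites.
    counts = [sum(col) for col in zip(*haplotypes)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")
    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2 * k * (n - k) for k in seg) / (n * (n - 1))
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D compares pi with Watterson's estimator S / a1.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (for example, all singletons) pulls D below zero, while an excess of intermediate-frequency variants, as expected under balancing selection, pushes it above zero.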
12.  Replication of genetic loci for ages at menarche and menopause in the multi-ethnic Population Architecture using Genomics and Epidemiology (PAGE) study 
Human Reproduction (Oxford, England)  2013;28(6):1695-1706.
Do genetic associations identified in genome-wide association studies (GWAS) of age at menarche (AM) and age at natural menopause (ANM) replicate in women of diverse race/ancestry from the Population Architecture using Genomics and Epidemiology (PAGE) Study?
We replicated GWAS reproductive trait single nucleotide polymorphisms (SNPs) in our European descent population and found that many SNPs were also associated with AM and ANM in populations of diverse ancestry.
Menarche and menopause mark the reproductive lifespan in women and are important risk factors for chronic diseases including obesity, cardiovascular disease and cancer. Both events are believed to be influenced by environmental and genetic factors, and vary in populations differing by genetic ancestry and geography. Most genetic variants associated with these traits have been identified in GWAS of European-descent populations.
A total of 42 251 women of diverse ancestry from PAGE were included in cross-sectional analyses of AM and ANM.
SNPs previously associated with ANM (n = 5 SNPs) and AM (n = 3 SNPs) in GWAS were genotyped in American Indians, African Americans, Asians, European Americans, Hispanics and Native Hawaiians. To test SNP associations with ANM or AM, we used linear regression models stratified by race/ethnicity and PAGE sub-study. Results were then combined in race-specific fixed effect meta-analyses for each outcome. For replication and generalization analyses, significance was defined at P < 0.01 for ANM analyses and P < 0.017 for AM analyses.
We replicated findings for AM SNPs in the LIN28B locus and an intergenic region on 9q31 in European Americans. The LIN28B SNPs (rs314277 and rs314280) were also significantly associated with AM in Asians, but not in other race/ethnicity groups. Linkage disequilibrium (LD) patterns at this locus varied widely among the ancestral groups. With the exception of an intergenic SNP at 13q34, all ANM SNPs replicated in European Americans. Three were significantly associated with ANM in other race/ethnicity populations: rs2153157 (6p24.2/SYCP2L), rs365132 (5q35/UIMC1) and rs16991615 (20p12.3/MCM8). While rs1172822 (19q13/BRSK1) was not significant in the populations of non-European descent, effect sizes showed similar trends.
Lack of association for the GWAS SNPs in the non-European American groups may be due to differences in locus LD patterns between these groups and the European-descent populations included in the GWAS discovery studies; and in some cases, lower power may also contribute to non-significant findings.
The discovery of genetic variants associated with the reproductive traits provides an important opportunity to elucidate the biological mechanisms involved with normal variation and disorders of menarche and menopause. In this study we replicated most, but not all reported SNPs in European descent populations and examined the epidemiologic architecture of these early reported variants, describing their generalizability and effect size across differing ancestral populations. Such data will be increasingly important for prioritizing GWAS SNPs for follow-up in fine-mapping and resequencing studies, as well as in translational research.
The Population Architecture Using Genomics and Epidemiology (PAGE) program is funded by the National Human Genome Research Institute (NHGRI), supported by U01HG004803 (CALiCo), U01HG004798 (EAGLE), U01HG004802 (MEC), U01HG004790 (WHI) and U01HG004801 (Coordinating Center), and their respective NHGRI ARRA supplements. The authors report no conflicts of interest.
PMCID: PMC3657124  PMID: 23508249
menopause; menarche; genome-wide association study; race/ethnicity; single nucleotide polymorphism
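The PAGE replication abstract above combines stratified regression results in fixed-effect meta-analyses. A minimal sketch of inverse-variance fixed-effect pooling follows; the function names and input values are hypothetical, and the abstract does not state which software or weighting scheme was actually used. The significance thresholds quoted in the abstract (0.01 and 0.017) happen to equal 0.05 divided by the number of SNPs tested (5 and 3), which the second helper reproduces.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect estimate from per-study betas and SEs."""
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1 / total)
    z = beta / se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p

def per_test_threshold(alpha, n_tests):
    """Family-wise alpha divided evenly across tests (an assumption about the thresholds)."""
    return alpha / n_tests
```

For instance, `per_test_threshold(0.05, 3)` rounds to 0.017, matching the AM threshold in the abstract.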
13.  Rare SERINC2 variants are specific for alcohol dependence in subjects of European descent 
Pharmacogenetics and genomics  2013;23(8):395-402.
We previously reported a top-ranked risk gene [i.e., serine incorporator 2 gene (SERINC2)] for alcohol dependence in the subjects of European descent by analyzing the common variants in a genome-wide association study. In the present study, we comprehensively examined the rare variants [minor allele frequency (MAF) < 0.05] in the NKAIN1-SERINC2 region, in order to confirm our previous finding.
A discovery sample (1,409 European-American cases with alcohol dependence and 1,518 European-American controls) and a replication sample (6,438 European-Australian family subjects with 1,645 alcohol dependent probands) underwent association analysis. A total of 39,903 subjects from 19 other cohorts with 11 different neuropsychiatric and neurological disorders served as contrast groups. The entire NKAIN1-SERINC2 region was imputed in all cohorts using the same reference panels of genotypes that included rare variants from the whole-genome sequencing data. We stringently cleaned the phenotype and genotype data, and obtained a total of about 220 SNPs in the subjects with European descent and about 450 SNPs in the subjects with African descent with 0
Using a weighted regression analysis implemented in the program SCORE-Seq, we found a rare variant constellation across the entire NKAIN1-SERINC2 region that was associated with alcohol dependence in European-Americans (Fp: overall p=1.8×10−4; VT: overall p=1.4×10−4; Collapsing p=6.5×10−5) and European-Australians (Fp: overall p=0.028; Collapsing p=0.025), but not African-Americans, and not associated with any other disorder examined. Association signals in this region came mainly from SERINC2, a gene that codes for an activity-regulated protein expressed in brain that incorporates serine into lipids. Additionally, 26 individual rare variants were nominally associated with alcohol dependence in European-Americans (p<0.05). The associations of 5 of these rare variants that lay within SERINC2 exhibited region-wide significance (p<α=0.0006); and 25 associations survived correction for false discovery rate (q<0.05). The associations of 2 rare variants at SERINC2 were replicated in European-Australians (p<0.05).
We concluded that SERINC2 was a replicable and significant risk gene specific for alcohol dependence in the subjects of European descent.
PMCID: PMC4287355  PMID: 23778322
SERINC2; alcohol dependence; rare variant constellations; European descent; association
PLoS Genetics  2010;6(10):e1001156.
There is solid evidence that rare variants contribute to complex disease etiology. Next-generation sequencing technologies make it possible to uncover rare variants within candidate genes, exomes, and genomes. Working in a novel framework, the kernel-based adaptive cluster (KBAC) was developed to perform powerful gene/locus based rare variant association testing. The KBAC combines variant classification and association testing in a coherent framework. Covariates can also be incorporated in the analysis to control for potential confounders including age, sex, and population substructure. To evaluate the power of KBAC: 1) variant data was simulated using rigorous population genetic models for both Europeans and Africans, with parameters estimated from sequence data, and 2) phenotypes were generated using models motivated by complex diseases including breast cancer and Hirschsprung's disease. It is demonstrated that the KBAC has superior power compared to other rare variant analysis methods, such as the combined multivariate and collapsing and weight sum statistic. In the presence of variant misclassification and gene interaction, association testing using KBAC is particularly advantageous. The KBAC method was also applied to test for associations, using sequence data from the Dallas Heart Study, between energy metabolism traits and rare variants in ANGPTL 3,4,5 and 6 genes. A number of novel associations were identified, including the associations of high density lipoprotein and very low density lipoprotein with ANGPTL4. The KBAC method is implemented in a user-friendly R package.
Author Summary
It has been demonstrated that both rare and common variants are involved in complex disease etiology. Until recently it was only possible to perform large scale analysis of common variants. With the development of next-generation sequencing technologies, detection and mapping of rare variants have been made possible. However, methods used to analyze common variants are not powerful for the analysis of rare variants. To address the problems of rare variant analysis working in a novel framework, the kernel-based adaptive cluster (KBAC) method was developed to perform gene/locus based analysis. The KBAC combines variant classification and association testing in a coherent framework. Through simulations motivated by population genetic and disease data, it is demonstrated that the KBAC has superior power to other rare variant analysis methods, especially in the presence of variant misclassification and gene interaction. Using data from the Dallas Heart Study, the KBAC method was applied to test for associations between energy metabolism traits and rare variants in ANGPTL 3,4,5 and 6 genes. A number of novel associations were identified. The KBAC method is implemented in a user-friendly R package.
PMCID: PMC2954824  PMID: 20976247
PLoS Genetics  2010;6(8):e1001078.
It has been recently hypothesized that many of the signals detected in genome-wide association studies (GWAS) to T2D and other diseases, despite being observed to common variants, might in fact result from causal mutations that are rare. One prediction of this hypothesis is that the allelic associations should be population-specific, as the causal mutations arose after the migrations that established different populations around the world. We selected 19 common variants found to be reproducibly associated to T2D risk in European populations and studied them in a large multiethnic case-control study (6,142 cases and 7,403 controls) among men and women from 5 racial/ethnic groups (European Americans, African Americans, Latinos, Japanese Americans, and Native Hawaiians). In analysis pooled across ethnic groups, the allelic associations were in the same direction as the original report for all 19 variants, and 14 of the 19 were significantly associated with risk. In summing the number of risk alleles for each individual, the per-allele associations were highly statistically significant (P<10−4) and similar in all populations (odds ratios 1.09–1.12) except in Japanese Americans the estimated effect per allele was larger than in the other populations (1.20; Phet = 3.8×10−4). We did not observe ethnic differences in the distribution of risk that would explain the increased prevalence of type 2 diabetes in these groups as compared to European Americans. The consistency of allelic associations in diverse racial/ethnic groups is not predicted under the hypothesis of Goldstein regarding “synthetic associations” of rare mutations in T2D.
Author Summary
Single rare causal alleles and/or collections of multiple rare alleles have been suggested to create “synthetic associations” with common variants in genome-wide association studies (GWAS). This model predicts that associations with common variants will not be consistent across populations. In this study, we examined 19 T2D variants for association with T2D risk in 6,142 cases and 7,403 controls from five racial/ethnic populations in the Multiethnic Cohort (European Americans, African Americans, Latinos, Japanese Americans, and Native Hawaiians). In racial/ethnic pooled analysis, all 19 variants were associated with T2D risk in the same direction as previous reports in Europeans, and the sum total of risk variants was significantly associated with T2D risk in each racial/ethnic group. The consistent associations across populations do not support the Goldstein hypothesis that rare causal alleles underlie GWAS signals. We also did not find evidence that these markers underlie racial/ethnic disparities in T2D prevalence. Large-scale GWAS and sequencing studies in these populations are necessary in order to both improve the current set of markers at these risk loci and identify new risk variants for T2D that may be difficult, or impossible, to detect in European populations.
PMCID: PMC2928808  PMID: 20865176
PLoS Genetics  2013;9(12):e1003959.
Analyses investigating low frequency variants have the potential for explaining additional genetic heritability of many complex human traits. However, the natural frequencies of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool utilizes expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionary conserved regions, regulatory regions, genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF<0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionary conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses.
Author Summary
Low frequency variants are likely to play an important role in uncovering complex trait heritability; however, they are often continent or population specific. This specificity complicates genetic analyses investigating low frequency variants for two reasons: low frequency variant signals in an association test are often difficult to generalize beyond a single population or continental group, and there is an increase in false positive results in association analyses due to underlying population stratification. In order to reveal the magnitude of low frequency population stratification, we performed pairwise population comparisons using the 1000 Genomes Project Phase I data to investigate differences in low frequency variant burden across multiple biological features. We found that low frequency variant confounding is much more prevalent than one might expect, even within continental groups. The proportion of significant differences in low frequency variant burden was also dependent on the region of interest; for example, annotated regulatory regions showed fewer low frequency burden differences between populations than intergenic regions. Knowledge of population structure and the genomic landscape in a region of interest are important factors in determining the extent of confounding due to population stratification in a low frequency genomic analysis.
PMCID: PMC3873241  PMID: 24385916
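The abstract above describes collapsing low frequency variants (MAF<0.03) into feature bins and comparing burden between populations. The sketch below shows one simple way to collapse the variants in a single bin into a per-individual burden score. The function names, the 0/1/2 genotype coding, and the choice to compute MAF within the supplied sample are all assumptions for illustration; the abstract does not describe the tool's internals.

```python
def minor_allele_counts(genotypes):
    """Return (per-individual minor-allele counts, MAF) for one variant.

    genotypes: list of 0/1/2 alternate-allele counts, one per individual.
    """
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    if alt_freq > 0.5:
        # The alternate allele is the major allele, so flip the coding.
        return [2 - g for g in genotypes], 1 - alt_freq
    return list(genotypes), alt_freq

def collapse_bin(variants, maf_threshold=0.03):
    """Sum minor alleles per individual over the low frequency variants in one bin."""
    burden = [0] * len(variants[0])
    for genotypes in variants:
        counts, maf = minor_allele_counts(genotypes)
        # Skip monomorphic sites and variants at or above the frequency cutoff.
        if 0 < maf < maf_threshold:
            burden = [b + c for b, c in zip(burden, counts)]
    return burden
```

The resulting burden vectors for two populations could then be compared with any two-sample test; which test is appropriate depends on the study design.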
PLoS Genetics  2006;2(3):e27.
The Haplotype Map (HapMap) project recently generated genotype data for more than 1 million single-nucleotide polymorphisms (SNPs) in four population samples. The main application of the data is in the selection of tag single-nucleotide polymorphisms (tSNPs) to use in association studies. The usefulness of this selection process needs to be verified in populations outside those used for the HapMap project. In addition, it is not known how well the data represent the general population, as only 90–120 chromosomes were used for each population and since the genotyped SNPs were selected so as to have high frequencies. In this study, we analyzed more than 1,000 individuals from Estonia. The population of this northern European country has been influenced by many different waves of migrations from Europe and Russia. We genotyped 1,536 randomly selected SNPs from two 500-kbp ENCODE regions on Chromosome 2. We observed that the tSNPs selected from the CEPH (Centre d'Etude du Polymorphisme Humain) from Utah (CEU) HapMap samples (derived from US residents with northern and western European ancestry) captured most of the variation in the Estonia sample. (Between 90% and 95% of the SNPs with a minor allele frequency of more than 5% have an r2 of at least 0.8 with one of the CEU tSNPs.) Using the reverse approach, tags selected from the Estonia sample could almost equally well describe the CEU sample. Finally, we observed that the sample size, the allelic frequency, and the SNP density in the dataset used to select the tags each have important effects on the tagging performance. Overall, our study supports the use of HapMap data in other Caucasian populations, but the SNP density and the bias towards high-frequency SNPs have to be taken into account when designing association studies.
The recent completion of the Haplotype Map (HapMap) project of the human genome provides considerable information on the patterns of variation in the genome of four populations. One of the applications is a description of a set of tags that act as proxies for many other surrounding variants. This will greatly help researchers in their quest to find complex disease genes by reducing the number of genetic variants to test in association studies. To evaluate its usefulness, several aspects of the map, including its transferability to other populations, still needed to be verified experimentally. Using genomic regions where variants had been thoroughly documented in Caucasian samples from Estonia, the researchers found that the transferability of tags is extremely good. The researchers also found that variants with low frequency in the general population (i.e., less than 5%) could not be accurately captured with tags, and that the regional density of variants in the HapMap project had a major impact on the performance of the tags. This research indicates that the HapMap project will be useful, but that careful consideration of hypotheses and study design will be essential for the success of association studies.
PMCID: PMC1391920  PMID: 16532062
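The Estonian tagging abstract above measures how many SNPs have an r2 of at least 0.8 with one of the CEU tSNPs. A minimal sketch of that coverage calculation follows, assuming phased 0/1 haplotypes are available for each SNP; the function names and the dictionary layout are hypothetical, and the study itself worked from genotype data with its own estimation procedure.

```python
def r_squared(hap_a, hap_b):
    """Pairwise LD r^2 between two biallelic SNPs given phased 0/1 haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic, so r^2 is undefined
    d = p_ab - p_a * p_b
    return d * d / denom

def tag_coverage(snps, tag_ids, threshold=0.8):
    """Fraction of SNPs whose best r^2 with any tag SNP reaches the threshold.

    snps: dict mapping SNP id -> haplotype list; tag_ids: ids of the chosen tags.
    """
    covered = 0
    for hap in snps.values():
        best = max(r_squared(hap, snps[t]) for t in tag_ids)
        if best >= threshold:
            covered += 1
    return covered / len(snps)
```

A tag SNP always covers itself (r2 = 1), so coverage computed this way includes the tags in both the numerator and the denominator.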
PLoS Genetics  2011;7(6):e1002138.
For the past five years, genome-wide association studies (GWAS) have identified hundreds of common variants associated with human diseases and traits, including high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglyceride (TG) levels. Approximately 95 loci associated with lipid levels have been identified primarily among populations of European ancestry. The Population Architecture using Genomics and Epidemiology (PAGE) study was established in 2008 to characterize GWAS–identified variants in diverse population-based studies. We genotyped 49 GWAS–identified SNPs associated with one or more lipid traits in at least two PAGE studies and across six racial/ethnic groups. We performed a meta-analysis testing for SNP associations with fasting HDL-C, LDL-C, and ln(TG) levels in self-identified European American (∼20,000), African American (∼9,000), American Indian (∼6,000), Mexican American/Hispanic (∼2,500), Japanese/East Asian (∼690), and Pacific Islander/Native Hawaiian (∼175) adults, regardless of lipid-lowering medication use. We replicated 55 of 60 (92%) SNP associations tested in European Americans at p<0.05. Despite sufficient power, we were unable to replicate ABCA1 rs4149268 and rs1883025, CETP rs1864163, and TTC39B rs471364 previously associated with HDL-C and MAFB rs6102059 previously associated with LDL-C. Based on significance (p<0.05) and consistent direction of effect, a majority of replicated genotype-phenotype associations for HDL-C, LDL-C, and ln(TG) in European Americans generalized to African Americans (48%, 61%, and 57%), American Indians (45%, 64%, and 77%), and Mexican Americans/Hispanics (57%, 56%, and 86%). Overall, 16 associations generalized across all three populations. For the associations that did not generalize, differences in effect sizes, allele frequencies, and linkage disequilibrium offer clues to the next generation of association studies for these traits.
Author Summary
Low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglyceride (TG) levels are well known independent risk factors for cardiovascular disease. Lipid-associated genetic variants are being discovered in genome-wide association studies (GWAS) in samples of European descent, but an insufficient amount of data exist in other populations. Therefore, there is a strong need to characterize the effect of these GWAS–identified variants in more diverse cohorts. In this study, we selected over forty genetic loci previously associated with lipid levels and tested for replication in a large European American cohort. We also investigated if the effect of these variants generalizes to non-European descent populations, including African Americans, American Indians, and Mexican Americans/Hispanics. A majority of these GWAS–identified associations replicated in our European American cohort. However, the ability of associations to generalize across other racial/ethnic populations varied greatly, indicating that some of these GWAS–identified variants may not be functional and are more likely to be in linkage disequilibrium with the functional variant(s).
PMCID: PMC3128106  PMID: 21738485
BMC Medical Genomics  2013;6(Suppl 2):S6.
With the recent decreasing cost of genome sequence data, there has been increasing interest in rare variants and methods to detect their association to disease. We developed BioBin, a flexible collapsing method inspired by biological knowledge that can be used to automate the binning of low frequency variants for association testing. We also built the Library of Knowledge Integration (LOKI), a repository of data assembled from public databases, which contains resources such as: dbSNP and gene Entrez database information from the National Center for Biotechnology Information (NCBI), pathway information from Gene Ontology (GO), Protein families database (Pfam), Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, NetPath - signal transduction pathways, Open Regulatory Annotation Database (ORegAnno), Biological General Repository for Interaction Datasets (BioGrid), Pharmacogenomics Knowledge Base (PharmGKB), Molecular INTeraction database (MINT), and evolutionary conserved regions (ECRs) from UCSC Genome Browser. The novelty of BioBin is access to comprehensive knowledge-guided multi-level binning. For example, bin boundaries can be formed using genomic locations from: functional regions, evolutionary conserved regions, genes, and/or pathways.
We tested BioBin in two ways: on simulated data containing simulated causative variants, and on 1000 Genomes Project low coverage data in a pairwise comparison of rare variant (MAF < 0.03) burden differences between Yoruba individuals (YRI) and individuals of European descent (CEU). Lastly, we analyzed the NHLBI GO Exome Sequencing Project dataset for Kabuki syndrome, a congenital disorder that affects multiple organs and often involves intellectual disability, using Complete Genomics data as controls.
The results from our simulation studies indicate that the type I error rate is controlled; however, power falls quickly for small sample sizes using variants with modest effect sizes. Using BioBin, we were able to find simulated variants in genes with fewer than 20 loci, but found the sensitivity to be much lower in large bins. We also highlighted the scale of population stratification between two 1000 Genomes Project populations, CEU and YRI. Lastly, we were able to apply BioBin to natural biological data from dbGaP and identify an interesting candidate gene for further study.
We have established that BioBin will be a very practical and flexible tool to analyze sequence data and potentially uncover novel associations between low frequency variants and complex disease.
PMCID: PMC3654874  PMID: 23819467
PLoS Genetics  2013;9(8):e1003694.
Multiple rare variants either within or across genes have been hypothesised to collectively influence complex human traits. The increasing availability of high throughput sequencing technologies offers the opportunity to study the effect of rare variants on these traits. However, appropriate and computationally efficient analytical methods are required to account for collections of rare variants that display a combination of protective, deleterious and null effects on the trait. We have developed a novel method for the analysis of rare genetic variation in a gene, region or pathway that, by simply aggregating summary statistics at each variant, can: (i) test for the presence of a mixture of effects on a trait; (ii) be applied to both binary and quantitative traits in population-based and family-based data; (iii) adjust for covariates to allow for non-genetic risk factors and; (iv) incorporate imputed genetic variation. In addition, for preliminary identification of promising genes, the method can be applied to association summary statistics, available from meta-analysis of published data, for example, without the need for individual level genotype data. Through simulation, we show that our method is immune to the presence of bi-directional effects, with no apparent loss in power across a range of different mixtures, and can achieve greater power than existing approaches as long as summary statistics at each variant are robust. We apply our method to investigate association of type-1 diabetes with imputed rare variants within genes in the major histocompatibility complex using genotype data from the Wellcome Trust Case Control Consortium.
Author Summary
Rapid advances in sequencing technology mean that it is now possible to directly assay rare genetic variation. In addition, the availability of almost fully sequenced human genomes by the 1000 Genomes Project allows genotyping at rare variants that are not present on arrays commonly used in genome-wide association studies. Rare variants within a gene or region may act to collectively influence a complex trait. Methods for testing these rare variants should be able to account for a combination of those that serve to either increase, decrease or have no effect on the trait of interest. Here, we introduce a method for the analysis of a collection of rare genetic variants, within a gene or region, which assesses evidence for a mixture of effects. Our method simply aggregates summary statistics at each variant and, as such, can be applied to both population and family-based data, to binary or quantitative traits and to either directly genotyped or imputed data. In addition, it does not require individual level genotype or phenotype data, and can be adjusted for non-genetic risk factors. We illustrate our approach by examining imputed rare variants in the major histocompatibility complex for association with type-1 diabetes using genotype data from the Wellcome Trust Case Control Consortium.
PMCID: PMC3744430  PMID: 23966874
Environmental Health Perspectives  2009;117(10):1541-1548.
The human CYP3A gene cluster codes for cytochrome P450 (CYP) subfamily enzymes that catalyze the metabolism of various exogenous and endogenous chemicals and is an obvious candidate for evolutionary and environmental genomic study. Functional variants in the CYP3A locus may have undergone a selective sweep in response to various environmental conditions.
The goal of this study was to profile the allelic structure across the human CYP3A locus and investigate natural selection on that locus.
From the CYP3A locus spanning 231 kb, we resequenced 54 genomic DNA fragments (a total of 43,675 bases) spanning four genes (CYP3A4, CYP3A5, CYP3A7, and CYP3A43) and two pseudogenes (CYP3AP1 and CYP3AP2), and randomly selected intergenic regions at the CYP3A locus in Africans (24 individuals), Caucasians (24 individuals), and Chinese (29 individuals). We comprehensively investigated the nucleotide diversity and haplotype structure and examined the possible role of natural selection in shaping the sequence variation throughout the gene cluster.
Neutrality tests with Tajima’s D, Fu and Li’s D* and F*, and Fay and Wu’s H indicated possible roles of positive selection on the entire CYP3A locus in non-Africans. Sliding-window analyses of nucleotide diversity and frequency spectrum, as well as haplotype diversity and phylogenetically inferred haplotype structure, revealed that CYP3A4 and CYP3A7 had recently undergone or were undergoing a selective sweep in all three populations, whereas CYP3A43 and CYP3A5 were undergoing a selective sweep in non-Africans and Caucasians, respectively.
The refined allelic architecture and selection spectrum for the human CYP3A locus highlight that evolutionary dynamics of molecular adaptation may underlie the phenotypic variation of the xenobiotic disposition system and varied predisposition to complex disorders in which xenobiotics play a role.
PMCID: PMC2790508  PMID: 20019904
CYP3A; environmental genomics; genetic polymorphism; positive selection
Heredity  2010;106(5):775-787.
Nucleotide polymorphism at 12 nuclear loci was studied in Scots pine populations across an environmental gradient in Scotland, to evaluate the impacts of demographic history and selection on genetic diversity. At eight loci, diversity patterns were compared between Scottish and continental European populations. At these loci, a similar level of diversity (θsil=∼0.01) was found in Scottish vs mainland European populations, contrary to expectations for recent colonization; however, less rapid decay of linkage disequilibrium was observed in the former (ρ=0.0086±0.0009, ρ=0.0245±0.0022, respectively). Scottish populations also showed a deficit of rare nucleotide variants (multi-locus Tajima's D=0.316 vs D=−0.379) and differed significantly from mainland populations in allelic frequency and/or haplotype structure at several loci. Within Scotland, western populations showed slightly reduced nucleotide diversity (πtot=0.0068) compared with those from the south and east (0.0079 and 0.0083, respectively) and about three times higher recombination to diversity ratio (ρ/θ=0.71 vs 0.15 and 0.18, respectively). By comparison with results from coalescent simulations, the observed allelic frequency spectrum in the western populations was compatible with a relatively recent bottleneck (0.00175 × 4Ne generations) that reduced the population to about 2% of the present size. However, heterogeneity in the allelic frequency distribution among geographical regions in Scotland suggests that subsequent admixture of populations with different demographic histories may also have played a role.
PMCID: PMC3186241  PMID: 20823905
adaptation; bottleneck; nucleotide diversity; population differentiation; linkage disequilibrium; recolonization
Molecular Biology and Evolution  2014;31(6):1490-1499.
Locally varying selection on pathogens may be due to differences in drug pressure, host immunity, transmission opportunities between hosts, or the intensity of between-genotype competition within hosts. Highly recombining populations of the human malaria parasite Plasmodium falciparum throughout West Africa are closely related, as gene flow is relatively unrestricted in this endemic region, but markedly varying ecology and transmission intensity should cause distinct local selective pressures. Genome-wide analysis of sequence variation was undertaken on a sample of 100 P. falciparum clinical isolates from a highly endemic region of the Republic of Guinea where transmission occurs for most of each year and compared with data from 52 clinical isolates from a previously sampled population from The Gambia, where there is relatively limited seasonal malaria transmission. Paired-end short-read sequences were mapped against the 3D7 P. falciparum reference genome sequence, and data on 136,144 single nucleotide polymorphisms (SNPs) were obtained. Within-population analyses identifying loci showing evidence of recent positive directional selection and balancing selection confirm that antimalarial drugs and host immunity have been major selective agents. Many of the signatures of recent directional selection reflected by standardized integrated haplotype scores were population specific, including differences at drug resistance loci due to historically different antimalarial use between the countries. In contrast, both populations showed a similar set of loci likely to be under balancing selection as indicated by very high Tajima’s D values, including a significant overrepresentation of genes expressed at the merozoite stage that invades erythrocytes and several previously validated targets of acquired immunity. 
Between-population FST analysis identified exceptional differentiation of allele frequencies at a small number of loci, most markedly for five SNPs covering a 15-kb region within and flanking the gdv1 gene that regulates the early stages of gametocyte development, which is likely related to the extreme differences in mosquito vector abundance and seasonality that determine the transmission opportunities for the sexual stage of the parasite.
PMCID: PMC4032133  PMID: 24644299
pathogen; balancing selection; directional selection; population genomics; immunity; transmission
PLoS Genetics  2011;7(7):e1002144.
Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.
Author Summary
The recent availability of almost fully sequenced human genomes by the 1000 genomes project allows the direct study of genetic variants that influence levels of gene expression in the cell. In this study, we explore the effect of rare and common variants on levels of gene expression. We show that the availability of a more comprehensive list of variants brings us closer to the likely causal variants, and we discuss their genomic and evolutionary properties. We also demonstrate the effects of variants that change splicing patterns or length of the protein product, the putative joint impacts of variants that affect gene expression, and those that affect protein structure. Finally, we show the impact of rare regulatory variants that cannot be detected by the conventional methodologies of association and require the interrogation of full genome sequencing and full transcriptome sequencing. These approaches bring us closer to the implementation of these data and methodologies to a direct clinical application.
PMCID: PMC3141000  PMID: 21811411
Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection, including: 1) tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima’s D, Fay and Wu’s H and Fu and Li’s D* and F*), 2) tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome-wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution.
Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies.
We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation.
PMCID: PMC3727976  PMID: 23848512
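The empirical-outlier approach described in the abstract above (one statistic per 25 kb window, with windows in the top 0.1% or 1% of the genome-wide distribution flagged) can be sketched roughly as follows. This is an illustrative assumption about how such a ranking might be done, not the authors' pipeline. The function name `top_fraction_windows` and the `tail` parameter are hypothetical, and the choice of tail depends on the statistic (for example, strongly negative Tajima's D for sweeps).

```python
def top_fraction_windows(stats, fraction, tail="high"):
    """Return sorted indices of windows in the extreme `fraction` of `stats`.

    stats: one value per genomic window (hypothetical input format).
    fraction: e.g. 0.001 for the top 0.1%, 0.01 for the top 1%.
    tail: "high" flags the largest values, "low" the smallest.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if tail not in ("high", "low"):
        raise ValueError("tail must be 'high' or 'low'")

    n = len(stats)
    if n == 0:
        return []

    # Number of windows to flag; always at least one.
    k = max(1, round(n * fraction))

    # Rank window indices by their statistic, most extreme first.
    order = sorted(range(n), key=lambda i: stats[i], reverse=(tail == "high"))
    return sorted(order[:k])
```

With 1,000 windows, `top_fraction_windows(stats, 0.001)` flags a single window and `top_fraction_windows(stats, 0.01)` flags ten. A gene would then be reported if it overlaps a flagged window for at least one statistic.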
