Related Articles

1.  Applying genome-wide gene-based expression quantitative trait locus mapping to study population ancestry and pharmacogenetics 
BMC Genomics  2014;15:319.
Background
Gene-based analysis has become popular in genomic research because of its appealing biological and statistical properties compared with those of a single-locus analysis. However, few, if any, studies have discussed the mapping of expression quantitative trait loci (eQTL) in a gene-based framework. No study has discussed ancestry-informative eQTL or investigated their roles in pharmacogenetics by integrating single nucleotide polymorphism (SNP)-based eQTL (s-eQTL) and gene-based eQTL (g-eQTL).
Results
In this g-eQTL mapping study, the transcript expression levels of genes (transcript-level genes; T-genes) were correlated with the SNPs of genes (sequence-level genes; S-genes) by using a method of gene-based partial least squares (PLS). Ancestry-informative transcripts were identified using a rank-score-based multivariate association test, and ancestry-informative eQTL were identified using Fisher’s exact test. Furthermore, key ancestry-predictive eQTL were selected in a flexible discriminant analysis. We analyzed SNPs and gene expression of 210 independent people of African-, Asian- and European-descent. We identified numerous cis- and trans-acting g-eQTL and s-eQTL for each population by using PLS. We observed ancestry information enriched in eQTL. Furthermore, we identified 2 ancestry-informative eQTL associated with adverse drug reactions and/or drug response. Rs1045642, located on MDR1, is an ancestry-informative eQTL (P = 2.13E-13, using Fisher’s exact test) associated with adverse drug reactions to amitriptyline and nortriptyline and drug responses to morphine. Rs20455, located in KIF6, is an ancestry-informative eQTL (P = 2.76E-23, using Fisher’s exact test) associated with the response to statin drugs (e.g., pravastatin and atorvastatin). The ancestry-informative eQTL of drug biotransformation genes were also observed; cross-population cis-acting expression regulators included SPG7, TAP2, SLC7A7, and CYP4F2. Finally, we also identified key ancestry-predictive eQTL and established classification models with promising training and testing accuracies in separating samples from close populations.
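The Results above report Fisher's exact test P-values for ancestry-informative eQTL. As a minimal sketch of that test, the function below computes a two-sided P-value for a single 2x2 table. The 2x2 layout (allele counts in two populations) and the counts in the usage lines are assumptions for illustration; the abstract does not give the contingency tables, and the study compared three populations.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    # (small relative tolerance guards against floating-point ties)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical allele counts (not from the study):
# rows = population 1 / population 2, columns = allele A / allele B
p_value = fisher_exact_two_sided(50, 70, 95, 25)
print(f"P = {p_value:.3g}")
```

A table with more than two populations would need the r x c generalisation of the test, which this sketch does not cover.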
Conclusions
In summary, we developed a gene-based PLS procedure and a SAS macro for identifying g-eQTL and s-eQTL. We established data archives of eQTL for global populations. The program and data archives are accessible at http://www.stat.sinica.edu.tw/hsinchou/genetics/eQTL/HapMapII.htm. Finally, the results from our investigations regarding the interrelationship between eQTL, ancestry information, and pharmacodynamics provide rich resources for future eQTL studies and practical applications in population genetics and medical genetics.
doi:10.1186/1471-2164-15-319
PMCID: PMC4236814  PMID: 24779372
Gene-based approach; Expression quantitative trait locus (eQTL); Partial least squares (PLS); Ancestry-informative marker (AIM); Pharmacogenetics; Adverse drug reaction; Drug response; Drug biotransformation
2.  Investigating the Role of Mitochondrial Haplogroups in Genetic Predisposition to Meningococcal Disease 
PLoS ONE  2009;4(12):e8347.
Background and Aims
Meningococcal disease remains one of the most important infectious causes of death in industrialized countries. The highly diverse clinical presentation and prognosis of Neisseria meningitidis infections are the result of complex host genetics and environmental interactions. We investigated whether mitochondrial genetic background contributes to meningococcal disease (MD) susceptibility.
Methodology/Principal Findings
A prospective controlled study was performed through a national research network on MD that includes 41 Spanish hospitals. Cases were 307 paediatric patients with confirmed MD, representing the largest series of MD patients analysed to date. Two independent sets of ethnicity-matched control samples (CG1 [N = 917] and CG2 [N = 616]) were used for comparison. Cases and controls underwent mtDNA haplotyping of a selected set of 25 mtDNA SNPs (mtSNPs), some of them defining major European branches of the mtDNA phylogeny. In addition, 34 ancestry informative markers (AIMs) were genotyped in cases and CG2 in order to monitor potential hidden population stratification. Samples of known African, Native American and European ancestry (N = 711) were used as classification sets for the determination of ancestral membership of our MD patients. A total of 39 individuals were eliminated from the main statistical analyses (including fourteen gypsies) on the basis of either non-Spanish self-reported ancestry or the results of AIMs indicating a European membership lower than 95%. Association analysis of the remaining 268 cases against CG1 suggested an overrepresentation of the synonymous mtSNP G11719A variant (Pearson's chi-square test; adjusted P-value = 0.0188; OR [95% CI] = 1.63 [1.22–2.18]). When cases were compared with CG2, the positive association could not be replicated. No positive association was observed between haplogroup (hg) status and case status in the comparisons with CG1 and CG2, or between hg status of cases and several clinical variants.
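The association above is summarised as an odds ratio with a 95% confidence interval. The sketch below shows one common way to obtain such an interval from a 2x2 table (the Woolf, or log-odds, method). The abstract does not report the underlying carrier counts or say which interval method was used, so both the method choice and the counts in the usage lines are assumptions.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for the 2x2 table
    [[a, b], [c, d]] = [[case carriers, case non-carriers],
                        [control carriers, control non-carriers]].
    All four cells must be non-zero. z=1.96 gives a 95% interval."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not from the study)
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f} [{lo:.2f}-{hi:.2f}]")
```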
Conclusions
We did not find evidence of association between mtSNPs and mtDNA hgs with MD after carefully monitoring the confounding effect of population sub-structure. MtDNA variability is particularly stratified in human populations owing to its low effective population size in comparison with autosomal markers and therefore, special care should be taken in the interpretation of seeming signals of positive associations in mtDNA case-control association studies.
doi:10.1371/journal.pone.0008347
PMCID: PMC2790606  PMID: 20019817
3.  Comparison of measures of marker informativeness for ancestry and admixture mapping 
BMC Genomics  2011;12:622.
Background
Admixture mapping is a powerful gene mapping approach for an admixed population formed from ancestral populations with different allele frequencies. The power of this method relies on the ability of ancestry informative markers (AIMs) to infer ancestry along the chromosomes of admixed individuals. In this study, more than one million SNPs from HapMap databases and simulated data have been interrogated in admixed populations using various measures of ancestry informativeness: Fisher Information Content (FIC), Shannon Information Content (SIC), F statistics (FST), Informativeness for Assignment Measure (In), and the Absolute Allele Frequency Differences (delta, δ). The objectives are to compare these measures of informativeness to select SNP markers for ancestry inference, and to determine the accuracy of AIM panels selected by each measure in estimating the contributions of the ancestors to the admixed population.
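Three of the measures named above have simple closed forms for a biallelic SNP and two ancestral populations, sketched below: δ as the absolute allele frequency difference, FST in Wright's heterozygosity form with equally weighted populations, and In as Rosenberg's informativeness for assignment. The equal weighting and the two-population restriction are simplifying assumptions; FIC and SIC depend on an admixture model and are not covered.

```python
from math import log


def _xlogx(x):
    # x * ln(x), with the usual convention 0 * ln(0) = 0
    return x * log(x) if x > 0 else 0.0


def delta(p1, p2):
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def fst(p1, p2):
    """Wright's F_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def informativeness_in(p1, p2):
    """Informativeness for assignment (I_n) for a biallelic marker, K = 2."""
    total = 0.0
    # Loop over the two alleles: frequencies (p1, p2) and (1 - p1, 1 - p2)
    for q1, q2 in ((p1, p2), (1 - p1, 1 - p2)):
        q_bar = (q1 + q2) / 2
        total += -_xlogx(q_bar) + (_xlogx(q1) + _xlogx(q2)) / 2
    return total


# Illustrative frequencies (not from the study)
for p1, p2 in [(0.9, 0.1), (0.6, 0.4), (0.3, 0.3)]:
    print(p1, p2, delta(p1, p2), fst(p1, p2), informativeness_in(p1, p2))
```

All three measures are zero when the allele frequencies are equal and reach their maximum (1, 1 and ln 2) for a fixed difference, so they rank extreme markers alike and differ mainly in how they order intermediate ones.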
Results
FST and In had the highest Spearman correlation and the best agreement as measured by Kappa statistics based on deciles. Although the different measures of marker informativeness performed comparably well, analyses based on the top 1 to 10% ranked informative markers of simulated data showed that In was better in estimating ancestry for an admixed population.
Conclusions
Although millions of SNPs have been identified, only a small subset needs to be genotyped in order to accurately predict ancestry with a minimal error rate in a cost-effective manner. In this article, we compared various methods for selecting ancestry informative SNPs using simulations as well as SNP genotype data from samples of admixed populations and showed that the In measure estimates ancestry proportion (in an admixed population) with lower bias and mean square error.
doi:10.1186/1471-2164-12-622
PMCID: PMC3276602  PMID: 22185208
4.  Evaluation of approaches for identifying population informative markers from high density SNP Chips 
BMC Genetics  2011;12:45.
Background
Genetic markers can be used to identify and verify the origin of individuals. Motivation for the inference of ancestry ranges from conservation genetics to forensic analysis. High density assays featuring Single Nucleotide Polymorphism (SNP) markers can be exploited to create a reduced panel containing the most informative markers for these purposes. The objectives of this study were to evaluate methods of marker selection and determine the minimum number of markers from the BovineSNP50 BeadChip required to verify the origin of individuals in European cattle breeds. Delta, Wright's FST, Weir & Cockerham's FST and PCA methods for population differentiation were compared. The level of informativeness of each SNP was estimated from the breed specific allele frequencies. Individual assignment analysis was performed using the ranked informative markers. Stringency levels were applied by log-likelihood ratio to assess the confidence of the assignment test.
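The assignment analysis with a log-likelihood ratio stringency threshold can be sketched as follows. The sketch assumes Hardy-Weinberg proportions within each breed, independent markers, natural logarithms, and a threshold applied to the difference between the best and second-best breed; the abstract does not specify these details, and the breed names and frequencies are hypothetical.

```python
from math import log


def genotype_log_likelihood(genotypes, freqs, eps=1e-6):
    """Log-likelihood of one animal's genotypes under a breed's allele
    frequencies. genotypes: copies (0, 1 or 2) of the reference allele per
    SNP; freqs: reference allele frequency per SNP in that breed."""
    ll = 0.0
    for g, p in zip(genotypes, freqs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0) for fixed alleles
        if g == 2:
            prob = p * p
        elif g == 1:
            prob = 2 * p * (1 - p)
        else:
            prob = (1 - p) ** 2
        ll += log(prob)
    return ll


def assign(genotypes, breed_freqs, min_llr=0.0):
    """Assign to the most likely breed if the log-likelihood ratio against
    the runner-up reaches min_llr; otherwise return None (unassigned).
    Requires at least two candidate breeds."""
    scores = sorted(
        ((genotype_log_likelihood(genotypes, f), name)
         for name, f in breed_freqs.items()),
        reverse=True,
    )
    (best_ll, best_name), (second_ll, _) = scores[0], scores[1]
    llr = best_ll - second_ll
    return (best_name if llr >= min_llr else None), llr


# Hypothetical breeds and frequencies (not from the study)
breed_freqs = {"breed_A": [0.9, 0.9, 0.1], "breed_B": [0.1, 0.1, 0.9]}
print(assign([2, 2, 0], breed_freqs, min_llr=2.0))
```

Raising `min_llr` trades assignment rate for confidence, which is consistent with the Results reporting that more markers are needed at higher stringency levels.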
Results
A 95% assignment success rate for the 384 individually genotyped animals was achieved with < 80, < 100, < 140 and < 200 SNP markers (with increasing stringency threshold levels) across all the examined methods for marker selection. No further gain in power of assignment was achieved by sampling in excess of 200 SNP markers. The marker selection method that required the lowest number of SNP markers to verify the animal's breed origin was Wright's FST (60 to 140 SNPs depending on the chosen degree of confidence). Certain breeds required fewer markers (< 100) to achieve 100% assignment success. In contrast, closely related breeds require more markers (~200) to achieve > 95% assignment success. The power of assignment success, and therefore the number of SNP markers required, is dependent on the levels of genetic heterogeneity and pool of samples considered.
Conclusions
While all SNP selection methods produced marker panels capable of breed identification, the power of assignment varied markedly among analysis methods. Thus, with effective exploration of available high density genetic markers, a diagnostic panel of highly informative markers can be produced.
doi:10.1186/1471-2156-12-45
PMCID: PMC3118130  PMID: 21569514
5.  Accuracy of various human NAT2 SNP genotyping panels to infer rapid, intermediate and slow acetylator phenotypes 
Pharmacogenomics  2011;13(1):31-41.
Aim
Humans exhibit genetic polymorphism in NAT2 resulting in rapid, intermediate and slow acetylator phenotypes. Over 65 NAT2 variants possessing one or more SNPs in the 870-bp NAT2 coding region have been reported. The seven most frequent SNPs are rs1801279 (191G>A), rs1041983 (282C>T), rs1801280 (341T>C), rs1799929 (481C>T), rs1799930 (590G>A), rs1208 (803A>G) and rs1799931 (857G>A). The majority of studies investigate the NAT2 genotype assay for three SNPs: 481C>T, 590G>A and 857G>A. A tag-SNP (rs1495741) recently identified in a genome-wide association study has also been proposed as a biomarker for the NAT2 phenotype.
Materials & methods
Sulfamethazine N-acetyltransferase catalytic activities were measured in cryopreserved human hepatocytes from a convenience sample of individuals in the USA with an ethnic frequency similar to the 2010 US population census. These activities were segregated by the tag-SNP rs1495741 and each of the seven SNPs described above. We assessed the accuracy of the tag-SNP and various two-, three-, four- and seven-SNP genotyping panels for their ability to accurately infer NAT2 phenotype.
Results
The accuracies of the various NAT2 SNP genotype panels in inferring NAT2 phenotype were as follows: seven-SNP: 98.4%; tag-SNP: 77.7%; two-SNP: 96.1%; three-SNP: 92.2%; and four-SNP: 98.4%.
Conclusion
A NAT2 four-SNP genotype panel of rs1801279 (191G>A), rs1801280 (341T>C), rs1799930 (590G>A) and rs1799931 (857G>A) infers NAT2 acetylator phenotype with high accuracy, and is recommended over the tag-, two-, three- and (for economy of scale) the seven-SNP genotyping panels, particularly in populations of non-European ancestry.
doi:10.2217/pgs.11.122
PMCID: PMC3285565  PMID: 22092036
acetylator genotype; acetylator phenotype; cryopreserved; human hepatocyte; NAT2; SNP
6.  Improved estimation of inbreeding and kinship in pigs using optimized SNP panels 
BMC Genetics  2013;14:92.
Background
Traditional breeding programs consider an average pairwise kinship between sibs. Based on pedigree information, the relationship matrix is used for genetic evaluations disregarding variation due to Mendelian sampling. Therefore, inbreeding and kinship coefficients are either over- or underestimated, resulting in reduced accuracy of genetic evaluations and reduced genetic progress. Single nucleotide polymorphisms (SNPs) can be used to estimate pairwise kinship and individual inbreeding more accurately. The aim of this study was to optimize the selection of markers and determine the required number of SNPs for estimation of kinship and inbreeding.
Results
A total of 1,565 animals from three commercial pig populations were analyzed for 28,740 SNPs from the PorcineSNP60 Beadchip. Mean genomic inbreeding was higher than pedigree-based estimates in lines 2 and 3, but lower in line 1. As expected, a larger variation of genomic kinship estimates was observed for half and full sibs than for pedigree-based kinship reflecting Mendelian sampling. Genomic kinship between father-offspring pairs was lower (0.23) than the estimate based on pedigree (0.26). Bootstrap analyses using six reduced SNP panels (n = 500, 1000, 1500, 2000, 2500 and 3000) showed that 2,000 SNPs were able to reproduce the results very close to those obtained using the full set of unlinked markers (n = 7,984-10,235) with high correlations (inbreeding r > 0.82 and kinship r > 0.96) and low variation between different sets with the same number of SNPs.
Conclusions
Variation of kinship between sibs due to Mendelian sampling is better captured using genomic information than the pedigree-based method. Therefore, the reduced sets of SNPs could generate more accurate kinship coefficients between sibs than the pedigree-based method. Variation of genomic kinship of father-offspring pairs is recommended as a parameter to determine accuracy of the method rather than correlation with pedigree-based estimates. Inbreeding and kinship coefficients can be estimated with high accuracy using ≥2,000 unlinked SNPs within all three commercial pig lines evaluated. However, a larger number of SNPs might be necessary in other populations or across lines.
doi:10.1186/1471-2156-14-92
PMCID: PMC3849284  PMID: 24063757
Linkage equilibrium; Bootstrap; Pedigree; Genomic selection; Relationship
7.  Development of a Panel of Genome-Wide Ancestry Informative Markers to Study Admixture Throughout the Americas 
PLoS Genetics  2012;8(3):e1002554.
Most individuals throughout the Americas are admixed descendants of Native American, European, and African ancestors. Complex historical factors have resulted in varying proportions of ancestral contributions between individuals within and among ethnic groups. We developed a panel of 446 ancestry informative markers (AIMs) optimized to estimate ancestral proportions in individuals and populations throughout Latin America. We used genome-wide data from 953 individuals from diverse African, European, and Native American populations to select AIMs optimized for each of the three main continental populations that form the basis of modern Latin American populations. We selected markers on the basis of locus-specific branch length to be informative, well distributed throughout the genome, capable of being genotyped on widely available commercial platforms, and applicable throughout the Americas by minimizing within-continent heterogeneity. We then validated the panel in samples from four admixed populations by comparing ancestry estimates based on the AIMs panel to estimates based on genome-wide association study (GWAS) data. The panel provided balanced discriminatory power among the three ancestral populations and accurate estimates of individual ancestry proportions (R² > 0.9 for ancestral components with significant between-subject variance). Finally, we genotyped samples from 18 populations from Latin America using the AIMs panel and estimated variability in ancestry within and between these populations. This panel and its reference genotype information will be useful resources to explore population history of admixture in Latin America and to correct for the potential effects of population stratification in admixed samples in the region.
Author Summary
Individuals from Latin America are descendants of multiple ancestral populations, primarily Native American, European, and African ancestors. The relative proportions of these ancestries can be estimated using genetic markers, known as ancestry informative markers (AIMs), whose allele frequency varies between the ancestral groups. Once determined, these ancestral proportions can be correlated with normal phenotypes, can be associated with disease, can be used to control for confounding due to population stratification, or can inform on the history of admixture in a population. In this study, we identified a panel of AIMs relevant to Latin American populations, validated the panel by comparing estimates of ancestry using the panel to ancestry determined from genome-wide data, and tested the panel in a diverse set of populations from the Americas. The panel of AIMs produces ancestry estimates that are highly accurate and appropriately controlled for population stratification, and it was used to genotype 18 populations from throughout Latin America. We have made the panel of AIMs available to any researcher interested in estimating ancestral proportions for populations from the Americas.
doi:10.1371/journal.pgen.1002554
PMCID: PMC3297575  PMID: 22412386
8.  Ancestry-Shift Refinement Mapping of the C6orf97-ESR1 Breast Cancer Susceptibility Locus 
PLoS Genetics  2010;6(7):e1001029.
We used an approach that we term ancestry-shift refinement mapping to investigate an association, originally discovered in a GWAS of a Chinese population, between rs2046210[T] and breast cancer susceptibility. The locus is on 6q25.1 in proximity to the C6orf97 and estrogen receptor α (ESR1) genes. We identified a panel of SNPs that are correlated with rs2046210 in Chinese, but not necessarily so in other ancestral populations, and genotyped them in breast cancer case:control samples of Asian, European, and African origin, a total of 10,176 cases and 13,286 controls. We found that rs2046210[T] does not confer substantial risk of breast cancer in Europeans and Africans (OR = 1.04, P = 0.099, and OR = 0.98, P = 0.77, respectively). Rather, in those ancestries, an association signal arises from a group of less common SNPs typified by rs9397435. The rs9397435[G] allele was found to confer risk of breast cancer in European (OR = 1.15, P = 1.2×10⁻³), African (OR = 1.35, P = 0.014), and Asian (OR = 1.23, P = 2.9×10⁻⁴) population samples. Combined over all ancestries, the OR was 1.19 (P = 3.9×10⁻⁷), was without significant heterogeneity between ancestries (P_het = 0.36) and the SNP fully accounted for the association signal in each ancestry. Haplotypes bearing rs9397435[G] are well tagged by rs2046210[T] only in Asians. The rs9397435[G] allele showed associations with both estrogen receptor positive and estrogen receptor negative breast cancer. Using early-draft data from the 1,000 Genomes project, we found that the risk allele of a novel SNP (rs77275268), which is closely correlated with rs9397435, disrupts a partially methylated CpG sequence within a known CTCF binding site. These studies demonstrate that shifting the analysis among ancestral populations can provide valuable resolution in association mapping.
Author Summary
In genome-wide association studies of disease susceptibility, there is no particular expectation that a genotyped SNP showing an association is itself a pathogenic variant. Rather, it is more likely that a SNP giving a signal does so because it is in linkage disequilibrium (LD) with a pathogenic variant. When the analysis is shifted to a population of another ancestry, the tagging relationship between the genotyped SNP and the pathogenic variant may be disrupted, due to differing patterns of LD between populations. Thus, it is not straightforward to determine whether a susceptibility locus identified in one ancestral population is also associated with risk in another. Moreover, the differing patterns of LD between ancestral populations can be used to gain resolution in genetic mapping. We refer to this approach as ancestry-shift refinement mapping. Here, we apply it to a breast cancer risk variant near the estrogen receptor α gene that was initially described in a Chinese population. We show that the tagging relationship between the originally described SNP rs2046210 and the pathogenic variant(s) is not maintained in Europeans and Africans. We identify a SNP, rs9397435, that is associated with breast cancer risk in populations of Asian, European, and African ancestry.
doi:10.1371/journal.pgen.1001029
PMCID: PMC2908678  PMID: 20661439
9.  SAQC: SNP Array Quality Control 
BMC Bioinformatics  2011;12:100.
Background
Genome-wide single-nucleotide polymorphism (SNP) arrays containing hundreds of thousands of SNPs from the human genome have proven useful for studying important human genome questions. Data quality of SNP arrays plays a key role in the accuracy and precision of downstream data analyses. However, good indices for assessing data quality of SNP arrays have not yet been developed.
Results
We developed new quality indices to measure the quality of SNP arrays and/or DNA samples and investigated their statistical properties. The indices quantify a departure of estimated individual-level allele frequencies (AFs) from expected frequencies via standardized distances. The proposed quality indices followed lognormal distributions in several large genomic studies that we empirically evaluated. AF reference data and quality index reference data for different SNP array platforms were established based on samples from various reference populations. Furthermore, a confidence interval method based on the underlying empirical distributions of quality indices was developed to identify poor-quality SNP arrays and/or DNA samples. Analyses of authentic biological data and simulated data show that this new method is sensitive and specific for the detection of poor-quality SNP arrays and/or DNA samples.
Conclusions
This study introduces new quality indices, establishes references for AFs and quality indices, and develops a detection method for poor-quality SNP arrays and/or DNA samples. We have developed a new computer program that utilizes these methods called SNP Array Quality Control (SAQC). SAQC software is written in R and R-GUI and was developed as a user-friendly tool for the visualization and evaluation of data quality of genome-wide SNP arrays. The program is available online (http://www.stat.sinica.edu.tw/hsinchou/genetics/quality/SAQC.htm).
doi:10.1186/1471-2105-12-100
PMCID: PMC3101186  PMID: 21501472
10.  Clinical validation of a genetic model to estimate the risk of developing choroidal neovascular age-related macular degeneration 
Human Genomics  2011;5(5):420-440.
Predictive tests for estimating the risk of developing late-stage neovascular age-related macular degeneration (AMD) are subject to unique challenges. AMD prevalence increases with age, clinical phenotypes are heterogeneous and control collections are prone to high false-negative rates, as many control subjects are likely to develop disease with advancing age. Risk prediction tests have been presented previously, using up to ten genetic markers and a range of self-reported non-genetic variables such as body mass index (BMI) and smoking history. In order to maximise the accuracy of prediction for mainstream genetic testing, we sought to derive a test comparable in performance to earlier testing models but based purely on genetic markers, which are static through life and not subject to misreporting. We report a multicentre assessment of a larger panel of single nucleotide polymorphisms (SNPs) than previously analysed, to improve further the classification performance of a predictive test to estimate the risk of developing choroidal neovascular (CNV) disease. We developed a predictive model based solely on genetic markers and avoided inclusion of self-reported variables (eg smoking history) or non-static factors (BMI, education status) that might otherwise introduce inaccuracies in calculating individual risk estimates. We describe the performance of a test panel comprising 13 SNPs genotyped across a consolidated collection of four patient cohorts obtained from academic centres deemed appropriate for pooling. We report on predictive effect sizes and their classification performance. By incorporating multiple cohorts of homogeneous ethnic origin, we obtained >80 per cent power to detect differences in genetic variants observed between cases and controls. We focused our study on CNV, a subtype of advanced AMD associated with a severe and potentially treatable form of the disease. 
Lastly, we followed a two-stage strategy involving both test model development and test model validation to present estimates of classification performance anticipated in the larger clinical setting. The model contained nine SNPs tagging variants in the regulators of complement activation (RCA) locus spanning the complement factor H (CFH), complement factor H-related 4 (CFHR4), complement factor H-related 5 (CFHR5) and coagulation factor XIII B subunit (F13B) genes; the four remaining SNPs targeted polymorphisms in the complement component 2 (C2), complement factor B (CFB), complement component 3 (C3) and age-related maculopathy susceptibility protein 2 (ARMS2) genes. The pooled sample size (1,132 CNV cases, 822 controls) allowed for both model development and model validation to confirm the accuracy of risk prediction. At the validation stage, our test model yielded 82 per cent sensitivity and 63 per cent specificity, comparable with metrics reported with earlier testing models that included environmental risk factors. Our test had an area under the curve of 0.80, reflecting a modest improvement compared with tests reported with fewer SNPs.
doi:10.1186/1479-7364-5-5-420
PMCID: PMC3525964  PMID: 21807600
age-related macular degeneration (AMD); choroidal neovascularisation (CNV); complement factor H (CFH); genetic testing
11.  Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples 
PLoS Genetics  2010;6(3):e1000866.
As we move forward from the current generation of genome-wide association (GWA) studies, additional cohorts of different ancestries will be studied to increase power, fine map association signals, and generalize association results to additional populations. Knowledge of genetic ancestry as well as population substructure will become increasingly important for GWA studies in populations of unknown ancestry. Here we propose genotyping pooled DNA samples using genome-wide SNP arrays as a viable option to efficiently and inexpensively estimate admixture proportion and identify ancestry informative markers (AIMs) in populations of unknown origin. We constructed DNA pools from African American, Native Hawaiian, Latina, and Jamaican samples and genotyped them using the Affymetrix 6.0 array. Aided by individual genotype data from the African American cohort, we established quality control filters to remove poorly performing SNPs and estimated allele frequencies for the remaining SNPs in each panel. We then applied a regression-based method to estimate the proportion of admixture in each cohort using the allele frequencies estimated from pooling and populations from the International HapMap Consortium as reference panels, and identified AIMs unique to each population. In this study, we demonstrated that genotyping pooled DNA samples yields estimates of admixture proportion that are both consistent with our knowledge of population history and similar to those obtained by genotyping known AIMs. Furthermore, through validation by individual genotyping, we demonstrated that pooling is quite effective for identifying SNPs with large allele frequency differences (i.e., AIMs) and that these AIMs are able to differentiate two closely related populations (HapMap JPT and CHB).
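The idea behind a regression-based admixture estimate from pooled allele frequencies can be sketched for the two-way case. Under the linear mixing model p_mix = m * p_ref1 + (1 - m) * p_ref2, the least-squares estimate of m has a closed form. The two-population restriction, the unweighted least squares, and the clamping to [0, 1] are assumptions made here; the abstract does not describe the authors' regression in detail, and the frequencies below are invented.

```python
def admixture_proportion(p_mix, p_ref1, p_ref2):
    """Least-squares estimate of the proportion m of ancestry from
    reference population 1, assuming p_mix = m*p_ref1 + (1 - m)*p_ref2
    at every SNP. Inputs are equal-length lists of allele frequencies."""
    # Regress (p_mix - p_ref2) on (p_ref1 - p_ref2) through the origin
    num = sum((pm - p2) * (p1 - p2)
              for pm, p1, p2 in zip(p_mix, p_ref1, p_ref2))
    den = sum((p1 - p2) ** 2 for p1, p2 in zip(p_ref1, p_ref2))
    if den == 0:
        raise ValueError("reference populations have identical frequencies")
    # Clamp to the valid range of a proportion
    return min(1.0, max(0.0, num / den))


# Hypothetical frequencies (not from the study): a pool that is an exact
# 80/20 mixture of the two reference panels
p_ref1 = [0.9, 0.2, 0.5]
p_ref2 = [0.1, 0.8, 0.3]
p_pool = [0.74, 0.32, 0.46]
print(admixture_proportion(p_pool, p_ref1, p_ref2))
```

SNPs with large frequency differences between the reference panels dominate the denominator, which is why AIMs carry most of the information in such an estimate.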
Author Summary
Many association studies have been published looking for genetic variants contributing to a variety of human traits such as obesity, diabetes, and height. Because the frequency of genetic variants can differ across populations, it is important to have estimates of genetic ancestry in the individuals being studied. In this study, we were able to measure genetic ancestry in populations of mixed ancestry by genotyping pooled, rather than individual, DNA samples. This represents a rapid and inexpensive means for modeling genetic ancestry and thus could facilitate future association or population-genetic studies in populations of unknown ancestry for which whole-genome data do not already exist.
doi:10.1371/journal.pgen.1000866
PMCID: PMC2832667  PMID: 20221249
12.  Replication of genetic loci for ages at menarche and menopause in the multi-ethnic Population Architecture using Genomics and Epidemiology (PAGE) study 
Human Reproduction (Oxford, England)  2013;28(6):1695-1706.
STUDY QUESTION
Do genetic associations identified in genome-wide association studies (GWAS) of age at menarche (AM) and age at natural menopause (ANM) replicate in women of diverse race/ancestry from the Population Architecture using Genomics and Epidemiology (PAGE) Study?
SUMMARY ANSWER
We replicated GWAS reproductive trait single nucleotide polymorphisms (SNPs) in our European descent population and found that many SNPs were also associated with AM and ANM in populations of diverse ancestry.
WHAT IS KNOWN ALREADY
Menarche and menopause mark the reproductive lifespan in women and are important risk factors for chronic diseases including obesity, cardiovascular disease and cancer. Both events are believed to be influenced by environmental and genetic factors, and vary in populations differing by genetic ancestry and geography. Most genetic variants associated with these traits have been identified in GWAS of European-descent populations.
STUDY DESIGN, SIZE, DURATION
A total of 42 251 women of diverse ancestry from PAGE were included in cross-sectional analyses of AM and ANM.
MATERIALS, SETTING, METHODS
SNPs previously associated with ANM (n = 5 SNPs) and AM (n = 3 SNPs) in GWAS were genotyped in American Indians, African Americans, Asians, European Americans, Hispanics and Native Hawaiians. To test SNP associations with ANM or AM, we used linear regression models stratified by race/ethnicity and PAGE sub-study. Results were then combined in race-specific fixed effect meta-analyses for each outcome. For replication and generalization analyses, significance was defined at P < 0.01 for ANM analyses and P < 0.017 for AM analyses.
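A fixed-effect meta-analysis of per-study regression estimates is commonly done by inverse-variance weighting, sketched below with a two-sided normal P-value. The significance thresholds stated above are consistent with a Bonferroni correction of 0.05 for the number of SNPs tested (0.05/5 for ANM, 0.05/3 for AM), although the abstract does not name the correction. The effect sizes and standard errors in the usage lines are invented.

```python
from math import erfc, sqrt


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.
    betas: per-study effect estimates; ses: their standard errors.
    Returns (pooled beta, pooled SE, two-sided P-value)."""
    weights = [1 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided P-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = erfc(abs(z) / sqrt(2))
    return pooled_beta, pooled_se, p_value


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for n_tests comparisons."""
    return alpha / n_tests


# Hypothetical sub-study estimates for one SNP (not from the study)
beta, se, p = fixed_effect_meta([0.1, 0.3], [0.1, 0.1])
print(beta, se, p, p < bonferroni_threshold(5))
```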
MAIN RESULTS AND THE ROLE OF CHANCE
We replicated findings for AM SNPs in the LIN28B locus and an intergenic region on 9q31 in European Americans. The LIN28B SNPs (rs314277 and rs314280) were also significantly associated with AM in Asians, but not in other race/ethnicity groups. Linkage disequilibrium (LD) patterns at this locus varied widely among the ancestral groups. With the exception of an intergenic SNP at 13q34, all ANM SNPs replicated in European Americans. Three were significantly associated with ANM in other race/ethnicity populations: rs2153157 (6p24.2/SYCP2L), rs365132 (5q35/UIMC1) and rs16991615 (20p12.3/MCM8). While rs1172822 (19q13/BRSK1) was not significant in the populations of non-European descent, effect sizes showed similar trends.
LIMITATIONS, REASONS FOR CAUTION
Lack of association for the GWAS SNPs in the non-European American groups may be due to differences in locus LD patterns between these groups and the European-descent populations included in the GWAS discovery studies; and in some cases, lower power may also contribute to non-significant findings.
WIDER IMPLICATIONS OF THE FINDINGS
The discovery of genetic variants associated with the reproductive traits provides an important opportunity to elucidate the biological mechanisms involved with normal variation and disorders of menarche and menopause. In this study we replicated most, but not all, reported SNPs in European-descent populations and examined the epidemiologic architecture of these early reported variants, describing their generalizability and effect size across differing ancestral populations. Such data will be increasingly important for prioritizing GWAS SNPs for follow-up in fine-mapping and resequencing studies, as well as in translational research.
STUDY FUNDING/COMPETING INTEREST(S)
The Population Architecture Using Genomics and Epidemiology (PAGE) program is funded by the National Human Genome Research Institute (NHGRI), supported by U01HG004803 (CALiCo), U01HG004798 (EAGLE), U01HG004802 (MEC), U01HG004790 (WHI) and U01HG004801 (Coordinating Center), and their respective NHGRI ARRA supplements. The authors report no conflicts of interest.
doi:10.1093/humrep/det071
PMCID: PMC3657124  PMID: 23508249
menopause; menarche; genome-wide association study; race/ethnicity; single nucleotide polymorphism
13.  Self-reported Ethnicity, Genetic Structure and the Impact of Population Stratification in a Multiethnic Study 
Human genetics  2010;128(2):165-177.
It is well-known that population substructure may lead to confounding in case-control association studies. Here, we examined genetic structure in a large racially and ethnically diverse sample consisting of 5 ethnic groups of the Multiethnic Cohort study (African Americans, Japanese Americans, Latinos, European Americans and Native Hawaiians) using 2,509 SNPs distributed across the genome. Principal component analysis on 6,213 study participants, 18 Native Americans and 11 HapMap III populations revealed 4 important principal components (PCs): the first two separated Asians, Europeans and Africans, and the third and fourth corresponded to Native American and Native Hawaiian (Polynesian) ancestry, respectively. Individual ethnic composition derived from self-reported parental information matched well to genetic ancestry for Japanese and European Americans. STRUCTURE-estimated individual ancestral proportions for African Americans and Latinos are consistent with previous reports. We quantified the East Asian (mean 27%), European (mean 27%) and Polynesian (mean 46%) ancestral proportions for the first time, to our knowledge, for Native Hawaiians. Simulations based on realistic settings of case-control studies nested in the Multiethnic Cohort found that the effect of population stratification was modest and readily corrected by adjusting for race/ethnicity or by adjusting for top PCs derived from all SNPs or from ancestry informative markers; the power of these approaches was similar when averaged across causal variants simulated based on allele frequencies of the 2,509 genotyped markers. The bias may be large in case-only analysis of gene by gene interactions but it can be corrected by top PCs derived from all SNPs.
doi:10.1007/s00439-010-0841-4
PMCID: PMC3057055  PMID: 20499252
AIMs; African American; Native Hawaiian; Latino; admixture; principal component analysis
14.  Evaluation of Common Type 2 Diabetes Risk Variants in a South Asian Population of Sri Lankan Descent 
PLoS ONE  2014;9(6):e98608.
Introduction
Most studies seeking common variant associations with type 2 diabetes (T2D) have focused on individuals of European ancestry. These discoveries need to be evaluated in other major ancestral groups, to understand ethnic differences in predisposition, and establish whether these contribute to variation in T2D prevalence and presentation. This study aims to establish whether common variants conferring T2D-risk in Europeans contribute to T2D-susceptibility in the South Asian population of Sri Lanka.
Methodology
Lead single nucleotide polymorphisms (SNPs) at 37 T2D-risk loci attaining genome-wide significance in Europeans were genotyped in 878 T2D cases and 1523 normoglycaemic controls from Sri Lanka. Association testing was performed by logistic regression adjusting for age and sex and by the Cochran-Mantel-Haenszel test after stratifying according to self-identified ethnolinguistic subgroup. A weighted genetic risk score was generated to examine the combined effect of these SNPs on T2D-risk in the Sri Lankan population.
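A weighted genetic risk score of the kind mentioned above is commonly computed as a sum of risk-allele counts weighted by the log odds ratio of each SNP. The sketch below assumes that convention; the abstract does not state which weights were used, and the function name and example values are hypothetical.

```python
import math


def weighted_grs(risk_allele_counts, odds_ratios):
    """Weighted genetic risk score for one individual.

    risk_allele_counts: number of risk alleles (0, 1 or 2) at each SNP
    odds_ratios:        per-allele odds ratio reported for each SNP
    Each SNP contributes count * ln(OR); this weighting is an assumption.
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(c * math.log(o) for c, o in zip(risk_allele_counts, odds_ratios))


if __name__ == "__main__":
    # Three illustrative SNPs: no risk alleles, one copy, two copies.
    print(weighted_grs([0, 1, 2], [1.1, 1.2, 1.0]))
```

A SNP with an odds ratio of 1.0 has weight zero and therefore leaves the score unchanged regardless of genotype.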
Results
Of the 36 SNPs passing quality control, sixteen showed nominal (p<0.05) association in Sri Lankan samples, fifteen of those directionally-consistent with the original signal. Overall, these association findings were robust to analyses that accounted for membership of ethnolinguistic subgroups. Overall, the odds ratios for 31 of the 36 SNPs were directionally-consistent with those observed in Europeans (p = 3.2×10⁻⁶). Allelic odds ratios and risk allele frequencies in Sri Lankan subjects were not systematically different to those reported in Europeans. Genetic risk score and risk of T2D were strongly related in Sri Lankans (per allele OR 1.10 [95%CI 1.08–1.13], p = 1.2×10⁻¹⁷).
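Directional consistency of this kind is often assessed with a binomial sign test: under the null hypothesis each SNP has a 50% chance of matching the original direction of effect. The sketch below computes a one-sided exact binomial tail for 31 of 36 consistent SNPs. The abstract does not say which test produced the reported p-value, so this is an assumed method and its result need not match the published figure; the function name is hypothetical.

```python
from math import comb


def sign_test_p(n_consistent, n_total):
    """One-sided exact binomial p-value for at least n_consistent successes
    out of n_total trials when each trial succeeds with probability 0.5."""
    tail = sum(comb(n_total, k) for k in range(n_consistent, n_total + 1))
    return tail / 2 ** n_total


if __name__ == "__main__":
    print(sign_test_p(31, 36))
```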
Conclusion
Our data indicate that most T2D-risk variants identified in Europeans have similar effects in South Asians from Sri Lanka, and that systematic difference in common variant associations are unlikely to explain inter-ethnic differences in prevalence or presentation of T2D.
doi:10.1371/journal.pone.0098608
PMCID: PMC4057178  PMID: 24926958
15.  Straightforward Inference of Ancestry and Admixture Proportions through Ancestry-Informative Insertion Deletion Multiplexing 
PLoS ONE  2012;7(1):e29684.
Ancestry-informative markers (AIMs) show high allele frequency divergence between different ancestral or geographically distant populations. These genetic markers are especially useful in inferring the likely ancestral origin of an individual or estimating the apportionment of ancestry components in admixed individuals or populations. The study of AIMs is of great interest in clinical genetics research, particularly to detect and correct for population substructure effects in case-control association studies, but also in population and forensic genetics studies.
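The notion of allele frequency divergence can be made concrete with the absolute frequency difference (often called delta) between two reference populations. The sketch below is illustrative only: it assumes genotypes coded as 0, 1 or 2 copies of a chosen allele, and the function names and toy data are hypothetical rather than taken from the study.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele, given genotypes coded 0/1/2."""
    return sum(genotypes) / (2 * len(genotypes))


def delta(pop_a, pop_b):
    """Absolute allele frequency difference between two populations."""
    return abs(allele_frequency(pop_a) - allele_frequency(pop_b))


def rank_markers(markers):
    """Sort marker names by decreasing delta.

    markers: dict mapping marker name -> (genotypes in pop A, genotypes in pop B)
    """
    return sorted(markers, key=lambda m: delta(*markers[m]), reverse=True)


if __name__ == "__main__":
    toy = {
        "marker_1": ([2, 2, 2, 2], [0, 0, 0, 0]),  # fixed difference, delta = 1
        "marker_2": ([1, 1, 1, 1], [1, 1, 1, 1]),  # no difference, delta = 0
    }
    print(rank_markers(toy))
```

Markers near the top of such a ranking are the ones that carry the most information for separating the two populations.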
This work presents a set of 46 ancestry-informative insertion deletion polymorphisms selected to efficiently measure population admixture proportions of four different origins (African, European, East Asian and Native American). All markers are analyzed in short fragments (under 230 basepairs) through a single PCR followed by capillary electrophoresis (CE) allowing a very simple one tube PCR-to-CE approach.
HGDP-CEPH diversity panel samples from the four groups, together with Oceanians, were genotyped to evaluate the efficiency of the assay in clustering populations from different continental origins and to establish reference databases. In addition, other populations from diverse geographic origins were tested using the HGDP-CEPH samples as reference data. The results revealed that the AIM-INDEL set developed is highly efficient at inferring the ancestry of individuals and provides good estimates of ancestry proportions at the population level.
In conclusion, we have optimized the multiplexed genotyping of 46 AIM-INDELs in a simple and informative assay, enabling a more straightforward alternative to the commonly available AIM-SNP typing methods dependent on complex, multi-step protocols or implementation of large-scale genotyping technologies.
doi:10.1371/journal.pone.0029684
PMCID: PMC3260179  PMID: 22272242
16.  Development of admixture mapping panels for African Americans from commercial high-density SNP arrays 
BMC Genomics  2010;11:417.
Background
Admixture mapping is a powerful approach for identifying genetic variants involved in human disease that exploits the unique genomic structure in recently admixed populations. To use existing published panels of ancestry-informative markers (AIMs) for admixture mapping, markers have to be genotyped de novo for each admixed study sample and samples representing the ancestral parental populations. The increased availability of dense marker data on commercial chips has made it feasible to develop panels wherein the markers need not be predetermined.
Results
We developed two panels of AIMs (~2,000 markers each) based on the Affymetrix Genome-Wide Human SNP Array 6.0 for admixture mapping with African American samples. These two AIM panels had good map power that was higher than that of a denser panel of ~20,000 random markers as well as other published panels of AIMs. As a test case, we applied the panels in an admixture mapping study of hypertension in African Americans in the Washington, D.C. metropolitan area.
Conclusions
Developing marker panels for admixture mapping from existing genome-wide genotype data offers two major advantages: (1) no de novo genotyping needs to be done, thereby saving costs, and (2) markers can be filtered for various quality measures and replacement markers (to minimize gaps) can be selected at no additional cost. Panels of carefully selected AIMs have two major advantages over panels of random markers: (1) the map power from sparser panels of AIMs is higher than that of ~10-fold denser panels of random markers, and (2) clusters can be labeled based on information from the parental populations. With current technology, chip-based genome-wide genotyping is less expensive than genotyping ~20,000 random markers. The major advantage of using random markers is the absence of ascertainment effects resulting from the process of selecting markers. The ability to develop marker panels informative for ancestry from SNP chip genotype data provides a fresh opportunity to conduct admixture mapping for disease genes in admixed populations when genome-wide association data exist or are planned.
doi:10.1186/1471-2164-11-417
PMCID: PMC2996945  PMID: 20602785
17.  Population Structure of Hispanics in the United States: The Multi-Ethnic Study of Atherosclerosis 
PLoS Genetics  2012;8(4):e1002640.
Using ∼60,000 SNPs selected for minimal linkage disequilibrium, we perform population structure analysis of 1,374 unrelated Hispanic individuals from the Multi-Ethnic Study of Atherosclerosis (MESA), with self-identification corresponding to Central America (n = 93), Cuba (n = 50), the Dominican Republic (n = 203), Mexico (n = 708), Puerto Rico (n = 192), and South America (n = 111). By projection of principal components (PCs) of ancestry to samples from the HapMap phase III and the Human Genome Diversity Panel (HGDP), we show the first two PCs quantify the Caucasian, African, and Native American origins, while the third and fourth PCs bring out an axis that aligns with known South-to-North geographic location of HGDP Native American samples and further separates MESA Mexican versus Central/South American samples along the same axis. Using k-means clustering computed from the first four PCs, we define four subgroups of the MESA Hispanic cohort that show close agreement with self-identification, labeling the clusters as primarily Dominican/Cuban, Mexican, Central/South American, and Puerto Rican. To demonstrate our recommendations for genetic analysis in the MESA Hispanic cohort, we present pooled and stratified association analysis of triglycerides for selected SNPs in the LPL and TRIB1 gene regions, previously reported in GWAS of triglycerides in Caucasians but as yet unconfirmed in Hispanic populations. We report statistically significant evidence for genetic association in both genes, and we further demonstrate the importance of considering population substructure and genetic heterogeneity in genetic association studies performed in the United States Hispanic population.
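The clustering step described above, k-means on the leading principal components, can be sketched with a plain Lloyd's algorithm. This is an assumed, simplified version: each individual is represented by a tuple of PC coordinates, the starting centroids are supplied by the caller, and the function name and toy data are hypothetical. The abstract does not describe the initialization or software the authors used.

```python
import math


def kmeans(points, centroids, n_iter=50):
    """Lloyd's k-means on tuples of PC coordinates.

    points:    list of coordinate tuples, one per individual
    centroids: initial cluster centres (same dimensionality as points)
    Returns (labels, centroids) once assignments stop changing or after
    n_iter iterations.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centre in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    pcs = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pcs, [(0.0, 0.0), (10.0, 10.0)]))
```

In practice the result depends on the starting centroids, so several random restarts are usually compared.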
Author Summary
Using genotype data from about 60,000 distinct genetic markers, we examined population structure in 1,374 unrelated Hispanic individuals from the Multi-Ethnic Study of Atherosclerosis (MESA), with self-identification corresponding to Central America (n = 93), Cuba (n = 50), the Dominican Republic (n = 203), Mexico (n = 708), Puerto Rico (n = 192), and South America (n = 111). By comparing genetic ancestry of MESA Hispanic participants to reference samples representing worldwide diversity, we show major differences in ancestry of MESA Hispanics reflecting their Caucasian, African, and Native American origins, with finer differences corresponding to North-South geographic origins that separate MESA Mexican versus Central/South American samples. Based on our analysis, we define four subgroups of the MESA Hispanic cohort that show close agreement with the following self-identified regions of origin: Dominican/Cuban, Mexican, Central/South American, and Puerto Rican. We examine association of triglycerides with selected genetic markers, and we further demonstrate the importance of considering differences in genetic ancestry (or factors associated with genetic ancestry) when performing genetic studies of the United States Hispanic population.
doi:10.1371/journal.pgen.1002640
PMCID: PMC3325201  PMID: 22511882
18.  Replication of genome-wide association studies (GWAS) loci for fasting plasma glucose in African-Americans 
Diabetologia  2010;54(4):783-788.
Aims/hypothesis
Chronically elevated blood glucose (hyperglycaemia) is the primary indicator of type 2 diabetes, which has a prevalence that varies considerably by ethnicity in the USA, with African-Americans disproportionately affected. Genome-wide association studies (GWASs) have significantly enhanced our understanding of the genetic basis of diabetes and related traits, including fasting plasma glucose (FPG). However, the majority of GWASs have been conducted in populations of European ancestry. Thus, it is important to conduct replication analyses in populations with non-European ancestry to identify shared loci associated with FPG across populations.
Methods
We used data collected from non-diabetic unrelated African-American individuals (n = 927) who participated in the Howard University Family Study to attempt to replicate previously published GWASs of FPG. Of the 29 single nucleotide polymorphisms (SNPs) previously reported, we directly tested 20 in this study. In addition to the direct test, we queried a 500 kb window centred on all 29 reported SNPs for local replication of additional markers in linkage disequilibrium (LD).
Results
Using direct SNP and LD-based comparisons, we replicated multiple SNPs previously associated with FPG and strongly associated with type 2 diabetes in populations with European ancestry. The replicated SNPs included those in or near TCF7L2, SLC30A8, G6PC2, MTNR1B, DGKB-TMEM195 and GCKR. We also replicated additional variants in LD with the reported SNPs in ZMAT4 and adjacent to IRS1.
Conclusions/interpretation
We identified multiple GWAS variants for FPG in our cohort of African-Americans. Using an LD-based strategy we also identified SNPs not previously reported, demonstrating the utility of using diverse populations for replication analysis.
doi:10.1007/s00125-010-2002-7
PMCID: PMC3052446  PMID: 21188353
African-American; Association; GWAS; Replication; Type 2 diabetes
19.  The Effect of Chromosome 9p21 Variants on Cardiovascular Disease May Be Modified by Dietary Intake: Evidence from a Case/Control and a Prospective Study 
PLoS Medicine  2011;8(10):e1001106.
Ron Do and colleagues find that a prudent diet high in raw vegetables may modify the increased genetic risk of cardiovascular disease conferred by the chromosome 9p21 SNP.
Background
One of the most robust genetic associations for cardiovascular disease (CVD) is the Chromosome 9p21 region. However, the interaction of this locus with environmental factors has not been extensively explored. We investigated the association of 9p21 with myocardial infarction (MI) in individuals of different ethnicities, and tested for an interaction with environmental factors.
Methods and Findings
We genotyped four 9p21 SNPs in 8,114 individuals from the global INTERHEART study. All four variants were associated with MI, with odds ratios (ORs) of 1.18 to 1.20 (1.85×10⁻⁸ ≤ p ≤ 5.21×10⁻⁷). A significant interaction (p = 4.0×10⁻⁴) was observed between rs2383206 and a factor-analysis-derived “prudent” diet pattern score, for which a major component was raw vegetables. An effect of 9p21 on MI was observed in the group with a low prudent diet score (OR = 1.32, p = 6.82×10⁻⁷), but the effect was diminished in a step-wise fashion in the medium (OR = 1.17, p = 4.9×10⁻³) and high prudent diet scoring groups (OR = 1.02, p = 0.68) (p = 0.014 for difference). We also analyzed data from 19,129 individuals (including 1,014 incident cases of CVD) from the prospective FINRISK study, which used a closely related dietary variable. In this analysis, the 9p21 risk allele demonstrated a larger effect on CVD risk in the groups with diets low or average for fresh vegetables, fruits, and berries (hazard ratio [HR] = 1.22, p = 3.0×10⁻⁴, and HR = 1.35, p = 4.1×10⁻³, respectively) compared to the group with high consumption of these foods (HR = 0.96, p = 0.73) (p = 0.0011 for difference). The combination of the least prudent diet and two copies of the risk allele was associated with a 2-fold increase in risk for MI (OR = 1.98, p = 2.11×10⁻⁹) in the INTERHEART study and a 1.66-fold increase in risk for CVD in the FINRISK study (HR = 1.66, p = 0.0026).
Conclusions
The risk of MI and CVD conferred by Chromosome 9p21 SNPs appears to be modified by a prudent diet high in raw vegetables and fruits.
Editors' Summary
Background
Cardiovascular diseases (CVDs)—diseases that affect the heart and/or the blood vessels—are a leading cause of illness and death worldwide. In the United States, for example, the leading cause of death is coronary heart disease, a CVD in which narrowing of the heart's blood vessels by fatty deposits slows the blood supply to the heart and may eventually cause a heart attack (myocardial infarction, or MI); the third leading cause of death in the US is stroke, a CVD in which the brain's blood supply is interrupted. Environmental factors such as diet, physical activity, and smoking alter a person's risk of developing CVD. In addition, certain genetic variants (alterations in the DNA that forms the body's blueprint; DNA is packed into structures called chromosomes) alter the risk of developing CVD and are passed from parent to child. Thus, in CVD, as in most common diseases, both genetics and the environment play a role.
Why Was This Study Done?
Recent studies have identified several genetic variants that are associated with an increased risk of developing CVD. One of the most robust of these genetic associations is a cluster of single nucleotide polymorphisms (SNPs, differences in a single DNA building block) in a chromosomal region (locus) called 9p21. So far, this association has been mainly studied in European populations. Moreover, the interaction of this locus with environmental factors has not been extensively studied. A better understanding of how 9p21 variants affect CVD risk in people of different ethnicities and of the interaction between this locus and environmental factors could allow the development of targeted strategies for the prevention of CVD. In this study, the researchers investigate the association of 9p21 risk variants with CVD in people of different ethnicities and test for an interaction between this locus and environmental factors.
What Did the Researchers Do and Find?
The researchers assessed four 9p21 SNPs in people enrolled in the INTERHEART study, a global retrospective case-control study that investigated potential MI risk factors by comparing people who had had an acute non-fatal MI with similar people without heart disease. All four SNP risk variants increased the risk of MI by about a fifth. However, the effect of the SNPs on MI was influenced by the “prudent” diet pattern score of the INTERHEART participants, a score that includes fresh fruit and vegetable intake as recorded in food frequency questionnaires. That is, the risk of MI in people carrying SNP risk variants was influenced by their diet. The strongest interaction was seen with an SNP called rs2383206, but although rs2383206 carriers who ate a diet poor in fruits and vegetables had a higher risk of MI than people with a similar diet who did not carry this SNP, rs2383206 carriers and non-carriers who ate a fruit- and vegetable-rich diet had a comparable MI risk. Overall, the combination of the least “prudent” diet and two copies of the risk variant (human cells contain two complete sets of chromosomes) was associated with a two-fold increase in risk for MI in the INTERHEART study. Additionally, data collected in the FINRISK study, which characterized healthy individuals living in Finland at baseline and then followed them to see whether they developed CVD, revealed a similar interaction between diet and 9p21 SNPs.
What Do These Findings Mean?
These findings suggest that the risk of CVD conferred by chromosome 9p21 SNPs may be influenced by diet in multiple ethnic groups. Importantly, they suggest that the deleterious effect of 9p21 SNPs on CVD might be mitigated by consuming a diet rich in fresh fruits and vegetables. The accuracy of these findings may be affected by recall bias in the INTERHEART study (that is, some people may not have remembered their diet accurately) and by the small number of CVD cases in the FINRISK study. Nevertheless, these findings suggest that gene–environment interactions are important drivers of CVD, and they raise the possibility that a sound diet can mediate the effects of 9p21 SNPs.
doi:10.1371/journal.pmed.1001106
PMCID: PMC3191151  PMID: 22022235
20.  Uniparental Markers of Contemporary Italian Population Reveals Details on Its Pre-Roman Heritage 
PLoS ONE  2012;7(12):e50794.
Background
According to archaeological records and historical documentation, Italy has been a melting pot for populations of different geographical and ethnic matrices. Although Italy has been a favorite subject for numerous population genetic studies, genetic patterns have never been analyzed comprehensively, including uniparental and autosomal markers throughout the country.
Methods/Principal Findings
A total of 583 individuals were sampled from across the Italian Peninsula, from ten distant (if homogeneous by language) ethnic communities — and from two linguistic isolates (Ladins, Grecani Salentini). All samples were first typed for the mitochondrial DNA (mtDNA) control region and selected coding region SNPs (mtSNPs). This data was pooled for analysis with 3,778 mtDNA control-region profiles collected from the literature. Secondly, a set of Y-chromosome SNPs and STRs were also analyzed in 479 individuals together with a panel of autosomal ancestry informative markers (AIMs) from 441 samples. The resulting genetic record reveals clines of genetic frequencies laid according to the latitude slant along continental Italy – probably generated by demographical events dating back to the Neolithic. The Ladins showed distinctive, if more recent structure. The Neolithic contribution was estimated for the Y-chromosome as 14.5% and for mtDNA as 10.5%. Y-chromosome data showed larger differentiation between North, Center and South than mtDNA. AIMs detected a minor sub-Saharan component; this is however higher than for other European non-Mediterranean populations. The same signal of sub-Saharan heritage was also evident in uniparental markers.
Conclusions/Significance
Italy shows patterns of molecular variation mirroring other European countries, although some heterogeneity exists based on different analysis and molecular markers. From North to South, Italy shows clinal patterns that were most likely modulated during Neolithic times.
doi:10.1371/journal.pone.0050794
PMCID: PMC3519480  PMID: 23251386
21.  Assessing the accuracy of observer-reported ancestry in a biorepository linked to electronic medical records 
Purpose
The Vanderbilt DNA Databank (BioVU) is a biorepository that currently contains >80,000 DNA samples linked to electronic medical records. While BioVU is a valuable source of samples and phenotypes for genetic association studies, it is unclear whether the administratively assigned race/ethnicity in BioVU can accurately describe and be used as a proxy for genetic ancestry.
Methods
We genotyped 360 SNPs on the Illumina DNA Test Panel containing ancestry informative markers (AIMs) in 1910 BioVU samples with observer-reported ancestry and 384 samples from the Multiple Sclerosis Genetics Group with self-reported ancestry. Genetic ancestry was inferred for all individuals using STRUCTURE 2.2.
Results
More than 98% of observer-reported European Americans (EA) were genetically inferred to have at least 60% European ancestry. Ninety-three percent of observer-reported African Americans (AA) were genetically inferred to be predominantly of African ancestry. We determined that the concordance of observer-reported race/ethnicity and inferred genetic ancestry was not significantly different from that of self-reported race/ethnicity in either population (p=0.09 and 0.94 in European Americans and African Americans, respectively).
Conclusions
Observer-reported race/ethnicity for European Americans and African Americans approximates genetic ancestry as well as self-reported race/ethnicity, making biorepositories linked to EMRs such as BioVU a viable source of DNA samples for future large-scale genetic association studies.
doi:10.1097/GIM.0b013e3181efe2df
PMCID: PMC2952033  PMID: 20733501
biorepositories; admixture; ancestry; electronic medical record; population stratification
22.  Replication of GWAS “Hits” by Race for Breast and Prostate Cancers in European Americans and African Americans 
In this study, we assessed association of genome-wide association studies (GWAS) “hits” by race with adjustment for potential population stratification (PS) in two large, diverse study populations: the Carolina Breast Cancer Study (CBCS; N total = 3693 individuals) and the University of Pennsylvania Study of Clinical Outcomes, Risk, and Ethnicity (SCORE; N total = 1135 individuals). In both study populations, 136 ancestry informative markers and GWAS “hits” (CBCS: FGFR2, 8q24; SCORE: JAZF1, MSMB, 8q24) were genotyped. Principal component analysis was used to assess ancestral differences by race. Multivariable unconditional logistic regression was used to assess differences in cancer risk with and without adjustment for the first ancestral principal component (PC1) and for an interaction effect between PC1 and the GWAS “hit” (SNP) of interest. PC1 explained 53.7% of the variance for CBCS and 49.5% of the variance for SCORE. European Americans and African Americans were similar in their ancestral structure between CBCS and SCORE and cases and controls were well matched by ancestry. In the CBCS European Americans, 9/11 SNPs were significant after PC1 adjustment, but after adjustment for the PC1 by SNP interaction effect, only one SNP remained significant (rs1219648 in FGFR2); for CBCS African Americans, 6/11 SNPs were significant after PC1 adjustment and after adjustment for the PC1 by SNP interaction effect, all six SNPs remained significant and an additional SNP now became significant. In the SCORE European Americans, 0/9 SNPs were significant after PC1 adjustment and no changes were seen after additional adjustment for the PC1 by SNP interaction effect; for SCORE African Americans, 2/9 SNPs were significant after PC1 adjustment and after adjustment for the PC1 by SNP interaction effect, only one SNP remained significant (rs16901979 at 8q24). We show that genetic associations by race are modified by interaction between individual SNPs and PS.
doi:10.3389/fgene.2011.00037
PMCID: PMC3268591  PMID: 22303333
population stratification; ancestry; prostate cancer; breast cancer; GWAS “hits”
23.  SNP-VISTA: An interactive SNP visualization tool 
BMC Bioinformatics  2005;6:292.
Background
Recent advances in sequencing technologies promise to provide a better understanding of the genetics of human disease as well as the evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it has become possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease in an attempt to identify causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples enables more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at [1].
Results
We have developed and present two modifications of an interactive visualization tool, SNP-VISTA, to aid in the analyses of the following types of data: A. Large-scale re-sequence data of disease-related genes for discovery of associated and/or causative alleles (GeneSNP-VISTA). B. Massive amounts of ecogenomics data for studying homologous recombination in microbial populations (EcoSNP-VISTA). The main features and capabilities of SNP-VISTA are: 1) mapping of SNPs to gene structure; 2) classification of SNPs, based on their location in the gene, frequency of occurrence in samples and allele composition; 3) clustering, based on user-defined subsets of SNPs, highlighting haplotypes as well as recombinant sequences; 4) integration of protein evolutionary conservation visualization; and 5) display of automatically calculated recombination points that are user-editable.
Conclusion
The main strength of SNP-VISTA is its graphical interface and use of visual representations, which support interactive exploration and hence better understanding of large-scale SNP data by the user.
doi:10.1186/1471-2105-6-292
PMCID: PMC1325058  PMID: 16336665
24.  Ancestral Informative Marker Selection and Population Structure Visualization Using Sparse Laplacian Eigenfunctions 
PLoS ONE  2010;5(11):e13734.
Identification of a small panel of population structure informative markers can reduce genotyping cost and is useful in various applications, such as ancestry inference in association mapping, forensics and evolutionary theory in population genetics. Traditional methods to ascertain ancestral informative markers usually require prior knowledge of individual ancestry and have difficulty with admixed populations. Recently Principal Components Analysis (PCA) has been employed with success to select SNPs which are highly correlated with top significant principal components (PCs) without use of individual ancestral information. The approach is also applicable to admixed populations. Here we propose a novel approach based on our recent result on summarizing population structure by graph Laplacian eigenfunctions, which differs from PCA in that it is geometric and robust to outliers. Our approach also takes advantage of the a priori sparseness of informative markers in the genome. Through simulation of a ring population and the real global population sample HGDP of 650K SNPs genotyped in 940 unrelated individuals, we validate the proposed algorithm for selecting the most informative markers, a small fraction of which can recover a similar underlying population structure efficiently. Employing a standard Support Vector Machine (SVM) to predict individuals' continental memberships on the HGDP dataset of seven continents, we demonstrate that the SNPs selected by our method are more informative but less redundant than those selected by PCA. Our algorithm is a promising tool in genome-wide association studies and population genetics, facilitating the selection of structure informative markers, efficient detection of population substructure and ancestral inference.
doi:10.1371/journal.pone.0013734
PMCID: PMC2973949  PMID: 21079796
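The PCA-based baseline mentioned in the abstract above (selecting SNPs that are strongly tied to the top principal components) can be illustrated with a minimal sketch. This is not the authors' sparse Laplacian method, and it is not taken from the paper: the function name `select_markers_by_pca`, the squared-loading score, and the simulated allele frequencies are all assumptions made for illustration.

```python
import numpy as np


def select_markers_by_pca(genotypes, n_components=1, n_markers=10):
    """Rank SNPs by variance-weighted squared loadings on the top PCs.

    genotypes: (individuals x SNPs) array of 0/1/2 allele counts.
    Returns the column indices of the n_markers highest-scoring SNPs.
    The scoring rule is an illustrative assumption, not the paper's.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)            # centre each SNP
    std = X.std(axis=0)
    std[std == 0] = 1.0               # guard against monomorphic SNPs
    X = X / std                       # scale each SNP to unit variance
    # Rows of vt are the principal-component loading vectors over SNPs.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    weights = s[:n_components] ** 2   # variance captured by each PC
    scores = (weights[:, None] * vt[:n_components] ** 2).sum(axis=0)
    return np.argsort(scores)[::-1][:n_markers]


# Hypothetical toy data: two populations of 100 individuals, 100 SNPs.
# Only the first 10 SNPs differ in allele frequency between populations.
rng = np.random.default_rng(0)
n_per_pop, n_snps, n_informative = 100, 100, 10
freq_a = np.full(n_snps, 0.5)
freq_b = np.full(n_snps, 0.5)
freq_a[:n_informative] = 0.1
freq_b[:n_informative] = 0.9
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
genotypes = np.vstack([pop_a, pop_b])

selected = select_markers_by_pca(genotypes, n_components=1, n_markers=10)
print(sorted(selected.tolist()))
```

Note that no ancestry labels are passed to the function, which mirrors the abstract's point that PCA-based selection works without individual ancestral information. In this toy setting the leading component is dominated by the population split, so the highest-scoring SNPs should largely coincide with the simulated informative ones.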
25.  Genetic Association for Renal Traits among Participants of African Ancestry Reveals New Loci for Renal Function 
PLoS Genetics  2011;7(9):e1002264.
Chronic kidney disease (CKD) is an increasing global public health concern, particularly among populations of African ancestry. We performed an interrogation of known renal loci, genome-wide association (GWA), and IBC candidate-gene SNP association analyses in African Americans from the CARe Renal Consortium. In up to 8,110 participants, we performed meta-analyses of GWA and IBC array data for estimated glomerular filtration rate (eGFR), CKD (eGFR <60 mL/min/1.73 m²), urinary albumin-to-creatinine ratio (UACR), and microalbuminuria (UACR >30 mg/g) and interrogated the 250 kb flanking region around 24 SNPs previously identified in European Ancestry renal GWAS analyses. Findings were replicated in up to 4,358 African Americans. To assess function, individually identified genes were knocked down in zebrafish embryos by morpholino antisense oligonucleotides. Expression of kidney-specific genes was assessed by in situ hybridization, and glomerular filtration was evaluated by dextran clearance. Overall, 23 of 24 previously identified SNPs had direction-consistent associations with eGFR in African Americans, 2 of which achieved nominal significance (UMOD, PIP5K1B). Interrogation of the flanking regions uncovered 24 new index SNPs in African Americans, 12 of which were replicated (UMOD, ANXA9, GCKR, TFDP2, DAB2, VEGFA, ATXN2, GATM, SLC22A2, TMEM60, SLC6A13, and BCAS3). In addition, we identified 3 suggestive loci at DOK6 (p-value = 5.3×10⁻⁷) and FNDC1 (p-value = 3.0×10⁻⁷) for UACR, and KCNQ1 with eGFR (p = 3.6×10⁻⁶). Morpholino knockdown of kcnq1 in the zebrafish resulted in abnormal kidney development and filtration capacity. We identified several SNPs in association with eGFR in African Ancestry individuals, as well as 3 suggestive loci for UACR and eGFR. Functional genetic studies support a role for kcnq1 in glomerular development in zebrafish.
Author Summary
Chronic kidney disease (CKD) is an increasing global public health problem and disproportionately affects populations of African ancestry. Many studies have shown that genetic variants are associated with the development of CKD; however, similar studies are lacking in African ancestry populations. The CARe consortium consists of more than 8,000 individuals of African ancestry; genome-wide association analysis for renal-related phenotypes was conducted. In cross-ethnicity analyses, we found that 23 of 24 previously identified SNPs in European ancestry populations have the same effect direction in our samples of African ancestry. We also identified 3 suggestive genetic variants associated with measurement of kidney function. We then tested these genes in zebrafish knockdown models and demonstrated that kcnq1 is involved in kidney development in zebrafish. These results highlight the similarity of genetic variants across ethnicities and show that cross-species modeling in zebrafish is feasible for genes associated with chronic human disease.
doi:10.1371/journal.pgen.1002264
PMCID: PMC3169523  PMID: 21931561
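The dichotomous traits in the abstract above are defined by thresholds on continuous measurements (CKD as eGFR below 60 mL/min/1.73 m², microalbuminuria as UACR above 30 mg/g). A minimal sketch of that dichotomization follows. The cutoffs come from the abstract; the names `RenalTraits` and `classify_renal_traits` are hypothetical and do not reflect the study's actual pipeline.

```python
from dataclasses import dataclass

# Cutoffs as stated in the abstract.
CKD_EGFR_CUTOFF = 60.0        # mL/min/1.73 m^2; CKD if eGFR is below this
MICROALB_UACR_CUTOFF = 30.0   # mg/g; microalbuminuria if UACR is above this


@dataclass
class RenalTraits:
    """Binary renal phenotypes derived from continuous measurements."""
    ckd: bool
    microalbuminuria: bool


def classify_renal_traits(egfr, uacr):
    """Apply the strict inequalities quoted in the abstract.

    egfr: estimated glomerular filtration rate in mL/min/1.73 m^2.
    uacr: urinary albumin-to-creatinine ratio in mg/g.
    """
    return RenalTraits(
        ckd=egfr < CKD_EGFR_CUTOFF,
        microalbuminuria=uacr > MICROALB_UACR_CUTOFF,
    )


# Hypothetical participant values, for illustration only.
example = classify_renal_traits(egfr=55.0, uacr=12.0)
print(example)
```

Both comparisons are strict, so a participant measured exactly at a cutoff is not counted as a case under this reading of the abstract.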
