Results 1-14 (14)

1.  GWATCH: a web platform for automated gene association discovery analysis 
GigaScience  2014;3:18.
As genome-wide sequence analyses for complex human disease determinants are expanding, it is increasingly necessary to develop strategies to promote discovery and validation of potential disease-gene associations.
Here we present a dynamic web-based platform – GWATCH – that automates and facilitates four steps in genetic epidemiological discovery: 1) Rapid gene association search and discovery analysis of large genome-wide datasets; 2) Expanded visual display of gene associations for genome-wide variants (SNPs, indels, CNVs), including Manhattan plots, 2D and 3D snapshots of any gene region, and a dynamic genome browser illustrating gene association chromosomal regions; 3) Real-time validation/replication of candidate or putative genes suggested from other sources, limiting Bonferroni genome-wide association study (GWAS) penalties; 4) Open data release and sharing by eliminating privacy constraints (The National Human Genome Research Institute (NHGRI) Institutional Review Board (IRB), informed consent, The Health Insurance Portability and Accountability Act (HIPAA) of 1996 etc.) on unabridged results, which allows for open access comparative and meta-analysis.
GWATCH is suitable for both GWAS and whole genome sequence association datasets. We illustrate the utility of GWATCH with three large genome-wide association studies for HIV-AIDS resistance genes screened in large multicenter cohorts; however, association datasets from any study can be uploaded and analyzed by GWATCH.
PMCID: PMC4220276  PMID: 25374661
AIDS; HIV; Complex diseases; Genome-wide association studies (GWAS); Whole genome sequencing (WGS)
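The Bonferroni penalty in step 3 is simple arithmetic: the per-test significance threshold is the family-wise error rate divided by the number of tests. The sketch below is ours, not GWATCH code. It assumes a family-wise error rate of 0.05, and the one-million-test and ten-gene figures are illustrative assumptions. It shows why re-testing a handful of candidate genes carries a far smaller penalty than a full genome-wide scan.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def passes(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value stays significant after Bonferroni correction for n_tests tests."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    # Hypothetical genome-wide scan of about one million independent tests.
    print(bonferroni_threshold(0.05, 1_000_000))
    # Hypothetical replication of 10 candidate genes suggested by other studies.
    print(bonferroni_threshold(0.05, 10))
    # The same p-value fails genome-wide but passes the candidate-gene threshold.
    print(passes(1e-4, 0.05, 1_000_000), passes(1e-4, 0.05, 10))
```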
2.  Reconstructing Native American Migrations from Whole-Genome and Whole-Exome Data 
PLoS Genetics  2013;9(12):e1004023.
There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low-coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is in MXL, in CLM, and in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the South American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas thousand years ago (kya), supports that the MXL ancestors split kya, with a subsequent split of the ancestors to CLM and PUR kya. The model also features effective populations of in Mexico, in Colombia, and in Puerto Rico. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earliest migration from Europe.
Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations.
Author Summary
Populations of the Americas have a rich and heterogeneous genetic and cultural heritage that draws from a diversity of pre-Columbian Native American, European, and African populations. Characterizing this diversity facilitates the development of medical genetics research in diverse populations and the transfer of medical knowledge across populations. It also represents an opportunity to better understand the peopling of the Americas, from the crossing of Beringia to the post-Columbian era. Here, we take advantage of the sequencing of individuals of Colombian (CLM), Mexican (MXL), and Puerto Rican (PUR) origin by the 1000 Genomes Project to improve our demographic models for the peopling of the Americas. The divergence among African, European, and Native American ancestors to these populations enables us to infer the continent of origin at each locus in the sampled genomes. The resulting patterns of ancestry suggest complex post-Columbian migration histories, starting later in CLM than in MXL and PUR. Whereas European ancestral segments show evidence of relatedness, a demographic model of synonymous variation suggests that the Native American ancestors of the MXL, PUR, and CLM panels split within a few hundred years of one another, over 12 thousand years ago. Together with early archeological sites in South America, these results support rapid divergence during the initial peopling of the Americas.
PMCID: PMC3873240  PMID: 24385924
3.  Colonization of islands in the Mona Passage by endemic dwarf geckoes (genus Sphaerodactylus) reconstructed with mitochondrial phylogeny 
Ecology and Evolution  2013;3(13):4488-4500.
Little is known about the natural history of the Sphaerodactylus species endemic to the three islands located in the Mona Passage separating the Greater Antillean islands of Hispaniola and Puerto Rico. In this study, parts of two mitochondrial genes, 16S rRNA and 12S rRNA, were sequenced to determine the relationships between the sphaerodactylids that live in the Mona Passage and other Caribbean species from the same genus. While the main goal was to identify the biogeographical origin of these species, we also identified a genetically distinct type of dwarf gecko that warrants future evaluation as a possible new species. According to the reconstructed phylogenies, we propose a stepwise model of colonization wherein S. nicholsi from southwestern Puerto Rico or a very close ancestor gave rise through a founder event to Sphaerodactylus monensis on Mona Island. In a similar fashion, S. monensis or a very close ancestor on Mona Island gave rise to S. levinsi on Desecheo Island. This study also suggests that the most recent common ancestor between the species from the islands in the Mona Passage and Puerto Rico existed approximately 3 MYA.
PMCID: PMC3856748  PMID: 24340189
Caribbean; Desecheo; island biogeography; Mona; mtDNA; Sphaerodactylus
4.  Evidence for selection at HIV host susceptibility genes in a West Central African human population 
HIV-1 derives from multiple independent transfers of simian immunodeficiency virus (SIV) strains from chimpanzees to human populations. We hypothesized that human populations in west central Africa may have been exposed to SIV prior to the pandemic, and that previous outbreaks may have selected for genetic resistance to immunodeficiency viruses. To test this hypothesis, we examined the genomes of Biaka Western Pygmies, who historically resided in communities within the geographic range of the central African chimpanzee subspecies (Pan troglodytes troglodytes) that carries strains of SIV ancestral to HIV-1.
SNP genotypes of the Biaka were compared to those of African human populations who historically resided outside the range of P. t. troglodytes, including the Mbuti Eastern Pygmies. Genomic regions showing signatures of selection were compared to the genomic locations of genes reported to be associated with HIV infection or pathogenesis. In the Biaka, a strong signal of selection was detected at CUL5, which codes for a component of the Vif-mediated APOBEC3 degradation pathway. A CUL5 allele protective against AIDS progression was fixed in the Biaka. A signal of selection was detected at TRIM5, which codes for an HIV post-entry restriction factor. A protective missense mutation in TRIM5 was more frequent in the Biaka than in the other African populations compared, as was a protective allele for APOBEC3G, which codes for an anti-HIV-1 restriction factor. Alleles protective against HIV-1 for APOBEC3H, CXCR6 and HLA-C were at higher frequencies in the Biaka than in the Mbuti. Biaka genomes showed a strong signal of selection at TSG101, an inhibitor of HIV-1 viral budding.
We found protective alleles or evidence for selection in the Biaka at a number of genes associated with HIV-1 infection or progression. Pygmies have also been reported to carry genotypes protective against HIV-1 for the genes CCR5 and CCL3L1. Our hypothesis that HIV-1 may have shaped the genomes of some human populations in West Central Africa appears to merit further investigation.
PMCID: PMC3537702  PMID: 23217182
HIV dependency factors; Single nucleotide polymorphisms; Biaka pygmies; Mbuti pygmies
5.  A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education 
GigaScience  2012;1:14.
Amazona vittata is a critically endangered Puerto Rican endemic bird, the only surviving native parrot species in the United States territory, and the first parrot in the large Neotropical genus Amazona to be studied on a genomic scale.
In a unique, community-funded project, DNA from an A. vittata female was sequenced on an Illumina HiSeq platform, yielding a total of ~42.5 billion nucleotide bases. This provided approximately 26.89x average coverage depth at the completion of this funding phase. Filtering followed by assembly resulted in 259,423 contigs (N50 = 6,983 bp, longest = 75,003 bp), which were further scaffolded into 148,255 fragments (N50 = 19,470 bp, longest = 206,462 bp). This provided ~76% coverage of the genome based on an estimated size of 1.58 Gb. The assembled scaffolds allowed basic genomic annotation and comparative analyses with other available avian whole-genome sequences.
The current data represent the first genomic information from, and work carried out with, a unique source of funding. This analysis further provides a means for directed training of young researchers in genetic and bioinformatics analyses and will facilitate progress towards a full assembly and annotation of the Puerto Rican parrot genome. It also adds extensive genomic data to a new branch of the avian tree, making it useful for comparative analyses with other avian species. Ultimately, the knowledge acquired from these data will contribute to an improved understanding of the overall population health of this species and aid in ongoing and future conservation efforts.
PMCID: PMC3626513  PMID: 23587420
Amazona vittata; Puerto Rican parrot; Genome sequence; Annotation; Assembly; Local funding; Education
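Two summary figures in this abstract follow from standard definitions. Average coverage depth is total sequenced bases divided by estimated genome size (42.5 Gb / 1.58 Gb ≈ 26.9x, consistent with the reported 26.89x given that 42.5 billion is itself rounded). N50 is the length L such that contigs or scaffolds of length L or longer contain at least half of the assembled bases. The sketch below is a generic illustration, not the project's pipeline, and the toy contig lengths are made up.

```python
def coverage_depth(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the total bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the loop returns once half is reached")


if __name__ == "__main__":
    # Figures quoted in the abstract: ~42.5 Gb sequenced, genome estimated at 1.58 Gb.
    print(f"{coverage_depth(42.5e9, 1.58e9):.2f}x")
    # Toy contig lengths (made up) to illustrate the N50 definition.
    print(n50([100, 80, 50, 40, 30]))
```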
6.  Reconciling Apparent Conflicts between Mitochondrial and Nuclear Phylogenies in African Elephants 
PLoS ONE  2011;6(6):e20642.
Conservation strategies for African elephants would be advanced by resolution of conflicting claims that they comprise one, two, three or four taxonomic groups, and by development of genetic markers that establish more incisively the provenance of confiscated ivory. We addressed these related issues by genotyping 555 elephants from across Africa with microsatellite markers, developing a method to identify those loci most effective at geographic assignment of elephants (or their ivory), and conducting novel analyses of continent-wide datasets of mitochondrial DNA. Results showed that nuclear genetic diversity was partitioned into two clusters, corresponding to African forest elephants (99.5% Cluster-1) and African savanna elephants (99.4% Cluster-2). Hybrid individuals were rare. In a comparison of basal forest “F” and savanna “S” mtDNA clade distributions to nuclear DNA partitions, forest elephant nuclear genotypes occurred only in populations in which S clade mtDNA was absent, suggesting that nuclear partitioning corresponds to the presence or absence of S clade mtDNA. We reanalyzed African elephant mtDNA sequences from 81 locales spanning the continent and discovered that S clade mtDNA was completely absent among elephants at all 30 sampled tropical forest locales. The distribution of savanna nuclear DNA and S clade mtDNA corresponded closely to range boundaries traditionally ascribed to the savanna elephant species based on habitat and morphology. Further, a reanalysis of nuclear genetic assignment results suggested that West African elephants do not comprise a distinct third species. Finally, we show that some DNA markers will be more useful than others for determining the geographic origins of illegal ivory. 
These findings resolve the apparent incongruence between mtDNA and nuclear genetic patterns that has confounded the taxonomy of African elephants, affirm the limitations of using mtDNA patterns to infer elephant systematics or population structure, and strongly support the existence of two elephant species in Africa.
PMCID: PMC3110795  PMID: 21701575
7.  Genetics of focal segmental glomerulosclerosis and HIV-associated collapsing glomerulopathy: the role of MYH9 genetic variation 
Seminars in nephrology  2010;30(2):111-125.
Until recently, knowledge of the genetic causes of glomerular disease was limited to certain rare or uncommon inherited diseases and to genes, either rare or of small effect, identified in candidate gene studies. These genetic factors accounted for only a very small fraction of kidney disease. However, the striking differences in frequency of many forms of kidney disease between African Americans and European Americans, which could not be completely explained by cultural or economic factors, pointed to a large unidentified genetic influence. Since FSGS and HIV-associated collapsing glomerulopathy (HIVAN) show striking racial disparities, we performed an admixture mapping study to identify contributing genetic factors. Admixture mapping identified genetic variants in the non-muscle myosin gene MYH9 as having an extreme influence on both FSGS and HIVAN, with odds ratios from 4 to 8 and attributable fractions of 70–100%. Previously identified rare inherited MYH9 disorders point to a mechanism by which MYH9 variation disrupts the actin-myosin filaments responsible for maintaining the structure of podocytes, the cells that provide one of three filtration barriers in the glomeruli. MYH9 variation has a smaller but still highly significant effect on non-diabetic kidney disease, and a weaker but significant effect on diabetic kidney disease; it is unclear whether underlying cryptic FSGS is responsible for the MYH9 association with these diseases. The strong predictive power of MYH9 variation for disease indicates a clear role for genetic testing for these variants in personalized medicine, for assessment of genetic risk, and potentially for diagnosis.
PMCID: PMC2862292  PMID: 20347641
8.  History Shaped the Geographic Distribution of Genomic Admixture on the Island of Puerto Rico 
PLoS ONE  2011;6(1):e16513.
Contemporary genetic variation among Latin American human groups reflects population migrations shaped by complex historical, social and economic factors. Consequently, admixture patterns may vary by geographic region, at scales ranging from countries to neighborhoods. We examined the geographic variation of admixture across the island of Puerto Rico and the degree to which it could be explained by historic and social events. We analyzed a census-based sample of 642 Puerto Rican individuals that were genotyped for 93 ancestry informative markers (AIMs) to estimate African, European and Native American ancestry. Socioeconomic status (SES) data and geographic location were obtained for each individual. There was significant geographic variation of ancestry across the island. In particular, African ancestry demonstrated a decreasing East to West gradient that was partially explained by historical factors linked to the colonial sugar plantation system. SES also demonstrated a parallel decreasing cline from East to West. However, at a local level, SES and African ancestry were negatively correlated. European ancestry was strongly negatively correlated with African ancestry and therefore showed patterns complementary to African ancestry. By contrast, Native American ancestry showed little variation across the island and across individuals and appears to have played little social role historically. The observed geographic distributions of SES and genetic variation relate to historical social events and mating patterns, and have substantial implications for the design of studies in the recently admixed Puerto Rican population. More generally, our results demonstrate the importance of incorporating social and geographic data with genetics when studying contemporary admixed populations.
PMCID: PMC3031579  PMID: 21304981
9.  Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans 
Science (New York, N.Y.)  2010;329(5993):841-845.
African-Americans have higher rates of kidney disease than European-Americans. Here we show that in African-Americans, focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD) are associated with two independent sequence variants in the APOL1 gene on chromosome 22 [FSGS odds ratio = 10.5 (95% CI 6.0–18.4); H-ESKD odds ratio = 7.3 (95% CI 5.6–9.5)]. The two APOL1 variants are common in African chromosomes but absent from European chromosomes and both reside within haplotypes that harbor signatures of positive selection. ApoL1 is a serum factor that lyses trypanosomes. In vitro assays revealed that only the kidney disease-associated ApoL1 variants lysed Trypanosoma brucei rhodesiense. We speculate that evolution of a critical survival factor in Africa may have contributed to the high rates of renal disease in African-Americans.
PMCID: PMC2980843  PMID: 20647424
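The odds ratios above are reported with 95% confidence intervals. One common way to obtain such an interval is the Wald method on the log scale, sketched below from a 2x2 table of risk-genotype carriers among cases and controls. The counts in the example are hypothetical, and the study may have used a different estimator (for example, one adjusting for covariates), so this only illustrates the general calculation. As a consistency check on the reported FSGS interval, ln(10.5) lies almost exactly midway between ln(6.0) and ln(18.4), as a log-scale interval should.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls.
    All cells must be positive; no continuity correction is applied.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts of risk-genotype carriers among cases and controls.
    or_, lo, hi = odds_ratio_wald_ci(a=120, b=80, c=60, d=240)
    print(f"OR = {or_:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```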
10.  Worldwide Distribution of the MYH9 Kidney Disease Susceptibility Alleles and Haplotypes: Evidence of Historical Selection in Africa 
PLoS ONE  2010;5(7):e11474.
MYH9 was recently identified as a renal susceptibility gene (OR 3–8, p < 10^-8) for major forms of kidney disease that disproportionately affect individuals of African descent. The risk haplotype (E-1) occurs at much higher frequencies in African Americans (≥60%) than in European Americans (<4%), revealing a genetic basis for a major health disparity. The population distributions of MYH9 risk alleles and of the E-1 risk haplotype, and the demographic and selective forces acting on the MYH9 region, have not been well explored. We reconstructed MYH9 haplotypes from 4 tagging single nucleotide polymorphisms (SNPs) spanning introns 12–23 using available data from HapMap Phase II, and by genotyping 938 DNAs from the Human Genome Diversity Panel (HGDP). The E-1 risk haplotype followed a cline, being most frequent within sub-Saharan African populations (range 50–80%), less frequent in populations from the Middle East (9–27%) and Europe (0–9%), and rare or absent in Asia, the Americas, and Oceania. Pairwise fixation indices (FST) between continental populations were calculated for MYH9 haplotypes; FST ranged from 0.27 to 0.40 for Africa compared with other continental populations, possibly due to selection. Uniquely in Africa, the Yoruba population showed high-frequency extended haplotype length around the core risk allele (C) compared to the alternative allele (T) at the same locus (rs4821481, iHS = 2.67), as well as high population differentiation (FST(CEU vs. YRI) = 0.51) in HapMap Phase II data, also observable only in the Yoruba population from HGDP (FST = 0.49), pointing to an instance of recent selection in the genomic region. The population-specific divergence in MYH9 risk allele frequencies among the world's populations may prove important in risk assessment and public health policies to mitigate the burden of kidney disease in vulnerable populations.
PMCID: PMC2901326  PMID: 20634883
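The FST values above measure how much allele-frequency variation lies between, rather than within, populations. A minimal two-population sketch for one biallelic locus, using the classic heterozygosity-based definition FST = (HT - HS) / HT, is shown below. The abstract does not say which estimator was used, and sample-size-corrected estimators such as Weir and Cockerham's will give somewhat different values. The allele frequencies in the example are illustrative only.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright/Nei-style FST for one biallelic locus in two equally weighted populations.

    p1, p2: frequency of the same allele in each population.
    HS is the mean within-population expected heterozygosity; HT is the expected
    heterozygosity of the pooled population.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        return 0.0  # allele fixed in both populations: no variation to partition
    return (ht - hs) / ht


if __name__ == "__main__":
    print(fst_two_pops(0.5, 0.5))  # identical frequencies: no differentiation
    print(fst_two_pops(0.70, 0.05))  # illustrative high- vs. low-frequency populations
```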
11.  Evaluation of IL10, IL19, and IL20 gene polymorphisms and chronic hepatitis B infection outcome 
Hepatitis B virus (HBV) infection remains a serious global health problem despite the availability of a highly effective vaccine. Approximately 5% of HBV-infected adults develop chronic hepatitis B, which may result in liver cirrhosis or hepatocellular carcinoma. Variants of interleukin-10 (IL10) have previously been associated with chronic hepatitis B infection and progression to hepatocellular carcinoma. Single nucleotide polymorphisms (SNPs, n = 42) from the IL10, IL19, and IL20 gene regions were examined for association with HBV infection outcome, either chronic or recovered, in a nested case-control study of African Americans and European Americans. Among African Americans, three nominally significant SNP associations in IL10, two in IL20, and one haplotype association were observed with different HBV infection outcomes (P = 0.005–0.04). The IL20 SNP rs1518108 showed a nominally significant deviation from Hardy-Weinberg equilibrium in African Americans, with a large excess of heterozygotes in chronic HBV-infected cases (P = 0.0006), which suggests a strong genetic effect. Among European Americans, a nominally significant IL20 SNP and an IL20 haplotype were associated with HBV recovery (P = 0.01–0.04). These results suggest that IL10 and IL20 gene variants influence HBV infection outcome and encourage further studies of these cytokines in HBV pathogenesis.
PMCID: PMC2874896  PMID: 18479293
Interleukin-10; Inflammation; African American; Immunogenetics; Hepatitis b; HIV co-infection
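The heterozygote-excess finding rests on a Hardy-Weinberg equilibrium test. A common version compares observed genotype counts with those expected from the estimated allele frequency using a 1-degree-of-freedom chi-square test, sketched below. The abstract does not state which test was used (an exact test is also common), and the genotype counts in the example are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at a biallelic SNP.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; the HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical cases with a large excess of heterozygotes (80 observed vs. 50 expected).
    print(hwe_chi_square(10, 80, 10))
```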
12.  Genome-wide scans for footprints of natural selection 
Detecting the ‘genomic footprints’ of recent selection applies directly to the discovery of disease genes and to inferring the formative events that molded modern population genetic structure. The imprints of historic selection/adaptation episodes left in human and animal genomes allow one to interpret modern and ancestral gene origins and modifications. Current approaches to reveal selected regions applied in genome-wide selection scans (GWSSs) fall into eight principal categories: (I) phylogenetic footprinting, (II) detecting increased rates of functional mutations, (III) evaluating divergence versus polymorphism, (IV) detecting extended segments of linkage disequilibrium, (V) evaluating local reduction in genetic variation, (VI) detecting changes in the shape of the frequency distribution (spectrum) of genetic variation, (VII) assessing differentiation between populations (FST), and (VIII) detecting excess or decrease in admixture contribution from one population. Here, we review and compare these approaches using available human genome-wide datasets to provide independent verification (or not) of regions found by different methods and in different populations. The lessons learned from GWSSs will be applied to identify genome signatures of historic selective pressures on genes and gene regions in other species with emerging genome sequences. This would offer considerable potential for genome annotation in functional, developmental and evolutionary contexts.
PMCID: PMC2842710  PMID: 20008396
genomes; genome-wide selection scans; whole genome sequences; candidate genes; human populations; vertebrate species
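As one concrete instance of category (VI), a widely used statistic for detecting a shift in the frequency spectrum is Tajima's D. It compares two diversity estimates: mean pairwise differences (pi) and the number of segregating sites (S) scaled by a harmonic-number constant. The review's abstract does not say which statistics it compares within each category, so this is our illustrative choice, written from the standard formula. Strongly negative values (an excess of rare variants) are compatible with a recent sweep but also with population expansion, so D alone is not proof of selection.

```python
import math


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D from n sampled sequences, S segregating sites, and mean pairwise differences pi."""
    if n < 4:
        raise ValueError("need at least 4 sequences for a defined variance term")
    if seg_sites <= 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    s = seg_sites
    # Numerator: pairwise-difference estimate minus Watterson's estimate (S / a1).
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical sample: 10 sequences, 20 segregating sites, few pairwise differences.
    print(round(tajimas_d(10, 20, 3.0), 3))
```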
13.  Genome and gene alterations by insertions and deletions in the evolution of human and chimpanzee chromosome 22 
BMC Genomics  2009;10:51.
Understanding the structure and function of the human genome requires knowledge of the genomes of our closest living relatives, the primates. Nucleotide insertions and deletions (indels) play a significant role in the differentiation that underlies phenotypic differences between humans and chimpanzees. In this study, we evaluated the distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes.
Specifically, we identified 6,279 indels of 10 bp or greater in a ~33 Mb alignment between human and chimpanzee chromosome 22. After the exclusion of those in repetitive DNA, 1,429 (23%) of the indels remained. This group was characterized according to local or genome-wide repetitiveness, size, location relative to genes, and other genomic features. Using local structure analysis, we defined three major classes of these indels: (i) those found uniquely, with no additional copies of the indel sequence in the surrounding (10 kb) region, (ii) those with at least one exact copy nearby, and (iii) those with similar but not identical copies found locally. Among these classes, we encountered a high number of exactly repeated indel sequences, most likely due to recent duplications. Many of these indels (683 of 1,429) were near known human genes. Coding sequences and splice sites contained significantly fewer of these indels than expected by chance, suggesting that selection is a factor limiting their persistence. A subset of indels from coding regions was experimentally validated, and their impacts were predicted, based on direct sequencing in several human populations as well as chimpanzees, bonobos, gorillas, and two subspecies of orangutans.
Our analysis demonstrates that while indels are distributed essentially randomly in intergenic and intronic genomic regions, they are significantly under-represented in coding sequences. There are substantial differences in representation of indel classes among genomic elements, most likely caused by differences in their evolutionary histories. Using local sequence context, we predicted origins and phylogenetic relationships of gene-impacting indels in primate species. These results suggest that genome plasticity is a major force behind speciation events separating the great ape lineages.
PMCID: PMC2654908  PMID: 19171065
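The three local-structure classes described above (unique, exact nearby copy, similar nearby copy) can be sketched as a simple search of the flanking sequence. This is our simplification, not the authors' pipeline: it scans only the forward strand, measures similarity by mismatches in same-length windows (ignoring gapped alignments), and the 20% mismatch cut-off for a "similar" copy is a hypothetical parameter.

```python
def classify_indel(indel_seq: str, flank_seq: str, max_mismatch_frac: float = 0.2) -> str:
    """Assign an indel to one of three local-structure classes.

    indel_seq: the inserted/deleted sequence.
    flank_seq: surrounding sequence (e.g. 10 kb), excluding the indel itself.
    max_mismatch_frac: hypothetical cut-off for calling a nearby copy "similar".
    Returns "exact copy", "similar copy", or "unique".
    """
    indel_seq = indel_seq.upper()
    flank_seq = flank_seq.upper()
    k = len(indel_seq)
    if k == 0:
        raise ValueError("empty indel sequence")
    if indel_seq in flank_seq:
        return "exact copy"
    best = k  # fewest mismatches seen in any same-length window of the flank
    for start in range(len(flank_seq) - k + 1):
        window = flank_seq[start:start + k]
        mismatches = sum(a != b for a, b in zip(indel_seq, window))
        best = min(best, mismatches)
    if best / k <= max_mismatch_frac:
        return "similar copy"
    return "unique"


if __name__ == "__main__":
    flank = "TTTTACGTACGTACGGGG"  # toy flanking sequence
    print(classify_indel("ACGTACGTAC", flank))
    print(classify_indel("ACGTACGTAA", flank))
    print(classify_indel("ACGTACGTAC", "T" * 50))
```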
14.  Identifying Selected Regions from Heterozygosity and Divergence Using a Light-Coverage Genomic Dataset from Two Human Populations 
PLoS ONE  2008;3(3):e1712.
When a selective sweep occurs in the chromosomal region around a target gene in two populations that have recently separated, it produces three dramatic genomic consequences: 1) decreased multi-locus heterozygosity in the region; 2) elevated or diminished genetic divergence (FST) of multiple polymorphic variants adjacent to the selected locus between the divergent populations, due to the alternative fixation of alleles; and 3) a consequent regional increase in the variance of FST (S²FST) for the same clustered variants, due to the increased alternative fixation of alleles in the loci surrounding the selection target. In the first part of our study, to search for potential targets of directional selection, we developed and validated a resampling-based computational approach; we then scanned an array of 31 different-sized moving windows of SNP variants (5–65 SNPs) across the human genome in a set of European and African American population samples with 183,997 SNP loci, after correcting for the recombination rate variation. The analysis revealed 180 regions of recent selection with very strong evidence in either population or both. In the second part of our study, we compared the newly discovered putative regions to those sites previously postulated in the literature, using methods based on inspecting patterns of linkage disequilibrium, population divergence and other methodologies. The newly found regions were cross-validated with those found in nine other studies that have searched for selection signals. Our study was replicated especially well in those regions confirmed by three or more studies. These validated regions were independently verified, using a combination of different methods and different databases in other studies, and should include fewer false positives.
The main strength of our analysis method compared to others is that it does not require dense genotyping and therefore can be used with data from population-based genome SNP scans from smaller studies of humans or other species.
PMCID: PMC2248624  PMID: 18320033
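The moving-window idea above (low heterozygosity and high FST variance clustered over consecutive SNPs) can be sketched as follows. This is our simplified illustration, not the authors' method: it compares one window against random sets of SNPs drawn genome-wide, ignores linkage between SNPs, and omits the paper's correction for recombination rate variation. The function names and the add-one p-value correction are our choices.

```python
import random
import statistics


def window_stats(het, fst, w):
    """For each run of w consecutive SNPs, return (start index, mean H, variance of FST)."""
    if len(het) != len(fst):
        raise ValueError("het and fst must have one value per SNP")
    if not 2 <= w <= len(het):
        raise ValueError("window size must be between 2 and the number of SNPs")
    results = []
    for start in range(len(het) - w + 1):
        mean_h = statistics.fmean(het[start:start + w])
        var_fst = statistics.variance(fst[start:start + w])
        results.append((start, mean_h, var_fst))
    return results


def resampling_pvalues(het, fst, w, start, n_resamples=1000, seed=0):
    """Empirical p-values for one window against random w-SNP sets drawn genome-wide.

    Returns (p for a mean H this low or lower, p for a var(FST) this high or higher).
    """
    rng = random.Random(seed)
    obs_h = statistics.fmean(het[start:start + w])
    obs_v = statistics.variance(fst[start:start + w])
    low_h = high_v = 0
    for _ in range(n_resamples):
        picks = rng.sample(range(len(het)), w)
        if statistics.fmean(het[i] for i in picks) <= obs_h:
            low_h += 1
        if statistics.variance([fst[i] for i in picks]) >= obs_v:
            high_v += 1
    # Add-one correction keeps empirical p-values strictly above zero.
    return (low_h + 1) / (n_resamples + 1), (high_v + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Synthetic data with a planted sweep-like region at SNPs 100-109.
    het = [0.4 + 0.01 * (i % 5) for i in range(200)]
    fst = [0.1 + 0.01 * (i % 3) for i in range(200)]
    for i in range(100, 110):
        het[i] = 0.05
        fst[i] = 0.0 if i % 2 == 0 else 0.6
    lowest = min(window_stats(het, fst, 10), key=lambda row: row[1])
    print(lowest[0], resampling_pvalues(het, fst, 10, lowest[0], n_resamples=200, seed=1))
```

Random SNP sets are the simplest null; contiguous random windows would better preserve linkage structure, at the cost of a more conservative test.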
