1.  First Ancient Mitochondrial Human Genome from a Prepastoralist Southern African 
Genome Biology and Evolution  2014;6(10):2647-2653.
The oldest contemporary human mitochondrial lineages arose in Africa. The earliest divergent extant maternal offshoot, namely haplogroup L0d, is represented by click-speaking forager peoples of southern Africa. Broadly defined as Khoesan, these peoples are today largely restricted to the semidesert regions of Namibia and Botswana, whereas archeological, historical, and genetic evidence supports a once broader southerly dispersal of click-speaking peoples, including southward-migrating pastoralists and indigenous marine foragers. No genetic data have been recovered from the indigenous peoples who sustained life along the southern coastal waters of Africa before the arrival of pastoralists. In this study we generate a complete mitochondrial genome from a 2,330-year-old male skeleton, confirmed through osteological and archeological analysis as practicing a marine-based forager existence. The ancient mtDNA represents a new L0d2c lineage (L0d2c1c) that is today, unlike its Khoe-language-based sister clades (L0d2c1a and L0d2c1b), most closely related to contemporary indigenous San-speakers (specifically Ju). Providing the first genomic evidence that prepastoral Southern African marine foragers carried the earliest diverged maternal modern human lineages, this study emphasizes the significance of Southern African archeological remains in defining early modern human origins.
PMCID: PMC4224329  PMID: 25212860
ancient DNA; mitochondrial genome; Khoesan; southern Africa; marine foragers; archeological skeletons
2.  Risk for HIV-1 Infection Associated With a Common CXCL12 (SDF1) Polymorphism and CXCR4 Variation in an African Population 
CXC chemokine ligand 12 (CXCL12), or stromal cell–derived factor 1 (SDF1), is the only known natural ligand for the HIV-1 coreceptor, CXC chemokine receptor 4 (CXCR4). A single nucleotide polymorphism (SNP) in the CXCL12 gene (SDF1-3′A) has been associated with disease progression to AIDS in some studies, but not others. Mutations in the CXCR4 gene are generally rare and have not been implicated in HIV-1/AIDS pathogenesis. This study analyzed the SDF1-3′A SNP and performed mutation screening for polymorphic markers in the CXCR4 gene to determine the presence or absence of significant associations with susceptibility to HIV-1 infection. The study consisted of 257 HIV-1–seropositive patients and 113 HIV-1–seronegative controls representing a sub-Saharan African population belonging to the Xhosa ethnic group of South Africa. The SDF1-3′A SNP was associated with an increased risk for HIV-1 infection (P = 0.0319) whereas no significant association was observed between the occurrence of the SDF1-3′A SNP and increased or decreased plasma levels of CXCL12. Comprehensive mutation analysis of the CXCR4 gene confirmed a high degree of genetic conservation within the coding region of this ancient population.
PMCID: PMC1369993  PMID: 16284526
CXC chemokine ligand 12 (CXCL12); CXC chemokine receptor 4 (CXCR4); SDF1-3′A single-nucleotide polymorphism; HIV-1 infection risk; African population
3.  Complete Khoisan and Bantu genomes from southern Africa 
Nature  2010;463(7283):943-947.
The genetic structure of the indigenous hunter-gatherer peoples of southern Africa, the oldest known lineage of modern human, is important for understanding human diversity. Studies based on mitochondrial [1] and small sets of nuclear markers [2] have shown that these hunter-gatherers, known as Khoisan, San, or Bushmen, are genetically divergent from other humans [1,3]. However, until now, fully sequenced human genomes have been limited to recently diverged populations [4–8]. Here we present the complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and a Bantu from southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide, including 13,146 novel amino acid variants. In terms of nucleotide substitutions, the Bushmen seem to be, on average, more different from each other than, for example, a European and an Asian. Observed genomic differences between the hunter-gatherers and others may help to pinpoint genetic adaptations to an agricultural lifestyle. Adding the described variants to current databases will facilitate inclusion of southern Africans in medical research efforts, particularly when family and medical histories can be correlated with genome-wide data.
PMCID: PMC3890430  PMID: 20164927
4.  Addressing the contribution of previously described genetic and epidemiological risk factors associated with increased prostate cancer risk and aggressive disease within men from South Africa 
BMC Urology  2013;13:74.
Although African ancestry represents a significant risk factor for prostate cancer, few studies have investigated the significance of prostate cancer and relevance of previously defined genetic and epidemiological prostate cancer risk factors within Africa. We recently established the Southern African Prostate Cancer Study (SAPCS), a resource for epidemiological and genetic analysis of prostate cancer risk and outcomes in Black men from South Africa. This is the first reported analysis of data from the resource, which is biased towards highly aggressive prostate cancer.
The SAPCS is an ongoing population-based study of Black men with or without prostate cancer. Pilot analysis was performed for the first 837 participants, 522 cases and 315 controls. We investigate 46 pre-defined prostate cancer risk alleles and up to 24 epidemiological measures including demographic, lifestyle and environmental factors, for power to predict disease status and to drive on-going SAPCS recruitment, sampling procedures and research direction.
Preliminary results suggest that no previously defined risk alleles significantly predict prostate cancer occurrence within the SAPCS. Furthermore, genetic risk profiles did not enhance the predictive power of prostate specific antigen (PSA) testing. Our study supports several lifestyle/environmental factors contributing to prostate cancer risk including a family history of cancer, diabetes, current sexual activity and erectile dysfunction, balding pattern, frequent aspirin usage and high PSA levels.
Despite a clear increased prostate cancer risk associated with an African ancestry, experimental data is lacking within Africa. This pilot study is therefore a significant contribution to the field. While genetic risk factors (largely European-defined) show no evidence for disease prediction in the SAPCS, several epidemiological factors were associated with prostate cancer status. We call for improved study power by building on the SAPCS resource, further validation of associated factors in independent African-based resources, and genome-wide approaches to define African-specific risk alleles.
PMCID: PMC3882498  PMID: 24373635
Prostate cancer; African ancestry; Risk factors; Aggressive disease; Genetics; Epidemiology; Pilot analysis; Southern Africa
5.  Performance of High-Throughput Sequencing for the Discovery of Genetic Variation Across the Complete Size Spectrum 
G3: Genes|Genomes|Genetics  2013;4(1):63-65.
We observed that current high-throughput sequencing approaches only detected a fraction of the full size-spectrum of insertions, deletions, and copy number variants compared with a previously published, Sanger-sequenced human genome. The sensitivity for detection was the lowest in the 100- to 10,000-bp size range, and at DNA repeats, with copy number gains harder to delineate than losses. We discuss strategies for discovering the full spectrum of genetic variation necessary for disease association studies.
PMCID: PMC3887540  PMID: 24192839
copy number variation; insertion/deletion; high-throughput sequencing; genome variation annotation
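The size-dependent sensitivity described in entry 5 can be illustrated with a short sketch. It assumes a truth set of variants with known lengths and a set of variant IDs recovered by a sequencing pipeline; the bin edges, variant IDs, and lengths below are invented for illustration and are not data from the paper.

```python
from bisect import bisect_right

# Hypothetical size bins (lower edges in bp); the middle bin mirrors the
# 100- to 10,000-bp range singled out in the abstract.
BIN_EDGES = [1, 100, 10_000]
BIN_LABELS = ["1-99 bp", "100-9,999 bp", ">=10,000 bp"]


def size_bin(length):
    """Map a variant length (>= 1 bp) to its size-bin label."""
    if length < 1:
        raise ValueError("variant length must be at least 1 bp")
    return BIN_LABELS[bisect_right(BIN_EDGES, length) - 1]


def sensitivity_by_bin(truth, detected_ids):
    """Fraction of truth-set variants recovered, per size bin.

    truth: dict mapping variant ID -> length in bp
    detected_ids: set of variant IDs reported by the pipeline
    """
    totals = {label: 0 for label in BIN_LABELS}
    found = {label: 0 for label in BIN_LABELS}
    for variant_id, length in truth.items():
        label = size_bin(length)
        totals[label] += 1
        if variant_id in detected_ids:
            found[label] += 1
    # Bins with no truth-set variants are reported as None, not 0.
    return {
        label: (found[label] / totals[label] if totals[label] else None)
        for label in BIN_LABELS
    }


# Toy inputs, made up for this example only.
toy_truth = {"v1": 5, "v2": 40, "v3": 350, "v4": 2500, "v5": 15000}
toy_detected = {"v1", "v2", "v4", "v5"}
toy_result = sensitivity_by_bin(toy_truth, toy_detected)
```

With real data, the truth set would come from the reference assembly used for comparison (Sanger-sequenced, in the abstract's case) and the detected set from the high-throughput calls.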
6.  Clinical Presentation of Prostate Cancer in Black South Africans 
The Prostate  2014;74(8):880-891.
Compared with White American men, Black American men are at a significantly increased risk of presenting with prostate cancer (PCa) and of associated mortality, suggesting a link to African ancestry. However, PCa status within Africa is largely unknown. We address the clinical presentation of PCa within Black South African men.
Over 1,000 participants with or without PCa have enrolled in the Southern African Prostate Cancer Study (SAPCS). Using genome-wide profiling we establish a unique within-Africa population substructure. Adjusting for age, clinical variables were assessed, compared against Black Americans and between rural and urban localities while addressing potential socio-demographic confounders.
We report a significant difference in the distribution of prostate specific antigen (PSA) levels skewed towards higher PSA levels in the PCa cases (83.0% present with a PSA ≥ 20 µg/L; median PSA = 98.8 µg/L) relative to men with no detectable PCa (18.5% present with a PSA ≥ 20 µg/L; median PSA = 9.1 µg/L). Compared with Black Americans, Black South Africans presented with significantly more aggressive disease defined by Gleason score >7 (17% and 36%, respectively) and PSA ≥ 20 µg/L (17.2% and 83.2%, respectively). We report exacerbated disease aggression defined by Gleason score >7 (P = 0.0042) and poorly differentiated tumor grade (P < 0.0001) within rural versus urban localities.
Black South African men present with higher PSA levels and histopathological tumor grade compared with Black Americans, a difference that is further escalated in men from rural localities. Our data suggest that lack of PSA testing may be contributing to an aggressive PCa disease phenotype within South African men.
PMCID: PMC4135056  PMID: 24723425
Prostate cancer; clinical presentation; African ancestry; Southern Africa; aggressive disease
7.  Complex Patterns of Genomic Admixture within Southern Africa 
PLoS Genetics  2013;9(3):e1003309.
Within-population genetic diversity is greatest within Africa, while between-population genetic diversity is directly proportional to geographic distance. The most divergent contemporary human populations include the click-speaking forager peoples of southern Africa, broadly defined as Khoesan. Both intra- (Bantu expansion) and inter-continental migration (European-driven colonization) have resulted in complex patterns of admixture between ancient geographically isolated Khoesan and more recently diverged populations. Using gender-specific analysis and almost 1 million autosomal markers, we determine the significance of estimated ancestral contributions that have shaped five contemporary southern African populations in a cohort of 103 individuals. Limited by lack of available data for homogeneous Khoesan representation, we identify the Ju/'hoan (n = 19) as a distinct early diverging human lineage with little to no significant non-Khoesan contribution. In contrast to the Ju/'hoan, we identify ancient signatures of Khoesan and Bantu unions resulting in significant Khoesan- and Bantu-derived contributions to the Southern Bantu amaXhosa (n = 15) and Khoesan !Xun (n = 14), respectively. Our data further suggest that contemporary !Xun represent distinct Khoesan prehistories. Khoesan assimilation with European settlement at the southernmost tip of Africa resulted in significant ancestral Khoesan contributions to the Coloured (n = 25) and Baster (n = 30) populations. The latter populations were further impacted by 170 years of East Indian slave trade and intra-continental migrations resulting in a complex pattern of genetic variation (admixture). The populations of southern Africa provide a unique opportunity to investigate the genomic variability from some of the oldest human lineages to the implications of complex admixture patterns including ancient and recently diverged human lineages.
Author Summary
The Khoesan have received recent attention, as they are the most genetically diverse contemporary human populations. However, Khoesan populations are poorly defined, while archeological evidence suggests a once broader dispersal of click-speaking southern African foragers. Migrations into the regions populated by contemporary Khoesan involved agro-pastoral Bantu around 1,500 years ago, followed over a millennium later by the arrival of European colonists establishing a halfway station for a maritime route between Europe and the East, which led to unions between diverse global populations. Using almost a million genetic markers for 103 individuals, we confirmed a significant Khoesan contribution to five southern African populations. The Ju/'hoan show genetic isolation (early divergence from all other modern humans), carry no significant non-Khoesan contributions, and unlike most global populations lack signatures of gene-based adaptation to agriculture. The !Xun show two distinct Khoesan prehistories; while comparable to the female-derived Khoesan contribution to the amaXhosa Bantu, the male-derived Bantu contribution to the !Xun most likely represents culturally driven gender-biased gene-flow. Emanating largely from male-derived European ancestral contributions, the Basters showed the highest maternal Khoesan contribution, while the Coloured showed the largest within-population and regional-associated variability. The unique admixture fractions of the two latter populations reflect both early diverged and recently diverged human lineages.
PMCID: PMC3597481  PMID: 23516368
8.  A Novel SERPINA1 Mutation Causing Serum Alpha1-Antitrypsin Deficiency 
PLoS ONE  2012;7(12):e51762.
Mutations in the SERPINA1 gene can cause deficiency in the circulating serine protease inhibitor α1-Antitrypsin (α1AT). α1AT deficiency is the major contributor to pulmonary emphysema and liver disease in persons of European ancestry, with a prevalence of 1 in 2500 in the USA. We present the discovery and characterization of a novel SERPINA1 mutant from an asymptomatic Middle Eastern male with circulating α1AT deficiency. This 49 base pair deletion mutation (T379Δ), originally mistyped by IEF, causes a frame-shift replacement of the last sixteen α1AT residues and adds an extra twenty-four residues. Functional analysis showed that the mutant protein is not secreted and is prone to intracellular aggregation.
PMCID: PMC3520848  PMID: 23251618
9.  Cyclin D1 splice variants: polymorphism, risk, and isoform specific regulation in prostate cancer 
Alternative CCND1 splicing results in cyclin D1b, which has specialized, pro-tumorigenic functions in prostate not shared by the cyclin D1a (full-length) isoform. Here, the frequency, tumor relevance, and mechanisms controlling cyclin D1b were challenged.
Experimental Design
First, relative expression of both cyclin D1 isoforms was determined in prostate adenocarcinomas. Second, relevance of the androgen axis was determined. Third, minigenes were created to interrogate the role of the G/A870 polymorphism (within the splice site), and findings validated in primary tissue. Fourth, impact of G/A870 on cancer risk was assessed in two large case-control studies.
Cyclin D1b is induced in tumors, and a significant subset expressed this isoform in the absence of detectable cyclin D1a. Accordingly, the isoforms showed non-correlated expression patterns, and hormone status did not alter splicing. While G/A870 was not independently predictive of cancer risk, A870 predisposed for transcript-b production in cells and in normal prostate. The influence of A870 on overall transcript-b levels was relieved in tumors, indicating that aberrations in tumorigenesis likely alter the influence of the polymorphism.
These studies reveal that cyclin D1b is specifically elevated in prostate tumorigenesis. Cyclin D1b expression patterns are distinct from that observed with cyclin D1a. The A870 allele predisposes for transcript-b production in a context-specific manner. While A870 does not independently predict cancer risk, tumor cells can bypass the influence of the polymorphism. These findings have major implications for the analyses of D-cyclin function in the prostate, and provide the foundation for future studies directed at identifying potential modifiers of the G/A870 polymorphism.
PMCID: PMC2849314  PMID: 19706803
Cyclin D1a; Cyclin D1b; Prostate Adenocarcinoma; rs603965 Polymorphism; CCND1 Minigene
10.  Inflammatory Genetic Markers of Prostate Cancer Risk 
Cancers  2010;2(2):1198-1220.
Prostate cancer is the most common cancer in Western society males, with incidence rates predicted to rise with global aging. Etiology of prostate cancer is however poorly understood, while current diagnostic tools can be invasive (digital rectal exam or biopsy) and/or lack specificity for the disease (prostate-specific antigen (PSA) testing). Substantial histological, epidemiological and molecular genetic evidence indicates that inflammation is important in prostate cancer pathogenesis. In this review, we summarize the current status of inflammatory genetic markers influencing susceptibility to prostate cancer. The focus will be on inflammatory cytokines regulating T-helper cell and chemokine homeostasis, together with the Toll-like receptors as key players in the host innate immune system. Although association studies indicating a genetic basis for prostate cancer are presently limited mainly due to lack of replication, larger and more ethnically and clinically defined study populations may help elucidate the true contribution of inflammatory gene variants to prostate cancer risk.
PMCID: PMC3835126  PMID: 24281113
prostate cancer; inflammation; Toll like receptor (TLR); cytokine; chemokine; gene variant; inherited susceptibility
11.  Calling SNPs without a reference sequence 
BMC Bioinformatics  2010;11:130.
The most common application for the next-generation sequencing technologies is resequencing, where short reads from the genome of an individual are aligned to a reference genome sequence for the same species. These mappings can then be used to identify genetic differences among individuals in a population, and perhaps ultimately to explain phenotypic variation. Many algorithms capable of aligning short reads to the reference, and determining differences between them have been reported. Much less has been reported on how to use these technologies to determine genetic differences among individuals of a species for which a reference sequence is not available, which drastically limits the number of species that can easily benefit from these new technologies.
We describe a computational pipeline, called DIAL (De novo Identification of Alleles), for identifying single-base substitutions between two closely related genomes without the help of a reference genome. The method works even when the depth of coverage is insufficient for de novo assembly, and it can be extended to determine small insertions/deletions. We evaluate the software's effectiveness using published Roche/454 sequence data from the genome of Dr. James Watson (to detect heterozygous positions) and recent Illumina data from orangutan, in each case comparing our results to those from computational analysis that uses a reference genome assembly. We also illustrate the use of DIAL to identify nucleotide differences among transcriptome sequences.
DIAL can be used for identification of nucleotide differences in species for which no reference sequence is available. Our main motivation is to use this tool to survey the genetic diversity of endangered species as the identified sequence differences can be used to design genotyping arrays to assist in the species' management. The DIAL source code is freely available at
PMCID: PMC2851604  PMID: 20230626
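The idea behind entry 11, finding single-base differences between two closely related read sets without a reference, can be illustrated with a toy k-mer comparison. This is a deliberately simplified stand-in and not the DIAL algorithm: the function names, the choice of k, and the reads are all assumptions made for the example. It ignores reverse complements, sequencing errors, and repeats.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every k-length substring of a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def middle_bases(reads, k):
    """Map (left flank, right flank) -> set of middle bases seen in the reads."""
    mid = k // 2
    table = defaultdict(set)
    for read in reads:
        for km in kmers(read, k):
            table[(km[:mid], km[mid + 1:])].add(km[mid])
    return table


def candidate_snps(reads_a, reads_b, k=7):
    """Return (left flank, base in A, base in B, right flank) candidates.

    A candidate is a flanking context shared by both samples in which each
    sample shows exactly one middle base and the two bases differ.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that a k-mer has a middle base")
    a = middle_bases(reads_a, k)
    b = middle_bases(reads_b, k)
    hits = []
    for flanks in sorted(a.keys() & b.keys()):
        if len(a[flanks]) == 1 and len(b[flanks]) == 1 and a[flanks] != b[flanks]:
            base_a = next(iter(a[flanks]))
            base_b = next(iter(b[flanks]))
            hits.append((flanks[0], base_a, base_b, flanks[1]))
    return hits


# Toy reads, invented for this example: the two samples differ at one base.
reads_a = ["ACGTTGCAAGGCT", "TTGCAAGGCTTAC"]
reads_b = ["ACGTTGCTAGGCT", "TTGCTAGGCTTAC"]
snps = candidate_snps(reads_a, reads_b)
```

On these toy reads the only shared context with differing middle bases is `TGC[A/T]AGG`. A practical pipeline would also need coverage thresholds and error filtering, which this sketch omits.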
12.  The 4q27 locus and prostate cancer risk 
BMC Cancer  2010;10:69.
Chronic inflammation is considered to be implicated in the development of prostate cancer. In this study we are the first to investigate a potential association between variants in an autoimmune related region on chromosome 4q27 and prostate cancer risk. This region harbors two cytokine genes IL-2 and the recently described IL-21.
We genotyped six variants previously associated with autoimmune disease (namely rs13151961, rs13119723, rs17388568, rs3136534, rs6822844 and rs6840978) and one functional IL-2 promoter variant (rs2069762) for possible association with prostate cancer risk using the Australian Risk Factors for Prostate Cancer case-control Study.
Overall, our results do not support an association between the seven variants at position 4q27 and prostate cancer risk. Per allele odds ratios (ORs) were not significantly different from 1 (all P-values ≥ 0.06). However, we found suggestive evidence for a significant association between the presence of the rs13119723 variant (located in a gene encoding a protein of unknown function) and men with a family history of prostate cancer in first-degree relatives (P-value for interaction 0.02). The per allele OR associated with this variant was significantly higher than 1 (2.37; 95% C.I. = 1.01-5.57).
We suggest that genetic variation within the chromosome 4q27 locus might be associated with prostate cancer susceptibility in men with a family history of the disease. Furthermore, our study alludes to a potential role of unknown protein KIAA1109 in conferring this risk.
PMCID: PMC2841665  PMID: 20184734
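The odds ratio and confidence interval reported in entry 12 (2.37; 95% C.I. = 1.01-5.57) can be sanity-checked with a short calculation. The sketch assumes the interval is a standard Wald interval, symmetric on the log scale; the abstract does not state how it was computed, so this is an assumption. The P-value derived here is for the test of OR = 1 and is not the interaction P-value of 0.02 quoted in the abstract.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def se_from_ci(lower, upper, z_crit=Z_95):
    """Standard error of log(OR) implied by a Wald CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def wald_p_value(odds_ratio, se_log_or):
    """Two-sided P-value for H0: OR = 1 under a normal approximation."""
    z = math.log(odds_ratio) / se_log_or
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values quoted in the abstract.
or_hat, ci_low, ci_high = 2.37, 1.01, 5.57

se = se_from_ci(ci_low, ci_high)
p_value = wald_p_value(or_hat, se)

# For a log-symmetric interval, the point estimate is the geometric mean
# of the two bounds, which gives a quick consistency check.
geometric_mid = math.sqrt(ci_low * ci_high)
```

The geometric mean of the bounds comes out close to 2.37, and the implied P-value falls just under 0.05. That is consistent with a lower bound of 1.01 sitting barely above 1.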
13.  Interpretation of custom designed Illumina genotype cluster plots for targeted association studies and next-generation sequence validation 
BMC Research Notes  2010;3:39.
High-throughput custom designed genotyping arrays are a valuable resource for biologically focused research studies and increasingly for validation of variation predicted by next-generation sequencing (NGS) technologies. We investigate the Illumina GoldenGate chemistry using custom designed VeraCode and Sentrix Array Matrix (SAM) assays for each of these applications, respectively. We highlight applications for interpretation of Illumina generated genotype cluster plots to maximise data inclusion and reduce genotyping errors.
We illustrate the dramatic effect of outliers in genotype calling and data interpretation, as well as suggest simple means to avoid genotyping errors. Furthermore we present this platform as a successful method for two-cluster rare or non-autosomal variant calling. The success of high-throughput technologies to accurately call rare variants will become an essential feature for future association studies. Finally, we highlight additional advantages of the Illumina GoldenGate chemistry in generating unusually segregated cluster plots that identify potential NGS generated sequencing error resulting from minimal coverage.
We demonstrate the importance of visually inspecting genotype cluster plots generated by the Illumina software and issue warnings regarding commonly accepted quality control parameters. In addition to suggesting applications to minimise data exclusion, we propose that the Illumina cluster plots may be helpful in identifying potential input sequence errors, which is particularly important for studies to validate NGS generated variation.
PMCID: PMC2848685  PMID: 20175893
14.  Comprehensive Sequence Analysis of the Human IL23A Gene Defines New Variation Content and High Rate of Evolutionary Conservation 
A newly described heterodimeric cytokine, interleukin-23 (IL-23) is emerging as a key player in both the innate and the adaptive T helper (Th)17 driven immune response as well as an initiator of several autoimmune diseases. The rate-limiting element of IL-23 production is believed to be driven by expression of the unique p19 subunit encoded by IL23A. We set out to perform comprehensive DNA sequencing of this previously under-studied gene in 96 individuals from two evolutionary distinct human population groups, Southern African Bantu and European. We observed a total of 33 different DNA variants within these two groups, 22 (67%) of which are currently not reported in any available database. We further demonstrate both inter-population and intra-species sequence conservation within the coding and known regulatory regions of IL23A, supporting a critical physiological role for IL-23. We conclude that IL23A may have undergone positive selection pressure directed towards conservation, suggesting that functional genetic variants within IL23A will have a significant impact on the host immune response.
PMCID: PMC2853383  PMID: 20154336
IL-23; IL23A; novel variants; genetic conservation

