Until recently, Ebola virus (EBOV) was a rarely encountered human pathogen that caused disease among small populations with extraordinarily high lethality. At the end of 2013, EBOV initiated an unprecedented disease outbreak in West Africa that is still ongoing and has already caused thousands of deaths. Recent studies revealed the genomic changes this particular EBOV variant undergoes over time during human-to-human transmission. Here we highlight the genomic changes that might negatively impact the efficacy of currently available EBOV sequence-based candidate therapeutics, such as small interfering RNAs (siRNAs), phosphorodiamidate morpholino oligomers (PMOs), and antibodies. Ten of the observed mutations modify the sequence of the binding sites of monoclonal antibody (MAb) 13F6, MAb 1H3, MAb 6D8, MAb 13C6, and siRNA EK-1, VP24, and VP35 targets and might influence the binding efficacy of the sequence-based therapeutics, suggesting that their efficacy should be reevaluated against the currently circulating strain.
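The screening idea in this abstract, checking whether observed mutations fall inside the sequence targeted by a therapeutic, can be illustrated with a simple interval lookup. This is a minimal sketch and not the authors' pipeline: the `TargetSite` helper, the site names, and every coordinate below are hypothetical placeholders, and a real analysis would take target positions from the published therapeutic sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetSite:
    """Genomic interval bound by a sequence-based therapeutic (1-based, inclusive)."""
    name: str
    start: int
    end: int


def mutations_in_targets(sites, mutation_positions):
    """Map each target name to the sorted mutation positions that fall inside it."""
    hits = {}
    for site in sites:
        inside = sorted(p for p in mutation_positions if site.start <= p <= site.end)
        if inside:
            hits[site.name] = inside
    return hits


# Hypothetical coordinates for illustration only; these are not real EBOV positions.
sites = [
    TargetSite("siRNA_A", 100, 118),
    TargetSite("PMO_B", 500, 522),
    TargetSite("MAb_epitope_C", 900, 944),
]
observed = [110, 300, 901, 930]
affected = mutations_in_targets(sites, observed)
print(affected)
```

Targets that appear in `affected` would be the candidates for re-testing against the circulating variant; a hit says only that the sequence changed, not that binding is lost.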
We have developed a robust RNA sequencing method for generating complete de novo assemblies with intra-host variant calls of Lassa and Ebola virus genomes in clinical and biological samples. Our method uses targeted RNase H-based digestion to remove contaminating poly(rA) carrier and ribosomal RNA. This depletion step improves both the quality of data and quantity of informative reads in unbiased total RNA sequencing libraries. We have also developed a hybrid-selection protocol to further enrich the viral content of sequencing libraries. These protocols have enabled rapid deep sequencing of both Lassa and Ebola virus and are broadly applicable to other viral genomics studies.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0519-7) contains supplementary material, which is available to authorized users.
In 2014, Ebola virus (EBOV) was identified as the etiological agent of a large and still expanding outbreak of Ebola virus disease (EVD) in West Africa and a much more confined EVD outbreak in Middle Africa. Epidemiological and evolutionary analyses confirmed that all cases of both outbreaks are connected to a single introduction each of EBOV into human populations and that both outbreaks are not directly connected. Coding-complete genomic sequence analyses of isolates revealed that the two outbreaks were caused by two novel EBOV variants, and initial clinical observations suggest that neither of them should be considered strains. Here we present consensus decisions on naming for both variants (West Africa: “Makona”, Middle Africa: “Lomela”) and provide database-compatible full, shortened, and abbreviated names that are in line with recently established filovirus sub-species nomenclatures.
Ebola; Ebola virus; ebolavirus; filovirid; Filoviridae; filovirus; genome annotation; Lomela; Lokolia; Makona; mononegavirad; Mononegavirales; mononegavirus; virus classification; virus isolate; virus nomenclature; virus strain; virus taxonomy; virus variant
Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming, we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.
Bundibugyo virus; cDNA clone; cuevavirus; Ebola; Ebola virus; ebolavirus; filovirid; Filoviridae; filovirus; genome annotation; ICTV; International Committee on Taxonomy of Viruses; Lloviu virus; Marburg virus; marburgvirus; mononegavirad; Mononegavirales; mononegavirus; Ravn virus; RefSeq; Reston virus; reverse genetics; Sudan virus; Taï Forest virus; virus classification; virus isolate; virus nomenclature; virus strain; virus taxonomy; virus variant
Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five “standard” categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques.
Mycobacterium tuberculosis is successfully evolving antibiotic resistance, threatening attempts at tuberculosis epidemic control. Mechanisms of resistance, including the genetic changes favored by selection in resistant isolates, are incompletely understood. Using 116 newly and 7 previously sequenced M. tuberculosis genomes, we identified genomewide signatures of positive selection specific to the 47 resistant genomes. By searching for convergent evolution, the independent fixation of mutations at the same nucleotide site or gene, we recovered 100% of a set of known resistance markers. We also found evidence of positive selection in an additional 39 genomic regions in resistant isolates. These regions encode pathways of cell wall biosynthesis, transcriptional regulation and DNA repair. Mutations in these regions could directly confer resistance or compensate for fitness costs associated with resistance. Functional genetic analysis of mutations in one gene, ponA1, demonstrated an in vitro growth advantage in the presence of the drug rifampicin.
Lassa fever (LF), an often-fatal hemorrhagic disease caused by Lassa virus (LASV), is a major public health threat in West Africa. When the violent civil conflict in Sierra Leone (1991 to 2002) ended, an international consortium assisted in restoration of the LF program at Kenema Government Hospital (KGH) in an area with the world's highest incidence of the disease.
Clinical and laboratory records of patients presenting to the KGH Lassa Ward in the post-conflict period were organized electronically. Recombinant antigen-based LF immunoassays were used to assess LASV antigenemia and LASV-specific antibodies in patients who met criteria for suspected LF. KGH has been reestablished as a center for LF treatment and research, with over 500 suspected cases now presenting yearly. Higher case fatality rates (CFRs) in LF patients were observed compared to studies conducted prior to the civil conflict. Different criteria for defining LF stages and differences in sensitivity of assays likely account for these differences. The highest incidence of LF in Sierra Leone was observed during the dry season. LF cases were observed in ten of Sierra Leone's thirteen districts, with numerous cases from outside the traditional endemic zone. Deaths in patients presenting with LASV antigenemia were skewed towards individuals less than 29 years of age. Women self-reporting as pregnant were significantly overrepresented among LASV antigenemic patients. The CFR of ribavirin-treated patients presenting early in acute infection was lower than in untreated subjects.
Lassa fever remains a major public health threat in Sierra Leone. Outreach activities should expand because LF may be more widespread in Sierra Leone than previously recognized. Enhanced case finding to ensure rapid diagnosis and treatment is imperative to reduce mortality. Even with ribavirin treatment, the fatality rate was high, underscoring the need to develop more effective and/or supplemental treatments for LF.
Lassa fever (LF) is a major public health threat in West Africa. After the violent civil conflict in Sierra Leone (1991 to 2002) ended, the LF research program at Kenema Government Hospital (KGH) was reestablished. Higher CFRs in LF patients were observed compared to studies conducted prior to the civil conflict. The criteria used for defining the stages of LF and differences in sensitivity of the assays used likely account for these differences. LF may be more widespread in Sierra Leone than recognized previously. Peak presentation of LF cases occurs in the dry season, which is consistent with previous studies. Our studies also confirmed reports conducted prior to the civil conflict that indicate that infants, children, young adults, and pregnant women are disproportionately impacted by LF. High fatality rates were observed among both ribavirin-treated and untreated patients, which underscores the need for better LF treatments.
An adaptive variant of the human Ectodysplasin receptor, EDARV370A, is one of the strongest candidates of recent positive selection from genome-wide scans. We have modeled EDAR370A in mice and characterized its phenotype and evolutionary origins in humans. Our computational analysis suggests the allele arose in Central China approximately 30,000 years ago. Although EDAR370A has been associated with increased scalp hair thickness and changed tooth morphology in humans, its direct biological significance and potential adaptive role remain unclear. We generated a knock-in mouse model and find that, as in humans, hair thickness is increased in EDAR370A mice. We identify novel biological targets affected by the mutation, including mammary and eccrine glands. Building on these results, we find that EDAR370A is associated with an increased number of active eccrine glands in the Han Chinese. This interdisciplinary approach yields unique insight into the generation of adaptive variation among modern humans.
While several hundred regions of the human genome harbor signals of positive natural selection, few of the relevant adaptive traits and variants have been elucidated. Using full-genome sequence variation from the 1000 Genomes Project (1000G) and the Composite of Multiple Signals (CMS) test, we investigated 412 candidate signals and leveraged functional annotation, protein structure modeling, epigenetics, and association studies to identify and extensively annotate candidate causal variants. The resulting catalog provides a tractable list for experimental follow-up; it includes thirty-five high-scoring non-synonymous variants, fifty-nine variants associated with expression levels of a nearby coding gene or lincRNA, and numerous variants associated with susceptibility to infectious disease and other phenotypes. We experimentally characterized one candidate non-synonymous variant in TLR5, and show that it leads to altered NF-κB signaling in response to bacterial flagellin.
Malaria is a deadly disease that causes nearly one million deaths each year. To develop methods to control and eradicate malaria, it is important to understand the genetic basis of Plasmodium falciparum adaptations to antimalarial treatments and the human immune system while taking into account its demographic history. To study the demographic history and identify genes under selection more efficiently, we sequenced the complete genomes of 25 culture-adapted P. falciparum isolates from three sites in Senegal. We show that there is no significant population structure among these Senegal sampling sites. By fitting demographic models to the synonymous allele-frequency spectrum, we also estimated a major 60-fold population expansion of this parasite population ∼20,000–40,000 years ago. Using inferred demographic history as a null model for coalescent simulation, we identified candidate genes under selection, including genes identified before, such as pfcrt and PfAMA1, as well as new candidate genes. Interestingly, we also found selection against G/C to A/T changes that offsets the large mutational bias toward A/T, and two unusual patterns: similar synonymous and nonsynonymous allele-frequency spectra, and 18% of genes having a nonsynonymous-to-synonymous polymorphism ratio >1.
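The allele-frequency spectrum that the demographic models are fitted to is a histogram of how many polymorphic sites carry a given allele in exactly k of the n sampled genomes. The sketch below shows the counting step only, assuming biallelic sites encoded as `0`/`1` strings; the toy haplotypes are invented for illustration and are not the Senegal isolates, and no model fitting is attempted.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return {k: number of sites where exactly k haplotypes carry allele '1'}.

    Monomorphic sites (k == 0 or k == n) are excluded.
    """
    n = len(haplotypes)
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    counts = Counter()
    for site in range(length):
        k = sum(h[site] == "1" for h in haplotypes)
        if 0 < k < n:
            counts[k] += 1
    return dict(counts)


# Toy data: four haplotypes, five sites.
toy = ["10010", "10000", "11010", "00010"]
sfs = site_frequency_spectrum(toy)
print(sfs)
```

Computing the spectrum separately for synonymous and nonsynonymous sites would allow the kind of comparison between the two spectra that the abstract describes.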
P. falciparum; population expansion; base composition; selection
Using parasite genotyping tools, we screened patients with mild uncomplicated malaria seeking treatment at a clinic in Thiès, Senegal, from 2006 to 2011. We identified a growing frequency of infections caused by genetically identical parasite strains, coincident with increased deployment of malaria control interventions and decreased malaria deaths. Parasite genotypes in some cases persisted clonally across dry seasons. The increase in frequency of genetically identical parasite strains corresponded with a decrease in the probability of multiple infections. Further, these observations support evidence of both clonal and epidemic population structures. These data provide the first evidence of a temporal correlation between the appearance of identical parasite types and increased malaria control efforts in Africa, which here included distribution of insecticide-treated nets (ITNs), use of rapid diagnostic tests (RDTs) for malaria detection, and deployment of artemisinin combination therapy (ACT). Our results imply that genetic surveillance can be used to evaluate the effectiveness of disease control strategies and assist a rational global malaria eradication campaign.
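One way to track the rise of genetically identical parasites is to compute, per year, the fraction of samples whose genotype is shared with at least one other sample. The following is a minimal sketch under the assumption that each infection has been reduced to a short genotype string; the strings and year groupings are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def repeated_genotype_fraction(genotypes):
    """Fraction of samples whose genotype is shared with at least one other sample."""
    if not genotypes:
        return 0.0
    counts = Counter(genotypes)
    shared = sum(c for c in counts.values() if c > 1)
    return shared / len(genotypes)


# Hypothetical genotype strings grouped by collection year.
by_year = {
    2006: ["ACGT", "ACGA", "TCGT", "ACCT"],
    2011: ["ACGT", "ACGT", "ACGT", "TCGA"],
}
fractions = {year: repeated_genotype_fraction(g) for year, g in by_year.items()}
print(fractions)
```

A rising fraction over successive years would be the kind of signal the abstract reports; on its own it does not distinguish clonal persistence from epidemic expansion.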
Despite efforts to reduce malaria morbidity and mortality, drug-resistant parasites continue to evade control strategies. Recently, emphasis has shifted away from control and toward regional elimination and global eradication of malaria. Such a campaign requires tools to monitor genetic changes in the parasite that could compromise the effectiveness of antimalarial drugs and undermine eradication programs. These tools must be fast, sensitive, unambiguous, and cost-effective to offer real-time reports of parasite drug susceptibility status across the globe. We have developed and validated a set of genotyping assays using high-resolution melting (HRM) analysis to detect molecular biomarkers associated with drug resistance across six genes in Plasmodium falciparum. We improved on existing technical approaches by developing refinements and extensions of HRM, including the use of blocked probes (LunaProbes) and the mutant allele amplification bias (MAAB) technique. To validate the sensitivity and accuracy of our assays, we compared our findings to sequencing results in both culture-adapted lines and clinical isolates from Senegal. We demonstrate that our assays (i) identify both known and novel polymorphisms, (ii) detect multiple genotypes indicative of mixed infections, and (iii) distinguish between variants when multiple copies of a locus are present. These rapid and inexpensive assays can track drug resistance and detect emerging mutations in targeted genetic loci in P. falciparum. They provide tools for monitoring molecular changes associated with changes in drug response across populations and for determining whether parasites present after drug treatment are the result of recrudescence or reinfection in clinical settings.
Lassa fever is a viral hemorrhagic fever endemic in West Africa. However, none of the hospitals in the endemic areas of Nigeria has the capacity to perform Lassa virus diagnostics. Case identification and management solely relies on non-specific clinical criteria. The Irrua Specialist Teaching Hospital (ISTH) in the central senatorial district of Edo State struggled with this challenge for many years.
A laboratory for molecular diagnosis of Lassa fever, complying with basic standards of diagnostic PCR facilities, was established at ISTH in 2008. During 2009 through 2010, samples of 1,650 suspected cases were processed, of which 198 (12%) tested positive by Lassa virus RT-PCR. No remarkable demographic differences were observed between PCR-positive and negative patients. The case fatality rate for Lassa fever was 31%. Nearly two thirds of confirmed cases attended the emergency departments of ISTH. The time window for therapeutic intervention was extremely short, as 50% of the fatal cases died within 2 days of hospitalization—often before ribavirin treatment could be commenced. Fatal Lassa fever cases were older (p = 0.005), had lower body temperature (p<0.0001), and had higher creatinine (p<0.0001) and blood urea levels (p<0.0001) than survivors. Lassa fever incidence in the hospital followed a seasonal pattern with a peak between November and March. Lassa virus sequences obtained from the patients originating from Edo State formed—within lineage II—a separate clade that could be further subdivided into three clusters.
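The positivity figure quoted above follows directly from the counts (198 of 1,650). As a small worked example, the sketch below recomputes it and attaches a Wilson score interval; the interval is an illustration added here and is not a figure reported in the source.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


positive, tested = 198, 1650  # counts taken from the text
rate = positive / tested
low, high = wilson_interval(positive, tested)
print(f"{rate:.1%} positive (95% CI {low:.1%} to {high:.1%})")
```

The same function could be applied to the case fatality rate once the numbers of deaths and confirmed cases are known; the text gives only the percentage, so that calculation is not reproduced here.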
Lassa fever case management was improved at a tertiary health institution in Nigeria through establishment of a laboratory for routine diagnostics of Lassa virus. Data collected in two years of operation demonstrate that Lassa fever is a serious public health problem in Edo State and reveal new insights into the disease in hospitalized patients.
In the past, diagnostic testing for Lassa fever patients in Nigeria has been performed nearly exclusively outside of the country. Patients thus were managed on-site based on clinical suspicion alone, posing risks to patients and health care workers and exhausting resources. To tackle this problem, we established a diagnostic PCR laboratory directly at a referral hospital serving a Lassa fever endemic area in Nigeria. Long-term collaboration between partners in the North and the South was crucial to implement this project. Training of laboratory staff in the partner institutions and on-site, mobilization of local human and financial resources, good management of the laboratory, a basic quality management and control system, and a stable supply chain for consumables and reagents were among the key factors for success. The laboratory reliably delivered results in a short turnaround time, despite some problems due to PCR contamination. The service has improved patient and contact management including treatment with ribavirin and led to better protection of health care workers against hospital-acquired infections. The data provide new insights into disease progression and a basis for further optimization of case management including supportive treatment.
Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced.
Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways.
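To put the 50x figure in practical terms, expected mean depth is the total number of sequenced bases divided by genome size. The sketch below assumes a round 4 Mb genome for V. cholerae (an approximation introduced here, not a value from the text) and counts each 100 bp read individually, so a paired-end run would need half as many pairs.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target_depth, read_length, genome_size):
    """Smallest number of reads whose expected mean depth reaches target_depth."""
    return math.ceil(target_depth * genome_size / read_length)


GENOME_SIZE = 4_000_000  # assumed round figure for a roughly 4 Mb genome
READ_LENGTH = 100        # base pairs per read, as in the text

needed = reads_for_coverage(50, READ_LENGTH, GENOME_SIZE)
print(needed, mean_coverage(needed, READ_LENGTH, GENOME_SIZE))
```

This is an average only; real runs have uneven depth, so regions such as repeats may fall well below the mean even when the target is met.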
Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
Whole-genome sequencing; Vibrio cholerae; Haitian cholera epidemic; Microbial evolution
Identifying interesting relationships between pairs of variables in large datasets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to datasets in global health, gene expression, major-league baseball, and the human gut microbiota, and identify known and novel relationships.
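The core idea behind MIC is to lay a grid over the scatterplot, compute the mutual information of the resulting binned variables, normalize it, and keep the best score over many grids. The sketch below is a deliberately coarse approximation and not the published algorithm: it tries only equal-frequency grids instead of optimizing the grid boundaries, so it will typically underestimate the true MIC. The grid-size budget of n^0.6 is treated as an assumption, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def _equal_frequency_bins(values, k):
    """Assign each value to one of k bins of nearly equal occupancy, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // len(values)
    return bins


def _mutual_information(bx, by):
    """Mutual information (in nats) of two discrete label sequences."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def approx_mic(x, y, exponent=0.6):
    """Best normalized mutual information over equal-frequency grids
    with nx * ny <= n ** exponent (a simplification of the MIC search)."""
    n = len(x)
    budget = n ** exponent
    best = 0.0
    for nx in range(2, int(budget // 2) + 1):
        bx = _equal_frequency_bins(x, nx)
        for ny in range(2, int(budget // nx) + 1):
            by = _equal_frequency_bins(y, ny)
            score = _mutual_information(bx, by) / math.log(min(nx, ny))
            best = max(best, score)
    return best


random.seed(0)
xs = list(range(100))
linear_score = approx_mic(xs, [2 * v + 1 for v in xs])
noise_x = [random.random() for _ in range(200)]
noise_y = [random.random() for _ in range(200)]
noise_score = approx_mic(noise_x, noise_y)
print(linear_score, noise_score)
```

A noiseless linear relationship scores close to 1 and independent noise scores much lower, which matches the qualitative behaviour described in the abstract; for real analyses an implementation of the full grid optimization should be used.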
Rapidly evolving viruses and other pathogens can have an immense impact on human evolution as natural selection acts to increase the prevalence of genetic variants providing resistance to disease. With the emergence of large datasets of human genetic variation, we can search for signatures of natural selection in the human genome driven by such disease-causing microorganisms. Based on this approach, we have previously hypothesized that Lassa virus (LASV) may have been a driver of natural selection in West African populations where Lassa haemorrhagic fever is endemic. In this study, we provide further evidence for this notion. By applying tests for selection to genome-wide data from the International Haplotype Map Consortium and the 1000 Genomes Consortium, we demonstrate evidence for positive selection in LARGE and interleukin 21 (IL21), two genes implicated in LASV infectivity and immunity. We further localized the signals of selection, using the recently developed composite of multiple signals method, to introns and putative regulatory regions of those genes. Our results suggest that natural selection may have targeted variants giving rise to alternative splicing or differential gene expression of LARGE and IL21. Overall, our study supports the hypothesis that selective pressures imposed by LASV may have led to the emergence of particular alleles conferring resistance to Lassa fever, and opens up new avenues of research pursuit.
Lassa fever; natural selection; positive selection; genome-wide scans; LARGE; interleukin 21
Cerebral malaria, a severe form of Plasmodium falciparum infection, is an important cause of mortality in sub-Saharan African children. A TaqMan 24-single-nucleotide-polymorphism (SNP) molecular barcode assay, which estimates genotype number and identifies the predominant genotype, was developed for use in laboratory parasites.
The 24 SNP assay was used to determine predominant genotypes in blood and tissues from autopsy and clinical patients with cerebral malaria.
Single genotypes were shared between the peripheral blood, the brain, and other tissues of cerebral malaria patients, while malaria-infected patients who died of non-malarial causes had mixed genetic signatures in tissues examined. Children with retinopathy-positive cerebral malaria had significantly less complex infections than those without retinopathy (OR = 3.7, 95% CI [1.51-9.10]). The complexity of infections significantly decreased over the malaria season in retinopathy-positive patients compared to retinopathy-negative patients.
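An odds ratio with a 95% confidence interval, like the one quoted above, is commonly derived from a 2x2 table using a Wald interval on the log scale. The sketch below shows that calculation as one possible approach; the source reports only the summary statistic, so the counts used here are hypothetical and the method is not confirmed to be the one the authors used.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval for a 2x2 table [[a, b], [c, d]].

    The interval is computed on the log scale: log(OR) +/- z * SE,
    with SE = sqrt(1/a + 1/b + 1/c + 1/d).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


# Hypothetical counts: rows = retinopathy-positive / retinopathy-negative,
# columns = single-genotype / multiple-genotype infection.
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 10, 10, 20)
print(or_value, ci_low, ci_high)
```

An interval that excludes 1, as the reported one does, is what supports calling the difference significant at the 5% level.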
Cerebral malaria patients harbour a single or small set of predominant parasites; patients with incidental parasitaemia sustain infections involving diverse genotypes. Limited diversity in the peripheral blood of cerebral malaria patients and correlation with tissues supports peripheral blood samples as appropriate for genome-wide association studies of parasite determinants of pathogenicity.
Plasmodium falciparum; Cerebral malaria; Genotyping; Molecular barcode; Histopathology; Autopsy
Lassa fever (LF) is a devastating viral disease prevalent in West Africa. Efforts to take on this public health crisis have been hindered by lack of infrastructure and rapid field-deployable diagnosis in areas where the disease is prevalent. Recent capacity building at the Kenema Government Hospital Lassa Fever Ward (KGH LFW) in Sierra Leone has led to a major turning point in the diagnosis, treatment and study of LF. Herein we present the first comprehensive rapid diagnosis and real-time characterization of an acute hemorrhagic LF case at KGH LFW. This case report focuses on a third-trimester pregnant Sierra Leonean woman from the historically non-endemic Northern district of Tonkolili who survived the illness despite fetal demise.
Employed in this study were newly developed recombinant LASV Antigen Rapid Test cassettes and dipstick lateral flow immunoassays (LFI) that enabled the diagnosis of LF within twenty minutes of sample collection. Deregulation of overall homeostasis, significant hepatic and renal system involvement, and immunity profiles were extensively characterized during the course of hospitalization. Rapid diagnosis, prompt treatment with a full course of intravenous (IV) ribavirin, IV fluids management, and real-time monitoring of clinical parameters resulted in a positive maternal outcome despite admission to the LFW seven days post onset of symptoms, fetal demise, and a natural stillbirth delivery. These studies solidify the growing rapid diagnostic, treatment, and surveillance capabilities at the KGH LF Laboratory, and the potential to significantly improve the current high mortality rate caused by LF. As a result of the growing capacity, we were also able to isolate Lassa virus (LASV) RNA from the patient and perform Sanger sequencing, which revealed significant genetic divergence from commonly circulating Sierra Leonean strains, showing potential for the discovery of a newly emerged LASV strain with expanded geographic distribution. Furthermore, recent emergence of LF cases in Northern Sierra Leone highlights the need for superior diagnostics to aid in the monitoring of LASV strain divergence with potentially increased geographic expansion.
Mounting evidence suggests a major role for epigenetic feedback in Plasmodium falciparum transcriptional regulation. Long non-coding RNAs (lncRNAs) have recently emerged as a new paradigm in epigenetic remodeling. We therefore set out to investigate putative roles for lncRNAs in P. falciparum transcriptional regulation.
We used a high-resolution DNA tiling microarray to survey transcriptional activity across 22.6% of the P. falciparum strain 3D7 genome. We identified 872 protein-coding genes and 60 putative P. falciparum lncRNAs under developmental regulation during the parasite's pathogenic human blood stage. Further characterization of lncRNA candidates led to the discovery of an intriguing family of lncRNA telomere-associated repetitive element transcripts, termed lncRNA-TARE. We have quantified lncRNA-TARE expression at 15 distinct chromosome ends and mapped putative transcriptional start and termination sites of lncRNA-TARE loci. Remarkably, we observed coordinated and stage-specific expression of lncRNA-TARE on all chromosome ends tested, and two dominant transcripts of approximately 1.5 kb and 3.1 kb transcribed towards the telomere.
We have characterized a family of 22 telomere-associated lncRNAs in P. falciparum. Homologous lncRNA-TARE loci are coordinately expressed after parasite DNA replication, and are poised to play an important role in P. falciparum telomere maintenance, virulence gene regulation, and potentially other processes of parasite chromosome end biology. Further study of lncRNA-TARE and other promising lncRNA candidates may provide mechanistic insight into P. falciparum transcriptional regulation.
The Plasmodium falciparum parasite's ability to adapt to environmental pressures, such as the human immune system and antimalarial drugs, makes malaria an enduring burden to public health. Understanding the genetic basis of these adaptations is critical to intervening successfully against malaria. To that end, we created a high-density genotyping array that assays over 17,000 single nucleotide polymorphisms (∼1 SNP/kb), and applied it to 57 culture-adapted parasites from three continents. We characterized genome-wide genetic diversity within and between populations and identified numerous loci with signals of natural selection, suggesting their role in recent adaptation. In addition, we performed a genome-wide association study (GWAS), searching for loci correlated with resistance to thirteen antimalarials; we detected both known and novel resistance loci, including a new halofantrine resistance locus, PF10_0355. Through functional testing we demonstrated that PF10_0355 overexpression decreases sensitivity to halofantrine, mefloquine, and lumefantrine, but not to structurally unrelated antimalarials, and that increased gene copy number mediates resistance. Our GWAS and follow-on functional validation demonstrate the potential of genome-wide studies to elucidate functionally important loci in the malaria parasite.
Malaria infection with the human pathogen Plasmodium falciparum results in almost a million deaths each year, mostly in African children. Efforts to eliminate malaria are underway, but the parasite is adept at eluding both the human immune response and antimalarial treatments. Thus, it is important to understand how the parasite becomes resistant to drugs and to develop strategies to overcome resistance mechanisms. Toward this end, we used population genetic strategies to identify genetic loci that contribute to parasite adaptation and to identify candidate genes involved in drug resistance. We examined over 17,000 genetic variants across the parasite genome in over 50 strains in which we also measured responses to many known antimalarial compounds. We found a number of genetic loci showing signs of recent natural selection and a number of loci potentially involved in modulating the parasite's response to drugs. We further demonstrated that one of the novel candidate genes (PF10_0355) modulates resistance to the antimalarial compounds halofantrine, mefloquine, and lumefantrine. Overall, this study confirms that we can use genome-wide approaches to identify clinically relevant genes and demonstrates through functional testing that at least one of these candidate genes is indeed involved in antimalarial drug resistance.
In human cells, DNA double-strand breaks are repaired primarily by the non-homologous end joining (NHEJ) pathway. Given their critical nature, we expected NHEJ proteins to be evolutionarily conserved, with relatively little sequence change over time. Here, we report that while critical domains of these proteins are conserved as expected, the sequence of NHEJ proteins has also been shaped by recurrent positive selection, leading to rapid sequence evolution in other protein domains. In order to characterize the molecular evolution of the human NHEJ pathway, we generated large simian primate sequence datasets for NHEJ genes. Codon-based models of gene evolution yielded statistical support for the recurrent positive selection of five NHEJ genes during primate evolution: XRCC4, NBS1, Artemis, POLλ, and CtIP. Analysis of human polymorphism data using the composite of multiple signals (CMS) test revealed that XRCC4 has also been subjected to positive selection in modern humans. Crystal structures are available for XRCC4, Nbs1, and Polλ; and residues under positive selection fall exclusively on the surfaces of these proteins. Despite the positive selection of such residues, biochemical experiments with variants of one positively selected site in Nbs1 confirm that functions necessary for DNA repair and checkpoint signaling have been conserved. However, many viruses interact with the proteins of the NHEJ pathway as part of their infectious lifecycle. We propose that an ongoing evolutionary arms race between viruses and NHEJ genes may be driving the surprisingly rapid evolution of these critical genes.
Because all cells experience DNA damage, they must also have mechanisms for repairing DNA. When the proteins that repair DNA malfunction, mutation and disease often result. Based on their fundamental importance, DNA repair proteins would be expected to be well preserved over evolutionary time in order to ensure optimal DNA repair function. However, a previous genome-wide study of molecular evolution in Saccharomyces yeast identified the non-homologous end joining (NHEJ) DNA repair pathway as one of the two most rapidly evolving pathways in the yeast genome. In order to analyze the evolution of this pathway in humans, we have generated large evolutionary sequence sets of NHEJ genes from our primate relatives. Similar to the scenario in yeast, several genes in this pathway are evolving rapidly in primate genomes and in modern human populations. Thus, complex and seemingly opposite selective forces are shaping the evolution of these important DNA repair genes. The finding that NHEJ genes are rapidly evolving in species groups as diverse as yeasts and primates indicates a systematic perturbation of the NHEJ pathway, one that is potentially important to human health.
With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2) [1]. We used ‘long-range haplotype’ methods, which were developed to identify alleles segregating in a population that have undergone recent selection [2], and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus [3], in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation [4,5], in Europe; and EDAR and EDA2R, both involved in development of hair follicles [6], in Asia.
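As an illustration of the idea behind long-range haplotype methods, the sketch below computes extended haplotype homozygosity (EHH), a statistic commonly used in this family of tests. It is a minimal sketch and not the pipeline used in the study: the function name `ehh`, the string encoding of haplotypes, and the toy data are all assumptions made for the example. EHH is taken here as the probability that two randomly chosen chromosomes carrying the core allele are identical over the interval from the core site to a given marker.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity for carriers of a core allele.

    haplotypes  : list of equal-length strings, one character per marker
                  (hypothetical encoding chosen for this sketch)
    core_index  : position of the core marker
    core_allele : allele at the core marker that defines the carriers
    end_index   : marker up to which identity is required (inclusive)
    """
    lo, hi = sorted((core_index, end_index))
    # Keep only chromosomes carrying the core allele, trimmed to the interval.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    # Count identical pairs within each distinct extended haplotype,
    # then divide by the total number of pairs.
    groups = Counter(carriers)
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Toy data: six chromosomes, five markers; the core marker is at index 0.
toy_haplotypes = ["11010", "11010", "11011", "11110", "00101", "01100"]

for end in range(1, 5):
    print(end, ehh(toy_haplotypes, 0, "1", end))
```

In this toy example EHH decays as the interval grows, because recombination and mutation break up the shared haplotype. An allele that has risen in frequency quickly under selection is expected to show unusually slow decay relative to other alleles of similar frequency.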
The prevalence of CD36 deficiency in East Asian and African populations suggests that the causal variants are under selection by severe malaria. Previous analysis of data from the International HapMap Project indicated that a CD36 haplotype bearing a nonsense mutation (T1264G; rs3211938) had undergone recent positive selection in the Yoruba of Nigeria. To investigate the global distribution of this putative selection event, we genotyped T1264G in 3420 individuals from 66 populations. We confirmed the high frequency of 1264G in the Yoruba (26%). However, the 1264G allele is less common in other African populations and absent from all non-African populations without recent African admixture. Using long-range linkage disequilibrium, we studied two West African groups in depth. Evidence for recent positive selection at the locus was demonstrable in the Yoruba, although not in Gambians. We screened 70 variants from across CD36 for an association with severe malaria phenotypes, employing a case–control study of 1350 subjects and a family study of 1288 parent–offspring trios. No marker was significantly associated with severe malaria. We focused on T1264G, genotyping 10 922 samples from four African populations. The nonsense allele was not associated with severe malaria (pooled allelic odds ratio 1.0; 95% confidence interval 0.89–1.12; P = 0.98). These results suggest a range of possible explanations including the existence of alternative selection pressures on CD36, co-evolution between host and parasite or confounding caused by allelic heterogeneity of CD36 deficiency.
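To make the reported effect measure concrete, the sketch below computes an allelic odds ratio and a Wald-type 95% confidence interval from a single 2×2 table of allele counts. It is a minimal sketch under stated assumptions: the function name `allelic_odds_ratio` and the counts are hypothetical, and the pooling across populations used in the study is not reproduced here.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Odds ratio and Wald confidence interval from allele counts.

    case_alt, case_ref : counts of the test and reference allele in cases
    ctrl_alt, ctrl_ref : counts of the test and reference allele in controls
    z                  : normal quantile (1.96 gives an approximate 95% interval)
    """
    counts = (case_alt, case_ref, ctrl_alt, ctrl_ref)
    if min(counts) <= 0:
        raise ValueError("all four allele counts must be positive")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of the log odds ratio (Woolf's formula).
    se_log_or = sqrt(sum(1.0 / c for c in counts))
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not data from the study.
or_, lower, upper = allelic_odds_ratio(100, 900, 100, 900)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that spans 1, as in the result reported above, is consistent with no association between the allele and the phenotype.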