1.  Genomic architecture of sickle cell disease in West African children 
Sickle cell disease (SCD) is a congenital blood disease, affecting predominantly children from sub-Saharan Africa, but also populations world-wide. Although the causal mutation of SCD is known, the sources of clinical variability of SCD remain poorly understood, with only a few highly heritable traits associated with SCD having been identified. Phenotypic heterogeneity in the clinical expression of SCD is problematic for follow-up (FU), management, and treatment of patients. Here we used the joint analysis of gene expression and whole genome genotyping data to identify the genetic regulatory effects contributing to gene expression variation among groups of patients exhibiting clinical variability, as well as unaffected siblings, in Benin, West Africa. We characterized and replicated patterns of whole blood gene expression variation within and between SCD patients at entry to clinic, as well as in follow-up programs. We present a global map of genes involved in the disease through analysis of whole blood sampled from the cohort. Genome-wide association mapping of gene expression revealed 390 peak genome-wide significant expression SNPs (eSNPs) and 6 significant eSNP-by-clinical status interaction effects. The strong modulation of the transcriptome implicates pathways affecting core circulating cell functions and shows how genotypic regulatory variation likely contributes to the clinical variation observed in SCD.
PMCID: PMC3924578  PMID: 24592274
sickle cell disease; genomics; transcriptome; eSNP mapping; gene-by-environment interactions
2.  Whole-Exome Sequencing Reveals a Rapid Change in the Frequency of Rare Functional Variants in a Founding Population of Humans 
PLoS Genetics  2013;9(9):e1003815.
Whole-exome or gene targeted resequencing in hundreds to thousands of individuals has shown that the majority of genetic variants are at low frequency in human populations. Rare variants are enriched for functional mutations and are expected to explain an important fraction of the genetic etiology of human disease, therefore having a potential medical interest. In this work, we analyze the whole-exome sequences of French-Canadian individuals, a founder population with a unique demographic history that includes an original population bottleneck less than 20 generations ago, followed by a demographic explosion, and the whole exomes of French individuals sampled from France. We show that in less than 20 generations of genetic isolation from the French population, the genetic pool of French-Canadians shows reduced levels of diversity, higher homozygosity, and an excess of rare variants with low variant sharing with Europeans. Furthermore, the French-Canadian population contains a larger proportion of putatively damaging functional variants, which could partially explain the increased incidence of genetic disease in the province. Our results highlight the impact of population demography on genetic fitness and the contribution of rare variants to the human genetic variation landscape, emphasizing the need for deep cataloguing of genetic variants by resequencing worldwide human populations in order to truly assess disease risk.
Author Summary
Recent resequencing of the whole genome or the coding part of the genome (the exome) in thousands of individuals has described a large excess of low frequency variants in humans, probably arising as a consequence of recent rapid growth in human population sizes. Most rare variants are private to specific populations and are enriched for functional mutations, thus potentially having some medical relevance. In this study, we analyze whole-exome sequences from over a hundred individuals from the French-Canadian population, which was founded less than 400 years ago by about 8,500 French settlers who colonized the province between the 17th and 18th centuries. We show that in a remarkably short period of time this population has accumulated substantial differences, including an excess of rare, functional and potentially damaging variants, when compared to the original European population. Our results show the effects of population history on genetic variation that may have an impact on genetic fitness and disease, and have implications in the design of genetic studies, highlighting the importance of extending deep resequencing to worldwide human populations.
PMCID: PMC3784517  PMID: 24086152
3.  A Family-Based Probabilistic Method for Capturing De Novo Mutations from High-Throughput Short-Read Sequencing Data 
Statistical applications in genetics and molecular biology  2012;11(2):10.2202/1544-6115.1713.
Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
PMCID: PMC3728889  PMID: 22499693
de novo mutations; pedigree; short-read data; mutation rates; trio model
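The abstract above notes that a de novo mutation shows up as a violation of Mendelian inheritance within the pedigree. As a minimal sketch of that idea only, the check below works on hard genotype calls for a single trio; it is not the probabilistic method the paper describes (which produces posterior probabilities and models sequencing error), and the function names and genotype data are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's unordered genotype can be formed by
    taking one allele from the mother and one from the father."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


def candidate_de_novo_sites(trio_calls):
    """trio_calls maps a site label to (child, mother, father) genotypes.
    Returns the sites whose hard calls violate Mendelian inheritance."""
    return [
        site
        for site, (child, mother, father) in trio_calls.items()
        if not is_mendelian_consistent(child, mother, father)
    ]


# Hypothetical genotype calls, invented for illustration only.
calls = {
    "site_1": (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    "site_2": (("A", "T"), ("A", "A"), ("A", "A")),  # T absent from both parents
    "site_3": (("C", "C"), ("C", "T"), ("C", "T")),  # consistent
}
print(candidate_de_novo_sites(calls))
```

A hard-call check like this flags every genotyping error as a candidate, which is one reason a probabilistic treatment of the read data is attractive.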
4.  Selective constraint, background selection, and mutation accumulation variability within and between human populations 
BMC Genomics  2013;14:495.
Regions of the genome that are under evolutionary constraint across multiple species have previously been used to identify functional sequences in the human genome. Furthermore, it is known that there is an inverse relationship between evolutionary constraint and the allele frequency of a mutation segregating in human populations, implying a direct relationship between interspecies divergence and fitness in humans. Here we utilise this relationship to test differences in the accumulation of putatively deleterious mutations both between populations and on the individual level.
Using whole genome and exome sequencing data from Phase 1 of the 1000 Genomes Project for 1,092 individuals from 14 worldwide populations we show that minor allele frequency (MAF) varies as a function of constraint around both coding regions and non-coding sites genome-wide, implying that negative, rather than positive, selection primarily drives the distribution of alleles among individuals via background selection. We find a strong relationship between effective population size and the depth of depression in MAF around the most conserved genes, suggesting that populations with smaller effective size are carrying more deleterious mutations, which also translates into higher genetic load when considering the number of putatively deleterious alleles segregating within each population. Finally, given the extreme richness of the data, we are now able to classify individual genomes by the accumulation of mutations at functional sites using high coverage 1000 Genomes data. Using this approach we detect differences between ‘healthy’ individuals within populations for the distributions of putatively deleterious rare alleles they are carrying.
These findings demonstrate the extent of background selection in the human genome and highlight the role of population history in shaping patterns of diversity between human individuals. Furthermore, we provide a framework for the utility of personal genomic data for the study of genetic fitness and diseases.
PMCID: PMC3727949  PMID: 23875710
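The abstract above relates minor allele frequency (MAF) to evolutionary constraint. A minimal sketch of the underlying bookkeeping follows: it computes MAF per site from diploid alternate-allele counts and averages it within constraint bins. The bin labels, function names, and genotype data are assumptions made for illustration; the paper's actual constraint scores and binning are not given in the abstract.

```python
from collections import defaultdict


def minor_allele_frequency(alt_counts):
    """alt_counts: alternate-allele count (0, 1 or 2) per diploid individual;
    None marks a missing call and is ignored."""
    called = [g for g in alt_counts if g is not None]
    if not called:
        raise ValueError("no called genotypes at this site")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def mean_maf_by_bin(sites):
    """sites: iterable of (bin_label, alt_counts) pairs.
    Returns a dict mapping each bin label to the mean MAF of its sites."""
    per_bin = defaultdict(list)
    for label, alt_counts in sites:
        per_bin[label].append(minor_allele_frequency(alt_counts))
    return {label: sum(vals) / len(vals) for label, vals in per_bin.items()}


# Hypothetical sites, invented for illustration only.
sites = [
    ("high_constraint", [0, 0, 0, 1]),
    ("high_constraint", [0, 0, 0, 0]),
    ("low_constraint", [1, 1, 0, 2]),
    ("low_constraint", [2, 2, 1, 2]),
]
print(mean_maf_by_bin(sites))
```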
5.  Exploiting Gene Expression Variation to Capture Gene-Environment Interactions for Disease 
Frontiers in Genetics  2013;3:228.
Gene-environment interactions have long been recognized as a fundamental concept in evolutionary, quantitative, and medical genetics. In the genomics era, study of how environment and genome interact to shape gene expression variation is relevant to understanding the genetic architecture of complex phenotypes. While genetic analysis of gene expression variation focused on main effects, little is known about the extent of interaction effects implicating regulatory variants and their consequences on transcriptional variation. Here we survey the current state of the concept of transcriptional gene-environment interactions and discuss its utility for mapping disease phenotypes in light of the insights gained from genome-wide association studies of gene expression.
PMCID: PMC3668192  PMID: 23755064
eQTL; eSNP; gene-environment interactions; transcriptome
6.  Hypervariable antigen genes in malaria have ancient roots 
The var genes of the human malaria parasite Plasmodium falciparum are highly polymorphic loci coding for the erythrocyte membrane proteins 1 (PfEMP1), which are responsible for the cytoadherence of P. falciparum infected red blood cells to the human vasculature. Cytoadhesion, coupled with differential expression of var genes, contributes to virulence and allows the parasite to establish chronic infections by evading detection from the host’s immune system. Although studying genetic diversity is a major focus of recent work on the var genes, little is known about the gene family's origin and evolutionary history.
Using a novel hidden Markov model-based approach and var sequences assembled from additional isolates and species, we are able to reveal elements of both the early evolution of the var genes as well as recent diversifying events. We compare sequences of the var gene DBLα domains from divergent isolates of P. falciparum (3D7 and HB3), and a closely-related species, Plasmodium reichenowi. We find that the gene family is equally large in P. reichenowi and P. falciparum -- with a minimum of 51 var genes in the P. reichenowi genome (compared to 61 in 3D7 and a minimum of 48 in HB3). In addition, we are able to define large, continuous blocks of homologous sequence among P. falciparum and P. reichenowi var gene DBLα domains. These results reveal that the contemporary structure of the var gene family was present before the divergence of P. falciparum and P. reichenowi, estimated to be between 2.5 and 6 million years ago. We also reveal that recombination has played an important and traceable role in both the establishment, and the maintenance, of diversity in the sequences.
Despite the remarkable diversity and rapid evolution found in these loci within and among P. falciparum populations, the basic structure of these domains and the gene family is surprisingly old and stable. Revealing a common structure as well as conserved sequence between two species also has implications for developing new primate-parasite models for studying the pathology and immunology of falciparum malaria, and for studying the population genetics of var genes and associated virulence phenotypes.
PMCID: PMC3680017  PMID: 23725540
Non-allelic homologous recombination; Hidden Markov-model; var genes; Malaria; PfEMP1; Gene family evolution; Balancing selection
8.  Harnessing genomics to identify environmental determinants of heritable disease 
Mutation research  2012;752(1):6-9.
Next-generation sequencing technologies can now be used to directly measure heritable de novo DNA sequence mutations in humans. However, these techniques have not been used to examine environmental factors that induce such mutations and their associated diseases. To address this issue, a working group on environmentally induced germline mutation analysis (ENIGMA) met in October 2011 to propose the necessary foundational studies, which include sequencing of parent–offspring trios from highly exposed human populations, and controlled dose–response experiments in animals. These studies will establish background levels of variability in germline mutation rates and identify environmental agents that influence these rates and heritable disease. Guidance for the types of exposures to examine comes from rodent studies that have identified agents such as cancer chemotherapeutic drugs, ionizing radiation, cigarette smoke, and air pollution as germ-cell mutagens. Research is urgently needed to establish the health consequences of parental exposures on subsequent generations.
PMCID: PMC3556182  PMID: 22935230
Germ cell; Heritable mutation; Next generation sequencing; Copy number variants
9.  Variation in genome-wide mutation rates within and between human families 
Nature genetics  2011;43(7):712-714.
J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female 1. Diverse studies have supported Haldane’s contention of a higher average mutation rate in the male germline in a variety of mammals, including humans (e.g. 2,3). Here we present the first direct comparative analysis of male and female germline mutation rates from complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell-lines from which DNA was derived. Most strikingly, in one family we observed that 92% of germline DNMs were from the paternal germline, while, in complete contrast, in the other family 64% of DNMs were from the maternal germline. These observations reveal considerable variation in mutation rates within and between families.
PMCID: PMC3322360  PMID: 21666693
10.  Low-Complexity Regions in Plasmodium falciparum: Missing Links in the Evolution of an Extreme Genome 
Molecular Biology and Evolution  2010;27(9):2198-2209.
Over the past decade, attempts to explain the unusual size and prevalence of low-complexity regions (LCRs) in the proteins of the human malaria parasite Plasmodium falciparum have used both neutral and adaptive models. This past research has offered conflicting explanations for LCR characteristics and their role in, and influence on, the evolution of genome structure. Here we show that P. falciparum LCRs (PfLCRs) are not a single phenomenon, but rather consist of at least three distinct types of sequence, and this heterogeneity is the source of the conflict in the literature. Using molecular and population genetics, we show that these families of PfLCRs are evolving by different mechanisms. One of these families, named here the HighGC family, is of particular interest because these LCRs act as recombination hotspots, both in genes under positive selection for high levels of diversity which can be created by recombination (antigens) and those likely to be evolving neutrally or under negative selection (metabolic enzymes). We discuss how the discovery of these distinct species of PfLCRs helps to resolve previous contradictory studies on LCRs in malaria and contributes to our understanding of the evolution of the parasite's unusual genome.
PMCID: PMC2922621  PMID: 20427419
Plasmodium falciparum; low-complexity regions; repeat sequences; genome evolution; recombination
11.  Age-Dependent Recombination Rates in Human Pedigrees 
PLoS Genetics  2011;7(9):e1002251.
In humans, chromosome-number abnormalities have been associated with altered recombination and increased maternal age. Therefore, age-related effects on recombination are of major importance, especially in relation to the mechanisms involved in human trisomies. Here, we examine the relationship between maternal age and recombination rate in humans. We localized crossovers at high resolution by using over 600,000 markers genotyped in a panel of 69 French-Canadian pedigrees, revealing recombination events in 195 maternal meioses. Overall, we observed the general patterns of variation in fine-scale recombination rates previously reported in humans. However, we make the first observation of a significant decrease in recombination rates with advancing maternal age in humans, likely driven by chromosome-specific effects. The effect appears to be localized in the middle section of chromosomal arms and near subtelomeric regions. We postulate that, for some chromosomes, protection against non-disjunction provided by recombination becomes less efficient with advancing maternal age, which can be partly responsible for the higher rates of aneuploidy in older women. We propose a model that reconciles our findings with reported associations between maternal age and recombination in cases of trisomies.
Author Summary
Aging is a genetically and environmentally modulated process. One particular manifestation of aging in humans is the age-related changes that affect the female reproductive system. It is well established that chromosome-number abnormalities in offspring occur more frequently as maternal age advances, but the meiotic mechanisms involved remain unclear. Meiotic recombination has been associated with maternal age in different species but contrasting effects of maternal age on recombination rates have been reported among mammals. In this study, we found a decrease of recombination rates with increasing maternal age in a French-Canadian cohort, with the most pronounced decline possibly occurring before 32 years of age. We observed chromosome-specific age effects, and in older women recombination frequencies are notably reduced in the middle portion of chromosomal arms and near subtelomeric regions. No paternal age effect on recombination was found, highlighting differences in patterns of variation among sexes. Many studies have shown significant inter-individual variation in genome-wide recombination rates, and our results point to an additional, intra-individual source of variation in recombination rates among transmissions from the same mother.
PMCID: PMC3164683  PMID: 21912527
12.  Genetic adaptation of the antibacterial human innate immunity network 
Pathogens have represented an important selective force during the adaptation of modern human populations to changing social and other environmental conditions. The evolution of the immune system has therefore been influenced by these pressures. Genomic scans have revealed that the immune system is one of the functions enriched with genes under adaptive selection.
Here, we describe how the innate immune system has responded to these challenges, through the analysis of resequencing data for 132 innate immunity genes in two human populations. Results are interpreted in the context of the functional and interaction networks defined by these genes. Nucleotide diversity is lower in the adaptors and modulators functional classes, and is negatively correlated with the centrality of the proteins within the interaction network. We also produced a list of candidate genes under positive or balancing selection in each population detected by neutrality tests and showed that some functional classes are preferential targets for selection.
We found evidence that the role of each gene in the network conditions the capacity to evolve or their evolvability: genes at the core of the network are more constrained, while adaptation mostly occurred at particular positions at the network edges. Interestingly, the functional classes containing most of the genes with signatures of balancing selection are involved in autoinflammatory and autoimmune diseases, suggesting a counterbalance between the beneficial and deleterious effects of the immune response.
PMCID: PMC3155920  PMID: 21745391
13.  High recombination rates and hotspots in a Plasmodium falciparum genetic cross 
Genome Biology  2011;12(4):R33.
The human malaria parasite Plasmodium falciparum survives pressures from the host immune system and antimalarial drugs by modifying its genome. Genetic recombination and nucleotide substitution are the two major mechanisms that the parasite employs to generate genome diversity. A better understanding of these mechanisms may provide important information for studying parasite evolution, immune evasion and drug resistance.
Here, we used a high-density tiling array to estimate the genetic recombination rate among 32 progeny of a P. falciparum genetic cross (7G8 × GB4). We detected 638 recombination events and constructed a high-resolution genetic map. Comparing genetic and physical maps, we obtained an overall recombination rate of 9.6 kb per centimorgan and identified 54 candidate recombination hotspots. Similar to centromeres in other organisms, the sequences of P. falciparum centromeres are found in chromosome regions largely devoid of recombination activity. Motifs enriched in hotspots were also identified, including a 12-bp G/C-rich motif with 3-bp periodicity that may interact with a protein containing 11 predicted zinc finger arrays.
These results show that the P. falciparum genome has a high recombination rate, although it also follows the overall rule of meiosis in eukaryotes with an average of approximately one crossover per chromosome per meiosis. GC-rich repetitive motifs identified in the hotspot sequences may play a role in the high recombination rate observed. The lack of recombination activity in centromeric regions is consistent with the observations of reduced recombination near the centromeres of other organisms.
PMCID: PMC3218859  PMID: 21463505
14.  Similarity in Recombination Rate Estimates Highly Correlates with Genetic Differentiation in Humans 
PLoS ONE  2011;6(3):e17913.
Recombination varies greatly among species, as illustrated by the poor conservation of the recombination landscape between humans and chimpanzees. Thus, shorter evolutionary time frames are needed to understand the evolution of recombination. Here, we analyze its recent evolution in humans. We calculated the recombination rates between adjacent pairs of 636,933 common single-nucleotide polymorphism loci in 28 worldwide human populations and analyzed them in relation to genetic distances between populations. We found a strong and highly significant correlation between similarity in the recombination rates corrected for effective population size and genetic differentiation between populations. This correlation is observed at the genome-wide level, but also for each chromosome and when genetic distances and recombination similarities are calculated independently from different parts of the genome. Moreover, and more relevant, this relationship is robustly maintained when considering presence/absence of recombination hotspots. Simulations show that this correlation cannot be explained by biases in the inference of recombination rates caused by haplotype sharing among similar populations. This result indicates a rapid pace of evolution of recombination, within the time span of differentiation of modern humans.
PMCID: PMC3065460  PMID: 21464928
15.  A Population Genetic Approach to Mapping Neurological Disorder Genes Using Deep Resequencing 
PLoS Genetics  2011;7(2):e1001318.
Deep resequencing of functional regions in human genomes is key to identifying potentially causal rare variants for complex disorders. Here, we present the results from a large-sample resequencing (n = 285 patients) study of candidate genes coupled with population genetics and statistical methods to identify rare variants associated with Autism Spectrum Disorder and Schizophrenia. Three genes, MAP1A, GRIN2B, and CACNA1F, were consistently identified by different methods as having significant excess of rare missense mutations in either one or both disease cohorts. In a broader context, we also found that the overall site frequency spectrum of variation in these cases is best explained by population models of both selection and complex demography rather than neutral models or models accounting for complex demography alone. Mutations in the three disease-associated genes explained much of the difference in the overall site frequency spectrum among the cases versus controls. This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies. Additionally, our findings support the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.
Author Summary
It is widely accepted that genetic factors play important roles in the etiology of neurological diseases. However, the nature of the underlying genetic variation remains unclear. Critical questions in the field of human genetics relate to the frequency and size effects of genetic variants associated with disease. For instance, the common disease–common variant model is based on the idea that sets of common variants explain a significant fraction of the variance found in common disease phenotypes. On the other hand, rare variants may have strong effects and therefore largely contribute to disease phenotypes. Due to their high penetrance and reduced fitness, such variants are maintained in the population at low frequencies, thus limiting their detection in genome-wide association studies. Here, we use a resequencing approach on a cohort of 285 Autism Spectrum Disorder and Schizophrenia patients and perform several analyses, enhanced with population genetic approaches, to identify variants associated with both diseases. Our results demonstrate an excess of rare variants in these disease cohorts and identify genes with negative (deleterious) selection coefficients, suggesting an accumulation of variants of detrimental effects. Our results present further evidence for rare variants explaining a component of the genetic etiology of autism and schizophrenia.
PMCID: PMC3044677  PMID: 21383861
16.  Plasmodium falciparum Genetic Diversity Maintained and Amplified Over 5 Years of a Low Transmission Endemic in the Peruvian Amazon 
Molecular Biology and Evolution  2010;28(7):1973-1986.
Plasmodium falciparum entered into the Peruvian Amazon in 1994, sparking an epidemic between 1995 and 1998. Since 2000, there has been sustained low P. falciparum transmission. The Malaria Immunology and Genetics in the Amazon project has longitudinally followed members of the community of Zungarococha (N = 1,945, 4 villages) with active household and health center-based visits each year since 2003. We examined parasite population structure and traced the parasite genetic diversity temporally and spatially. We genotyped infections over 5 years (2003–2007) using 14 microsatellite (MS) markers scattered across ten different chromosomes. Despite low transmission, there was considerable genetic diversity, which we compared with other geographic regions. We detected 182 different haplotypes from 302 parasites in 217 infections. Structure v2.2 identified five clusters (subpopulations) of phylogenetically related clones. To consider genetic diversity on a more detailed level, we defined haplotype families (hapfams) by grouping haplotypes with three or fewer loci differences. We identified 34 different hapfams. The Fst statistic and heterozygosity analysis showed the five clusters were maintained in each village throughout this time. A minimum spanning network (MSN), stratified by the year of detection, showed that haplotypes within hapfams had allele differences and haplotypes within a cluster definition were more separated in the later years (2006–2007). We modeled hapfam detection and loss, accounting for sample size and stochastic fluctuations in frequencies over time. Principal component analysis of genetic variation revealed patterns of genetic structure with time rather than village. The population structure, genetic diversity, and appearance/disappearance of the different haplotypes from 2003 to 2007 provide a genome-wide “real-time” perspective of P. falciparum parasites in a low transmission region.
PMCID: PMC3112368  PMID: 21109587
malaria; genetic diversity; immunity; low transmission; Peru; microsatellite
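The abstract above defines haplotype families (hapfams) by grouping haplotypes that differ at no more than three loci. One way such a grouping could be sketched is single-linkage clustering on the number of differing loci, shown below. The abstract does not say whether the grouping is transitive, so single linkage is an assumption here, as are the function names and the 14-locus toy haplotypes.

```python
def locus_differences(hap_a, hap_b):
    """Count loci at which two equal-length haplotypes carry different alleles."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be typed at the same loci")
    return sum(a != b for a, b in zip(hap_a, hap_b))


def group_hapfams(haplotypes, max_diff=3):
    """Single-linkage grouping (an assumption, not the paper's stated rule):
    haplotypes connected by a chain of pairs that each differ at no more
    than max_diff loci are placed in the same family.
    Returns a sorted list of families, each a list of haplotype indices."""
    parent = list(range(len(haplotypes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(haplotypes)):
        for j in range(i + 1, len(haplotypes)):
            if locus_differences(haplotypes[i], haplotypes[j]) <= max_diff:
                parent[find(j)] = find(i)

    families = {}
    for i in range(len(haplotypes)):
        families.setdefault(find(i), []).append(i)
    return sorted(families.values())


# Hypothetical 14-marker haplotypes (allele codes invented for illustration).
haps = [
    tuple([1] * 14),            # index 0
    tuple([2, 2] + [1] * 12),   # index 1: 2 differences from index 0
    tuple([2] * 5 + [1] * 9),   # index 2: 5 from index 0, 3 from index 1
    tuple([9] * 14),            # index 3: differs everywhere
]
print(group_hapfams(haps))
```

Under single linkage, indices 0 and 2 end up in one family through index 1 even though they differ at five loci; a stricter rule, such as requiring every pair within a family to satisfy the threshold, would split them.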
17.  Plasmodium falciparum genome-wide scans for positive selection, recombination hot spots and resistance to antimalarial drugs 
Nature genetics  2010;42(3):268-271.
Antimalarial drugs impose strong pressure on Plasmodium falciparum parasites and leave signatures of selection in the parasite genome 1,2. Search for signals of selection may lead to genes encoding drug or immune targets 3. The lack of high-throughput genotyping methods, inadequate knowledge of parasite population history, and time-consuming adaptations of parasites to in vitro culture have hampered genome-wide association studies (GWAS) of parasite traits. Here we report genotyping of DNA from 189 culture-adapted P. falciparum parasites using a custom-built array with thousands of single nucleotide polymorphisms (SNPs). Population structure, variation in recombination rate, and loci under recent positive selection were detected. Parasite half-maximal inhibitory concentrations (IC50) to seven antimalarial drugs were obtained and used in GWAS to identify genes associated with drug responses. The SNP array and genome-wide parameters provide valuable tools and information for new advances in P. falciparum genetics.
PMCID: PMC2828519  PMID: 20101240
malaria; single nucleotide polymorphism (SNP); genome-wide association study; recombination; drug resistance; population structure
18.  Selection shapes malaria genomes and drives divergence between pathogens infecting hominids versus rodents 
Malaria kills more people worldwide than all inherited human genetic disorders combined. To characterize how the parasites causing this disease adapt to different host environments, we compared the evolutionary genomics of two distinct groups of malaria pathogens in order to identify critical properties associated with infection of different hosts: those parasites infecting hominids (Plasmodium falciparum and P. reichenowi) versus parasites infecting rodent hosts (P. yoelii yoelii, P. berghei, and P. chabaudi). Adaptation by the parasite to its host is likely highly critical to the evolution of these species.
Our comparative analysis suggests that patterns of molecular evolution in the hominid parasite lineage are generally similar to those of the rodent lineage but distinct in several aspects. The most rapidly evolving genes in both lineages are those involved in host-parasite interactions as well as those that show the lowest expression levels. However, we found that, similar to their respective mammal host lineages, parasite genomes infecting hominids are generally less constrained, evolving at faster rates, and accumulating more deleterious mutations than those infecting murids, which may reflect a historically lower effective size of the hominid lineage and relaxed host-driven selective pressures.
Our study highlights for the first time the differences in trends and rates of evolution in Plasmodium lineages infecting different hosts and emphasizes the potential importance of the variation in effective size between lineages to explain variation in selective constraints among genomes.
PMCID: PMC2529309  PMID: 18667061
19.  A murine specific expansion of the Rhox cluster involved in embryonic stem cell biology is under natural selection 
BMC Genomics  2006;7:212.
The rodent specific reproductive homeobox (Rhox) gene cluster on the X chromosome has been reported to contain twelve homeobox-containing genes, Rhox1-12.
We have identified a 40 kb genomic region within the Rhox cluster that is duplicated eight times in tandem, resulting in the presence of eight paralogues of Rhox2 and Rhox3 and seven paralogues of Rhox4. Transcripts have been identified for the majority of these paralogues and all but three are predicted to produce full-length proteins with functional potential. We predict that there are a total of thirty-two Rhox genes at this genomic location, making it the most gene-rich homeobox cluster identified in any species. From the 95% sequence similarity between the eight duplicated genomic regions and the synonymous substitution rate of the Rhox2, 3 and 4 paralogues, we predict that the duplications occurred after divergence of mouse and rat and represent the youngest homeobox cluster identified to date. Molecular evolutionary analysis reveals that this cluster is an actively evolving region with Rhox2 and 4 paralogues under diversifying selection and Rhox3 evolving neutrally. The biological importance of this duplication is emphasised by the identification of an important role for Rhox2 and Rhox4 in regulating the initial stages of embryonic stem (ES) cell differentiation.
The gene-rich Rhox cluster provides the mouse with significant biological novelty that we predict could provide a substrate for speciation. Moreover, this unique cluster may explain species differences in ES cell derivation and maintenance between mouse, rat and human.
PMCID: PMC1562416  PMID: 16916441
20.  Functional Divergence Caused by Ancient Positive Selection of a Drosophila Hybrid Incompatibility Locus 
PLoS Biology  2004;2(6):e142.
Interspecific hybrid lethality and sterility are a consequence of divergent evolution between species and serve to maintain the discrete identities of species. The evolution of hybrid incompatibilities has been described in widely accepted models by Dobzhansky and Muller where lineage-specific functional divergence is the essential characteristic of hybrid incompatibility genes. Experimentally tractable models are required to identify and test candidate hybrid incompatibility genes. Several Drosophila melanogaster genes involved in hybrid incompatibility have been identified but none has yet been shown to have functionally diverged in accordance with the Dobzhansky-Muller model. By introducing transgenic copies of the X-linked Hybrid male rescue (Hmr) gene into D. melanogaster from its sibling species D. simulans and D. mauritiana, we demonstrate that Hmr has functionally diverged to cause F1 hybrid incompatibility between these species. Consistent with the Dobzhansky-Muller model, we find that Hmr has diverged extensively in the D. melanogaster lineage, but we also find extensive divergence in the sibling-species lineage. Together, these findings implicate over 13% of the amino acids encoded by Hmr as candidates for causing hybrid incompatibility. The exceptional level of divergence at Hmr cannot be explained by neutral processes because we use phylogenetic methods and population genetic analyses to show that the elevated amino-acid divergence in both lineages is due to positive selection in the distant past—at least one million generations ago. Our findings suggest that multiple substitutions driven by natural selection may be a general phenomenon required to generate hybrid incompatibility alleles.
Transgenic experiments show that the Hmr gene has functionally diverged in Drosophila melanogaster and its sibling species and causes the death of hybrid offspring in interspecific crosses.
PMCID: PMC423131  PMID: 15208709
21.  Recombination Hotspots and Population Structure in Plasmodium falciparum 
PLoS Biology  2005;3(10):e335.
Understanding the influences of population structure, selection, and recombination on polymorphism and linkage disequilibrium (LD) is integral to mapping genes contributing to drug resistance or virulence in Plasmodium falciparum. The parasite's short generation time, coupled with a high cross-over rate, can cause rapid LD breakdown. However, observations of low genetic variation have led to suggestions of effective clonality: selfing, population admixture, and selection may preserve LD in populations. Indeed, extensive LD surrounding drug-resistance genes has been observed, indicating that recombination and selection play important roles in shaping recent parasite genome evolution. These studies, however, provide only limited information about haplotype variation at local scales. Here we describe the first (to our knowledge) chromosome-wide SNP haplotype and population recombination maps for a global collection of malaria parasites, including the 3D7 isolate, whose genome has been sequenced previously. The parasites are clustered according to continental origin, but alternative groupings were obtained using SNPs at 37 putative transporter genes that are potentially under selection. Geographic isolation and highly variable multiple infection rates are the major factors affecting haplotype structure. Variation in effective recombination rates is high, both among populations and along the chromosome, with recombination hotspots conserved among populations at chromosome ends. This study supports the feasibility of genome-wide association studies in some parasite populations.
PMCID: PMC1201364  PMID: 16144426
22.  Exome sequencing identifies mutations in the gene TTC7A in French-Canadian cases with hereditary multiple intestinal atresia 
Journal of Medical Genetics  2013;50(5):324-329.
Congenital multiple intestinal atresia (MIA) is a severe, fatal neonatal disorder involving obstructions in the small and large intestines that ultimately lead to organ failure. Surgical interventions are palliative but do not provide long-term survival. Severe immunodeficiency may be associated with the phenotype. A genetic basis for MIA is likely. We had previously ascertained a cohort of patients of French-Canadian origin, most of whom died as infants or in utero. The goal of the study was to identify the molecular basis for the disease in the patients of this cohort.
We performed whole exome sequencing on samples from five patients of four families. Validation of mutations and familial segregation was performed using standard Sanger sequencing in these and three additional families with deceased cases. Exon skipping was assessed by reverse transcription-PCR and Sanger sequencing.
Five patients from four different families were each homozygous for a four-base intronic deletion in the gene TTC7A, immediately adjacent to a consensus GT splice donor site. The deletion was demonstrated to have deleterious effects on splicing, causing the skipping of the attendant upstream coding exon and thereby leading to a predicted severe protein truncation. Parents were heterozygous carriers of the deletion in these families and in two additional families segregating affected cases. In a seventh family, an affected case was compound heterozygous for the same 4 bp deletion and a missense mutation, p.L823P, also predicted to be pathogenic. No other sequenced genes possessed deleterious variants explanatory for all patients in the cohort. Neither mutation was seen in a large set of control chromosomes.
Based on our genetic results, TTC7A is the likely causal gene for MIA.
PMCID: PMC3625823  PMID: 23423984
Gastroenterology; Genetics; Developmental; Molecular genetics
