Of 7028 disorders with suspected Mendelian inheritance, 1139 are recessive and have an established molecular basis. Although individually uncommon, Mendelian diseases collectively account for ~20% of infant mortality and ~10% of pediatric hospitalizations. Preconception screening, together with genetic counseling of carriers, has resulted in remarkable declines in the incidence of several severe recessive diseases including Tay-Sachs disease and cystic fibrosis. However, extension of preconception screening to most severe disease genes has hitherto been impractical. Here, we report a preconception carrier screen for 448 severe recessive childhood diseases. Rather than costly, complete sequencing of the human genome, 7717 regions from 437 target genes were enriched by hybrid capture or microdroplet polymerase chain reaction, sequenced by next-generation sequencing (NGS) to a depth of up to 2.7 gigabases, and assessed with stringent bioinformatic filters. At a resultant 160× average target coverage, 93% of nucleotides had at least 20× coverage, and mutation detection/genotyping had ~95% sensitivity and ~100% specificity for substitution, insertion/deletion, splicing, and gross deletion mutations and single-nucleotide polymorphisms. In 104 unrelated DNA samples, the average genomic carrier burden for severe pediatric recessive mutations was 2.8 and ranged from 0 to 7. The distribution of mutations among sequenced samples appeared random. Twenty-seven percent of mutations cited in the literature were found to be common polymorphisms or misannotated, underscoring the need for better mutation databases as part of a comprehensive carrier testing strategy. Given the magnitude of carrier burden and the lower cost of testing compared to treating these conditions, carrier screening by NGS made available to the general population may be an economical way to reduce the incidence of and ameliorate suffering associated with severe recessive childhood disorders.
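The coverage figures quoted above (average target coverage and the fraction of nucleotides at or above 20×) are simple summaries of per-base read depth. The following is a minimal sketch of that arithmetic, not the authors' pipeline; the function name `coverage_summary` and the toy depth values are assumptions made for illustration only.

```python
def coverage_summary(depths, min_depth=20):
    """Return (mean depth, fraction of bases with depth >= min_depth).

    `depths` is a list of per-base read depths over the target regions.
    The 20x default mirrors the threshold quoted in the abstract.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    fraction_covered = sum(d >= min_depth for d in depths) / len(depths)
    return mean_depth, fraction_covered


# Toy per-base depths (hypothetical values, not data from the study).
toy_depths = [0, 10, 20, 30, 40]
mean_depth, fraction_covered = coverage_summary(toy_depths)
print(mean_depth, fraction_covered)
```

In practice the depths would come from an alignment-derived depth table rather than a hand-written list.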
Genome-wide association study (GWAS) has revolutionized the search for the genetic basis of complex traits. To date, GWAS have generally relied on relatively sparse sampling of nucleotide diversity, which is likely to bias results by preferentially sampling high-frequency SNPs not in complete linkage disequilibrium (LD) with causative SNPs. To avoid these limitations we conducted GWAS with >6 million SNPs identified by sequencing the genomes of 226 accessions of the model legume Medicago truncatula. We used these data to identify candidate genes and the genetic architecture underlying phenotypic variation in plant height, trichome density, flowering time, and nodulation. The characteristics of candidate SNPs differed among traits, with candidates for flowering time and trichome density in distinct clusters of high LD and the minor allele frequencies (MAF) of candidates underlying variation in flowering time and height significantly greater than MAF of candidates underlying variation in other traits. Candidate SNPs tagged several characterized genes including nodulation related genes SERK2, MtnodGRP3, MtMMPL1, NFP, CaML3, MtnodGRP3A and flowering time gene MtFD as well as uncharacterized genes that become candidates for further molecular characterization. By comparing sequence-based candidates to candidates identified by in silico 250K SNP arrays, we provide an empirical example of how reliance on even high-density reduced representation genomic markers can bias GWAS results. Depending on the trait, only 30–70% of the top 20 in silico array candidates were within 1 kb of sequence-based candidates. Moreover, the sequence-based candidates tagged by array candidates were heavily biased towards common variants; these comparisons underscore the need for caution when interpreting results from GWAS conducted with sparsely covered genomes.
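The comparison of array candidates with sequence-based candidates reduces to asking what share of one set lies within 1 kb of any member of the other. The sketch below shows one way to compute that share; it is an illustration under stated assumptions, not the authors' method. The function name `fraction_near`, the `(chromosome, position)` tuple layout and the toy coordinates are all hypothetical.

```python
def fraction_near(array_hits, sequence_hits, window=1000):
    """Fraction of array candidates within `window` bp of any sequence-based
    candidate on the same chromosome.

    Both inputs are lists of (chromosome, position) tuples.
    """
    if not array_hits:
        raise ValueError("array_hits must not be empty")
    near = 0
    for chrom, pos in array_hits:
        # A hit counts once, however many sequence candidates lie nearby.
        if any(c == chrom and abs(pos - p) <= window for c, p in sequence_hits):
            near += 1
    return near / len(array_hits)


# Toy coordinates (hypothetical, not taken from the study).
array_hits = [("chr1", 5000), ("chr1", 90000), ("chr2", 1500), ("chr3", 100)]
sequence_hits = [("chr1", 5600), ("chr2", 3000), ("chr4", 100)]
print(fraction_near(array_hits, sequence_hits))
```

The quadratic scan is adequate for a top-20 list; for genome-wide candidate sets a sorted index per chromosome would be the more sensible choice.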
The draft genome sequence of a non-O1 Vibrio cholerae strain, PS15, organized into 3,512 open reading frames within a 3.9-Mb genome, was determined. The PS15 genome sequence will allow for the study of the evolution of virulence and environmental adaptation in V. cholerae.
Tomato is a model and economically important crop plant with little information available about gene expression in roots. Currently, there have only been a few studies that examine hormonal responses in tomato roots and none at a genome-wide level. This study examined the transcriptome atlas of tomato root regions (root tip, lateral roots, and whole roots) and the transcriptional regulation of each root region in response to the plant hormones cytokinin and auxin using Illumina RNA sequencing. More than 165 million 1×54 base pair reads were mapped onto the Solanum lycopersicum reference genome and differential expression patterns in each root region in response to each hormone were assessed. Many novel cytokinin- and auxin-induced and -repressed genes were identified as significantly differentially expressed and the expression levels of several were confirmed by qPCR. A number of these regulated genes represent tomato orthologues of cytokinin- or auxin-regulated genes identified in other species, including CKXs, type-A RRs, Aux/IAAs, and ARFs. Additionally, the data confirm some of the hormone regulation studies for recently examined genes in tomato such as SlIAAs and SlGH3s. Moreover, genes expressed abundantly in each root region were identified which provide a spatial distribution of many classes of genes, including plant defence, secondary metabolite production, and general metabolism across the root. Overall this study presents the first global expression patterns of hormone-regulated transcripts in tomato roots, which will be functionally relevant for future studies directed towards tomato root growth and development.
Auxin; cytokinin; lateral root; RNA sequencing; root; root tip; tomato.
Tomato is one of the most economically and agriculturally important Solanaceous species and vegetable crops, serving as a model for examination of fruit biology and compound leaf development. Cytokinin is a plant hormone linked to the control of leaf development and is known to regulate a wide range of genes including many transcription factors. Currently there is little known of the leaf transcriptome in tomato and how it might be regulated by cytokinin. We employ high throughput mRNA sequencing technology and bioinformatic methodologies to robustly analyze cytokinin regulated tomato leaf transcriptomes. Leaf samples of two ages, 13d and 35d were treated with cytokinin or the solvent vehicle control dimethyl sulfoxide (DMSO) for 2 h or 24 h, after which RNA was extracted for sequencing. To confirm the accuracy of RNA sequencing results, we performed qPCR analysis of select transcripts identified as cytokinin regulated by the RNA sequencing approach. The resulting data provide the first hormone transcriptome analysis of leaves in tomato. Specifically we identified several previously untested tomato orthologs of cytokinin-related genes as well as numerous novel cytokinin-regulated transcripts in tomato leaves. Principal component analysis of the data indicates that length of cytokinin treatment and plant age are the major factors responsible for changes in transcripts observed in this study. Two hour cytokinin treatment showed a more robust transcript response indicated by both greater fold change of induced transcripts and the induction of twice as many cytokinin-related genes involved in signaling, metabolism, and transport in young vs. older leaves. This difference in transcriptome response in younger vs. older leaves was also found to a lesser extent with an extended (24 h) cytokinin treatment. 
Overall, the data presented here provide a solid foundation for future study of cytokinin and cytokinin-regulated genes involved in compound leaf development or other developmental processes in tomato.
The oomycete vegetable pathogen Phytophthora capsici has shown remarkable adaptation to fungicides and new hosts. Like other members of this destructive genus, P. capsici has an explosive epidemiology, rapidly producing massive numbers of asexual spores on infected hosts. In addition, P. capsici can remain dormant for years as sexually-recombined oospores, making it difficult to produce crops at infested sites, and allowing outcrossing populations to maintain significant genetic variation. Genome sequencing, development of a high-density genetic map, and integrative genomic/genetic characterization of P. capsici field isolates and intercross progeny revealed significant mitotic loss of heterozygosity (LOH) and higher levels of SNVs than those reported for humans, plants, and P. infestans. LOH was detected in clonally propagated field isolates and sexual progeny, cumulatively affecting >30% of the genome. LOH altered genotypes for more than 11,000 single nucleotide variant (SNV) sites and showed a strong association with changes in mating type and pathogenicity. Overall, it appears that LOH may provide a rapid mechanism for fixing alleles and may be an important component of adaptability for P. capsici.
Tissue-specific transcription factors are thought to cooperate with signaling pathways to promote patterned tissue specification, in part by co-regulating transcription. The Drosophila melanogaster Pax6 homolog Eyeless forms a complex, incompletely understood regulatory network with the Hedgehog, Decapentaplegic and Notch signaling pathways to control eye-specific gene expression. We report a combinatorial approach, including mRNAseq and microarray analyses, to identify targets co-regulated by Eyeless and Hedgehog, Decapentaplegic or Notch. Multiple analyses suggest that the transcriptomes resulting from co-misexpression of Eyeless+signaling factors provide a more complete picture of eye development compared to previous efforts involving Eyeless alone: (1) Principal components analysis and two-way hierarchical clustering revealed that the Eyeless+signaling factor transcriptomes are closer to the eye control transcriptome than when Eyeless is misexpressed alone; (2) more genes are upregulated at least three-fold in response to Eyeless+signaling factors compared to Eyeless alone; (3) based on gene ontology analysis, the genes upregulated in response to Eyeless+signaling factors had a greater diversity of functions compared to Eyeless alone. Through a secondary screen that utilized RNA interference, we show that the predicted gene CG4721 has a role in eye development. CG4721 encodes a neprilysin family metalloprotease that is highly up-regulated in response to Eyeless+Notch, confirming the validity of our approach. Given the similarity between D. melanogaster and vertebrate eye development, the large number of novel genes identified as potential targets of Ey+signaling factors will provide novel insights to our understanding of eye development in D. melanogaster and humans.
The symbiosis between rhizobial bacteria and legume plants has served as a model for investigating the genetics of nitrogen fixation and the evolution of facultative mutualism. We used deep sequence coverage (>100×) to characterize genomic diversity at the nucleotide level among 12 Sinorhizobium medicae and 32 S. meliloti strains. Although these species are closely related and share host plants, based on the ratio of shared polymorphisms to fixed differences we found that horizontal gene transfer (HGT) between these species was confined almost exclusively to plasmid genes. Three multi-genic regions that show the strongest evidence of HGT harbor genes directly involved in establishing or maintaining the mutualism with host plants. In both species, nucleotide diversity is 1.5–2.5 times greater on the plasmids than chromosomes. Interestingly, nucleotide diversity in S. meliloti but not S. medicae is highly structured along the chromosome – with mean diversity (θπ) on one half of the chromosome five times greater than mean diversity on the other half. Based on the ratio of plasmid to chromosome diversity, this appears to be due to severely reduced diversity on the chromosome half with less diversity, which is consistent with extensive hitchhiking along with a selective sweep. Frequency-spectrum based tests identified 82 genes with a signature of adaptive evolution in one species or another but none of the genes were identified in both species. Based upon available functional information, several genes identified as targets of selection are likely to alter the symbiosis with the host plant, making them attractive targets for further functional characterization.
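Nucleotide diversity (θπ), the statistic compared between plasmids and chromosomes above, is the average number of pairwise differences per site among sampled sequences. A minimal sketch of that calculation follows, assuming gap-free aligned sequences of equal length; the function name `nucleotide_diversity` and the toy alignment are illustrative assumptions, not the authors' code.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site across aligned sequences.

    Assumes all sequences have the same length and contain no gaps or
    missing data (a simplification made for this sketch).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


# Toy alignment (hypothetical): pairwise differences are 1, 2 and 1.
toy_alignment = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(toy_alignment))
```

Applying the same function to windows along a replicon is one way to look for the kind of regional contrast in diversity described for the S. meliloti chromosome.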
Facultative mutualisms are relationships between two species that can live independently, but derive benefits when living together with their mutualistic partners. The facultative mutualism between rhizobial bacteria and legume plants contributes approximately half of all biologically fixed nitrogen, an essential plant nutrient, and is an important source of nitrogen to both natural and agricultural ecosystems. We resequenced the genomes of 44 strains of two closely related species of the genus Sinorhizobium that form facultative mutualisms with the model legume Medicago truncatula. These data provide one of the most complete examinations of genomic diversity segregating within microbial species that are not causative agents of human illness. Our analyses reveal that horizontal gene transfer, a common source of new genes in microbial species, disproportionately affects genes with direct roles in the rhizobia-plant symbiosis. Analyses of nucleotide diversity segregating within each species suggest that strong selection, along with genetic hitchhiking, has sharply reduced diversity along an entire chromosome half in S. meliloti. Despite the two species' ecological similarity, we did not find evidence for selection acting on the same genetic targets. In addition to providing insight into the evolutionary history of rhizobia, this study shows the feasibility and potential power of applying population genomic analyses to microbial species.
Homologous recombination, together with selection, laid the foundation for traditional plant breeding. The recombination process that takes place during meiotic cell division is crucial for the creation of novel variations of highly desired traits by breeders. Gaining control over this process is important for molecular breeding to achieve more precise, large-scale and quicker plant improvement. As conventional ubiquitous promoters are neither tissue-specific nor efficient in driving gene expression in meiocytes, promoters with high meiotic activities are potential candidates for manipulating the recombination process. So far, only a few meiotically-active promoters have been reported. Recently developed techniques to profile the transcriptome landscape of isolated meiocytes provided the means to discover promoters from genes that are actively expressed in meiosis.
In a screen for meiotically-active promoters, we examined ten promoter sequences that are associated with novel meiotic candidate genes. Each promoter was tested by expressing a GFP reporter gene in Arabidopsis. Characterization of regulatory regions revealed that these meiotically-active promoters possessed conserved motifs and motif arrangement. Some of the promoters unite optimal properties which are invaluable for meiosis-directed studies such as delivering specific gene expression in early meiosis I and/or meiosis II. Furthermore, the examination of homologs of the corresponding genes within green plants points to a great potential of applying the information from Arabidopsis to other species, especially crop plants.
We identified ten novel meiotically-active promoters, which, along with their homologs, are prime candidates to specifically drive gene expression during meiosis in plants and can thus provide important tools for meiosis study and crop breeding.
Meiosis; Homologous recombination; Promoter; GFP; cis-regulatory elements; Plant molecular breeding
Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation 1. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Mya). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species 2. Medicago truncatula (Mt) is a long-established model for the study of legume biology. Here we describe the draft sequence of the Mt euchromatin based on a recently completed BAC-assembly supplemented with Illumina-shotgun sequence, together capturing ~94% of all Mt genes. A whole-genome duplication (WGD) approximately 58 Mya played a major role in shaping the Mt genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the Mt genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max (Gm) and Lotus japonicus (Lj). Mt is a close relative of alfalfa (M. sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the Mt genome sequence provides significant opportunities to expand alfalfa’s genomic toolbox.
Meiosis is a critical process in the reproduction and life cycle of flowering plants in which homologous chromosomes pair, synapse, recombine and segregate. Understanding meiosis will not only advance our knowledge of the mechanisms of genetic recombination, but also has substantial applications in crop improvement. Despite the tremendous progress in the past decade in other model organisms (e.g., Saccharomyces cerevisiae and Drosophila melanogaster), the global identification of meiotic genes in flowering plants has remained a challenge due to the lack of efficient methods to collect pure meiocytes for analyzing the temporal and spatial gene expression patterns during meiosis, and for the sensitive identification and quantitation of novel genes.
A high-throughput approach to identify meiosis-specific genes by combining isolated meiocytes, RNA-Seq, bioinformatic and statistical analysis pipelines was developed. By analyzing the studied genes that have a meiosis function, a pipeline for identifying meiosis-specific genes has been defined. More than 1,000 genes that are specifically or preferentially expressed in meiocytes have been identified as candidate meiosis-specific genes. A group of 55 genes that have mitochondrial genome origins and a significant number of transposable element (TE) genes (1,036) were also found to have up-regulated expression levels in meiocytes.
These findings advance our understanding of meiotic genes, gene expression and regulation, especially the transcript profiles of MGI genes and TE genes, and provide a framework for functional analysis of genes in meiosis.
Monozygotic (MZ) or “identical” twins have been widely studied to dissect the relative contributions of genetics and environment in human diseases. In multiple sclerosis (MS), an autoimmune demyelinating disease and common cause of neurodegeneration and disability in young adults, disease discordance in MZ twins has been interpreted to indicate environmental importance in its pathogenesis1–8. However, genetic and epigenetic differences between MZ twins have been described, challenging the accepted experimental paradigm in disambiguating effects of nature and nurture.9–12 Here, we report the genome sequences of one MS-discordant MZ twin pair and messenger RNA (mRNA) transcriptome and epigenome sequences of CD4+ lymphocytes from three MS-discordant, MZ twin pairs. No reproducible differences were detected between co-twins among ~3.6 million single nucleotide polymorphisms (SNPs) or ~0.2 million insertion-deletion polymorphisms (indels). Nor were any reproducible differences observed between siblings of the three twin pairs in HLA haplotypes, confirmed MS-susceptibility SNPs, copy number variations, mRNA and genomic SNP and indel genotypes, or expression of ~19,000 genes in CD4+ T cells. Only two to 176 differences in methylation of ~2 million CpG dinucleotides were detected between siblings of the three twin pairs, in contrast to ~800 methylation differences between T cells of unrelated individuals and several thousand differences between tissues or normal and cancerous tissues. In the first systematic effort to estimate sequence variation among MZ co-twins, we did not find evidence for genetic, epigenetic or transcriptome differences that explained disease discordance. These are the first female, twin and autoimmune disease individual genome sequences reported.
Mosquitoes rely on RNA interference (RNAi) as their primary defense against viral infections. To this end, the combination of RNAi and invertebrate cell culture systems has become an invaluable tool in studying virus-vector interactions. Nevertheless, a recent study failed to detect an active RNAi response to West Nile virus (WNV) infection in C6/36 (Aedes albopictus) cells, a mosquito cell line frequently used to study arthropod-borne viruses (arboviruses). Therefore, we sought to determine if WNV actively evades the host's RNAi response or if C6/36 cells have a dysfunctional RNAi pathway. C6/36 and Drosophila melanogaster S2 cells were infected with WNV (Flaviviridae), Sindbis virus (SINV, Togaviridae) and La Crosse virus (LACV, Bunyaviridae) and total RNA recovered from cell lysates. Small RNA (sRNA) libraries were constructed and subjected to high-throughput sequencing. In S2 cells, virus-derived small interfering RNAs (viRNAs) from all three viruses were predominantly 21 nt in length, a hallmark of the RNAi pathway. However, in C6/36 cells, viRNAs were primarily 17 nt in length from WNV infected cells and 26–27 nt in length in SINV and LACV infected cells. Furthermore, the origin (positive or negative viral strand) and distribution (position along viral genome) of S2 cell generated viRNA populations was consistent with previously published studies, but the profile of sRNAs isolated from C6/36 cells was altered. In total, these results suggest that C6/36 cells lack a functional antiviral RNAi response. These findings are analogous to the type-I interferon deficiency described in Vero (African green monkey kidney) cells and suggest that C6/36 cells may fail to accurately model mosquito-arbovirus interactions at the molecular level.
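The key readout above is the length distribution of virus-derived small RNAs, with a peak at 21 nt taken as a hallmark of the RNAi pathway. The sketch below shows how such a size profile could be tabulated from a list of read sequences. It is an illustration only; the function names `length_profile` and `modal_length` and the toy reads are assumptions, not part of the published analysis.

```python
from collections import Counter


def length_profile(reads):
    """Map each read length to its share of all reads, ordered by length."""
    if not reads:
        raise ValueError("reads must not be empty")
    counts = Counter(len(r) for r in reads)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def modal_length(reads):
    """Return the most frequent read length."""
    if not reads:
        raise ValueError("reads must not be empty")
    counts = Counter(len(r) for r in reads)
    return max(counts, key=counts.get)


# Toy reads (hypothetical): six of 21 nt, two of 22 nt, two of 17 nt.
toy_reads = ["A" * 21] * 6 + ["A" * 22] * 2 + ["A" * 17] * 2
print(length_profile(toy_reads))
print(modal_length(toy_reads))
```

A real analysis would first restrict the reads to those mapping to the viral genome, which this sketch does not attempt.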
Cell culture systems are invaluable tools for studying virus-host interactions. These systems are typically easy to maintain and manipulate; however, they can fail to accurately mimic the host environment encountered by viruses. Therefore, defining the limitations of each system is critical to properly interpreting the results. C6/36 Aedes albopictus cells are commonly used to study arthropod-borne viruses (arboviruses), such as West Nile virus (WNV). Recent evidence suggests that the RNA interference (RNAi) pathway, a critical aspect of the cellular innate antiviral immune response in invertebrates, may not actively target WNV in C6/36 cells. However, it is unknown whether this observation is limited to WNV. Therefore, we examined small RNA populations from C6/36 and Drosophila melanogaster S2 cells infected with WNV, Sindbis virus and La Crosse virus by high-throughput sequencing. We demonstrate that the RNAi pathway actively targets each of the three viruses in S2 cells, but does not in C6/36 cells. These findings suggest that C6/36 cells may fail to accurately model mosquito-arbovirus interactions.
Recent advances in sequencing technologies have initiated an era of personal genome sequences. To date, human genome sequences have been reported for individuals with ancestry in three distinct geographical regions: a Yoruba African, two individuals of north-west European origin, and a person from China1–4. Here we provide a highly annotated, whole-genome sequence for a Korean individual, known as AK1. The genome of AK1 was determined by an exacting, combined approach that included whole-genome shotgun sequencing (27.8× coverage), targeted bacterial artificial chromosome sequencing, and high-resolution comparative genomic hybridization using custom microarrays featuring more than 24 million probes. Alignment to the NCBI reference, a composite of several ethnic clades5,6, disclosed nearly 3.45 million single nucleotide polymorphisms (SNPs), including 10,162 non-synonymous SNPs, and 170,202 deletion or insertion polymorphisms (indels). SNP and indel densities were strongly correlated genome-wide. Applying very conservative criteria yielded highly reliable copy number variants for clinical considerations. Potential medical phenotypes were annotated for non-synonymous SNPs, coding domain indels, and structural variants. The integration of several human whole-genome sequences derived from several ethnic groups will assist in understanding genetic ancestry, migration patterns and population bottlenecks.
Although genetic studies have been critically important for the identification of therapeutic targets in Mendelian disorders, genetic approaches aiming to identify targets for common, complex diseases have traditionally had much more limited success. However, during the past year, a novel genetic approach — genome-wide association (GWA) — has demonstrated its potential to identify common genetic variants associated with complex diseases such as diabetes, inflammatory bowel disease and cancer. Here, we highlight some of these recent successes, and discuss the potential for GWA studies to identify novel therapeutic targets and genetic biomarkers that will be useful for drug discovery, patient selection and stratification in common diseases.
High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystems’ SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.
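Filtering variant calls on coverage, sequence quality and allele frequency, as described above, can be pictured as a conjunction of threshold tests. The following sketch is a generic illustration of that idea and is not the Alpheus interface or its defaults; the `VariantCall` class, the `passes_filters` function, the field names and every threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """A hypothetical, simplified variant call record."""
    position: int
    coverage: int        # total reads overlapping the site
    variant_reads: int   # reads supporting the variant allele
    mean_quality: float  # mean base quality of variant-supporting reads

    @property
    def allele_fraction(self):
        # Guard against division by zero at uncovered sites.
        return self.variant_reads / self.coverage if self.coverage else 0.0


def passes_filters(call, min_coverage=10, min_quality=20.0,
                   min_allele_fraction=0.2):
    """Keep a call only if it clears all three illustrative thresholds."""
    return (
        call.coverage >= min_coverage
        and call.mean_quality >= min_quality
        and call.allele_fraction >= min_allele_fraction
    )


# Toy calls (hypothetical): only the first clears every threshold.
calls = [
    VariantCall(101, 40, 19, 32.0),  # passes
    VariantCall(202, 4, 4, 35.0),    # fails coverage
    VariantCall(303, 60, 3, 30.0),   # fails allele fraction
    VariantCall(404, 25, 12, 12.0),  # fails quality
]
kept = [c.position for c in calls if passes_filters(c)]
print(kept)
```

Raising the thresholds trades sensitivity for fewer false positives, which is the balance the abstract describes.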
Alpheus; sequencing-by-synthesis; pyrosequencing; GMAP; GSNAP; resequencing; transcriptome sequencing
Schizophrenia (SCZ) is a common, disabling mental illness with high heritability but complex, poorly understood genetic etiology. As the first phase of a genomic convergence analysis of SCZ, we generated 16.7 billion nucleotides of short read, shotgun sequences of cDNA from post-mortem cerebellar cortices of 14 patients and six matched controls. A rigorous analysis pipeline was developed for analysis of digital gene expression studies. Sequences aligned to approximately 33,200 transcripts in each sample, with average coverage of 450 reads per gene. Following adjustments for confounding clinical, sample and experimental sources of variation, 215 genes differed significantly in expression between cases and controls. Golgi apparatus, vesicular transport, membrane association, zinc binding and regulation of transcription were over-represented among differentially expressed genes. Twenty-three genes with altered expression and involvement in presynaptic vesicular transport, Golgi function and GABAergic neurotransmission define a unifying molecular hypothesis for dysfunction in cerebellar cortex in SCZ.
Novel, comprehensive approaches for biomarker discovery and validation are urgently needed. One particular area of methodologic need is for discovery of novel genetic biomarkers in complex diseases and traits. Here, we review recent successes in the use of genome wide association (GWA) approaches to identify genetic biomarkers in common human diseases and traits. Such studies are yielding initial insights into the allelic architecture of complex traits. In general, it appears that complex diseases are associated with many common polymorphisms, implying profound genetic heterogeneity between affected individuals.
Genome-wide association studies; Complex diseases; Complex traits; Genetic biomarkers; Population genetics
Recent genome sequencing enables mega-base scale comparisons between related genomes. Comparisons between animals, plants, fungi, and bacteria demonstrate extensive synteny tempered by rearrangements. Within the legume plant family, glimpses of synteny have also been observed. Characterizing syntenic relationships in legumes is important in transferring knowledge from model legumes to crops that are important sources of protein, fixed nitrogen, and health-promoting compounds.
We have uncovered two large soybean regions exhibiting synteny with M. truncatula and with a network of segmentally duplicated regions in Arabidopsis. In all, syntenic regions comprise over 500 predicted genes spanning 3 Mb. Up to 75% of soybean genes are colinear with M. truncatula, including one region in which 33 of 35 soybean predicted genes with database support are colinear to M. truncatula. In some regions, 60% of soybean genes share colinearity with a network of A. thaliana duplications. One region is especially interesting because this 500 kbp segment of soybean is syntenic to two paralogous regions in M. truncatula on different chromosomes. Phylogenetic analysis of individual genes within these regions demonstrates that one is orthologous to the soybean region, with which it also shows substantially denser synteny and significantly lower levels of synonymous nucleotide substitutions. The other M. truncatula region is inferred to be paralogous, presumably resulting from a duplication event preceding speciation.
The presence of well-defined M. truncatula segments showing orthologous and paralogous relationships with soybean allows us to explore the evolution of contiguous genomic regions in the context of ancient genome duplication and speciation events.