Next-generation sequencing technologies are making a substantial impact on many areas of biology, including the analysis of genetic diversity in populations. However, genome-scale population genetic studies have been accessible only to well-funded model systems. Restriction-site associated DNA sequencing, a method that samples at reduced complexity across target genomes, promises to deliver high resolution population genomic data—thousands of sequenced markers across many individuals—for any organism at reasonable costs. It has found application in wild populations and non-traditional study species, and promises to become an important technology for ecological population genomics.
RADSeq; population genetics; next-generation sequencing; genetic marker discovery; SNP discovery
Heliconius butterflies represent a recent radiation of species, in which wing pattern divergence has been implicated in speciation. Several loci that control wing pattern phenotypes have been mapped and two were identified through sequencing. These same gene regions play a role in adaptation across the whole Heliconius radiation. Previous studies of population genetic patterns at these regions have sequenced small amplicons. Here, we use targeted next-generation sequence capture to survey patterns of divergence across these entire regions in divergent geographical races and species of Heliconius. This technique was successful both within and between species for obtaining high coverage of almost all coding regions and sufficient coverage of non-coding regions to perform population genetic analyses. We find major peaks of elevated population differentiation between races across hybrid zones, which indicate regions under strong divergent selection. These ‘islands’ of divergence appear to be more extensive between closely related species, but there is less clear evidence for such islands between more distantly related species at two further points along the ‘speciation continuum’. We also sequence fosmid clones across these regions in different Heliconius melpomene races. We find no major structural rearrangements but many relatively large (greater than 1 kb) insertion/deletion events (including gain/loss of transposable elements) that are variable between races.
Heliconius; colour pattern; divergence; target enrichment; speciation; genomic islands
Anguillicolidae Yamaguti, 1935 is a family of parasitic nematodes infecting fresh-water eels of the genus Anguilla, comprising five species in the genera Anguillicola and Anguillicoloides. Anguillicoloides crassus is of particular importance, as it has recently spread from its endemic range in the Eastern Pacific to Europe and North America, where it poses a significant threat to new, naïve hosts such as the economically important eel species Anguilla anguilla and Anguilla rostrata. The Anguillicolidae are therefore all potentially invasive taxa, but the relationships of the described species remain unclear. Anguillicolidae is part of Spirurina, a diverse clade made up of only animal parasites, but placement of the family within Spirurina is based on limited data.
We generated an extensive DNA sequence dataset from three loci (the 5' one-third of the nuclear small subunit ribosomal RNA, the D2-D3 region of the nuclear large subunit ribosomal RNA and the 5' half of the mitochondrial cytochrome c oxidase I gene) for the five species of Anguillicolidae and used this to investigate specific and generic boundaries within the family, and the relationship of Anguillicolidae to other spirurine nematodes. Neither nuclear nor mitochondrial sequences supported monophyly of Anguillicoloides. Genetic diversity within the African species Anguillicoloides papernai was suggestive of cryptic taxa, as was the finding of distinct lineages of Anguillicoloides novaezelandiae in New Zealand and Tasmania. Phylogenetic analysis of the Spirurina grouped the Anguillicolidae together with members of the Gnathostomatidae and Seuratidae.
The Anguillicolidae is part of a complex radiation of parasitic nematodes of vertebrates with wide host diversity (chondrichthyes, teleosts, squamates and mammals), most closely related to other marine vertebrate parasites that also have complex life cycles. Molecular analyses do not support the recent division of Anguillicolidae into two genera. The described species may hide cryptic taxa, identified here by DNA taxonomy, and this DNA barcoding approach may assist in tracking species invasions. The propensity for host switching, and thus the potential for invasive behaviour, is found in A. crassus, A. novaezelandiae and A. papernai, and thus may be common to the group.
Anguillicola; Anguillicoloides; Invasive; Host switch; Cryptic species; DNA-taxonomy; Barcoding
Second-generation sequencing has made possible the sequencing of genomes of interest for even small research groups. However, obtaining separate clean cultures and clonal or inbred samples of metazoan hosts and their bacterial symbionts is often difficult. We present a computational pipeline for separating metazoan and bacterial DNA in silico rather than at the bench. The method relies on the generation of deep coverage of all the genomes in a mixed sample using Illumina short-read sequencing technology, and using aggregate properties of the different genomes to identify read sets belonging to each. This inexpensive and rapid approach has been used to sequence several nematode genomes and their bacterial endosymbionts in the last year in our laboratory and can also be used to visualize and identify unexpected contaminants (or possible symbionts) in genomic DNA samples. We hope that this method will enable researchers studying symbiotic systems to move from gene-centric to genome-centric approaches.
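The idea of using aggregate properties to sort a mixed assembly can be sketched as follows. This is a minimal illustration, not the published pipeline: it assumes that GC content and mean read coverage are the aggregate properties, and the group centroids, contig names and function names are all hypothetical.

```python
import math

def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt

def nearest_group(gc, coverage, centroids):
    """Assign a contig to the closest centroid in (GC, log10 coverage) space.

    centroids maps a group name to a (gc, coverage) pair; here they are
    assumed to be chosen by eye from a GC-versus-coverage plot.
    """
    def distance(centroid):
        c_gc, c_cov = centroid
        return math.hypot(gc - c_gc, math.log10(coverage) - math.log10(c_cov))
    return min(centroids, key=lambda name: distance(centroids[name]))

# Hypothetical toy data: contig name -> (sequence, mean read coverage)
contigs = {
    "contig_1": ("ATATATGCATATTTAA", 30.0),
    "contig_2": ("GCGCGGCCATGCGCGC", 300.0),
}
# Hypothetical centroids for two genomes in the mixed sample
centroids = {"host": (0.30, 30.0), "symbiont": (0.70, 300.0)}

assignments = {
    name: nearest_group(gc_fraction(seq), cov, centroids)
    for name, (seq, cov) in contigs.items()
}
print(assignments)
```

Reads mapping to each group of contigs could then be collected and reassembled separately; a real analysis would also want taxonomic evidence (for example, sequence similarity searches) before trusting any partition.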
Symbiont; Second-generation sequencing; Genome; Nematode; Illumina
Understanding polyphenism, the ability of a single genome to express multiple morphologically and behaviourally distinct phenotypes, is an important goal for evolutionary and developmental biology. Polyphenism has been key to the evolution of the Hymenoptera, and particularly the social Hymenoptera where the genome of a single species regulates distinct larval stages, sexual dimorphism and physical castes within the female sex. Transcriptomic analyses of social Hymenoptera will therefore provide unique insights into how changes in gene expression underlie such complexity. Here we describe gene expression in individual specimens of the pre-adult stages, sexes and castes of the key pollinator, the buff-tailed bumblebee Bombus terrestris.
cDNA was prepared from mRNA from five life cycle stages (one larva, one pupa, one male, one gyne and two workers) and a total of 1,610,742 expressed sequence tags (ESTs) were generated using Roche 454 technology, substantially increasing the sequence data available for this important species. Overlapping ESTs were assembled into 36,354 B. terrestris putative transcripts, and functionally annotated. A preliminary assessment of differences in gene expression across non-replicated specimens from the pre-adult stages, castes and sexes was performed using R-STAT analysis. Individual samples from the life cycle stages of the bumblebee differed in the expression of a wide array of genes, including genes involved in amino acid storage, metabolism, immunity and olfaction.
Detailed analyses of immune and olfaction gene expression across phenotypes demonstrated how transcriptomic analyses can inform our understanding of processes central to the biology of B. terrestris and the social Hymenoptera in general. For example, examination of immunity-related genes identified high conservation of important immunity pathway components across individual specimens from the life cycle stages while olfactory-related genes exhibited differential expression with a wider repertoire of gene expression within adults, especially sexuals, in comparison to immature stages. As there is an absence of replication across the samples, the results of this study are preliminary but provide a number of candidate genes which may be related to distinct phenotypic stage expression. This comprehensive transcriptome catalogue will provide an important gene discovery resource for directed programmes in ecology, evolution and conservation of a key pollinator.
The increasing availability of molecular sequence data means that the accuracy of future phylogenetic studies is likely to be limited by systematic bias and taxon choice rather than by data. In order to take advantage of increasing datasets, user-friendly tools are required to facilitate phylogenetic analyses and to reduce duplication of dataset assembly efforts. Current phylogenetic pipelines are dependency-heavy and have significant technical barriers to use.
Here we present iPhy, a web application that lets non-technical users assemble, share and analyse DNA sequence datasets for multigene phylogenetic investigations. Built on a simple client-server architecture, iPhy eases the collection of gene sets for analysis, facilitates alignment and reliably generates phylogenetic analysis-ready data files. Phylogenetic trees generated in external programs can be imported and stored, and iPhy integrates with iTol to allow trees to be displayed with rich data annotation. The datasets collated in iPhy can be shared through the client interface. We show how systematic biases can be addressed by using explicit criteria when selecting sequences for analysis from a large dataset. A representative instance of iPhy can be accessed at iphy.bio.ed.ac.uk, but the toolkit can also be deployed on a local server for advanced users.
iPhy provides an easy-to-use environment for the assembly, analysis and sharing of large phylogenetic datasets, while encouraging best practices in terms of phylogenetic analysis and taxon selection.
Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis.
Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs.
Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs, however, gave a more credible final product, and this strategy is recommended.
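Comparisons of contig number and size across assemblers are often summarised with simple length statistics. The sketch below computes the contig count, total length, longest contig and N50 for a list of contig lengths. N50 is a commonly used summary and is an assumption here, not necessarily one of the criteria used in the study; the function names are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

def summarize(lengths):
    """Basic size statistics for one assembly."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }

# Hypothetical contig lengths for a toy assembly
print(summarize([100, 200, 300, 400]))
```

Running `summarize` on the contig lengths from each assembler, and on a merged assembly, gives a quick side-by-side view, although length statistics alone say nothing about correctness against reference sequences.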
Branchiopod crustaceans in the genus Daphnia are key model organisms for investigating interactions between genes and the environment. One major theme of research on Daphnia species has been the evolution of resistance to pathogens and parasites, but lack of knowledge of the Daphnia immune system has limited the study of immune responses. Here we provide a survey of the immune-related genome of D. pulex, derived from the newly completed genome sequence. Genes likely to be involved in innate immune responses were identified by comparison to homologues from other arthropods. For each candidate, the gene model was refined, and we conducted an analysis of sequence divergence from homologues from other taxa.
Results and conclusion
We found that some immune pathways, in particular the TOLL pathway, are fairly well conserved between insects and Daphnia, while other elements, in particular antimicrobial peptides, could not be recovered from the genome sequence. We also found considerable variation in gene family copy number when comparing Daphnia to insects and present phylogenetic analyses to shed light on the evolution of a range of conserved immune gene families.
Orthologs of the vertebrate ATP-gated P2X channels have been identified in Dictyostelium and green algae, demonstrating that the emergence of ionotropic purinergic signalling was an early event in eukaryotic evolution. However, the genomes of a number of animals including Drosophila melanogaster and Caenorhabditis elegans, both members of the Ecdysozoa superphylum, lack P2X-like proteins, whilst other species such as the flatworm Schistosoma mansoni have P2X proteins, making it unclear at what stages in evolution P2X receptors were lost. Here we describe the functional characterisation of a P2X receptor (HdP2X) from the tardigrade Hypsibius dujardini, demonstrating that purinergic signalling is preserved in some Ecdysozoa.
ATP (EC50 ~44.5 μM) evoked transient inward currents in HdP2X with millisecond rates of activation and desensitisation. HdP2X is antagonised by pyridoxal-phosphate-6-azophenyl-2',4' disulfonic acid (IC50 15.0 μM) and suramin (IC50 22.6 μM) and zinc and copper inhibit ATP-evoked currents with IC50 values of 62.8 μM and 19.9 μM respectively. Site-directed mutagenesis showed that unlike vertebrate P2X receptors, extracellular histidines do not play a major role in coordinating metal binding in HdP2X. However, H306 was identified as playing a minor role in the actions of copper but not zinc. Ivermectin potentiated responses to ATP with no effect on the rates of current activation or decay.
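To make the EC50 and IC50 values concrete, the sketch below evaluates standard Hill-type concentration-response curves. It assumes a Hill model, and the Hill coefficient of 1 is a placeholder because no slope is reported in this abstract; the function names are hypothetical.

```python
def hill_activation(conc_um, ec50_um, hill=1.0):
    """Fraction of maximal response at agonist concentration conc_um (in uM)."""
    if conc_um <= 0:
        return 0.0
    return 1.0 / (1.0 + (ec50_um / conc_um) ** hill)

def hill_inhibition(conc_um, ic50_um, hill=1.0):
    """Fraction of the control response remaining at inhibitor concentration conc_um (in uM)."""
    if conc_um <= 0:
        return 1.0
    return 1.0 / (1.0 + (conc_um / ic50_um) ** hill)

ATP_EC50_UM = 44.5     # from the text
COPPER_IC50_UM = 19.9  # from the text
ZINC_IC50_UM = 62.8    # from the text

# By definition, the response is half-maximal at the EC50 and halved at the IC50.
print(hill_activation(ATP_EC50_UM, ATP_EC50_UM))
print(hill_inhibition(COPPER_IC50_UM, COPPER_IC50_UM))
# Under this model, 20 uM zinc leaves more of the response than 20 uM copper.
print(hill_inhibition(20.0, ZINC_IC50_UM), hill_inhibition(20.0, COPPER_IC50_UM))
```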
The presence of a P2X receptor in a tardigrade species suggests that both nematodes and arthropods lost their P2X genes independently, as both traditional and molecular phylogenies place the divergence between Nematoda and Arthropoda before their divergence from Tardigrada. The phylogenetic analysis performed in our study also clearly demonstrates that the emergence of the family of seven P2X channels in human and other mammalian species was a relatively recent evolutionary event that occurred subsequent to the split between vertebrates and invertebrates. Furthermore, several characteristics of HdP2X including fast kinetics with low ATP sensitivity, potentiation by ivermectin in a channel with fast kinetics and distinct copper and zinc binding sites not dependent on histidines make HdP2X a useful model for comparative structure-function studies allowing a better understanding of P2X receptors in higher organisms.
The left-right asymmetry of snails, including the direction of shell coiling, is determined by the delayed effect of a maternal gene on the chiral twist that takes place during early embryonic cell divisions. Yet, despite being a well-established classical problem, the identity of the gene and the means by which left-right asymmetry is established in snails remain unknown. We here demonstrate the power of new genomic approaches for identification of the chirality gene, “D”. First, heterozygous (Dd) pond snails Lymnaea stagnalis were self-fertilised or backcrossed, and the genotype of more than six thousand offspring inferred, either dextral (DD/Dd) or sinistral (dd). Then, twenty of the offspring were used for Restriction-site-Associated DNA Sequencing (RAD-Seq) to identify anonymous molecular markers that are linked to the chirality locus. A local genetic map was constructed by genotyping three flanking markers in over three thousand snails. The three markers lie either side of the chirality locus, with one very tightly linked (<0.1 cM). Finally, bacterial artificial chromosomes (BACs) were isolated that contained the three loci. Fluorescent in situ hybridization (FISH) of pachytene cells showed that the three BACs tightly cluster on the same bivalent chromosome. Fibre-FISH identified a region of greater than ∼0.4 Mb between two BAC clone markers that must contain D. This work therefore establishes the resources for molecular identification of the chirality gene and the variation that underpins sinistral and dextral coiling. More generally, the results also show that combining genomic technologies, such as RAD-Seq and high resolution FISH, is a robust approach for mapping key loci in non-model systems.
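Map distances such as these are derived from the proportion of recombinant offspring. The sketch below converts a recombinant count into a recombination fraction and then into centimorgans with the Haldane mapping function. It assumes a backcross-like design in which each offspring reports one informative meiosis, and the counts are hypothetical, not the study's data.

```python
import math

def recombination_fraction(recombinants, total):
    """Proportion of scored offspring that are recombinant between two loci."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total

def haldane_cm(r):
    """Haldane map distance in centimorgans for recombination fraction r."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

# Hypothetical example: 3 recombinants among 3000 genotyped snails
r = recombination_fraction(3, 3000)
print(r, haldane_cm(r))
```

For small r the Haldane distance is very close to 100 × r, which is why scoring thousands of offspring is needed to resolve distances below 0.1 cM.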
The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways.
annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and also in a relational postgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools.
annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects.
With over 100 000 species and a large community of evolutionary biologists, population ecologists, pest biologists and genome researchers, the Lepidoptera are an important insect group. Genomic resources [expressed sequence tags (ESTs), genome sequence, genetic and physical maps, proteomic and microarray datasets] are growing, but there has up to now been no single access and analysis portal for this group. Here we present ButterflyBase (http://www.butterflybase.org), a unified resource for lepidopteran genomics. A total of 273 077 ESTs from more than 30 different species have been clustered to generate stable unigene sets, and robust protein translations derived from each unigene cluster. Clusters and their protein translations are annotated with BLAST-based similarity, gene ontology (GO), enzyme classification (EC) and Kyoto encyclopaedia of genes and genomes (KEGG) terms, and are also searchable using similarity tools such as BLAST and MS-BLAST. The database supports many needs of the lepidopteran research community, including molecular marker development, orthologue prediction for deep phylogenetics, and detection of rapidly evolving proteins likely involved in host–pathogen or other evolutionary processes. ButterflyBase is expanding to include additional genomic sequence, ecological and mapping data for key species.
The nematode Caenorhabditis elegans is unique among model animals in that many of its genes are cotranscribed as polycistronic pre-mRNAs from operons. The mechanism by which these operonic transcripts are resolved into mature mRNAs includes trans-splicing to a family of SL2-like spliced leader exons. SL2-like spliced leaders are distinct from SL1, the major spliced leader in C. elegans and other nematode species. We surveyed five additional nematode species, representing three of the five major clades of the phylum Nematoda, for the presence of operons and the use of trans-spliced leaders in resolution of polycistronic pre-mRNAs. Conserved operons were found in Pristionchus pacificus, Nippostrongylus brasiliensis, Strongyloides ratti, Brugia malayi, and Ascaris suum. In nematodes closely related to the rhabditine C. elegans, a related family of SL2-like spliced leaders is used for operonic transcript resolution. However, in the tylenchine S. ratti operonic transcripts are resolved using a family of spliced leaders related to SL1. Non-operonic genes in S. ratti may also receive these SL1 variants. In the spirurine nematodes B. malayi and A. suum operonic transcripts are resolved using SL1. Mapping these phenotypes onto the robust molecular phylogeny for the Nematoda suggests that operons evolved before SL2-like spliced leaders, which are an evolutionary invention of the rhabditine lineage.
The genome of the nematode worm Caenorhabditis elegans was the first of any animal to be completely sequenced. One surprising finding in this worm's genome was that about one-fifth of its genes were organised as sets of two to eight genes expressed from the same promoter, similar to bacterial “operons.” The pre-mRNAs made from these operons are processed by an intermolecular ligation process called SL trans-splicing. Other animal genomes, such as the human genome or that of the fruit fly, contain neither operons nor SL trans-splicing. In this article, Guiliano and Blaxter have investigated whether this curious facet of genome organisation is peculiar to C. elegans and close relatives by examining the genomes of a wide range of parasitic and free-living nematodes. The authors find that both operons and trans-splicing are present across the nematodes, and that operons evolve as other genome features do. All of the species surveyed use trans-splicing to resolve their multigene pre-mRNAs into single-gene mRNAs, but the details differ significantly from the process in C. elegans. In particular, the short piece of RNA that is attached to the beginning of operon-derived mRNAs has changed independently in many nematode groups.
Not only is the number of described species a very small proportion of the estimated extant number of taxa, but it also appears that all concepts of the extent and boundaries of 'species' fail in many cases. Using conserved molecular sequences it is possible to define and diagnose molecular operational taxonomic units (MOTU) that have a similar extent to traditional 'species'. Use of a MOTU system not only allows the rapid and effective identification of most taxa, including those not encountered before, but also allows investigation of the evolution of patterns of diversity. A MOTU approach is not without problems, particularly in the area of deciding what level of molecular difference defines a biologically relevant taxon, but has many benefits. Molecular data are extremely well suited to re-analysis and meta-analysis, and data from multiple independent studies can be readily collated and investigated by using new parameters and assumptions. Previous molecular taxonomic efforts have focused narrowly. Advances in high-throughput sequencing methodologies, however, place the idea of a universal, multi-locus molecular barcoding system in the realm of the possible.
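The notion of defining MOTU by a level of molecular difference can be illustrated with a simple threshold clustering. This sketch assumes pre-aligned sequences of equal length and uses a greedy, order-dependent rule with a hypothetical cut-off; it is one of many possible clustering schemes and is not taken from the text.

```python
def p_distance(a, b):
    """Proportion of differing sites among positions where both sequences have A, C, G or T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    return differences / compared if compared else 0.0

def assign_motus(sequences, max_diff=0.1):
    """Greedily assign each sequence to the first MOTU whose representative is within max_diff.

    max_diff is a hypothetical cut-off; choosing it is the hard part.
    """
    representatives = []  # one representative sequence per MOTU
    assignment = {}
    for name, seq in sequences.items():
        for index, rep in enumerate(representatives):
            if p_distance(seq, rep) <= max_diff:
                assignment[name] = index
                break
        else:
            representatives.append(seq)
            assignment[name] = len(representatives) - 1
    return assignment

# Hypothetical aligned barcode fragments
sequences = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGA",  # 1 of 20 sites differs from s1
    "s3": "TTTTACGTACGTACGTAAAA",  # 6 of 20 sites differ from s1
}
print(assign_motus(sequences))
```

Re-running the same data with a different `max_diff` shows how the number of MOTU depends on the chosen threshold, which is the re-analysis property the text emphasises.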
In response to the new opportunities for genome sequencing and comparative genomics, the Society of Nematology (SON) formed a committee to develop a white paper in support of the broad scientific needs associated with this phylum and interests of SON members. Although genome sequencing is expensive, the data generated are unique in biological systems in that genomes have the potential to be complete (every base of the genome can be accounted for), accurate (the data are digital and not subject to stochastic variation), and permanent (once obtained, the genome of a species does not need to be experimentally re-sampled). The availability of complete, accurate, and permanent genome sequences from diverse nematode species will underpin future studies into the biology and evolution of this phylum and the ecological associations (particularly parasitic) nematodes have with other organisms. We anticipate that upwards of 100 nematode genomes will be solved to varying levels of completion in the coming decade and suggest biological and practical considerations to guide the selection of the most informative taxa for sequencing.
Caenorhabditis elegans; comparative genomics; genome sequencing; systematics
The genomes of an increasing number of species are being investigated through generation of expressed sequence tags (ESTs). However, ESTs are prone to sequencing errors and typically define incomplete transcripts, making downstream annotation difficult. Annotation would be greatly improved with robust polypeptide translations. Many current solutions for EST translation require a large number of full-length gene sequences for training purposes, a resource that is not available for the majority of EST projects.
As part of our ongoing EST programs investigating these "neglected" genomes, we have developed a polypeptide prediction pipeline, prot4EST. It incorporates freely available software to produce final translations that are more accurate than those derived from any single method. We show that this integrated approach goes a long way to overcoming the deficit in training data.
prot4EST provides a portable EST translation solution and can be usefully applied to >95% of EST projects to improve downstream annotation. It is freely available from .
Analysis of secreted proteins indicates that they may be undergoing accelerated evolution, either because of relaxed functional constraints, or in response to stronger selective pressure from host immunity.
Parasitism is a highly successful mode of life and one that requires suites of gene adaptations to permit survival within a potentially hostile host. Among such adaptations is the secretion of proteins capable of modifying or manipulating the host environment. Nippostrongylus brasiliensis is a well-studied model nematode parasite of rodents, which secretes products known to modulate host immunity.
Taking a genomic approach to characterize potential secreted products, we analyzed expressed sequence tag (EST) sequences for putative amino-terminal secretory signals. We sequenced ESTs from a cDNA library constructed by oligo-capping to select full-length cDNAs, as well as from conventional cDNA libraries. SignalP analysis was applied to predicted open reading frames, to identify potential signal peptides and anchors. Among 1,234 ESTs, 197 (~16%) contain predicted 5' signal sequences, with 176 classified as conventional signal peptides and 21 as signal anchors. ESTs cluster into 742 distinct genes, of which 135 (18%) bear predicted signal-sequence coding regions. Comparisons of clusters with homologs from Caenorhabditis elegans and more distantly related organisms reveal that the majority (65% at P < e-10) of signal peptide-bearing sequences from N. brasiliensis show no similarity to previously reported genes, and less than 10% align to conserved genes recorded outside the phylum Nematoda. Of all novel sequences identified, 32% contained predicted signal peptides, whereas this was the case for only 3.4% of conserved genes with sequence homologies beyond the Nematoda.
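The reported proportions can be checked directly from the counts given above. The short sketch below only recomputes the percentages stated in the text; the variable and function names are hypothetical.

```python
def percent(part, whole):
    """Percentage of whole represented by part."""
    return 100.0 * part / whole

# Counts as stated in the text
total_ests = 1234
ests_with_signal = 197
signal_peptides = 176
signal_anchors = 21
total_clusters = 742
clusters_with_signal = 135

# The two signal classes should add up to the total with predicted signals.
assert signal_peptides + signal_anchors == ests_with_signal

print(f"ESTs with predicted signal sequences: {percent(ests_with_signal, total_ests):.1f}%")
print(f"Clusters with predicted signal sequences: {percent(clusters_with_signal, total_clusters):.1f}%")
```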
These results indicate that secreted proteins may be undergoing accelerated evolution, either because of relaxed functional constraints, or in response to stronger selective pressure from host immunity.
Intracellular bacteria have been described in several species of filarial nematodes, but their relationships with, and effects on, their nematode hosts have not previously been elucidated. In this study, intracellular bacteria were observed in tissues of the rodent parasite Litomosoides sigmodontis by transmission electron microscopy and by immunohistochemistry using antiendobacterial heat shock protein-60 antisera. Molecular phylogenetic analysis of the bacterial 16S ribosomal RNA gene, isolated by PCR, showed a close relationship to the rickettsial Wolbachia endobacteria of arthropods and to other filarial intracellular bacteria. The impact of tetracycline therapy of infected rodents on L. sigmodontis development was analyzed in order to understand the role(s) these bacteria might play in filarial biology. Tetracycline therapy, when initiated with L. sigmodontis infection, eliminated the bacteria and resulted in filarial growth retardation and infertility. If initiated after microfilarial development, treatment reduced filarial fertility. Treatment with antibiotics not affecting rickettsial bacteria did not inhibit filarial development. Acanthocheilonema viteae filariae were shown to lack intracellular bacteria and to be insensitive to tetracycline. These results suggest a mutualistic interaction between the intracellular bacteria and the filarial nematode. Investigation of such a mutualism in endobacteria-containing human filariae is warranted for a potential chemotherapeutic exploitation.
A collection of Caenorhabditis elegans mutants that show ectopic surface lectin binding (Srf mutants) was analyzed to determine the biochemical basis for this phenotype. This analysis involved selective removal or labeling of surface components, specific labeling of surface glycans, and fractionation of total protein with subsequent detection of wheat germ agglutinin (WGA) binding proteins. Wild-type and mutant nematodes showed no differences in their profiles of extractable surface glycoproteins or total WGA-binding proteins, suggesting that the ectopic lectin binding does not result from the novel expression of surface glycans. Instead, these results support a model in which ectopic lectin binding results from an unmasking of glycosylated components present in the insoluble cuticle matrix of wild-type animals. To explain the multiple internal defects found in some surface mutants, we propose that these mutants have a basic defect in protein processing. This defect would interfere with the expression of the postulated masking protein(s), as well as other proteins required for normal development.
biochemical analysis; Caenorhabditis elegans; glycoproteins; lectin; nematode; surface mutant
Biodiversity is of crucial importance for ecosystem functioning, sustainability and resilience, but the magnitude and organization of marine diversity at a range of spatial and taxonomic scales are undefined. In this paper, we use second-generation sequencing to unmask putatively diverse marine metazoan biodiversity in a Scottish temperate benthic ecosystem. We show that remarkable differences in diversity occurred at microgeographical scales and refute currently accepted ecological and taxonomic paradigms of meiofaunal identity, rank abundance and concomitant understanding of trophic dynamics. Richness estimates from the current benchmarked Operational Clustering of Taxonomic Units from Parallel UltraSequencing analyses are broadly aligned with those derived from morphological assessments. However, the slope of taxon rarefaction curves for many phyla remains incomplete, suggesting that the true alpha diversity is likely to exceed current perceptions. The approaches provide a rapid, objective and cost-effective taxonomic framework for exploring links between ecosystem structure and function of all hitherto intractable, but ecologically important, communities.
Recent developments in sequencing technologies have provided the opportunity to investigate the biodiversity of ecosystems. Such a metagenomic approach, combined with taxon clustering, is used here to demonstrate that the species richness of a marine community in Scotland is much greater than anticipated.
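The remark about rarefaction curves can be made concrete with a minimal sketch. It uses the standard hypergeometric (Hurlbert) rarefaction formula for the expected number of taxa in a random subsample of n reads; it is not the pipeline used in the study, and the function name and the toy counts are hypothetical.

```python
from math import comb

def rarefied_richness(otu_counts, n):
    """Expected number of taxa seen in a random subsample of n reads.

    otu_counts: read counts per taxon (illustrative input, not study data).
    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N is the total
    number of reads and N_i is the count for taxon i.
    """
    total_reads = sum(otu_counts)
    if not 0 <= n <= total_reads:
        raise ValueError("n must lie between 0 and the total read count")
    all_subsamples = comb(total_reads, n)
    # comb(total_reads - c, n) counts the subsamples that miss taxon i entirely
    return sum(1 - comb(total_reads - c, n) / all_subsamples
               for c in otu_counts if c > 0)

# Toy community: a curve that is still rising at the largest n has not levelled off.
toy_counts = [50, 30, 10, 5, 3, 1, 1]
curve = [rarefied_richness(toy_counts, n) for n in (10, 25, 50, 100)]
```

Plotting such values against n gives the rarefaction curve; a slope that is still clearly positive at the full sample size is what suggests unsampled diversity.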
Restriction site-associated DNA Sequencing (RAD-Seq) is an economical and efficient method for SNP discovery and genotyping. As with other sequencing-by-synthesis methods, RAD-Seq produces stochastic count data and requires sensitive analysis to develop or genotype markers accurately. We show that there are several sources of bias specific to RAD-Seq that are not explicitly addressed by current genotyping tools, namely restriction fragment bias, restriction site heterozygosity and PCR GC content bias. We explore the performance of existing analysis tools given these biases and discuss approaches to limiting or handling biases in RAD-Seq data. While these biases need to be taken seriously, we believe RAD loci affected by them can be excluded or processed with relative ease in most cases and that most RAD loci will be accurately genotyped by existing tools.
contig assembly; genotyping by sequencing; population genetics; RAD Sequencing; restriction enzymes
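To illustrate why stochastic read counts need careful handling, here is a minimal sketch of a binomial genotype likelihood for a biallelic site. It assumes a diploid individual, independent reads and a fixed sequencing error rate; the function names and the default error rate are hypothetical and do not come from any of the genotyping tools the abstract refers to.

```python
from math import comb, log

def genotype_log_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Log-likelihood of the read counts under each diploid genotype.

    Assumes 0 < error_rate < 1 and an unbiased 50:50 sampling of the two
    alleles in heterozygotes (exactly the assumption that RAD-specific
    biases can violate).
    """
    n = ref_count + alt_count
    # Probability that a single read shows the ALT allele, per genotype
    p_alt = {"RR": error_rate, "RA": 0.5, "AA": 1.0 - error_rate}
    log_coeff = log(comb(n, alt_count))
    return {
        genotype: log_coeff + alt_count * log(p) + ref_count * log(1.0 - p)
        for genotype, p in p_alt.items()
    }

def call_genotype(ref_count, alt_count, error_rate=0.01):
    """Return the genotype with the highest likelihood."""
    likelihoods = genotype_log_likelihoods(ref_count, alt_count, error_rate)
    return max(likelihoods, key=likelihoods.get)
```

Under this simple model, a heterozygote whose ALT allele is under-represented (for example through a polymorphism in the restriction site or PCR GC bias) yields skewed counts and may be called as a homozygote, which is the kind of error the abstract warns about.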
Studies on the classic shell colour and banding polymorphism of the land snail Cepaea played a crucial role in establishing the importance of natural selection in maintaining morphological variation. Cepaea is also a pre-eminent model for ecological genetics because the outward colour and banding phenotype is entirely genetically determined, primarily by a ‘supergene’ of at least five loci. Unfortunately, progress in understanding the evolution and maintenance of the Cepaea polymorphism stalled, partly because of a lack of genetic markers. With a view to re-establish Cepaea as a prominent model of molecular ecology, we made six laboratory crosses of Cepaea nemoralis, five of which segregated for shell ground colour (C) and the presence or absence of bands (B). First, scoring of colour and banding in 323 individuals found no recombination between the C and B loci of the supergene. Second, using restriction site–associated DNA sequencing (RAD-Seq) of two parents and 22 offspring, we identified 44 anonymous markers putatively linked to the colour (C) and banding (B) loci. The genotype of eleven of the most promising RAD-Seq markers was independently validated in the same 22 offspring, then up to a further 146 offspring were genotyped. The closest RAD-Seq markers scored are within ∼0.6 centimorgan (cM) of the C-B supergene linkage group, with the combined loci together forming a 35.8 cM linkage map of markers that flank both sides of the Cepaea C-B supergene.
colour polymorphism; Heliconius; RAD-Seq; restriction site–associated DNA sequencing; snail; supergene
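As a worked illustration of what "no recombination in 323 individuals" can and cannot show, the sketch below computes a one-sided upper confidence bound on the recombination fraction when zero recombinants are observed, and converts it to a map distance with Haldane's mapping function. It assumes that each scored individual represents one informative meiosis, which is a simplification not stated in the abstract; the function names are hypothetical.

```python
from math import log

def upper_bound_zero_recombinants(n_meioses, confidence=0.95):
    """Upper bound on recombination fraction r given 0 recombinants.

    Solves (1 - r) ** n_meioses = 1 - confidence for r.
    """
    if n_meioses <= 0:
        raise ValueError("n_meioses must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_meioses)

def haldane_cm(r):
    """Haldane map distance in centimorgans for recombination fraction r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * log(1.0 - 2.0 * r)

r_max = upper_bound_zero_recombinants(323)  # a little under 0.01
distance_max = haldane_cm(r_max)            # a little under 1 cM
```

Under these assumptions, the absence of recombinants is compatible with the C and B loci being tightly linked, but it does not by itself rule out a small non-zero distance of just under 1 cM.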
The cestode Echinococcus granulosus - the agent of cystic echinococcosis, a zoonosis affecting humans and domestic animals worldwide - is an excellent model for the study of host-parasite cross-talk that interfaces with two mammalian hosts. To develop the molecular analysis of these interactions, we carried out an EST survey of E. granulosus larval stages. We report the salient features of this study with a focus on genes reflecting physiological adaptations of different parasite stages.
We generated ∼10,000 ESTs from two sets of full-length enriched libraries (derived from oligo-capped and trans-spliced cDNAs) prepared with three parasite materials: hydatid cyst wall, larval worms (protoscoleces), and pepsin/H+-activated protoscoleces. The ESTs were clustered into 2,700 distinct gene products. In the context of the biology of E. granulosus, our analyses reveal: (i) a diverse group of abundant long non-protein coding transcripts showing homology to a middle repetitive element (EgBRep) that could either be active molecular species or represent precursors of small RNAs (like piRNAs); (ii) an up-regulation of fermentative pathways in the tissue of the cyst wall; (iii) highly expressed thiol- and selenol-dependent antioxidant enzyme targets of thioredoxin glutathione reductase, the functional hub of redox metabolism in parasitic flatworms; (iv) candidate apomucins for the external layer of the tissue-dwelling hydatid cyst, a mucin-rich structure that is critical for survival in the intermediate host; (v) a set of tetraspanins, a protein family that appears to have expanded in the cestode lineage; and (vi) a set of platyhelminth-specific gene products that may offer targets for novel pan-platyhelminth drug development.
This survey has greatly increased the quality and the quantity of the molecular information on E. granulosus and constitutes a valuable resource for gene prediction on the parasite genome and for further genomic and proteomic analyses focused on cestodes and platyhelminths.
Cestodes are a neglected group of platyhelminth parasites, despite causing chronic infections in humans and domestic animals worldwide. We used Echinococcus granulosus as a model to study the molecular basis of the host-parasite cross-talk during cestode infections. For this purpose, we carried out a survey of the genes expressed by parasite larval stages interfacing with definitive and intermediate hosts. Sequencing from several high quality cDNA libraries provided numerous insights into the expression of genes involved in important aspects of E. granulosus biology, e.g. its metabolism (energy production and antioxidant defences) and the synthesis of key parasite structures (notably, the one exposed to humans and livestock intermediate hosts). Our results also uncovered the existence of an intriguing set of abundant repeat-associated non-protein coding transcripts that may participate in the regulation of gene expression in all surveyed stages. The dataset now generated constitutes a valuable resource for gene prediction on the parasite genome and for further genomic and proteomic studies focused on cestodes and platyhelminths. In particular, the detailed characterization of a range of newly discovered genes will contribute to a better understanding of the biology of cestode infections and, therefore, to the development of products allowing their efficient control.
Restriction site-associated DNA sequencing (RAD-Seq) is a genome complexity reduction technique that facilitates large-scale marker discovery and genotyping by sequencing. Recent applications of RAD-Seq have included linkage and QTL mapping with a particular focus on non-model species. In the current study, we have applied RAD-Seq to two Atlantic salmon families from a commercial breeding program. The offspring from these families were classified as resistant or susceptible based on survival/mortality in an Infectious Pancreatic Necrosis (IPN) challenge experiment, and putative homozygous resistant or susceptible genotype at a major IPN-resistance QTL. From each family, the genomic DNA of the two heterozygous parents and seven offspring of each IPN phenotype and genotype was digested with the SbfI enzyme and sequenced in multiplexed pools.
Sequence was obtained from approximately 70,000 RAD loci in both families and a filtered set of 6,712 segregating SNPs was identified. Analyses of genome-wide RAD marker segregation patterns in the two families suggested SNP discovery on all 29 Atlantic salmon chromosome pairs, and highlighted the dearth of male recombination. The use of pedigreed samples allowed us to distinguish segregating SNPs from putative paralogous sequence variants resulting from the relatively recent genome duplication of salmonid species. Of the segregating SNPs, 50 were linked to the QTL. A subset of these QTL-linked SNPs was converted to a high-throughput assay and genotyped across large commercial populations of IPNV-challenged salmon fry. Several SNPs showed highly significant linkage and association with resistance to IPN, and population linkage-disequilibrium-based SNP tests for resistance were identified.
We used RAD-Seq to successfully identify and characterise high-density genetic markers in pedigreed aquaculture Atlantic salmon. These results underline the effectiveness of RAD-Seq as a tool for rapid and efficient generation of QTL-targeted and genome-wide marker data in a large complex genome, and its possible utility in farmed animal selection programs.
Atlantic salmon; RAD sequencing; Aquaculture; Infectious pancreatic necrosis; Recombination; Single nucleotide polymorphism; Paralogous sequence variant
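The idea of using pedigreed samples to separate segregating SNPs from putative paralogous sequence variants can be sketched as a simple Mendelian segregation test. This is a hedged illustration, not the filtering procedure used in the study: it assumes both parents are scored as heterozygous, tests offspring genotype counts against the expected 1:2:1 ratio, and uses hypothetical function names and a hypothetical threshold.

```python
from math import exp

def segregation_p_value(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit p-value against a 1:2:1 ratio (AB x AB cross).

    With 2 degrees of freedom the chi-square survival function is exactly
    exp(-x / 2), so no statistics library is needed. The chi-square
    approximation is rough for small families.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped offspring is required")
    expected = (n / 4, n / 2, n / 4)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return exp(-chi2 / 2)

def fails_mendelian_segregation(n_aa, n_ab, n_bb, alpha=0.01):
    """Flag a marker whose offspring counts are unlikely under 1:2:1."""
    return segregation_p_value(n_aa, n_ab, n_bb) < alpha
```

A paralogous sequence variant typically makes every individual appear heterozygous, so counts such as (0, 14, 0) are flagged, whereas counts such as (4, 7, 3) are consistent with a true segregating SNP. Markers close to a locus used to select the offspring would also be expected to deviate from 1:2:1, so a flag from this test is only a prompt for closer inspection.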