Results 1-9 (9)
 

1.  Reference-Free Comparative Genomics of 174 Chloroplasts 
PLoS ONE  2012;7(11):e48995.
Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. 
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
doi:10.1371/journal.pone.0048995
PMCID: PMC3502452  PMID: 23185288
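The tip/group distinction described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy sequences, and the extra "all" class (kmers present in every genome) are assumptions made for illustration, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_kmers(genomes, k):
    """Classify kmers by how many genomes contain them.

    genomes: dict mapping a genome name to its sequence (hypothetical input).
    Returns a dict with three classes, each mapping kmer -> set of genome names:
      "tip"   : kmer found in exactly one genome
      "group" : kmer shared by more than one genome, but not all
      "all"   : kmer found in every genome (an assumed extra class)
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(name)

    total = len(genomes)
    classes = {"tip": {}, "group": {}, "all": {}}
    for km, names in owners.items():
        if len(names) == 1:
            classes["tip"][km] = names
        elif len(names) < total:
            classes["group"][km] = names
        else:
            classes["all"][km] = names
    return classes


if __name__ == "__main__":
    toy = {"a": "ACGTA", "b": "ACGTT", "c": "ACGGG"}  # toy data, not real chloroplasts
    result = classify_kmers(toy, k=3)
    for label, members in result.items():
        print(label, sorted(members))
```

In a real analysis the kmer sets would come from unassembled reads rather than finished sequences, and the tip and group kmers would seed localized assembly; the sketch only shows the partitioning step.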
2.  Deciphering neo-sex and B chromosome evolution by the draft genome of Drosophila albomicans 
BMC Genomics  2012;13:109.
Background
Drosophila albomicans is a unique model organism for studying both sex chromosome and B chromosome evolution. A pair of its autosomes comprising roughly 40% of the whole genome has fused to the ancient X and Y chromosomes only about 0.12 million years ago, thereby creating the youngest and most gene-rich neo-sex system reported to date. This species also possesses recently derived B chromosomes that show non-Mendelian inheritance and significantly influence fertility.
Methods
We sequenced male flies with B chromosomes at 124.5-fold genome coverage using next-generation sequencing. To characterize neo-Y specific changes and B chromosome sequences, we also sequenced inbred female flies derived from the same strain but without B's at 28.5-fold.
Results
We assembled a female genome and placed 53% of the sequence and 85% of the annotated proteins into specific chromosomes, by comparison with the 12 Drosophila genomes. Despite its very recent origin, the non-recombining neo-Y chromosome shows various signs of degeneration, including a significant enrichment of non-functional genes compared to the neo-X, and an excess of tandem duplications relative to other chromosomes. We also characterized a B-chromosome linked scaffold that contains an actively transcribed unit and shows sequence similarity to the subcentromeric regions of both the ancient X and the neo-X chromosome.
Conclusions
Our results provide novel insights into the very early stages of sex chromosome evolution and B chromosome origination, and suggest an unprecedented connection between the births of these two systems in D. albomicans.
doi:10.1186/1471-2164-13-109
PMCID: PMC3353239  PMID: 22439699
Drosophila albomicans; neo-sex chromosome; B chromosome
3.  The diploid genome sequence of an Asian individual 
Nature  2008;456(7218):60-65.
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
doi:10.1038/nature07484
PMCID: PMC2716080  PMID: 18987735
4.  The Sequence Alignment/Map format and SAMtools 
Bioinformatics  2009;25(16):2078-2079.
Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Availability: http://samtools.sourceforge.net
Contact: rd@sanger.ac.uk
doi:10.1093/bioinformatics/btp352
PMCID: PMC2723002  PMID: 19505943
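To make the SAM format mentioned above more concrete, here is a minimal sketch of reading one alignment line. The eleven mandatory tab-separated fields and the two flag bits used here follow the SAM specification rather than the abstract itself; the class and function names and the example read are hypothetical, and this is not how SAMtools itself is implemented.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM fields plus a dict of optional tags."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict


def parse_sam_line(line):
    """Parse one non-header SAM alignment line into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    tags = {}
    for opt in fields[11:]:
        # Optional fields have the form TAG:TYPE:VALUE; only integers are converted.
        tag, typ, value = opt.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


def is_unmapped(rec):
    """Flag bit 0x4 marks an unmapped read."""
    return bool(rec.flag & 0x4)


def is_reverse(rec):
    """Flag bit 0x10 marks a read aligned to the reverse strand."""
    return bool(rec.flag & 0x10)


if __name__ == "__main__":
    example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:1"  # made-up read
    rec = parse_sam_line(example)
    print(rec.qname, rec.rname, rec.pos, is_reverse(rec), rec.tags)
```

For real data, an established library or SAMtools itself is the safer choice; the sketch only shows why a plain tab-separated layout makes the format easy to process.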
5.  TreeFam: 2008 Update 
Nucleic Acids Research  2007;36(Database issue):D735-D740.
TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14 351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new perl API lets users easily extract data from the TreeFam mysql database.
doi:10.1093/nar/gkm1005
PMCID: PMC2238856  PMID: 18056084
6.  PigGIS: Pig Genomic Informatics System 
Nucleic Acids Research  2006;35(Database issue):D654-D657.
Pig Genomic Information System (PigGIS) is a web-based depository of pig (Sus scrofa) genomic learning mainly engineered for biomedical research to locate pig genes from their human homologs and position single nucleotide polymorphisms (SNPs) in different pig populations. It utilizes a variety of sequence data, including whole genome shotgun (WGS) reads and expressed sequence tags (ESTs), and achieves a successful mapping solution to the low-coverage genome problem. With the data presently available, we have identified a total of 15 700 pig consensus sequences covering 18.5 Mb of the homologous human exons. We have also recovered 18 700 SNPs and 20 800 unique 60mer oligonucleotide probes for future pig genome analyses. PigGIS can be freely accessed via the web.
doi:10.1093/nar/gkl808
PMCID: PMC1669765  PMID: 17090590
7.  TreeFam: a curated database of phylogenetic trees of animal gene families 
Nucleic Acids Research  2005;34(Database issue):D572-D580.
TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively, based on seed alignments and trees in a similar fashion to Pfam. Release 1.1 of TreeFam contains curated trees for 690 families and automatically generated trees for another 11 646 families. These represent over 128 000 genes from nine fully sequenced animal genomes and over 45 000 other animal proteins from UniProt; ∼40–85% of proteins encoded in the fully sequenced animal genomes are included in TreeFam. TreeFam is freely available online.
doi:10.1093/nar/gkj118
PMCID: PMC1347480  PMID: 16381935
8.  SilkDB: a knowledgebase for silkworm biology and genomics 
Nucleic Acids Research  2004;33(Database Issue):D399-D402.
The Silkworm Knowledgebase (SilkDB) is a web-based repository for the curation, integration and study of silkworm genetic and genomic data. With the recent accomplishment of a ∼6X draft genome sequence of the domestic silkworm (Bombyx mori), SilkDB provides an integrated representation of the large-scale, genome-wide sequence assembly, cDNAs, clusters of expressed sequence tags (ESTs), transposable elements (TEs), mutants, single nucleotide polymorphisms (SNPs) and functional annotations of genes with assignments to InterPro domains and Gene Ontology (GO) terms. SilkDB also hosts a set of ESTs from Bombyx mandarina, a wild progenitor of B. mori, and a collection of genes from other Lepidoptera. Comparative analysis results between the domestic and wild silkworm, between B. mori and other Lepidoptera, and between B. mori and the two sequenced insects, fruitfly and mosquito, are displayed using the B. mori genome sequence as a reference framework. Designed as a basic platform, SilkDB strives to provide a comprehensive knowledgebase about the silkworm and present the silkworm genome and related information in systematic and graphical ways for the convenience of in-depth comparative studies. SilkDB is publicly accessible at http://silkworm.genomics.org.cn.
doi:10.1093/nar/gki116
PMCID: PMC540070  PMID: 15608225
9.  ChickVD: a sequence variation database for the chicken genome 
Nucleic Acids Research  2004;33(Database Issue):D438-D441.
Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DNA from domestic breeds. Using the Red Jungle Fowl genome sequence as a reference, we identified 3.1 million non-redundant DNA sequence variants. To facilitate the application of our data to avian genetics and to provide a foundation for functional and evolutionary studies, we created the ‘Chicken Variation Database’ (ChickVD). A graphical MapView shows variants mapped onto the chicken genome in the context of gene annotations and other features, including genetic markers, trait loci, cDNAs, chicken orthologs of human disease genes and raw sequence traces. ChickVD also stores information on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn.
doi:10.1093/nar/gki092
PMCID: PMC540046  PMID: 15608233
