
Results 1-16 (16)

1.  MicroRNA expression profiling of the fifth-instar posterior silk gland of Bombyx mori 
BMC Genomics  2014;15(1):410.
The growth and development of the posterior silk gland and the biosynthesis of the silk core protein at the fifth larval instar stage of Bombyx mori are of paramount importance for silk production.
Here, aided by next-generation sequencing and microarray assay, we profile 1,229 microRNAs (miRNAs), including 728 novel miRNAs and 110 miRNA/miRNA* duplexes, of the posterior silk gland at the fifth larval instar. Target gene prediction yields 14,222 unique target genes from 1,195 miRNAs. Functional categorization classifies the targets into complex pathways that include both cellular and metabolic processes, especially protein synthesis and processing.
The enrichment of target genes in the ribosome-related pathway indicates that miRNAs may directly regulate translation. Our findings pave a way for further functional elucidation of these miRNAs and their targets in silk production.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-410) contains supplementary material, which is available to authorized users.
PMCID: PMC4045974  PMID: 24885170
MicroRNA; Silkworm; Posterior silk gland; Target gene
2.  VCGDB: a dynamic genome database of the Chinese population 
BMC Genomics  2014;15:265.
The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies.
We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software.
VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases.
PMCID: PMC4028056  PMID: 24708222
Chinese population; Dynamic genome; Database; 1000 Genomes Project; Big data
3.  Genome sequencing of high-penicillin producing industrial strain of Penicillium chrysogenum 
BMC Genomics  2014;15(Suppl 1):S11.
Because of the importance of Penicillium chrysogenum in medicine, the genome of the low-penicillin-producing laboratory strain Wisconsin54-1255 has been sequenced and fully annotated. Through classical mutagenesis of Wisconsin54-1255, product titers and productivities of penicillin have increased dramatically, but the underlying genome structural variations remain little known. Genome sequencing of a high-penicillin-producing industrial strain is therefore very meaningful.
To gain more insight into the genome structural variations of a high-penicillin-producing strain, we sequenced the industrial strain P. chrysogenum NCPC10086. By whole genome comparative analysis, we observed a large number of mutations, insertions and deletions, and structural variations. There are 69 new genes that do not exist in the genome sequence of Wisconsin54-1255, and some of them are involved in energy metabolism, nitrogen metabolism and glutathione metabolism. Most importantly, we discovered a 53.7 Kb "new shift fragment" in the seven copies of the determinative penicillin biosynthesis cluster in NCPC10086, and the arrangement type of the amplified region is unique. Moreover, we identified two large-scale translocations in NCPC10086, containing genes involved in energy metabolism, nitrogen metabolism and the peroxisome pathway. Finally, we found some non-synonymous mutations in genes that participate in the homogentisate pathway or act as regulators of penicillin biosynthesis.
We provide the first high-quality genome sequence of an industrial high-penicillin strain of P. chrysogenum and carried out a comparative genome analysis with a low-producing experimental strain. The genomic variations we discovered are related to energy metabolism, nitrogen metabolism and other processes. These findings offer potential insights into the high-penicillin yielding mechanism and into future metabolic engineering.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-S1-S11) contains supplementary material, which is available to authorized users.
PMCID: PMC4046689  PMID: 24564352
4.  Gene expression analysis of induced pluripotent stem cells from aneuploid chromosomal syndromes 
BMC Genomics  2013;14(Suppl 5):S8.
Human aneuploidy is the leading cause of early pregnancy loss, mental retardation, and multiple congenital anomalies. Due to the high mortality associated with aneuploidy, the pathophysiological mechanisms of aneuploidy syndrome remain largely unknown. Previous studies focused mostly on whether dosage compensation occurs, and the next generation transcriptomics sequencing technology RNA-seq is expected to eventually uncover the mechanisms of gene expression regulation and the related pathological phenotypes in human aneuploidy.
Using next generation transcriptomics sequencing technology RNA-seq, we profiled the transcriptomes of four human aneuploid induced pluripotent stem cell (iPSC) lines generated from monosomy X (Turner syndrome), trisomy 8 (Warkany syndrome 2), trisomy 13 (Patau syndrome), and partial trisomy 11:22 (Emanuel syndrome) as well as two umbilical cord matrix iPSC lines as euploid controls to examine how phenotypic abnormalities develop with aberrant karyotype. A total of 466 M (50-bp) reads were obtained from the six iPSC lines, and over 13,000 mRNAs were identified by gene annotation. Global analysis of gene expression profiles and functional analysis of differentially expressed (DE) genes were implemented. More than 5000 DE genes were identified between the aneuploid and euploid iPSCs, and 9 enriched KEGG pathways were shared by the four aneuploid samples.
Our results demonstrate that the extra or missing chromosome has extensive effects on the whole transcriptome. Functional analysis of differentially expressed genes reveals that the genes most affected in aneuploid individuals are related to central nervous system development and tumorigenesis.
PMCID: PMC3852284  PMID: 24564826
5.  Genetic variation and metabolic pathway intricacy govern the active compound content and quality of the Chinese medicinal plant Lonicera japonica Thunb 
BMC Genomics  2012;13:195.
Traditional Chinese medicine has used various herbs to treat various diseases for thousands of years, and it is now time to assess the characteristics and effectiveness of these medicinal plants with modern genetic and molecular tools. The herb Flos Lonicerae Japonicae (FLJ or Lonicera japonica Thunb.) is used as an anti-inflammatory agent, but the chemical quality of FLJ and its medicinal efficacy have not been consistent. Here, we analyzed the transcriptomes and metabolic pathways to evaluate the active medicinal compounds in FLJ, and we hope that this approach can be used for a variety of medicinal herbs in the future.
We assess transcriptomic differences between FLJ and L. japonica Thunb. var. chinensis (Watts) (rFLJ), which may explain the variable medicinal effects. We acquired transcriptomic data (over 100 million reads) from the two herbs, using the RNA-seq method and the Illumina GAII platform. The transcriptomic profiles contain over 6,000 expressed sequence tags (ESTs) for each of the three flower development stages from FLJ, as well as a comparable number of ESTs from the rFLJ flower bud. To elucidate enzymatic divergence on biosynthetic pathways between the two varieties, we correlated genes and their expression profiles to known metabolic activities involving the relevant active compounds, including phenolic acids, flavonoids, terpenoids, and fatty acids. We also analyzed the diversification of genes that process the active compounds to distinguish orthologs and paralogs together with the pathways concerning biosynthesis of phenolic acid and its connections with other related pathways.
Our study provides both an initial description of gene expression profiles in flowers of FLJ and its counterfeit rFLJ and the enzyme pool that can be used to evaluate FLJ quality. Detailed molecular-level analyses allow us to decipher the relationship between metabolic pathways involved in processing active medicinal compounds and gene expressions of their processing enzymes. Our evolutionary analysis revealed specific functional divergence of orthologs and paralogs, which leads to variation in gene functions that govern the profile of active compounds.
PMCID: PMC3443457  PMID: 22607188
RNA-seq; Transcriptome; Active compounds; Synthetic pathways; Flos Lonicerae Japonicae
6.  Comparative genomic analysis of Streptococcus suis reveals significant genomic diversity among different serotypes 
BMC Genomics  2011;12:523.
Streptococcus suis (S. suis) is a major swine pathogen and an emerging zoonotic agent. Serotypes 1, 2, 3, 7, 9, 14 and 1/2 are the most prevalent serotypes of this pathogen. However, almost all studies were carried out on serotype 2 strains. Therefore, characterization of genomic features of other serotypes will be required to better understand their virulence potential and phylogenetic relationships among different serotypes.
Four Chinese S. suis strains belonging to serotypes 1, 7, 9 and 1/2 were sequenced using a rapid, high-throughput approach. Based on the 13 corresponding serotype strains, including 9 previously completed genomes of this bacterium, a full comparative genomic analysis was performed. The results provide evidence that (i) the pan-genome of this species is open and the size increases with addition of new sequenced genomes, (ii) strains of serotypes 1, 3, 7 and 9 are phylogenetically distinct from serotype 2 strains, but all serotype 2 strains, plus the serotype 1/2 and 14 strains, are very closely related, and (iii) all these strains, except for the serotype 1 strain, could harbor a recombinant site for a pathogenic island (89 K) mediated by conjugal transfer, and may have the ability to gain the 89 K sequence.
There is significant genomic diversity among different strains in S. suis, and the gain and loss of large numbers of genes are involved in shaping their genomes. This is indicated by (i) pairwise gene content comparisons between every pair of these strains, (ii) the open pan-genome of this species, and (iii) the observed indels, inversions and rearrangements in the collinearity analysis. Phylogenetic relationships may be associated with serotype, as serotype 2 strains are closely related and distinct from other serotypes like 1, 3, 7 and 9, but more strains need to be sequenced to confirm this.
PMCID: PMC3227697  PMID: 22026465
7.  Comparative analysis of mitochondrial genomes between a wheat K-type cytoplasmic male sterility (CMS) line and its maintainer line 
BMC Genomics  2011;12:163.
Plant mitochondria, semiautonomous organelles that function as manufacturers of cellular ATP, have their own genome that has a slow rate of evolution and rapid rearrangement. Cytoplasmic male sterility (CMS), a common phenotype in higher plants, is closely associated with rearrangements in mitochondrial DNA (mtDNA), and is widely used to produce F1 hybrid seeds in a variety of valuable crop species. Novel chimeric genes deduced from mtDNA rearrangements causing CMS have been identified in several plants, such as rice, sunflower, pepper, and rapeseed, but there are very few reports about mtDNA rearrangements in wheat. In the present work, we describe the mitochondrial genome of a wheat K-type CMS line and compare it with its maintainer line.
The complete mtDNA sequence of a wheat K-type (with cytoplasm of Aegilops kotschyi) CMS line, Ks3, was assembled into a master circle (MC) molecule of 647,559 bp and found to harbor 34 known protein-coding genes, three rRNAs (18S, 26S, and 5S rRNAs), and 16 different tRNAs. Compared to our previously published sequence of a K-type maintainer line, Km3, we detected Ks3-specific mtDNA (> 100 bp, 11.38%) and repeats (> 100 bp, 29 units) as well as genes that are unique to each line: rpl5 was missing in Ks3 and trnH was absent from Km3. We also defined 32 single nucleotide polymorphisms (SNPs) in 13 protein-coding, albeit functionally irrelevant, genes, and predicted 22 unique ORFs in Ks3, representing potential candidates for K-type CMS. All these sequence variations are candidates for involvement in CMS. A comparative analysis of the mtDNA of several angiosperms, including those from Ks3, Km3, rice, maize, Arabidopsis thaliana, and rapeseed, showed that non-coding sequences of higher plants had mostly divergent multiple reorganizations during the mtDNA evolution of higher plants.
The complete mitochondrial genome of the wheat K-type CMS line Ks3 is very different from that of its maintainer line Km3, especially in non-coding sequences. Sequence rearrangement has produced novel chimeric ORFs, which may be candidate genes for CMS. Comparative analysis of several angiosperm mtDNAs indicated that non-coding sequences are the most frequently reorganized during mtDNA evolution in higher plants.
PMCID: PMC3079663  PMID: 21443807
8.  The complete genome of Zunongwangia profunda SM-A87 reveals its adaptation to the deep-sea environment and ecological role in sedimentary organic nitrogen degradation 
BMC Genomics  2010;11:247.
Zunongwangia profunda SM-A87, which was isolated from deep-sea sediment, is an aerobic, gram-negative bacterium that represents a new genus of Flavobacteriaceae. This is the first sequenced genome of a deep-sea bacterium from the phylum Bacteroidetes.
The Z. profunda SM-A87 genome has a single 5 128 187-bp circular chromosome with no extrachromosomal elements and harbors 4 653 predicted protein-coding genes. SM-A87 produces a large amount of capsular polysaccharides and possesses two polysaccharide biosynthesis gene clusters. It has a total of 130 peptidases, 61 of which have signal peptides. In addition to extracellular peptidases, SM-A87 also has various extracellular enzymes for carbohydrate, lipid and DNA degradation. These extracellular enzymes suggest that the bacterium is able to hydrolyze organic materials in the sediment, especially carbohydrates and proteinaceous organic nitrogen. There are two clustered regularly interspaced short palindromic repeats in the genome, but their spacers do not match any sequences in the public sequence databases. SM-A87 is a moderate halophile. Our protein isoelectric point analysis indicates that extracellular proteins have lower predicted isoelectric points than intracellular proteins. SM-A87 accumulates organic osmolytes in the cell, so its extracellular proteins are more halophilic than its intracellular proteins.
Here, we present the first complete genome of a deep-sea sedimentary bacterium from the phylum Bacteroidetes. The genome analysis shows that SM-A87 has some common features of deep-sea bacteria, as well as an important capacity to hydrolyze sedimentary organic nitrogen.
PMCID: PMC2864250  PMID: 20398413
9.  Genome evolution driven by host adaptations results in a more virulent and antimicrobial-resistant Streptococcus pneumoniae serotype 14 
BMC Genomics  2009;10:158.
Streptococcus pneumoniae serotype 14 is one of the most common pneumococcal serotypes that cause invasive pneumococcal diseases worldwide. Serotype 14 often expresses resistance to a variety of antimicrobial agents, resulting in difficulties in treatment. To gain insight into the evolution of virulence and antimicrobial resistance traits in S. pneumoniae from the genome level, we sequenced the entire genome of a serotype 14 isolate (CGSP14), and carried out comprehensive comparison with other pneumococcal genomes. Multiple serotype 14 clinical isolates were also genotyped by multilocus sequence typing (MLST).
Comparative genomic analysis revealed that the CGSP14 acquired a number of new genes by horizontal gene transfer (HGT), most of which were associated with virulence and antimicrobial resistance and clustered in mobile genetic elements. The most remarkable feature is the acquisition of two conjugative transposons and one resistance island encoding eight resistance genes. Results of MLST suggested that the major driving force for the genome evolution is the environmental drug pressure.
The genome sequence of S. pneumoniae serotype 14 shows a bacterium with rapid adaptations to its lifecycle in human community. These include a versatile genome content, with a wide range of mobile elements, and chromosomal rearrangement; the latter re-balanced the genome after events of HGT.
PMCID: PMC2678160  PMID: 19361343
10.  Analysis of tarantula skeletal muscle protein sequences and identification of transcriptional isoforms 
BMC Genomics  2009;10:117.
Tarantula has been used as a model system for studying skeletal muscle structure and function, yet data on the genes expressed in tarantula muscle are lacking.
We constructed a cDNA library from Aphonopelma sp. (Tarantula) skeletal muscle and obtained 2507 high-quality 5' ESTs (expressed sequence tags) from randomly picked clones. EST analysis showed 305 unigenes, among which 81 had more than 2 ESTs. Twenty abundant unigenes had matches to skeletal muscle-related genes including actin, myosin, tropomyosin, troponin-I, T and C, paramyosin, muscle LIM protein, muscle protein 20, α-actinin and tandem Ig/Fn motifs (found in giant sarcomere-related proteins). Matches to myosin light chain kinase and calponin were also identified. These results support the existence of both actin-linked and myosin-linked regulation in tarantula skeletal muscle.
We have predicted full-length as well as partial cDNA sequences both experimentally and computationally for myosin heavy and light chains, actin, tropomyosin, and troponin-I, T and C, and have deduced the putative peptides. A preliminary analysis of the structural and functional properties was also carried out. Sequence similarities suggested multiple isoforms of most myofibrillar proteins, supporting the generality of multiple isoforms known from previous muscle sequence studies. This may be related to a mix of muscle fiber types.
The present study serves as a basis for defining the transcriptome of tarantula skeletal muscle, for future in vitro expression of tarantula proteins, and for interpreting structural and functional observations in this model species.
PMCID: PMC2674065  PMID: 19298669
11.  A gene catalogue for post-diapause development of an anhydrobiotic arthropod Artemia franciscana 
BMC Genomics  2009;10:52.
Diapause is a reversible state of developmental suspension that is found among diverse taxa, from plants to animals, including marsupials and some other mammals. Although previous work has accumulated ample data, the molecular mechanisms underlying diapause and reactivation from it remain elusive.
Using Artemia franciscana, a model organism for studying the development of post-diapause embryos in arthropods, we sequenced random clones up to a total of 28,039 ESTs from four cDNA libraries made from dehydrated cysts and three time points after rehydration/reactivation, which were assembled into 8,018 unigene clusters. We identified 324 differentially-expressed genes (DEGs, P < 0.05) based on pairwise comparisons of the four cDNA libraries. We identified a group of genes that are involved in an anti-water-deficit system, including proteases, protease inhibitors, heat shock proteins, and several novel members of the late embryogenesis abundant (LEA) protein family. In addition, we classified most of the up-regulated genes after cyst reactivation into metabolism, biosynthesis, transcription, and translation, and this result is consistent with the rapid development of the embryo. Some of the specific expressions of DEGs were confirmed experimentally based on quantitative real-time PCR.
We found that the first 5-hour period after rehydration is most important for embryonic reactivation of Artemia. As the total number of expressed genes increases significantly, the majority of DEGs were also identified in this period, including a group of water-deficient-induced genes. A group of genes with similar functions have been described in plant seeds; for instance, one of the novel LEA members shares ~70% amino-acid identity with an Arabidopsis EM (embryonic abundant) protein, the closest animal relative to plant LEA families identified thus far. Our findings also suggested that not only nutrition, but also mRNAs are produced and stored during cyst formation to support rapid development after reactivation.
PMCID: PMC2649162  PMID: 19173719
12.  How many human genes can be defined as housekeeping with current expression data? 
BMC Genomics  2008;9:172.
Housekeeping (HK) genes are ubiquitously expressed in all tissue/cell types and constitute a basal transcriptome for the maintenance of basic cellular functions. Partitioning transcriptomes into HK and tissue-specific (TS) genes is fundamental for studying gene expression and cellular differentiation. Although many studies have aimed at large-scale and thorough categorization of human HK genes, a meaningful consensus has yet to be reached.
We collected the two latest gene expression datasets (both EST and microarray data) from public databases and analyzed the gene expression profiles in 18 human tissues that have been well-documented by both data types. Benchmarked by a manually-curated HK gene collection (HK408), we demonstrated that present data from EST sampling was far from saturated, and the inadequacy has limited the gene detectability and our understanding of TS expressions. Due to a likely over-stringent threshold, microarray data showed a higher false negative rate compared with EST data, leading to a significant underestimation of HK genes. Based on EST data, we found that 40.0% of the currently annotated human genes were universally expressed in at least 16 of 18 tissues, as compared to only 5.1% specifically expressed in a single tissue. Our current EST-based estimate on human HK genes ranged from 3,140 to 6,909 in number, a ten-fold increase in comparison with previous microarray-based estimates.
We concluded that a significant fraction of human genes, at least in the currently annotated data depositories, was broadly expressed. Our understanding of tissue-specific expression was still preliminary and required much more large-scale and high-quality transcriptomic data in future studies. The new HK gene list categorized in this study will be useful for genome-wide analyses on structural and functional features of HK genes.
PMCID: PMC2396180  PMID: 18416810
13.  A complete mitochondrial genome sequence of the wild two-humped camel (Camelus bactrianus ferus): an evolutionary history of camelidae 
BMC Genomics  2007;8:241.
The family Camelidae that evolved in North America during the Eocene survived with two distinct tribes, Camelini and Lamini. To investigate the evolutionary relationship between them and to further understand the evolutionary history of this family, we determined the complete mitochondrial genome sequence of the wild two-humped camel (Camelus bactrianus ferus), the only wild survivor of the Old World camel.
The mitochondrial genome sequence (16,680 bp) from C. bactrianus ferus contains 13 protein-coding, two rRNA, and 22 tRNA genes as well as a typical control region; this basic structure is shared by all metazoan mitochondrial genomes. Its protein-coding region exhibits codon usage common to all mammals and possesses the three cryptic stop codons shared by all vertebrates. C. bactrianus ferus together with the rest of mammalian species do not share a triplet nucleotide insertion (GCC) that encodes a proline residue found only in the nd1 gene of the New World camelid Lama pacos. That this lineage-specific insertion in the L. pacos mtDNA occurred after the split between the Old and New World camelids suggests that it may have functional implications, since a proline insertion in a protein backbone usually alters protein conformation significantly, and the nd1 gene has not been seen to be as polymorphic as the rest of the ND family genes among camelids. Our phylogenetic study based on complete mitochondrial genomes excluding the control region suggested that the divergence of the two tribes may have occurred in the early Miocene, much earlier than what was deduced from the fossil record (11 million years). An evolutionary history reconstructed for the family Camelidae based on cytb sequences suggested that the split of bactrian camel and dromedary may have occurred in North America before the tribe Camelini migrated from North America to Asia.
Molecular clock analysis of complete mitochondrial genomes from C. bactrianus ferus and L. pacos suggested that the two tribes diverged from their common ancestor about 25 million years ago, much earlier than what was predicted based on fossil records.
PMCID: PMC1939714  PMID: 17640355
14.  Transcriptome analysis of Deinagkistrodon acutus venomous gland focusing on cellular structure and functional aspects using expressed sequence tags 
BMC Genomics  2006;7:152.
The snake venom gland is a specialized organ that synthesizes and secretes complex and abundant toxin proteins. Though gene expression in the snake venom gland has been extensively studied, the focus has been on the components of the venom. We still know little about the molecular mechanism of toxin secretion and metabolism. A fundamental question therefore arises: what genes are expressed in the snake venom gland besides the many toxin components?
To examine extensively the transcripts expressed in the venom gland of Deinagkistrodon acutus and unveil the potential of its products on cellular structure and functional aspects, we generated 8696 expressed sequence tags (ESTs) from a non-normalized cDNA library. All ESTs were clustered into 3416 clusters, of which 40.16% of total ESTs belong to recognized toxin-coding sequences; 39.85% are similar to cellular transcripts; and 20.00% have no significant similarity to any known sequences. By analyzing cellular functional transcripts, we found high expression of some venom related genes and gland-specific genes, such as calglandulin EF-hand protein gene and protein disulfide isomerase gene. The transcripts of creatine kinase and NADH dehydrogenase were also identified at high level. Moreover, abundant cellular structural proteins similar to mammalian muscle tissues were also identified. The phylogenetic analysis of two snake venom toxin families of group III metalloproteinase and serine protease in suborder Colubroidea showed an early single recruitment event in the viperids evolutionary process.
Gene cataloguing and profiling of the venom gland of Deinagkistrodon acutus is an essential requisite to provide molecular reagents for functional genomic studies needed for elucidating mechanisms of action of toxins and surveying physiological events taking place in this very specialized secretory tissue. This study provides a first global view of the genetic programs for the venom gland of Deinagkistrodon acutus described so far and an insight into the molecular mechanism of toxin secretion.
All sequence data reported in this paper have been submitted to the public database [GenBank: DV556511-DV565206].
PMCID: PMC1525187  PMID: 16776837
15.  LocustDB: a relational database for the transcriptome and biology of the migratory locust (Locusta migratoria) 
BMC Genomics  2006;7:11.
The migratory locust (Locusta migratoria) is an orthopteran pest and a representative member of hemimetabolous insects for biological studies. Its transcriptomic data provide invaluable information for molecular entomology and pave a way for the comparative research of other medically, agronomically, and ecologically relevant insects. We developed the first transcriptomic database of the locust (LocustDB), building necessary infrastructures to integrate, organize, and retrieve data that are either currently available or to be acquired in the future.
LocustDB currently hosts 45,474 high-quality EST sequences from the locust, which were assembled into 12,161 unigenes. It, through user-friendly web interfaces, allows investigators to freely access sequence data, including homologous/orthologous sequences, functional annotations, and pathway analysis, based on conserved orthologous groups (COG), gene ontology (GO), protein domain (InterPro), and functional pathways (KEGG). It also provides information from comparative analysis based on data from the migratory locust and five other invertebrate species, including the silkworm, the honeybee, the fruitfly, the mosquito and the nematode. The website address of LocustDB is .
LocustDB starts with the first transcriptome information for an orthopteran and hemimetabolous insect and will be extended to provide a framework for incorporating in-coming genomic data of relevant insect groups and a workbench for cross-species comparative studies.
PMCID: PMC1388198  PMID: 16426458
16.  Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing 
BMC Genomics  2005;6:70.
Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls.
We have generated ~3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis.
The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig compared to human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to the structure in human.
The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse, with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shotgun sequences to the human-mouse alignment offers the investigator a rapid way of defining specific regions for analysis and resequencing.
PMCID: PMC1142312  PMID: 15885146
