
Results 1-20 (20)

1.  Potential impact on kidney infection: a whole-genome analysis of Leptospira santarosai serovar Shermani 
Leptospira santarosai serovar Shermani is the most frequently encountered serovar, and it causes leptospirosis and tubulointerstitial nephritis in Taiwan. This study aims to complete the genome sequence of L. santarosai serovar Shermani and analyze the transcriptional responses of L. santarosai serovar Shermani to renal tubular cells. To assemble this highly repetitive genome, we combined reads that were generated from four next-generation sequencing platforms by using hybrid assembly approaches to finish two-chromosome contiguous sequences without gaps by validating the data with optical restriction maps and Sanger sequencing. Whole-genome comparison studies revealed a 28-kb region containing genes that encode transposases and hypothetical proteins in L. santarosai serovar Shermani, but this region is absent in other pathogenic Leptospira spp. We found that lipoprotein gene expression in both L. santarosai serovar Shermani and L. interrogans serovar Copenhageni was upregulated upon interaction with renal tubular cells, and LSS19962, an L. santarosai serovar Shermani-specific gene within a 28-kb region that encodes hypothetical proteins, was upregulated in L. santarosai serovar Shermani-infected renal tubular cells. Lipoprotein expression during leptospiral infection might facilitate the interactions of leptospires within kidneys. The availability of the whole-genome sequence of L. santarosai serovar Shermani would make it the first completed sequence of this species, and its comparison with that of other Leptospira spp. may provide invaluable information for further studies in leptospiral pathogenesis.
PMCID: PMC4274889
hypothetical proteins; Leptospira santarosai; leptospirosis; repetitive genome; whole-genome sequencing
2.  A cytoplasmic RNA virus generates functional viral small RNAs and regulates viral IRES activity in mammalian cells 
Nucleic Acids Research  2014;42(20):12789-12805.
The roles of virus-derived small RNAs (vsRNAs) have been studied in plants and insects. However, the generation and function of small RNAs from cytoplasmic RNA viruses in mammalian cells remain unexplored. This study describes four vsRNAs that were detected in enterovirus 71-infected cells using next-generation sequencing and northern blots. Viral infection produced substantial levels (>10^5 copy numbers per cell) of vsRNA1, one of the four vsRNAs. We also demonstrated that Dicer is involved in vsRNA1 generation in infected cells. vsRNA1 overexpression inhibited viral translation and internal ribosomal entry site (IRES) activity in infected cells. Conversely, blocking vsRNA1 enhanced viral yield and viral protein synthesis. We also present evidence that vsRNA1 targets stem-loop II of the viral 5′ untranslated region and inhibits the activity of the IRES through this sequence-specific targeting. Our study demonstrates the ability of a cytoplasmic RNA virus to generate functional vsRNA in mammalian cells. In addition, we also demonstrate a potential novel mechanism for a positive-stranded RNA virus to regulate viral translation: generating a vsRNA that targets the IRES.
PMCID: PMC4227785  PMID: 25352551
3.  Novel Approach for Coexpression Analysis of E2F1–3 and MYC Target Genes in Chronic Myelogenous Leukemia 
BioMed Research International  2014;2014:439840.
Background. Chronic myelogenous leukemia (CML) is characterized by a tremendous number of immature myeloid cells in the blood circulation. E2F1–3 and MYC are important transcription factors that form positive feedback loops by reciprocal regulation in their own transcription processes. Since genes regulated by E2F1–3 or MYC are related to cell proliferation and apoptosis, we asked whether the coexpression patterns of genes regulated concurrently by E2F1–3 and MYC differ between the normal and the CML states. Results. We proposed a method to explore the difference in the coexpression patterns of those candidate target genes between the normal and the CML groups. A disease-specific cutoff point for coexpression levels that classified the coexpressed gene pairs into strong and weak coexpression classes was identified. Our developed method effectively identified the coexpression pattern differences from the overall structure. Moreover, we found that genes related to the cell adhesion and angiogenesis properties were more likely to be coexpressed in the normal group when compared to the CML group. Conclusion. Our findings may be helpful in exploring the underlying mechanisms of CML and provide useful information in cancer treatment.
PMCID: PMC4142389  PMID: 25180182
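The strong/weak classification of coexpressed gene pairs described in the abstract above can be illustrated with a minimal sketch. The abstract does not name the coexpression measure or the cutoff value, so the Pearson correlation, the cutoff of 0.8, and the gene names and expression values below are all assumptions chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / sqrt(var_x * var_y)


def classify_pairs(expr, cutoff):
    """Label every gene pair 'strong' or 'weak' by |r| against the cutoff."""
    classes = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        classes[(g1, g2)] = "strong" if abs(r) >= cutoff else "weak"
    return classes


# Hypothetical expression profiles (one value per sample).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, -1.0, 1.0, -1.0],
}
classes = classify_pairs(expr, cutoff=0.8)  # cutoff value is an assumption
```

Running the same classification separately on the normal and the CML samples, then comparing the two label sets, would be one way to look for pattern differences between the groups.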
4.  ChIPseek, a web-based analysis tool for ChIP data 
BMC Genomics  2014;15(1):539.
Chromatin is a dynamic but highly regulated structure. DNA-binding proteins such as transcription factors, epigenetic and chromatin modifiers are responsible for regulating specific gene expression patterns and may result in different phenotypes. To reveal the identity of the proteins associated with the specific region on DNA, chromatin immunoprecipitation (ChIP) is the most widely used technique. ChIP assay followed by next generation sequencing (ChIP-seq) or microarray (ChIP-chip) is often used to study patterns of protein-binding profiles in different cell types and in cancer samples on a genome-wide scale. However, only a limited number of bioinformatics tools are available for ChIP dataset analysis.
We present ChIPseek, a web-based tool for ChIP data analysis providing summary statistics in graphs and offering several commonly demanded analyses. ChIPseek can provide statistical summary of the dataset including histogram of peak length distribution, histogram of distances to the nearest transcription start site (TSS), and pie chart (or bar chart) of genomic locations for users to have a comprehensive view on the dataset for further analysis. For examining the potential functions of peaks, ChIPseek provides peak annotation, visualization of peak genomic location, motif identification, sequence extraction, and comparison between datasets. Beyond that, ChIPseek also offers users the flexibility to filter peaks and re-analyze the filtered subset of peaks. ChIPseek supports 20 different genome assemblies for 12 model organisms including human, mouse, rat, worm, fly, frog, zebrafish, chicken, yeast, fission yeast, Arabidopsis, and rice. We use demo datasets to demonstrate the usage and intuitive user interface of ChIPseek.
ChIPseek provides a user-friendly interface for biologists to analyze large-scale ChIP data without requiring any programming skills. All the results and figures produced by ChIPseek can be downloaded for further analysis. The analysis tools built into ChIPseek, especially the ones for selecting and examining a subset of peaks from ChIP data, provide invaluable help for exploring the high-throughput data from either ChIP-seq or ChIP-chip. ChIPseek is freely available at
PMCID: PMC4092222  PMID: 24974934
ChIP-seq; ChIP-chip; Analysis tool; Web-services; Peak annotation; Motif identification; Filter tools; Comparison
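The summary statistics named in the ChIPseek abstract (peak length distribution and distance to the nearest transcription start site) can be sketched in a few lines. This is not ChIPseek's code; the peak coordinates, TSS positions, bin width, and function names are hypothetical, and the sketch assumes all features lie on a single chromosome.

```python
from bisect import bisect_left


def peak_length(start, end):
    """Length of a peak given half-open [start, end) coordinates."""
    return end - start


def nearest_tss_distance(position, sorted_tss):
    """Absolute distance from a position to the closest TSS (list must be sorted)."""
    i = bisect_left(sorted_tss, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_tss[max(0, i - 1): i + 1]
    return min(abs(position - t) for t in candidates)


def histogram(values, bin_width):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = {}
    for v in values:
        edge = (v // bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical peaks (start, end) and TSS positions on one chromosome.
peaks = [(100, 300), (950, 1100), (5000, 5400)]
tss = sorted([1000, 4000])

lengths = [peak_length(s, e) for s, e in peaks]
midpoints = [(s + e) // 2 for s, e in peaks]
distances = [nearest_tss_distance(m, tss) for m in midpoints]
length_hist = histogram(lengths, bin_width=100)
```

Measuring from the peak midpoint is one convention among several; measuring from the peak summit or the nearest peak edge would give different distances.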
5.  Interference of Co-Amplified Nuclear Mitochondrial DNA Sequences on the Determination of Human mtDNA Heteroplasmy by Using the SURVEYOR Nuclease and the WAVE HS System 
PLoS ONE  2014;9(3):e92817.
High-sensitivity and high-throughput mutation detection techniques are useful for screening the homoplasmy or heteroplasmy status of mitochondrial DNA (mtDNA), but might be susceptible to interference from nuclear mitochondrial DNA sequences (NUMTs) co-amplified during polymerase chain reaction (PCR). In this study, we first evaluated the platform of SURVEYOR Nuclease digestion of heteroduplexed DNA followed by the detection of cleaved DNA by using the WAVE HS System (SN/WAVE-HS) for detecting human mtDNA variants and found that its performance was slightly better than that of denaturing high-performance liquid chromatography (DHPLC). The potential interference from co-amplified NUMTs on screening mtDNA heteroplasmy when using these 2 highly sensitive techniques was further examined by using 2 published primer sets containing a total of 65 primer pairs, which were originally designed to be used with one of the 2 techniques. We confirmed that 24 primer pairs could amplify NUMTs by conducting bioinformatic analysis and PCR with the DNA from 143B-ρ0 cells. Using mtDNA extracted from the mitochondria of human 143B cells and a cybrid line with the nuclear background of 143B-ρ0 cells, we demonstrated that NUMTs could affect the patterns of chromatograms for cell DNA during SN/WAVE-HS analysis of mtDNA, leading to incorrect judgment of mtDNA homoplasmy or heteroplasmy status. However, we observed such interference only in 2 of 24 primer pairs selected, and did not observe such effects during DHPLC analysis. These results indicate that NUMTs can affect the screening of low-level mtDNA variants, but it might not be predicted by bioinformatic analysis or the amplification of DNA from 143B-ρ0 cells. Therefore, using purified mtDNA from cultured cells with proven purity to evaluate the effects of NUMTs from a primer pair on mtDNA detection by using PCR-based high-sensitivity methods prior to the use of a primer pair in real studies would be a more practical strategy.
PMCID: PMC3963942  PMID: 24664244
6.  Identification and characterisation of microRNAs in young adults of Angiostrongylus cantonensis via a deep-sequencing approach 
Memórias do Instituto Oswaldo Cruz  2013;108(6):699-706.
Angiostrongylus cantonensis is an important causative agent of eosinophilic meningitis and eosinophilic meningoencephalitis in humans. MicroRNAs (miRNAs) are small non-coding RNAs that participate in a wide range of biological processes. This study employed a deep-sequencing approach to study miRNAs from young adults of A. cantonensis. Based on 16,880,456 high-quality reads, 252 conserved mature miRNAs belonging to 90 families, including 10 antisense miRNAs, were identified and characterised. Among these sequences, 53 miRNAs from 25 families displayed 50 or more reads. The conserved miRNA families were divided into four groups according to their phylogenetic distribution and a total of nine families without any members showing homology to other nematodes or adult worms were identified. Stem-loop real-time polymerase chain reaction analysis of aca-miR-1-1 and aca-miR-71-1 demonstrated that their level of expression increased dramatically from infective larvae to young adults and then decreased in adult worms, with the male worms exhibiting significantly higher levels of expression than female worms. These findings provide information related to the regulation of gene expression during the growth, development and pathogenesis of young adults of A. cantonensis.
PMCID: PMC3970689  PMID: 24037191
Angiostrongylus cantonensis; deep-sequencing approach; microRNA; stem-loop real-time polymerase chain reaction; young adults
7.  Transcriptome profiling of the fifth-stage larvae of Angiostrongylus cantonensis by next-generation sequencing 
Parasitology Research  2013;112(9):3193-3202.
Angiostrongylus cantonensis is an important zoonotic nematode. It is the causative agent of eosinophilic meningitis and eosinophilic meningoencephalitis in humans. However, information on this parasite at the genomic level is very limited. In the present study, the transcriptomic profiles of the fifth-stage larvae (L5) of A. cantonensis were investigated by next-generation sequencing (NGS). In the NGS database established from the larvae isolated from the brain of Sprague–Dawley rats, 31,487 unique genes with a mean length of 617 nucleotides were assembled. These genes were found to have a 46.08% significant similarity to Caenorhabditis elegans by BLASTx. They were then compared with the expressed sequence tags of 18 other nematodes, and significant matches of 36.09–59.12% were found. Among these genes, 3,338 were found to participate in 124 Kyoto Encyclopedia of Genes and Genomes pathways. These pathways included 1,514 metabolisms, 846 genetic information processing, 358 environmental information processing, 264 cellular processes, and 91 organismal systems. Analysis of 30,816 sequences with the gene ontology database indicated that their annotations included 5,656 biological processes (3,364 cellular processes, 3,061 developmental processes, and 3,191 multicellular organismal processes), 7,218 molecular functions (4,597 binding and 3,084 catalytic activities), and 4,719 cellular components (4,459 cell parts and 4,466 cells). Moreover, stress-related genes (112 heat stress and 33 oxidation stress) and genes for proteases (159) were not uncommon. This study is the first NGS-based study to set up a transcriptomic database of A. cantonensis L5. The results provide new insights into the survival, development, and host–parasite interactions of this blood-feeding nematode.
Electronic supplementary material
The online version of this article (doi:10.1007/s00436-013-3495-z) contains supplementary material, which is available to authorized users.
PMCID: PMC3742962  PMID: 23828188
8.  Comparative Transcriptomic and Proteomic Analyses of Trichomonas vaginalis following Adherence to Fibronectin 
Infection and Immunity  2012;80(11):3900-3911.
The morphological transformation of Trichomonas vaginalis from an ellipsoid form in batch culture to an adherent amoeboid form results from the contact of parasites with vaginal epithelial cells and with immobilized fibronectin (FN), a basement membrane component. This suggests host signaling of the parasite. We applied integrated transcriptomic and proteomic approaches to investigate the molecular responses of T. vaginalis upon binding to FN. A transcriptome analysis was performed by using large-scale expressed-sequence-tag (EST) sequencing. A total of 20,704 ESTs generated from batch culture (trophozoite-EST) versus FN-amoeboid trichomonad (FN-EST) cDNA libraries were analyzed. The FN-EST library revealed decreased amounts of transcripts that were of lower abundance in the trophozoite-EST library. There was a shift by FN-bound organisms to the expression of transcripts encoding essential proteins, possibly indicating the expression of genes for adaptation to the morphological changes needed for the FN-adhesive processes. In addition, we identified 43 differentially expressed proteins in the proteomes of FN-bound and unbound trichomonads. Among these proteins, cysteine peptidase, glyceraldehyde-3-phosphate dehydrogenase (an FN-binding protein), and stress-related proteins were upregulated in the FN-adherent cells. Stress-related genes and proteins were highly expressed in both the transcriptome and proteome of FN-bound organisms, implying that these genes and proteins may play critical roles in the response to adherence. This is the first report of a comparative proteomic and transcriptomic analysis after the binding of T. vaginalis to FN. This approach may lead to the discovery of novel virulence genes and affirm the role of genes involved in disease pathogenesis. This knowledge will permit a greater understanding of the complex host-parasite interplay.
PMCID: PMC3486053  PMID: 22927047
9.  Peripheral Immune Cell Gene Expression Changes in Advanced Non-Small Cell Lung Cancer Patients Treated with First Line Combination Chemotherapy 
PLoS ONE  2013;8(2):e57053.
Increasing evidence has shown that immune surveillance is compromised in a tumor-promoting microenvironment for patients with non-small cell lung cancer (NSCLC), and can be restored by appropriate chemotherapy.
To test this hypothesis, we analyzed microarray gene expression profiles of peripheral blood mononuclear cells from 30 patients with newly-diagnosed advanced stage NSCLC, and 20 age-, sex-, and co-morbidity-matched healthy controls. All the patients received a median of four courses of chemotherapy with cisplatin and gemcitabine for a 28-day cycle as first line treatment.
Sixty-nine differentially expressed genes between the patients and controls, and 59 differentially expressed genes before and after chemotherapy were identified. The IL4 pathway was significantly enriched in both tumor progression and chemotherapy signatures. CXCR4 and IL2RG were down-regulated, while DOK2 and S100A15 were up-regulated in the patients, and expressions of all four genes were partially or totally reversed after chemotherapy. Real-time quantitative RT-PCR for the four up-regulated (S100A15, DOK2) and down-regulated (TLR7, TOP1MT) genes in the patients, and the six up-regulated (TLR7, CRISP3, TOP1MT) and down-regulated (S100A15, DOK2, IL2RG) genes after chemotherapy confirmed the validity of the microarray results. Further immunohistochemical analysis of the paraffin-embedded lung cancer tissues identified strong S100A15 nuclear staining not only in stage IV NSCLC as compared to stage IIIB NSCLC (p = 0.005), but also in patients with stable or progressive disease as compared to those with a partial response (p = 0.032). A high percentage of S100A15 nuclear stained cells (HR 1.028, p = 0.01) was the only independent factor associated with three-year overall mortality.
Our results suggest a potential role of the IL4 pathway in immune surveillance of advanced stage NSCLC, and immune potentiation of combination chemotherapy. S100A15 may serve as a potential biomarker for tumor staging, and a predictor of poor prognosis in NSCLC.
PMCID: PMC3581559  PMID: 23451142
10.  Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling 
Genome Biology  2013;14(2):R11.
The Amoebozoa constitute one of the primary divisions of eukaryotes, encompassing taxa of both biomedical and evolutionary importance, yet its genomic diversity remains largely unsampled. Here we present an analysis of a whole genome assembly of Acanthamoeba castellanii (Ac), the first representative from a solitary free-living amoebozoan.
Ac encodes 15,455 compact intron-rich genes, a significant number of which are predicted to have arisen through inter-kingdom lateral gene transfer (LGT). A majority of the LGT candidates have undergone a substantial degree of intronization and Ac appears to have incorporated them into established transcriptional programs. Ac manifests a complex signaling and cell communication repertoire, including a complete tyrosine kinase signaling toolkit and a comparable diversity of predicted extracellular receptors to that found in the facultatively multicellular dictyostelids. An important environmental host of a diverse range of bacteria and viruses, Ac utilizes a diverse repertoire of predicted pattern recognition receptors, many with predicted orthologous functions in the innate immune systems of higher organisms.
Our analysis highlights the important role of LGT in the biology of Ac and in the diversification of microbial eukaryotes. The early evolution of a key signaling facility implicated in the evolution of metazoan multicellularity strongly argues for its emergence early in the Unikont lineage. Overall, the availability of an Ac genome should aid in deciphering the biology of the Amoebozoa and facilitate functional genomic studies in this important model organism and environmental host.
PMCID: PMC4053784  PMID: 23375108
11.  FastAnnotator- an efficient transcript annotation web tool 
BMC Genomics  2012;13(Suppl 7):S9.
Recent developments in high-throughput sequencing (HTS) technologies have made it feasible to sequence the complete transcriptomes of non-model organisms or metatranscriptomes from environmental samples. The challenge after generating hundreds of millions of sequences is to annotate these transcripts and classify the transcripts based on their putative functions. Because many biological scientists lack the knowledge to install Linux-based software packages or maintain databases used for transcript annotation, we developed an automatic annotation tool with an easy-to-use interface.
To elucidate the potential functions of gene transcripts, we integrated well-established annotation tools: Blast2GO, PRIAM and RPS BLAST in a web-based service, FastAnnotator, which can assign Gene Ontology (GO) terms, Enzyme Commission numbers (EC numbers) and functional domains to query sequences.
Using six transcriptome sequence datasets as examples, we demonstrated the ability of FastAnnotator to assign functional annotations. FastAnnotator annotated 88.1% and 81.3% of the transcripts from the well-studied organisms Caenorhabditis elegans and Streptococcus parasanguinis, respectively. Furthermore, FastAnnotator annotated 62.9%, 20.4%, 53.1% and 42.0% of the sequences from the transcriptomes of sweet potato, clam, amoeba, and Trichomonas vaginalis, respectively, which lack reference genomes. We demonstrated that FastAnnotator can complete the annotation process in a reasonable amount of time and is suitable for the annotation of transcriptomes from model organisms or organisms for which annotated reference genomes are not available.
The sequencing process no longer represents the bottleneck in the study of genomics, and automatic annotation tools have become invaluable as the annotation procedure has become the limiting step. We present FastAnnotator, an automated annotation web tool designed to efficiently annotate sequences with their gene functions, enzyme functions or domains. FastAnnotator is useful in transcriptome studies and especially for those focusing on non-model organisms or metatranscriptomes. FastAnnotator does not require local installation and is freely available at
PMCID: PMC3521244  PMID: 23281853
12.  Transcriptomic Identification of Iron-Regulated and Iron-Independent Gene Copies within the Heavily Duplicated Trichomonas vaginalis Genome 
Genome Biology and Evolution  2012;4(10):1017-1029.
Gene duplication is an important evolutionary mechanism and no eukaryote has more duplicated gene families than the parasitic protist Trichomonas vaginalis. Iron is an essential nutrient for Trichomonas and plays a pivotal role in the establishment of infection, proliferation, and virulence. To gain insight into the role of iron in T. vaginalis gene expression and genome evolution, we screened iron-regulated genes using an oligonucleotide microarray for T. vaginalis and by comparative EST (expressed sequence tag) sequencing of cDNA libraries derived from trichomonads cultivated under iron-rich (+Fe) and iron-restricted (−Fe) conditions. Among 19,000 ESTs from both libraries, we identified 336 iron-regulated genes, of which 165 were upregulated under +Fe conditions and 171 under −Fe conditions. The microarray analysis revealed that 195 of 4,950 unique genes were differentially expressed. Of these, 117 genes were upregulated under +Fe conditions and 78 were upregulated under −Fe conditions. The results of both methods were congruent concerning the regulatory trends and the representation of gene categories. Under +Fe conditions, the expression of proteins involved in carbohydrate metabolism, particularly in the energy metabolism of hydrogenosomes, and in methionine catabolism was increased. The iron–sulfur cluster assembly machinery and certain cysteine proteases are of particular importance among the proteins upregulated under −Fe conditions. A unique feature of the T. vaginalis genome is the retention during evolution of multiple paralogous copies for a majority of all genes. Although the origins and reasons for this gene expansion remain unclear, the retention of multiple gene copies could provide an opportunity to evolve differential expression during growth in variable environmental conditions. 
For genes whose expression was affected by iron, we found that iron influenced the expression of only some of the paralogous copies, whereas the expression of the other paralogs was iron independent. This finding indicates a very stringent regulation of the differentially expressed paralogous genes in response to changes in the availability of exogenous nutrients and provides insight into the evolutionary rationale underlying massive paralog retention in the Trichomonas genome.
PMCID: PMC3490414  PMID: 22975721
gene duplication; iron; microarrays; EST analysis
13.  Complete Genome and Transcriptomes of Streptococcus parasanguinis FW213: Phylogenic Relations and Potential Virulence Mechanisms 
PLoS ONE  2012;7(4):e34769.
Streptococcus parasanguinis, a primary colonizer of the tooth surface, is also an opportunistic pathogen for subacute endocarditis. The complete genome of strain FW213 was determined using the traditional shotgun sequencing approach and further refined by the transcriptomes of cells in early exponential and early stationary growth phases in this study. The transcriptomes also discovered 10 transcripts encoding known hypothetical proteins, one pseudogene, five transcripts matched to Rfam and an additional 87 putative small RNAs within the intergenic regions defined by the GLIMMER analysis. The genome contains five acquired genomic islands (GIs) encoding proteins which potentially contribute to the overall pathogenic capacity and fitness of this microbe. The differential expression of the GIs and various open reading frames outside the GIs at the two growth phases suggested that FW213 possesses a range of mechanisms to avoid host immune clearance, to colonize host tissues, to survive within oral biofilms and to overcome various environmental insults. Furthermore, the comparative genome analysis of five S. parasanguinis strains indicates that although S. parasanguinis strains are highly conserved, variations in the genome content exist. These variations may reflect differences in pathogenic potential between the strains.
PMCID: PMC3329508  PMID: 22529932
14.  DSAP: deep-sequencing small RNA analysis pipeline 
Nucleic Acids Research  2010;38(Web Server issue):W385-W391.
DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam; and (iv) known miRNA matching: detection of known miRNAs in miRBase based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at
PMCID: PMC2896168  PMID: 20478825
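The first two DSAP steps described above (cleanup and clustering of a tab-delimited tag/count file) can be sketched as follows. This is not DSAP's implementation: the adaptor sequence is a hypothetical placeholder, the function names are invented for illustration, and treating "poly-A/T/C/G/N removal" as discarding tags that consist of a single repeated base is an assumption about what that step means.

```python
import re
from collections import Counter

ADAPTOR = "TCGTATGCCG"  # hypothetical 3' adaptor, not taken from the source

# A tag made of one repeated base (poly-A/T/C/G/N) is treated as uninformative.
POLY_RUN = re.compile(r"^(A+|T+|C+|G+|N+)$")


def clean(tag, adaptor=ADAPTOR):
    """Step (i): cut at the adaptor and reject empty or single-base tags."""
    pos = tag.find(adaptor)
    if pos != -1:
        tag = tag[:pos]
    if not tag or POLY_RUN.match(tag):
        return None
    return tag


def cluster(lines, adaptor=ADAPTOR):
    """Step (ii): merge identical cleaned tags and sum their copy numbers."""
    clusters = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tag, count = line.split("\t")
        cleaned = clean(tag.upper(), adaptor)
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return clusters


# Hypothetical input: "<tag>\t<copies>" per line, as the abstract describes.
example_lines = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCG\t120",
    "TGAGGTAGTAGGTTGTATAGTT\t30",
    "AAAAAAAAAAAAAAAA\t5",
    "NNNNTCGTATGCCG\t2",
]
example_clusters = cluster(example_lines)
```

In this toy input the first two lines collapse into one cluster once the adaptor is trimmed, and the last two are discarded. Steps (iii) and (iv) would then align each cluster sequence against Rfam and miRBase, which needs an alignment tool and is outside this sketch.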
15.  Trichomonas vaginalis vast BspA-like gene family: evidence for functional diversity from structural organisation and transcriptomics 
BMC Genomics  2010;11:99.
Trichomonas vaginalis is the most common non-viral human sexually transmitted pathogen and importantly, contributes to facilitating the spread of HIV. Yet very little is known about its surface and secreted proteins mediating interactions with, and permitting the invasion and colonisation of, the host mucosa. Initial annotations of T. vaginalis genome identified a plethora of candidate extracellular proteins.
Data mining of the T. vaginalis genome identified 911 BspA-like entries (TvBspA) sharing TpLRR-like leucine-rich repeats, which represent the largest gene family encoding potential extracellular proteins for the pathogen. A broad range of microorganisms encoding BspA-like proteins was identified, and these are mainly known to live on mucosal surfaces; among these, T. vaginalis is endowed with the largest gene family. Over 190 TvBspA proteins with inferred transmembrane domains were characterised by a considerable structural diversity between their TpLRR and other types of repetitive sequences and two subfamilies possessed distinct classic sorting signal motifs for endocytosis. One TvBspA subfamily also shared a glycine-rich protein domain with proteins from Clostridium difficile pathogenic strains and C. difficile phages. Consistent with the hypothesis that TvBspA protein structural diversity implies diverse roles, we demonstrated for several TvBspA genes differential expression at the transcript level in different growth conditions. Identified variants of repetitive segments between several TvBspA paralogues and orthologues from two clinical isolates were also consistent with TpLRR and other repetitive sequences being functionally important. For one TvBspA protein, cell surface expression and antibody responses by both female and male T. vaginalis infected patients were also demonstrated.
The biased mucosal habitat for microbial species encoding BspA-like proteins, the characterisation of a vast structural diversity for the TvBspA proteins, differential expression of a subset of TvBspA genes and the cellular localisation and immunological data for one TvBspA; all point to the importance of the TvBspA proteins to various aspects of T. vaginalis pathobiology at the host-pathogen interface.
PMCID: PMC2843621  PMID: 20144183
16.  Genome evolution driven by host adaptations results in a more virulent and antimicrobial-resistant Streptococcus pneumoniae serotype 14 
BMC Genomics  2009;10:158.
Streptococcus pneumoniae serotype 14 is one of the most common pneumococcal serotypes that cause invasive pneumococcal diseases worldwide. Serotype 14 often expresses resistance to a variety of antimicrobial agents, resulting in difficulties in treatment. To gain insight into the evolution of virulence and antimicrobial resistance traits in S. pneumoniae from the genome level, we sequenced the entire genome of a serotype 14 isolate (CGSP14), and carried out comprehensive comparison with other pneumococcal genomes. Multiple serotype 14 clinical isolates were also genotyped by multilocus sequence typing (MLST).
Comparative genomic analysis revealed that CGSP14 acquired a number of new genes by horizontal gene transfer (HGT), most of which were associated with virulence and antimicrobial resistance and clustered in mobile genetic elements. The most remarkable feature is the acquisition of two conjugative transposons and one resistance island encoding eight resistance genes. Results of MLST suggested that the major driving force for the genome evolution is the environmental drug pressure.
The genome sequence of S. pneumoniae serotype 14 shows a bacterium with rapid adaptations to its lifecycle in the human community. These include a versatile genome content, with a wide range of mobile elements, and chromosomal rearrangement; the latter re-balanced the genome after events of HGT.
PMCID: PMC2678160  PMID: 19361343
17.  A novel DNA sequence periodicity decodes nucleosome positioning 
Nucleic Acids Research  2008;36(19):6228-6236.
There have been two types of well-characterized DNA sequence periodicities; both are found to be associated with important molecular mechanisms. One is a 3-nt periodicity corresponding to codon triplets, the other is a 10.5-nt periodicity related to the structure of DNA helixes. In the process of analyzing the genome and transcriptome of Trichomonas vaginalis, we observed a 120.9-nt periodicity along DNA sequences. Different from the 3- and 10.5-nt periodicities, this novel periodicity originates near the 5′-end of transcripts, extends along the direction of transcription, and weakens gradually along transcripts. As a result, codon usage as well as amino acid composition is constrained by this periodicity. Similar periodicities were also identified in other organisms, but with variable length associated with the length of nucleosome units. We validated this association experimentally in T. vaginalis, and demonstrated that the periodicity manifests nucleotide variations between linker-DNA and wrapping-DNA along nucleosome array. We conclude that this novel DNA sequence periodicity is a signature of nucleosome organization suggesting that nucleosomes are well-positioned with regularity, especially near the 5′-end of transcripts.
PMCID: PMC2577358  PMID: 18829715
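The abstract above does not say how the sequence periodicity was measured. As an illustration only, one common way to look for a repeating spacing in a sequence is the autocorrelation of a nucleotide indicator signal; the sketch below swaps in that technique and applies it to a synthetic sequence with a known period of 5. All function names and the toy sequence are hypothetical.

```python
def indicator(seq, bases):
    """1 where the sequence has one of the given bases, else 0."""
    return [1 if ch in bases else 0 for ch in seq.upper()]


def autocorrelation(signal, lag):
    """Mean product of the signal with itself shifted by `lag` positions."""
    n = len(signal) - lag
    if n <= 0:
        raise ValueError("lag must be shorter than the signal")
    return sum(signal[i] * signal[i + lag] for i in range(n)) / n


def strongest_lag(seq, bases, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the highest autocorrelation."""
    signal = indicator(seq, bases)
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(signal, lag))


# Synthetic sequence: an 'A' recurs exactly every 5 nucleotides.
toy_sequence = "ACGCG" * 40
best_lag = strongest_lag(toy_sequence, "A", min_lag=2, max_lag=8)
```

Integer lags cannot resolve fractional periods such as the 10.5-nt or 120.9-nt values quoted in the abstract; estimating those would call for a spectral method (for example a Fourier transform over many aligned sequences), which this sketch does not attempt.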
18.  Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis 
Science (New York, N.Y.)  2007;315(5809):207-212.
We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene families implicated in pathogenesis and phagocytosis of host proteins may exemplify adaptations of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria.
PMCID: PMC2080659  PMID: 17218520
20.  The genome sequence of Salmonella enterica serovar Choleraesuis, a highly invasive and resistant zoonotic pathogen 
Nucleic Acids Research  2005;33(5):1690-1698.
Salmonella enterica serovar Choleraesuis (S.Choleraesuis), a highly invasive serovar among non-typhoidal Salmonella, usually causes sepsis or extra-intestinal focal infections in humans. S.Choleraesuis infections have now become particularly difficult to treat because of the emergence of resistance to multiple antimicrobial agents. The 4.7 Mb genome sequence of a multidrug-resistant S.Choleraesuis strain SC-B67 was determined. Genome wide comparison of three sequenced Salmonella genomes revealed that more deletion events occurred in S.Choleraesuis SC-B67 and S.Typhi CT18 relative to S.Typhimurium LT2. S.Choleraesuis has 151 pseudogenes, which, among the three Salmonella genomes, include the highest percentage of pseudogenes arising from the genes involved in bacterial chemotaxis signal-transduction pathways. Mutations in these genes may increase smooth swimming of the bacteria, potentially allowing more effective interactions with and invasion of host cells to occur. A key regulatory gene of TetR/AcrR family, acrR, was inactivated through the introduction of an internal stop codon resulting in overexpression of AcrAB that appears to be associated with ciprofloxacin resistance. While lateral gene transfer providing basic functions to allow niche expansion in the host and environment is maintained during the evolution of different serovars of Salmonella, genes providing little overall selective benefit may be lost rapidly. Our findings suggest that the formation of pseudogenes may provide a simple evolutionary pathway that complements gene acquisition to enhance virulence and antimicrobial resistance in S.Choleraesuis.
PMCID: PMC1069006  PMID: 15781495
