Results 1-13 (13)

1.  MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm 
Microbiome  2014;2:26.
Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions.
We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 MB versus 13 to 14 MB) but has a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that it performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity.
The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments. MaxBin is available at
PMCID: PMC4129434  PMID: 25136443
Binning; Metagenomics; Expectation-maximization algorithm
2.  Effects of 6-Hydroxyflavone on Osteoblast Differentiation in MC3T3-E1 Cells 
Osteoblast differentiation plays an essential role in bone integrity. Isoflavones and some flavonoids are reported to have osteogenic activity and potentially possess the ability to treat osteoporosis. However, limited information concerning the osteogenic characteristics of hydroxyflavones is available. This study investigates the effects of various hydroxyflavones on osteoblast differentiation in MC3T3-E1 cells. The results showed that 6-hydroxyflavone (6-OH-F) and 7-hydroxyflavone (7-OH-F) stimulated ALP activity. However, baicalein and luteolin inhibited ALP activity and flavone showed no effect. Up to 50 μM of each compound was used in the cytotoxicity study; flavone, 6-OH-F, and 7-OH-F showed no cytotoxicity toward MC3T3-E1 cells. Moreover, 6-OH-F activated AKT and serine/threonine kinases (also known as protein kinase B or PKB), extracellular signal-regulated kinases (ERK 1/2), and the c-Jun N-terminal kinase (JNK) signaling pathways. On the other hand, 7-OH-F promoted osteoblast differentiation mainly by activating ERK 1/2 signaling pathways. Finally, after 5 weeks of 6-OH-F induction, MC3T3-E1 cells showed a significant increase in the calcein staining intensity relative to the barely visible mineralization observed in cells cultured in the osteogenic medium only. These results suggested that 6-OH-F could activate AKT, ERK 1/2, and JNK signaling pathways to effectively promote osteoblastic differentiation.
PMCID: PMC3984785  PMID: 24795772
3.  Backpropagating Action Potentials Enable Detection of Extrasynaptic Glutamate by NMDA Receptors 
Cell Reports  2012;1(5):495-505.
Synaptic NMDA receptors (NMDARs) are crucial for neural coding and plasticity. However, little is known about the adaptive function of extrasynaptic NMDARs occurring mainly on dendritic shafts. Here, we find that in CA1 pyramidal neurons, backpropagating action potentials (bAPs) recruit shaft NMDARs exposed to ambient glutamate. In contrast, spine NMDARs are “protected,” under baseline conditions, from such glutamate influences by perisynaptic transporters: we detect bAP-evoked Ca2+ entry through these receptors upon local synaptic or photolytic glutamate release. During theta-burst firing, NMDAR-dependent Ca2+ entry either downregulates or upregulates an h-channel conductance (Gh) of the cell depending on whether synaptic glutamate release is intact or blocked. Thus, the balance between activation of synaptic and extrasynaptic NMDARs can determine the sign of Gh plasticity. Gh plasticity in turn regulates dendritic input probed by local glutamate uncaging. These results uncover a metaplasticity mechanism potentially important for neural coding and memory formation.
PMCID: PMC3740263  PMID: 22832274
4.  Oral Spirochetes Implicated in Dental Diseases Are Widespread in Normal Human Subjects and Carry Extremely Diverse Integron Gene Cassettes 
Applied and Environmental Microbiology  2012;78(15):5288-5296.
The NIH Human Microbiome Project (HMP) has produced several hundred metagenomic data sets, allowing studies of the many functional elements in human-associated microbial communities. Here, we survey the distribution of oral spirochetes implicated in dental diseases in normal human individuals, using recombination sites associated with the chromosomal integron in Treponema genomes, taking advantage of the multiple copies of the integron recombination sites (repeats) in the genomes, and using a targeted assembly approach that we have developed. We find that integron-containing Treponema species are present in ∼80% of the normal human subjects included in the HMP. Further, we are able to de novo assemble the integron gene cassettes using our constrained assembly approach, which employs a unique application of the de Bruijn graph assembly information; most of these cassette genes were not assembled in whole-metagenome assemblies and could not be identified by mapping sequencing reads onto the known reference Treponema genomes due to the dynamic nature of integron gene cassettes. Our study significantly enriches the gene pool known to be carried by Treponema chromosomal integrons, totaling 826 (598 97% nonredundant) genes. We characterize the functions of these gene cassettes: many of these genes have unknown functions. The integron gene cassette arrays found in the human microbiome are extraordinarily dynamic, with different microbial communities sharing only a small number of common genes.
PMCID: PMC3416431  PMID: 22635997
5.  The gain and loss of chromosomal integron systems in the Treponema species 
Integron systems are now recognized as important agents of bacterial evolution and are prevalent in most environments. One of the human pathogens known to harbor chromosomal integrons, the Treponema spirochetes are the only clade among spirochete species found to carry integrons. With the recent release of many new Treponema genomes, we were able to study the distribution of chromosomal integrons in this genus.
We find that the Treponema spirochetes implicated in human periodontal diseases and those isolated from cow and swine intestines contain chromosomal integrons, but not the Treponema species isolated from termite guts. By examining the species tree of selected spirochetes (based on 31 phylogenetic marker genes) and the phylogenetic tree of predicted integron integrases, and assisted by our analysis of predicted integron recombination sites, we found that all integron systems identified in Treponema spirochetes are likely to have evolved from a common ancestor—a horizontal gain into the clade. Subsequent to this event, the integron system was lost in the branch leading to the speciation of T. pallidum and T. phagedenis (the Treponema sps. implicated in sexually transmitted diseases). We also find that the lengths of the integron attC sites shortened through Treponema speciation, and that the integron gene cassettes of T. denticola are highly strain specific.
This is the first comprehensive study to characterize the chromosomal integron systems in Treponema species. By characterizing integron distribution and cassette contents in the Treponema sps., we link the integrons to the speciation of the various species, especially to the pathogens T. pallidum and T. phagedenis.
PMCID: PMC3607928  PMID: 23339550
Chromosomal integron; Treponema species; Integron integrase; attC site
6.  Tonic GABAA conductance decreases membrane time constant and increases EPSP-spike precision in hippocampal pyramidal neurons 
Because of a complex dendritic structure, pyramidal neurons have a large membrane surface relative to other cells and so a large electrical capacitance and a large membrane time constant (τm). This results in slow depolarizations in response to excitatory synaptic inputs, and consequently increased and variable action potential latencies, which may be computationally undesirable. Tonic activation of GABAA receptors increases membrane conductance and thus regulates neuronal excitability by shunting inhibition. In addition, tonic increases in membrane conductance decrease the membrane time constant (τm), and improve the temporal fidelity of neuronal firing. Here we performed whole-cell current clamp recordings from hippocampal CA1 pyramidal neurons and found that bath application of 10 μM GABA indeed decreases τm in these cells. GABA also decreased first spike latency and jitter (standard deviation of the latency) produced by current injection of 2 rheobases (500 ms). However, when larger current injections (3–6 rheobases) were used, GABA produced no significant effect on spike jitter, which was low. Using mathematical modeling we demonstrate that the tonic GABAA conductance decreases rise time, decay time and half-width of EPSPs in pyramidal neurons. A similar effect was observed on EPSP/IPSP pairs produced by stimulation of Schaffer collaterals: the EPSP part of the response became shorter after application of GABA. Consistent with the current injection data, a significant decrease in spike latency and jitter was obtained in cell-attached recordings only at near-threshold stimulation (50% success rate, S50). When stimulation was increased to 2 or 3 times S50, GABA significantly affected neither spike latency nor spike jitter. Our results suggest that a decrease in τm associated with elevations in ambient GABA can improve EPSP-spike precision at near-threshold synaptic inputs.
PMCID: PMC3872325  PMID: 24399937
spike jitter; EPSP-spike precision; tonic conductance; GABA; hippocampus
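The link between tonic conductance and τm described in this abstract follows from the standard passive-membrane relation τm = Cm / g_total. The sketch below is only an illustration of that relation: the function name and the capacitance and conductance values are hypothetical and are not taken from the study.

```python
def membrane_time_constant(capacitance_farads, leak_siemens, tonic_siemens=0.0):
    """Passive membrane time constant tau_m = C_m / g_total, in seconds.

    g_total is the sum of the resting (leak) conductance and any added
    tonic conductance. All inputs are illustrative values in SI units.
    """
    total_conductance = leak_siemens + tonic_siemens
    if total_conductance <= 0:
        raise ValueError("total membrane conductance must be positive")
    return capacitance_farads / total_conductance


# Hypothetical cell: 100 pF capacitance, 5 nS resting conductance.
tau_control = membrane_time_constant(100e-12, 5e-9)
# Same cell with an extra 5 nS of tonic conductance switched on.
tau_tonic = membrane_time_constant(100e-12, 5e-9, tonic_siemens=5e-9)
print(f"control: {tau_control * 1e3:.1f} ms, with tonic conductance: {tau_tonic * 1e3:.1f} ms")
```

In this toy case, doubling the total conductance halves τm, which is the direction of the effect reported for bath-applied GABA.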
7.  Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics 
Bioinformatics  2012;28(18):i363-i369.
Motivation: One of the difficulties in metagenomic assembly is that homologous genes from evolutionarily closely related species may behave like repeats and confuse assemblers. As a result, small contigs, each representing a short gene fragment, instead of complete genes, may be reported by an assembler. This further complicates annotation of metagenomic datasets, as annotation tools (such as gene predictors or similarity search tools) typically perform poorly on contigs encoding short gene fragments.
Results: We present a novel way of using the de Bruijn graph assembly of metagenomes to improve the assembly of genes. A network matching algorithm is proposed for matching the de Bruijn graph of contigs against reference genes, to derive ‘gene paths’ in the graph (sequences of contigs containing gene fragments) that have the highest similarities to known genes, allowing gene fragments contained in multiple contigs to be connected to form more complete (or intact) genes. Tests on simulated and real datasets show that our approach (called GeneStitch) is able to significantly improve the assembly of genes from metagenomic sequences, by connecting contigs with the guidance of homologous genes—information that is orthogonal to the sequencing reads. We note that the improvement of gene assembly can be observed even when only distantly related genes are available as the reference. We further propose to use ‘gene graphs’ to represent the assembly of reads from homologous genes and discuss potential applications of gene graphs to improving functional annotation for metagenomics.
Availability: The tools are available as open source for download at
PMCID: PMC3436815  PMID: 22962453
8.  Diverse CRISPRs Evolving in Human Microbiomes 
PLoS Genetics  2012;8(6):e1002441.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci, together with cas (CRISPR–associated) genes, form the CRISPR/Cas adaptive immune system, a primary defense strategy that eubacteria and archaea mobilize against foreign nucleic acids, including phages and conjugative plasmids. Short spacer sequences separated by the repeats are derived from foreign DNA and direct interference to future infections. The availability of hundreds of shotgun metagenomic datasets from the Human Microbiome Project (HMP) enables us to explore the distribution and diversity of known CRISPRs in human-associated microbial communities and to discover new CRISPRs. We propose a targeted assembly strategy to reconstruct CRISPR arrays, which whole-metagenome assemblies fail to identify. For each known CRISPR type (identified from reference genomes), we use its direct repeat consensus sequence to recruit reads from each HMP dataset and then assemble the recruited reads into CRISPR loci; the unique spacer sequences can then be extracted for analysis. We also identified novel CRISPRs or new CRISPR variants in contigs from whole-metagenome assemblies and used targeted assembly to more comprehensively identify these CRISPRs across samples. We observed that the distributions of CRISPRs (including 64 known and 86 novel ones) are largely body-site specific. We provide detailed analysis of several CRISPR loci, including novel CRISPRs. For example, known streptococcal CRISPRs were identified in most oral microbiomes, totaling ∼8,000 unique spacers: samples resampled from the same individual and oral site shared the most spacers; different oral sites from the same individual shared significantly fewer, while different individuals had almost no common spacers, indicating the impact of subtle niche differences on the evolution of CRISPR defenses. We further demonstrate potential applications of CRISPRs to the tracing of rare species and the virus exposure of individuals. 
This work indicates the importance of effective identification and characterization of CRISPR loci to the study of the dynamic ecology of microbiomes.
Author Summary
Human bodies are complex ecological systems in which various microbial organisms and viruses interact with each other and with the human host. The Human Microbiome Project (HMP) has resulted in >700 datasets of shotgun metagenomic sequences, from which we can learn about the compositions and functions of human-associated microbial communities. CRISPR/Cas systems are a widespread class of adaptive immune systems in bacteria and archaea, providing acquired immunity against foreign nucleic acids: CRISPR/Cas defense pathways involve integration of viral- or plasmid-derived DNA segments into CRISPR arrays (forming spacers between repeated structural sequences), and expression of short crRNAs from these single repeat-spacer units, to generate interference to future invading foreign genomes. Powered by an effective computational approach (the targeted assembly approach for CRISPR), our analysis of CRISPR arrays in the HMP datasets provides the very first global view of bacterial immunity systems in human-associated microbial communities. The great diversity of CRISPR spacers we observed among different body sites, in different individuals, and in single individuals over time, indicates the impact of subtle niche differences on the evolution of CRISPR defenses and indicates the key role of bacteriophage (and plasmids) in shaping human microbial communities.
PMCID: PMC3374615  PMID: 22719260
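The spacer-extraction step mentioned in the abstract above (repeats flank spacers, and unique spacers are collected for analysis) can be sketched as follows. This is a simplified illustration, not the authors' targeted-assembly pipeline: it assumes exact, non-overlapping matches to a single direct-repeat sequence, and the function name and example sequences are hypothetical.

```python
def extract_spacers(sequence, repeat):
    """Return the spacers lying between consecutive exact copies of `repeat`.

    Simplifying assumptions: the repeat matches exactly (no mismatches),
    copies do not overlap, and only sequence bounded by a repeat on both
    sides counts as a spacer.
    """
    positions = []
    start = sequence.find(repeat)
    while start != -1:
        positions.append(start)
        start = sequence.find(repeat, start + len(repeat))

    spacers = []
    for left, right in zip(positions, positions[1:]):
        spacers.append(sequence[left + len(repeat):right])
    return spacers


# Hypothetical repeat and a toy assembled locus containing two spacers.
repeat = "GTTTTAG"
locus = "AA" + repeat + "ACGTACGT" + repeat + "TTGCAAGC" + repeat + "CC"
spacers = extract_spacers(locus, repeat)
unique_spacers = set(spacers)  # spacers shared across samples collapse here
print(spacers)
```

A real analysis would need to tolerate repeat variants and sequencing errors; exact string matching is used here only to keep the sketch short.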
9.  Backpropagating Action Potentials Enable Detection of Extrasynaptic Glutamate by NMDA Receptors 
Cell Reports  2012;1(5):495-505.
Graphical Abstract
► Dendritic shaft NMDA receptors are bound to extrasynaptic glutamate
► Both dendritic shaft and spine NMDA receptors detect synaptic glutamate spillover
► Backpropagating APs help to detect both spillover and ambient glutamate
► Dendritic shaft NMDA receptors induce downregulation of h-channel conductance (Gh)
Activity-dependent synaptic plasticity holds the key to information storage in the brain. Although synaptic NMDA receptors (NMDARs) have long been implicated in the underlying mechanisms, little is known about the role of extrasynaptic NMDARs. In hippocampal pyramidal neurons, backpropagating action potentials recruit extrasynaptic NMDARs that have been exposed to ambient glutamate, thus boosting local Ca2+ entry. Semyanov and colleagues show that physiological stimulation engaging this mechanism induces an as-yet-undiscovered form of neuronal plasticity that affects synaptic input processing by the neuron.
PMCID: PMC3740263  PMID: 22832274
10.  A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-tuples 
Journal of Computational Biology  2011;18(3):523-534.
Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. Among the computational tools recently developed for metagenomic sequence analysis, binning tools attempt to classify the sequences in a metagenomic dataset into different bins (i.e., species), based on various DNA composition patterns (e.g., the tetramer frequencies) of various genomes. Composition-based binning methods, however, cannot be used to classify very short fragments, because of the substantial variation of DNA composition patterns within a single genome. We developed a novel approach (AbundanceBin) for metagenomics binning by utilizing the different abundances of species living in the same environment. AbundanceBin is an application of the Lander-Waterman model to metagenomics, which is based on the l-tuple content of the reads. AbundanceBin achieved accurate, unsupervised, clustering of metagenomic sequences into different bins, such that the reads classified in a bin belong to species of identical or very similar abundances in the sample. In addition, AbundanceBin gave accurate estimations of species abundances, as well as their genome sizes—two important parameters for characterizing a microbial community. We also show that AbundanceBin performed well when the sequence lengths are very short (e.g., 75 bp) or have sequencing errors. By combining AbundanceBin and a composition-based method (MetaCluster), we can achieve even higher binning accuracy. Supplementary Material is available at
PMCID: PMC3123841  PMID: 21385052
binning; EM algorithm; metagenomics; Poisson distribution
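The keywords above name an EM algorithm and the Poisson distribution. As a rough illustration of how abundance-based binning can work, the sketch below fits a mixture of Poisson distributions to l-tuple counts with EM. It is a minimal sketch under simplifying assumptions (a fixed number of bins, counts supplied directly, no genome-size estimation), not the AbundanceBin implementation; all names and the toy counts are hypothetical.

```python
import math


def poisson_logpmf(count, lam):
    """Log of the Poisson probability of observing `count` given mean `lam`."""
    return count * math.log(lam) - lam - math.lgamma(count + 1)


def fit_poisson_mixture(counts, initial_lambdas, n_iter=100):
    """Fit a Poisson mixture to l-tuple counts with EM.

    Returns (weights, lambdas, responsibilities). Each responsibility row
    gives the posterior probability that a count belongs to each bin.
    """
    k = len(initial_lambdas)
    lambdas = list(initial_lambdas)
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior bin membership for every count (log-space for stability).
        resp = []
        for c in counts:
            logs = [math.log(weights[j]) + poisson_logpmf(c, lambdas[j]) for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate mixing weights and Poisson means.
        for j in range(k):
            rj = max(sum(r[j] for r in resp), 1e-12)  # floor avoids division by zero
            weights[j] = rj / len(counts)
            lambdas[j] = max(sum(r[j] * c for r, c in zip(resp, counts)) / rj, 1e-12)
    return weights, lambdas, resp


# Toy l-tuple counts from a low-abundance and a high-abundance species.
counts = [2, 3, 1, 2, 3, 2, 1, 3, 20, 22, 19, 21, 18, 20]
weights, lambdas, resp = fit_poisson_mixture(counts, initial_lambdas=[1.0, 10.0])
bins = [max(range(len(r)), key=r.__getitem__) for r in resp]
print(lambdas, bins)
```

Each count is assigned to the bin with the highest posterior probability, so tuples from species of similar abundance end up together, which matches the behaviour the abstract describes.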
11.  Editing site analysis in a gymnosperm mitochondrial genome reveals similarities with angiosperm mitochondrial genomes 
Current Genetics  2010;56(5):439-446.
Sequence analysis of organelle genomes and comprehensive analysis of C-to-U editing sites from flowering and non-flowering plants have provided extensive sequence information from diverse taxa. This study includes the first comprehensive analysis of RNA editing sites from a gymnosperm mitochondrial genome, and utilizes informatics analyses to determine conserved features in the RNA sequence context around editing sites. We have identified 565 editing sites in 21 full-length and 4 partial cDNAs of the 39 protein-coding genes identified from the mitochondrial genome of Cycas taitungensis. The information profiles and RNA sequence context of C-to-U editing sites in the Cycas genome exhibit similarity in the immediate flanking nucleotides. Relative entropy analyses indicate that similar regions in the 5′ flanking 20 nucleotides have information content compared to angiosperm mitochondrial genomes. These results suggest that evolutionary constraints exist on the nucleotide sequences immediately adjacent to C-to-U editing sites, and similar regions are utilized in editing site recognition.
Electronic supplementary material
The online version of this article (doi:10.1007/s00294-010-0312-4) contains supplementary material, which is available to authorized users.
PMCID: PMC2943580  PMID: 20617318
RNA editing; Relative entropy; Organelle evolution
12.  Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences 
BMC Bioinformatics  2007;8:63.
When aligning several hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of some big gene families, to reconstruct the epidemiological history or their phylogenies, how to analyze and visualize the alignment results of many sequences has become a new challenge for computational biologists. Although there are several tools available for visualization of very long sequence alignments, few of them are applicable to the alignments of many sequences.
A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variabilities and homogeneities of alignment sequences by base frequencies or entropies. Different from the traditional representations of sequence logos, Phylo-mLogo not only displays the global logo patterns of the whole alignment of multiple sequences, but also demonstrates their local homologous logos for each clade hierarchically. In addition, Phylo-mLogo also allows the user to focus only on the analysis of some important, structurally or functionally constrained sites in the alignment selected by the user or by built-in automatic calculation.
With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check the changes of their amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information on Phylo-mLogo can be found at URL .
PMCID: PMC1805764  PMID: 17319966
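The abstract above says variability is calculated from base frequencies or entropies. A minimal sketch of per-column Shannon entropy for an alignment follows; it is an illustration of the general idea only, and the function name, the choice to treat gap characters as ordinary symbols, and the toy alignment are assumptions rather than details of Phylo-mLogo.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Shannon entropy (in bits) of each column of an equal-length alignment.

    Gap characters are counted like any other symbol in this sketch.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        entropy = 0.0
        for c in counts.values():
            p = c / total
            entropy -= p * math.log2(p)
        entropies.append(entropy)
    return entropies


# Toy alignment: column 0 is fully conserved, column 2 is evenly split.
toy_alignment = ["ACGT", "ACGA", "ACCA", "ATCA"]
print([round(h, 3) for h in column_entropies(toy_alignment)])
```

Computing the same quantity on the subset of sequences belonging to one clade gives a clade-level profile, which is one way to read the hierarchical display the abstract describes.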
13.  SinicView: A visualization environment for comparisons of multiple nucleotide sequence alignment tools 
BMC Bioinformatics  2006;7:103.
Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just results due to inappropriate alignment tools or scoring systems used. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been proposed, they may not provide a standard-bearer for most biologists because those poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows cross comparison of the alignment results obtained by different tools simultaneously could help a biologist evaluate their correctness and accuracy.
In this paper, we present a versatile alignment visualization system called SinicView (for Sequence-aligning INnovative and Interactive Comparison VIEWer), which allows the user to efficiently compare and evaluate assorted nucleotide alignment results obtained by different tools. SinicView calculates similarity of the alignment outputs under a fixed window using the sum-of-pairs method and provides scoring profiles of each set of aligned sequences. The user can visually compare alignment results either in graphic scoring profiles or in plain text format of the aligned nucleotides along with the annotation information. We illustrate the capabilities of our visualization system by comparing alignment results obtained by MLAGAN, MAVID, and MULTIZ, respectively.
With SinicView, users can use their own data sequences to compare various alignment tools or scoring systems and select the most suitable one to perform alignment in the initial stage of sequence analysis.
PMCID: PMC1434773  PMID: 16509994
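The fixed-window sum-of-pairs scoring mentioned in the abstract above can be sketched as below. The abstract does not give SinicView's scoring parameters, so the match, mismatch, and gap values here are assumptions chosen for illustration, as are the function names and the toy alignment.

```python
from itertools import combinations


def column_sum_of_pairs(column, match=1, mismatch=0, gap=0):
    """Sum-of-pairs score of one alignment column (scoring values are assumed)."""
    score = 0
    for a, b in combinations(column, 2):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


def windowed_sp_profile(alignment, window):
    """Mean sum-of-pairs score over each sliding window of `window` columns."""
    length = len(alignment[0])
    if window < 1 or window > length:
        raise ValueError("window must be between 1 and the alignment length")
    column_scores = [
        column_sum_of_pairs([seq[i] for seq in alignment]) for i in range(length)
    ]
    return [
        sum(column_scores[i:i + window]) / window
        for i in range(length - window + 1)
    ]


# Toy three-sequence alignment scored with a two-column window.
toy = ["ACGT", "ACGA", "AC-A"]
print(windowed_sp_profile(toy, window=2))
```

Running the same function on the outputs of two alignment tools for the same sequences yields two profiles that can be compared position by position, which is the kind of comparison the abstract describes.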
