1.  Glycan Specificity of the Vibrio vulnificus Hemolysin Lectin Outlines Evolutionary History of Membrane Targeting by a Toxin Family 
Journal of molecular biology  2014;426(15):2800-2812.
Pore-forming toxins (PFTs) are a class of pathogen-secreted molecules that oligomerize to form transmembrane channels in cellular membranes. Determining the mechanism for how PFTs bind membranes is key to understanding their role in disease and possible ways to block their action. Vibrio vulnificus, an aquatic pathogen responsible for severe food poisoning and septicemia in humans, secretes a PFT called Vibrio vulnificus hemolysin (VVH), which contains a single C-terminal targeting domain predicted to resemble a β-trefoil lectin fold. In order to understand the selectivity of the lectin for glycan motifs, we expressed the isolated VVH β-trefoil domain and used glycan-chip screening to identify that VVH displays a preference for terminal galactosyl groups including N-acetyl-D-galactosamine (GalNAc) and N-acetyl-D-lactosamine (LacNAc). The X-ray crystal structure of the VVH lectin domain solved to 2.0 Å resolution reveals a heptameric ring arrangement similar to the oligomeric form of the related, but inactive, lectin from Vibrio cholerae cytolysin. Structures bound to glycerol, GalNAc, and LacNAc outline a common and versatile mode of recognition allowing VVH to target a wide variety of cell-surface ligands. Sequence analysis in light of our structural and functional data suggests that VVH may represent an earlier step in the evolution of Vibrio PFTs.
PMCID: PMC4102649  PMID: 24862282
bacterial pathogenesis; carbohydrate; crystallography; cytolysin; pore-forming toxin
2.  Subregional Hippocampal Morphology and Psychiatric Outcome in Adolescents Who Were Born Very Preterm and at Term 
PLoS ONE  2015;10(6):e0130094.
The hippocampus has been reported to be structurally and functionally altered as a sequela of very preterm birth (<33 weeks gestation), possibly due to its vulnerability to hypoxic–ischemic damage in the neonatal period. We examined hippocampal volumes and subregional morphology in very preterm born individuals in mid- and late adolescence and their association with psychiatric outcome.
Structural brain magnetic resonance images were acquired at two time points (baseline and follow-up) from 65 ex-preterm adolescents (mean age = 15.5 and 19.6 years) and 36 term-born controls (mean age = 15.0 and 19.0 years). Hippocampal volumes and subregional morphometric differences were measured from manual tracings and with three-dimensional shape analysis. Psychiatric outcome was assessed with the Rutter Parents’ Scale at baseline, the General Health Questionnaire at follow-up and the Peters Delusional Inventory at both time points.
In contrast to previous studies, we did not find a significant difference in the cross-sectional or longitudinal hippocampal volumes between individuals born preterm and controls, despite preterm individuals having significantly smaller whole brain volumes. Shape analysis at baseline revealed subregional deformations in 28% of total bilateral hippocampal surface, reflecting atrophy, in ex-preterm individuals compared to controls, and in 22% at follow-up. In ex-preterm individuals, longitudinal changes in hippocampal shape accounted for 11% of the total surface, while in controls they reached 20%. In the whole sample (both groups), larger right hippocampal volume and bilateral anterior surface deformations at baseline were associated with delusional ideation scores at follow-up.
This study suggests a dynamic association between cross-sectional hippocampal volumes, longitudinal changes and surface deformations and psychosis proneness.
PMCID: PMC4474892  PMID: 26091104
3.  Solvent Properties of Water in Aqueous Solutions of Elastin-Like Polypeptide 
The phase-transition temperatures of an elastin-like polypeptide (ELP) with the (GVGVP)40 sequence and solvent dipolarity/polarizability, hydrogen-bond donor acidity, and hydrogen-bond acceptor basicity in its aqueous solutions were quantified in the absence and presence of different salts (Na2SO4, NaCl, NaClO4, and NaSCN) and various osmolytes (sucrose, sorbitol, trehalose, and trimethylamine N-oxide (TMAO)). All osmolytes decreased the ELP phase-transition temperature, whereas NaCl and Na2SO4 decreased, and NaSCN and NaClO4 increased it. The determined phase-transition temperatures may be described as a linear combination of the solvent’s dipolarity/polarizability and hydrogen-bond donor acidity. The linear relationship established for the phase-transition temperature in the presence of salts differs quantitatively from that in the presence of osmolytes, in agreement with different (direct and indirect) mechanisms of the influence of salts and osmolytes on the ELP phase-transition temperature.
PMCID: PMC4490507  PMID: 26075870
elastin-like polypeptide; phase-transition temperature; solvent properties; solvent dipolarity/polarizability; hydrogen-bond donor acidity; hydrogen-bond acceptor basicity; osmolyte
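The abstract of entry 3 states that the phase-transition temperature may be described as a linear combination of the solvent's dipolarity/polarizability and hydrogen-bond donor acidity. A minimal sketch of such a fit is given here, assuming an intercept term and ordinary least squares; the names `pi_star` and `alpha`, the function name, and every number are hypothetical illustrations, not values or methods taken from the paper.

```python
import numpy as np

def fit_transition_temperature(pi_star, alpha, t_transition):
    """Least-squares fit of T_t = c0 + c1 * pi_star + c2 * alpha.

    pi_star : solvent dipolarity/polarizability values (hypothetical name)
    alpha   : hydrogen-bond donor acidity values (hypothetical name)
    Returns the coefficient array [c0, c1, c2].
    """
    pi_star = np.asarray(pi_star, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    t = np.asarray(t_transition, dtype=float)
    # Design matrix: a column of ones for the intercept plus the two descriptors.
    design = np.column_stack([np.ones_like(pi_star), pi_star, alpha])
    coeffs, *_ = np.linalg.lstsq(design, t, rcond=None)
    return coeffs

# Synthetic, made-up data generated from known coefficients (not measurements).
pi_star = np.array([1.09, 1.10, 1.12, 1.13, 1.15])
alpha = np.array([1.23, 1.21, 1.24, 1.20, 1.22])
true_coeffs = np.array([10.0, -20.0, 30.0])
t_transition = true_coeffs[0] + true_coeffs[1] * pi_star + true_coeffs[2] * alpha

coeffs = fit_transition_temperature(pi_star, alpha, t_transition)
```

Fitting the salt and osmolyte series separately with this function would give two coefficient sets, which is one way to express the quantitative difference between the two relationships that the abstract reports.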
4.  Analysis of SecA Dimerization in Solution 
Biochemistry  2014;53(19):3248-3260.
The Sec pathway mediates translocation of protein across the inner membrane of bacteria. SecA is a motor protein that drives translocation of preprotein through the SecYEG channel. SecA reversibly dimerizes under physiological conditions, but different dimer interfaces have been observed in SecA crystal structures. Here, we have used biophysical approaches to address the nature of the SecA dimer that exists in solution. We have taken advantage of the extreme salt sensitivity of SecA dimerization to compare the rates of hydrogen–deuterium exchange of the monomer and dimer and have analyzed the effects of single-alanine substitutions on dimerization affinity. Our results support the antiparallel dimer arrangement observed in one of the crystal structures of Bacillus subtilis SecA. Additional residues lying within the preprotein binding domain and the C-terminus are also protected from exchange upon dimerization, indicating linkage to a conformational transition of the preprotein binding domain from an open to a closed state. In agreement with this interpretation, normal mode analysis demonstrates that the SecA dimer interface influences the global dynamics of SecA such that dimerization stabilizes the closed conformation.
PMCID: PMC4030788  PMID: 24786965
5.  Modulatory effects of brain-derived neurotrophic factor Val66Met polymorphism on prefrontal regions in major depressive disorder 
The British Journal of Psychiatry  2015;206(5):379-384.
Brain-derived neurotrophic factor (BDNF) Val66Met polymorphism contributes to the development of depression (major depressive disorder, MDD), but it is unclear whether neural effects observed in healthy individuals are sustained in MDD.
To investigate BDNF Val66Met effects on key regions in MDD neurocircuitry: amygdala, anterior cingulate, middle frontal and orbitofrontal regions.
Magnetic resonance imaging scans were acquired in 79 persons with MDD (mean age 49 years) and 74 healthy volunteers (mean age 50 years). Effects on surface area and cortical thickness were examined with multiple comparison correction.
People who were Met allele carriers showed reduced caudal middle frontal thickness in both study groups. Significant interaction effects were found in the anterior cingulate and rostral middle frontal regions, in which participants in the MDD group who were Met carriers showed the greatest reduction in surface area.
Modulatory effects of the BDNF Val66Met polymorphism on distinct subregions in the prefrontal cortex in MDD support the neurotrophin model of depression.
PMCID: PMC4416135  PMID: 25745134
6.  Prediction of brain age suggests accelerated atrophy after traumatic brain injury 
Annals of Neurology  2015;77(4):571-581.
The long-term effects of traumatic brain injury (TBI) can resemble those observed in normal ageing, suggesting that TBI may accelerate the ageing process. We investigate this using a neuroimaging model that predicts brain age in healthy individuals and then apply it to TBI patients. We define individuals' differences in chronological and predicted structural "brain age," and test whether TBI produces progressive atrophy and how this relates to cognitive function.
A predictive model of normal ageing was defined using machine learning in 1,537 healthy individuals, based on magnetic resonance imaging–derived estimates of gray matter (GM) and white matter (WM). This ageing model was then applied to test 99 TBI patients and 113 healthy controls to estimate brain age.
The initial model accurately predicted age in healthy individuals (r = 0.92). TBI brains were estimated to be "older," with a mean predicted age difference (PAD) between chronological and estimated brain age of 4.66 years (±10.8) for GM and 5.97 years (±11.22) for WM. This PAD predicted cognitive impairment and correlated strongly with the time since TBI, indicating that brain tissue loss increases throughout the chronic postinjury phase.
TBI patients' brains were estimated to be older than their chronological age. This discrepancy increases with time since injury, suggesting that TBI accelerates the rate of brain atrophy. This may be an important factor in the increased susceptibility in TBI patients for dementia and other age-associated conditions, motivating further research into the age-like effects of brain injury and other neurological diseases.
PMCID: PMC4403966  PMID: 25623048
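The predicted age difference (PAD) in entry 6 can be illustrated with a short sketch. It assumes PAD is computed as predicted brain age minus chronological age, so that positive values mean an "older" brain, which matches the direction reported in the abstract. The function name and the subject values are hypothetical.

```python
from statistics import mean

def predicted_age_difference(predicted_age, chronological_age):
    """PAD in years; positive means the brain is estimated to be older."""
    return predicted_age - chronological_age

# Hypothetical (predicted brain age, chronological age) pairs, in years.
subjects = [(45.0, 40.0), (62.5, 60.0), (30.0, 31.5)]

pads = [predicted_age_difference(pred, chron) for pred, chron in subjects]
mean_pad = mean(pads)  # group-level summary, analogous to the reported means
```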
7.  Domain interactions in Adenovirus VAI RNA mediate high affinity PKR binding 
Journal of molecular biology  2014;426(6):1285-1295.
Protein kinase R (PKR) is a component of the innate immunity antiviral pathway. PKR is activated upon binding to dsRNA to undergo dimerization and autophosphorylation. Adenovirus virus-associated RNA I (VAI) is a short, non-coding transcript whose major function is to inhibit the activity of PKR. VAI contains three domains: an apical stem-loop, a highly structured central domain, and a terminal stem. Previous studies have localized PKR binding to the apical stem and central domain. However, the molecular mechanism for inhibition of PKR is not known. We have characterized the stoichiometry and affinity of PKR binding to VAI and several domain constructs using analytical ultracentrifugation and correlated VAI binding and PKR inhibition. Although PKR binding to simple dsRNAs is not regulated by divalent ion, analysis of the interaction of the isolated dsRNA binding domain with VAI reveals that the binding affinity is enhanced by divalent ion. Dissection of VAI into its constituent domains indicates that none of the isolated domains retains the PKR binding affinity or inhibitory potency of the full length RNA. PKR is capable of binding the isolated terminal stem, but deletion of this domain from VAI does not affect PKR binding or inhibition. These results indicate that the apical stem and the central domain are both required to form a high affinity PKR binding site. Our data support a model whereby VAI functions as a PKR inhibitor because it binds a monomer tightly but does not facilitate dimerization.
PMCID: PMC3961479  PMID: 24394721
Analytical ultracentrifugation; innate immunity; protein-nucleic acid interactions; protein kinase; DMS probing
8.  History and impact of RDP 
RNA Biology  2014;11(3):239-243.
The Ribosomal Database Project (RDP) grew out of Carl Woese’s vision of how rRNA comparative methods could transform biology. First at the University of Illinois Urbana-Champaign, and later at Michigan State University’s Center for Microbial Ecology, the project has grown from a few hundred to several million rRNA gene sequences. In the years since Woese started the RDP, publications describing the database and related tools have been cited over 11 000 times in journals spanning a wide range of disciplines, while the RDP website is accessed by 10 000 researchers in over 20 000 analysis sessions each month. This article describes the history of RDP’s development over the last two decades.
PMCID: PMC4008557  PMID: 24607969
rRNA; phylogeny; microbial ecology; microbiology; database; Woese
9.  A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data 
PLoS Computational Biology  2014;10(8):e1003737.
Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented contigs or chimeric contigs, or have high memory footprint. In this work, we introduce a targeted gene assembly program SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that SAT-Assembler has smaller memory usage, comparable or better gene coverage, and lower chimera rate for assembling a set of genes from one or multiple pathways compared with other assembly tools. Moreover, the family-specific design and rapid homology search allow SAT-Assembler to be naturally compatible with parallel computing platforms. The source code of SAT-Assembler is publicly available. The data sets and experimental settings can be found in the supplementary material.
Author Summary
Next-generation sequencing (NGS) provides an efficient and affordable way to sequence the genomes or transcriptomes of a large number of organisms. With fast accumulation of the sequencing data from various NGS projects, the bottleneck is to efficiently mine useful knowledge from the data. As NGS platforms usually generate short and fragmented sequences (reads), one key step to annotate NGS data is to assemble short reads into longer contigs, which are then used to recover functional elements such as protein-coding genes. Short read assembly remains one of the most difficult computational problems in genomics. In particular, the performance of existing assembly tools is not satisfactory on complicated NGS data sets. They cannot reliably separate genes of high similarity or recover under-represented genes, and they incur high computational time and memory usage. Hence, we propose a targeted gene assembly tool, SAT-Assembler, to assemble genes of interest directly from NGS data with low memory usage and high accuracy. Our experimental results on a transcriptomic data set and two microbial community data sets showed that SAT-Assembler used less memory and recovered more target genes with better accuracy than existing tools.
PMCID: PMC4133164  PMID: 25122209
10.  Are fluorescence-detected sedimentation velocity data reliable? 
Analytical biochemistry  2013;437(2):133-137.
Sedimentation velocity analytical ultracentrifugation is a classical biophysical technique that is commonly used to analyze the size, shape and interactions of biological macromolecules in solution. Fluorescence detection provides enhanced sensitivity and selectivity relative to the standard absorption and refractometric detectors, but data acquisition is more complex and can be subject to interference from several photophysical effects. Here, we describe methods to configure sedimentation velocity measurements using fluorescence detection and evaluate the performance of the fluorescence optical system. The fluorescence detector output is linear over a concentration range of at least 1–500 nM of fluorescein and Alexa Fluor 488. At high concentration, deviations from linearity can be attributed to the inner filter effect. A duplex DNA labeled with Alexa Fluor 488 was used as a standard to compare sedimentation coefficients obtained using fluorescence and absorbance detectors. Within error, the sedimentation coefficients agree. Thus, the fluorescence detector is capable of providing precise and accurate sedimentation velocity results that are consistent with measurements performed using conventional absorption optics, provided the data are collected at appropriate sample concentrations and the optics are configured correctly.
PMCID: PMC3640771  PMID: 23499970
analytical ultracentrifugation; sedimentation velocity; fluorescence
11.  Conformational dynamics of the Rpt6 ATPase in proteasome assembly and Rpn14 binding 
Juxtaposed to either or both ends of the proteasome core particle (CP) can exist a 19S regulatory particle (RP) that recognizes and prepares ubiquitinated proteins for proteolysis. RP triphosphatase proteins (Rpt1-Rpt6), which are critical for substrate translocation into the CP, bind chaperone-like proteins (Hsm3, Nas2, Nas6, Rpn14) implicated in RP assembly. We used NMR and other biophysical methods to reveal that S. cerevisiae Rpt6’s C-terminal domain undergoes dynamic helix-coil transitions enabled by helix-destabilizing glycines within its two most C-terminal α-helices. Rpn14 binds selectively to Rpt6’s 4-helix bundle, with surprisingly high affinity. Loss of Rpt6’s partially unfolded state by glycine substitution (Rpt6 G360,387A) disrupts holoenzyme formation in vitro, an effect enhanced by Rpn14. S. cerevisiae lacking Rpn14 and with Rpt6 G360,387A incorporated demonstrate hallmarks of defective proteasome assembly and synthetic growth defects. Rpt4 and Rpt5 exhibit similar exchange, suggesting that conserved structural heterogeneity among Rpt proteins may facilitate RP-CP assembly.
PMCID: PMC3670613  PMID: 23562395
12.  Fungal Diversity in Permafrost and Tallgrass Prairie Soils under Experimental Warming Conditions 
Applied and Environmental Microbiology  2013;79(22):7063-7072.
Soil fungi play a major role in terrestrial ecosystem functioning through interactions with soil structure, plants, micro- and mesofauna, and nutrient cycling through predation, pathogenesis, mutualistic, and saprotrophic roles. The diversity of soil fungi was assessed by sequencing their 28S rRNA gene in Alaskan permafrost and Oklahoma tallgrass prairie soils at experimental sites where the effect of climate warming is under investigation. A total of 226,695 reads were classified into 1,063 genera, covering 62% of the reference data set. Using the Bayesian Classifier offered by the Ribosomal Database Project (RDP) with 50% bootstrapping classification confidence, approximately 70% of sequences were returned as “unclassified” at the genus level, although the majority (∼65%) were classified at the class level, which provided insight into these lesser-known fungal lineages. Those unclassified at the genus level were subjected to BLAST analysis against the ARB-SILVA database, where ∼50% most closely matched nonfungal taxa. Compared to the more abundant sequences, a higher proportion of rare operational taxonomic units (OTU) were successfully classified to genera at 50% bootstrap confidence, indicating that the fungal rare biosphere in these sites is not composed of sequencing artifacts. There was no significant effect after 1 year of warming on the fungal community structure at both sites, except perhaps for a few minor members, but there was a significant effect of sample depth in the permafrost soils. Despite overall significant community structure differences driven by variations in OTU dominance, the prairie and permafrost soils shared 90% and 63% of all fungal sequences, respectively, indicating a fungal “seed bank” common between both sites.
PMCID: PMC3811548  PMID: 24014534
13.  Test-Retest Reliability of Diffusion Tensor Imaging in Huntington’s Disease 
PLoS Currents  2014;6:ecurrents.hd.f19ef63fff962f5cd9c0e88f4844f43b.
Diffusion tensor imaging (DTI) has shown microstructural abnormalities in patients with Huntington’s Disease (HD) and work is underway to characterise how these abnormalities change with disease progression. Using methods that will be applied in longitudinal research, we sought to establish the reliability of DTI in early HD patients and controls. Test-retest reliability, quantified using the intraclass correlation coefficient (ICC), was assessed using region-of-interest (ROI)-based white matter atlas and voxelwise approaches on repeat scan data from 22 participants (10 early HD, 12 controls). T1 data was used to generate further ROIs for analysis in a reduced sample of 18 participants. The results suggest that fractional anisotropy (FA) and other diffusivity metrics are generally highly reliable, with ICCs indicating considerably lower within-subject compared to between-subject variability in both HD patients and controls. Where ICC was low, particularly for the diffusivity measures in the caudate and putamen, this was partly influenced by outliers. The analysis suggests that the specific DTI methods used here are appropriate for cross-sectional research in HD, and give confidence that they can also be applied longitudinally, although this requires further investigation. An important caveat for DTI studies is that test-retest reliability may not be evenly distributed throughout the brain whereby highly anisotropic white matter regions tended to show lower relative within-subject variability than other white or grey matter regions.
PMCID: PMC3962450  PMID: 24672743
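Entry 13 quantifies test-retest reliability with the intraclass correlation coefficient (ICC). The abstract does not say which ICC form was used, so this sketch assumes the one-way random-effects, single-measurement form, ICC(1,1) = (MSB − MSW) / (MSB + (k − 1)·MSW). The function name and the fractional anisotropy values are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    scores: list of per-subject lists, each holding k repeated measurements
            (for example, the same DTI metric from a scan and a rescan).
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # measurements per subject
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical fractional anisotropy values: [scan 1, scan 2] per subject.
identical_rescans = [[0.40, 0.40], [0.50, 0.50], [0.60, 0.60]]
noisy_rescans = [[0.40, 0.42], [0.50, 0.49], [0.60, 0.61]]

icc_identical = icc_oneway(identical_rescans)
icc_noisy = icc_oneway(noisy_rescans)
```

An ICC near 1 means within-subject variability is small relative to between-subject variability, which is how the abstract reads its high ICC values.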
14.  Genomic Standards Consortium Projects 
Standards in Genomic Sciences  2014;9(3):599-601.
The Genomic Standards Consortium (GSC) is an open-membership community that was founded in 2005 to work towards the development, implementation and harmonization of standards in the field of genomics. Starting with the defined task of establishing a minimal set of descriptions the GSC has evolved into an active standards-setting body that currently has 18 ongoing projects, with additional projects regularly proposed from within and outside the GSC. Here we describe our recently enacted policy for proposing new activities that are intended to be taken on by the GSC, along with the template for proposing such new activities.
PMCID: PMC4148985  PMID: 25197446
15.  Defining how ubiquitin receptors hHR23a and S5a bind polyubiquitin 
Journal of molecular biology  2007;369(1):10.1016/j.jmb.2007.03.008.
Ubiquitin receptors connect substrate ubiquitylation to proteasomal degradation. hHR23a binds proteasome subunit 5a (S5a) through a surface that also binds ubiquitin. We report that S5a's UIM2 binds preferentially to hHR23a over polyubiquitin and provide a model for the ternary complex, which we expect represents one of the mechanisms used by the proteasome to capture ubiquitylated substrates. Furthermore, we demonstrate that hHR23a is surprisingly adept at sequestering the ubiquitin moieties of a polyubiquitin chain, and provide evidence that it and the ubiquitylated substrate are committed to each other after binding.
PMCID: PMC3864866  PMID: 17408689
ubiquitin receptors; proteasome subunit S5a; hHR23a; Rad23; proteasomal degradation
16.  Defining the Escherichia coli SecA Dimer Interface Residues through In Vivo Site-Specific Photo-Cross-Linking 
Journal of Bacteriology  2013;195(12):2817-2825.
The motor protein SecA is a core component of the bacterial general secretory (Sec) pathway and is essential for cell viability. Despite evidence showing that SecA exists in a dynamic monomer-dimer equilibrium favoring the dimeric form in solution and in the cytoplasm, there is considerable debate as to the quaternary structural organization of the SecA dimer. Here, a site-directed photo-cross-linking technique was utilized to identify residues on the Escherichia coli SecA (ecSecA) dimer interface in the cytosol of intact cells. The feasibility of this method was demonstrated with residue Leu6, which is essential for ecSecA dimerization based on our analytical ultracentrifugation studies of SecA L6A and shown to form the cross-linked SecA dimer in vivo with p-benzoyl-phenylalanine (pBpa) substituted at position 6. Subsequently, the amino terminus (residues 2 to 11) in the nucleotide binding domain (NBD), Phe263 in the preprotein binding domain (PBD), and Tyr794 and Arg805 in the intramolecular regulator of the ATPase 1 domain (IRA1) were identified to be involved in ecSecA dimerization. Furthermore, the incorporation of pBpa at position 805 did not form a cross-linked dimer in the SecA Δ2-11 context, indicating the possibility that the amino terminus may directly contact Arg805 or that the deletion of residues 2 to 11 alters the topology of the naturally occurring ecSecA dimer.
PMCID: PMC3697251  PMID: 23585536
17.  Ribosomal Database Project: data and tools for high throughput rRNA analysis 
Nucleic Acids Research  2013;42(Database issue):D633-D642.
Ribosomal Database Project (RDP) provides the research community with aligned and annotated rRNA gene sequence data, along with tools to allow researchers to analyze their own rRNA gene sequences in the RDP framework. RDP data and tools are utilized in fields as diverse as human health, microbial ecology, environmental microbiology, nucleic acid chemistry, taxonomy and phylogenetics. In addition to aligned and annotated collections of bacterial and archaeal small subunit rRNA genes, RDP now includes a collection of fungal large subunit rRNA genes. RDP tools, including Classifier and Aligner, have been updated to work with this new fungal collection. The use of high-throughput sequencing to characterize environmental microbial populations has exploded in the past several years, and as sequence technologies have improved, the sizes of environmental datasets have increased. With release 11, RDP is providing an expanded set of tools to facilitate analysis of high-throughput data, including both single-stranded and paired-end reads. In addition, most tools are now available as open source packages for download and local use by researchers with high-volume needs or who would like to develop custom analysis pipelines.
PMCID: PMC3965039  PMID: 24288368
18.  Role of the PAS Sensor Domains in the Bacillus subtilis Sporulation Kinase KinA 
Journal of Bacteriology  2013;195(10):2349-2358.
Histidine kinases are sophisticated molecular sensors that are used by bacteria to detect and respond to a multitude of environmental signals. KinA is the major histidine kinase required for initiation of sporulation upon nutrient deprivation in Bacillus subtilis. KinA has a large N-terminal region (residues 1 to 382) that is uniquely composed of three tandem Per-ARNT-Sim (PAS) domains that have been proposed to constitute a sensor module. To further enhance our understanding of this “sensor” region, we defined the boundaries that give rise to the minimal autonomously folded PAS domains and analyzed their homo- and heteroassociation properties using analytical ultracentrifugation, nuclear magnetic resonance (NMR) spectroscopy, and multiangle laser light scattering. We show that PASA self-associates very weakly, while PASC is primarily a monomer. In contrast, PASB forms a stable dimer (Kd [dissociation constant] of <10 nM), and it appears to be the main N-terminal determinant of KinA dimerization. Analysis of KinA mutants deficient for one or more PAS domains revealed a critical role for PASB, but not PASA, in autophosphorylation of KinA. Our findings suggest that dimerization of PASB is important for keeping the catalytic domain of KinA in a functional conformation. We use this information to propose a model for the structure of the N-terminal sensor module of KinA.
PMCID: PMC3650535  PMID: 23504013
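The dissociation constant quoted for PASB in entry 18 can be related to the fraction of protein present as dimer. This sketch assumes a simple two-state monomer-dimer equilibrium, 2M ⇌ D with Kd = [M]²/[D] and total protomer concentration C = [M] + 2[D]. The function name and the example concentrations are hypothetical, and 10 nM is used only because the abstract gives it as an upper bound on Kd.

```python
import math

def dimer_fraction(total_protomer_conc, kd):
    """Fraction of protomers found in dimers for 2M <-> D.

    Both arguments must be in the same concentration units (e.g. nM).
    Solves 2*[M]^2/Kd + [M] - C = 0 for the free monomer concentration.
    """
    free_monomer = kd * (math.sqrt(1.0 + 8.0 * total_protomer_conc / kd) - 1.0) / 4.0
    return 1.0 - free_monomer / total_protomer_conc

# With Kd = 10 nM, half the protomers are dimeric at 10 nM total protein.
fraction_at_kd = dimer_fraction(10.0, 10.0)
# At 1 uM total protein (hypothetical), the protein is mostly dimeric.
fraction_at_1um = dimer_fraction(1000.0, 10.0)
```

The same relation applies to the salt-sensitive SecA monomer-dimer equilibrium in entries 4 and 16, with a different Kd.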
19.  FunGene: the functional gene pipeline and repository 
Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
PMCID: PMC3787254  PMID: 24101916
microbial ecology; functional genes; amplification primers; phylogeny; biogeochemical cycles; amplicon analysis
20.  Functional genes to assess nitrogen cycling and aromatic hydrocarbon degradation: primers and processing matter 
Targeting sequencing to genes involved in key environmental processes, i.e., ecofunctional genes, provides an opportunity to sample nature's gene guilds to greater depth and help link community structure to process-level outcomes. Vastly different approaches have been implemented for sequence processing and, ultimately, for taxonomic placement of these gene reads. The overall quality of next generation sequence analysis of functional genes is dependent on multiple steps and assumptions of unknown diversity. To illustrate current issues surrounding amplicon read processing we provide examples for three ecofunctional gene groups. A combination of in silico, environmental and cultured strain sequences was used to test new primers targeting the dioxin and dibenzofuran degrading genes dxnA1, dbfA1, and carAa. The majority of obtained environmental sequences were classified into novel sequence clusters, illustrating the discovery value of the approach. For the nitrite reductase step in denitrification, the well-known nirK primers exhibited deficiencies in reference database coverage, illustrating the need to refine primer-binding sites and/or to design multiple primers, while nirS primers exhibited bias against five phyla. Amino acid-based OTU clustering of these two N-cycle genes from soil samples yielded only 114 unique nirK and 45 unique nirS genus-level groupings, likely a reflection of constricted primer coverage. Finally, supervised and non-supervised OTU analysis methods were compared using the nifH gene of nitrogen fixation, with generally similar outcomes, but the clustering (non-supervised) method yielded higher diversity estimates and stronger site-based differences. High throughput amplicon sequencing can provide inexpensive and rapid access to nature's related sequences by circumventing the culturing barrier, but each unique gene requires individual considerations in terms of primer design and sequence processing and classification.
PMCID: PMC3775264  PMID: 24062736
functional genes; nifH; aromatic hydrocarbon; nirS; primer specificity; clustering analysis; nirK; nitrogen cycling
21.  Ecological Patterns of nifH Genes in Four Terrestrial Climatic Zones Explored with Targeted Metagenomics Using FrameBot, a New Informatics Tool 
mBio  2013;4(5):e00592-13.
Biological nitrogen fixation is an important component of sustainable soil fertility and a key component of the nitrogen cycle. We used targeted metagenomics to study the nitrogen fixation-capable terrestrial bacterial community by targeting the gene for nitrogenase reductase (nifH). We obtained 1.1 million nifH 454 amplicon sequences from 222 soil samples collected from 4 National Ecological Observatory Network (NEON) sites in Alaska, Hawaii, Utah, and Florida. To accurately detect and correct frameshifts caused by indel sequencing errors, we developed FrameBot, a tool for frameshift correction and nearest-neighbor classification, and compared its accuracy to that of two other rapid frameshift correction tools. We found FrameBot was, in general, more accurate as long as a reference protein sequence with 80% or greater identity to a query was available, as was the case for virtually all nifH reads for the 4 NEON sites. Frameshifts were present in 12.7% of the reads. Those nifH sequences related to the Proteobacteria phylum were most abundant, followed by those for Cyanobacteria in the Alaska and Utah sites. Predominant genera with nifH sequences similar to reads included Azospirillum, Bradyrhizobium, and Rhizobium, the latter two without obvious plant hosts at the sites. Surprisingly, 80% of the sequences had greater than 95% amino acid identity to known nifH gene sequences. These samples were grouped by site and correlated with soil environmental factors, especially drainage, light intensity, mean annual temperature, and mean annual precipitation. FrameBot was tested successfully on three ecofunctional genes but should be applicable to any.
High-throughput phylogenetic analysis of microbial communities using rRNA-targeted sequencing is now commonplace; however, such data often allow little inference with respect to either the presence or the diversity of genes involved in most important ecological processes. To study the gene pool for these processes, it is more straightforward to assess the genes directly responsible for the ecological function (ecofunctional genes). However, analyzing these genes involves technical challenges beyond those seen for rRNA. In particular, frameshift errors cause garbled downstream protein translations. Our FrameBot tool described here both corrects frameshift errors in query reads and determines their closest matching protein sequences in a set of reference sequences. We validated this new tool with sequences from defined communities and demonstrated the tool’s utility on nifH gene fragments sequenced from soils in well-characterized and major terrestrial ecosystem types.
PMCID: PMC3781835  PMID: 24045641
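The remark above that frameshift errors garble downstream protein translations can be illustrated with a minimal sketch. This is not FrameBot's algorithm. It only shows why a single-base deletion changes the codons that follow it. The function name `translate` and the example read are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 0; a trailing partial codon is dropped."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical 12-base read (illustrative only, not data from the study).
read = "ATGGCCAAAGGT"
# Simulate a sequencing deletion of the base at index 3.
read_with_deletion = read[:3] + read[4:]

print(translate(read))                # in-frame translation
print(translate(read_with_deletion))  # codons after the deletion are shifted
```

Every codon after the deletion point is read in a different frame, which is why a correction step that compares reads against reference protein sequences is needed before classification.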
22.  Meeting Report: Fungal ITS Workshop (October 2012) 
Standards in Genomic Sciences  2013;8(1):118-123.
This report summarizes a meeting held in Boulder, CO USA (19–20 October 2012) on fungal community analyses using ultra-high-throughput sequencing of the internal transcribed spacer (ITS) region of the nuclear ribosomal RNA (rRNA) genes. The meeting was organized as a two-day workshop, with the primary goal of supporting collaboration among researchers for improving fungal ITS sequence resources and developing recommendations for standard ITS primers for the research community.
PMCID: PMC3739174  PMID: 23961317
23.  A gene-targeted approach to investigate the intestinal butyrate-producing bacterial community 
Microbiome  2013;1:8.
Butyrate, which is produced by the human microbiome, is essential for a well-functioning colon. Bacteria that produce butyrate are phylogenetically diverse, which hinders their accurate detection based on conventional phylogenetic markers. As a result, reliable information on this important bacterial group is often lacking in microbiome research.
In this study we describe a gene-targeted approach for 454 pyrotag sequencing and quantitative polymerase chain reaction for the final genes in the two primary bacterial butyrate synthesis pathways, butyryl-CoA:acetate CoA-transferase (but) and butyrate kinase (buk). We monitored the establishment and early succession of butyrate-producing communities in four patients with ulcerative colitis who underwent a colectomy with ileal pouch anal anastomosis and compared them with three control samples from healthy colons. All patients established an abundant butyrate-producing community (approximately 5% to 26% of the total community) in the pouch within the 2-month study, but patterns were distinctive among individuals. Only one patient harbored a community profile similar to the healthy controls, in which there was a predominance of but genes that are similar to reference genes from Acidaminococcus sp., Eubacterium sp., Faecalibacterium prausnitzii and Roseburia sp., and an almost complete absence of buk genes. Two patients were greatly enriched in buk genes similar to those of Clostridium butyricum and C. perfringens, whereas a fourth patient displayed abundant communities containing both genes. Most butyrate producers identified in previous studies were detected and the general patterns of taxa found were supported by 16S rRNA gene pyrotag analysis, but the gene-targeted approach provided more detail about the potential butyrate-producing members of the community.
The presented approach provides quantitative and genotypic insights into butyrate-producing communities and facilitates a more specific functional characterization of the intestinal microbiome. Furthermore, our analysis refines the reference annotations for the but and buk genes found in central databases.
PMCID: PMC4126176  PMID: 24451334
Butyrate; Gene-targeted metagenomics; Human microbiome project; Pouchitis; Ulcerative colitis
24.  Evaluating multicenter DTI data in Huntington's disease on site specific effects: An ex post facto approach 
NeuroImage : Clinical  2013;2:161-167.
To assess the feasibility of averaging diffusion tensor imaging (DTI) metrics from MRI data acquired in the course of a multicenter study.
Materials and methods
Sixty-one early stage Huntington's disease patients and forty healthy controls were studied using four different MR scanners at four European sites with acquisition protocols as close as possible to a given standard protocol. The potential and feasibility of averaging data acquired at different sites was evaluated quantitatively by region-of-interest (ROI) based statistical comparisons of coefficients of variation (CV) across centers, as well as by testing for significant group-by-center differences on averaged fractional anisotropy (FA) values between patients and controls. In addition, a whole-brain based statistical between-group comparison was performed using FA maps.
The ex post facto statistical evaluation of CV and FA values in a priori defined ROIs showed no differences between sites above chance, indicating that data were not systematically biased by center-specific factors.
Averaging FA maps from DTI data acquired at different study sites and on different MR scanner types does not appear to be systematically biased. A suitable procedure for testing whether multicenter DTI data can be pooled is provided, permitting DTI-derived metrics to be averaged in order to differentiate patients from healthy controls at a larger scale.
► Alternative procedure to evaluate prerequisites for multicenter DTI data pooling. ► Procedure may serve as reference for future multicenter MRI-DTI trials in HD. ► FA differences between HD and controls consistent with single center reports.
PMCID: PMC3777841  PMID: 24179771
Multicenter study; Diffusion tensor imaging; Fractional anisotropy; Huntington's disease
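The coefficient of variation (CV) used in the ROI-based comparison across centers is the standard deviation divided by the mean. A minimal sketch follows, assuming the sample standard deviation is intended. The function name and the FA values are hypothetical placeholders, not data from the study, and the study's actual statistical procedure is not reproduced here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Hypothetical FA values for one ROI at two centers (placeholders only).
fa_by_center = {
    "center_A": [0.52, 0.55, 0.50, 0.53],
    "center_B": [0.51, 0.54, 0.56, 0.49],
}

for center, fa_values in fa_by_center.items():
    print(center, round(coefficient_of_variation(fa_values), 4))
```

Comparable CVs across centers would be one indication, among others, that site-specific factors are not inflating the variability of the pooled metric.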
25.  The Role of Human Dicer-dsRBD in Processing Small Regulatory RNAs 
PLoS ONE  2012;7(12):e51829.
One of the most exciting recent developments in RNA biology has been the discovery of small non-coding RNAs that affect gene expression through the RNA interference (RNAi) mechanism. Two major classes of RNAs involved in RNAi are small interfering RNA (siRNA) and microRNA (miRNA). Dicer, an RNase III enzyme, plays a central role in the RNAi pathway by cleaving precursors of both of these classes of RNAs to form mature siRNAs and miRNAs, which are then loaded into the RNA-induced silencing complex (RISC). miRNA and siRNA precursors are structurally quite distinct; miRNA precursors are short, imperfect hairpins while siRNA precursors are long, perfect duplexes. Nonetheless, Dicer is able to process both. Dicer, like the majority of RNase III enzymes, contains a dsRNA binding domain (dsRBD), but the data are sparse on the exact role this domain plays in the mechanism of Dicer binding and cleavage. To further explore the role of human Dicer-dsRBD in the RNAi pathway, we determined its binding affinity to various RNAs modeling both miRNA and siRNA precursors. Our study shows that Dicer-dsRBD is an avid binder of dsRNA, but its binding is only minimally influenced by a single-stranded – double-stranded junction caused by large terminal loops observed in miRNA precursors. Thus, the Dicer-dsRBD contributes directly to substrate binding but not to the mechanism of differentiating between pre-miRNA and pre-siRNA. In addition, NMR spin relaxation and MD simulations provide an overview of the role that dynamics play in the binding mechanism. We compare this current study with our previous studies of the dsRBDs from Drosha and DGCR8 to give a dynamic profile of dsRBDs in their apo-state and a mechanistic view of dsRNA binding by dsRBDs in general.
PMCID: PMC3521659  PMID: 23272173
