PMCC

Results 1-25 (65)
1.  A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data 
PLoS Computational Biology  2014;10(8):e1003737.
Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. In the absence of quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and for metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented or chimeric contigs, or have a high memory footprint. In this work, we introduce a targeted gene assembly program, SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that SAT-Assembler has smaller memory usage, comparable or better gene coverage, and a lower chimera rate for assembling a set of genes from one or multiple pathways compared with other assembly tools. Moreover, the family-specific design and rapid homology search allow SAT-Assembler to be naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in supplementary material.
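The abstract mentions overlap graph construction followed by graph traversal. As a rough illustration of that generic idea only, the Python sketch below builds a plain overlap graph from exact suffix-prefix matches between reads and then walks it greedily to form a contig. It is not SAT-Assembler's algorithm: the homology guidance is omitted entirely, and the helper names, the `min_len` threshold, and the toy reads are hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest proper suffix of `a` equal to a prefix of `b`,
    if it is at least `min_len` (assumed >= 1); otherwise 0."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def build_overlap_graph(reads: list[str], min_len: int) -> dict[int, list[tuple[int, int]]]:
    """Adjacency list: read index -> [(successor index, overlap length), ...]."""
    graph = {i: [] for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = suffix_prefix_overlap(a, b, min_len)
                if olen:
                    graph[i].append((j, olen))
    return graph


def greedy_contig(reads: list[str], graph: dict[int, list[tuple[int, int]]], start: int) -> str:
    """Extend a contig from `start`, always following the longest unused overlap."""
    contig, current, visited = reads[start], start, {start}
    while True:
        candidates = [(olen, j) for j, olen in graph[current] if j not in visited]
        if not candidates:
            return contig
        olen, nxt = max(candidates)
        contig += reads[nxt][olen:]  # append only the non-overlapping tail
        visited.add(nxt)
        current = nxt


# Hypothetical toy reads tiling one short sequence.
reads = ["ATGGCTAAA", "GCTAAAGGTCTG", "GGTCTGTGG"]
graph = build_overlap_graph(reads, min_len=4)
print(greedy_contig(reads, graph, start=0))  # ATGGCTAAAGGTCTGTGG
```

The all-pairs comparison is quadratic in the number of reads; restricting comparisons to reads assigned to the same target family is one plausible way a targeted assembler could keep that cost, and the memory footprint, small.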
Author Summary
Next-generation sequencing (NGS) provides an efficient and affordable way to sequence the genomes or transcriptomes of a large number of organisms. With the rapid accumulation of sequencing data from various NGS projects, the bottleneck is to efficiently mine useful knowledge from the data. As NGS platforms usually generate short and fragmented sequences (reads), one key step to annotate NGS data is to assemble short reads into longer contigs, which are then used to recover functional elements such as protein-coding genes. Short read assembly remains one of the most difficult computational problems in genomics. In particular, the performance of existing assembly tools is not satisfactory on complicated NGS data sets: they cannot reliably separate highly similar genes or recover under-represented genes, and they incur high computational time and memory usage. Hence, we propose a targeted gene assembly tool, SAT-Assembler, to assemble genes of interest directly from NGS data with low memory usage and high accuracy. Our experimental results on a transcriptomic data set and two microbial community data sets showed that SAT-Assembler used less memory and recovered more target genes with better accuracy than existing tools.
doi:10.1371/journal.pcbi.1003737
PMCID: PMC4133164  PMID: 25122209
2.  Are fluorescence-detected sedimentation velocity data reliable?
Analytical biochemistry  2013;437(2):133-137.
Sedimentation velocity analytical ultracentrifugation is a classical biophysical technique that is commonly used to analyze the size, shape and interactions of biological macromolecules in solution. Fluorescence detection provides enhanced sensitivity and selectivity relative to the standard absorption and refractometric detectors, but data acquisition is more complex and can be subject to interference from several photophysical effects. Here, we describe methods to configure sedimentation velocity measurements using fluorescence detection and evaluate the performance of the fluorescence optical system. The fluorescence detector output is linear over a concentration range of at least 1-500 nM of fluorescein and Alexa Fluor 488. At high concentrations, deviations from linearity can be attributed to the inner filter effect. A duplex DNA labeled with Alexa Fluor 488 was used as a standard to compare sedimentation coefficients obtained using fluorescence and absorbance detectors. Within experimental error, the sedimentation coefficients agree. Thus, the fluorescence detector is capable of providing precise and accurate sedimentation velocity results that are consistent with measurements performed using conventional absorption optics, provided the data are collected at appropriate sample concentrations and the optics are configured correctly.
doi:10.1016/j.ab.2013.02.019
PMCID: PMC3640771  PMID: 23499970
analytical ultracentrifugation; sedimentation velocity; fluorescence
3.  Conformational dynamics of the Rpt6 ATPase in proteasome assembly and Rpn14 binding 
Summary
A 19S regulatory particle (RP), which recognizes ubiquitinated proteins and prepares them for proteolysis, can be juxtaposed to either or both ends of the proteasome core particle (CP). RP triphosphatase proteins (Rpt1-Rpt6), which are critical for substrate translocation into the CP, bind chaperone-like proteins (Hsm3, Nas2, Nas6, Rpn14) implicated in RP assembly. We used NMR and other biophysical methods to reveal that S. cerevisiae Rpt6’s C-terminal domain undergoes dynamic helix-coil transitions enabled by helix-destabilizing glycines within its two most C-terminal α-helices. Rpn14 binds selectively to Rpt6’s 4-helix bundle, with surprisingly high affinity. Loss of Rpt6’s partially unfolded state by glycine substitution (Rpt6 G360,387A) disrupts holoenzyme formation in vitro, an effect enhanced by Rpn14. S. cerevisiae lacking Rpn14 and with Rpt6 G360,387A incorporated show hallmarks of defective proteasome assembly and synthetic growth defects. Rpt4 and Rpt5 exhibit similar exchange, suggesting that conserved structural heterogeneity among Rpt proteins may facilitate RP-CP assembly.
doi:10.1016/j.str.2013.02.021
PMCID: PMC3670613  PMID: 23562395
4.  Fungal Diversity in Permafrost and Tallgrass Prairie Soils under Experimental Warming Conditions 
Applied and Environmental Microbiology  2013;79(22):7063-7072.
Soil fungi play a major role in terrestrial ecosystem functioning through interactions with soil structure, plants, and micro- and mesofauna, and in nutrient cycling through predatory, pathogenic, mutualistic, and saprotrophic roles. The diversity of soil fungi was assessed by sequencing their 28S rRNA gene in Alaskan permafrost and Oklahoma tallgrass prairie soils at experimental sites where the effect of climate warming is under investigation. A total of 226,695 reads were classified into 1,063 genera, covering 62% of the reference data set. Using the Bayesian Classifier offered by the Ribosomal Database Project (RDP) with 50% bootstrapping classification confidence, approximately 70% of sequences were returned as “unclassified” at the genus level, although the majority (∼65%) were classified at the class level, which provided insight into these lesser-known fungal lineages. Those unclassified at the genus level were subjected to BLAST analysis against the ARB-SILVA database, where ∼50% most closely matched nonfungal taxa. Compared to the more abundant sequences, a higher proportion of rare operational taxonomic units (OTUs) were successfully classified to genera at 50% bootstrap confidence, indicating that the fungal rare biosphere in these sites is not composed of sequencing artifacts. There was no significant effect of 1 year of warming on fungal community structure at either site, except perhaps for a few minor members, but there was a significant effect of sample depth in the permafrost soils. Despite overall significant community structure differences driven by variations in OTU dominance, the prairie and permafrost soils shared 90% and 63% of all fungal sequences, respectively, indicating a fungal “seed bank” common to both sites.
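The abstract reports genus assignments accepted at 50% bootstrap confidence from the RDP Bayesian Classifier. The Python sketch below illustrates the general idea of a k-mer naive Bayesian classifier with bootstrap confidence: each genus is scored by the likelihood of the query's words, the scoring is repeated on random subsets of those words, and the confidence is the fraction of repeats won by the top genus. It is a simplified illustration, not RDP's implementation; the word size `k`, the pseudocount prior, the 1/8 subset size, the toy training data, and all function names are assumptions chosen for the example.

```python
import math
import random
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-letter words in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training: list[tuple[str, str]], k: int) -> dict:
    """training: (genus, sequence) pairs. Counts how many sequences contain each
    word, overall and within each genus."""
    word_count = defaultdict(int)                         # word -> sequences containing it
    genus_size = defaultdict(int)                         # genus -> number of sequences
    genus_words = defaultdict(lambda: defaultdict(int))   # genus -> word -> count
    for genus, seq in training:
        genus_size[genus] += 1
        for w in kmers(seq, k):
            word_count[w] += 1
            genus_words[genus][w] += 1
    return {"n": len(training), "k": k, "word_count": word_count,
            "genus_size": genus_size, "genus_words": genus_words}


def log_likelihood(words: list[str], genus: str, model: dict) -> float:
    """Sum of log P(word | genus), smoothed with a word-specific prior."""
    total = 0.0
    for w in words:
        prior = (model["word_count"].get(w, 0) + 0.5) / (model["n"] + 1)
        hits = model["genus_words"][genus].get(w, 0)
        total += math.log((hits + prior) / (model["genus_size"][genus] + 1))
    return total


def classify(query: str, model: dict, trials: int = 100, seed: int = 0) -> tuple[str, float]:
    """Return (best genus, bootstrap confidence) using random 1/8 word subsets."""
    words = sorted(kmers(query, model["k"]))
    if not words:
        raise ValueError("query is shorter than the word size")
    subset_size = max(1, len(words) // 8)
    rng = random.Random(seed)
    wins = defaultdict(int)
    for _ in range(trials):
        sample = [rng.choice(words) for _ in range(subset_size)]  # drawn with replacement
        best = max(model["genus_size"], key=lambda g: log_likelihood(sample, g, model))
        wins[best] += 1
    genus = max(wins, key=wins.get)
    return genus, wins[genus] / trials


# Hypothetical two-genus training set with a short word size for readability.
model = train([("GenusA", "ACGTACGGTTCAAGCTTGCA"), ("GenusB", "T" * 20)], k=4)
print(classify("ACGGTTCAAGCT", model))  # ('GenusA', 1.0)
```

In this sketch a genus call would be kept only if the returned confidence is at least 0.5, mirroring the 50% threshold used in the study; otherwise the read would be left "unclassified" at that rank.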
doi:10.1128/AEM.01702-13
PMCID: PMC3811548  PMID: 24014534
5.  Test-Retest Reliability of Diffusion Tensor Imaging in Huntington’s Disease 
PLoS Currents  2014;6:ecurrents.hd.f19ef63fff962f5cd9c0e88f4844f43b.
Diffusion tensor imaging (DTI) has shown microstructural abnormalities in patients with Huntington’s Disease (HD), and work is underway to characterise how these abnormalities change with disease progression. Using methods that will be applied in longitudinal research, we sought to establish the reliability of DTI in early HD patients and controls. Test-retest reliability, quantified using the intraclass correlation coefficient (ICC), was assessed using region-of-interest (ROI)-based white matter atlas and voxelwise approaches on repeat scan data from 22 participants (10 early HD, 12 controls). T1 data were used to generate further ROIs for analysis in a reduced sample of 18 participants. The results suggest that fractional anisotropy (FA) and other diffusivity metrics are generally highly reliable, with ICCs indicating considerably lower within-subject than between-subject variability in both HD patients and controls. Where ICC was low, particularly for the diffusivity measures in the caudate and putamen, this was partly due to outliers. The analysis suggests that the specific DTI methods used here are appropriate for cross-sectional research in HD, and gives confidence that they can also be applied longitudinally, although this requires further investigation. An important caveat for DTI studies is that test-retest reliability may not be evenly distributed throughout the brain: highly anisotropic white matter regions tended to show lower relative within-subject variability than other white or grey matter regions.
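The abstract quantifies test-retest reliability with the intraclass correlation coefficient but does not state which ICC form was used. As an illustration only, the Python sketch below computes ICC(3,1) (two-way mixed effects, consistency, single measurement) from a participants-by-sessions table using the standard two-way ANOVA mean squares; the choice of form, the function name, and the example FA values are assumptions, not the study's method or data.

```python
def icc_3_1(scores: list[list[float]]) -> float:
    """ICC(3,1): two-way mixed effects, consistency, single measurement.
    `scores` holds one row per participant and one column per scan session."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between-subject
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between-session
    ss_error = ss_total - ss_rows - ss_cols                 # residual scan-rescan noise
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error
    if denom == 0:
        return float("nan")  # no variability at all; ICC undefined
    return (ms_rows - ms_error) / denom


# Hypothetical FA values for one ROI: three participants, scan and rescan.
print(icc_3_1([[0.45, 0.46], [0.52, 0.51], [0.38, 0.39]]))
```

A value close to 1 means within-subject (scan-rescan) variance is small relative to between-subject variance, which is how the abstract interprets its high ICCs. An absolute-agreement form such as ICC(2,1) would additionally penalize systematic shifts between sessions.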
doi:10.1371/currents.hd.f19ef63fff962f5cd9c0e88f4844f43b
PMCID: PMC3962450  PMID: 24672743
6.  Genomic Standards Consortium Projects 
Standards in Genomic Sciences  2014;9(3):599-601.
The Genomic Standards Consortium (GSC) is an open-membership community that was founded in 2005 to work towards the development, implementation and harmonization of standards in the field of genomics. Starting with the defined task of establishing a minimal set of descriptions, the GSC has evolved into an active standards-setting body that currently has 18 ongoing projects, with additional projects regularly proposed from within and outside the GSC. Here we describe our recently enacted policy for proposing new activities that are intended to be taken on by the GSC, along with the template for proposing such new activities.
doi:10.4056/sigs.5559680
PMCID: PMC4148985  PMID: 25197446
7.  Defining how ubiquitin receptors hHR23a and S5a bind polyubiquitin 
Journal of molecular biology  2007;369(1):10.1016/j.jmb.2007.03.008.
Summary
Ubiquitin receptors connect substrate ubiquitylation to proteasomal degradation. hHR23a binds proteasome subunit S5a through a surface that also binds ubiquitin. We report that S5a's UIM2 binds preferentially to hHR23a over polyubiquitin and provide a model for the ternary complex, which we expect represents one of the mechanisms used by the proteasome to capture ubiquitylated substrates. Furthermore, we demonstrate that hHR23a is surprisingly adept at sequestering the ubiquitin moieties of a polyubiquitin chain, and provide evidence that it and the ubiquitylated substrate are committed to each other after binding.
doi:10.1016/j.jmb.2007.03.008
PMCID: PMC3864866  PMID: 17408689
ubiquitin receptors; proteasome subunit S5a; hHR23a; Rad23; proteasomal degradation
8.  Defining the Escherichia coli SecA Dimer Interface Residues through In Vivo Site-Specific Photo-Cross-Linking 
Journal of Bacteriology  2013;195(12):2817-2825.
The motor protein SecA is a core component of the bacterial general secretory (Sec) pathway and is essential for cell viability. Despite evidence showing that SecA exists in a dynamic monomer-dimer equilibrium favoring the dimeric form in solution and in the cytoplasm, there is considerable debate as to the quaternary structural organization of the SecA dimer. Here, a site-directed photo-cross-linking technique was utilized to identify residues on the Escherichia coli SecA (ecSecA) dimer interface in the cytosol of intact cells. The feasibility of this method was demonstrated with residue Leu6, which our analytical ultracentrifugation studies of SecA L6A show to be essential for ecSecA dimerization, and which formed the cross-linked SecA dimer in vivo when p-benzoyl-phenylalanine (pBpa) was substituted at position 6. Subsequently, the amino terminus (residues 2 to 11) in the nucleotide binding domain (NBD), Phe263 in the preprotein binding domain (PBD), and Tyr794 and Arg805 in the intramolecular regulator of the ATPase 1 domain (IRA1) were identified as involved in ecSecA dimerization. Furthermore, pBpa incorporated at position 805 did not form a cross-linked dimer in the SecA Δ2-11 context, indicating that the amino terminus may directly contact Arg805 or that the deletion of residues 2 to 11 alters the topology of the naturally occurring ecSecA dimer.
doi:10.1128/JB.02269-12
PMCID: PMC3697251  PMID: 23585536
9.  Ribosomal Database Project: data and tools for high throughput rRNA analysis 
Nucleic Acids Research  2013;42(Database issue):D633-D642.
The Ribosomal Database Project (RDP; http://rdp.cme.msu.edu/) provides the research community with aligned and annotated rRNA gene sequence data, along with tools to allow researchers to analyze their own rRNA gene sequences in the RDP framework. RDP data and tools are utilized in fields as diverse as human health, microbial ecology, environmental microbiology, nucleic acid chemistry, taxonomy and phylogenetics. In addition to aligned and annotated collections of bacterial and archaeal small subunit rRNA genes, RDP now includes a collection of fungal large subunit rRNA genes. RDP tools, including Classifier and Aligner, have been updated to work with this new fungal collection. The use of high-throughput sequencing to characterize environmental microbial populations has exploded in the past several years, and as sequencing technologies have improved, the sizes of environmental datasets have increased. With release 11, RDP is providing an expanded set of tools to facilitate analysis of high-throughput data, including both single-stranded and paired-end reads. In addition, most tools are now available as open source packages for download and local use by researchers with high-volume needs or who would like to develop custom analysis pipelines.
doi:10.1093/nar/gkt1244
PMCID: PMC3965039  PMID: 24288368
10.  Role of the PAS Sensor Domains in the Bacillus subtilis Sporulation Kinase KinA 
Journal of Bacteriology  2013;195(10):2349-2358.
Histidine kinases are sophisticated molecular sensors that are used by bacteria to detect and respond to a multitude of environmental signals. KinA is the major histidine kinase required for initiation of sporulation upon nutrient deprivation in Bacillus subtilis. KinA has a large N-terminal region (residues 1 to 382) that is uniquely composed of three tandem Per-ARNT-Sim (PAS) domains that have been proposed to constitute a sensor module. To further enhance our understanding of this “sensor” region, we defined the boundaries that give rise to the minimal autonomously folded PAS domains and analyzed their homo- and heteroassociation properties using analytical ultracentrifugation, nuclear magnetic resonance (NMR) spectroscopy, and multiangle laser light scattering. We show that PASA self-associates very weakly, while PASC is primarily a monomer. In contrast, PASB forms a stable dimer (dissociation constant [Kd] < 10 nM), and it appears to be the main N-terminal determinant of KinA dimerization. Analysis of KinA mutants deficient for one or more PAS domains revealed a critical role for PASB, but not PASA, in autophosphorylation of KinA. Our findings suggest that dimerization of PASB is important for keeping the catalytic domain of KinA in a functional conformation. We use this information to propose a model for the structure of the N-terminal sensor module of KinA.
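To make the reported Kd < 10 nM for PASB concrete, the Python sketch below computes the fraction of protomers found in dimers for a simple monomer-dimer equilibrium at a few total concentrations. It assumes the convention Kd = [M]^2/[D] with total concentration expressed in monomer units, total = [M] + 2[D]; the 10 nM value is used only as the upper bound quoted in the abstract, not as a fitted result, and the function name is hypothetical.

```python
import math


def fraction_dimer(total: float, kd: float) -> float:
    """Fraction of protomers in the dimer for 2M <-> D with Kd = [M]^2 / [D].
    `total` and `kd` must share units (e.g. nM); total = [M] + 2[D]."""
    # Mass balance gives 2[M]^2/Kd + [M] - total = 0; take the positive root.
    monomer = kd * (-1.0 + math.sqrt(1.0 + 8.0 * total / kd)) / 4.0
    return 1.0 - monomer / total


for total_nm in (1.0, 10.0, 1000.0):
    print(total_nm, round(fraction_dimer(total_nm, kd=10.0), 3))
```

When the total concentration equals Kd, exactly half of the protomers are in dimers. Because 10 nM is an upper bound, PASB would be at least as dimeric as these numbers suggest at each concentration.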
doi:10.1128/JB.00096-13
PMCID: PMC3650535  PMID: 23504013
11.  FunGene: the functional gene pipeline and repository 
Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
doi:10.3389/fmicb.2013.00291
PMCID: PMC3787254  PMID: 24101916
microbial ecology; functional genes; amplification primers; phylogeny; biogeochemical cycles; amplicon analysis
12.  Functional genes to assess nitrogen cycling and aromatic hydrocarbon degradation: primers and processing matter 
Targeting sequencing to genes involved in key environmental processes, i.e., ecofunctional genes, provides an opportunity to sample nature's gene guilds to greater depth and help link community structure to process-level outcomes. Vastly different approaches have been implemented for sequence processing and, ultimately, for taxonomic placement of these gene reads. The overall quality of next generation sequence analysis of functional genes is dependent on multiple steps and assumptions of unknown diversity. To illustrate current issues surrounding amplicon read processing we provide examples for three ecofunctional gene groups. A combination of in silico, environmental and cultured strain sequences was used to test new primers targeting the dioxin and dibenzofuran degrading genes dxnA1, dbfA1, and carAa. The majority of obtained environmental sequences were classified into novel sequence clusters, illustrating the discovery value of the approach. For the nitrite reductase step in denitrification, the well-known nirK primers exhibited deficiencies in reference database coverage, illustrating the need to refine primer-binding sites and/or to design multiple primers, while nirS primers exhibited bias against five phyla. Amino acid-based OTU clustering of these two N-cycle genes from soil samples yielded only 114 unique nirK and 45 unique nirS genus-level groupings, likely a reflection of constricted primer coverage. Finally, supervised and non-supervised OTU analysis methods were compared using the nifH gene of nitrogen fixation, with generally similar outcomes, but the clustering (non-supervised) method yielded higher diversity estimates and stronger site-based differences. High throughput amplicon sequencing can provide inexpensive and rapid access to nature's related sequences by circumventing the culturing barrier, but each unique gene requires individual considerations in terms of primer design and sequence processing and classification.
doi:10.3389/fmicb.2013.00279
PMCID: PMC3775264  PMID: 24062736
functional genes; nifH; aromatic hydrocarbon; nirS; primer specificity; clustering analysis; nirK; nitrogen cycling
13.  Ecological Patterns of nifH Genes in Four Terrestrial Climatic Zones Explored with Targeted Metagenomics Using FrameBot, a New Informatics Tool 
mBio  2013;4(5):e00592-13.
ABSTRACT
Biological nitrogen fixation is an important component of sustainable soil fertility and a key component of the nitrogen cycle. We used targeted metagenomics to study the nitrogen fixation-capable terrestrial bacterial community by targeting the gene for nitrogenase reductase (nifH). We obtained 1.1 million nifH 454 amplicon sequences from 222 soil samples collected from 4 National Ecological Observatory Network (NEON) sites in Alaska, Hawaii, Utah, and Florida. To accurately detect and correct frameshifts caused by indel sequencing errors, we developed FrameBot, a tool for frameshift correction and nearest-neighbor classification, and compared its accuracy to that of two other rapid frameshift correction tools. We found FrameBot was, in general, more accurate as long as a reference protein sequence with 80% or greater identity to a query was available, as was the case for virtually all nifH reads for the 4 NEON sites. Frameshifts were present in 12.7% of the reads. Those nifH sequences related to the Proteobacteria phylum were most abundant, followed by those for Cyanobacteria in the Alaska and Utah sites. Predominant genera with nifH sequences similar to reads included Azospirillum, Bradyrhizobium, and Rhizobium, the latter two without obvious plant hosts at the sites. Surprisingly, 80% of the sequences had greater than 95% amino acid identity to known nifH gene sequences. These samples were grouped by site and correlated with soil environmental factors, especially drainage, light intensity, mean annual temperature, and mean annual precipitation. FrameBot was tested successfully on three ecofunctional genes but should be applicable to any.
IMPORTANCE
High-throughput phylogenetic analysis of microbial communities using rRNA-targeted sequencing is now commonplace; however, such data often allow little inference with respect to either the presence or the diversity of genes involved in most important ecological processes. To study the gene pool for these processes, it is more straightforward to assess the genes directly responsible for the ecological function (ecofunctional genes). However, analyzing these genes involves technical challenges beyond those seen for rRNA. In particular, frameshift errors cause garbled downstream protein translations. Our FrameBot tool described here both corrects frameshift errors in query reads and determines their closest matching protein sequences in a set of reference sequences. We validated this new tool with sequences from defined communities and demonstrated the tool’s utility on nifH gene fragments sequenced from soils in well-characterized and major terrestrial ecosystem types.
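The paragraph above notes that a single indel shifts the reading frame and garbles the downstream protein translation. The Python sketch below illustrates that effect and a brute-force correction: it tries removing each base of a read assumed to carry one inserted base, keeps the version whose frame-1 translation best matches a reference protein, and reports the closest reference. This is not FrameBot's algorithm; the function names, the ungapped identity score, the single-insertion assumption, and the reference proteins are all hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
CODON_TABLE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                       "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def translate(dna: str) -> str:
    """Translate frame 1; unknown codons become 'X', a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def identity(p: str, q: str) -> float:
    """Fraction of identical residues over the shorter sequence (no gaps allowed)."""
    n = min(len(p), len(q))
    return sum(a == b for a, b in zip(p, q)) / n if n else 0.0


def correct_one_insertion(read: str, reference: str) -> tuple[float, str]:
    """Try deleting each base once; keep the read whose translation best matches."""
    best = (identity(translate(read), reference), read)
    for i in range(len(read)):
        candidate = read[:i] + read[i + 1:]
        score = identity(translate(candidate), reference)
        if score > best[0]:
            best = (score, candidate)
    return best


def nearest_reference(read: str, references: dict[str, str]) -> tuple[str, float, str]:
    """Correct against every reference protein and return the closest one."""
    results = {name: correct_one_insertion(read, prot) for name, prot in references.items()}
    name = max(results, key=lambda n: results[n][0])
    score, corrected = results[name]
    return name, score, corrected


refs = {"refA": "MAKGLW", "refB": "MSTPQR"}  # hypothetical reference proteins
read = "ATGGCTTAAAGGTCTGTGG"                 # "ATGGCTAAAGGTCTGTGG" with one extra T
print(translate(read))                        # MA*RSV
print(nearest_reference(read, refs))          # ('refA', 1.0, 'ATGGCTAAAGGTCTGTGG')
```

Deleting a base repairs only insertion errors; deletion errors would need a matching step that inserts a base, and realistic reads with substitutions and multiple indels call for a gapped, frameshift-aware alignment rather than this exhaustive search.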
doi:10.1128/mBio.00592-13
PMCID: PMC3781835  PMID: 24045641
14.  Meeting Report: Fungal ITS Workshop (October 2012) 
Standards in Genomic Sciences  2013;8(1):118-123.
This report summarizes a meeting held in Boulder, CO USA (19–20 October 2012) on fungal community analyses using ultra-high-throughput sequencing of the internal transcribed spacer (ITS) region of the nuclear ribosomal RNA (rRNA) genes. The meeting was organized as a two-day workshop, with the primary goal of supporting collaboration among researchers for improving fungal ITS sequence resources and developing recommendations for standard ITS primers for the research community.
doi:10.4056/sigs.3737409
PMCID: PMC3739174  PMID: 23961317
15.  A gene-targeted approach to investigate the intestinal butyrate-producing bacterial community 
Microbiome  2013;1:8.
Background
Butyrate, which is produced by the human microbiome, is essential for a well-functioning colon. Bacteria that produce butyrate are phylogenetically diverse, which hinders their accurate detection based on conventional phylogenetic markers. As a result, reliable information on this important bacterial group is often lacking in microbiome research.
Results
In this study we describe a gene-targeted approach for 454 pyrotag sequencing and quantitative polymerase chain reaction for the final genes in the two primary bacterial butyrate synthesis pathways, butyryl-CoA:acetate CoA-transferase (but) and butyrate kinase (buk). We monitored the establishment and early succession of butyrate-producing communities in four patients with ulcerative colitis who underwent a colectomy with ileal pouch anal anastomosis and compared it with three control samples from healthy colons. All patients established an abundant butyrate-producing community (approximately 5% to 26% of the total community) in the pouch within the 2-month study, but patterns were distinctive among individuals. Only one patient harbored a community profile similar to the healthy controls, in which there was a predominance of but genes that are similar to reference genes from Acidaminococcus sp., Eubacterium sp., Faecalibacterium prausnitzii and Roseburia sp., and an almost complete absence of buk genes. Two patients were greatly enriched in buk genes similar to those of Clostridium butyricum and C. perfringens, whereas a fourth patient displayed abundant communities containing both genes. Most butyrate producers identified in previous studies were detected and the general patterns of taxa found were supported by 16S rRNA gene pyrotag analysis, but the gene-targeted approach provided more detail about the potential butyrate-producing members of the community.
Conclusions
The presented approach provides quantitative and genotypic insights into butyrate-producing communities and facilitates a more specific functional characterization of the intestinal microbiome. Furthermore, our analysis refines but and buk reference annotations found in central databases.
doi:10.1186/2049-2618-1-8
PMCID: PMC4126176  PMID: 24451334
Butyrate; Gene-targeted metagenomics; Human microbiome project; Pouchitis; Ulcerative colitis
16.  Evaluating multicenter DTI data in Huntington's disease on site specific effects: An ex post facto approach
NeuroImage : Clinical  2013;2:161-167.
Purpose
Assessment of the feasibility of averaging diffusion tensor imaging (DTI) metrics of MRI data acquired in the course of a multicenter study.
Materials and methods
Sixty-one early stage Huntington's disease patients and forty healthy controls were studied using four different MR scanners at four European sites with acquisition protocols as close as possible to a given standard protocol. The potential and feasibility of averaging data acquired at different sites was evaluated quantitatively by region-of-interest (ROI) based statistical comparisons of coefficients of variation (CV) across centers, as well as by testing for significant group-by-center differences on averaged fractional anisotropy (FA) values between patients and controls. In addition, a whole-brain based statistical between-group comparison was performed using FA maps.
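The methods compare coefficients of variation (CV) across centers. CV is the standard deviation divided by the mean, which puts variability on a unitless scale that can be compared between sites even if their FA values sit at somewhat different levels. The Python sketch below computes a per-center CV for a single ROI; the site names, FA values, and function names are hypothetical, and the study's subsequent statistical comparison of CVs is not shown.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(values) / mean(values)


def cv_by_center(fa_by_center: dict[str, list[float]]) -> dict[str, float]:
    """CV of one ROI's FA values within each center's cohort."""
    return {center: coefficient_of_variation(vals) for center, vals in fa_by_center.items()}


# Hypothetical FA values for one ROI at two sites.
example = {"site1": [0.42, 0.45, 0.44, 0.47], "site2": [0.41, 0.46, 0.43, 0.48]}
print(cv_by_center(example))
```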
Results
The ex post facto statistical evaluation of CV and FA values in a priori defined ROIs showed no differences between sites above chance, indicating that data were not systematically biased by center-specific factors.
Conclusion
Averaging FA maps from DTI data acquired at different study sites and different MR scanner types does not appear to be systematically biased. A suitable recipe for testing whether multicenter DTI data can be pooled is provided, to permit averaging of DTI-derived metrics to differentiate patients from healthy controls at a larger scale.
Highlights
► Alternative procedure to evaluate prerequisites for multicenter DTI data pooling. ► Procedure may serve as reference for future multicenter MRI-DTI trials in HD. ► FA differences between HD and controls consistent with single center reports.
doi:10.1016/j.nicl.2012.12.005
PMCID: PMC3777841  PMID: 24179771
Multicenter study; Diffusion tensor imaging; Fractional anisotropy; Huntington's disease
17.  The Role of Human Dicer-dsRBD in Processing Small Regulatory RNAs 
PLoS ONE  2012;7(12):e51829.
One of the most exciting recent developments in RNA biology has been the discovery of small non-coding RNAs that affect gene expression through the RNA interference (RNAi) mechanism. Two major classes of RNAs involved in RNAi are small interfering RNA (siRNA) and microRNA (miRNA). Dicer, an RNase III enzyme, plays a central role in the RNAi pathway by cleaving precursors of both of these classes of RNAs to form mature siRNAs and miRNAs, which are then loaded into the RNA-induced silencing complex (RISC). miRNA and siRNA precursors are quite structurally distinct; miRNA precursors are short, imperfect hairpins while siRNA precursors are long, perfect duplexes. Nonetheless, Dicer is able to process both. Dicer, like the majority of RNase III enzymes, contains a dsRNA binding domain (dsRBD), but data on the exact role this domain plays in the mechanism of Dicer binding and cleavage are sparse. To further explore the role of human Dicer-dsRBD in the RNAi pathway, we determined its binding affinity to various RNAs modeling both miRNA and siRNA precursors. Our study shows that Dicer-dsRBD is an avid binder of dsRNA, but its binding is only minimally influenced by a single-stranded/double-stranded junction caused by large terminal loops observed in miRNA precursors. Thus, the Dicer-dsRBD contributes directly to substrate binding but not to the mechanism of differentiating between pre-miRNA and pre-siRNA. In addition, NMR spin relaxation and MD simulations provide an overview of the contribution of dynamics to the binding mechanism. We compare this current study with our previous studies of the dsRBDs from Drosha and DGCR8 to give a dynamic profile of dsRBDs in their apo-state and a mechanistic view of dsRNA binding by dsRBDs in general.
doi:10.1371/journal.pone.0051829
PMCID: PMC3521659  PMID: 23272173
18.  Evaluation of multi-modal, multi-site neuroimaging measures in Huntington's disease: Baseline results from the PADDINGTON study
NeuroImage : Clinical  2012;2:204-211.
Background
Macro- and micro-structural neuroimaging measures provide valuable information on the pathophysiology of Huntington's disease (HD) and are proposed as biomarkers. Despite theoretical advantages of microstructural measures in terms of sensitivity to pathology, there is little evidence directly comparing the two.
Methods
40 controls and 61 early HD subjects underwent 3 T MRI (T1- and diffusion-weighted), as part of the PADDINGTON study. Macrostructural volumetrics were obtained for the whole brain, caudate, putamen, corpus callosum (CC) and ventricles. Microstructural diffusion metrics of fractional anisotropy (FA), mean-, radial- and axial-diffusivity (MD, RD, AD) were computed for white matter (WM), CC, caudate and putamen. Group differences were examined adjusting for age, gender and site. A formal comparison of effect sizes determined which modality and metrics provided a statistically significant advantage over others.
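The methods rank imaging metrics by effect size but the abstract does not name the estimator. As an assumption, the Python sketch below uses Cohen's d, the difference in group means divided by the pooled sample standard deviation. Unlike the study, it does not adjust for age, gender, or site, and it omits the formal test comparing effect sizes between metrics; the function name and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(patients: list[float], controls: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation.
    Each group needs at least two values."""
    n1, n2 = len(patients), len(controls)
    pooled_var = ((n1 - 1) * variance(patients) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    return (mean(patients) - mean(controls)) / sqrt(pooled_var)


print(cohens_d([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))  # 2.0
```

Because d is unitless, a volume and a diffusivity can be placed on one scale, which is what makes a comparison between macro- and microstructural metrics possible.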
Results
Macrostructural measures showed decreased regional and global volume in HD (p < 0.001), except for the ventricles, which were enlarged (p < 0.01). In HD, FA was increased in the deep grey-matter structures (p < 0.001) and decreased in the WM (CC, p = 0.035; WM, p = 0.053); diffusivity metrics (MD, RD, AD) were increased for all brain regions (p < 0.001). The largest effect sizes were for putamen volume, caudate volume and putamen diffusivity (AD, RD and MD); each was significantly larger than those for all other metrics (p < 0.05).
Conclusion
The highest-performing macro- and microstructural metrics had similar sensitivity to HD pathology, quantified via effect sizes. The choice of region of interest may matter more than imaging modality, with deep grey-matter regions outperforming the CC and global measures for both volume and diffusivity. FA appears to be relatively insensitive to disease effects.
Highlights
► Macro- and microstructural metrics are sensitive to HD pathology cross-sectionally.
► Largest effect sizes for putamen volume, caudate volume and putamen diffusivity.
► No significant advantage of highest-performing macro- over microstructural metrics.
► Grey-matter regions outperformed CC and global measures within each modality.
► FA appears to be relatively insensitive to disease effects.
doi:10.1016/j.nicl.2012.12.001
PMCID: PMC3777685  PMID: 24179770
Huntington's disease; MRI; Diffusion; Volumetric
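The PADDINGTON abstract ranks metrics by effect size. As a rough illustration only, the sketch below computes an unadjusted Cohen's d (group mean difference divided by the pooled standard deviation). It is an assumption that this resembles the study's effect-size measure, and it omits the age, gender and site adjustment the authors applied; the function name `cohens_d` and the toy inputs are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cohens_d(hd: Sequence[float], controls: Sequence[float]) -> float:
    """Unadjusted Cohen's d: (mean_HD - mean_control) / pooled SD.

    Hypothetical helper; it does NOT adjust for age, gender or site.
    A negative value means the HD group has the smaller mean (e.g. atrophy).
    """
    n1, n2 = len(hd), len(controls)
    s1, s2 = stdev(hd), stdev(controls)  # sample SDs (n - 1 denominator)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(hd) - mean(controls)) / pooled_sd


# Toy numbers, not study data: group means 4 and 3, each SD 2, so d = 0.5.
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))  # 0.5
```

A larger |d| means better separation of HD subjects from controls. Formally testing whether one metric's d exceeds another's, as the abstract reports doing, would need an additional resampling or similar test that this sketch does not attempt.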
19.  Heparin Activates PKR by Inducing Dimerization 
Journal of Molecular Biology  2011;413(5):973-984.
Protein kinase R (PKR) is an interferon-induced kinase that plays a pivotal role in the innate immunity pathway. PKR is activated to undergo autophosphorylation upon binding to double-stranded RNAs or RNAs that contain duplex regions. Activated PKR phosphorylates the α subunit of eukaryotic initiation factor 2, thereby inhibiting protein synthesis. PKR is also activated by heparin, a highly sulfated glycosaminoglycan. We have used biophysical methods to define the mechanism of PKR activation by heparin. Heparins as short as a hexasaccharide bind strongly to PKR and activate autophosphorylation. In contrast to double-stranded RNA, heparin activates PKR by binding to the kinase domain. Analytical ultracentrifugation measurements support a thermodynamic linkage model in which heparin binding allosterically enhances PKR dimerization, thereby activating the kinase. These results indicate that PKR can be activated by small molecules and represents a viable target for the development of novel antiviral agents.
doi:10.1016/j.jmb.2011.09.025
PMCID: PMC3268052  PMID: 21978664
analytical ultracentrifugation; drug discovery; innate immunity; protein kinase R
20.  Analysis of high affinity binding of PKR to dsRNA 
Biochemistry  2012;51(44):8764-8770.
Protein kinase R (PKR) is an interferon-induced kinase that plays a pivotal role in the innate immunity response to viral infection. PKR is activated upon binding to dsRNA. Our previous analysis of PKR binding to dsRNAs ranging from 20–40 bp supports a dimerization model for activation in which 30 bp represents the minimal length required to bind two PKR monomers and activate PKR via autophosphorylation. These studies were complicated by the formation of protein-RNA aggregates, particularly at low salt concentrations with longer dsRNAs. Here, we have taken advantage of the enhanced sensitivity afforded by fluorescence-detected analytical ultracentrifugation to reduce the RNA concentrations from micromolar to nanomolar. Under these conditions, we are able to characterize high-affinity binding of PKR to longer dsRNAs in 75 mM NaCl. The PKR binding stoichiometries are increased at lower salt but remain lower than those previously obtained for the dsRNA binding domain. The dependence of the limiting PKR binding stoichiometries on dsRNA length does not conform to standard models for nonspecific binding and suggests that binding to longer sequences occurs via a different binding mode with a larger site size. Although dimerization plays a key role in the PKR activation mechanism, the ability of shorter dsRNAs to bind two PKR monomers is not sufficient to induce autophosphorylation. We propose that activation of PKR by longer RNAs is correlated with an alternative binding mode in which both of the dsRNA binding motifs contact the RNA, inducing PKR to dimerize via a direct interaction of the kinase domains.
doi:10.1021/bi301226h
PMCID: PMC3495235  PMID: 23062027
21.  The use of analytical sedimentation velocity to extract thermodynamic linkage 
Biophysical Chemistry  2011;159(1):120-128.
For 25 years, the Gibbs Conference on Biothermodynamics has focused on the use of thermodynamics to extract information about the mechanism and regulation of biological processes. This includes the determination of equilibrium constants for macromolecular interactions by high-precision physical measurements. These approaches further reveal thermodynamic linkages to ligand binding events. Analytical ultracentrifugation has been a fundamental technique in the determination of macromolecular reaction stoichiometry and energetics for 85 years. This approach is highly amenable to the extraction of thermodynamic couplings to small-molecule binding in the overall reaction pathway. In the 1980s this approach was extended to sedimentation velocity techniques, primarily through the analysis of tubulin-drug interactions by Na and Timasheff. This transport method necessarily incorporates the complexity of both hydrodynamic and thermodynamic nonideality. The advent of modern computational methods in the last 20 years has made the analysis of sedimentation velocity data for interacting systems more robust and rigorous. Here we review three examples where sedimentation velocity has been useful for extracting thermodynamic information about reaction stoichiometry and energetics. Approaches to extract linkage to small-molecule binding and the influence of hydrodynamic nonideality are emphasized. These methods are also shown to apply to the collection of fluorescence data with the new Aviv FDS.
doi:10.1016/j.bpc.2011.05.014
PMCID: PMC3166974  PMID: 21703752
Biothermodynamics; coupling; linkage; analytical ultracentrifugation; sedimentation velocity; nonideality; Aviv fluorescence detection system
22.  Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications 
Yilmaz, Pelin | Kottmann, Renzo | Field, Dawn | Knight, Rob | Cole, James R | Amaral-Zettler, Linda | Gilbert, Jack A | Karsch-Mizrachi, Ilene | Johnston, Anjanette | Cochrane, Guy | Vaughan, Robert | Hunter, Christopher | Park, Joonhong | Morrison, Norman | Rocca-Serra, Philippe | Sterk, Peter | Arumugam, Manimozhiyan | Bailey, Mark | Baumgartner, Laura | Birren, Bruce W | Blaser, Martin J | Bonazzi, Vivien | Booth, Tim | Bork, Peer | Bushman, Frederic D | Buttigieg, Pier Luigi | Chain, Patrick S G | Charlson, Emily | Costello, Elizabeth K | Huot-Creasy, Heather | Dawyndt, Peter | DeSantis, Todd | Fierer, Noah | Fuhrman, Jed A | Gallery, Rachel E | Gevers, Dirk | Gibbs, Richard A | Gil, Inigo San | Gonzalez, Antonio | Gordon, Jeffrey I | Guralnick, Robert | Hankeln, Wolfgang | Highlander, Sarah | Hugenholtz, Philip | Jansson, Janet | Kau, Andrew L | Kelley, Scott T | Kennedy, Jerry | Knights, Dan | Koren, Omry | Kuczynski, Justin | Kyrpides, Nikos | Larsen, Robert | Lauber, Christian L | Legg, Teresa | Ley, Ruth E | Lozupone, Catherine A | Ludwig, Wolfgang | Lyons, Donna | Maguire, Eamonn | Methé, Barbara A | Meyer, Folker | Muegge, Brian | Nakielny, Sara | Nelson, Karen E | Nemergut, Diana | Neufeld, Josh D | Newbold, Lindsay K | Oliver, Anna E | Pace, Norman R | Palanisamy, Giriprakash | Peplies, Jörg | Petrosino, Joseph | Proctor, Lita | Pruesse, Elmar | Quast, Christian | Raes, Jeroen | Ratnasingham, Sujeevan | Ravel, Jacques | Relman, David A | Assunta-Sansone, Susanna | Schloss, Patrick D | Schriml, Lynn | Sinha, Rohini | Smith, Michelle I | Sodergren, Erica | Spor, Aymé | Stombaugh, Jesse | Tiedje, James M | Ward, Doyle V | Weinstock, George M | Wendel, Doug | White, Owen | Whiteley, Andrew | Wilke, Andreas | Wortman, Jennifer R | Yatsunenko, Tanya | Glöckner, Frank Oliver
Nature Biotechnology  2011;29(5):415-420.
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
doi:10.1038/nbt.1823
PMCID: PMC3367316  PMID: 21552244
23.  Energetics of SecA Dimerization 
Journal of Molecular Biology  2011;408(1):87-98.
Summary
Transport of many proteins to extracytoplasmic locations occurs via the general secretion (Sec) pathway. In Escherichia coli, this pathway consists of the SecYEG protein-conducting channel and the SecA ATPase. SecA plays a central role in binding the signal peptide region of preproteins, directing preproteins to membrane-bound SecYEG and promoting translocation coupled with ATP hydrolysis. Although it is well established that SecA is crucial for preprotein transport and thus cell viability, its oligomeric state during different stages of transport remains ill-defined. We have characterized the energetics of SecA dimerization as a function of salt concentration and temperature and defined the linkage of SecA dimerization and signal peptide binding using analytical ultracentrifugation. The use of a new fluorescence detector permitted analysis of SecA dimerization down to concentrations as low as 50 nM. The dimer dissociation constants are strongly dependent on salt. Linkage analysis indicates that SecA dimerization is coupled to the release of about five ions, demonstrating that electrostatic interactions play an important role in stabilizing the SecA dimer interface. Binding of signal peptide reduces SecA dimerization affinity such that Kd increases about 9-fold from 0.28 μM in the absence of peptide to 2.68 μM in the presence of peptide. The weakening of the SecA dimer that accompanies signal peptide binding may poise the SecA dimer to dissociate upon binding to SecYEG.
doi:10.1016/j.jmb.2011.02.006
PMCID: PMC3070768  PMID: 21315086
SecA; signal peptide; analytical ultracentrifugation; thermodynamic linkage
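Two figures in the SecA abstract lend themselves to a worked calculation: the ratio 2.68 μM / 0.28 μM (about 9.6) and the corresponding change in dimerization free energy, ΔΔG = RT ln(Kd,peptide / Kd,apo). The sketch below computes both and also estimates the number of ions released from the slope of ln Kd against ln[salt], a simplified linkage treatment that ignores activity coefficients. The 25 °C temperature is an assumption (the abstract does not state the temperature of the peptide measurement), and the function names are hypothetical.

```python
import math
from typing import Sequence

R = 8.314462618  # gas constant, J/(mol*K)


def fold_change(kd_ref: float, kd_new: float) -> float:
    """Ratio of dissociation constants (same units, e.g. uM)."""
    return kd_new / kd_ref


def delta_delta_g(kd_ref: float, kd_new: float, temperature_k: float = 298.15) -> float:
    """Free-energy penalty on dimerization in J/mol: RT ln(Kd_new / Kd_ref).

    Positive means the dimer is weakened. 298.15 K is an assumed temperature.
    """
    return R * temperature_k * math.log(kd_new / kd_ref)


def ions_released(salt: Sequence[float], kd: Sequence[float]) -> float:
    """Least-squares slope of ln Kd versus ln[salt].

    Simplified linkage: if dimerization releases n ions, Kd scales roughly
    as [salt]**n, so the slope estimates n (activity effects ignored).
    """
    xs = [math.log(c) for c in salt]
    ys = [math.log(k) for k in kd]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Values from the abstract: Kd 0.28 uM without and 2.68 uM with signal peptide.
print(round(fold_change(0.28, 2.68), 2))           # 9.57
print(round(delta_delta_g(0.28, 2.68) / 1000, 1))  # 5.6 (kJ/mol at the assumed 25 C)
```

`ions_released` expects measured (salt, Kd) pairs, which the abstract does not list; a slope near 5 would correspond to the "about five ions" reported there.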
24.  Using the RDP Classifier to Predict Taxonomic Novelty and Reduce the Search Space for Finding Novel Organisms 
PLoS ONE  2012;7(3):e32491.
Background
Currently, the naïve Bayesian classifier provided by the Ribosomal Database Project (RDP) is one of the most widely used tools to classify 16S rRNA sequences, mainly collected from environmental samples. We show that RDP has greater than 97% assignment accuracy and is fast for reads of 250 bp and longer when the read originates from a taxon known to the database. Because most environmental samples will contain organisms from taxa whose 16S rRNA genes have not been previously sequenced, we aim to benchmark how well the RDP classifier and other competing methods can discriminate these novel taxa from known taxa.
Principal Findings
Because each fragment is assigned a score (containing likelihood or confidence information, such as the bootstrap score in the RDP classifier), we “train” a threshold to discriminate between novel and known organisms and observe its performance on a test set. The threshold that we determine tends to be conservative (low sensitivity but high specificity) for naïve Bayesian methods. Nonetheless, our method performs better with the RDP classifier than with the other methods tested, as measured by the F-measure and the area under the receiver operating characteristic curve on the test set. By constraining the database to well-represented genera, sensitivity improves by 3–15%. Finally, we show that the detector is a good predictor of novel abundant taxa (especially at finer levels of taxonomy, where novelty is more likely to be present).
Conclusions
We conclude that selecting a read-length-appropriate RDP bootstrap score can significantly reduce the search space for identifying novel genera and higher levels in taxonomy. In addition, having a well-represented database significantly improves performance, while including genera that are “highly” similar does not significantly improve it. On a real dataset from an Amazon Terra Preta soil sample, we show that the detector can predict (or correlates with) whether novel sequences will be assigned to new taxa when the RDP database “doubles” in the future.
doi:10.1371/journal.pone.0032491
PMCID: PMC3293824  PMID: 22403664
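The RDP abstract describes "training" a threshold on classifier scores and evaluating it on a test set, but does not give the procedure. One plausible version is sketched below, assuming that a low RDP bootstrap score signals a fragment from a novel taxon and that the cutoff is chosen to maximize the F-measure on labelled training fragments. `train_threshold` and `f_measure` are hypothetical helpers, not part of the RDP classifier.

```python
from typing import Sequence, Tuple


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing is correctly flagged."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_threshold(scores: Sequence[float], is_novel: Sequence[bool]) -> Tuple[float, float]:
    """Choose cutoff t (flag a fragment as novel when score < t) maximizing F-measure.

    Hypothetical helper: candidates are every observed score plus +inf
    (flag everything); ties keep the lowest, i.e. most conservative, cutoff.
    """
    best_t, best_f = float("-inf"), -1.0
    for t in sorted(set(scores)) + [float("inf")]:
        tp = fp = fn = 0
        for score, novel in zip(scores, is_novel):
            flagged = score < t
            if flagged and novel:
                tp += 1
            elif flagged:
                fp += 1
            elif novel:
                fn += 1
        f = f_measure(tp, fp, fn)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy bootstrap scores (0-100) with known labels; not data from the study.
scores = [10, 20, 30, 80, 90, 95]
is_novel = [True, True, True, False, False, False]
print(train_threshold(scores, is_novel))  # (80, 1.0)
```

The returned cutoff would then be applied unchanged to a held-out test set, where F-measure and ROC AUC are reported. Keeping the lowest cutoff on ties favours specificity, in line with the conservative behaviour the abstract describes.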
25.  Subregional hippocampal deformations in major depressive disorder 
Journal of Affective Disorders  2010;126(1-2):272-277.
Background
Hippocampal atrophy is a well-reported feature of major depressive disorder, although the evidence has been mixed. The present study sought to examine hippocampal volume and subregional morphology in patients with major depressive disorder who were all medication-free and in an acute depressive episode of moderate severity.
Methods
Structural magnetic resonance imaging scans were acquired in 37 patients (mean age 42 years) and 37 age-, gender- and IQ-matched healthy individuals. Hippocampal volume and subregional structural differences were measured by manual tracings and identification of homologous surface points to the central core of each hippocampus.
Results
Both right (p = 0.001) and left (p = 0.005) hippocampal volumes were reduced in patients relative to healthy controls (n = 37 patients and n = 37 controls), while only the right hippocampus (p = 0.016) showed a reduced volume in a subgroup of first-episode depression patients (n = 13) relative to healthy controls. Shape analysis localised the subregional deformations to the subiculum and CA1 subfield, extending into the CA2-3 subfields, predominantly in the tail regions of the right (p = 0.017) and left (p = 0.011) hippocampi.
Limitations
As all patients were in an acute depressive episode, effects associated with depressive state cannot be distinguished from trait effects.
Conclusions
Subregional hippocampal deficits are present early in the course of major depression. The deformations may reflect structural correlates underlying functional memory impairments and distinguish depression from other psychiatric disorders.
doi:10.1016/j.jad.2010.03.004
PMCID: PMC3197834  PMID: 20392498
depression; hippocampus; morphology; shape; MRI