Results 1-8 (8)

1.  Context and the human microbiome 
Microbiome  2015;3:52.
Human microbiome reference datasets provide epidemiological context for researchers, enabling them to uncover new insights into their own data through meta-analyses. In addition, large and comprehensive reference sets offer a means to develop or test hypotheses and can pave the way for addressing practical study design considerations such as sample size decisions. We discuss the importance of reference sets in human microbiome research, limitations of existing resources, technical challenges to employing reference sets, examples of their usage, and contributions of the American Gut Project to the development of a comprehensive reference set. Through engaging the general public, the American Gut Project aims to address many of the issues present in existing reference resources, characterizing health and disease, lifestyle, and dietary choices of the participants while extending its efforts globally through international collaborations.
PMCID: PMC4632476  PMID: 26530830
Microbiome; American Gut Project; Reference database; Meta-analysis
2.  The Power Decoder Simulator for the Evaluation of Pooled shRNA Screen Performance 
Journal of Biomolecular Screening  2015;20(8):965-975.
RNA interference screening using pooled, short hairpin RNA (shRNA) is a powerful, high-throughput tool for determining the biological relevance of genes for a phenotype. Assessing an shRNA pooled screen’s performance is difficult in practice; one can estimate the performance only by using reproducibility as a proxy for power or by employing a large number of validated positive and negative controls. Here, we develop an open-source software tool, the Power Decoder simulator, for generating shRNA pooled screening experiments in silico that can be used to estimate a screen’s statistical power. Using the negative binomial distribution, it models both the relative abundance of multiple shRNAs within a single screening replicate and the biological noise between replicates for each individual shRNA. We demonstrate that this simulator can successfully model the data from an actual laboratory experiment. We then use it to evaluate the effects of biological replicates and sequencing counts on the performance of a pooled screen, without the necessity of gathering additional data. The Power Decoder simulator is written in R and Python and is available for download under the GNU General Public License v3.0.
PMCID: PMC4543901  PMID: 25777298
shRNA library; pooled screening; RNA interference; Monte Carlo simulations; power analysis
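The negative binomial modeling described in the abstract above can be illustrated with a minimal sketch. This is not the Power Decoder code: the function names, parameter names, and default values below are hypothetical, and the two-stage structure (one draw for each shRNA's expected abundance, then one draw per replicate around it) is an assumption about how such a simulation might be arranged. The negative binomial is sampled as a gamma-Poisson mixture using only the Python standard library.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for modest means."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_negative_binomial(mean, dispersion, rng):
    """Gamma-Poisson mixture with variance = mean + dispersion * mean**2."""
    if mean <= 0:
        return 0
    shape = 1.0 / dispersion
    scale = mean * dispersion  # shape * scale equals the requested mean
    return sample_poisson(rng.gammavariate(shape, scale), rng)


def simulate_pooled_screen(n_shrnas, n_replicates, mean_count=20.0,
                           abundance_dispersion=0.5,
                           replicate_dispersion=0.1, seed=0):
    """Return counts[shrna][replicate] for one simulated pooled screen.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_shrnas):
        # Stage 1: expected abundance of this shRNA within the pool.
        expected = sample_negative_binomial(mean_count, abundance_dispersion, rng)
        # Stage 2: noise between replicates around that abundance.
        counts.append([
            sample_negative_binomial(expected, replicate_dispersion, rng)
            for _ in range(n_replicates)
        ])
    return counts
```

Varying `n_replicates` or `mean_count` in such a sketch mirrors, in a simplified way, the kind of question the abstract raises about the effect of biological replicates and sequencing counts on screen performance.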
3.  Advances in CRISPR-Cas9 genome engineering: lessons learned from RNA interference 
Nucleic Acids Research  2015;43(7):3407-3419.
The discovery that the machinery of the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 bacterial immune system can be re-purposed to easily create deletions, insertions and replacements in the mammalian genome has revolutionized the field of genome engineering and re-invigorated the field of gene therapy. Many parallels have been drawn between the newly discovered CRISPR-Cas9 system and the RNA interference (RNAi) pathway in terms of their utility for understanding and interrogating gene function in mammalian cells. Given this similarity, the CRISPR-Cas9 field stands to benefit immensely from lessons learned during the development of RNAi technology. We examine how the history of RNAi can inform today's challenges in CRISPR-Cas9 genome engineering such as efficiency, specificity, high-throughput screening and delivery for in vivo and therapeutic applications.
PMCID: PMC4402539  PMID: 25800748
4.  Meeting report of the RNA Ontology Consortium January 8-9, 2011 
Standards in Genomic Sciences  2011;4(2):252-256.
This report summarizes the proceedings of the structure mapping working group meeting of the RNA Ontology Consortium (ROC), held in Kona, Hawaii on January 8-9, 2011. The ROC hosted this workshop to facilitate collaborations among those researchers formalizing concepts in RNA, those developing RNA-related software, and those performing genome annotation and standardization. The workshop included three software presentations, extended round-table discussions, and the constitution of two new working groups, the first to address the need for better software integration and the second to discuss standardization and benchmarking of existing RNA annotation pipelines. These working groups have subsequently pursued concrete implementation of actions suggested during the discussion. Further information about the ROC and its activities can be found at
PMCID: PMC3111981  PMID: 21677862
5.  NoiseMaker: simulated screens for statistical assessment 
Bioinformatics  2010;26(19):2484-2485.
Summary: High-throughput screening (HTS) is a common technique for both drug discovery and basic research, but researchers often struggle with how best to derive hits from HTS data. While a wide range of hit identification techniques exist, little information is available about their sensitivity and specificity, especially in comparison to each other. To address this, we have developed the open-source NoiseMaker software tool for generation of realistically noisy virtual screens. By applying potential hit identification methods to NoiseMaker-simulated data and determining how many of the pre-defined true hits are recovered (as well as how many known non-hits are misidentified as hits), one can draw conclusions about the likely performance of these techniques on real data containing unknown true hits. Such simulations apply to a range of screens, such as those using small molecules, siRNAs, shRNAs, miRNA mimics or inhibitors, or gene over-expression; we demonstrate this utility by using it to explain apparently conflicting reports about the performance of the B score hit identification method.
Availability and implementation: NoiseMaker is written in C#, an ECMA and ISO standard language with compilers for multiple operating systems. Source code, a Windows installer and complete unit tests are available at Full documentation and support are provided via an extensive help file and tool-tips, and the developers welcome user suggestions.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2944205  PMID: 20702398
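The evaluation idea in the abstract above (simulate a screen with pre-defined true hits, apply a hit identification method, then count recovered hits and misidentified non-hits) can be sketched as follows. This is an illustration only and does not reproduce NoiseMaker, which is written in C#. The Gaussian noise model, the z-score hit caller, and every name and default here are assumptions chosen for brevity.

```python
import random
import statistics


def simulate_virtual_screen(n_wells, n_true_hits, effect_size=3.0,
                            noise_sd=1.0, seed=0):
    """Return (values, true_hits); true hits are shifted downward by effect_size."""
    rng = random.Random(seed)
    true_hits = set(rng.sample(range(n_wells), n_true_hits))
    values = [
        rng.gauss(-effect_size if i in true_hits else 0.0, noise_sd)
        for i in range(n_wells)
    ]
    return values, true_hits


def call_hits_by_z_score(values, cutoff=-2.0):
    """Call a well a hit when its z-score is at or below the cutoff."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {i for i, v in enumerate(values) if (v - mean) / sd <= cutoff}


def score_hit_calls(called, true_hits, n_wells):
    """Compare called hits with the known truth of the simulation."""
    tp = len(called & true_hits)
    fp = len(called - true_hits)
    fn = len(true_hits - called)
    tn = n_wells - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```

Swapping `call_hits_by_z_score` for another hit caller while keeping the same simulated data gives a like-for-like comparison of methods, which is the kind of use the abstract describes.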
6.  Statistical Methods for Analysis of High-Throughput RNA Interference Screens 
Nature Methods  2009;6(8):569-575.
RNA interference (RNAi) has become a powerful technique for reverse genetics and drug discovery and, in both of these areas, large-scale high-throughput RNAi screens are commonly performed. The statistical techniques used to analyze these screens are frequently borrowed directly from small-molecule screening; however, small-molecule and RNAi data characteristics differ in meaningful ways. We examine the similarities and differences between RNAi and small-molecule screens, highlighting particular characteristics of RNAi screen data that must be addressed during analysis. Additionally, we provide guidance on selection of analysis techniques in the context of a sample workflow.
PMCID: PMC2789971  PMID: 19644458
7.  Genome Reshuffling for Advanced Intercross Permutation (GRAIP): Simulation and Permutation for Advanced Intercross Population Analysis 
PLoS ONE  2008;3(4):e1977.
Background
Advanced intercross lines (AIL) are segregating populations created using a multi-generation breeding protocol for fine mapping complex trait loci (QTL) in mice and other organisms. Applying QTL mapping methods for intercross and backcross populations, often followed by naïve permutation of individuals and phenotypes, does not account for the effect of AIL family structure in which final generations have been expanded and leads to inappropriately low significance thresholds. The critical problem with naïve mapping approaches in AIL populations is that the individual is not an exchangeable unit.
Methodology/Principal Findings
The effect of family structure has immediate implications for optimal AIL creation (many crosses, few animals per cross, and population expansion before the final generation), and we discuss these and the utility of AIL populations for QTL fine mapping. We also describe Genome Reshuffling for Advanced Intercross Permutation (GRAIP), a method for analyzing AIL data that accounts for family structure. GRAIP permutes a more interchangeable unit in the final generation crosses, the parental genome, and simulates regeneration of a permuted AIL population based on exchanged parental identities. GRAIP determines appropriate genome-wide significance thresholds and locus-specific P-values for AILs and other populations with similar family structures. We contrast GRAIP with naïve permutation using a large densely genotyped mouse AIL population (1333 individuals from 32 crosses). A naïve permutation using coat color as a model phenotype demonstrates high false-positive locus identification and uncertain significance levels, which are corrected using GRAIP. GRAIP also detects an established hippocampus weight locus and a new locus, Hipp9a.
Conclusions and Significance
GRAIP determines appropriate genome-wide significance thresholds and locus-specific P-values for AILs and other populations with similar family structures. The effect of family structure has immediate implications for the optimal AIL creation and we discuss these and the utility of AIL populations.
PMCID: PMC2295257  PMID: 18431467
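For orientation, the naïve permutation baseline that the abstract above contrasts with GRAIP can be sketched as below. This is not GRAIP: it shuffles phenotypes across individuals and therefore ignores family structure, which is exactly the shortcoming the abstract describes. The scoring statistic (difference in mean phenotype between two genotype classes coded 0 and 1), the function names, and the defaults are all simplifying assumptions.

```python
import random


def association_score(genotypes, phenotypes):
    """Absolute difference in mean phenotype between genotype classes 0 and 1."""
    group0 = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    group1 = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    if not group0 or not group1:
        return 0.0
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


def naive_permutation_threshold(genotype_matrix, phenotypes,
                                n_permutations=1000, alpha=0.05, seed=0):
    """Genome-wide threshold from the maximum score in each permutation.

    genotype_matrix[locus][individual] holds 0/1 genotype codes.
    Individuals are treated as exchangeable, which the abstract argues
    is inappropriate for AIL populations.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        max_scores.append(
            max(association_score(locus, shuffled) for locus in genotype_matrix)
        )
    max_scores.sort()
    index = min(len(max_scores) - 1, int((1 - alpha) * len(max_scores)))
    return max_scores[index]
```

A family-aware procedure would change the unit being permuted rather than the thresholding step, so the final quantile calculation would stay the same while the shuffling would not.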
8.  PyCogent: a toolkit for making sense from sequence 
Genome Biology  2007;8(8):R171.
The COmparative GENomic Toolkit, a framework for probabilistic analyses of biological sequences, devising workflows and generating publication quality graphics, has been implemented in Python.
We have implemented in Python the COmparative GENomic Toolkit, a fully integrated and thoroughly tested framework for novel probabilistic analyses of biological sequences, devising workflows, and generating publication quality graphics. PyCogent includes connectors to remote databases, built-in generalized probabilistic techniques for working with biological sequences, and controllers for third-party applications. The toolkit takes advantage of parallel architectures and runs on a range of hardware and operating systems, and is available under the general public license from .
PMCID: PMC2375001  PMID: 17708774
