1.  Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses 
Nature Protocols  2012;7(3):500-507.
We present PEER (probabilistic estimation of expression residuals), a software package implementing statistical models that improve the sensitivity and interpretability of genetic associations in population-scale expression data. This approach builds on factor analysis methods that infer broad variance components in the measurements. PEER takes as input transcript profiles and covariates from a set of individuals, and then outputs hidden factors that explain much of the expression variability. Optionally, these factors can be interpreted as pathway or transcription factor activations by providing prior information about which genes are involved in the pathway or targeted by the factor. The inferred factors are used in genetic association analyses. First, they are treated as additional covariates, and are included in the model to increase detection power for mapping expression traits. Second, they are analyzed as phenotypes themselves to understand the causes of global expression variability. PEER extends previous related surrogate variable models and can be implemented within hours on a desktop computer.
PMCID: PMC3398141  PMID: 22343431
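The PEER entry describes a two-step workflow: infer hidden factors from the expression matrix, then include them as covariates in association tests. The Python sketch below illustrates that idea on synthetic data. It deliberately uses plain PCA (via SVD) as a simplified stand-in for PEER's factor analysis model, and the helper names (hidden_factors_pca, residualize), the simulated SNP and the hidden confounder are all hypothetical.

```python
import numpy as np


def hidden_factors_pca(expr: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal-component scores of a samples x genes matrix.

    Simplified stand-in for PEER: PEER fits a Bayesian factor analysis model,
    whereas this uses ordinary PCA on column-centred data.
    """
    centred = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :k] * s[:k]  # samples x k factor scores


def residualize(y: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Regress each column of y on [intercept, covariates]; return residuals."""
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


rng = np.random.default_rng(0)
n_samples, n_genes = 200, 500
genotype = rng.integers(0, 3, size=n_samples).astype(float)  # hypothetical SNP (0/1/2)
batch = rng.normal(size=n_samples)                            # hidden confounder
loadings = rng.normal(size=n_genes)
loadings[0] = 1.0                                             # gene 0 is affected by the confounder
expr = np.outer(batch, loadings) * 3 + rng.normal(size=(n_samples, n_genes))
expr[:, 0] += 1.0 * genotype                                  # gene 0 also has a genetic effect

factors = hidden_factors_pca(expr, k=1)
raw_r = np.corrcoef(genotype, expr[:, 0])[0, 1]
adj_r = np.corrcoef(genotype, residualize(expr, factors)[:, 0])[0, 1]
print(f"raw r = {raw_r:.2f}, factor-adjusted r = {adj_r:.2f}")
```

Because the confounder dominates variance across many genes, the first factor captures it, and removing it leaves the genotype signal in gene 0 easier to detect, which is the intuition behind using inferred factors as covariates.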
2.  Dalliance: interactive genome viewing on the web 
Bioinformatics  2011;27(6):889-890.
Summary: Dalliance is a new genome viewer which offers a high level of interactivity while running within a web browser. All data is fetched using the established distributed annotation system (DAS) protocol, making it easy to customize the browser and add extra data.
Availability and Implementation: Dalliance runs entirely within your web browser, and relies on existing DAS server infrastructure. Browsers for several mammalian genomes are available at, and the use of DAS means you can add your own data to these browsers. In addition, the source code (Javascript) is available under the BSD license, and is straightforward to install on your own web server and embed within other documents.
PMCID: PMC3051325  PMID: 21252075
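The Dalliance entry states that all data are fetched over the DAS protocol. Dalliance itself is JavaScript; the Python sketch below only illustrates the DAS 1.x exchange as we understand it: a "features" request for a genomic segment, and extraction of features from the DASGFF XML response. The server URL, source name and sample response are hypothetical, and only a few DASGFF fields are read.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET


def das_features_url(server: str, source: str, chrom: str, start: int, stop: int) -> str:
    """Build a DAS 1.x 'features' request URL for one segment (ref:start,stop)."""
    segment = f"{chrom}:{start},{stop}"
    return f"{server.rstrip('/')}/das/{source}/features?segment={quote(segment, safe=':,')}"


def parse_dasgff(xml_text: str) -> list[dict]:
    """Extract id, type, start, end and orientation of each FEATURE element."""
    root = ET.fromstring(xml_text)
    features = []
    for feat in root.iter("FEATURE"):
        features.append({
            "id": feat.get("id"),
            "type": feat.findtext("TYPE", default="").strip(),
            "start": int(feat.findtext("START")),
            "end": int(feat.findtext("END")),
            "orientation": feat.findtext("ORIENTATION", default="0").strip(),
        })
    return features


# Hypothetical server and a hand-written response, so no network access is needed.
url = das_features_url("https://das.example.org", "hg19", "1", 1000, 2000)
SAMPLE = """<DASGFF><GFF version="1.0" href="https://das.example.org/das/hg19/features">
<SEGMENT id="1" start="1000" stop="2000">
<FEATURE id="exon-1"><TYPE id="exon">exon</TYPE><START>1100</START><END>1250</END>
<ORIENTATION>+</ORIENTATION></FEATURE>
</SEGMENT></GFF></DASGFF>"""
print(url)
print(parse_dasgff(SAMPLE))
```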
3.  The two most common histological subtypes of malignant germ cell tumour are distinguished by global microRNA profiles, associated with differential transcription factor expression 
Molecular Cancer  2010;9:290.
We hypothesised that differences in microRNA expression profiles contribute to the contrasting natural history and clinical outcome of the two most common types of malignant germ cell tumour (GCT), yolk sac tumours (YSTs) and germinomas.
By direct comparison, using microarray data for paediatric GCT samples and published qRT-PCR data for adult samples, we identified microRNAs significantly up-regulated in YSTs (n = 29 paediatric, 26 adult, 11 overlapping) or germinomas (n = 37 paediatric). By Taqman qRT-PCR we confirmed differential expression of 15 of 16 selected microRNAs and further validated six of these (miR-302b, miR-375, miR-200b, miR-200c, miR-122, miR-205) in an independent sample set. Interestingly, the miR-302 cluster, which is over-expressed in all malignant GCTs, showed further over-expression in YSTs versus germinomas, representing six of the top eight microRNAs over-expressed in paediatric YSTs and seven of the top 11 in adult YSTs. To explain this observation, we used mRNA expression profiles of paediatric and adult malignant GCTs to identify 10 transcription factors (TFs) consistently over-expressed in YSTs versus germinomas, followed by linear regression to confirm associations between TF and miR-302 cluster expression levels. Using the sequence motif analysis environment iMotifs, we identified predicted binding sites for four of the 10 TFs (GATA6, GATA3, TCF7L2 and MAF) in the miR-302 cluster promoter region. Finally, we showed that miR-302 family over-expression in YST is likely to be functionally significant, as mRNAs down-regulated in YSTs were enriched for 3' untranslated region sequences complementary to the common seed of miR-302a~miR-302d. Such mRNAs included mediators of key cancer-associated processes, including tumour suppressor genes, apoptosis regulators and TFs.
Differential microRNA expression is likely to contribute to the relatively aggressive behaviour of YSTs and may enable future improvements in clinical diagnosis and/or treatment.
PMCID: PMC2993676  PMID: 21059207
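The germ cell tumour entry reports that mRNAs down-regulated in YSTs were enriched for 3' UTR sequences complementary to the miR-302 seed. A minimal sketch of the underlying seed-matching step follows: derive the 7mer-m8 site (reverse complement of miRNA nucleotides 2-8) and compare how often it occurs in two sets of UTRs. The mature miR-302a sequence is assumed for illustration (check it against miRBase), the UTR strings are invented, and the paper's actual enrichment statistics are not reproduced here.

```python
# Complement table for RNA bases (A<->U, C<->G).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """7mer-m8 target site: reverse complement of miRNA nucleotides 2-8 (as RNA)."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return seed.translate(COMPLEMENT)[::-1]


def count_sites(utr: str, site: str) -> int:
    """Count occurrences of site in a 3' UTR sequence, allowing overlaps."""
    utr = utr.upper().replace("T", "U")
    return sum(1 for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site)


def fraction_with_site(utrs: list[str], site: str) -> float:
    """Fraction of UTRs with at least one site (0.0 for an empty list)."""
    if not utrs:
        return 0.0
    return sum(count_sites(u, site) > 0 for u in utrs) / len(utrs)


# Mature miR-302a sequence assumed for illustration; verify against miRBase.
MIR302A = "UAAGUGCUUCCAUGUUUUGGUGA"
site = seed_site(MIR302A)
down_utrs = ["GGAGCACUUAAAGCACUU", "CCCAGCACUUGG"]                  # hypothetical
background = ["GGGGCCCCAAAA", "UUAGCACUUAC", "ACGUACGUACGU"]       # hypothetical
print(site, fraction_with_site(down_utrs, site), fraction_with_site(background, site))
```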
4.  Metamotifs - a generative model for building families of nucleotide position weight matrices 
BMC Bioinformatics  2010;11:348.
Development of high-throughput methods for measuring DNA interactions of transcription factors, together with computational advances in short motif inference algorithms, is expanding our understanding of transcription factor binding site motifs. The resulting growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends could help both to measure the similarity of novel computational motif predictions to previously known motifs and to sensitively detect, in novel sequence, regulatory motifs that resemble known ones.
We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif', describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input PWM motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: as a PWM prior in the Bayesian NestedMICA motif inference algorithm, to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain.
We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase motif inference sensitivity. Metamotifs were also successfully applied to a motif classification problem in which sequence motif features were used to predict the family of protein DNA binding domains that would interact with each motif. The metamotif-based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite.
PMCID: PMC2906491  PMID: 20579334
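The metamotif entry describes a generative model for families of PWMs. The sketch below assumes, for illustration, that each metamotif column is a Dirichlet distribution over the four base probabilities of a PWM column, and scores a PWM by summing per-column Dirichlet log densities. It assumes the PWM is already aligned to the metamotif, ungapped, the same length, and free of zero probabilities (use pseudocounts); it ignores the model's handling of non-contributing columns and is not the NestedMICA implementation. The metamotif parameters and PWMs are hypothetical.

```python
import math

Column = tuple[float, float, float, float]  # values for A, C, G, T


def dirichlet_logpdf(x: Column, alpha: Column) -> float:
    """Log density of a probability vector x under Dirichlet(alpha)."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, x)))


def metamotif_log_likelihood(pwm: list[Column], metamotif: list[Column]) -> float:
    """Score a pre-aligned PWM against a metamotif, column by column."""
    if len(pwm) != len(metamotif):
        raise ValueError("PWM and metamotif must have the same length")
    return sum(dirichlet_logpdf(col, alpha) for col, alpha in zip(pwm, metamotif))


# Hypothetical 3-column metamotif whose columns favour G, A and T respectively.
METAMOTIF = [(1.0, 1.0, 17.0, 1.0), (17.0, 1.0, 1.0, 1.0), (1.0, 1.0, 1.0, 17.0)]
similar = [(0.05, 0.05, 0.85, 0.05), (0.8, 0.1, 0.05, 0.05), (0.05, 0.1, 0.05, 0.8)]
dissimilar = [(0.85, 0.05, 0.05, 0.05), (0.05, 0.85, 0.05, 0.05), (0.05, 0.05, 0.85, 0.05)]
print(metamotif_log_likelihood(similar, METAMOTIF),
      metamotif_log_likelihood(dissimilar, METAMOTIF))
```

A PWM resembling the family receives a much higher log likelihood, which is the property that lets a family-level model act as a prior or as a classifier feature.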
5.  iMotifs: an integrated sequence motif visualization and analysis environment 
Bioinformatics  2010;26(6):843-844.
Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have recently been developed for detecting regulatory factor binding sites in vivo and in vitro, and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important.
iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces.
The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided.
Availability: iMotifs is licensed under the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source are available at and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library, libxms, for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files.
PMCID: PMC2832821  PMID: 20106815
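The iMotifs entry mentions visualising scored motif hits in sequences. As a hedged illustration of what "scoring hits" commonly means, the sketch below converts a PWM to log2 odds against a uniform background and reports forward-strand windows above a threshold. This is a generic log-odds scan, not necessarily the scoring iMotifs or NestedMICA uses; the GATA-like PWM, sequence and threshold are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(pwm, background=(0.25, 0.25, 0.25, 0.25)):
    """Convert per-position base probabilities (A, C, G, T) to log2 odds."""
    return [[math.log2(p / q) for p, q in zip(col, background)] for col in pwm]


def scan(sequence, pwm, threshold):
    """Return (position, score) for forward-strand windows scoring >= threshold."""
    lo = log_odds_matrix(pwm)
    width = len(lo)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(lo[j][BASES.index(b)] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical GATA-like PWM (rows are positions; columns are A, C, G, T).
GATA_PWM = [(0.05, 0.05, 0.85, 0.05), (0.85, 0.05, 0.05, 0.05),
            (0.05, 0.05, 0.05, 0.85), (0.85, 0.05, 0.05, 0.05)]
hits = scan("CCGATAGGGATTGATA", GATA_PWM, threshold=5.0)
print(hits)
```

The window "GATT" at position 8 has one mismatch and falls below the threshold, so only the two exact matches are reported.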
6.  An ENU-induced mutation of miR-96 associated with progressive hearing loss in mice 
Nature Genetics  2009;41(5):614-618.
Progressive hearing loss is common in the human population, but little is known about the molecular basis. We report a new ENU-induced mouse mutant, diminuendo, with a single base change in the seed region of Mirn96. Heterozygotes show progressive loss of hearing and hair cell anomalies, while homozygotes have no cochlear responses. Most microRNAs are believed to downregulate target genes by binding to specific sites on their mRNAs, so mutation of the seed should lead to target gene upregulation. Microarray analysis revealed 96 transcripts with significantly altered expression in homozygotes; notably, Slc26a5, oncomodulin, Gfi1, Ptprq and Pitpnm1 were downregulated. Hypergeometric p-value analysis showed that hundreds of genes were upregulated in mutants. Different genes, with target sites complementary to the mutant seed, were downregulated. This is the first microRNA found associated with deafness, and diminuendo represents a model for understanding and potentially moderating progressive hair cell degeneration in hearing loss more generally.
PMCID: PMC2705913  PMID: 19363478
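The miR-96 entry cites a hypergeometric p-value analysis. The sketch below shows the standard upper-tail hypergeometric test that such an analysis relies on: given N genes, K of which carry a seed site, what is the probability that n altered genes include k or more site-bearing genes by chance? It uses exact integer arithmetic from the standard library; the counts in the example are hypothetical, and the paper's exact gene ranking and thresholds are not reproduced.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: 20000 genes, 2000 with a seed site, 300 upregulated,
# 60 of which carry the site (about 30 would be expected by chance).
p = hypergeom_sf(60, 20000, 2000, 300)
print(f"p = {p:.3g}")
```

Exact integer binomials avoid the underflow that naive floating-point products would hit for genome-scale counts; for very large studies a library routine such as a survival function from a statistics package is the usual choice.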

