
Results 1-6 (6)

1.  Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence 
Bioinformatics  2014;30(22):3181-3188.
Motivation: The clonal theory of adaptive immunity proposes that immunological responses are encoded by increases in the frequency of lymphocytes carrying antigen-specific receptors. In this study, we measure the frequency of different T-cell receptors (TcR) in CD4+ T-cell populations of mice immunized with a complex antigen, killed Mycobacterium tuberculosis, using high-throughput parallel sequencing of the TcRβ chain. Our initial hypothesis that immunization would induce repertoire convergence proved to be incorrect, and therefore an alternative approach was developed that allows accurate stratification of TcR repertoires and provides novel insights into the nature of CD4+ T-cell receptor recognition.
Results: To track the changes induced by immunization within this heterogeneous repertoire, the sequence data were classified by counting the frequency of different clusters of short (3 or 4) continuous stretches of amino acids within the antigen-binding complementarity determining region 3 (CDR3) repertoire of different mice. Both unsupervised (hierarchical clustering) and supervised (support vector machine) analyses of these different distributions of sequence clusters differentiated between immunized and unimmunized mice with 100% efficiency. The CD4+ TcR repertoires of mice 5 and 14 days postimmunization were clearly different from that of unimmunized mice but were not distinguishable from each other. However, the repertoires of mice 60 days postimmunization were distinct both from naive mice and the day 5/14 animals. Our results reinforce the remarkable diversity of the TcR repertoire, resulting in many diverse private TcRs contributing to the T-cell response even in genetically identical mice responding to the same antigen. However, specific motifs defined by short stretches of amino acids within the CDR3 region may determine TcR specificity and define a new approach to TcR sequence classification.
Availability and implementation: The analysis was implemented in R and Python, and source code can be found in Supplementary Data.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4221123  PMID: 25095879
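The counting step described in the abstract above (frequencies of short, continuous stretches of amino acids within CDR3 sequences) can be illustrated with a minimal Python sketch. This is not the authors' code, which is distributed with the paper's Supplementary Data; the function names `kmer_counts` and `kmer_frequencies` and the two example sequences are hypothetical, and the clustering of stretches into groups described in the abstract is omitted.

```python
from collections import Counter


def kmer_counts(cdr3_sequences, k=3):
    """Count every contiguous stretch of k amino acids across CDR3 sequences.

    Illustrative only: the published analysis additionally groups stretches
    into clusters before classification, which is not reproduced here.
    """
    counts = Counter()
    for seq in cdr3_sequences:
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_frequencies(cdr3_sequences, k=3):
    """Normalize k-mer counts to relative frequencies for one repertoire."""
    counts = kmer_counts(cdr3_sequences, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Made-up CDR3 sequences, used only to show the call pattern.
example_repertoire = ["CASSLGG", "CASSPGT"]
example_frequencies = kmer_frequencies(example_repertoire, k=3)
```

A frequency vector of this kind, computed per mouse, is the sort of input that hierarchical clustering or a support vector machine could then operate on.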
2.  Physical Module Networks: an integrative approach for reconstructing transcription regulation 
Bioinformatics  2011;27(13):i177-i185.
Motivation: Deciphering the complex mechanisms by which regulatory networks control gene expression remains a major challenge. While some studies infer regulation from dependencies between the expression levels of putative regulators and their targets, others focus on measured physical interactions.
Results: Here, we present Physical Module Networks, a unified framework that combines a Bayesian model describing modules of co-expressed genes and their shared regulation programs with a physical interaction graph describing the protein–protein interactions and protein–DNA binding events that coherently underlie this regulation. Using synthetic data, we demonstrate that a Physical Module Network model has similar recall and improved precision compared to a simple Module Network, as it omits many false positive regulators. Finally, we show the power of Physical Module Networks to reconstruct meaningful regulatory pathways in genetically perturbed yeast and during the yeast cell cycle, as well as during the response of primary human epithelial cells to infection with H1N1 influenza.
Availability: The PMN software is available, free for academic use at
PMCID: PMC3117354  PMID: 21685068
3.  An integrative clustering and modeling algorithm for dynamical gene expression data 
Bioinformatics  2011;27(13):i392-i400.
Motivation: The precise dynamics of gene expression is often crucial for proper response to stimuli. Time-course gene-expression profiles can provide insights about the dynamics of many cellular responses, but are often noisy and measured at arbitrary intervals, posing a major analysis challenge.
Results: We developed an algorithm that interleaves clustering of time-course gene-expression data with estimation of dynamic models of their response using biologically meaningful parameters. By combining these two tasks, we overcome obstacles posed by each one. Moreover, our approach provides an easy way to compare responses to different stimuli at the dynamical level. We use our approach to analyze the dynamical transcriptional responses to inflammatory and anti-viral stimuli in primary mouse dendritic cells, and extract a concise representation of the different dynamical response types. We analyze the similarities and differences between the two stimuli and identify potential regulators of this complex transcriptional response.
Availability: The code to our method is freely available
PMCID: PMC3117368  PMID: 21685097
4.  Modularity and directionality in genetic interaction maps 
Bioinformatics  2010;26(12):i228-i236.
Motivation: Genetic interactions between genes reflect functional relationships caused by a wide range of molecular mechanisms. Large-scale genetic interaction assays lead to a wealth of information about the functional relations between genes. However, the vast number of observed interactions, along with experimental noise, makes the interpretation of such assays a major challenge.
Results: Here, we introduce a computational approach to organize genetic interactions and show that the bulk of observed interactions can be organized in a hierarchy of modules. Revealing this organization enables insights into the function of cellular machineries and highlights global properties of interaction maps. To gain further insight into the nature of these interactions, we integrated data from genetic screens under a wide range of conditions to reveal that more than a third of observed aggravating (i.e. synthetic sick/lethal) interactions are unidirectional, where one gene can buffer the effects of perturbing another gene but not vice versa. Furthermore, most modules of genes that have multiple aggravating interactions were found to be involved in such unidirectional interactions. We demonstrate that the identification of external stimuli that mimic the effect of specific gene knockouts provides insights into the role of individual modules in maintaining cellular integrity.
Availability: We designed a freely accessible web tool that includes all our findings, and is specifically intended to allow effective browsing of our results (
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2881382  PMID: 20529911
5.  Identifying novel constrained elements by exploiting biased substitution patterns 
Bioinformatics  2009;25(12):i54-i62.
Motivation: Comparing the genomes from closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods only model conservation as a decrease in the rate of mutation and have ignored selection acting on the pattern of mutations.
Results: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we find that as much as 10.2% of these regions are under selection.
Availability: The algorithms are implemented in a Java software package, called SiPhy, freely available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2687944  PMID: 19478016
6.  Nucleosome positioning from tiling microarray data 
Bioinformatics  2008;24(13):i139-i146.
Motivation: The packaging of DNA around nucleosomes in eukaryotic cells plays a crucial role in the regulation of gene expression and other DNA-related processes. To better understand the regulatory role of nucleosomes, it is important to pinpoint their positions at high (5–10 bp) resolution. Toward this end, several recent works used dense tiling arrays to map nucleosomes in a high-throughput manner. These data were then parsed and hand-curated, and the positions of nucleosomes were assessed.
Results: In this manuscript, we present a fully automated algorithm to analyze such data and predict the exact location of nucleosomes. We introduce a method, based on a probabilistic graphical model, to increase the resolution of our predictions even beyond that of the microarray used. We show how to build such a model and how to compile it into a simple Hidden Markov Model, allowing for a fast and accurate inference of nucleosome positions.
We applied our model to nucleosomal data from mid-log yeast cells reported by Yuan et al. and compared our predictions to those of the original paper; to a more recent method that uses five times denser tiling arrays as explained by Lee et al.; and to a curated set of literature-based nucleosome positions. Our results suggest that by applying our algorithm to the same data used by Yuan et al. our fully automated model traced 13% more nucleosomes, and increased the overall accuracy by about 20%. We believe that such an improvement opens the way for a better understanding of the regulatory mechanisms controlling gene expression, and how they are encoded in the DNA.
PMCID: PMC2718629  PMID: 18586706
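The abstract above describes compiling a probabilistic graphical model into a Hidden Markov Model for inferring nucleosome positions. As a rough illustration of HMM decoding in general, the following Python sketch runs the standard Viterbi algorithm on a toy two-state model. Everything specific here is an assumption made for illustration: the state names, the discretized "low"/"high" probe signal, and all probabilities are invented, and the paper's actual model structure is not reproduced.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence.

    Standard Viterbi decoding in log space; all probabilities must be nonzero.
    """
    if not observations:
        return []
    # Log-scores of the best path ending in each state at the first position.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    backpointers = []
    for obs in observations[1:]:
        current, pointers = {}, {}
        for s in states:
            # Best predecessor state for a path that ends in state s.
            prev = max(states,
                       key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            current[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][obs]))
            pointers[s] = prev
        scores.append(current)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two states and a binarized probe signal.
STATES = ("linker", "nucleosome")
START = {"linker": 0.5, "nucleosome": 0.5}
TRANS = {"linker": {"linker": 0.9, "nucleosome": 0.1},
         "nucleosome": {"linker": 0.1, "nucleosome": 0.9}}
EMIT = {"linker": {"low": 0.8, "high": 0.2},
        "nucleosome": {"low": 0.2, "high": 0.8}}

signal = ["low", "low", "low", "high", "high", "high"]
decoded = viterbi(signal, STATES, START, TRANS, EMIT)
```

With these made-up parameters, the decoded path labels the three "low" probes as linker and the three "high" probes as nucleosome. A realistic model would need states that encode nucleosome length and continuous probe intensities, which this sketch deliberately leaves out.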
