1.  KD4v: comprehensible knowledge discovery system for missense variant 
Nucleic Acids Research  2012;40(Web Server issue):W71-W75.
A major challenge in the post-genomic era is a better understanding of how human genetic alterations involved in disease affect the gene products. The KD4v (Comprehensible Knowledge Discovery System for Missense Variant) server characterizes and predicts the phenotypic effects (deleterious/neutral) of missense variants. The server provides a set of rules learned by Inductive Logic Programming (ILP) on a set of missense variants described by conservation, physico-chemical, functional and 3D structure predicates. These rules are interpretable by non-expert humans and are used to accurately predict the deleterious/neutral status of an unknown mutation. The web server is available at
PMCID: PMC3394327  PMID: 22641855
2.  Identifying Neighborhoods of Coordinated Gene Expression and Metabolite Profiles 
PLoS ONE  2012;7(2):e31345.
In this paper we investigate how metabolic network structure affects any coordination between transcript and metabolite profiles. To achieve this goal we conduct two complementary analyses focused on the metabolic response to stress. First, we investigate the general size of any relationship between metabolic network gene expression and metabolite profiles. We find that strongly correlated transcript-metabolite profiles are sustained over surprisingly long network distances away from any target metabolite. Second, we employ a novel pathway mining method to investigate the structure of this transcript-metabolite relationship. The objective of this method is to identify a minimum set of metabolites which are the target of significantly correlated gene expression pathways. The results reveal that, in general, a global regulation signature targeting a small number of metabolites is responsible for a large-scale metabolic response. However, our method also reveals pathway-specific effects that can degrade this global regulation signature and complicate the observed coordination between transcript-metabolite profiles.
PMCID: PMC3280297  PMID: 22355360
3.  ℮-conome: an automated tissue counting platform of cone photoreceptors for rodent models of retinitis pigmentosa 
BMC Ophthalmology  2011;11:38.
Retinitis pigmentosa is characterized by the sequential loss of rod and cone photoreceptors. The preservation of cones would prevent blindness due to their essential role in human vision. Rod-derived Cone Viability Factor is a thioredoxin-like protein that is secreted by rods and is involved in cone survival. To validate the activity of Rod-derived Cone Viability Factors (RdCVFs) as therapeutic agents for treating retinitis pigmentosa, we have developed ℮-conome, an automated cell counting platform for retinal flat mounts of rodent models of cone degeneration. This automated quantification method allows for faster data analysis, thereby accelerating translational research.
An inverted fluorescent microscope, motorized and coupled to a CCD camera, records images of cones labeled with fluorescent peanut agglutinin lectin on flat-mounted retinas. For an average of 300 fields per retina, nine Z-planes are acquired at 40× magnification after a two-stage autofocus performed individually for each field. The projection of the nine-image stack is thresholded and then filtered to exclude aberrant images based on preset variables. Cones are identified by processing the resulting image with 13 empirically determined variables. The cone density is calculated over the 300 fields.
The method was validated by comparison to conventional stereological counting. The decrease in cone density in the rd1 mouse was found to be equivalent to the decrease determined by stereological counting. We also studied the spatiotemporal pattern of cone degeneration in the rd1 mouse and show that while the reduction in cone density starts in the central part of the retina, cone degeneration progresses at the same speed over the whole retinal surface. We finally show that for mice with an inactivation of the Nucleoredoxin-like genes Nxnl1 or Nxnl2 encoding RdCVFs, the loss of cones is more pronounced in the ventral retina.
The automated platform ℮-conome used here for retinal disease is a tool that can broadly accelerate translational research for neurodegenerative diseases.
PMCID: PMC3271040  PMID: 22185426
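The acquisition and counting steps described in the ℮-conome abstract (project a Z-stack, threshold it, count the labeled objects, then average over fields) can be sketched roughly as follows. This is a minimal illustration, not the platform's implementation: the abstract does not say which projection is used, so a maximum-intensity projection is assumed, and the function names, the 4-connectivity rule, the single `min_size` filter (standing in for the 13 empirically determined variables) and the `field_area` parameter are all hypothetical.

```python
from collections import deque


def project_stack(stack):
    """Maximum-intensity projection of equally sized 2D planes (assumed projection type)."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(plane[r][c] for plane in stack) for c in range(cols)]
            for r in range(rows)]


def count_objects(image, threshold, min_size=1):
    """Count 4-connected groups of pixels brighter than `threshold`.

    Groups smaller than `min_size` pixels are ignored; this single filter is a
    hypothetical stand-in for the platform's empirically tuned variables.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill over the connected bright region.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:
                count += 1
    return count


def cone_density(fields, threshold, field_area, min_size=1):
    """Objects per unit area, pooled over all fields (each field is a Z-stack)."""
    total = sum(count_objects(project_stack(stack), threshold, min_size)
                for stack in fields)
    return total / (field_area * len(fields))
```

Pooling the counts over all fields before dividing by the total imaged area mirrors the statement that density is calculated over the 300 fields, rather than averaging per-field densities.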
4.  A new look towards BAC-based array CGH through a comprehensive comparison with oligo-based array CGH 
BMC Genomics  2007;8:84.
Currently, two main technologies are used for screening DNA copy number: the BAC (Bacterial Artificial Chromosome) and the recently developed oligonucleotide-based CGH (Chromosomal Comparative Genomic Hybridization) arrays which are capable of detecting small genomic regions with amplification or deletion. The correlation as well as the discriminative power of these platforms has never been compared statistically on a significant set of human patient samples.
In this paper, we present an exhaustive comparison between the two CGH platforms, undertaken at two independent sites using the same batch of DNA from 19 advanced prostate cancers. The comparison was performed directly on the raw data and a significant correlation was found between the two platforms. The correlation was greatly improved when the data were averaged over large chromosomic regions using a segmentation algorithm. In addition, this analysis has enabled the development of a statistical model to discriminate BAC outliers that might indicate microevents. These microevents were validated by the oligo platform results.
This article presents a genome-wide statistical validation of the oligo array platform on a large set of patient samples and demonstrates statistically its superiority over the BAC platform for the identification of chromosomic events. Taking advantage of a large set of human samples treated by the two technologies, a statistical model has been developed to show that the BAC platform could also detect microevents.
PMCID: PMC1852311  PMID: 17394638
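The observation in the array CGH abstract, that the correlation between platforms improves once the data are averaged over segmented regions, can be illustrated with a small sketch. It assumes the segment boundaries are already known; the segmentation algorithm itself is not named in the abstract and is not reproduced here. The function names, the use of Pearson correlation, and the toy log-ratio values are illustrative assumptions, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def segment_means(values, segments):
    """Average `values` over half-open index ranges (start, end).

    The ranges are assumed to come from some external segmentation step.
    """
    return [sum(values[start:end]) / (end - start) for start, end in segments]


# Toy log-ratios for nine matched probes on two hypothetical platforms.
bac = [0.1, -0.1, 0.0, 1.1, 0.9, 1.0, -0.6, -0.4, -0.5]
oligo = [-0.1, 0.1, 0.0, 0.9, 1.1, 1.0, -0.4, -0.6, -0.5]
segments = [(0, 3), (3, 6), (6, 9)]  # assumed breakpoints

raw_r = pearson(bac, oligo)
segmented_r = pearson(segment_means(bac, segments),
                      segment_means(oligo, segments))
```

Averaging within a segment cancels probe-level noise that is independent between platforms, which is why `segmented_r` exceeds `raw_r` in this toy example.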
5.  PipeAlign: a new toolkit for protein family analysis 
Nucleic Acids Research  2003;31(13):3829-3832.
PipeAlign is a protein family analysis tool integrating a five-step process ranging from the search for sequence homologues in protein and 3D structure databases to the definition of the hierarchical relationships within and between subfamilies. The complete, automatic pipeline takes a single sequence or a set of sequences as input and constructs a high-quality, validated MACS (multiple alignment of complete sequences) in which sequences are clustered into potential functional subgroups. For the more experienced user, the PipeAlign server also provides numerous options to run only a part of the analysis, with the possibility to modify the default parameters of each software module. For example, the user can choose to enter an existing multiple sequence alignment for refinement, validation and subsequent clustering of the sequences. The aim is to provide an interactive workbench for the validation, integration and presentation of a protein family, not only at the sequence level, but also at the structural and functional levels. PipeAlign is available at
PMCID: PMC168925  PMID: 12824430
6.  Density of points clustering, application to transcriptomic data analysis 
Nucleic Acids Research  2002;30(18):3992-4000.
With the increasing amount of data produced by high-throughput technologies in many fields of science, clustering has become an integral step in exploratory data analysis in order to group similar elements into classes. However, many clustering algorithms can only work properly if aided by human expertise. For example, one parameter which is crucial and often manually set is the number of clusters present in the analyzed set. We present a novel stopping rule to find the optimal number of clusters based on the comparison of the density of points inside the clusters and between them. The method is evaluated on synthetic as well as on real transcriptomic data and compared with two current methods. Finally, we illustrate its usefulness in the analysis of the expression profiles of promyelocytic cells before and after treatment with all-trans retinoic acid. Simultaneous clustering for gene regulation and absolute initial expression levels allowed the identification of numerous genes associated with signal transduction revealing the complexity of retinoic acid signaling.
PMCID: PMC137097  PMID: 12235383
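The general idea behind the stopping rule in the clustering abstract, comparing how tightly points sit inside clusters with how they sit between clusters, can be illustrated with a deliberately simplified score. The sketch below is not the published rule: it substitutes a ratio of mean between-cluster distance to mean within-cluster distance for the density comparison, works on 1-D points, and assumes the candidate partitions for each number of clusters have already been produced by some clustering algorithm. All names and values are hypothetical.

```python
from itertools import combinations


def separation_score(points, labels):
    """Mean between-cluster distance divided by mean within-cluster distance.

    A simplified stand-in for a density comparison: higher means clusters are
    compact relative to the gaps separating them.
    """
    within, between = [], []
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        (within if label_p == label_q else between).append(abs(p - q))
    if not within or not between:
        return 0.0  # one big cluster or all singletons: nothing to compare
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    if mean_within == 0:
        return float("inf")
    return mean_between / mean_within


def choose_cluster_count(points, candidates):
    """Pick the k whose partition (dict: k -> labels) has the highest score."""
    return max(candidates, key=lambda k: separation_score(points, candidates[k]))


# Toy 1-D data with three visible groups and three candidate partitions.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]
candidates = {
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    4: [0, 0, 1, 2, 2, 2, 3, 3, 3],
}
best_k = choose_cluster_count(points, candidates)
```

A ratio of this kind is only a rough proxy and can favor over-splitting on less clearly separated data, which is one reason a dedicated density-based rule such as the one the abstract describes is needed in practice.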

