Results 1-18 (18)

1.  A marginalized variational Bayesian approach to the analysis of array data
BMC Proceedings  2008;2(Suppl 4):S7.
Bayesian unsupervised learning methods have many applications in the analysis of biological data. For example, for the cancer expression array datasets presented in this study, they can be used to resolve possible disease subtypes and to indicate statistically significant dysregulated genes within these subtypes.
In this paper we outline a marginalized variational Bayesian inference method for unsupervised clustering. In this approach latent process variables and model parameters are allowed to be dependent. This is achieved by marginalizing the mixing Dirichlet variables and then performing inference in the reduced variable space. An iterative update procedure is proposed.
Theoretically and experimentally we show that the proposed algorithm gives a much better free-energy lower bound than a standard variational Bayesian approach. The algorithm is computationally efficient and its performance is demonstrated on two expression array data sets.
PMCID: PMC2648311  PMID: 19091054
2.  Identification of functional modules based on transcriptional regulation structure 
BMC Proceedings  2008;2(Suppl 4):S4.
Identifying gene functional modules is an important step towards elucidating gene functions at a global scale. Clustering algorithms mostly rely on co-expression of genes; that is, they group together genes having similar expression profiles.
We propose to cluster genes by co-regulation rather than by co-expression. We therefore present an inference algorithm for detecting co-regulated groups from gene expression data and introduce a method to cluster genes given that inferred regulatory structure. Finally, we propose to validate the clustering through a score based on the GO enrichment of the obtained groups of genes.
We evaluate the methods on stress-response data from S. cerevisiae and obtain better scores than clustering obtained directly from gene expression.
PMCID: PMC2654972  PMID: 19091051
3.  Towards structured output prediction of enzyme function 
BMC Proceedings  2008;2(Suppl 4):S2.
In this paper we describe work in progress in developing kernel methods for enzyme function prediction. Our focus is on developing so-called structured output prediction methods, where the enzymatic reaction is the combinatorial target object for prediction. We compared two structured output prediction methods, the Hierarchical Max-Margin Markov algorithm (HM3) and the Maximum Margin Regression algorithm (MMR), in hierarchical classification of enzyme function. As sequence features we use various string kernels and the GTG feature set derived from the global alignment trace graph of protein sequences.
In our experiments, in predicting enzyme EC classification we obtain over 85% accuracy (predicting the four-digit EC code) and over 91% microlabel F1 score (predicting individual EC digits). In predicting the Gold Standard enzyme families, we obtain over 79% accuracy (predicting family correctly) and over 89% microlabel F1 score (predicting superfamilies and families). In the latter case, structured output methods are significantly more accurate than a nearest-neighbor classifier. A polynomial kernel over the GTG feature set turned out to be a prerequisite for accurate function prediction. Combining GTG with string kernels boosted accuracy slightly in the case of EC class prediction.
Structured output prediction with GTG features is shown to be computationally feasible and to have accuracy on par with state-of-the-art approaches in enzyme function prediction.
PMCID: PMC2654971  PMID: 19091049
4.  Towards a semi-automatic functional annotation tool based on decision-tree techniques 
BMC Proceedings  2008;2(Suppl 4):S3.
Due to the continuous improvements of high throughput technologies and experimental procedures, the number of sequenced genomes is increasing exponentially. Ultimately, the task of annotating these data relies on the expertise of biologists. The necessity for annotation to be supervised by human experts is the rate limiting step of the data analysis. To face the deluge of new genomic data, the need for automating, as much as possible, the annotation process becomes critical.
We consider annotation of a protein with terms of the functional hierarchy that has been used to annotate Bacillus subtilis and propose a set of rules that predict classes in terms of elements of the functional hierarchy, i.e., a class is a node or a leaf of the hierarchy tree. The rules are obtained through two decision-tree techniques, first-order decision trees and multilabel attribute-value decision trees, using as training data the proteins from two lactic bacteria: Lactobacillus sakei and Lactobacillus bulgaricus. We tested the two methods, first independently, then in a combined approach, and evaluated the obtained results using hierarchical evaluation measures. Results obtained for the two approaches on both genomes are comparable and show good precision together with a high prediction rate. Using combined approaches increases the recall and the prediction rate.
The combination of the two approaches is very encouraging and we will further refine these combinations in order to get rules even more useful for the annotators. This first study is a crucial step towards designing a semi-automatic functional annotation tool.
PMCID: PMC2654970  PMID: 19091050
5.  Machine Learning in Systems Biology 
BMC Proceedings  2008;2(Suppl 4):S1.
This supplement contains extended versions of a selected subset of papers presented at the workshop MLSB 2007, Machine Learning in Systems Biology, Evry, France, from September 24 to 25, 2007.
PMCID: PMC2654969  PMID: 19091048
6.  Gene-based bin analysis of genome-wide association studies 
BMC Proceedings  2008;2(Suppl 4):S6.
With the improvement of genotyping technologies and the exponentially growing number of available markers, case-control genome-wide association studies promise to be a key tool for investigation of complex diseases. However, new analytical methods have to be developed to face the problems induced by this data scale-up, such as statistical multiple testing, data quality control and computational tractability.
We present a novel method to analyze genome-wide association studies results. The algorithm is based on a Bayesian model that integrates genotyping errors and genomic structure dependencies. p-values are assigned to genomic regions termed bins, which are defined from a gene-biased partitioning of the genome, and the false-discovery rate is estimated. We have applied this algorithm to data coming from three genome-wide association studies of Multiple Sclerosis.
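The abstract does not say how the false-discovery rate is estimated for the bins. As a generic illustration only, and not the authors' Bayesian method, the sketch below applies the standard Benjamini-Hochberg step-up procedure to a list of per-bin p-values. The function name, the example p-values and the 0.05 level are all hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return one flag per p-value: True if it is rejected at FDR level alpha.

    Standard Benjamini-Hochberg step-up rule: sort the m p-values, find the
    largest rank k with p_(k) <= (k / m) * alpha, and reject the k smallest.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_passing_rank = rank

    rejected = [False] * m
    for idx in order[:largest_passing_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten genomic bins (not data from the paper).
bin_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(bin_p_values, alpha=0.05)
print(sum(flags), "bins flagged")
```

Working on bins rather than individual markers reduces the number of tests m, which is one way a gene-based partitioning can ease the multiple-testing burden mentioned above.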
The method overcomes the scale-up problems in practice and makes it possible to identify new putative regions statistically associated with the disease.
PMCID: PMC2654974  PMID: 19091053
7.  Machine learning techniques to identify putative genes involved in nitrogen catabolite repression in the yeast Saccharomyces cerevisiae 
BMC Proceedings  2008;2(Suppl 4):S5.
Nitrogen is an essential nutrient for all life forms. Like most unicellular organisms, the yeast Saccharomyces cerevisiae transports and catabolizes good nitrogen sources in preference to poor ones. Nitrogen catabolite repression (NCR) refers to this selection mechanism. All known nitrogen catabolite pathways are regulated by four regulators. The ultimate goal is to infer the complete nitrogen catabolite pathways. Bioinformatics approaches offer the possibility to identify putative NCR genes and to discard uninteresting genes.
We present a machine learning approach where the identification of putative NCR genes in the yeast Saccharomyces cerevisiae is formulated as a supervised two-class classification problem. Classifiers predict whether genes are NCR-sensitive or not from a large number of variables related to the GATA motif in the upstream non-coding sequences of the genes. The positive and negative training sets are composed of annotated NCR genes and manually-selected genes known to be insensitive to NCR, respectively. Different classifiers and variable selection methods are compared. We show that all classifiers make significant and biologically valid predictions by comparing these predictions to annotated and putative NCR genes, and by performing several negative controls. In particular, the inferred NCR genes significantly overlap with putative NCR genes identified in three genome-wide experimental and bioinformatics studies.
These results suggest that our approach can successfully identify potential NCR genes. Hence, the dimensionality of the problem of identifying all genes involved in NCR is drastically reduced.
PMCID: PMC2654973  PMID: 19091052
8.  Electronic surveillance of outbreaks in Lebanon 
BMC Proceedings  2008;2(Suppl 3):S2.
This paper describes and assesses the electronic surveillance of outbreaks based on the early warning for four endemic diseases – typhoid fever, amebic dysentery, viral hepatitis A and brucellosis – in Lebanon, for the first 28 weeks of 2005 and first 26 weeks of 2007.
The electronic early warning system is based on the mandatory notification of 37 targeted diseases. The four target diseases assessed in this paper are based on monthly notification. Standards were set for case definitions and forms. Physicians and hospitals report to the Ministry of Public Health (MOPH), where data is checked and transmitted to a central location for entry into the national database, which stores historical and current data, as well as population estimates based on national surveys. The event date was selected for case dating. Indicators triggering abnormalities include number of cases, rates, and relative ratios. Four relative ratios were selected using the period of 1 week, 4 weeks or 52 weeks for the current and previous years. Screening was conducted on a weekly basis in 2005, and on a daily basis in 2007. Abnormal signals were verified, documented and grouped by alert-episodes for each disease, district, and period. MOPH teams verified and investigated case clustering.
During the first 28 weeks of 2005 and the first 26 weeks of 2007, screening operations were 68% and 89% complete, respectively. There were 26 and 166 detected abnormal signals and 11 and 22 identified alert-episodes, respectively. Verified clusters numbered 7 and 11; the positive predictive value for cluster identification was 64% and 50%, respectively. The time interval between first cases and first abnormal signals averaged 4 weeks and 5 weeks, respectively.
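The reported positive predictive values are consistent with dividing the number of verified clusters by the number of alert-episodes in each period. A minimal sketch of that arithmetic follows, assuming this is the definition used; the function name is hypothetical.

```python
def positive_predictive_value(verified_clusters: int, alert_episodes: int) -> float:
    """Fraction of alert-episodes that were verified as true clusters."""
    if alert_episodes <= 0:
        raise ValueError("alert_episodes must be positive")
    return verified_clusters / alert_episodes


# Counts taken from the abstract: (verified clusters, alert-episodes).
ppv_2005 = positive_predictive_value(7, 11)
ppv_2007 = positive_predictive_value(11, 22)

print(f"2005: {ppv_2005:.0%}")  # 2005: 64%
print(f"2007: {ppv_2007:.0%}")  # 2007: 50%
```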
Timely reporting, transmission, data entry, analysis and communication are the elements of timely outbreak detection. The electronic surveillance of outbreaks for epidemic-prone diseases that are subject to mandatory monthly notification, using indicator-based thresholds, is capable of detecting spatio-temporal clusters and outbreaks, albeit with some delay. The national surveillance system needs to be reviewed in order to provide timely data for early warning surveillance and response.
PMCID: PMC2587695  PMID: 19025679
9.  Electronic public health surveillance in developing settings: meeting summary 
BMC Proceedings  2008;2(Suppl 3):S1.
In some high-income countries, public health surveillance includes systems that use computer and information technology to monitor health data in near-real time, facilitating timely outbreak detection and situational awareness. In September 2007, a meeting convened in Bangkok, Thailand to consider the adaptation of near-real time surveillance methods to developing settings. Thirty-five participants represented Ministries of Health, universities, and militaries in 13 countries, and the World Health Organization (WHO). The keynote presentation by a WHO official underscored the importance of improved national capacity for epidemic surveillance and response under the new International Health Regulations, which entered into force in June 2007. Other speakers presented innovative electronic surveillance systems for outbreak detection and disease reporting in developing countries, and methodologies employed in near-real time surveillance systems in the United States. During facilitated small- and large-group discussion, participants identified key considerations in four areas for adapting near-real time surveillance to developing settings: software, professional networking, training, and data acquisition and processing. This meeting was a first step in extending the benefits of near-real time surveillance to developing settings. Subsequent steps should include identifying funding and partnerships to pilot-test near-real time surveillance methods in developing areas.
PMCID: PMC2587694  PMID: 19025678
10.  Statistical analyses in disease surveillance systems 
BMC Proceedings  2008;2(Suppl 3):S7.
The performance of disease surveillance systems is evaluated and monitored using a diverse set of statistical analyses throughout each stage of surveillance implementation. An overview of their main elements is presented, with a specific emphasis on syndromic surveillance directed to outbreak detection in resource-limited settings. Statistical analyses are proposed for three implementation stages: planning, early implementation, and consolidation. Data sources and collection procedures are described for each analysis.
During the planning and pilot stages, we propose to estimate the average data collection, data entry and data distribution time. This information can be collected by surveillance systems themselves or through specially designed surveys. During the initial implementation stage, epidemiologists should study the completeness and timeliness of the reporting, and describe thoroughly the population surveyed and the epidemiology of the health events recorded. Additional data collection processes or external data streams are often necessary to assess reporting completeness and other indicators. Once data collection processes are operating in a timely and stable manner, analyses of surveillance data should expand to establish baseline rates and detect aberrations. External investigations can be used to evaluate whether abnormally increased case frequency corresponds to a true outbreak, and thereby establish the sensitivity and specificity of aberration detection algorithms.
Statistical methods for disease surveillance have focused mainly on the performance of outbreak detection algorithms without sufficient attention to data quality and representativeness, two factors that are especially important in developing countries. It is important to assess data quality at each stage of implementation using a diverse mix of data sources and analytical methods. Careful, close monitoring of selected indicators is needed to evaluate whether systems are reaching their proposed goals at each stage.
PMCID: PMC2587693  PMID: 19025684
11.  Visualization techniques and graphical user interfaces in syndromic surveillance systems. Summary from the Disease Surveillance Workshop, Sept. 11–12, 2007; Bangkok, Thailand 
BMC Proceedings  2008;2(Suppl 3):S6.
Timeliness is a critical asset to the detection of public health threats when using syndromic surveillance systems. In order for epidemiologists to effectively distinguish which events are indicative of a true outbreak, the ability to utilize specific data streams from generalized data summaries is necessary. Taking advantage of graphical user interfaces and visualization capacities of current surveillance systems makes it easier for users to investigate detected anomalies by generating custom graphs, maps, plots, and temporal-spatial analysis of specific syndromes or data sources.
PMCID: PMC2587692  PMID: 19025683
12.  Methodologies for data collection 
BMC Proceedings  2008;2(Suppl 3):S5.
Electronic disease surveillance systems can be extremely valuable tools; however, a critical step in system implementation is collecting data. Without accurate and complete data, statistical anomalies that are detected hold little meaning. Many people who have established successful surveillance systems acknowledge the initial data collection process to be one of the most challenging aspects of system implementation.
This discussion will describe the various methods for collecting data as well as describe some of the more common data feeds used in surveillance systems today. Given that every city/region/country looking to establish a surveillance capability has varying degrees of automated data, alternative data collection methods must be considered.
While it would be ideal to collect automated electronic data in a real-time fashion without human intervention, data may also be effectively collected via telephone (both mobile and land lines), fax, and email. Another consideration is what type of data will be used in a surveillance system. If one data source is of high value to one locality, it should not be assumed that it will be as useful in another area. Determining what data sources work best for a particular area is a critical step in system implementation.
Regardless of data type and how they are collected, surveillance systems can be successful if the implementers and end users understand the limitations of both the data and the collection methodology and incorporate that knowledge into their interpretation procedures.
PMCID: PMC2587691  PMID: 19025682
13.  Challenges in the implementation of an electronic surveillance system in a resource-limited setting: Alerta, in Peru 
BMC Proceedings  2008;2(Suppl 3):S4.
Infectious disease surveillance is a primary public health function in resource-limited settings. In 2003, an electronic disease surveillance system (Alerta) was established in the Peruvian Navy with support from the U.S. Naval Medical Research Center Detachment (NMRCD). Many challenges arose during the implementation process, and a variety of solutions were applied. The purpose of this paper is to identify and discuss these issues.
This is a retrospective description of the Alerta implementation. After a thoughtful evaluation according to the Centers for Disease Control and Prevention (CDC) guidelines, the main challenges to implementation were identified and solutions were devised in the context of a resource-limited setting, Peru.
After four years of operation, we have identified a number of challenges in implementing and operating this electronic disease surveillance system. These can be divided into the following categories: (1) issues with personnel and stakeholders; (2) issues with resources in a developing setting; (3) issues with processes involved in the collection of data and operation of the system; and (4) issues with organization at the central hub. Some of the challenges are unique to resource-limited settings, but many are applicable for any surveillance system. For each of these challenges, we developed feasible solutions that are discussed.
There are many challenges to overcome when implementing an electronic disease surveillance system, not only related to technology issues. A comprehensive approach is required for success, including: technical support, personnel management, effective training, and cultural sensitivity in order to assure the effective deployment of an electronic disease surveillance system.
PMCID: PMC2587690  PMID: 19025681
14.  EWORS: using a syndromic-based surveillance tool for disease outbreak detection in Indonesia 
BMC Proceedings  2008;2(Suppl 3):S3.
Electronic syndromic surveillance for early outbreak detection may be a simple, effective tool to rapidly bring reliable and actionable outbreak data to the attention of public health authorities in the developing world.
Twenty-nine signs and symptoms from patients with conditions compatible with infectious diseases are collected from selected Provincial hospitals and analyzed daily. Data is e-mailed on a daily basis to a central data management and analysis center. Automated data analysis may be viewed at the hospital or the Early Warning Outbreak Response System (EWORS) hub at the central level (National Institute of Health Research and Development/NIHRD).
The Indonesian Ministry of Health (MoH) adopted EWORS in 2006 and will use it as a complementary surveillance tool in wider catchment areas throughout the country. Socialization to more users is still being conducted in collaboration among three Directorates General (DGs) of the MoH: the DG of NIHRD, the DG of Medical Services and the DG of Communicable Disease Control and Prevention. Currently, EWORS is being adapted to facilitate detecting a potential outbreak of pandemic influenza in the region, and automated procedures for outbreak detection have been added.
PMCID: PMC2587689  PMID: 19025680
15.  Conclusions of the expert panel: importance of erlotinib as a second-line therapeutic option 
BMC Proceedings  2008;2(Suppl 2):S4.
During the Experts Meeting on Lung Cancer, participants emphasized the usefulness of erlotinib as second-line therapy for lung cancer. They noted that, although there are no comparative studies, erlotinib could be as effective as docetaxel and pemetrexed in second-line therapy. Regarding the toxicity profile of each of these drugs – one of the key issues considered in the meeting – specialists pointed out how important it is to clearly identify existing differences in this issue. Each drug has different degrees of toxicity, and this information is crucial at the time of choosing the therapeutic regimen. Erlotinib treatment could be an effective option for second-line therapy.
PMCID: PMC2559800  PMID: 18831720
16.  First- and second-line treatment of advanced metastatic non-small-cell lung cancer: a global view 
BMC Proceedings  2008;2(Suppl 2):S3.
Treatment of non-small-cell lung cancer is dependent on disease stage. For patients with metastasis or locally advanced disease, it is important to find therapeutic schemes that may benefit this population. This review discusses therapeutic options for first- and second-line treatment in patients with advanced non-small-cell lung cancer. According to current data, the combination of two cytotoxic agents is the optimum first-line treatment for patients with non-small-cell lung cancer and performance status of 0–1. Addition of bevacizumab has been shown to provide even longer survival and to increase response rate. Within the first-line setting, erlotinib appears to be effective in the treatment of elderly patients who would not derive a benefit from standard chemotherapy or those refusing standard chemotherapy. The administration of erlotinib as first-line maintenance therapy is being assessed. There are currently three drugs approved for second-line treatment of patients with advanced non-small-cell lung cancer after failure of first-line chemotherapy. These drugs have proven to be effective in phase III trials. In the phase III BR.21 trial, the response rate was 8.9% in the erlotinib group and less than 1% in the placebo group; median response duration was 7.9 months and 3.7 months, respectively; and median survival was 6.7 months and 4.7 months with erlotinib and placebo, respectively. One-year survival was 31% and 21% with erlotinib and placebo, respectively. In addition, the BR.21 trial revealed significantly greater improvements in overall quality of life and in both physical and emotional functioning in the erlotinib arm than in the placebo arm. Erlotinib is not significantly associated with hematologic adverse effects. Erlotinib is administered orally and does not require concomitant administration of other drugs, thus causing patients less inconvenience.
Analysis of data from different subgroups included in the BR.21 trial shows that overall survival is similar between women and men, between patients with adenocarcinoma and those with epidermoid carcinoma, and between Asian patients and those of other ethnicities. The combination of erlotinib and bevacizumab in the second-line treatment of patients with advanced disease has been evaluated for its anti-angiogenic properties. This combination therapy has provided promising results, which should be confirmed in future studies.
PMCID: PMC2559799  PMID: 18831719
17.  Overview of advanced non-small-cell lung cancer treatment in Mexico 
BMC Proceedings  2008;2(Suppl 2):S2.
Lung cancer is the leading cause of cancer-related deaths among males and the second among females. It is a major public health problem, and there is a need to find effective therapies for its management. Erlotinib has been approved to treat non-small-cell lung cancer. The author's experience in the use of erlotinib in lung cancer patients in Mexico City is described below.
The series includes 17 consecutive patients treated for advanced lung cancer. All patients had measurable disease. Treatment continues until disease progression or significant toxicity occurs. Among patients, adenocarcinoma was the most common tumor histology, followed by bronchioloalveolar tumor, and epidermoid carcinoma. Nine patients received erlotinib as first-line therapy. Of the remaining 8 patients, 4 had undergone surgery, 2 had received chemotherapy, and 2 had received combined chemotherapy and radiotherapy.
Four patients achieved complete remission of the disease, and 7 showed partial response. Five subjects experienced disease progression, and one patient showed stable disease. The most significant cases were two non-smoking women with bronchioloalveolar cancer, who remain in complete remission after erlotinib treatment. A non-smoking male patient with adenocarcinoma histology, who rejected chemotherapy and radiotherapy, remains in complete remission after 15 months of treatment. A man with epidermoid carcinoma, previously treated with surgery, chemotherapy and radiotherapy and presenting with tumor recurrence, showed a complete 15-month remission with erlotinib. Clinical response to erlotinib was observed regardless of tumor histopathology, but therapeutic response was better in patients without a smoking history. The most common adverse events associated with erlotinib therapy were dermatologic. After discontinuing treatment for a short period, patients were again given erlotinib without experiencing toxic effects. Hepatotoxic side effects associated with erlotinib were mild and reversible.
Data from this small series of patients support findings reported in the literature. Female non-smokers showed the best therapeutic response to erlotinib treatment. Erlotinib could be considered as a first-line therapeutic option in elderly patients with locally advanced or metastatic lung cancer, or in women with adenocarcinoma.
PMCID: PMC2559798  PMID: 18831718
18.  Latest developments in the treatment of lung cancer 
BMC Proceedings  2008;2(Suppl 2):S1.
PMCID: PMC2559797  PMID: 18831717