Pharmacogenetics aims to elucidate the genetic factors underlying the individual’s response to pharmacotherapy. Coupled with the recent (and ongoing) progress in high-throughput genotyping, sequencing and other genomic technologies, pharmacogenetics is rapidly transforming into pharmacogenomics, while pursuing the primary goals of identifying and studying the genetic contribution to drug therapy response and adverse effects, and existing drug characterization and new drug discovery. Accomplishment of both of these goals hinges on gaining a better understanding of the underlying biological systems; however, reverse-engineering biological system models from the massive datasets generated by the large-scale genetic epidemiology studies presents a formidable data analysis challenge. In this article, we review the recent progress made in developing such data analysis methodology within the paradigm of systems biology research that broadly aims to gain a ‘holistic’, or ‘mechanistic’ understanding of biological systems by attempting to capture the entirety of interactions between the components (genetic and otherwise) of the system.
biological networks; data analysis methodology; genome-wide association studies; metabolomics; pharmacogenomics; systems biology
Natural products are finding increasing application in drug discovery and development. Being chemically diverse, they can modulate several targets simultaneously in a complex system. Analysis of gene expression is therefore necessary for a better understanding of their molecular mechanisms. Conventional strategies for expression profiling are optimized for single-gene analysis, whereas DNA microarrays serve as a suitable high-throughput tool for the simultaneous analysis of multiple genes. The major practical applicability of DNA microarrays remains in DNA mutation and polymorphism analysis. This review highlights applications of DNA microarrays in pharmacodynamics, pharmacogenomics, toxicogenomics and quality control of herbal drugs and extracts.
Drug discovery; evidence-based medicine; gene expression; genotyping; pharmacodynamics; transcription profiling
Time series microarray experiments are widely used to study dynamical biological processes. Due to the cost of microarray experiments, and also in some cases the limited availability of biological material, about 80% of microarray time series experiments are short (3–8 time points). Previously short time series gene expression data has been mainly analyzed using more general gene expression analysis tools not designed for the unique challenges and opportunities inherent in short time series gene expression data.
We introduce the Short Time-series Expression Miner (STEM), the first software program specifically designed for the analysis of short time series microarray gene expression data. STEM implements unique methods to cluster, compare, and visualize such data. STEM also supports efficient and statistically rigorous biological interpretations of short time series data through its integration with the Gene Ontology.
The unique algorithms that STEM implements to cluster and compare short time series gene expression data, combined with its visualization capabilities and integration with the Gene Ontology, should make STEM useful in the analysis of data from a significant portion of all microarray studies. STEM is available for download free of charge to academic and non-profit users.
Kidney is a major target for adverse effects associated with corticosteroids. A microarray dataset was generated to examine changes in gene expression in rat kidney in response to methylprednisolone. Four control and 48 drug-treated animals were killed at 16 times after drug administration. Kidney RNA was used to query 52 individual Affymetrix chips, generating data for 15,967 different probe sets for each chip. Mining techniques applicable to time series data that identify drug-regulated changes in gene expression were applied. Four sequential filters eliminated probe sets that were not expressed in the tissue, not regulated by drug, or did not meet defined quality control standards. These filters eliminated 14,890 probe sets (94%) from further consideration. Application of judiciously chosen filters is an effective tool for data mining of time series datasets. The remaining data can then be further analyzed by clustering and mathematical modeling. Initial analysis of this filtered dataset identified a group of genes whose pattern of regulation was highly correlated with prototype corticosteroid enhanced genes. Twenty genes in this group, as well as selected genes exhibiting either downregulation or no regulation, were analyzed for 5′ GRE half-sites conserved across species. In general, the results support the hypothesis that the existence of conserved DNA binding sites can serve as an important adjunct to purely analytic approaches to clustering genes into groups with common mechanisms of regulation. This dataset, as well as similar datasets on liver and muscle, are available online in a format amenable to further analysis by others.
data mining; gene arrays; glucocorticoids; pharmacogenomics; evolutionary conservation
Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space.
We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples.
Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
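The order preserving (OP) idea on the time dimension can be illustrated with a minimal sketch. It is not the OPTricluster algorithm itself: it handles a single sample, omits the combinatorial step on the sample dimension, and the function names and toy profiles are hypothetical. Genes are grouped when their expression values rank the time points in the same order, regardless of magnitude.

```python
from collections import defaultdict


def order_pattern(profile):
    """Return time-point indices sorted by expression value (ties broken by time index)."""
    return tuple(sorted(range(len(profile)), key=lambda t: (profile[t], t)))


def group_by_order(profiles):
    """Group genes whose profiles induce the same ordering of time points."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[order_pattern(profile)].append(gene)
    return dict(groups)


# Toy data (hypothetical): gene1 and gene2 differ in scale but share one ordering.
profiles = {
    "gene1": [1.0, 2.0, 4.0, 3.0],
    "gene2": [10.0, 15.0, 40.0, 20.0],
    "gene3": [5.0, 3.0, 2.0, 1.0],
}
groups = group_by_order(profiles)
```

Grouping on exact rank order is sensitive to small fluctuations between nearly equal values, so a practical method would need some tolerance for near-ties; that refinement is left out here.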
A data set was generated to examine global changes in gene expression in rat liver over time in response to a single bolus dose of methylprednisolone. Four control animals and 43 drug-treated animals were humanely killed at 16 different time points following drug administration. Total RNA preparations from the livers of these animals were hybridized to 47 individual Affymetrix RU34A gene chips, generating data for 8799 different probe sets for each chip. Data mining techniques applicable to gene array time series data sets were applied to this data set to identify drug-regulated changes in gene expression. A series of 4 sequentially applied filters was developed to eliminate probe sets that were not expressed in the tissue, were not regulated by the drug treatment, or did not meet defined quality control standards. These filters eliminated 7287 probe sets of the 8799 total (82%) from further consideration. Application of judiciously chosen filters is an effective tool for data mining of time series data sets. The remaining data can then be further analyzed by clustering and mathematical modeling techniques.
Data mining; gene arrays; glucocorticoids; mathematical modeling; pharmacogenomics
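The sequential filtering described above can be sketched as a small pipeline. The abstract does not give the actual filter criteria, so the two predicates, their thresholds (`MIN_SIGNAL`, `MIN_FOLD`) and the toy probe sets below are hypothetical placeholders. The point is only the structure: each filter sees what the previous one kept, and the number of probe sets removed at each stage is recorded.

```python
from statistics import mean

MIN_SIGNAL = 100.0  # hypothetical expression threshold
MIN_FOLD = 1.5      # hypothetical fold-change threshold


def sequential_filter(probe_sets, filters):
    """Apply filters in order; return survivors and the count removed by each filter."""
    remaining = dict(probe_sets)
    removed_counts = []
    for name, keep in filters:
        kept = {pid: series for pid, series in remaining.items() if keep(series)}
        removed_counts.append((name, len(remaining) - len(kept)))
        remaining = kept
    return remaining, removed_counts


def is_expressed(series):
    """Keep probe sets whose treated signal reaches the threshold at least once."""
    return max(series["treated"]) >= MIN_SIGNAL


def is_regulated(series):
    """Keep probe sets that move at least MIN_FOLD up or down from the control mean."""
    baseline = mean(series["control"])  # assumed non-zero in this sketch
    ratios = [value / baseline for value in series["treated"]]
    return max(ratios) >= MIN_FOLD or min(ratios) <= 1.0 / MIN_FOLD


# Toy data (hypothetical values, not from the study).
probe_sets = {
    "probe_a": {"control": [200.0, 210.0], "treated": [205.0, 600.0, 400.0]},
    "probe_b": {"control": [20.0, 25.0], "treated": [22.0, 30.0, 18.0]},
    "probe_c": {"control": [300.0, 300.0], "treated": [310.0, 290.0, 305.0]},
}
filters = [("expressed", is_expressed), ("regulated", is_regulated)]
survivors, removed = sequential_filter(probe_sets, filters)
```

Running the cheap expression filter first means the later filters only process probe sets that are worth testing, and the per-stage counts make it easy to report how much each filter contributed to the overall reduction.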
Pharmacogenetics/pharmacogenomics is the study of how genetic variation affects pharmacology, the use of drugs to treat disease. When drug responses are predicted in advance, it is easier to tailor medications to different diseases and individuals. Pharmacogenetics provides the tools required to identify genetic predictors of probable drug response, drug efficacy, and drug-induced adverse events—identifications that would ideally precede treatment decisions. Drug abuse and addiction genetic data have advanced the field of pharmacogenetics in general. Although major findings have emerged, pharmacotherapy remains hindered by issues such as adverse events, time lag to drug efficacy, and heterogeneity of the disorders being treated. The sequencing of the human genome and high-throughput technologies are enabling pharmacogenetics to have greater influence on treatment approaches. This review highlights key studies and identifies important genes in drug abuse pharmacogenetics that provide a basis for better diagnosis and treatment of drug abuse disorders.
Pharmacogenomics; addiction; treatment; psychiatric disease; SNP
Biological systems are complex and often composed of many subtly interacting components. Furthermore, such systems evolve through time and, as the underlying biology executes its genetic program, the relationships between components change and undergo dynamic reorganization. Characterizing these relationships precisely is a challenging task, but one that must be undertaken if we are to understand these systems in sufficient detail. One set of tools that may prove useful is the formal principles of model building and checking, which could allow the biologist to frame these inherently temporal questions in a sufficiently rigorous framework. In response to these challenges, GOALIE (Gene ontology algorithmic logic and information extractor) was developed and has been successfully employed in the analysis of high-throughput biological data (e.g. time-course gene-expression microarray data and neural spike train recordings). The method has applications to a wide variety of temporal data, indeed to any data for which ontological descriptions exist. This paper describes the algorithms behind GOALIE and its use in the study of the Intraerythrocytic Developmental Cycle (IDC) of Plasmodium falciparum, the parasite responsible for a deadly form of chloroquine-resistant malaria. We focus in particular on the problem of finding phase changes, that is, times at which transcriptional control is reorganized.
Information theory; Microarray data; Model checking; Ontology; Redescription; Timecourse data
The completion of the human genome sequence has led to a rapid increase in genetic information. The invention of DNA microarrays, which allow for the parallel measurement of thousands of genes at the mRNA level, has enabled scientists to take a more global view of biological systems. Protein microarrays have great potential to increase the throughput of proteomic research. Microarrays of antibodies can simultaneously measure the concentration of a multitude of target proteins in a very short period of time. The ability of protein microarrays to increase the number of data points obtained from small biological samples at the protein level will have a major impact on basic biological research as well as on the discovery of new drug targets and diagnostic markers. This review highlights the current status of protein expression profiling arrays, their development, applications and limitations.
The detection and analysis of steady-state gene expression has become routine. Time-series microarrays are of growing interest to systems biologists for deciphering the dynamic nature and complex regulation of biosystems. Most temporal microarray data contain only a limited number of time points, giving rise to short-time-series data, which pose challenges for traditional methods of extracting meaningful information. Obtaining useful information from the wealth of short-time-series data requires addressing the problems that arise from limited sampling. Current efforts have shown promise in improving the analysis of short-time-series microarray data, although challenges remain. This commentary addresses recent advances in methods for short-time-series analysis, including simplification-based approaches and the integration of multi-source information. Nevertheless, further studies and development of computational methods are needed to provide practical solutions that fully exploit the potential of these data.
In biological systems that undergo processes such as differentiation, a clear concept of progression exists. We present a novel computational approach, called Sample Progression Discovery (SPD), to discover patterns of biological progression underlying microarray gene expression data. SPD assumes that individual samples of a microarray dataset are related by an unknown biological process (i.e., differentiation, development, cell cycle, disease progression), and that each sample represents one unknown point along the progression of that process. SPD aims to organize the samples in a manner that reveals the underlying progression and to simultaneously identify subsets of genes that are responsible for that progression. We demonstrate the performance of SPD on a variety of microarray datasets that were generated by sampling a biological process at different points along its progression, without providing SPD any information of the underlying process. When applied to a cell cycle time series microarray dataset, SPD was not provided any prior knowledge of samples' time order or of which genes are cell-cycle regulated, yet SPD recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of stages of normal B-cell differentiation and the linkage between preB-ALL tumor cells with their cell origin preB. When applied to mouse embryonic stem cell differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages and genes that represent both generic and lineage specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages. 
SPD may be best viewed as a novel tool for synthesizing biological hypotheses because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.
We present a novel computational approach, Sample Progression Discovery (SPD), to discover biological progression underlying a microarray dataset. In contrast to the majority of microarray data analysis methods, which identify differences between sample groups (normal vs. cancer, treated vs. control), SPD aims to identify an underlying progression among individual samples, both within and across sample groups. We validated SPD's ability to discover biological progression using datasets of cell cycle, B-cell differentiation, and mouse embryonic stem cell differentiation. We view SPD as a hypothesis generation tool when applied to datasets where the progression is unclear. For example, when applied to a microarray dataset of cancer samples, SPD assumes that the cancer samples collected from individual patients represent different stages during an intrinsic progression underlying cancer development. The inferred relationship among the samples may therefore indicate a trajectory or hierarchy of cancer progression, which serves as a hypothesis to be tested. SPD is not limited to microarray data analysis, and can be applied to a variety of high-dimensional datasets. We implemented SPD with a MATLAB graphical user interface, which is available at http://icbp.stanford.edu/software/SPD/.
Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis.
Unlike most existing live Linux distributions for bioinformatics, which limit their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing.
LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of effort in building computing infrastructure. LXtoo is distributed as a Live DVD and is freely available at http://bioinformatics.jnu.edu.cn/LXtoo.
Bioinformatics; Software; Linux; Operating system
The mission of the Pharmacogenomics Knowledge Base (PharmGKB; www.pharmgkb.org) is to collect, encode and disseminate knowledge about the impact of human genetic variations on drug responses. It is an important worldwide resource of clinical pharmacogenomic biomarkers available to all. The PharmGKB website has evolved to highlight our knowledge curation and aggregation over our previous emphasis on collecting primary data. This review summarizes the methods we use to drive this expanded scope of ‘Knowledge Acquisition to Clinical Applications’, the new features available on our website and our future goals.
Clinical Annotations; Clinical Interpretations; genomic variation; pharmacogenetics; pharmacogenomics; Pharmacogenomics Knowledge Base; PharmGKB; Variant Annotations
Expression profiling of whole genomes, and modern high-throughput proteomics, has created a revolution in the study of disease states. Approaches for gene expression analysis (time series analysis and clustering) have been applied to functional genomics related to cancer research, and have yielded major successes in the pursuit of gene expression signatures. However, these analysis methods are primarily designed to identify correlative or causal relationships between entities, but do not consider the data in the proper biological context of a “biological pathway” model. Pathway models form a cornerstone of systems biology. They provide a framework for (1) systematic interrogation of biochemical interactions, (2) management of the collective knowledge pertaining to cellular components, and (3) discovery of emergent properties of different pathway configurations.
CFD Research Corporation has developed advanced techniques to interpret microarray data in the context of known biological pathways. We have applied this integrative biological pathway-centered approach to the specific problem of identifying a genetic cause for individuals predisposed to mefloquine neurotoxicity. Mefloquine (Lariam) is highly effective against drug-resistant malaria. However, adverse neurological effects (ataxia, mood changes) have been observed in human sub-populations. Microarray experiments were used to quantify the transcriptional response of cells exposed to mefloquine. Canonical pathway models containing the differentially expressed genes were automatically retrieved from the KEGG database, using recently developed software. The canonical pathway models were automatically concatenated together to form the final pathway model. The resultant pathway model was interrogated using a novel signaling control flux (SCF) algorithm that combines Boolean pseudodynamics (BPD) to relax the cumbersome steady-state assumptions of SCF. The SCF-BPD algorithm was used to identify and prioritize pathways critical to adverse effects of mefloquine. Further analysis resulted in the identification of specific sub-cellular targets that may explain mefloquine neurotoxicity in human subpopulations on the basis of known single-nucleotide polymorphisms.
Modern high-throughput measurement technologies such as DNA microarrays and next generation sequencers produce extensive datasets. With large datasets the emphasis has been moving from traditional statistical tests to new data mining methods that are capable of detecting complex patterns, such as clusters, regulatory networks, or time series periodicity. Study of periodic gene expression is an interesting research question that also is a good example of challenges involved in the analysis of high-throughput data in general. Unlike for classical statistical tests, the distribution of test statistic for data mining methods cannot be derived analytically.
We describe the randomization based approach to significance testing, and show how it can be applied to detect periodically expressed genes. We present four randomization methods, three of which have previously been used for gene cycle data. We propose a new method for testing significance of periodicity in gene expression short time series data, such as from gene cycle and circadian clock studies. We argue that the underlying assumptions behind existing significance testing approaches are problematic and some of them unrealistic. We analyze the theoretical properties of the existing and proposed methods, showing how our method can be robustly used to detect genes with exceptionally high periodicity. We also demonstrate the large differences in the number of significant results depending on the chosen randomization methods and parameters of the testing framework.
By reanalyzing gene cycle data from various sources, we show that previous estimates of the number of gene cycle controlled genes are not supported by the data. Our randomization approach, combined with the widely adopted Benjamini-Hochberg multiple testing method, yields better predictive power and produces more accurate null distributions than previous methods.
Existing methods for testing the significance of periodic gene expression patterns are simplistic and optimistic. Our testing framework allows strict levels of statistical significance with more realistic underlying assumptions, without losing predictive power. As DNA microarrays have now become mainstream and new high-throughput methods are rapidly being adopted, we argue that not only will there be a need for data mining methods capable of coping with immense datasets, but there will also be a need for solid methods for significance testing.
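A minimal sketch of randomization-based significance testing followed by Benjamini-Hochberg correction is given below. It is not the method proposed in the abstract: the null model here is a plain permutation of time points, the periodicity score (power of one Fourier component at an assumed known period) is an illustrative choice, and all function names are hypothetical.

```python
import math
import random


def periodicity_score(series, period):
    """Power of the Fourier component at the given period (in sampling steps)."""
    n = len(series)
    m = sum(series) / n
    re = sum((x - m) * math.cos(2 * math.pi * t / period) for t, x in enumerate(series))
    im = sum((x - m) * math.sin(2 * math.pi * t / period) for t, x in enumerate(series))
    return (re * re + im * im) / n


def permutation_p_value(series, period, n_perm, rng):
    """Empirical p-value: share of shuffled series scoring at least as high as observed."""
    observed = periodicity_score(series, period)
    shuffled = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if periodicity_score(shuffled, period) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values, alpha):
    """Return booleans marking hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank passing its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


rng = random.Random(0)
periodic_gene = [math.sin(2 * math.pi * t / 6) for t in range(12)]
p_periodic = permutation_p_value(periodic_gene, period=6, n_perm=999, rng=rng)
decisions = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6], alpha=0.05)
```

Shuffling time points destroys all temporal structure, including ordinary autocorrelation, so this null model can overstate how many genes are periodic. That is the kind of assumption the abstract criticizes, and it is why the choice of randomization method changes the number of significant results.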
Time course experiments aim to characterize the dynamic regulation of gene expression in biological systems. Data are collected at different time points to monitor the dynamic behaviour of gene expression. The NuGO PPS Mouse Study 1 investigates the development of high fat-induced insulin resistance (IR) over time in APOE*3Leiden (E3L) mice. The study consists of a series of analyses at time points that are crucial in the development of central and peripheral IR. Affymetrix arrays were run on critical organs. We present the results of the preliminary statistical analysis of these microarray data. We used a non-parametric approach to identify genes whose expression changed over time, separately for three tissues: liver, muscle and white adipose tissue. We specified for each gene a basic ANOVA model in order to test the null hypothesis that gene expression did not vary over time. We addressed the multiple testing problem by calculating the positive false discovery rate and q values for the F test statistics. The appropriateness of the hypothesis of homogeneous variances over time was investigated by means of Bartlett’s test for homoscedasticity. This is a relevant point because heteroscedasticity could be indicative of outlying behaviour of some individuals at specific time points. The necessity of using a moderated F test was evaluated. We found that a considerable proportion of the genes varied in expression over time. For some of the genes, the variance of the response was not homogeneous over time. Response differed by tissue.
Microarray experiments; Time course; ANOVA
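The per-gene test described above can be illustrated with a one-way ANOVA F statistic, where each group holds the replicate measurements of one gene at one time point. This is a minimal sketch with toy numbers: the function name is hypothetical, and the p-value, q-value, Bartlett and moderated-F steps mentioned in the abstract are omitted.

```python
def anova_f(groups):
    """One-way ANOVA F statistic; each group holds replicate values at one time point."""
    k = len(groups)                      # number of time points
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    ss_within = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    # Mean square between time points divided by mean square within time points.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy gene (hypothetical values): three time points, three animals each.
changing_gene = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
flat_gene = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
f_changing = anova_f(changing_gene)
f_flat = anova_f(flat_gene)
```

A large F means the time-point means differ by more than the scatter among replicates would explain. The statistic relies on roughly equal variances across time points, which is why a homogeneity check such as Bartlett’s test matters before interpreting it.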
Understanding gene interactions in complex living systems can be seen as the ultimate goal of the systems biology revolution. Hence, to elucidate disease ontology fully and to reduce the cost of drug development, gene regulatory networks (GRNs) have to be constructed. During the last decade, many GRN inference algorithms based on genome-wide data have been developed to unravel the complexity of gene regulation. Time series transcriptomic data measured by genome-wide DNA microarrays are traditionally used for GRN modelling. One of the major problems with microarrays is that a dataset consists of relatively few time points with respect to the large number of genes. Dimensionality is one of the interesting problems in GRN modelling.
In this paper, we develop a biclustering function enrichment analysis toolbox (BicAT-plus) to study the effect of biclustering in reducing data dimensions. The network generated from our system was validated via available interaction databases and was compared with previous methods. The results revealed the performance of our proposed method.
Because of the sparse nature of GRNs, the results of biclustering techniques differ significantly from those of previous methods.
Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge.
We formulate network construction as a series of variable selection problems and use linear regression to model the data. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework, and we summarize the external biological knowledge from additional data sources with an informative prior probability distribution over the candidate regression models.
We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.
Systems biology; Network inference; Data integration; Statistics; Time-series expression data; Model uncertainty
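The role of the informative prior in BMA variable selection can be written out in standard notation. The symbols are assumptions introduced here rather than taken from the abstract: $M_k$ denotes a candidate regression model (a subset of candidate regulators), $D$ the expression data, $\beta_j$ the regression coefficient of regulator $j$, and $\pi_j$ a prior probability that $j$ is a regulator. The product form of the prior is one common choice, not necessarily the authors'.

```latex
% Posterior probability of candidate model M_k given the data D
\[
P(M_k \mid D) = \frac{p(D \mid M_k)\, P(M_k)}{\sum_{l} p(D \mid M_l)\, P(M_l)}
\]

% Posterior inclusion probability of candidate regulator j
\[
P(\beta_j \neq 0 \mid D) = \sum_{k \,:\, j \in M_k} P(M_k \mid D)
\]

% Assumed product-form informative prior over models
\[
P(M_k) = \prod_{j \in M_k} \pi_j \prod_{j \notin M_k} (1 - \pi_j)
\]
```

Under this form, external knowledge enters only through the $\pi_j$: a regulator supported by other data sources gets a larger $\pi_j$, which raises the prior weight of every model containing it and hence its posterior inclusion probability.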
The microarray technique allows the simultaneous measurement of the expression levels of thousands of mRNAs. By mining these data one can identify the dynamics of the gene expression time series. The detection of genes that are periodically expressed is an important step that allows us to study the regulatory mechanisms associated with the circadian cycle. The problem of finding periodicity in biological time series poses many challenges. These challenges arise because the observed time series usually exhibit non-idealities, such as noise, short length, outliers and unevenly sampled time points. Consequently, a method for finding periodicity should preferably be robust against such anomalies in the data. In this paper, we propose a general and robust procedure for identifying genes with a periodic signature at a given significance level. This identification method is based on autoregressive models and information theory. Using simulated data, we show that the suggested method is capable of identifying rhythmic profiles even in the presence of noise and when the number of data points is small. Applying our analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles of the cyanobacterium Synechocystis.
One of the challenges in constructing biological models involves resolving meaningful data patterns from which the mathematical models will be generated. For models that describe the change of mRNA in response to drug administration, questions exist whether the correct genes have been selected given the myriad transcriptional effects that may occur. Oftentimes, different algorithms will select or cluster different groups of genes from the same data set. A new approach was developed that focuses on identifying the underlying global dynamics of the system instead of selecting individual genes. The procedure was applied to microarray genomic data obtained from rat liver after a large single dose of methylprednisolone in 52 adrenalectomized rats. Twelve clusters of at least 30 genes each were selected, reflecting the major changes over time. This method along with isolating the underlying dynamics of the system also extracts and clusters the genes that make up this global dynamic for further analysis as to the contributions of specific mechanisms affected by the drug.
Non-tumor cell based model systems have recently gained interest in pharmacogenetic research as a hypothesis generating tool. The hypotheses generated from these model systems can be followed up in functional studies, or tested in individuals taking the same investigational agents. The current cellular phenotypes (e.g. cytotoxicity) of interest in these studies are based on effects of an individual dosage of a drug on the cell lines, or a summary of results at many dosages of a drug (e.g. dose that inhibits 50% of cell growth, GI50). A more complete analysis of the impact of genetic variation on all aspects of the dose-response curve may lend additional insight into the pharmacogenomics of a particular drug. This paper illustrates the use of a Bayesian hierarchical nonlinear model for the analysis of pharmacogenomic data with cytotoxicity endpoints. The model is illustrated with cytotoxicity and expression data collected on cell lines from a pharmacogenomic study of the drug gemcitabine. By completing an analysis based on the entire dose-response curve, we were able to detect additional genes that affect not only the GI50, but also the slope of the curve, which reflects the therapeutic index of the drug. Simulation studies also demonstrate that in comparison to the analyses based on the commonly used summary measure GI50, investigation of the impact of genetic variation on all aspects of the cytotoxicity dose-response curve are more informative, and more powerful with respect to detecting the effect of gene expression on cytotoxicity.
cell lines; cytotoxicity; hierarchical nonlinear model; mRNA expression; pharmacogenomics
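The argument that GI50 alone can hide differences in the dose-response curve is easy to see with a small example. The abstract does not state the functional form of its nonlinear model, so the two-parameter Hill-type curve below is an assumption, and the function name and doses are hypothetical.

```python
def fraction_surviving(dose, gi50, slope):
    """Hill-type curve: fraction of cell growth remaining at a given dose (assumed form)."""
    return 1.0 / (1.0 + (dose / gi50) ** slope)


# Two hypothetical cell lines with the same GI50 but different slopes.
shallow_at_gi50 = fraction_surviving(10.0, gi50=10.0, slope=1.0)
steep_at_gi50 = fraction_surviving(10.0, gi50=10.0, slope=3.0)
shallow_at_double = fraction_surviving(20.0, gi50=10.0, slope=1.0)
steep_at_double = fraction_surviving(20.0, gi50=10.0, slope=3.0)
```

Both lines show 50% growth at the GI50 by construction, yet at twice that dose the steep line retains about one ninth of its growth against one third for the shallow line. A gene that affects only the slope would therefore go undetected by an analysis based on GI50 alone.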
DNA microarrays have rapidly emerged as an important tool for Mycobacterium tuberculosis research. While the microarray approach has generated valuable information, a recent survey has found a lack of correlation among the microarray data produced by different laboratories on related issues, raising a concern about the credibility of research findings. The Affymetrix oligonucleotide array has been shown to be more reliable for interrogating changes in gene expression than other platforms. However, this type of array system has not been applied to the pharmacogenomic study of M. tuberculosis. The goal here was to explore the strength of the Affymetrix array system for monitoring drug-induced gene expression in M. tuberculosis, compare with other related studies, and conduct cross-platform analysis. The genome-wide gene expression profiles of M. tuberculosis in response to drug treatments including INH (isoniazid) and ethionamide were obtained using the Affymetrix array system. Up-regulated or down-regulated genes were identified through bioinformatic analysis of the microarray data derived from the hybridization of RNA samples and gene probes. Based on the Affymetrix system, our method identified all drug-induced genes reported in the original reference work as well as some other genes that have not been recognized previously under the same drug treatment. For instance, the Affymetrix system revealed that Rv2524c (fas) was induced by both INH and ethionamide under the given levels of concentration, as suggested by most of the probe sets implementing this gene sequence. This finding is contradictory to previous observations that the expression of fas is not changed by INH treatment. This example illustrates that the determination of expression change for certain genes is probe-dependent, and the appropriate use of multiple probe-set representation is an advantage with the Affymetrix system. 
Our data also suggest that whereas the up-regulated gene expression pattern reflects the drug’s mode of action, the down-regulated pattern is largely non-specific. According to our analysis, the Affymetrix array system is a reliable tool for studying the pharmacogenomics of M. tuberculosis and lends itself well to the research and development of anti-TB drugs.
Tuberculosis; Drug; Microarray; Genome
High-throughput “omics” technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data maintained in heterogeneous and distributed environments, together with the lack of well-defined data standards and standardized nomenclature, pose a major challenge. Meeting it requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput “omics” data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput “omics” data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how one can use varied “omics” data from different laboratories to make useful connections that could lead to new biological knowledge.
The field of pharmacogenomics is focused on the characterization of genetic factors contributing to the response of patients to pharmacological interventions. Drug response and toxicity are complex traits; therefore the effects are likely influenced by multiple genes. The investigation of the genetic basis of drug response has evolved from a focus on single genes to relevant pathways to the entire genome. Preclinical (cell-based models) and clinical genome-wide association studies (GWAS) in oncology provide an unprecedented opportunity for a comprehensive and unbiased assessment of the heritable factors associated with drug response. The primary challenge in identifying pharmacogenomic markers from clinical studies is that they require a homogeneous population of patients treated with the same dosage regimen and minimal confounding variables. Therefore, the development of cell-based models for pharmacogenomic marker identification has utility for the field, since performing these types of studies in humans is difficult and costly. This review provides a current report on the status of genomic studies in oncology, the methods for discovery, and the implications for patient care. We present a perspective and summary of the challenges and opportunities in translating heritable genomic discoveries to patients.