Results 1-14 (14)

1.  A Systems Approach to Designing Effective Clinical Trials Using Simulations 
Circulation  2012;127(4):517-526.
Pharmacogenetic dosing in warfarin clinical trials has failed to show a significant benefit over standard clinical therapy. This study demonstrates a computational framework to systematically evaluate pre-trial design choices (target population, pharmacogenetic algorithms, and dosing protocols) to optimize primary outcomes.
Methods and Results
We programmatically created an end-to-end framework that systematically evaluates warfarin clinical trial designs. The framework includes options to create a patient population; multiple dosing strategies, both genetic-based and non-genetic clinical-based; multiple dose-adjustment protocols; pharmacokinetic/pharmacodynamic (PK/PD) modeling and international normalized ratio (INR) prediction; and various types of outcome measures. We validated the framework by conducting 1,000 simulations of the CoumaGen clinical trial's primary endpoints. The simulation predicted a mean time in therapeutic range (TTR) of 70.6% and 72.2% (P = 0.47) in the standard and pharmacogenetic arms, respectively. We then evaluated another dosing protocol under the same original conditions and found a significant difference in TTR between the pharmacogenetic and standard arms (78.8% vs. 73.8%; P = 0.0065).
We demonstrate that this simulation framework is useful in the pre-trial assessment phase for studying and evaluating design options, providing evidence to optimize the clinical trial for efficacy and reduced patient risk.
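The arm-level comparison above can be sketched as a simple Monte Carlo loop. The normal TTR model, sample size, and standard deviation below are illustrative assumptions, not the paper's PK/PD framework:

```python
import random
import statistics

def simulate_arm(arm_mean_ttr, n_patients=250, sd=12.0, rng=None):
    """Draw a per-patient time in therapeutic range (TTR, %) for one arm.

    Values are clamped to [0, 100]; the normal model and its parameters
    are illustrative stand-ins for the framework's PK/PD-based INR model.
    """
    rng = rng or random.Random()
    return [min(100.0, max(0.0, rng.gauss(arm_mean_ttr, sd)))
            for _ in range(n_patients)]

rng = random.Random(42)
standard = simulate_arm(70.6, rng=rng)        # arm means taken from the abstract
pharmacogenetic = simulate_arm(72.2, rng=rng)
mean_standard = statistics.mean(standard)
mean_pgx = statistics.mean(pharmacogenetic)
```

Repeating such a simulation many times (the study ran 1,000) yields a distribution of arm-level endpoints that can be compared across candidate trial designs.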
PMCID: PMC3747989  PMID: 23261867
bioinformatics; clinical trials; warfarin; modeling; simulations
2.  Early Detection of Poor Adherers to Statins: Applying Individualized Surveillance to Pay for Performance 
PLoS ONE  2013;8(11):e79611.
Medication nonadherence costs $300 billion annually in the US. Medicare Advantage plans have a financial incentive to increase medication adherence among members because the Centers for Medicare and Medicaid Services (CMS) now awards substantial bonus payments to such plans, based in part on population adherence to chronic medications. We sought to build an individualized surveillance model that identifies, early in therapy, which beneficiaries will fall below the CMS adherence threshold.
This was a retrospective study of over 210,000 beneficiaries initiating statins, in a database of private insurance claims, from 2008-2011. A logistic regression model was constructed to use statin adherence from initiation to day 90 to predict which beneficiaries would not meet the CMS measure of a proportion of days covered (PDC) of 0.8 or above from day 91 to 365. The model controlled for 15 additional characteristics. In a sensitivity analysis, we varied the number of days of adherence data used for prediction.
Lower adherence in the first 90 days was the strongest predictor of one-year nonadherence, with an odds ratio of 25.0 (95% confidence interval 23.7-26.5) for poor adherence at one year. The model had an area under the receiver operating characteristic curve of 0.80. Sensitivity analysis revealed that predictions of comparable accuracy could be made as early as 40 days after statin initiation. When members with 30-day supplies for their first statin fill had predictions made at day 40, and members with 90-day supplies for their first fill had predictions made at day 100, poor adherence could be predicted with 86% positive predictive value.
To preserve their Medicare Star ratings, plan managers should identify or develop effective programs to improve adherence. An individualized surveillance approach can be used to target members who would most benefit, recognizing the tradeoff between improved model performance over time and the advantage of earlier detection.
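The adherence measure underlying the CMS threshold, proportion of days covered, can be sketched from fill records. This is a simplified illustration; real CMS logic handles overlapping fills, stockpiling, and hospital stays:

```python
from datetime import date, timedelta

def proportion_days_covered(fills, start, end):
    """Fraction of days in [start, end] covered by at least one fill.

    `fills` is a list of (fill_date, days_supply) tuples. Simplified
    PDC for illustration only; not the full CMS specification.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for i in range(days_supply):
            day = fill_date + timedelta(days=i)
            if start <= day <= end:
                covered.add(day)
    total_days = (end - start).days + 1
    return len(covered) / total_days

# Hypothetical member: two 30-day fills with a gap in between.
fills = [(date(2024, 1, 1), 30), (date(2024, 2, 15), 30)]
pdc = proportion_days_covered(fills, date(2024, 1, 1), date(2024, 3, 31))
```

A member whose early PDC trends below 0.8, as here (60 covered days out of 91), is the kind of case the surveillance model flags for intervention.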
PMCID: PMC3817130  PMID: 24223977
3.  Development of a Scalable Pharmacogenomic Clinical Decision Support Service 
Advances in sequencing technology are making genomic data more accessible within the healthcare environment. Published pharmacogenetic guidelines attempt to provide a clinical context for specific genomic variants; however, the actual implementation required to convert genomic data into a clinical report integrated within an electronic medical record system is a major challenge for any hospital. We created a two-part solution that integrates with the medical record system and converts genetic variant results into an interpreted clinical report based on published guidelines. We successfully developed a scalable infrastructure to support TPMT genetic testing and are currently testing approximately two individuals per week in our production version. We plan to release an online variant-to-clinical-interpretation reporting system in order to facilitate translation of pharmacogenetic information into clinical practice.
PMCID: PMC3814487  PMID: 24303299
5.  Immunoaffinity enrichment and liquid chromatography-selected reaction monitoring mass spectrometry for quantitation of carbonic anhydrase 12 in cultured renal carcinoma cells 
Analytical Chemistry  2010;82(21):10.1021/ac101981t.
Liquid chromatography-selected reaction monitoring (LC-SRM) is a highly specific and sensitive mass spectrometry (MS) technique that is being widely applied to selectively qualify and validate candidate markers within complex biological samples. For LC-SRM methods to retain these attributes, however, target-specific optimization of sample processing is required to reduce analyte complexity prior to LC-SRM. In this study, we developed a targeted platform consisting of protein immunoaffinity enrichment on magnetic beads followed by LC-SRM for measuring carbonic anhydrase 12 (CA12) protein in a renal cell carcinoma (RCC) cell line (PRC3); CA12 is a candidate biomarker for RCC whose expression at the protein level has not previously been reported. Sample processing and the LC-SRM assay were optimized for signature peptides selected as surrogate markers of the CA12 protein. Using LC-SRM coupled with stable isotope dilution, we achieved limits of quantitation in the low-fmol range, sufficient for measuring clinically relevant biomarkers with good intra- and inter-assay accuracy and precision (≤17%). Our results show that a quantitative immunoaffinity capture approach provides specific, accurate, and robust assays amenable to high-throughput verification of potential biomarkers.
PMCID: PMC3046293  PMID: 20936840
immunoaffinity enrichment; selected reaction monitoring; enhanced signature peptide predictor; carbonic anhydrase 12; renal cell carcinoma; biomarker; Protein G magnetic beads
6.  A Simulation Platform to Examine Heterogeneity Influence on Treatment 
Although a protocol aims to guide treatment management and optimize overall outcomes, the benefits and harms for each individual vary due to heterogeneity. Some protocols integrate clinical and genetic variation to provide treatment recommendations; it is not clear whether such integration is sufficient. If not, treatment outcomes may be sub-optimal for certain patient sub-populations. Unfortunately, running a clinical trial to examine such outcome responses is cost-prohibitive and requires a significant amount of time. We propose a simulation approach to discover this knowledge from electronic medical records, a rapid method of reaching this goal. We use the well-known drug warfarin as an example to examine whether patient characteristics, including race and the genes CYP2C9 and VKORC1, have been fully integrated into dosing protocols. These two genes have been shown to be important determinants of patient response to warfarin.
PMCID: PMC3392060  PMID: 22779042
7.  Biomedical Cloud Computing With Amazon Web Services 
PLoS Computational Biology  2011;7(8):e1002147.
PMCID: PMC3161908  PMID: 21901085
8.  Cost-Effective Cloud Computing: A Case Study Using the Comparative Genomics Tool, Roundup 
Comparative genomics resources, such as ortholog detection tools and repositories, are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource, Roundup, using cloud computing, describe the operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal cost.
Utilizing the comparative genomics tool Roundup as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. To manage the ortholog processes, we designed a strategy that deploys the Elastic MapReduce web service and maximizes use of the cloud while simultaneously minimizing costs. Specifically, we created a model that estimates cloud runtime from the size and complexity of the genomes being compared and determines in advance the optimal order in which to submit jobs.
We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon’s computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.
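The cost model's core idea, ordering jobs by estimated runtime before dispatch so that instance-hours are not wasted, can be sketched as longest-processing-time-first scheduling. The runtimes and node count below are toy values, not the paper's model:

```python
import heapq

def schedule_jobs(est_runtimes, n_nodes):
    """Longest-processing-time-first assignment of jobs to cluster nodes.

    Sorting jobs by a runtime estimate (here, hypothetical hours) and
    always giving the next-longest job to the least-loaded node reduces
    idle instance-hours versus submitting jobs in random order.
    Returns per-node total runtime; the maximum is the billed cluster time.
    """
    loads = [(0.0, i) for i in range(n_nodes)]  # (load, node id) min-heap
    heapq.heapify(loads)
    totals = [0.0] * n_nodes
    for t in sorted(est_runtimes, reverse=True):
        load, node = heapq.heappop(loads)
        totals[node] = load + t
        heapq.heappush(loads, (load + t, node))
    return totals

totals = schedule_jobs([9, 7, 6, 5, 4, 4, 3], n_nodes=3)
```

With these toy estimates the three nodes finish at 13, 14, and 11 hours, so only a few instance-hours are idle; a random submission order can leave far more capacity unused.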
PMCID: PMC3023304  PMID: 21258651
cloud computing; elastic computing cloud; Roundup; comparative genomics; high performance computing; Amazon; orthologs
9.  Genotator: A disease-agnostic tool for genetic annotation of disease 
BMC Medical Genomics  2010;3:50.
Disease-specific genetic information has been increasing at rapid rates as a consequence of recent improvements and massive cost reductions in sequencing technologies. Numerous systems designed to capture and organize this mounting body of genetic data have emerged, but these resources differ dramatically in their disease coverage and genetic depth. With few exceptions, researchers must manually search a variety of sites to assemble a complete set of genetic evidence for a particular disease of interest, a process that is both time-consuming and error-prone.
We designed a real-time aggregation tool that provides both comprehensive coverage and reliable gene-to-disease rankings for any disease. Our tool, called Genotator, automatically integrates data from 11 externally accessible clinical genetics resources and uses these data in a straightforward formula to rank genes in order of disease relevance. We tested the accuracy and coverage of Genotator in three diseases for which specialty curated databases exist: Autism Spectrum Disorder, Parkinson's Disease, and Alzheimer Disease. Genotator is freely available at
Genotator demonstrated that most of the 11 selected databases contain unique information about the genetic composition of disease, with 2514 genes found in only one of the 11 databases. These findings confirm that the integration of these databases provides a more complete picture than would be possible from any one database alone. Genotator successfully identified at least 75% of the top ranked genes for all three of our use cases, including a 90% concordance with the top 40 ranked candidates for Alzheimer Disease.
As a meta-query engine, Genotator provides high coverage of both historical genetic research and recent advances in the genetic understanding of specific diseases. As such, Genotator provides a real-time aggregation of ranked data that remains current with the pace of research in the disease fields. Genotator's algorithm appropriately transforms query terms to match the input requirements of each targeted database and accurately resolves named synonyms to ensure full coverage of the genetic results with official nomenclature. Genotator generates an Excel-style output that is consistent across disease queries and readily importable to other applications.
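The aggregation step can be sketched as counting how many source databases report each gene. This count-based score is an illustrative stand-in for Genotator's published ranking formula, and the database names and gene sets below are hypothetical:

```python
from collections import Counter

def rank_genes(database_hits):
    """Rank genes by how many source databases report them.

    `database_hits` maps a database name to the set of gene symbols it
    returns for a disease query. Counting database support is a
    simplified proxy for Genotator's actual relevance formula.
    """
    counts = Counter()
    for genes in database_hits.values():
        counts.update(genes)
    return [gene for gene, _ in counts.most_common()]

# Hypothetical query results from three of the integrated resources.
hits = {
    "dbA": {"APOE", "APP", "PSEN1"},
    "dbB": {"APOE", "APP"},
    "dbC": {"APOE"},
}
ranking = rank_genes(hits)
```

Genes reported by many independent databases rise to the top, while the 2514 genes found in only one database still appear, preserving the unique information each resource contributes.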
PMCID: PMC2990725  PMID: 21034472
10.  Cloud computing for comparative genomics 
BMC Bioinformatics  2010;11:259.
Large comparative genomics studies and tools are becoming increasingly compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with this increase, especially as the breadth of questions continues to grow. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Compute Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes.
We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high-capacity compute nodes using the Amazon Web Services Elastic MapReduce and included a wide mix of large and small genomes. The total computation took just under 70 hours and cost a total of $6,302 USD.
The effort required to transform existing comparative genomics algorithms to run on the cloud, rather than on local compute infrastructures, is not trivial. However, the speed and flexibility of cloud computing environments provide a substantial boost at manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
PMCID: PMC3098063  PMID: 20482786
11.  Prediction of high-responding peptides for targeted protein assays by mass spectrometry 
Nature biotechnology  2009;27(2):190-198.
Protein biomarker discovery produces lengthy lists of candidates that must subsequently be verified in blood or other accessible biofluids. Use of targeted mass spectrometry (MS) to verify disease- or therapy-related changes in protein levels requires the selection of peptides that are quantifiable surrogates for proteins of interest. Peptides that produce the highest ion-current response (high-responding peptides) are likely to provide the best detection sensitivity. Identification of the most effective signature peptides, particularly in the absence of experimental data, remains a major resource constraint in developing targeted MS-based assays. Here we describe a computational method that uses protein physicochemical properties to select high-responding peptides and demonstrate its utility in identifying signature peptides in plasma, a complex proteome with a wide range of protein concentrations. Our method, which employs a Random Forest classifier, facilitates the development of targeted MS-based assays for biomarker verification or any application where protein levels need to be measured.
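Feature extraction for such a classifier can be sketched by computing a few physicochemical descriptors per peptide. The feature set below (length, Kyte-Doolittle mean hydropathy, basic-residue count) is an illustrative assumption, not the published method's actual feature list:

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def peptide_features(seq):
    """Simple physicochemical descriptors of the kind a classifier
    predicting peptide MS response might use; illustrative only."""
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
    return {
        "length": len(seq),
        "gravy": round(gravy, 3),          # mean hydropathy
        "basic_residues": sum(seq.count(aa) for aa in "KRH"),
    }

features = peptide_features("LVNEVTEFAK")  # a tryptic peptide of serum albumin
```

Vectors like these, computed for every tryptic peptide of a target protein, are what a Random Forest can score to nominate the likely high-responding surrogates before any instrument time is spent.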
PMCID: PMC2753399  PMID: 19169245
12.  Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata 
Nucleic Acids Research  2007;36(Database issue):D866-D870.
Many Microbe Microarrays Database (M3D) is designed to facilitate the analysis and visualization of expression data in compendia compiled from multiple laboratories. M3D contains over a thousand Affymetrix microarrays for Escherichia coli, Saccharomyces cerevisiae and Shewanella oneidensis. The expression data is uniformly normalized to make the data generated by different laboratories and researchers more comparable. To facilitate computational analyses, M3D provides raw data (CEL file) and normalized data downloads of each compendium. In addition, web-based construction, visualization and download of custom datasets are provided to facilitate efficient interrogation of the compendium for more focused analyses. The experimental condition metadata in M3D is human curated with each chemical and growth attribute stored as a structured and computable set of experimental features with consistent naming conventions and units. All versions of the normalized compendia constructed for each species are maintained and accessible in perpetuity to facilitate the future interpretation and comparison of results published on M3D data. M3D is accessible at
PMCID: PMC2238822  PMID: 17932051
13.  Novel Approaches to Visualization and Data Mining Reveals Diagnostic Information in the Low Amplitude Region of Serum Mass Spectra from Ovarian Cancer Patients 
Disease Markers  2004;19(4-5):197-207.
The ability to identify patterns of diagnostic signatures in proteomic data generated by high-throughput mass spectrometry (MS) based serum analysis has recently generated much excitement and interest from the scientific community. These data sets can be very large, with high-resolution MS instrumentation producing 1-2 million data points per sample. Approaches to analyzing mass spectral data using unsupervised and supervised data mining operations would greatly benefit from tools that effectively allow for data reduction without losing important diagnostic information. In the past, investigators have proposed approaches in which data reduction is performed by a priori "peak picking" and alignment/warping/smoothing components using rule-based signal-to-noise measurements. Unfortunately, while this type of system has been employed for gene microarray analysis, it is unclear whether it will be effective in the analysis of mass spectral data, which, unlike microarray data, comprises continuous measurements. Moreover, it is unclear where true signal begins and noise ends. Therefore, we have developed an approach to MS data analysis using new types of data visualization and mining operations in which data reduction is accomplished by culling via the intensity of the peaks themselves instead of by location. Applying this new analysis method to a large study set of high-resolution mass spectra from healthy individuals and ovarian cancer patients shows that all of the diagnostic information is contained within the very lowest amplitude regions of the mass spectra. This region can then be selected and studied to identify the exact location and amplitude of the diagnostic biomarkers.
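Culling by peak intensity rather than by m/z location, as described above, can be sketched in a few lines. The spectrum values and cutoff below are toy numbers for illustration:

```python
def cull_by_intensity(spectrum, max_amplitude):
    """Keep only (m/z, intensity) points at or below an amplitude cutoff.

    Selects the low-amplitude region of a spectrum, the region the study
    found to carry the diagnostic information; the cutoff is a toy value.
    """
    return [(mz, inten) for mz, inten in spectrum if inten <= max_amplitude]

# Hypothetical four-point spectrum: one dominant peak, three low-amplitude ones.
spectrum = [(500.2, 1200.0), (633.1, 45.0), (812.7, 8.5), (900.4, 30.2)]
low_region = cull_by_intensity(spectrum, max_amplitude=50.0)
```

The dominant peak is discarded and the low-amplitude points survive, which is the opposite of conventional signal-to-noise "peak picking" and is what lets this approach retain diagnostic features that thresholding would throw away.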
PMCID: PMC3851062  PMID: 15258334
ovarian cancer; SELDI-TOF MS; data visualization; diagnosis
14.  Biomarker Amplification by Serum Carrier Protein Binding 
Disease Markers  2004;19(1):1-10.
Mass spectroscopic analysis of the low molecular mass (LMM) range of the serum/plasma proteome is a rapidly emerging frontier for biomarker discovery. This study examined the proportion of LMM biomarkers that are bound to circulating carrier proteins. Mass spectroscopic analysis of human serum following molecular mass fractionation demonstrated that the majority of LMM biomarkers exist bound to carrier proteins. Moreover, the pattern of LMM biomarkers bound specifically to albumin is distinct from that of those bound to non-albumin carriers. A prominent SELDI-TOF ionic species (m/z 6631.7043) found to correlate with the presence of ovarian cancer was amplified by albumin capture. Several insights emerged: a) accumulation of LMM biomarkers on circulating carrier proteins greatly amplifies the total serum/plasma concentration of the measurable biomarker; b) the total serum/plasma biomarker concentration is largely determined by the carrier protein clearance rate, not the clearance rate of the unbound biomarker itself; and c) examination of the LMM species bound to a specific carrier protein may contain important diagnostic information. These findings shift the focus of biomarker detection to the carrier protein and its biomarker content.
PMCID: PMC3851653  PMID: 14757941
