Recent studies have demonstrated the use of genomic data, particularly gene expression signatures, as clinical prognostic factors in complex diseases. Such studies herald the future for genomic medicine and the opportunity for personalized prognosis in a variety of clinical contexts that utilize genome-scale molecular information. Several key areas represent logical and critical next steps in the use of complex genomic profiling data towards the goal of personalized medicine. First, analyses should be geared toward the development of molecular profiles that predict future events, such as major clinical events or the response, resistance, or adverse reaction to therapy. Second, these profiles must move into actual clinical practice by forming the basis for the next generation of clinical trials that will employ these methodologies to stratify patients. Lastly, there remain formidable challenges in the translation of genomic technologies into clinical medicine that will need to be addressed: professional and public education, health outcomes research, reimbursement, regulatory oversight and privacy protection.
genomic medicine, personalized medicine, human genome.
The biopsy collection data from two lung cancer trials that required fresh tumor samples to be obtained for microarray analysis were reviewed. In the trial for advanced disease, microarray data were obtained on 50 patient samples, giving an overall success rate of 60.2%. The majority of the specimens were obtained through CT-guided lung biopsies (N = 30). In the trial for early-stage patients, 28 tissue specimens were collected from excess tumor after surgical resection, with a success rate of 85.7%. This tissue procurement program documents the feasibility of prospectively obtaining fresh tumor specimens that could be used for molecular testing.
Lung cancer; Gene profiling; Bioinformatics
Improved ways to diagnose acute respiratory viral infections could decrease inappropriate antibacterial use and serve as a vital triage mechanism in the event of a potential viral pandemic. Measurement of the host response to infection is an alternative to pathogen-based diagnostic testing and may improve diagnostic accuracy. We have developed a host-based assay with a reverse transcription polymerase chain reaction (RT-PCR) TaqMan low-density array (TLDA) platform for classifying respiratory viral infection. We developed the assay using two cohorts experimentally infected with influenza A H3N2/Wisconsin or influenza A H1N1/Brisbane, and validated the assay in a sample of adults presenting to the emergency department with fever (n = 102) and in healthy volunteers (n = 41). Peripheral blood RNA samples were obtained from individuals who underwent experimental viral challenge or who presented to the emergency department and had microbiologically proven viral respiratory infection or systemic bacterial infection. The selected gene set on the RT-PCR TLDA assay classified participants with experimentally induced influenza H3N2 and H1N1 infection with 100 and 87% accuracy, respectively. We validated this host gene expression signature in a cohort of 102 individuals arriving at the emergency department. The sensitivity of the RT-PCR test was 89% [95% confidence interval (CI), 72 to 98%], and the specificity was 94% (95% CI, 86 to 99%). These results show that RT-PCR–based detection of a host gene expression signature can classify individuals with respiratory viral infection and sets the stage for prospective evaluation of this diagnostic approach in a clinical setting.
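As a reading aid, sensitivity and specificity of the kind reported above are computed from a 2×2 table of test calls against the microbiological reference. The sketch below is illustrative only: the counts are hypothetical placeholders (the abstract does not report the underlying table), and the Wilson score interval is an assumption chosen for convenience, since the abstract does not state which interval method was used.

```python
import math


def proportion_with_wilson_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a proportion using the Wilson score interval."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_accuracy(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return {
        "sensitivity": proportion_with_wilson_ci(tp, tp + fn),
        "specificity": proportion_with_wilson_ci(tn, tn + fp),
    }


# Hypothetical counts for illustration; NOT taken from the study.
result = diagnostic_accuracy(tp=45, fn=5, tn=90, fp=10)
for name, (estimate, lower, upper) in result.items():
    print(f"{name}: {estimate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

With small groups, an exact (Clopper-Pearson) interval may be preferred; it would replace only `proportion_with_wilson_ci`.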
Sepsis, a leading cause of morbidity and mortality, is not a homogeneous disease but rather a syndrome encompassing many heterogeneous pathophysiologies. Patient factors including genetics predispose to poor outcomes, though current clinical characterizations fail to identify those at greatest risk of progression and mortality.
The Community Acquired Pneumonia and Sepsis Outcome Diagnostic study enrolled 1,152 subjects with suspected sepsis. We sequenced peripheral blood RNA of 129 representative subjects with systemic inflammatory response syndrome (SIRS) or sepsis (SIRS due to infection), including 78 sepsis survivors and 28 sepsis non-survivors who had previously undergone plasma proteomic and metabolomic profiling. Gene expression differences were identified between sepsis survivors, sepsis non-survivors, and SIRS followed by gene enrichment pathway analysis. Expressed sequence variants were identified followed by testing for association with sepsis outcomes.
The expression of 338 genes differed between subjects with SIRS and those with sepsis, primarily reflecting immune activation in sepsis. Expression of 1,238 genes differed with sepsis outcome: non-survivors had lower expression of many immune function-related genes. Functional genetic variants associated with sepsis mortality were sought based on a common disease-rare variant hypothesis. VPS9D1, whose expression was increased in sepsis survivors, had a higher burden of missense variants in sepsis survivors. The presence of variants was associated with altered expression of 3,799 genes, primarily reflecting Golgi and endosome biology.
The activation of immune response-related genes seen in sepsis survivors was muted in sepsis non-survivors. The association of sepsis survival with a robust immune response and the presence of missense variants in VPS9D1 warrants replication and further functional studies.
ClinicalTrials.gov NCT00258869. Registered on 23 November 2005.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-014-0111-5) contains supplementary material, which is available to authorized users.
To develop RNA profiles that could serve as novel biomarkers for the response to aspirin.
Aspirin reduces death and myocardial infarction (MI) suggesting that aspirin interacts with biological pathways that may underlie these events.
We administered aspirin, followed by whole blood RNA microarray profiling, in a discovery cohort of healthy volunteers (HV1, n = 50), and two validation cohorts of volunteers (HV2, n = 53) or outpatient cardiology patients (OPC, n = 25). Platelet function was assessed by platelet function score (PFS; HV1/HV2) or VerifyNow Aspirin (OPC). Bayesian sparse factor analysis identified sets of coexpressed transcripts, which were examined for association with PFS in HV1 and validated in HV2 and OPC. Proteomic analysis confirmed the association of validated transcripts in platelet proteins. Validated gene sets were tested for association with death/MI in two patient cohorts (n = 587, total) from RNA samples collected at cardiac catheterization.
A set of 60 co-expressed genes named the “aspirin response signature” (ARS) was associated with PFS in HV1 (r = −0.31, p = 0.03), HV2 (r = −0.34, Bonferroni p = 0.03), and OPC (p = 0.046). Corresponding proteins for 17 ARS genes were identified in the platelet proteome, of which six were associated with PFS. The ARS was associated with death/MI in both patient cohorts (odds ratio = 1.2, p = 0.01 and hazard ratio = 1.5, p = 0.001), independent of cardiovascular risk factors. Compared with traditional risk factors, reclassification (net reclassification index = 31–37%, p ≤ 0.0002) was improved by including the ARS or one of its genes, ITGA2B.
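For readers unfamiliar with the net reclassification index (NRI), the sketch below shows one common form of the calculation. It assumes the category-free (continuous) variant, in which any increase in predicted risk counts as upward movement; the abstract does not say whether this or a category-based NRI was used, and all risk values here are hypothetical.

```python
def net_reclassification_index(old_risk, new_risk, events):
    """Category-free NRI.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    where "up"/"down" compare each person's new predicted risk to the old one.
    """
    if not (len(old_risk) == len(new_risk) == len(events)):
        raise ValueError("inputs must have equal length")
    counts = {True: {"n": 0, "up": 0, "down": 0},
              False: {"n": 0, "up": 0, "down": 0}}
    for old, new, event in zip(old_risk, new_risk, events):
        group = counts[bool(event)]
        group["n"] += 1
        if new > old:
            group["up"] += 1
        elif new < old:
            group["down"] += 1
    ev, ne = counts[True], counts[False]
    if ev["n"] == 0 or ne["n"] == 0:
        raise ValueError("need at least one event and one non-event")
    return (ev["up"] - ev["down"]) / ev["n"] + (ne["down"] - ne["up"]) / ne["n"]


# Hypothetical predicted risks from a baseline model and a model with an added marker.
old = [0.20, 0.30, 0.40, 0.10, 0.25, 0.50]
new = [0.35, 0.25, 0.45, 0.05, 0.30, 0.40]
had_event = [1, 1, 1, 0, 0, 0]
print(f"NRI = {net_reclassification_index(old, new, had_event):.2f}")
```

A positive NRI means the added marker moves predicted risk in the correct direction more often than in the wrong one.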
RNA profiles of platelet-specific genes are novel biomarkers for identifying those who do not respond adequately to aspirin and who are at risk for death/MI.
aspirin; platelets; genes; myocardial infarction; biomarkers
The application of next-generation sequencing technology to gene expression quantification analysis, namely, RNA-Sequencing, has transformed the way in which gene expression studies are conducted and analyzed. These advances are of particular interest to researchers studying organisms with missing or incomplete genomes, as the need for knowledge of sequence information is overcome. De novo assembly methods have gained widespread acceptance in the RNA-Seq community for organisms with no true reference genome or transcriptome. While such methods have tremendous utility, computational cost is still a significant challenge for organisms with large and complex genomes.
In this manuscript, we present a comparison of four reference-based mapping methods for non-human primate data. We utilize TopHat2 and GSNAP for mapping to the human genome, and Bowtie2 and Stampy for mapping to the human genome and transcriptome for a total of six mapping approaches. For each of these methods, we explore mapping rates and locations, number of detected genes, correlations between computed expression values, and the utility of the resulting data for differential expression analysis.
We show that reference-based mapping methods indeed have utility in RNA-Seq analysis of mammalian data with no true reference, and the details of mapping methods should be carefully considered when doing so. Critical algorithm features include short seed sequences, the allowance of mismatches, and the allowance of gapped alignments in addition to splice junction gaps. Such features facilitate sensitive alignment of non-human primate RNA-Seq data to a human reference.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-570) contains supplementary material, which is available to authorized users.
RNA-Sequencing; Genomics; Mapping
In this paper, we describe a surface-enhanced Raman scattering (SERS)-based detection approach, referred to as “molecular sentinel” (MS) plasmonic nanoprobes, to detect an RNA target related to viral infection. The MS method is essentially a label-free technique incorporating the SERS effect modulation scheme associated with silver nanoparticles and Raman dye-labeled DNA hairpin probes. Hybridization with target sequences opens the hairpin and spatially separates the Raman label from the silver surface, thus reducing the SERS signal of the label. Herein, we have developed an MS nanoprobe to detect the human radical S-adenosyl methionine domain containing 2 (RSAD2) RNA target as a model system for method demonstration. The human RSAD2 gene has recently emerged as a novel host-response biomarker for diagnosis of respiratory infections. Our results showed that the RSAD2 MS nanoprobes exhibit high specificity and can detect target sequences at concentrations as low as 1 nM. With the use of a portable Raman spectrometer and total RNA samples, we have also demonstrated for the first time the potential of the MS nanoprobe technology for detection of host-response RNA biomarkers for infectious disease diagnostics.
Surface-enhanced Raman scattering; SERS; nanoprobe; infectious disease detection
A major promise of genomic research is information that can transform health care and public health through earlier diagnosis, more effective prevention and treatment of disease, and avoidance of drug side effects. Although there is interest in the early adoption of emerging genomic applications in cancer prevention and treatment, there are substantial evidence gaps that are further compounded by the difficulties of designing adequately powered studies to generate this evidence, thus limiting the uptake of these tools into clinical practice. Comparative effectiveness research (CER) is intended to generate evidence on the “real-world” effectiveness compared with existing standards of care so informed decisions can be made to improve health care. Capitalizing on funding opportunities from the American Recovery and Reinvestment Act of 2009, the National Cancer Institute funded seven research teams to conduct CER in genomic and precision medicine and sponsored a workshop on CER on May 30, 2012, in Bethesda, Maryland. This report highlights research findings from those research teams, challenges to conducting CER, the barriers to implementation in clinical practice, and research priorities and opportunities in CER in genomic and precision medicine. Workshop participants strongly emphasized the need for conducting CER for promising molecularly targeted therapies, developing and supporting an integrated clinical network for open-access resources, supporting bioinformatics and computer science research, providing training and education programs in CER, and conducting research in economic and decision modeling.
Statin adherence is often limited by side effects. The SLCO1B1*5 variant is a risk factor for statin side effects and exhibits statin-specific effects: highest with simvastatin/atorvastatin and lowest with pravastatin/rosuvastatin. The effects of SLCO1B1*5 genotype guided statin therapy (GGST) are unknown. Primary care patients (n = 58) who were nonadherent to statins and their providers received SLCO1B1*5 genotyping and guided recommendations via the electronic medical record (EMR). The primary outcome was the change in Beliefs about Medications Questionnaire, which measured patients’ perceived needs for statins and concerns about adverse effects, measured before and after SLCO1B1*5 results. Concurrent controls (n = 59) were identified through the EMR to compare secondary outcomes: new statin prescriptions, statin utilization, and change in LDL-cholesterol (LDL-c). GGST patients had trends (p = 0.2) towards improved statin necessity and concerns. The largest changes were the “need for statin to prevent sickness” (p < 0.001) and “concern for statin to disrupt life” (p = 0.006). GGST patients had more statin prescriptions (p < 0.001), higher statin use (p < 0.001), and greater decrease in LDL-c (p = 0.059) during follow-up. EMR delivery of SLCO1B1*5 results and recommendations is feasible in the primary care setting. This novel intervention may improve patients’ perceptions of statins and physician behaviors that promote higher statin adherence and lower LDL-c.
pharmacogenetics; personalized medicine; medication adherence; risk assessment; health behavior; hyperlipidemia
Type 2 diabetes (T2D) and coronary heart disease (CHD) are prevalent chronic diseases from which military personnel are not exempt. While many genetic markers for these diseases have been identified, the clinical utility of genetic risk testing for multifactorial diseases such as these has not been established. The need for a behavioral intervention such as health coaching following a risk counseling intervention for T2D or CHD also has not been explored. Here we present the rationale, design, and protocol for evaluating the clinical utility of genetic risk testing and health coaching for active duty US Air Force (AF) retirees and beneficiaries.
Primary Study Objectives:
Determine the direct and interactive effects of health coaching and providing genetic risk information when added to standard risk counseling for CHD and T2D on health behaviors and clinical risk markers.
Four-group (2 X 2 factorial) randomized controlled trial.
Two AF primary care clinical settings on the west coast of the United States.
Adult AF primary care patients.
All participants will have a risk counseling visit with a clinic provider to discuss personal risk factors for T2D and CHD. Half of the participants (two groups) will also learn of their genetic risk testing results for T2D and CHD in this risk counseling session. Participants randomized to the two groups receiving health coaching will then receive telephonic health coaching over 6 months.
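The four-group design described above crosses two yes/no factors (genetic risk results; health coaching). The sketch below shows one way such an allocation could be generated. It assumes permuted-block randomization with a block size of four, which is an illustrative choice and not a detail stated in the protocol; the participant identifiers are hypothetical.

```python
import random
from collections import Counter

# The four arms of a 2 x 2 factorial design: (genetic_results, health_coaching).
ARMS = [
    (False, False),  # standard risk counseling only
    (True, False),   # plus genetic risk results
    (False, True),   # plus health coaching
    (True, True),    # plus genetic risk results and health coaching
]


def randomize_factorial(participant_ids, seed=0, block_size=4):
    """Assign participants to the four arms using permuted blocks (assumed scheme)."""
    if block_size % len(ARMS) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    assignments = {}
    block = []
    for pid in participant_ids:
        if not block:
            # Start a new block containing every arm equally often, in random order.
            block = ARMS * (block_size // len(ARMS))
            rng.shuffle(block)
        genetic, coaching = block.pop()
        assignments[pid] = {"genetic_results": genetic, "health_coaching": coaching}
    return assignments


# Hypothetical participant identifiers.
allocation = randomize_factorial([f"P{i:03d}" for i in range(1, 9)], seed=42)
arm_sizes = Counter((a["genetic_results"], a["health_coaching"]) for a in allocation.values())
print(arm_sizes)
```

Blocking keeps the four arms balanced as enrollment proceeds, which matters when main effects and the interaction are both of interest.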
Main Outcome Measures:
Behavioral measures (self-reported dietary intake, physical activity, smoking cessation, medication adherence); clinical outcomes (AF composite fitness scores, weight, waist circumference, blood pressure, fasting glucose, lipids, T2D/CHD risk scores) and psychosocial measures (self-efficacy, worry, perceived risk) will be collected at baseline and 6 weeks, and 3, 6, and 12 months.
This study tests novel strategies deployed within existing AF primary care to increase adherence to evidence-based diet, physical activity, smoking cessation, and medication recommendations for CHD and T2D risk reduction through methods of patient engagement and self-management support.
Health coaching; genomics; chronic disease; behavior change; diabetes; coronary heart disease
Background: Variable health literacy and genetic knowledge may pose significant challenges to engaging the general public in personal genomics, specifically with respect to promoting risk comprehension and healthy behaviors. Methods: We are conducting a multistage study of individual responses to genomic risk information for Type 2 diabetes mellitus. A total of 300 individuals were recruited from the general public in Durham, North Carolina: 60% self-identified as White; 70% female; and 65% have a college degree. As part of the baseline survey, we assessed genetic knowledge and attitudes toward genetic testing. Results: Scores of factual knowledge of genetics ranged from 50% to 100% (average=84%), with significant differences in relation to racial groups, the education level, and age. Scores were significantly higher on questions pertaining to the inheritance and causes of disease (mean score 90%) compared to scientific questions (mean score 77.4%). Scores on the knowledge survey were significantly higher than scores from European populations. Participants' perceived knowledge of the social consequences of genetic testing was significantly lower than their perceived knowledge of the medical uses of testing. More than half agreed with the statement that testing may affect a person's ability to obtain health insurance (51.3%) and 16% were worried about the consequences of testing for chances of finding a job. Conclusions: Despite the relatively high educational status and genetic knowledge of the study population, we find an imbalance of knowledge between scientific and medical concepts related to genetics as well as between the medical applications and societal consequences of testing, suggesting that more effort is needed to present the benefits, risks, and limitations of genetic testing, particularly, at the social and personal levels, to ensure informed decision making.
In 2012, the National Cancer Institute (NCI) engaged the scientific community to provide a vision for cancer epidemiology in the 21st century. Eight overarching thematic recommendations, with proposed corresponding actions for consideration by funding agencies, professional societies, and the research community emerged from the collective intellectual discourse. The themes are (i) extending the reach of epidemiology beyond discovery and etiologic research to include multilevel analysis, intervention evaluation, implementation, and outcomes research; (ii) transforming the practice of epidemiology by moving towards more access and sharing of protocols, data, metadata, and specimens to foster collaboration, to ensure reproducibility and replication, and accelerate translation; (iii) expanding cohort studies to collect exposure, clinical and other information across the life course and examining multiple health-related endpoints; (iv) developing and validating reliable methods and technologies to quantify exposures and outcomes on a massive scale, and to assess concomitantly the role of multiple factors in complex diseases; (v) integrating “big data” science into the practice of epidemiology; (vi) expanding knowledge integration to drive research, policy and practice; (vii) transforming training of 21st century epidemiologists to address interdisciplinary and translational research; and (viii) optimizing the use of resources and infrastructure for epidemiologic studies. These recommendations can transform cancer epidemiology and the field of epidemiology in general, by enhancing transparency, interdisciplinary collaboration, and strategic applications of new technologies. They should lay a strong scientific foundation for accelerated translation of scientific discoveries into individual and population health benefits.
big data; clinical trials; cohort studies; epidemiology; genomics; medicine; public health; technologies; training; translational research
Sepsis is a common cause of death, but outcomes in individual patients are difficult to predict. Elucidating the molecular processes that differ between sepsis patients who survive and those who die may permit more appropriate treatments to be deployed. We examined the clinical features, and the plasma metabolome and proteome of patients with and without community-acquired sepsis, upon their arrival at hospital emergency departments and 24 hours later. The metabolomes and proteomes of patients at hospital admittance who would die differed markedly from those who would survive. The different profiles of proteins and metabolites clustered into fatty acid transport and β-oxidation, gluconeogenesis and the citric acid cycle. They differed consistently among several sets of patients, and diverged more as death approached. In contrast, the metabolomes and proteomes of surviving patients with mild sepsis did not differ from survivors with severe sepsis or septic shock. An algorithm derived from clinical features together with measurements of seven metabolites predicted patient survival. This algorithm may help to guide the treatment of individual patients with sepsis.
Studies have shown that the quality of family health history (FHH) collection in primary care is inadequate to assess disease risk. To use FHH for risk assessment, collected data must have adequate detail. To address this issue, we developed a patient facing FHH assessment tool, MeTree. In this paper we report the content and quality of the FHH collected using MeTree.
Design: A hybrid implementation-effectiveness study. Patients were recruited from 2009 to 2012. Setting: Two community primary care clinics in Greensboro, NC. Participants: All non-adopted adult English-speaking patients with upcoming appointments were invited to participate. Intervention: Education about and collection of FHH with entry into MeTree. Measures: We report the proportion of pedigrees that were high-quality. High-quality pedigrees are defined as meeting all of the following criteria: (1) three generations of relatives, (2) relatives’ lineage, (3) relatives’ gender, (4) an up-to-date FHH, (5) pertinent negatives noted, (6) age of disease onset in affected relatives, and for deceased relatives, (7) the age and (8) cause of death (Prim Care 31:479–495, 2004).
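The eight criteria above amount to a checklist that can be evaluated per pedigree. The following sketch is one possible encoding, not MeTree's implementation: the data model (`Relative`, `Condition`, `Pedigree`) and the way each criterion is operationalized (for example, a per-relative flag for pertinent negatives) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Condition:
    name: str
    age_at_onset: Optional[int] = None


@dataclass
class Relative:
    generation: int                      # e.g. 0 = patient's generation, 1 = parents, 2 = grandparents
    lineage: Optional[str] = None        # e.g. "maternal" or "paternal"
    gender: Optional[str] = None
    conditions: List[Condition] = field(default_factory=list)
    negatives_recorded: bool = False     # pertinent negatives noted for this relative
    deceased: bool = False
    age_at_death: Optional[int] = None
    cause_of_death: Optional[str] = None


@dataclass
class Pedigree:
    relatives: List[Relative]
    up_to_date: bool


def quality_checks(pedigree):
    """Evaluate the eight criteria; each value is True when the criterion is met."""
    rels = pedigree.relatives
    return {
        "three_generations": len({r.generation for r in rels}) >= 3,
        "lineage": all(r.lineage for r in rels),
        "gender": all(r.gender for r in rels),
        "up_to_date": pedigree.up_to_date,
        "pertinent_negatives": all(r.negatives_recorded for r in rels),
        "age_at_onset": all(c.age_at_onset is not None for r in rels for c in r.conditions),
        "age_at_death": all(r.age_at_death is not None for r in rels if r.deceased),
        "cause_of_death": all(bool(r.cause_of_death) for r in rels if r.deceased),
    }


def is_high_quality(pedigree):
    """A pedigree is high-quality only if every criterion is met."""
    return all(quality_checks(pedigree).values())
```

Returning the per-criterion results from `quality_checks` makes it easy to report which criterion a pedigree fails, in addition to the overall proportion of high-quality pedigrees.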
Enrollment: 1,184. Participant demographics: age range 18-92 (mean 58.8, SD 11.79), 56% male, and 75% white. The median pedigree size was 21 (range 8-71) and the FHH entered into MeTree resulted in a database of 27,406 individuals. FHHs collected by MeTree were found to be high quality in 99.8% (N = 1,182/1,184) as compared to <4% at baseline. An average of 1.9 relatives per pedigree (range 0-50, SD 4.14) had no data reported. For pedigrees where at least one relative had no data (N = 497/1,184), 4.97 relatives per pedigree (range 1-50, SD 5.44) had no data. Talking with family members before using MeTree significantly decreased the proportion of relatives with no data reported (4.98% among participants who talked with relatives vs. 10.85% among those who did not, p-value < 0.001).
Using MeTree improves the quantity and quality of the FHH data collected, and talking with relatives prior to the collection of FHH significantly improves the quantity and quality of the data provided. This allows more patients to be accurately risk stratified and offered appropriate preventive care guided by their risk level.
Family history; Data quality; Patient-centered
Despite stunning advances in our understanding of the genetics and the molecular basis for cancer, many patients with cancer are not yet receiving therapy tailored specifically to their tumor biology. The translation of these advances into clinical practice has been hindered, in part, by the lack of evidence for biomarkers supporting the personalized medicine approach. Most stakeholders agree that the translation of biomarkers into clinical care requires evidence of clinical utility. The highest level of evidence comes from randomized controlled clinical trials (RCTs). However, in many instances, there may be no RCTs that are feasible for assessing the clinical utility of potentially valuable genomic biomarkers. In the absence of RCTs, evidence generation will require well-designed cohort studies for comparative effectiveness research (CER) that link detailed clinical information to tumor biology and genomic data. CER also uses systematic reviews, evidence-quality appraisal, and health outcomes research to provide a methodologic framework for assessing biologic patient subgroups. Rapid learning health care (RLHC) is a model in which diverse data are made available, ideally in a robust and real-time fashion, potentially facilitating CER and personalized medicine. Nonetheless, to realize the full potential of personalized care using RLHC requires advances in CER and biostatistics methodology and the development of interoperable informatics systems, which has been recognized by the National Cancer Institute's program for CER and personalized medicine. The integration of CER methodology and genomics linked to RLHC should enhance, expedite, and expand the evidence generation required for fully realizing personalized cancer care.
We propose a mixture model for text data designed to capture underlying structure in the history of present illness section of electronic medical records data. Additionally, we propose a method to induce bias that leads to more homogeneous sets of diagnoses for patients in each cluster. We apply our model to a collection of electronic records from an emergency department and compare our results to three other relevant models in order to assess performance. Results using standard metrics demonstrate that patient clusters from our model are more homogeneous when compared to others, and qualitative analyses suggest that our approach leads to interpretable patient sub-populations when applied to real data. Finally, we demonstrate an example of our patient clustering model to identify adverse drug events.
Capturing the host response by using genomic technologies such as transcriptional profiling provides a new paradigm for classifying and diagnosing infectious disease and for potentially distinguishing infection from other causes of serious respiratory illness. This strategy has been used to define a blood-based RNA signature as a classifier for pandemic H1N1 influenza infection that is distinct from bacterial pneumonia and other inflammatory causes of respiratory disease. To realize the full potential of this approach as a diagnostic test will require additional independent validation of the results and studies to examine the specificity of this signature for viral versus bacterial infection or co-infection.
It is anticipated that as pharmacogenetic (PGx) testing becomes available for a wider range of drugs, primary care physicians (PCPs) will become major users of these tests. To assess their training, familiarity, and attitudes toward PGx testing, and to identify barriers to uptake that may be addressed at this early stage of test use, we conducted a national survey of a sample of PCPs. Respondents were mostly white (79%), based primarily in community-based primary care (81%) and almost evenly divided between family medicine and internal medicine. The majority of respondents had heard of PGx testing and anticipated that these tests are or would soon become a valuable tool to inform drug response. However, only a minority of respondents (13%) indicated they felt comfortable ordering PGx tests and almost a quarter reported not having any education about pharmacogenetics.
Our results indicate that primary care practitioners envision a major role for themselves in the delivery of PGx testing but recognize their lack of adequate knowledge and experience about these tests. Development of effective tools for guiding PCPs in the use of PGx tests should be a high priority.
Family health history (FHH) is the single strongest predictor of disease risk and yet is significantly underutilized in primary care. We developed a patient facing FHH collection tool, MeTree©, that uses risk stratification to generate clinical decision support for breast cancer, colorectal cancer, ovarian cancer, hereditary cancer syndromes, and thrombosis. Here we present data on the experience of patients and providers after integration of MeTree© into 2 primary care practices.
This was a Type 2 hybrid controlled implementation-effectiveness study in 3 community-based primary care clinics in Greensboro, NC. All non-adopted adult English speaking patients with upcoming routine appointments were invited. Patients were recruited from December 2009 to the present and followed for one year. Ease of integration of MeTree© into clinical practice at the two intervention clinics was evaluated through patient surveys after their appointment and at 3 months post-visit, and physician surveys 3 months after tool integration.
Total enrollment = 1,184. Average time to complete MeTree© = 27 minutes. Patients found MeTree©: easy to use (93%), easy to understand (97%), useful (98%), raised awareness of disease risk (85%), and changed how they think about their health (86%). Of the 26% (N = 311) asking for assistance to complete the tool, age (65, SD 9.4 vs. 57, SD 11.8, p-value < 0.00) and large pedigree size (24.4, SD 9.81 vs. 22.2, SD 8.30, p-value < 0.00) were the only significant factors; 77% of those requiring assistance were over the age of 60. Providers (N = 14) found MeTree©: improved their practice (86%), improved their understanding of FHH (64%), made practice easier (79%), and worthy of recommending to their peers (93%).
Our study shows that MeTree© has broad acceptance and support from both patients and providers and can be implemented without disruption to workflow.
Family health history; Cancer screening; Clinical decision support; Health services
Pharmacogenetic (PGx) testing is one of the primary drivers of personalized medicine. The use of PGx testing may provide a lifetime of benefits through tailoring drug dosing and selection of multiple medications to improve therapeutic outcomes and reduce adverse responses. We aimed to assess public interest and concerns regarding sharing and storage of PGx test results that would facilitate the re-use of PGx data across a lifetime of care.
We conducted a random-digit-dial phone survey of a sample of the U.S. public.
We achieved an overall response rate of 42% (n=1,139). Most respondents indicated they were extremely or somewhat comfortable allowing their PGx test results to be shared with other doctors involved in their care management (90% ± 2.18%); significantly fewer respondents (74% ± 3.27%) indicated they were extremely or somewhat comfortable sharing results with their pharmacist (p<0.0001).
Patients, pharmacists, and physicians will all be critical players in the pharmacotherapy process. Patients are supportive of sharing PGx test results with physicians and pharmacists as well as personally maintaining their test results. However, further study is needed to understand which options are needed for sharing, appropriate storage and patient education about the relevance of PGx test results to promote consideration of this information by other prescribing practitioners.
Venous thromboembolism may recur in up to 30% of patients with a spontaneous venous thromboembolism after a standard course of anticoagulation. Identification of patients at risk for recurrent venous thromboembolism would facilitate decisions concerning the duration of anticoagulant therapy.
In this exploratory study, we investigated whether whole blood gene expression data could distinguish subjects with single venous thromboembolism from subjects with recurrent venous thromboembolism.
Forty adults with venous thromboembolism (23 with a single event and 17 with recurrent events) on warfarin were recruited. Individuals with antiphospholipid syndrome or cancer were excluded. Plasma and serum samples were collected for biomarker testing, and PAXgene tubes were used to collect whole blood RNA samples.
D-dimer levels were significantly higher in patients with recurrent venous thromboembolism, but P-selectin and thrombin-antithrombin complex levels were similar in the two groups. Comparison of gene expression data from the two groups yielded a 50-gene-probe model that distinguished the groups with good receiver operating characteristic performance (AUC 0.75). This model includes genes involved in mRNA splicing and platelet aggregation. Pathway analysis comparing subjects with single and recurrent venous thromboembolism revealed that the Akt pathway was up-regulated in the recurrent group relative to the single-event group.
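As an illustration of what the reported AUC measures, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen recurrent-event subject receives a higher model score than a randomly chosen single-event subject. The scores are synthetic and the function name `auc_mann_whitney` is hypothetical; only the group sizes (17 and 23) come from the text, and the study's actual model and data are not reproduced here.

```python
import numpy as np

def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]  # column vector
    neg = np.asarray(scores_neg, dtype=float)[None, :]  # row vector
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)

# Synthetic classifier scores (assumption): recurrent subjects shifted upward.
rng = np.random.default_rng(0)
recurrent = rng.normal(loc=1.0, scale=1.0, size=17)  # 17 recurrent events
single = rng.normal(loc=0.0, scale=1.0, size=23)     # 23 single events

auc = auc_mann_whitney(recurrent, single)
print(f"AUC on synthetic scores: {auc:.2f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.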
In this exploratory study, whole blood gene expression profiling appears to be a useful strategy for distinguishing subjects with single venous thromboembolism from those with recurrent venous thromboembolism. Prospective studies with additional patients are needed to validate these results.
genomics; risk factors; deep vein thrombosis
There is often interest in predicting an individual’s latent health status based on high-dimensional biomarkers that vary over time. Motivated by time-course gene expression array data that we have collected in two influenza challenge studies performed with healthy human volunteers, we develop a novel time-aligned Bayesian dynamic factor analysis methodology. The time-course trajectories of gene expression are related to a relatively low-dimensional vector of latent factors, which vary dynamically starting at the latent initiation time of infection. Using a nonparametric cure rate model for the latent initiation times, we allow selection of the genes in the viral response pathway, variability among individuals in infection times, and a subset of individuals who are not infected. As we demonstrate using held-out data, this statistical framework allows accurate predictions of infected individuals in advance of the development of clinical symptoms, without labeled data and even when the number of biomarkers vastly exceeds the number of individuals under study. Biological interpretation of several of the inferred pathways (factors) is provided.
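To make the time-alignment idea concrete, the sketch below simulates data with the structure described above: each subject has a latent onset time (or is never infected), a single latent factor is zero before onset and evolves afterward, and a sparse subset of genes loads on that factor. This is a toy generative simulation under stated assumptions, not the Bayesian inference procedure of the paper. The response curve, the Bernoulli infection indicator (standing in for the nonparametric cure rate model), and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n_subjects, n_genes, n_times = 10, 100, 12
times = np.arange(n_times, dtype=float)

# Hypothetical latent onset times; np.inf marks subjects who are never infected.
infected = rng.random(n_subjects) < 0.5
onset = np.where(infected, rng.uniform(1.0, 6.0, size=n_subjects), np.inf)

def factor_trajectory(t_since_onset):
    """Illustrative response curve: zero before onset, rise then decay after."""
    s = np.clip(t_since_onset, 0.0, None)  # time before onset is mapped to 0
    return s * np.exp(-s / 3.0)

# Sparse loadings: only about 20% of genes respond to the latent factor.
responds = rng.random(n_genes) < 0.2
loadings = np.where(responds, rng.normal(1.0, 0.3, size=n_genes), 0.0)

# Align each subject's clock to their own onset time (subjects x times).
aligned = times[None, :] - onset[:, None]
factor = factor_trajectory(aligned)

# Expression = factor * loadings + noise, shape (subjects, times, genes).
noise = rng.normal(scale=0.1, size=(n_subjects, n_times, n_genes))
expression = factor[:, :, None] * loadings[None, None, :] + noise
```

In this toy setup, uninfected subjects have an identically zero factor trajectory, so their expression is pure noise; recovering onset times and loadings from `expression` alone is the inference problem the abstract addresses.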
Bayesian nonparametrics; Dynamic factor analysis; High-dimensional; Infectious disease; Joint model; Multidimensional longitudinal data; Multivariate functional data; Predictive model
To develop an integrated metric of non-COX-1-dependent platelet function (NCDPF) to measure the temporal response to aspirin in healthy volunteers and diabetics.
NCDPF on aspirin demonstrates wide variability, despite suppression of COX-1. Although a variety of NCDPF assays are available, no standard exists and their reproducibility is not established.
We administered 325 mg/day aspirin to two cohorts of healthy volunteers (HV1, n = 52, and HV2, n = 96) and one cohort of diabetics (DM, n = 74) and measured NCDPF using epinephrine, collagen, and ADP aggregometry and PFA100 (collagen/epi) before aspirin (Pre), after one dose (Post), and after several weeks (Final). COX-1 activity was assessed with arachidonic acid aggregometry (AAA). The primary outcome of the study, the platelet function score (PFS), was derived from a principal components analysis of NCDPF measures.
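One common way to derive a single summary score from several correlated assays is to standardize each measure and project onto the first principal component. The sketch below shows that construction on synthetic data. It is an assumption that the PFS was built this way in detail; the function name `first_pc_score`, the four simulated columns, and the sign convention are all hypothetical.

```python
import numpy as np

def first_pc_score(X):
    """Standardize columns and return first-PC scores, loadings, variance share."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each assay
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    if loadings.sum() < 0:  # fix sign so a higher score means higher function
        loadings = -loadings
    scores = Z @ loadings
    explained = S[0] ** 2 / (S ** 2).sum()
    return scores, loadings, explained

# Synthetic stand-in for four platelet function assays in 52 subjects
# (assumption): each assay is a noisy readout of one shared latent trait.
rng = np.random.default_rng(1)
latent = rng.normal(size=52)
X = np.column_stack([latent + rng.normal(scale=0.5, size=52) for _ in range(4)])

scores, loadings, explained = first_pc_score(X)
print(f"First PC explains {explained:.0%} of the standardized variance")
```

When the assays share a dominant common component, the loadings have the same sign and the score correlates with every individual measure, which is the property a composite such as the PFS relies on.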
The PFS strongly correlated with each measure of NCDPF in each cohort. After two or four weeks of daily aspirin, the Final PFS strongly correlated with the Post PFS (r > 0.7, p < 0.0001) and was higher than it (p < 0.01). The magnitude and direction of the change in PFS (Final - Post) in an individual subject were moderately inversely correlated with the Post PFS in HV1 (r = −0.45), HV2 (r = −0.54), and DM (r = −0.68); p < 0.0001 for all. AAA remained suppressed during aspirin therapy.
The PFS summarizes multiple measures of NCDPF. Despite suppression of COX-1 activity, NCDPF during aspirin therapy is predictably dynamic: those with heightened NCDPF continue to decline whereas those with low/normal NCDPF return to pre-aspirin levels over time.
aspirin; platelets; light transmittance aggregometry; PFA100; principal components analysis
This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high-dimensional assays such as gene expression microarrays. The basis for uBLU is a Bayesian model in which the data samples are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The distinguishing feature of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters.
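The constraints described above can be illustrated with a small simulation of the mixing model: a genes-by-factors matrix of non-negative loadings, and a factors-by-samples matrix of scores whose columns each lie on the probability simplex. This sketch only generates data with that structure and checks the constraints; it does not implement the Gibbs sampler or the estimation of the number of factors. The matrix names `M` and `A`, the dimensions, and the gamma and Dirichlet draws are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
G, N, R = 200, 30, 3  # genes, samples, factors (hypothetical sizes)

# Non-negative factor loadings: one gene signature per column.
M = rng.gamma(shape=2.0, scale=1.0, size=(G, R))

# Factor scores: each sample's column is a probability vector over factors.
A = rng.dirichlet(alpha=np.ones(R), size=N).T  # shape (R, N)

# Observed data: additive mixture of signatures plus Gaussian noise.
Y = M @ A + rng.normal(scale=0.1, size=(G, N))

def satisfies_constraints(M, A, tol=1e-9):
    """True if loadings are non-negative and score columns lie on the simplex."""
    return bool(
        (M >= 0).all()
        and (A >= 0).all()
        and np.allclose(A.sum(axis=0), 1.0, atol=tol)
    )

print(satisfies_constraints(M, A))
```

Because each column of `A` sums to one, every sample is a convex combination of the signatures, which is what lets the scores be read as relative contributions.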
Firstly, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non-negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Secondly, we illustrate the application of uBLU on a real time-evolving gene expression dataset from a recent viral challenge study in which individuals were inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real data sets considered here.
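For readers unfamiliar with the baselines, the following minimal sketch shows one of them, NMF, using the standard multiplicative update rules for the squared-error objective. It is run on synthetic data and is not the configuration used in the paper's comparison; the function name `nmf_multiplicative`, the rank, and the iteration count are assumptions. Note that plain NMF enforces non-negativity only, without the sum-to-one constraint on factor scores that uBLU adds.

```python
import numpy as np

def nmf_multiplicative(Y, rank, n_iter=200, seed=0, eps=1e-10):
    """Factor non-negative Y (G x N) as W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    G, N = Y.shape
    W = rng.random((G, rank)) + eps  # non-negative initial loadings
    H = rng.random((rank, N)) + eps  # non-negative initial scores
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative at every step.
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic non-negative data with exact rank 4 (assumption).
rng = np.random.default_rng(3)
Y = rng.random((50, 4)) @ rng.random((4, 20))

W, H = nmf_multiplicative(Y, rank=4)
rel_err = np.linalg.norm(Y - W @ H) / np.linalg.norm(Y)
print(f"Relative reconstruction error: {rel_err:.3f}")
```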
The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF). The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.