Related Articles

1.  Optimizing the Power of Genome-Wide Association Studies by Using Publicly Available Reference Samples to Expand the Control Group 
Genetic Epidemiology  2010;34(4):319-326.
Genome-wide association (GWA) studies have proved extremely successful in identifying novel genetic loci contributing effects to complex human diseases. In doing so, they have highlighted the fact that many potential loci of modest effect remain undetected, partly due to the need for samples consisting of many thousands of individuals. Large-scale international initiatives, such as the Wellcome Trust Case Control Consortium, the Genetic Association Information Network, and the database of genetic and phenotypic information, aim to facilitate discovery of modest-effect genes by making genome-wide data publicly available, allowing information to be combined for the purpose of pooled analysis. In principle, disease or control samples from these studies could be used to increase the power of any GWA study via judicious use as “genetically matched controls” for other traits. Here, we present the biological motivation for the problem and the theoretical potential for expanding the control group with publicly available disease or reference samples. We demonstrate that a naïve application of this strategy can greatly inflate the false-positive error rate in the presence of population structure. As a remedy, we make use of genome-wide data and model selection techniques to identify “axes” of genetic variation which are associated with disease. These axes are then included as covariates in association analysis to correct for population structure, which can result in increases in power over standard analysis of genetic information from the samples in the original GWA study. Genet. Epidemiol. 34: 319–326, 2010. © 2010 Wiley-Liss, Inc.
doi:10.1002/gepi.20482
PMCID: PMC2962805  PMID: 20088020
genome-wide association study; expanded control group; population structure; multidimensional scaling; model selection
2.  Publication of Clinical Trials Supporting Successful New Drug Applications: A Literature Analysis 
PLoS Medicine  2008;5(9):e191.
Background
The United States (US) Food and Drug Administration (FDA) approves new drugs based on sponsor-submitted clinical trials. The publication status of these trials in the medical literature and factors associated with publication have not been evaluated. We sought to determine the proportion of trials submitted to the FDA in support of newly approved drugs that are published in biomedical journals that a typical clinician, consumer, or policy maker living in the US would reasonably search.
Methods and Findings
We conducted a cohort study of trials supporting new drugs approved between 1998 and 2000, as described in FDA medical and statistical review documents and the FDA approved drug label. We determined publication status and time from approval to full publication in the medical literature at 2 and 5 y by searching PubMed and other databases through 01 August 2006. We then evaluated trial characteristics associated with publication. We identified 909 trials supporting 90 approved drugs in the FDA reviews, of which 43% (394/909) were published. Among the subset of trials described in the FDA-approved drug label and classified as “pivotal trials” for our analysis, 76% (257/340) were published. In multivariable logistic regression for all trials 5 y postapproval, likelihood of publication correlated with statistically significant results (odds ratio [OR] 3.03, 95% confidence interval [CI] 1.78–5.17); larger sample sizes (OR 1.33 per 2-fold increase in sample size, 95% CI 1.17–1.52); and pivotal status (OR 5.31, 95% CI 3.30–8.55). In multivariable logistic regression for only the pivotal trials 5 y postapproval, likelihood of publication correlated with statistically significant results (OR 2.96, 95% CI 1.24–7.06) and larger sample sizes (OR 1.47 per 2-fold increase in sample size, 95% CI 1.15–1.88). Statistically significant results and larger sample sizes were also predictive of publication at 2 y postapproval and in multivariable Cox proportional models for all trials and the subset of pivotal trials.
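The percentages and the "per 2-fold increase" odds ratio reported above can be checked with simple arithmetic. The sketch below is illustrative only: the helper name `or_for_fold_change` is hypothetical, and the rescaling assumes the log-odds of publication are linear in log2 of sample size, which is what an odds ratio quoted per doubling implies.

```python
import math


def or_for_fold_change(or_per_doubling: float, fold: float) -> float:
    """Rescale an odds ratio quoted per 2-fold increase to another fold change.

    Assumes the log-odds are linear in log2(sample size).
    """
    return or_per_doubling ** math.log2(fold)


# Publication proportions, using the counts reported in the abstract.
all_trials = 394 / 909      # all supporting trials
pivotal_trials = 257 / 340  # trials classified as pivotal

print(f"All trials published:     {all_trials:.1%}")
print(f"Pivotal trials published: {pivotal_trials:.1%}")

# OR 1.33 per doubling (all trials): implied OR for a trial 4 times as large.
or_4_fold = or_for_fold_change(1.33, 4)
print(f"Implied OR for a 4-fold larger trial: {or_4_fold:.2f}")
```

Under that assumption, two doublings multiply the odds by 1.33 twice, so the odds ratio compounds rather than adds.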
Conclusions
Over half of all supporting trials for FDA-approved drugs remained unpublished ≥ 5 y after approval. Pivotal trials and trials with statistically significant results and larger sample sizes are more likely to be published. Selective reporting of trial results exists for commonly marketed drugs. Our data provide a baseline for evaluating publication bias as the new FDA Amendments Act comes into force mandating basic results reporting of clinical trials.
Ida Sim and colleagues investigate the publication status and publication bias of trials submitted to the US Food and Drug Administration (FDA) for a wide variety of approved drugs.
Editors' Summary
Background.
Before a new drug becomes available for the treatment of a specific human disease, its benefits and harms are carefully studied, first in the laboratory and in animals, and then in several types of clinical trials. In the most important of these trials—so-called “pivotal” clinical trials—the efficacy and safety of the new drug and of a standard treatment are compared by giving groups of patients the different treatments and measuring several predefined “outcomes.” These outcomes indicate whether the new drug is more effective than the standard treatment and whether it has any other effects on the patients' health and daily life. All this information is then submitted by the sponsor of the new drug (usually a pharmaceutical company) to the government body responsible for drug approval—in the US, this is the Food and Drug Administration (FDA).
Why Was This Study Done?
After a drug receives FDA approval, information about the clinical trials supporting the FDA's decision is included in the FDA “Summary Basis of Approval” and/or on the drug label. In addition, some clinical trials are described in medical journals. Ideally, all the clinical information that leads to a drug's approval should be publicly available to help clinicians make informed decisions about how to treat their patients. A full-length publication in a medical journal is the primary way that clinical trial results are communicated to the scientific community and the public. Unfortunately, drug sponsors sometimes publish the results only of trials where their drug performed well; as a consequence, trials where the drug did no better than the standard treatment or where it had unwanted side effects remain unpublished. Publication bias like this provides an inaccurate picture of a drug's efficacy and safety relative to other therapies and may lead to excessive prescribing of newer, more expensive (but not necessarily more effective) treatments. In this study, the researchers investigate whether selective trial reporting is common by evaluating the publication status of trials submitted to the FDA for a wide variety of approved drugs. They also ask which factors affect a trial's chances of publication.
What Did the Researchers Do and Find?
The researchers identified 90 drugs approved by the FDA between 1998 and 2000 by searching the FDA's Center for Drug Evaluation and Research Web site. From the Summary Basis of Approval for each drug, they identified 909 clinical trials undertaken to support these approvals. They then searched the published medical literature up to mid-2006 to determine if and when the results of each trial were published. Although 76% of the pivotal trials had appeared in medical journals, usually within 3 years of FDA approval, only 43% of all of the submitted trials had been published. Among all the trials, those with statistically significant results were nearly twice as likely to have been published as those without statistically significant results, and pivotal trials were three times more likely to have been published than nonpivotal trials, 5 years postapproval. In addition, a larger sample size increased the likelihood of publication. Having statistically significant results and larger sample sizes also increased the likelihood of publication of the pivotal trials.
What Do These Findings Mean?
Although the search methods used in this study may have missed some publications, these findings suggest that more than half the clinical trials undertaken to support drug approval remain unpublished 5 years or more after FDA approval. They also reveal selective reporting of results. For example, they show that a pivotal trial in which the new drug does no better than an old drug is less likely to be published than one where the new drug is more effective, a publication bias that could establish an inappropriately favorable record for the new drug in the medical literature. Importantly, these findings provide a baseline for monitoring the effects of the FDA Amendments Act 2007, which was introduced to improve the accuracy and completeness of drug trial reporting. Under this Act, all trials supporting FDA-approved drugs must be registered when they start, and the summary results of all the outcomes declared at trial registration as well as specific details about the trial protocol must be publicly posted within a year of drug approval on the US National Institutes of Health clinical trials site.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050191.
PLoS Medicine recently published an editorial discussing the FDA Amendment Act and what it means for medical journals: The PLoS Medicine Editors (2008) Next Stop, Don't Block the Doors: Opening Up Access to Clinical Trials Results. PLoS Med 5(7): e160
The US Food and Drug Administration provides information about drug approval in the US for consumers and for health care professionals; detailed information about the process by which drugs are approved is on the Web site of the FDA Center for Drug Evaluation and Research (in English and Spanish)
ClinicalTrials.gov provides information about the US National Institutes of Health clinical trial registry, background information about clinical trials, and a fact sheet detailing the requirements of the FDA Amendments Act 2007 for trial registration
The World Health Organization's International Clinical Trials Registry Platform is working toward international norms and standards for reporting the findings of clinical trials
doi:10.1371/journal.pmed.0050191
PMCID: PMC2553819  PMID: 18816163
3.  Mutations in Complement Regulatory Proteins Predispose to Preeclampsia: A Genetic Analysis of the PROMISSE Cohort 
PLoS Medicine  2011;8(3):e1001013.
Jane Salmon and colleagues studied 250 pregnant patients with SLE and/or antiphospholipid antibodies and found an association of risk variants in complement regulatory proteins in patients who developed preeclampsia, as well as in preeclampsia patients lacking autoimmune disease.
Background
Pregnancy in women with systemic lupus erythematosus (SLE) or antiphospholipid antibodies (APL Ab)—autoimmune conditions characterized by complement-mediated injury—is associated with increased risk of preeclampsia and miscarriage. Our previous studies in mice indicate that complement activation targeted to the placenta drives angiogenic imbalance and placental insufficiency.
Methods and Findings
We use PROMISSE, a prospective study of 250 pregnant patients with SLE and/or APL Ab, to test the hypothesis in humans that impaired capacity to limit complement activation predisposes to preeclampsia. We sequenced genes encoding three complement regulatory proteins—membrane cofactor protein (MCP), complement factor I (CFI), and complement factor H (CFH)—in 40 patients who had preeclampsia and found heterozygous mutations in seven (18%). Five of these patients had risk variants in MCP or CFI that were previously identified in atypical hemolytic uremic syndrome, a disease characterized by endothelial damage. One had a novel mutation in MCP that impairs regulation of C4b. These findings constitute, to our knowledge, the first genetic defects associated with preeclampsia in SLE and/or APL Ab. We confirmed the association of hypomorphic variants of MCP and CFI in a cohort of non-autoimmune preeclampsia patients in which five of 59 were heterozygous for mutations.
Conclusion
The presence of risk variants in complement regulatory proteins in patients with SLE and/or APL Ab who develop preeclampsia, as well as in preeclampsia patients lacking autoimmune disease, links complement activation to disease pathogenesis and suggests new targets for treatment of this important public health problem.
Study Registration
ClinicalTrials.gov NCT00198068
Editors' Summary
Background
Most pregnancies culminate in the birth of a healthy baby but, sadly, about a quarter of women lose their babies during pregnancy. A common pregnancy-related medical problem that threatens the life of both baby and mother is preeclampsia. Mild and severe preeclampsia affects up to 10% and 1%–2% of pregnancies, respectively. Preeclampsia occurs because of a problem with the function of the placenta, the organ that transfers nutrients and oxygen from mother to baby and removes waste products from the baby. Although preeclampsia begins early in pregnancy, it is diagnosed by the onset of high blood pressure (hypertension) and the appearance of protein in the urine (proteinuria) after 20 weeks of pregnancy. Other warning signs include headaches and swelling of the hands and face. The only cure for preeclampsia is delivery, and labor is usually induced early to prevent eclampsia (seizures), stroke, liver and kidney failure, and breathing and blood vessel problems developing in the mother. Although delivery before 37 weeks of pregnancy is not generally recommended, in cases of preeclampsia it may be too dangerous for both the baby and the mother to allow the pregnancy to continue. Unfortunately when severe preeclampsia occurs in the second trimester, babies weighing only 500 grams may be delivered and they may not survive.
Why Was This Study Done?
Because the exact cause of preeclampsia is unknown, it is difficult to develop treatments for the condition or to find ways to prevent it. Many experts think that immune system problems—in particular, perturbations in complement activation—may be involved in preeclampsia. The complement system is a set of blood proteins that attacks invading bacteria and viruses. The activation of complement proteins is usually tightly regulated (overactivation of the complement system causes tissue damage) and, because preeclampsia may run in families, one hypothesis is that mutations (genetic changes) in complement regulatory proteins might predispose women to preeclampsia. In this study, the researchers test this hypothesis by sequencing genes encoding complement regulatory proteins in pregnant women with the autoimmune diseases systemic lupus erythematosus (SLE) and/or antiphospholipid antibodies (APL Ab) who developed preeclampsia. In autoimmune diseases, the immune system attacks healthy human cells instead of harmful invaders. Both SLE and APL Ab are characterized by complement-mediated tissue injury and are associated with an increased risk of preeclampsia and miscarriage.
What Did the Researchers Do and Find?
Two hundred fifty women with SLE and/or APL Ab were enrolled into the PROMISSE study (a multi-center observational study to identify predictors of pregnancy outcome in women with SLE and/or APL Ab) when they were 12 weeks pregnant and followed through pregnancy. Thirty patients developed preeclampsia during the study and ten more had had preeclampsia during a previous pregnancy. The researchers sequenced the genes for complement regulatory proteins: membrane cofactor protein (MCP), factor I, and factor H in these 40 patients. Seven women (18%) had mutations in one copy of one of these genes (there are two copies of most genes in human cells). Five mutations were alterations in MCP or factor I that are gene variants that increase the risk of hemolytic uremic syndrome, a disease characterized by blood vessel damage. The sixth mutation was a new MCP mutation that impaired MCP's ability to regulate complement component C4b. The final mutation was a factor H mutation that did not have any obvious functional effect. No mutations in complement regulatory proteins were found in 34 matched participants in PROMISSE without preeclampsia but, among a group of non-autoimmune women who developed preeclampsia during pregnancy, 10% had mutations in MCP or factor I.
What Do These Findings Mean?
These findings identify MCP and factor I mutations as genetic defects associated with preeclampsia in pregnant women with SLE and/or APL Ab. Importantly, they also reveal an association between similar mutations and preeclampsia in women without any underlying autoimmune disease. Taken together with evidence from previous animal experiments, these findings suggest that dysregulation of complement activation is involved in the development of preeclampsia. Although further studies are needed to confirm and extend these findings, these results suggest that proteins involved in the regulation of complement activation could be new targets for the treatment of preeclampsia and raise the possibility that tests could be developed to identify women at risk of developing preeclampsia.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001013.
Tommy's, a UK charity that funds scientific research into the causes and prevention of miscarriage, premature birth, and stillbirth, has information on preeclampsia
The March of Dimes Foundation, a nonprofit organization for pregnancy and baby health, has information on preeclampsia
The UK National Health Services Choices website also has information about preeclampsia
Wikipedia has pages on the complement system, on autoimmune disease, and on preeclampsia (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
More information on the PROMISSE study is available
doi:10.1371/journal.pmed.1001013
PMCID: PMC3062534  PMID: 21445332
4.  Community-Based Multidisciplinary Care for Patients With Stable Chronic Obstructive Pulmonary Disease (COPD) 
Executive Summary
In July 2010, the Medical Advisory Secretariat (MAS) began work on a Chronic Obstructive Pulmonary Disease (COPD) evidentiary framework, an evidence-based review of the literature surrounding treatment strategies for patients with COPD. This project emerged from a request by the Health System Strategy Division of the Ministry of Health and Long-Term Care that MAS provide them with an evidentiary platform on the effectiveness and cost-effectiveness of COPD interventions.
After an initial review of health technology assessments and systematic reviews of COPD literature, and consultation with experts, MAS identified the following topics for analysis: vaccinations (influenza and pneumococcal), smoking cessation, multidisciplinary care, pulmonary rehabilitation, long-term oxygen therapy, noninvasive positive pressure ventilation for acute and chronic respiratory failure, hospital-at-home for acute exacerbations of COPD, and telehealth (including telemonitoring and telephone support). Evidence-based analyses were prepared for each of these topics. For each technology, an economic analysis was also completed where appropriate. In addition, a review of the qualitative literature on patient, caregiver, and provider perspectives on living and dying with COPD was conducted, as were reviews of the qualitative literature on each of the technologies included in these analyses.
The Chronic Obstructive Pulmonary Disease Mega-Analysis series is made up of the following reports, which can be publicly accessed at the MAS website at: http://www.hqontario.ca/en/mas/mas_ohtas_mn.html.
Chronic Obstructive Pulmonary Disease (COPD) Evidentiary Framework
Influenza and Pneumococcal Vaccinations for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Smoking Cessation for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Community-Based Multidisciplinary Care for Patients With Stable Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Pulmonary Rehabilitation for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Long-term Oxygen Therapy for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Noninvasive Positive Pressure Ventilation for Acute Respiratory Failure Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Noninvasive Positive Pressure Ventilation for Chronic Respiratory Failure Patients With Stable Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Hospital-at-Home Programs for Patients With Acute Exacerbations of Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Home Telehealth for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Cost-Effectiveness of Interventions for Chronic Obstructive Pulmonary Disease Using an Ontario Policy Model
Experiences of Living and Dying With COPD: A Systematic Review and Synthesis of the Qualitative Empirical Literature
For more information on the qualitative review, please contact Mita Giacomini at: http://fhs.mcmaster.ca/ceb/faculty_member_giacomini.htm.
For more information on the economic analysis, please visit the PATH website: http://www.path-hta.ca/About-Us/Contact-Us.aspx.
The Toronto Health Economics and Technology Assessment (THETA) collaborative has produced an associated report on patient preference for mechanical ventilation. For more information, please visit the THETA website: http://theta.utoronto.ca/static/contact.
Objective
The objective of this evidence-based analysis was to determine the effectiveness and cost-effectiveness of multidisciplinary care (MDC) compared with usual care (UC, single health care provider) for the treatment of stable chronic obstructive pulmonary disease (COPD).
Clinical Need: Condition and Target Population
Chronic obstructive pulmonary disease is a progressive disorder with episodes of acute exacerbations associated with significant morbidity and mortality. Cigarette smoking is linked causally to COPD in more than 80% of cases. Chronic obstructive pulmonary disease is among the most common chronic diseases worldwide and has an enormous impact on individuals, families, and societies through reduced quality of life and increased health resource utilization and mortality.
The estimated prevalence of COPD in Ontario in 2007 was 708,743 persons.
Technology
Multidisciplinary care involves professionals from a range of disciplines, working together to deliver comprehensive care that addresses as many of the patient’s health care and psychosocial needs as possible.
Two variables are inherent in the concept of a multidisciplinary team: i) the multidisciplinary components such as an enriched knowledge base and a range of clinical skills and experiences, and ii) the team components, which include, but are not limited to, communication and support measures. However, the most effective number of team members and the disciplines that should make up the team for optimal effect are not yet known.
Research Question
What is the effectiveness and cost-effectiveness of MDC compared with UC (single health care provider) for the treatment of stable COPD?
Research Methods
Literature Search
Search Strategy
A literature search was performed on July 19, 2010 using OVID MEDLINE, OVID MEDLINE In-Process and Other Non-Indexed Citations, OVID EMBASE, EBSCO Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Wiley Cochrane Library, and the Centre for Reviews and Dissemination database, for studies published from January 1, 1995 until July 2010. Abstracts were reviewed by a single reviewer and, for those studies meeting the eligibility criteria, full-text articles were obtained. Reference lists were also examined for any additional relevant studies not identified through the search.
Inclusion Criteria
health technology assessments, systematic reviews, or randomized controlled trials
studies published between January 1995 and July 2010
COPD study population
studies comparing MDC (2 or more health care disciplines participating in care) with UC (single health care provider)
Exclusion Criteria
grey literature
duplicate publications
non-English language publications
study population less than 18 years of age
Outcomes of Interest
hospital admissions
emergency department (ED) visits
mortality
health-related quality of life
lung function
Quality of Evidence
The quality of each included study was assessed, taking into consideration allocation concealment, randomization, blinding, power/sample size, withdrawals/dropouts, and intention-to-treat analyses.
The quality of the body of evidence was assessed as high, moderate, low, or very low according to the GRADE Working Group criteria. The following definitions of quality were used in grading the quality of the evidence:
Summary of Findings
Six randomized controlled trials were obtained from the literature search. Four of the 6 studies were completed in the United States. The sample size of the 6 studies ranged from 40 to 743 participants, and the mean age of the study samples was between 66 and 71 years. Only 2 studies characterized the study sample in terms of the Global Initiative for Chronic Obstructive Lung Disease (GOLD) COPD stage criteria, and in general the description of the study population in the other 4 studies was limited. The mean percent predicted forced expiratory volume in 1 second (% predicted FEV1) among study populations was between 32% and 59%. Using this criterion, 3 studies included persons with severe COPD and 2 with moderate COPD. Information was not available to classify the population in the sixth study.
Four studies had MDC treatment groups that included a physician. All studies except 1 reported a respiratory specialist (i.e., respiratory therapist, specialist nurse, or physician) as part of the multidisciplinary team. The UC group consisted of a single health care practitioner who may or may not have been a respiratory specialist.
A meta-analysis was completed for 5 of the 7 outcome measures of interest including:
health-related quality of life,
lung function,
all-cause hospitalization,
COPD-specific hospitalization, and
mortality.
There was only 1 study contributing to the outcome of all-cause and COPD-specific ED visits which precluded pooling data for these outcomes. Subgroup analyses were not completed either because heterogeneity was not significant or there were a small number of studies that were meta-analysed for the outcome.
Quality of Life
Three studies reported results of quality of life assessment based on the St. George’s Respiratory Questionnaire (SGRQ). A mean decrease in the SGRQ indicates an improvement in quality of life while a mean increase indicates deterioration in quality of life. In all studies the mean change score from baseline to the end time point in the MDC treatment group showed either an improvement compared with the control group or less deterioration compared with the control group. The mean difference in change scores between MDC and UC groups was statistically significant in all 3 studies. The pooled weighted mean difference in total SGRQ score was −4.05 (95% confidence interval [CI], −6.47 to −1.63; P = 0.001). The GRADE quality of evidence was assessed as low for this outcome.
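As a consistency check on the pooled SGRQ result, the standard error and two-sided P value can be back-calculated from the point estimate and its confidence interval. This is a minimal sketch under a normal approximation, not the method used in the review. The helper name `p_from_ci` is hypothetical, and the upper limit is taken as −1.63, the value that makes the interval symmetric about −4.05.

```python
import math


def p_from_ci(estimate: float, lower: float, upper: float, z_crit: float = 1.96):
    """Back-calculate SE, z, and two-sided p from a symmetric 95% CI.

    Uses a normal approximation: SE = CI width / (2 * 1.96).
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return se, z, p


# Pooled weighted mean difference in total SGRQ score (MDC vs UC).
se, z, p = p_from_ci(-4.05, -6.47, -1.63)
print(f"SE = {se:.2f}, z = {z:.2f}, two-sided p = {p:.4f}")
```

The implied P value rounds to 0.001, which agrees with the value reported for this outcome.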
Lung Function
Two studies reported results of the FEV1 % predicted as a measure of lung function. A negative change from baseline implies deterioration in lung function and a positive change from baseline implies an improvement in lung function. The MDC group showed a statistically significant improvement in lung function up to 12 months compared with the UC group (P = 0.01). However, this effect was not maintained at 2-year follow-up (P = 0.24). The pooled weighted mean difference in FEV1 percent predicted was 2.78 (95% CI, −1.82 to 7.37). The GRADE quality of evidence was assessed as very low for this outcome, indicating that an estimate of effect is uncertain.
Hospital Admissions
All-Cause
Four studies reported results of all-cause hospital admissions in terms of number of persons with at least 1 admission during the follow-up period. Estimates from these 4 studies were pooled to determine a summary estimate. There is a statistically significant 25% relative risk (RR) reduction in all-cause hospitalizations in the MDC group compared with the UC group (P < 0.001). The index of heterogeneity (I2) value is 0%, indicating no statistical heterogeneity between studies. The GRADE quality of evidence was assessed as moderate for this outcome, indicating that further research may change the estimate of effect.
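The index of heterogeneity is commonly computed from Cochran's Q statistic as I² = max(0, (Q − df) / Q), where df is the number of studies minus 1. The sketch below shows that calculation. The function name `i_squared` and the Q values are hypothetical illustrations, since the summary does not report Q.

```python
def i_squared(q: float, n_studies: int) -> float:
    """Return I^2 (as a percentage) from Cochran's Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to 0 by convention.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical Q values for a 4-study meta-analysis (df = 3).
low_q = i_squared(2.1, 4)   # Q below df: no excess variability
high_q = i_squared(6.0, 4)  # Q twice df: half the variability is heterogeneity
print(low_q, high_q)
```

An I² of 0% therefore means only that Q did not exceed its degrees of freedom; with few studies this is weak evidence of homogeneity.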
COPD-Specific Hospitalization
Three studies reported results of COPD-specific hospital admissions in terms of number of persons with at least 1 admission during the follow-up period. Estimates from these 3 studies were pooled to determine a summary estimate. There is a statistically significant 33% RR reduction in COPD-specific hospitalizations in the MDC group compared with the UC group (P = 0.002). The I2 value is 0%, indicating no statistical heterogeneity between studies. The GRADE quality of evidence was assessed as moderate for this outcome, indicating that further research may change the estimate of effect.
Emergency Department Visits
All-Cause
Two studies reported results of all-cause ED visits in terms of number of persons with at least 1 visit during the follow-up period. There is a statistically nonsignificant reduction in all-cause ED visits when data from these 2 studies are pooled (RR, 0.64; 95% CI, 0.31 to 1.33; P = 0.24). The GRADE quality of evidence was assessed as very low for this outcome, indicating that an estimate of effect is uncertain.
COPD-Specific
One study reported results of COPD-specific ED visits in terms of number of persons with at least 1 visit during the follow-up period. There is a statistically significant 41% reduction in COPD-specific ED visits (RR, 0.59; 95% CI, 0.43 to 0.81; P < 0.001). The GRADE quality of evidence was assessed as moderate for this outcome.
Mortality
Three studies reported the mortality during the study follow-up period. Estimates from these 3 studies were pooled to determine a summary estimate. There is a statistically nonsignificant reduction in mortality between treatment groups (RR, 0.81; 95% CI, 0.52−1.27; P = 0.36). The I2 value is 19%, indicating low statistical heterogeneity between studies. All studies had a 12-month follow-up period. The GRADE quality of evidence was assessed as low for this outcome.
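The relative risks reported for ED visits and mortality can be read in two steps: the percentage reduction is (1 − RR) × 100, and a result is significant at the 5% level when the 95% CI excludes 1. The sketch below applies that reading to the reported values. The function name `summarize_rr` is hypothetical and the code is only an interpretive aid, not part of the review's analysis.

```python
def summarize_rr(rr: float, ci_low: float, ci_high: float) -> dict:
    """Summarize a relative risk and its 95% CI in plain terms."""
    return {
        # Relative risk reduction, as a whole-number percentage.
        "rr_reduction_pct": round((1 - rr) * 100),
        # A ratio CI that excludes 1 corresponds to P < 0.05.
        "significant": not (ci_low <= 1.0 <= ci_high),
    }


# Values as reported in the summary of findings.
copd_ed = summarize_rr(0.59, 0.43, 0.81)    # COPD-specific ED visits
mortality = summarize_rr(0.81, 0.52, 1.27)  # mortality

print("COPD-specific ED visits:", copd_ed)
print("Mortality:", mortality)
```

The mortality interval spans 1, which is why a point estimate of 0.81 is still reported as nonsignificant.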
Conclusions
Significant effect estimates with moderate quality of evidence were found for all-cause hospitalization, COPD-specific hospitalization, and COPD-specific ED visits (Table ES1). A significant estimate with low quality evidence was found for the outcome of quality of life (Table ES2). All other outcome measures were nonsignificant and supported by low or very low quality of evidence.
Table ES1: Summary of Dichotomous Data
Abbreviations: CI, confidence intervals; COPD, chronic obstructive pulmonary disease; n, number.
Table ES2: Summary of Continuous Data
Abbreviations: CI, confidence intervals; FEV1, forced expiratory volume in 1 second; n, number; SGRQ, St. George’s Respiratory Questionnaire.
PMCID: PMC3384374  PMID: 23074433
5.  The kSORT Assay to Detect Renal Transplant Patients at High Risk for Acute Rejection: Results of the Multicenter AART Study 
PLoS Medicine  2014;11(11):e1001759.
Minnie Sarwal and colleagues developed a gene expression assay using peripheral blood samples to detect patients with renal transplant at high risk for acute rejection.
Please see later in the article for the Editors' Summary
Background
Development of noninvasive molecular assays to improve disease diagnosis and patient monitoring is a critical need. In renal transplantation, acute rejection (AR) increases the risk for chronic graft injury and failure. Noninvasive diagnostic assays to improve current late and nonspecific diagnosis of rejection are needed. We sought to develop a test using a simple blood gene expression assay to detect patients at high risk for AR.
Methods and Findings
We developed a novel correlation-based algorithm by step-wise analysis of gene expression data in 558 blood samples from 436 renal transplant patients collected across eight transplant centers in the US, Mexico, and Spain between 5 February 2005 and 15 December 2012 in the Assessment of Acute Rejection in Renal Transplantation (AART) study. Gene expression was assessed by quantitative real-time PCR (QPCR) in one center. A 17-gene set—the Kidney Solid Organ Response Test (kSORT)—was selected in 143 samples for AR classification using discriminant analysis (area under the receiver operating characteristic curve [AUC] = 0.94; 95% CI 0.91–0.98), validated in 124 independent samples (AUC = 0.95; 95% CI 0.88–1.0) and evaluated for AR prediction in 191 serial samples, where it predicted AR up to 3 mo prior to detection by the current gold standard (biopsy). A novel reference-based algorithm (using 13 12-gene models) was developed in 100 independent samples to provide a numerical AR risk score, to classify patients as high risk versus low risk for AR. kSORT was able to detect AR in blood independent of age, time post-transplantation, and sample source without additional data normalization; AUC = 0.93 (95% CI 0.86–0.99). Further validation of kSORT is planned in prospective clinical observational and interventional trials.
Conclusions
The kSORT blood QPCR assay is a noninvasive tool to detect high risk of AR of renal transplants.
Editors' Summary
Background
Throughout life, the kidneys filter waste products (from the normal breakdown of tissues and food) and excess water from the blood to make urine. If the kidneys stop working for any reason, the rate at which the blood is filtered decreases, and dangerous amounts of creatinine and other waste products build up in the blood. The kidneys can fail suddenly (acute kidney failure) because of injury or poisoning, but usually failing kidneys stop working gradually over many years (chronic kidney disease). Chronic kidney disease is very common, especially in people who have high blood pressure or diabetes and in elderly people. In the UK, for example, about 20% of people aged 65–74 years have some degree of chronic kidney disease. People whose kidneys fail completely (end-stage kidney disease) need regular dialysis (hemodialysis, in which blood is filtered by an external machine, or peritoneal dialysis, which uses blood vessels in the abdominal lining to do the work of the kidneys) or a renal transplant (the surgical transfer of a healthy kidney from another person into the patient's body) to keep them alive.
Why Was This Study Done?
Our immune system protects us from pathogens (disease-causing organisms) by recognizing specific molecules (antigens) on the invader's surface as foreign and initiating a sequence of events that kills the invader. Unfortunately, the immune system sometimes recognizes kidney transplants as foreign and triggers transplant rejection. The chances of rejection can be minimized by “matching” the antigens on the donated kidney to those on the tissues of the kidney recipient and by giving the recipient immunosuppressive drugs. However, acute rejection (rejection during the first year after transplantation) affects about 20% of kidney transplants. Acute rejection needs to be detected quickly and treated with a short course of more powerful immunosuppressants because it increases the risk of transplant failure. The current “gold standard” method for detecting acute rejection if the level of creatinine in the patient's blood begins to rise is to surgically remove a small piece (biopsy) of the transplanted kidney for analysis. However, other conditions can change creatinine levels, acute rejection can occur without creatinine levels changing (subclinical acute rejection), and biopsies are invasive. Here, the researchers develop a noninvasive test for acute kidney rejection called the Kidney Solid Organ Response Test (kSORT) based on gene expression levels in the blood.
What Did the Researchers Do and Find?
For the Assessment of Acute Rejection in Renal Transplantation (AART) study, the researchers used an assay called quantitative polymerase chain reaction (QPCR) to measure the expression of 43 genes whose expression levels change during acute kidney rejection in blood samples collected from patients who had had a kidney transplant. Using a training set of 143 samples and statistical analyses, the researchers identified a 17-gene set (kSORT) that discriminated between patients with and without acute rejection detected by kidney biopsy. The 17-gene set correctly identified 39 of the samples taken from 47 patients with acute rejection as being from patients with acute rejection, and 87 of 96 samples from patients without acute rejection as being from patients without acute rejection. The researchers validated the gene set using 124 independent samples. Then, using 191 serial samples, they showed that the gene set was able to predict acute rejection up to three months before detection by biopsy. Finally, the researchers used 100 blood samples to develop an algorithm (a step-wise calculation) to classify patients as being at high or low risk of acute rejection.
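The training-set counts quoted above translate directly into sensitivity and specificity. The short sketch below does the arithmetic, treating the 47 and 96 as sample counts (they sum to the 143 training samples); the function name is hypothetical and not taken from the study.

```python
def sensitivity_specificity(tp, n_pos, tn, n_neg):
    """Return (sensitivity, specificity) from correct calls and group sizes."""
    return tp / n_pos, tn / n_neg

# Counts quoted in the summary above for the 143-sample training set
sens, spec = sensitivity_specificity(tp=39, n_pos=47, tn=87, n_neg=96)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

This gives a sensitivity of about 83% and a specificity of about 91% in the training set.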
What Do These Findings Mean?
These findings describe the early development of a noninvasive tool (kSORT) that might, eventually, help clinicians identify patients at risk of acute rejection after kidney transplantation. kSORT needs to be tested in more patients before being used clinically, however, to validate its predictive ability, particularly given that the current gold standard test against which it was compared (biopsy) is far from perfect. An additional limitation of kSORT is that it did not discriminate between cell-mediated and antibody-mediated immune rejection. These two types of immune rejection are treated in different ways, so clinicians ideally need a test for acute rejection that indicates which form of immune rejection is involved. The authors are conducting a follow-up study to help determine whether kSORT can be used in clinical practice to identify acute rejection and to identify which patients are at greatest risk of transplant rejection and may require biopsy.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001759.
The US National Kidney and Urologic Diseases Information Clearinghouse provides links to information about all aspects of kidney disease; the US National Kidney Disease Education Program provides resources to help improve the understanding, detection, and management of kidney disease (in English and Spanish)
The UK National Health Service Choices website provides information for patients on chronic kidney disease and about kidney transplants, including some personal stories
The US National Kidney Foundation, a not-for-profit organization, provides information about chronic kidney disease and about kidney transplantation (in English and Spanish)
The not-for-profit UK National Kidney Federation provides support and information for patients with kidney disease and for their carers, including information and personal stories about kidney donation and transplantation
World Kidney Day, a joint initiative between the International Society of Nephrology and the International Federation of Kidney Foundations, aims to raise awareness about kidneys and kidney disease
MedlinePlus provides links to additional resources about kidney diseases, kidney failure, and kidney transplantation; the MedlinePlus encyclopedia has a page about transplant rejection
doi:10.1371/journal.pmed.1001759
PMCID: PMC4227654  PMID: 25386950
6.  A scalable, knowledge-based analysis framework for genetic association studies 
BMC Bioinformatics  2013;14:312.
Background
Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, including using existing biological evidence as modeling priors and acknowledging that many models may fit the data well. With many candidate variables, Bayesian approaches to variable selection rely on algorithms to approximate the posterior distribution of models, such as Markov-Chain Monte Carlo (MCMC). Unfortunately, MCMC is difficult to parallelize and requires many iterations to adequately sample the posterior. We introduce a scalable algorithm called PEAK that improves the efficiency of MCMC by dividing a large set of variables into related groups using a rooted graph that resembles a mountain peak. Our algorithm takes advantage of parallel computing and existing biological databases when available.
Results
By using graphs to manage a model space with more than 500,000 candidate variables, we were able to improve MCMC efficiency and uncover the true simulated causal variables, including a gene-gene interaction. We applied PEAK to a case-control study of childhood asthma with 2,521 genetic variants. We used an informative graph for oxidative stress derived from Gene Ontology and identified several variants in ERBB4, OXR1, and BCL2 with strong evidence for associations with childhood asthma.
Conclusions
We introduced an extremely flexible analysis framework capable of efficiently performing Bayesian variable selection on many candidate variables. The PEAK algorithm can be provided with an informative graph, which can be advantageous when considering gene-gene interactions, or a symmetric graph, which simply divides the model space into manageable regions. The PEAK framework is compatible with various model forms, allowing for the algorithm to be configured for different study designs and applications, such as pathway or rare-variant analyses, by simple modifications to the model likelihood and proposal functions.
doi:10.1186/1471-2105-14-312
PMCID: PMC4015032  PMID: 24152222
7.  Power and Predictive Accuracy of Polygenic Risk Scores 
PLoS Genetics  2013;9(3):e1003348.
Polygenic scores have recently been used to summarise genetic effects among an ensemble of markers that do not individually achieve significance in a large-scale association study. Markers are selected using an initial training sample and used to construct a score in an independent replication sample by forming the weighted sum of associated alleles within each subject. Association between a trait and this composite score implies that a genetic signal is present among the selected markers, and the score can then be used for prediction of individual trait values. This approach has been used to obtain evidence of a genetic effect when no single markers are significant, to establish a common genetic basis for related disorders, and to construct risk prediction models. In some cases, however, the desired association or prediction has not been achieved. Here, the power and predictive accuracy of a polygenic score are derived from a quantitative genetics model as a function of the sizes of the two samples, explained genetic variance, selection thresholds for including a marker in the score, and methods for weighting effect sizes in the score. Expressions are derived for quantitative and discrete traits, the latter allowing for case/control sampling. A novel approach to estimating the variance explained by a marker panel is also proposed. It is shown that published studies with significant association of polygenic scores have been well powered, whereas those with negative results can be explained by low sample size. It is also shown that useful levels of prediction may only be approached when predictors are estimated from very large samples, up to an order of magnitude greater than currently available. Therefore, polygenic scores currently have more utility for association testing than predicting complex traits, but prediction will become more feasible as sample sizes continue to grow.
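The score construction described above (select markers in a training sample, then form a weighted sum of allele counts in a replication sample) can be sketched as follows. All data values and the function name `polygenic_scores` are hypothetical, chosen only for illustration; the paper's own weighting options are not reproduced here.

```python
def polygenic_scores(genotypes, effects, p_values, threshold):
    """Weighted sum of allele counts over markers passing a P-value threshold.

    genotypes: one list of allele counts (0, 1 or 2) per subject
    effects:   per-marker effect estimates from the training sample
    p_values:  per-marker training P values used for selection
    """
    # Keep only markers whose training P value meets the selection threshold
    selected = [j for j, p in enumerate(p_values) if p <= threshold]
    return [sum(effects[j] * g[j] for j in selected) for g in genotypes]

# Hypothetical training results for four markers
effects = [0.10, -0.05, 0.20, 0.02]
p_values = [0.01, 0.30, 0.04, 0.80]

# Hypothetical replication-sample genotypes (allele counts per marker)
genotypes = [
    [2, 1, 0, 1],
    [0, 2, 1, 2],
    [1, 0, 2, 0],
]

scores = polygenic_scores(genotypes, effects, p_values, threshold=0.05)
print(scores)
```

With a threshold of 0.05 only the first and third markers enter the score. Relaxing the threshold admits more markers, which is the trade-off between signal and noise that the power expressions in the abstract address.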
Author Summary
Recently there has been much interest in combining multiple genetic markers into a single score for predicting disease risk. Even if many of the individual markers have no detected effect, the combined score could be a strong predictor of disease. This has allowed researchers to demonstrate that some diseases have a strong genetic basis, even if few actual genes have been identified, and it has also revealed a common genetic basis for distinct diseases. These analyses have so far been performed opportunistically, with mixed results. Here I derive formulae based on the heritability of disease and size of the study, allowing researchers to plan their analyses from a more informed position. I show that discouraging results in some previous studies were due to the low number of subjects studied, but a modest increase in study size would allow more successful analysis. However, I also show that, for genetics to become useful for predicting individual risk of disease, hundreds of thousands of subjects may be needed to estimate the gene effects. This is larger than most existing studies, but will become more common in the near future, so that gene scores will become more useful for predicting disease than has appeared to date.
doi:10.1371/journal.pgen.1003348
PMCID: PMC3605113  PMID: 23555274
8.  Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip 
PLoS Genetics  2009;5(5):e1000477.
Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical “complete” chip containing all the SNPs in HapMap. 
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
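The idea of assessing power by simulation can be illustrated with a deliberately simplified sketch: a single SNP, no linkage disequilibrium, and a basic allelic chi-square test. This is not the authors' method or their R package; the function names, the allele-level odds-ratio model, and the nominal 0.05 threshold are all assumptions made for illustration (a genome-wide study would use a far stricter threshold).

```python
import math
import random

def allelic_test_p(case_alt, control_alt, n_alleles):
    """P value of a 1-df chi-square test on a 2x2 table of allele counts."""
    total_alt = case_alt + control_alt
    total_ref = 2 * n_alleles - total_alt
    if total_alt == 0 or total_ref == 0:
        return 1.0  # monomorphic sample: no evidence of association
    # Pearson chi-square for a 2x2 table with equal row totals
    chi2 = 2 * n_alleles * (case_alt - control_alt) ** 2 / (total_alt * total_ref)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

def simulate_power(n_per_group, control_freq, odds_ratio, alpha, n_reps, seed=0):
    """Fraction of simulated case-control samples with P < alpha."""
    rng = random.Random(seed)
    # Risk-allele frequency in cases implied by the allelic odds ratio
    odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds / (1 + odds)
    n_alleles = 2 * n_per_group  # two alleles per individual
    hits = 0
    for _ in range(n_reps):
        case_alt = sum(rng.random() < case_freq for _ in range(n_alleles))
        control_alt = sum(rng.random() < control_freq for _ in range(n_alleles))
        if allelic_test_p(case_alt, control_alt, n_alleles) < alpha:
            hits += 1
    return hits / n_reps

for n in (50, 200, 1000):
    power = simulate_power(n, control_freq=0.3, odds_ratio=1.5, alpha=0.05, n_reps=200)
    print(n, power)
```

Estimated power rises with the number of cases and controls. A realistic comparison of genotyping chips would additionally need to simulate linked SNPs with population LD patterns, as the abstract describes.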
Author Summary
Genome-wide association studies are a powerful and now widely-used method for finding genetic variants that increase the risk of developing particular diseases. These studies are complex and must be planned carefully in order to maximize the probability of finding novel associations. The main design choices to be made relate to sample sizes and choice of commercially available genotyping chip and are often constrained by cost, which can currently be as much as several million dollars. No comprehensive comparisons of chips based on their power for different sample sizes or for fixed study cost are currently available. We describe in detail a method for simulating large genome-wide association samples that accounts for the complex correlations between SNPs due to LD, and we used this method to assess the power of current genotyping chips. Our results highlight the differences between the chips under a range of plausible scenarios, and we demonstrate how our results can be used to design a study with a budget constraint. We also show how genotype imputation can be used to boost the power of each chip and that this method decreases the differences between the chips. Our simulation method and software for comparing power are being made available so that future association studies can be designed in a principled fashion.
doi:10.1371/journal.pgen.1000477
PMCID: PMC2688469  PMID: 19492015
9.  Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies 
PLoS Genetics  2012;8(11):e1003032.
Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low–BMI cases are larger than those estimated from high–BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene x covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1×10⁻⁹). The improvement varied across diseases with a 16% median increase in χ² test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.
Author Summary
This work describes a new methodology for analyzing genome-wide case-control association studies of diseases with strong correlations to clinical covariates, such as age in prostate cancer and body mass index in type 2 diabetes. Currently, researchers either ignore these clinical covariates or apply approaches that ignore the disease's prevalence and the study's ascertainment strategy. We take an alternative approach, leveraging external prevalence information from the epidemiological literature and constructing a statistic based on the classic liability threshold model of disease. Our approach not only improves the power of studies that ascertain individuals randomly or based on the disease phenotype, but also improves the power of studies that ascertain individuals based on both the disease phenotype and clinical covariates. We apply our statistic to seven datasets over six different diseases and a variety of clinical covariates. We found that there was a substantial improvement in test statistics relative to current approaches at known associated variants. This suggests that novel loci may be identified by applying our method to existing and future association studies of these diseases.
doi:10.1371/journal.pgen.1003032
PMCID: PMC3493452  PMID: 23144628
10.  Screening and Replication using the Same Data Set: Testing Strategies for Family-Based Studies in which All Probands Are Affected 
PLoS Genetics  2008;4(9):e1000197.
For genome-wide association studies in family-based designs, we propose a powerful two-stage testing strategy that can be applied in situations in which parent-offspring trio data are available and all offspring are affected with the trait or disease under study. In the first step of the testing strategy, we construct estimators of genetic effect size in the completely ascertained sample of affected offspring and their parents that are statistically independent of the family-based association/transmission disequilibrium tests (FBATs/TDTs) that are calculated in the second step of the testing strategy. For each marker, the genetic effect is estimated (without requiring an estimate of the SNP allele frequency) and the conditional power of the corresponding FBAT/TDT is computed. Based on the power estimates, a weighted Bonferroni procedure assigns an individually adjusted significance level to each SNP. In the second stage, the SNPs are tested with the FBAT/TDT statistic at the individually adjusted significance levels. Using simulation studies for scenarios with up to 1,000,000 SNPs, varying allele frequencies and genetic effect sizes, the power of the strategy is compared with standard methodology (e.g., FBATs/TDTs with Bonferroni correction). In all considered situations, the proposed testing strategy demonstrates substantial power increases over the standard approach, even when the true genetic model is unknown and must be selected based on the conditional power estimates. The practical relevance of our methodology is illustrated by an application to a genome-wide association study for childhood asthma, in which we detect two markers meeting genome-wide significance that would not have been detected using standard methodology.
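A generic weighted Bonferroni procedure, of the kind the second stage relies on, can be sketched as below. The weights here are simply proportional to hypothetical conditional power estimates; the paper's actual weighting scheme is not specified in the abstract, so this is an illustration under that assumption rather than the authors' procedure. The approach is valid only when the weights are independent of the test statistics, which is what the first-step estimators are constructed to ensure.

```python
def weighted_bonferroni_levels(weights, alpha=0.05):
    """Per-test significance levels alpha * w_i / sum(w); they sum to alpha."""
    total = sum(weights)
    return [alpha * w / total for w in weights]

def weighted_bonferroni_reject(p_values, weights, alpha=0.05):
    """Reject each hypothesis whose P value is at or below its own level."""
    levels = weighted_bonferroni_levels(weights, alpha)
    return [p <= a for p, a in zip(p_values, levels)]

# Hypothetical conditional power estimates from the screening step
power_estimates = [0.80, 0.10, 0.05, 0.05]
# Hypothetical FBAT/TDT P values from the testing step
p_values = [0.03, 0.004, 0.20, 0.60]

levels = weighted_bonferroni_levels(power_estimates)
print(levels)
print(weighted_bonferroni_reject(p_values, power_estimates))
```

In this toy example the first marker is tested at a level of about 0.04 and is rejected, whereas an unweighted Bonferroni correction (0.05 / 4 = 0.0125 per test) would not reject it. The overall family-wise level is unchanged because the individual levels still sum to alpha.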
Author Summary
The current state of genotyping technology has enabled researchers to conduct genome-wide association studies of up to 1,000,000 SNPs, allowing for systematic scanning of the genome for variants that might influence the development and progression of complex diseases. One of the largest obstacles to the successful detection of such variants is the multiple comparisons/testing problem in the genetic association analysis. For family-based designs in which all offspring are affected with the disease/trait under study, we developed a methodology that addresses this problem by partitioning the family-based data into two statistically independent components. The first component is used to screen the data and determine the most promising SNPs. The second component is used to test the SNPs for association, where information from the screening is used to weight the SNPs during testing. This methodology is more powerful than standard procedures for multiple comparisons adjustment (i.e., Bonferroni correction). Additionally, as only one data set is required for screening and testing, our testing strategy is less susceptible to study heterogeneity. Finally, as many family-based studies collect data only from affected offspring, this method addresses a major limitation of previous methodologies for multiple comparisons in family-based designs, which require variation in the disease/trait among offspring.
doi:10.1371/journal.pgen.1000197
PMCID: PMC2529406  PMID: 18802462
11.  A Population-Based Evaluation of a Publicly Funded, School-Based HPV Vaccine Program in British Columbia, Canada: Parental Factors Associated with HPV Vaccine Receipt 
PLoS Medicine  2010;7(5):e1000270.
Analysis of a telephone survey by Gina Ogilvie and colleagues identifies the parental factors associated with HPV vaccine uptake in a school-based program in Canada.
Background
Information on factors that influence parental decisions for actual human papillomavirus (HPV) vaccine receipt in publicly funded, school-based HPV vaccine programs for girls is limited. We report on the level of uptake of the first dose of the HPV vaccine, and determine parental factors associated with receipt of the HPV vaccine, in a publicly funded school-based HPV vaccine program in British Columbia, Canada.
Methods and Findings
All parents of girls enrolled in grade 6 during the academic year of September 2008–June 2009 in the province of British Columbia were eligible to participate. Eligible households identified through the provincial public health information system were randomly selected and those who consented completed a validated survey exploring factors associated with HPV vaccine uptake. Bivariate and multivariate analyses were conducted to calculate adjusted odds ratios to identify the factors that were associated with parents' decision to vaccinate their daughter(s) against HPV. 2,025 parents agreed to complete the survey, and 65.1% (95% confidence interval [CI] 63.1–67.1) of parents in the survey reported that their daughters received the first dose of the HPV vaccine. In the same school-based vaccine program, 88.4% (95% CI 87.1–89.7) consented to the hepatitis B vaccine, and 86.5% (95% CI 85.1–87.9) consented to the meningococcal C vaccine. The main reasons for having a daughter receive the HPV vaccine were the effectiveness of the vaccine (47.9%), advice from a physician (8.7%), and concerns about daughter's health (8.4%). The main reasons for not having a daughter receive the HPV vaccine were concerns about HPV vaccine safety (29.2%), preference to wait until the daughter is older (15.6%), and not enough information to make an informed decision (12.6%). In multivariate analysis, overall attitudes to vaccines, the impact of the HPV vaccine on sexual practices, and childhood vaccine history were predictive of parents having a daughter receive the HPV vaccine in a publicly funded school-based HPV vaccine program. By contrast, having a family with two parents, having three or more children, and having more education were associated with a decreased likelihood of having a daughter receive the HPV vaccine.
Conclusions
This study is, to our knowledge, one of the first population-based assessments of factors associated with HPV vaccine uptake in a publicly funded school-based program worldwide. Policy makers need to consider that even with the removal of financial and health care barriers, parents, who are key decision makers in the uptake of this vaccine, are still hesitant to have their daughters receive the HPV vaccine, and strategies to ensure optimal HPV vaccine uptake need to be employed.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
About 10% of cancers in women occur in the cervix, the structure that connects the womb to the vagina. Every year, globally, more than a quarter of a million women die because of cervical cancer, which only occurs after the cervix has been infected with a human papillomavirus (HPV) through sexual intercourse. There are many types of HPV, a virus that infects the skin and the mucosa (the moist membranes that line various parts of the body, including the cervix). Although most people become infected with HPV at some time in their life, most never know they are infected. However, some HPV types cause harmless warts on the skin or around the genital area and several—in particular, HPV 16 and HPV 18, so-called high-risk HPVs—can cause cervical cancer. HPV infections are usually cleared by the immune system, but about 10% of women infected with a high-risk HPV develop a long-term infection that puts them at risk of developing cervical cancer.
Why Was This Study Done?
Screening programs have greatly reduced cervical cancer deaths in developed countries in recent decades by detecting the cancer early when it can be treated; but it would be better to prevent cervical cancer ever developing. Because HPV is necessary for the development of cervical cancer, vaccination of girls against HPV infection before the onset of sexual activity might be one way to do this. Scientists recently developed a vaccine that prevents infection with HPV 16 and HPV 18 (and with two HPVs that cause genital warts) and that should, therefore, reduce the incidence of cervical cancer. Publicly funded HPV vaccination programs are now planned or underway in several countries; but before girls can receive the HPV vaccine, parental consent is usually needed, so it is important to know what influences parental decisions about HPV vaccination. In this study, the researchers undertake a telephone survey to determine the uptake of the HPV vaccine by 11-year-old girls (grade 6) in British Columbia, Canada, and to determine the parental factors associated with vaccine uptake; British Columbia started a voluntary school-based HPV vaccine program in September 2008.
What Did the Researchers Do and Find?
In early 2009, the researchers contacted randomly selected parents of girls enrolled in grade 6 during the 2008–2009 academic year and asked them to complete a telephone survey that explored factors associated with HPV vaccine uptake. 65.1% of the 2,025 parents who completed the survey had consented to their daughter receiving the first dose of HPV vaccine. By contrast, more than 85% of the parents had consented to hepatitis B and meningitis C vaccination of their daughters. Nearly half of the parents surveyed said their main reason for consenting to HPV vaccination was the effectiveness of the vaccine. Conversely, nearly a third of the parents said concern about the vaccine's safety was their main reason for not consenting to vaccination and one in eight said they had been given insufficient information to make an informed decision. In a statistical analysis of the survey data, the researchers found that a positive parental attitude towards vaccination, a parental belief that HPV vaccination had limited impact on sexual practices, and completed childhood vaccination increased the likelihood of a daughter receiving the HPV vaccine. Having a family with two parents or three or more children and having well-educated parents decreased the likelihood of a daughter receiving the vaccine.
What Do These Findings Mean?
These findings provide one of the first population-based assessments of the factors that affect HPV vaccine uptake in a setting where there are no financial or health care barriers to vaccination. By identifying the factors associated with parental reluctance to agree to HPV vaccination for their daughters, these findings should help public-health officials design strategies to ensure optimal HPV vaccine uptake, although further studies are needed to discover why, for example, parents with more education are less likely to agree to vaccination than parents with less education. Importantly, the findings of this study, which are likely to be generalizable to other high-income countries, indicate that there is a continued need to ensure that the public receives credible, clear information about both the benefits and long-term safety of HPV vaccination.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000270.
The US National Cancer Institute provides information about cervical cancer for patients and for health professionals, including information on HPV vaccines (in English and Spanish)
The US Centers for Disease Control and Prevention also has information about cervical cancer and about HPV
The UK National Health Service Choices website has pages on cervical cancer and on HPV vaccination
More information about cervical cancer and HPV vaccination is available from the Macmillan cancer charity
ImmunizeBC provides general information about vaccination and information about HPV vaccination in British Columbia
MedlinePlus provides links to additional resources about cervical cancer (in English and Spanish)
doi:10.1371/journal.pmed.1000270
PMCID: PMC2864299  PMID: 20454567
12.  Sequencing and analysis of the gene-rich space of cowpea 
BMC Genomics  2008;9:103.
Background
Cowpea, Vigna unguiculata (L.) Walp., is one of the most important food and forage legumes in the semi-arid tropics because of its drought tolerance and ability to grow on poor quality soils. Approximately 80% of cowpea production takes place in the dry savannahs of tropical West and Central Africa, mostly by poor subsistence farmers. Despite its economic and social importance in the developing world, cowpea remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as disease and pest resistance and response to abiotic stresses. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. With a nuclear genome size estimated at ~620 Mb, the cowpea genome is an ideal target for reduced representation sequencing.
Results
We report here the sequencing and analysis of the gene-rich, hypomethylated portion of the cowpea genome selectively cloned by methylation filtration (MF) technology. Over 250,000 gene-space sequence reads (GSRs) with an average length of 610 bp were generated, yielding ~160 Mb of sequence information. The GSRs were assembled, annotated by BLAST homology searches of four public protein annotation databases and four plant proteomes (A. thaliana, M. truncatula, O. sativa, and P. trichocarpa), and analyzed using various domain and gene modeling tools. A total of 41,260 GSR assemblies and singletons were annotated, of which 19,786 have unique GenBank accession numbers. Within the GSR dataset, 29% of the sequences were annotated using the Arabidopsis Gene Ontology (GO) with the largest categories of assigned function being catalytic activity and metabolic processes, groups that include the majority of cellular enzymes and components of amino acid, carbohydrate and lipid metabolism. A total of 5,888 GSRs had homology to genes encoding transcription factors (TFs) and transcription associated factors (TAFs) representing about 5% of the total annotated sequences in the dataset. Sixty-two (62) of the 64 well-characterized plant transcription factor (TF) gene families are represented in the cowpea GSRs, and these families are of similar size and phylogenetic organization to those characterized in other plants. The cowpea GSRs also provide a rich source of genes involved in photoperiodic control, symbiosis, and defense-related responses. Comparisons to available databases revealed that about 74% of cowpea ESTs and 70% of all legume ESTs were represented in the GSR dataset. As approximately 12% of all GSRs contain an identifiable simple-sequence repeat, the dataset is a powerful resource for the design of microsatellite markers.
Conclusion
The public availability of extensive genomic data for cowpea, a non-model legume with significant importance in the developing world, represents a significant step forward in legume research. Not only does the gene space sequence enable the detailed analysis of gene structure, gene family organization and phylogenetic relationships within cowpea, but it also facilitates the characterization of syntenic relationships with other cultivated and model legumes, and will contribute to determining patterns of chromosomal evolution in the Leguminosae. The micro and macrosyntenic relationships detected between cowpea and other cultivated and model legumes should simplify the identification of informative markers for marker-assisted trait selection and map-based gene isolation necessary for cowpea improvement.
doi:10.1186/1471-2164-9-103
PMCID: PMC2279124  PMID: 18304330
13.  Identifying the genetic determinants of transcription factor activity 
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood. The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity. Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs') whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF. Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse.
In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of ‘phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008).
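The two-step idea described above (regress expression on promoter affinity to get one TF activity per segregant, then map that activity as a quantitative trait) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `infer_tf_activity` and `map_aqtl`, the single-TF simple regression, the toy simulated data, and the use of a plain correlation scan in place of whatever linkage statistic the authors actually used.

```python
import numpy as np

def infer_tf_activity(expr, affinity):
    """Least-squares slope of expression on promoter affinity, per segregant.

    expr:     (n_genes, n_segregants) differential mRNA expression
    affinity: (n_genes,) predicted promoter-binding affinity for one TF
    Returns one inferred activity value per segregant.
    """
    x = affinity - affinity.mean()
    y = expr - expr.mean(axis=0)
    return x @ y / (x @ x)

def map_aqtl(activity, genotypes):
    """Absolute Pearson correlation of TF activity with each marker.

    genotypes: (n_segregants, n_markers) matrix of 0/1 parental alleles.
    This is a stand-in for a proper linkage test (hypothetical choice).
    """
    a = (activity - activity.mean()) / activity.std()
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return np.abs(a @ g) / len(a)

# Toy data: marker 7 is planted as the locus that modulates the TF.
rng = np.random.default_rng(0)
n_genes, n_seg, n_markers = 500, 100, 20
affinity = rng.gamma(2.0, 1.0, n_genes)
genotypes = rng.integers(0, 2, (n_seg, n_markers))
true_activity = 0.5 * genotypes[:, 7]
expr = np.outer(affinity, true_activity) + rng.normal(0.0, 1.0, (n_genes, n_seg))

activity = infer_tf_activity(expr, affinity)
scores = map_aqtl(activity, genotypes)
best_marker = int(np.argmax(scores))
print("highest-scoring marker:", best_marker)
```

Because each activity estimate pools information over all genes, its noise is much smaller than that of any single gene's expression level, which is the first of the two power gains the text describes. The second gain comes from scanning one trait per TF rather than one trait per gene.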
To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level.
We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs.
Application of our approach to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes.
In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs') whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome 7 modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
doi:10.1038/msb.2010.64
PMCID: PMC2964119  PMID: 20865005
gene expression; gene regulatory networks; genetic variation; quantitative trait loci; transcription factors
14.  Genetic Predictors of Response to Serotonergic and Noradrenergic Antidepressants in Major Depressive Disorder: A Genome-Wide Analysis of Individual-Level Data and a Meta-Analysis 
PLoS Medicine  2012;9(10):e1001326.
Testing whether genetic information could inform the selection of the best drug for patients with depression, Rudolf Uher and colleagues searched for genetic variants that could predict clinically meaningful responses to two major groups of antidepressants.
Background
It has been suggested that outcomes of antidepressant treatment for major depressive disorder could be significantly improved if treatment choice is informed by genetic data. This study aims to test the hypothesis that common genetic variants can predict response to antidepressants in a clinically meaningful way.
Methods and Findings
The NEWMEDS consortium, an academia–industry partnership, assembled a database of over 2,000 European-ancestry individuals with major depressive disorder, prospectively measured treatment outcomes with serotonin reuptake inhibiting or noradrenaline reuptake inhibiting antidepressants and available genetic samples from five studies (three randomized controlled trials, one part-randomized controlled trial, and one treatment cohort study). After quality control, a dataset of 1,790 individuals with high-quality genome-wide genotyping provided adequate power to test the hypotheses that antidepressant response or a clinically significant differential response to the two classes of antidepressants could be predicted from a single common genetic polymorphism. None of the more than half a million genetic markers significantly predicted response to antidepressants overall, serotonin reuptake inhibitors, or noradrenaline reuptake inhibitors, or differential response to the two types of antidepressants (genome-wide significance $p < 5\times10^{-8}$). No biological pathways were significantly overrepresented in the results. No significant associations (genome-wide significance $p < 5\times10^{-8}$) were detected in a meta-analysis of NEWMEDS and another large sample (STAR*D), with 2,897 individuals in total. Polygenic scoring found no convergence among multiple associations in NEWMEDS and STAR*D.
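As a brief aside on the genome-wide significance threshold quoted above: the conventional value of 5×10^-8 is usually motivated as a Bonferroni correction of a 0.05 family-wise error rate over roughly one million independent common-variant tests. The sketch below only reproduces that arithmetic; the function names are hypothetical, and the one-million figure is the usual convention rather than a number taken from this study.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def is_genome_wide_significant(p_value, threshold=5e-8):
    """True if a single-marker p-value passes the conventional GWAS cutoff."""
    return p_value < threshold

# Conventional assumption: about one million independent tests.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(math.isclose(threshold, 5e-8))
print(is_genome_wide_significant(3e-9), is_genome_wide_significant(1e-6))
```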
Conclusions
No single common genetic variant was associated with antidepressant response at a clinically relevant level in a European-ancestry cohort. Effects specific to particular antidepressant drugs could not be investigated in the current study.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Genetic and environmental factors can influence a person's response to medications. Taking advantage of the recent advancements in genetics, scientists are working to match specific gene variations with responses to particular medications. Knowing whether a patient is likely to respond to a drug or have serious side effects would allow doctors to select the best treatment up front. Right now, there are only a handful of examples where a patient's version of a particular gene predicts their response to a particular drug. Some scientists believe that there will be many more such matches between genetic variants and treatment responses. Others think that because the action of most drugs is influenced by many different genes, a variant in one of those genes is unlikely to have a measurable effect in most cases.
Why Was This Study Done?
One of the areas where patients' responses to available drugs vary widely is severe depression (or major depressive disorder). Prescription of an antidepressant is often the first step in treating the disease. However, less than half of patients get well taking the first antidepressant prescribed. Those who don't respond to the first drug need to, together with their doctors, try multiple courses of treatment to find the right drug and the right dose for them. For some patients none of the existing drugs work well.
To see whether genetic information could help improve the choice of antidepressant, researchers from universities and the pharmaceutical industry joined forces in this large study. They examined two ways to use genetic information to improve the treatment of depression. First, they searched all genes for common genetic variants that could predict which patients would not respond to the two major groups of antidepressants (serotonin reuptake inhibitors, or SRIs, and noradrenaline reuptake inhibitors, or NRIs). They hoped that this would help with the development of new drugs that could help these patients. Second, they looked for common genetic variants in all genes that could identify patients who responded to one of the two major groups of antidepressants. Such predictors would make it possible to know which drug to prescribe for which patient.
What Did the Researchers Do and Find?
The researchers selected 1,790 patients with severe depression who had participated in one of several research studies; 1,222 of the patients had been treated with an SRI, the remaining 568 with an NRI, and it was recorded how well the drugs worked for each patient. The researchers also had a detailed picture of the genetic make-up of each patient, with information for over half a million genetic variants. They then looked for an association between genetic variants and responses to drugs.
They found not a single genetic variant that could predict clearly whether a person would respond to antidepressants in general, to one of the two main groups (SRIs and NRIs), or much better to one than the other. They also didn't find any combination of variants in groups of genes that work together that could predict responses. Combining their data with those from another large study did not yield any robust predictors either.
What Do These Findings Mean?
This study was large enough that it should have been possible to find common genetic variants that by themselves could predict a clinically meaningful response to SRIs and/or NRIs, had such variants existed. The fact that the study failed to find such variants suggests that such variants do not exist. It is still possible, however, that variants that are less common could predict response, or that combinations of variants could. To find those, if they do exist, even larger studies will need to be done.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001326
The National Institute of General Medical Sciences at the US National Institutes of Health has a fact sheet on personalized medicine
PubMed Health at the US National Library of Medicine has a page on major depressive disorder
Wikipedia has pages on major depressive disorder and pharmacogenetics, the study of how genetic variation affects response to certain drugs (note that Wikipedia is a free online encyclopedia that anyone can edit)
The UK National Health Service has comprehensive information pages on depression
doi:10.1371/journal.pmed.1001326
PMCID: PMC3472989  PMID: 23091423
15.  The Role of Abcb5 Alleles in Susceptibility to Haloperidol-Induced Toxicity in Mice and Humans 
PLoS Medicine  2015;12(2):e1001782.
Background
We know very little about the genetic factors affecting susceptibility to drug-induced central nervous system (CNS) toxicities, and this has limited our ability to optimally utilize existing drugs or to develop new drugs for CNS disorders. For example, haloperidol is a potent dopamine antagonist that is used to treat psychotic disorders, but 50% of treated patients develop characteristic extrapyramidal symptoms caused by haloperidol-induced toxicity (HIT), which limits its clinical utility. We do not have any information about the genetic factors affecting this drug-induced toxicity. HIT in humans is directly mirrored in a murine genetic model, where inbred mouse strains are differentially susceptible to HIT. Therefore, we genetically analyzed this murine model and performed a translational human genetic association study.
Methods and Findings
A whole genome SNP database and computational genetic mapping were used to analyze the murine genetic model of HIT. Guided by the mouse genetic analysis, we demonstrate that genetic variation within an ABC-drug efflux transporter (Abcb5) affected susceptibility to HIT. In situ hybridization results reveal that Abcb5 is expressed in brain capillaries, and by cerebellar Purkinje cells. We also analyzed chromosome substitution strains, imaged haloperidol abundance in brain tissue sections and directly measured haloperidol (and its metabolite) levels in brain, and characterized Abcb5 knockout mice. Our results demonstrate that Abcb5 is part of the blood-brain barrier; it affects susceptibility to HIT by altering the brain concentration of haloperidol. Moreover, a genetic association study in a haloperidol-treated human cohort indicates that human ABCB5 alleles had a time-dependent effect on susceptibility to individual and combined measures of HIT. Abcb5 alleles are pharmacogenetic factors that affect susceptibility to HIT, but it is likely that additional pharmacogenetic susceptibility factors will be discovered.
Conclusions
ABCB5 alleles alter susceptibility to HIT in mouse and humans. This discovery leads to a new model that (at least in part) explains inter-individual differences in susceptibility to a drug-induced CNS toxicity.
Gary Peltz and colleagues examine the role of ABCB5 alleles in haloperidol-induced toxicity in a murine genetic model and humans treated with haloperidol.
Editors' Summary
Background
The brain is the control center of the human body. This complex organ controls thoughts, memory, speech, and movement, it is the seat of intelligence, and it regulates the function of many organs. The brain comprises many different parts, all of which work together but all of which have their own special functions. For example, the forebrain is involved in intellectual activities such as thinking whereas the hindbrain controls the body’s vital functions and movements. Messages are passed between the various regions of the brain and to other parts of the body by specialized cells called neurons, which release and receive signal molecules known as neurotransmitters. As with all the organs in the body, blood vessels supply the brain with the oxygen, water, and nutrients it needs to function. Importantly, however, the brain is protected from infectious agents and other potentially dangerous substances circulating in the blood by the “blood-brain barrier,” a highly selective permeability barrier that is formed by the cells lining the fine blood vessels (capillaries) within the brain.
Why Was This Study Done?
Although drugs have been developed to treat various brain disorders, more active and less toxic drugs are needed to improve the treatment of many if not most of these conditions. Unfortunately, relatively little is known about how the blood-brain barrier regulates the entry of drugs into the brain or about the genetic factors that affect the brain’s susceptibility to drug-induced toxicities. It is not known, for example, why about half of patients given haloperidol—a drug used to treat psychotic disorders (conditions that affect how people think, feel, or behave)—develop tremors and other symptoms caused by alterations in the brain region that controls voluntary movements. Here, to improve our understanding of how drugs enter the brain and impact its function, the researchers investigate the genetic factors that affect haloperidol-induced toxicity by genetically analyzing several inbred mouse strains (every individual in an inbred mouse strain is genetically identical) with different susceptibilities to haloperidol-induced toxicity and by undertaking a human genetic association study (a study that looks for non-chance associations between specific traits and genetic variants).
What Did the Researchers Do and Find?
The researchers used a database of genetic variants called single nucleotide polymorphisms (SNPs) and a computational genetic mapping approach to show first that variations within the gene encoding Abcb5 affected susceptibility to haloperidol-induced toxicity (indicated by changes in the length of time taken by mice to move their paws when placed on an inclined wire-mesh screen) among inbred mouse strains. Abcb5 is an ATP-binding cassette transporter, a type of protein that moves molecules across cell membranes. The researchers next showed that Abcb5 is expressed in brain capillaries, which is the location of the blood-brain barrier. Abcb5 was also expressed in cerebellar Purkinje cells, which help to control motor (intentional) movements. They also measured the effect of haloperidol and the haloperidol concentration in brain tissue sections in mice that were genetically engineered to make no Abcb5 (Abcb5 knockout mice). Finally, the researchers investigated whether specific alleles (alternative versions) of ABCB5 are associated with haloperidol-induced toxicity in people. Among a group of 85 patients treated with haloperidol for a psychotic illness, one specific ABCB5 allele was associated with haloperidol-induced toxicity during the first few days of treatment.
What Do These Findings Mean?
These findings indicate that Abcb5 is a component of the blood-brain barrier in mice and suggest that genetic variants in the gene encoding this protein underlie, at least in part, the differences in susceptibility to haloperidol-induced toxicity seen among inbred mouse strains. Moreover, the human genetic association study indicates that a specific ABCB5 allele also affects the susceptibility of people to haloperidol-induced toxicity. The researchers note that other ABCB5 alleles or other genetic factors that affect haloperidol-induced toxicity in people might emerge if larger groups of patients were studied. However, based on their findings, the researchers propose a new model for the genetic mechanisms that underlie inter-individual and cell type-specific differences in susceptibility to haloperidol-induced brain toxicity. If confirmed in future studies, this model might facilitate the development of more effective and less toxic drugs to treat a range of brain disorders.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001782.
The US National Institute of Neurological Disorders and Stroke provides information about a wide range of brain diseases (in English and Spanish); its fact sheet “Brain Basics: Know Your Brain” is a simple introduction to the human brain; its “Blueprint Neurotherapeutics Network” was established to develop new drugs for disorders affecting the brain and other parts of the nervous system
MedlinePlus provides links to additional resources about brain diseases and their treatment (in English and Spanish)
Wikipedia provides information about haloperidol, about ATP-binding cassette transporters and about genetic association (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1001782
PMCID: PMC4315575  PMID: 25647612
16.  A Kernel of Truth: Statistical Advances in Polygenic Variance Component Models for Complex Human Pedigrees 
Advances in genetics  2013;81:1-31.
Statistical genetic analysis of quantitative traits in large pedigrees is a formidable computational task due to the necessity of taking the non-independence among relatives into account. With the growing awareness that rare sequence variants may be important in human quantitative variation, heritability and association study designs involving large pedigrees will increase in frequency due to the greater chance of observing multiple copies of rare variants amongst related individuals. Therefore, it is important to have statistical genetic test procedures that utilize all available information for extracting evidence regarding genetic association. Optimal testing for marker/phenotype association involves the exact calculation of the likelihood ratio statistic which requires the repeated inversion of potentially large matrices. In a whole genome sequence association context, such computation may be prohibitive. Toward this end, we have developed a rapid and efficient eigensimplification of the likelihood that makes analysis of family data commensurate with the analysis of a comparable sample of unrelated individuals. Our theoretical results which are based on a spectral representation of the likelihood yield simple exact expressions for the expected likelihood ratio test statistic (ELRT) for pedigrees of arbitrary size and complexity. For heritability, the ELRT is $-\sum_i \ln[1+\hat{h}^2(\lambda_{gi}-1)]$, where $\hat{h}^2$ and $\lambda_{gi}$ are respectively the heritability and eigenvalues of the pedigree-derived genetic relationship kernel (GRK). For association analysis of sequence variants, the ELRT is given by $\mathrm{ELRT}[h_q^2>0:\text{unrelateds}]-(\mathrm{ELRT}[h_t^2>0:\text{pedigrees}]-\mathrm{ELRT}[h_r^2>0:\text{pedigrees}])$, where $h_t^2$, $h_q^2$, and $h_r^2$ are the total, quantitative trait nucleotide, and residual heritabilities, respectively. Using these results, fast and accurate analytical power analyses are possible, eliminating the need for computer simulation.
Additional benefits of eigensimplification include a simple method for calculation of the exact distribution of the ELRT under the null hypothesis which turns out to differ from that expected under the usual asymptotic theory. Further, when combined with the use of empirical GRKs, estimated over a large number of genetic markers, our theory reveals potential problems associated with non positive semi-definite kernels. These procedures are being added to our general statistical genetic computer package, SOLAR.
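The heritability ELRT expression quoted in this abstract can be evaluated directly from the eigenvalues of a relationship kernel. The following is a minimal sketch, not the SOLAR implementation: the function name `elrt_heritability` is hypothetical, the sum is assumed to run over all eigenvalues of the kernel, and the example kernel is a hand-built nuclear family (two unrelated parents and two full siblings) with entries equal to twice the kinship coefficient.

```python
import numpy as np

def elrt_heritability(kernel, h2):
    """Evaluate -sum_i ln[1 + h2 * (lambda_i - 1)] over the kernel's eigenvalues.

    kernel: symmetric genetic relationship kernel (n x n)
    h2:     heritability, assumed to lie in [0, 1)
    """
    eigvals = np.linalg.eigvalsh(kernel)      # eigenvalues of a symmetric matrix
    return float(-np.sum(np.log1p(h2 * (eigvals - 1.0))))

# Illustrative kernel: parents (rows 0-1) and two full sibs (rows 2-3).
family = np.array([
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
])
unrelated = np.eye(4)                          # all eigenvalues equal 1

elrt_family = elrt_heritability(family, 0.5)
elrt_unrelated = elrt_heritability(unrelated, 0.5)

# Independent families form a block-diagonal kernel, so their terms add.
three_families = np.kron(np.eye(3), family)
elrt_three = elrt_heritability(three_families, 0.5)
print(elrt_family, elrt_unrelated, elrt_three)
```

For unrelated individuals every eigenvalue is 1, so each logarithm vanishes and the expression is zero, consistent with unrelated samples carrying no information about heritability from relatedness alone. The sketch also hints at the issue raised for empirical kernels: a negative eigenvalue combined with a large enough heritability makes the argument of the logarithm non-positive.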
doi:10.1016/B978-0-12-407677-8.00001-4
PMCID: PMC4019427  PMID: 23419715
17.  Statistical Power of Model Selection Strategies for Genome-Wide Association Studies 
PLoS Genetics  2009;5(7):e1000582.
Genome-wide association studies (GWAS) aim to identify genetic variants related to diseases by examining the associations between phenotypes and hundreds of thousands of genotyped markers. Because many genes are potentially involved in common diseases and a large number of markers are analyzed, it is crucial to devise an effective strategy to identify truly associated variants that have individual and/or interactive effects, while controlling false positives at the desired level. Although a number of model selection methods have been proposed in the literature, including marginal search, exhaustive search, and forward search, their relative performance has only been evaluated through limited simulations due to the lack of an analytical approach to calculating the power of these methods. This article develops a novel statistical approach for power calculation, derives accurate formulas for the power of different model selection strategies, and then uses the formulas to evaluate and compare these strategies in genetic model spaces. In contrast to previous studies, our theoretical framework allows for random genotypes, correlations among test statistics, and a false-positive control based on GWAS practice. After the accuracy of our analytical results is validated through simulations, they are utilized to systematically evaluate and compare the performance of these strategies in a wide class of genetic models. For a specific genetic model, our results clearly reveal how different factors, such as effect size, allele frequency, and interaction, jointly affect the statistical power of each strategy. An example is provided for the application of our approach to empirical research. The statistical approach used in our derivations is general and can be employed to address the model selection problems in other random predictor settings. 
We have developed an R package markerSearchPower to implement our formulas, which can be downloaded from the Comprehensive R Archive Network (CRAN) or http://bioinformatics.med.yale.edu/group/.
Author Summary
Almost all published genome-wide association studies are based on single-marker analysis. Intuitively, joint consideration of multiple markers should be more informative when multiple genes and their interactions are involved in disease etiology. For example, an exhaustive search among models involving multiple markers and their interactions can identify certain gene–gene interactions that will be missed by single-marker analysis. However, an exhaustive search is difficult, or even impossible, to perform because of the computational requirements. Moreover, searching more models does not necessarily increase statistical power, because there may be an increased chance of finding false positive results when more models are explored. For power comparisons of different model selection methods, the published studies have relied on limited simulations due to the highly computationally intensive nature of such simulation studies. To enable researchers to compare different model search strategies without resorting to extensive simulations, we develop a novel analytical approach to evaluating the statistical power of these methods. Our results offer insights into how different parameters in a genetic model affect the statistical power of a given model selection strategy. We developed an R package to implement our results. This package can be used by researchers to compare and select an effective approach to detecting SNPs.
doi:10.1371/journal.pgen.1000582
PMCID: PMC2712761  PMID: 19649321
18.  Integrating Computational Biology and Forward Genetics in Drosophila 
PLoS Genetics  2009;5(1):e1000351.
Genetic screens are powerful methods for the discovery of gene–phenotype associations. However, a systems biology approach to genetics must leverage the massive amount of “omics” data to enhance the power and speed of functional gene discovery in vivo. Thus far, few computational methods for gene function prediction have been rigorously tested for their performance on a genome-wide scale in vivo. In this work, we demonstrate that integrating genome-wide computational gene prioritization with large-scale genetic screening is a powerful tool for functional gene discovery. To discover genes involved in neural development in Drosophila, we extend our strategy for the prioritization of human candidate disease genes to functional prioritization in Drosophila. We then integrate this prioritization strategy with a large-scale genetic screen for interactors of the proneural transcription factor Atonal using genomic deficiencies and mutant and RNAi collections. Using the prioritized genes validated in our genetic screen, we describe a novel genetic interaction network for Atonal. Lastly, we prioritize the whole Drosophila genome and identify candidate gene associations for ten receptor-signaling pathways. This novel database of prioritized pathway candidates, as well as a web application for functional prioritization in Drosophila, called Endeavour-HighFly, and the Atonal network, are publicly available resources. A systems genetics approach that combines the power of computational predictions with in vivo genetic screens strongly enhances the process of gene function and gene–gene association discovery.
Author Summary
Genome sequencing and annotation, combined with large-scale molecular experiments to query gene expression and molecular interactions, collectively known as Systems Biology, have resulted in an enormous wealth in biological databases. Yet, it remains a daunting task to use these data to decipher the rules that govern biological systems. One of the most trusted approaches in biology is genetic analysis because of its emphasis on gene function in living organisms. Genetics, however, proceeds slowly and unravels small-scale interactions. Turning genetics into an effective tool of Systems Biology requires harnessing the large-scale molecular data for the design and execution of genetic screens. In this work, we test the idea of exploiting a computational approach known as gene prioritization to pre-rank genes for the likelihood of their involvement in a process of interest. By carrying out a gene prioritization–supported genetic screen, we greatly enhance the speed and output of in vivo genetic screens without compromising their sensitivity. These results mean that future genetic screens can be custom-catered for any process of interest and carried out with a speed and efficiency that is comparable to other large-scale molecular experiments. We refer to this combined approach as Systems Genetics.
doi:10.1371/journal.pgen.1000351
PMCID: PMC2628282  PMID: 19165344
19.  Association of NCF2, IKZF1, IRF8, IFIH1, and TYK2 with Systemic Lupus Erythematosus 
PLoS Genetics  2011;7(10):e1002341.
Systemic lupus erythematosus (SLE) is a complex trait characterised by the production of a range of auto-antibodies and a diverse set of clinical phenotypes. Currently, ∼8% of the genetic contribution to SLE in Europeans is known, following publication of several moderate-sized genome-wide (GW) association studies, which identified loci with a strong effect (OR>1.3). In order to identify additional genes contributing to SLE susceptibility, we conducted a replication study in a UK dataset (870 cases, 5,551 controls) of 23 variants that showed moderate-risk for lupus in previous studies. Association analysis in the UK dataset and subsequent meta-analysis with the published data identified five SLE susceptibility genes reaching genome-wide levels of significance (Pcomb<5×10^−8): NCF2 (Pcomb = 2.87×10^−11), IKZF1 (Pcomb = 2.33×10^−9), IRF8 (Pcomb = 1.24×10^−8), IFIH1 (Pcomb = 1.63×10^−8), and TYK2 (Pcomb = 3.88×10^−8). Each of the five new loci identified here can be mapped into interferon signalling pathways, which are known to play a key role in the pathogenesis of SLE. These results increase the number of established susceptibility genes for lupus to ∼30 and validate the importance of using large datasets to confirm associations of loci which moderately increase the risk for disease.
Author Summary
Genome-wide association studies have revolutionised our ability to identify common susceptibility alleles for systemic lupus erythematosus (SLE). In complex diseases such as SLE, where many different genes make a modest contribution to disease susceptibility, it is necessary to perform large-scale association studies to combine results from several datasets, to have sufficient power to identify highly significant novel loci (P<5×10^−8). Using a large SLE collection of 870 UK SLE cases and 5,551 UK unaffected individuals, we firstly replicated ten moderate-risk alleles (P<0.05) from a US–Swedish study of 3,273 SLE cases and 12,188 healthy controls. Combining our results with the US-Swedish data identified five new loci, which crossed the level for genome-wide significance: NCF2 (neutrophil cytosolic factor 2), IKZF1 (Ikaros family zinc-finger 1), IRF8 (interferon regulatory factor 8), IFIH1 (interferon-induced helicase C domain-containing protein 1), and TYK2 (tyrosine kinase 2). Each of these five genes regulates a different aspect of the immune response and contributes to the production of type-I and type-II interferons. Although further studies will be required to identify the causal alleles within these loci, the confirmation of five new susceptibility genes for lupus makes a significant step forward in our understanding of the genetic contribution to SLE.
doi:10.1371/journal.pgen.1002341
PMCID: PMC3203198  PMID: 22046141
20.  Increasing Power of Groupwise Association Test with Likelihood Ratio Test 
Journal of Computational Biology  2011;18(11):1611-1624.
Abstract
Sequencing studies have been discovering a large number of rare variants, allowing the identification of the effects of rare variants on disease susceptibility. As a method to increase the statistical power of studies on rare variants, several groupwise association tests that group rare variants in genes and detect associations between genes and diseases have been proposed. One major challenge in these methods is to determine which variants are causal in a group, and to overcome this challenge, previous methods used prior information that specifies how likely each variant is to be causal. Another source of information that can be used to determine causal variants is the observed data, because case individuals are likely to have more causal variants than control individuals. In this article, we introduce a likelihood ratio test (LRT) that uses both data and prior information to infer which variants are causal and uses this finding to determine whether a group of variants is involved in a disease. We demonstrate through simulations that LRT achieves higher power than previous methods. We also evaluate our method on mutation screening data of the susceptibility gene for ataxia telangiectasia, and show that LRT can detect an association in real data. To increase the computational speed of our method, we show how we can decompose the computation of LRT, and propose an efficient permutation test. With this optimization, we can efficiently compute an LRT statistic and its significance at a genome-wide level. The software for our method is publicly available at http://genetics.cs.ucla.edu/rarevariants.
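As a rough illustration of how a permutation test assigns significance to a groupwise statistic, the following minimal Python sketch shuffles case/control labels and recomputes the statistic each time. It is not the LRT described in the abstract: the statistic used here is a simple difference in mean rare-allele count (a generic "burden" statistic), and the function names, toy data, and number of permutations are all assumptions made for illustration.

```python
import random

def burden_statistic(allele_counts, is_case):
    """Hypothetical group statistic: mean rare-allele count in cases minus controls."""
    cases = [c for c, y in zip(allele_counts, is_case) if y]
    controls = [c for c, y in zip(allele_counts, is_case) if not y]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

def permutation_p_value(allele_counts, is_case, n_perm=2000, seed=0):
    """One-sided permutation p-value for the burden statistic.

    Labels are shuffled while the genotype counts stay fixed, which breaks
    any genotype-phenotype association but preserves the group sizes.
    """
    rng = random.Random(seed)
    observed = burden_statistic(allele_counts, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if burden_statistic(allele_counts, labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy data (invented): 10 cases carrying rare alleles in the gene, 10 controls with none.
toy_counts = [2, 1, 2, 1, 2, 1, 2, 1, 2, 1] + [0] * 10
toy_labels = [True] * 10 + [False] * 10
p_enriched = permutation_p_value(toy_counts, toy_labels)

# Null-like toy data: every individual has the same count, so no signal exists.
p_null = permutation_p_value([1] * 20, toy_labels)
```

The add-one correction is a common convention for permutation p-values; the paper's own optimized permutation scheme is not reproduced here.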
doi:10.1089/cmb.2011.0161
PMCID: PMC3216097  PMID: 21919745
rare variants; association studies; SNPs; genetics; statistics
21.  Survival-Related Profile, Pathways, and Transcription Factors in Ovarian Cancer 
PLoS Medicine  2009;6(2):e1000024.
Background
Ovarian cancer has a poor prognosis due to advanced stage at presentation and either intrinsic or acquired resistance to classic cytotoxic drugs such as platinum and taxoids. Recent large clinical trials with different combinations and sequences of classic cytotoxic drugs indicate that further significant improvement in prognosis with this type of drug is not to be expected. Currently, a large number of drugs targeting dysregulated molecular pathways in cancer cells have been developed and are being introduced in the clinic. A major challenge is to identify those patients who will benefit from drugs targeting these specific dysregulated pathways. The aims of our study were (1) to develop a gene expression profile associated with overall survival in advanced stage serous ovarian cancer, (2) to assess the association of pathways and transcription factors with overall survival, and (3) to validate our identified profile and pathways/transcription factors in an independent set of ovarian cancers.
Methods and Findings
According to a randomized design, profiling of 157 advanced stage serous ovarian cancers was performed in duplicate using ∼35,000 70-mer oligonucleotide microarrays. A continuous predictor of overall survival was built taking into account well-known issues in microarray analysis, such as multiple testing and overfitting. A functional class scoring analysis was utilized to assess pathways/transcription factors for their association with overall survival. The prognostic value of genes that constitute our overall survival profile was validated on a fully independent, publicly available dataset of 118 well-defined primary serous ovarian cancers. Furthermore, functional class scoring analysis was also performed on this independent dataset to assess the similarities with results from our own dataset. An 86-gene overall survival profile discriminated between patients with unfavorable and favorable prognosis (median survival, 19 versus 41 mo, respectively; permutation p-value of log-rank statistic = 0.015) and maintained its independent prognostic value in multivariate analysis. Genes that composed the overall survival profile were also able to discriminate between the two risk groups in the independent dataset. In our dataset 17/167 pathways and 13/111 transcription factors were associated with overall survival, of which 16 and 12, respectively, were confirmed in the independent dataset.
Conclusions
Our study provides new clues to genes, pathways, and transcription factors that contribute to the clinical outcome of serous ovarian cancer and might be exploited in designing new treatment strategies.
Ate van der Zee and colleagues analyze the gene expression profiles of ovarian cancer samples from 157 patients, and identify an 86-gene expression profile that seems to predict overall survival.
Editors' Summary
Background.
Ovarian cancer kills more than 100,000 women every year and is one of the most frequent causes of cancer death in women in Western countries. Most ovarian cancers develop when an epithelial cell in one of the ovaries (two small organs in the pelvis that produce eggs) acquires genetic changes that allow it to grow uncontrollably and to spread around the body (metastasize). In its early stages, ovarian cancer is confined to the ovaries and can often be treated successfully by surgery alone. Unfortunately, early ovarian cancer rarely has symptoms so a third of women with ovarian cancer have advanced disease when they first visit their doctor with symptoms that include vague abdominal pains and mild digestive disturbances. That is, cancer cells have spread into their abdominal cavity and metastasized to other parts of the body (so-called stage III and IV disease). The outlook for women diagnosed with stage III and IV disease, which are treated with a combination of surgery and chemotherapy, is very poor. Only 30% of women with stage III, and 5% with stage IV, are still alive five years after their cancer is diagnosed.
Why Was This Study Done?
If the cellular pathways that determine the biological behavior of ovarian cancer could be identified, it might be possible to develop more effective treatments for women with stage III and IV disease. One way to identify these pathways is to use gene expression profiling (a technique that catalogs all the genes expressed by a cell) to compare gene expression patterns in the ovarian cancers of women who survive for different lengths of time. Genes with different expression levels in tumors with different outcomes could be targets for new treatments. For example, it might be worth developing inhibitors of proteins whose expression is greatest in tumors with short survival times. In this study, the researchers develop an expression profile that is associated with overall survival in advanced-stage serous ovarian cancer (more than half of ovarian cancers originate in serous cells, epithelial cells that secrete a watery fluid). The researchers also assess the association of various cellular pathways and transcription factors (proteins that control the expression of other proteins) with survival in this type of ovarian carcinoma.
What Did the Researchers Do and Find?
The researchers analyzed the gene expression profiles of tumor samples taken from 157 patients with advanced stage serous ovarian cancer and used the “supervised principal components” method to build a predictor of overall survival from these profiles and patient survival times. This 86-gene predictor discriminated between patients with favorable and unfavorable outcomes (average survival times of 41 and 19 months, respectively). It also discriminated between groups of patients with these two outcomes in an independent dataset collected from 118 additional serous ovarian cancers. Next, the researchers used “functional class scoring” analysis to assess the association between pathway and transcription factor expression in the tumor samples and overall survival. Seventeen of 167 KEGG pathways (“wiring” diagrams of molecular interactions, reactions and relations involved in cellular processes and human diseases listed in the Kyoto Encyclopedia of Genes and Genomes) were associated with survival, 16 of which were confirmed in the independent dataset. Finally, 13 of 111 analyzed transcription factors were associated with overall survival in the tumor samples, 12 of which were confirmed in the independent dataset.
What Do These Findings Mean?
These findings identify an 86-gene overall survival gene expression profile that seems to predict overall survival for women with advanced serous ovarian cancer. However, before this profile can be used clinically, further validation of the profile and more robust methods for determining gene expression profiles are needed. Importantly, these findings also provide new clues about the genes, pathways and transcription factors that contribute to the clinical outcome of serous ovarian cancer, clues that can now be exploited in the search for new treatment strategies. Finally, these findings suggest that it might eventually be possible to tailor therapies to the needs of individual patients by analyzing which pathways are activated in their tumors and thus improve survival times for women with advanced ovarian cancer.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000024.
This study is further discussed in a PLoS Medicine Perspective by Simon Gayther and Kate Lawrenson
See also a related PLoS Medicine Research Article by Huntsman and colleagues
The US National Cancer Institute provides a brief description of what cancer is and how it develops, and information on all aspects of ovarian cancer for patients and professionals (in English and Spanish)
The UK charity Cancerbackup provides general information about cancer, and more specific information about ovarian cancer
MedlinePlus also provides links to other information about ovarian cancer (in English and Spanish)
The KEGG Pathway database provides pathway maps of known molecular networks involved in a wide range of cellular processes
doi:10.1371/journal.pmed.1000024
PMCID: PMC2634794  PMID: 19192944
22.  Construction of gene clusters resembling genetic causal mechanisms for common complex disease with an application to young-onset hypertension 
BMC Genomics  2013;14:497.
Background
Lack of power and reproducibility are caveats of genetic association studies of common complex diseases. Indeed, the heterogeneity of disease etiology demands that causal models consider the simultaneous involvement of multiple genes. Rothman’s sufficient-cause model, which is well known in epidemiology, provides a framework for such a concept. In the present work, we developed a three-stage algorithm to construct gene clusters resembling Rothman’s causal model for a complex disease, starting from finding influential gene pairs followed by grouping homogeneous pairs.
Results
The algorithm was trained and tested on 2,772 hypertensives and 6,515 normotensives extracted from four large Caucasian and Taiwanese databases. The constructed clusters, each featuring a major gene that interacts with many other genes and each identifying a distinct group of patients, were reproduced in both ethnic populations and across three genotyping platforms. We present the 14 largest gene clusters, which were capable of identifying 19.3% of hypertensives in all the datasets and 41.8% if one dataset was excluded for lack of phenotype information. Although a few normotensives were also identified by the gene clusters, they usually carried less risky combinatory genotypes (insufficient causes) than their hypertensive counterparts. After establishing a cut-off percentage for risky combinatory genotypes in each gene cluster, the 14 gene clusters achieved a classification accuracy of 82.8% for all datasets and 98.9% if the information-short dataset was excluded. Furthermore, not only 10 of the 14 major genes but also many other contributing genes in the clusters are associated with either hypertension or hypertension-related diseases or functions.
Conclusions
We have shown with the constructed gene clusters that a multi-causal pie-multi-component approach can indeed improve the reproducibility of genetic markers for complex disease. In addition, our novel findings including a major gene in each cluster and sufficient risky genotypes in a cluster for disease onset (which coincides with Rothman’s sufficient cause theory) may not only provide a new research direction for complex diseases but also help to reveal the disease etiology.
doi:10.1186/1471-2164-14-497
PMCID: PMC3751083  PMID: 23879630
Genetic causal mechanism; Sufficient cause; Data-mining; Young-onset hypertension; Complex disease
23.  A Novel Substrate-Based HIV-1 Protease Inhibitor Drug Resistance Mechanism 
PLoS Medicine  2007;4(1):e36.
Background
HIV protease inhibitor (PI) therapy results in the rapid selection of drug resistant viral variants harbouring one or two substitutions in the viral protease. To combat PI resistance development, two approaches have been developed. The first is to increase the level of PI in the plasma of the patient, and the second is to develop novel PI with high potency against the known PI-resistant HIV protease variants. Both approaches share the requirement for a considerable increase in the number of protease mutations to lead to clinical resistance, thereby increasing the genetic barrier. We investigated whether HIV could yet again find a way to become less susceptible to these novel inhibitors.
Methods and Findings
We have performed in vitro selection experiments using a novel PI with an increased genetic barrier (RO033-4649) and demonstrated selection of three viruses 4- to 8-fold resistant to all PI compared to wild type. These PI-resistant viruses did not have a single substitution in the viral protease. Full genomic sequencing revealed the presence of NC/p1 cleavage site substitutions in the viral Gag polyprotein (K436E and/or I437T/V) in all three resistant viruses. These changes, when introduced in a reference strain, conferred PI resistance. The mechanism leading to PI resistance is enhancement of the processing efficiency of the altered substrate by wild-type protease. Analysis of genotypic and phenotypic resistance profiles of 28,000 clinical isolates demonstrated the presence of these NC/p1 cleavage site mutations in some clinical samples (codon 431 substitutions in 13%, codon 436 substitutions in 8%, and codon 437 substitutions in 10%). Moreover, these cleavage site substitutions were highly significantly associated with reduced susceptibility to PI in clinical isolates lacking primary protease mutations. Furthermore, we used data from a clinical trial (NARVAL, ANRS 088) to demonstrate that these NC/p1 cleavage site changes are associated with virological failure during PI therapy.
Conclusions
HIV can use an alternative mechanism to become resistant to PI by changing the substrate instead of the protease. Further studies are required to determine to what extent cleavage site mutations may explain virological failure during PI therapy.
Changes in the cleavage site of the Gag substrate for the HIV protease can convey resistance to protease inhibitors and might contribute to virologic failure during therapy that includes these drugs.
Editors' Summary
Background.
Twenty-five years ago, infection with the human immunodeficiency virus (HIV)—the causative agent of AIDS—was a death sentence. However, drugs that attack various stages of the HIV life cycle were soon developed that, although not curing the infection, kept it in check when used in combination and greatly increased the life expectancy of people infected with HIV. Unfortunately, viruses resistant to these drugs have rapidly emerged and antiviral therapy now fails in many patients. The use of HIV protease inhibitors (PIs) in combination therapies, for example, has led to the stepwise selection of viral variants resistant to these drugs. Resistance is first acquired when the viral protease changes so that PIs no longer bind to it and inhibit it efficiently. These changes often reduce the efficiency with which the protease binds its substrates—polyproteins called Gag and GagPol that it chops up into smaller proteins to make new viral particles. So the next step is the accumulation of changes elsewhere in the protease that make it work better, and sometimes changes in its substrate that make it easier to cut; these compensatory changes do not directly affect viral resistance to PIs.
Why Was This Study Done?
To prevent viruses with resistance to PIs emerging, drug doses are kept high in patients and new PIs are being developed with high potency against known PI-resistant HIV variants. Both approaches set a “high genetic barrier” to the development of PI resistance by ensuring that HIV has to incorporate many changes in its protease to become resistant. But, the HIV genome naturally changes—mutates—very rapidly, so novel HIV variants could emerge that are less susceptible to the new potent PIs without the virus having to leap this high genetic barrier. In this study, the researchers have investigated whether HIV can find an alternative route to PI resistance that does not involve the introduction of multiple changes into its protease.
What Did the Researchers Do and Find?
The researchers took wild-type HIV and treated it in the laboratory with a new PI regimen that has a high genetic barrier. By gradually increasing its concentration, the researchers selected three viral populations that were able to grow in 4- to 8-fold higher concentrations of the PI than wild-type virus. None of these populations had mutations in the viral protease. Instead, they all had mutations near one of the sites—the NC/p1 site—where the protease normally cuts the Gag polyprotein. These mutations, the researchers report, enhanced the overall efficiency with which the wild-type protease cleaved the polyprotein, and a selection experiment with another PI showed that the development of PI resistance through alterations near the NC/p1 cleavage site was not unique to one PI. The researchers also investigated the potential clinical significance of this new drug resistance mechanism by looking for the same mutations in nearly 30,000 patient samples. Many of the samples did indeed have these mutations. Finally, they showed that mutations at the NC/p1 cleavage site were associated with virological failure (increased viral replication) during PI therapy in an ongoing clinical trial.
What Do These Findings Mean?
These results suggest that increased polyprotein processing because of mutations in the natural substrate of the HIV protease might be a new mechanism by which HIV can become resistant to PIs. This strategy, which occurs in the laboratory and in patients, allows HIV to develop PI resistance without the need for multiple changes in its protease and so avoids the high genetic barrier to resistance that new PIs provide. Clinical studies are now needed to test which of the mutations seen in this study contribute to virological failure, whether the degree of this failure is clinically relevant, and whether these substrate mutations enhance the effect of protease mutations. If the clinical importance of the new mechanism is confirmed, genetic examination of both the polyprotein and the protease will be needed when trying to figure out why a PI-containing therapy is failing in individual patients. Furthermore, it will be necessary to test whether this mechanism can contribute to the development of resistance when evaluating new drugs.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040036.
US National Institute of Allergy and Infectious Diseases factsheet on HIV infection and AIDS
US Department of Health and Human Services information on AIDS
US Centers for Disease Control and Prevention information on HIV/AIDS
Aidsmap information on HIV and AIDS provided by the charity NAM
BioAfrica, Bioinformatics for HIV Research, information on HIV-1 protease cleavage sites
doi:10.1371/journal.pmed.0040036
PMCID: PMC1769415  PMID: 17227139
24.  Clustered Environments and Randomized Genes: A Fundamental Distinction between Conventional and Genetic Epidemiology  
PLoS Medicine  2007;4(12):e352.
Background
In conventional epidemiology confounding of the exposure of interest with lifestyle or socioeconomic factors, and reverse causation whereby disease status influences exposure rather than vice versa, may invalidate causal interpretations of observed associations. Conversely, genetic variants should not be related to the confounding factors that distort associations in conventional observational epidemiological studies. Furthermore, disease onset will not influence genotype. Therefore, it has been suggested that genetic variants that are known to be associated with a modifiable (nongenetic) risk factor can be used to help determine the causal effect of this modifiable risk factor on disease outcomes. This approach, mendelian randomization, is increasingly being applied within epidemiological studies. However, there is debate about the underlying premise that associations between genotypes and disease outcomes are not confounded by other risk factors. We examined the extent to which genetic variants, on the one hand, and nongenetic environmental exposures or phenotypic characteristics on the other, tend to be associated with each other, to assess the degree of confounding that would exist in conventional epidemiological studies compared with mendelian randomization studies.
Methods and Findings
We estimated pairwise correlations between nongenetic baseline variables and genetic variables in a cross-sectional study comparing the number of correlations that were statistically significant at the 5%, 1%, and 0.01% level (α = 0.05, 0.01, and 0.0001, respectively) with the number expected by chance if all variables were in fact uncorrelated, using a two-sided binomial exact test. We demonstrate that behavioural, socioeconomic, and physiological factors are strongly interrelated, with 45% of all possible pairwise associations between 96 nongenetic characteristics (n = 4,560 correlations) being significant at the p < 0.01 level (the ratio of observed to expected significant associations was 45; p-value for difference between observed and expected < 0.000001). Similar findings were observed for other levels of significance. In contrast, genetic variants showed no greater association with each other, or with the 96 behavioural, socioeconomic, and physiological factors, than would be expected by chance.
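The arithmetic behind the observed-to-expected ratio reported above can be checked with a short Python sketch. It uses only the figures given in the abstract (96 variables, α = 0.01, 45% of pairs significant). The observed count is reconstructed from the rounded 45% and is therefore approximate, and the sketch computes only a one-sided upper binomial tail rather than the two-sided exact test the authors describe. All function and variable names are assumptions.

```python
from math import comb, exp, lgamma, log

def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability P(X = k) for X ~ Bin(n, p)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def log_upper_tail(k, n, p):
    """Natural log of P(X >= k), summed in log space to avoid underflow."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    m = max(logs)
    return m + log(sum(exp(x - m) for x in logs))

n_vars = 96               # nongenetic characteristics (from the abstract)
alpha = 0.01              # significance level (from the abstract)
observed_fraction = 0.45  # share of pairs significant (rounded in the abstract)

pairs = comb(n_vars, 2)                      # 96 * 95 / 2 = 4,560 pairwise correlations
expected = alpha * pairs                     # count expected by chance if uncorrelated
observed = round(observed_fraction * pairs)  # approximate observed count
ratio = observed / expected                  # observed-to-expected ratio

# One-sided tail probability of seeing at least `observed` significant pairs by chance.
log_p = log_upper_tail(observed, pairs, alpha)
```

With these inputs the ratio comes out at 45, matching the reported figure, and the tail probability is far below the 0.000001 bound quoted in the abstract.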
Conclusions
These data illustrate why observational studies have produced misleading claims regarding potentially causal factors for disease. The findings demonstrate the potential power of a methodology that utilizes genetic variants as indicators of exposure level when studying environmentally modifiable risk factors.
In a cross-sectional study Davey Smith and colleagues show why observational studies can produce misleading claims regarding potential causal factors for disease, and illustrate the use of mendelian randomization to study environmentally modifiable risk factors.
Editors' Summary
Background.
Epidemiology is the study of the distribution and causes of human disease. Observational epidemiological studies investigate whether particular modifiable factors (for example, smoking or eating healthily) are associated with the risk of a particular disease. The link between smoking and lung cancer was discovered in this way. Once the modifiable factors associated with a disease are established as causal factors, individuals can reduce their risk of developing that disease by avoiding causative factors or by increasing their exposure to protective factors. Unfortunately, modifiable factors that are associated with risk of a disease in observational studies sometimes turn out not to cause or prevent disease. For example, higher intake of vitamins C and E apparently protected people against heart problems in observational studies, but taking these vitamins did not show any protection against heart disease in randomized controlled trials (studies in which identical groups of patients are randomly assigned various interventions and then their health monitored). One explanation for this type of discrepancy is known as confounding—the distortion of the effect of one factor by the presence of another that is associated both with the exposure under study and with the disease outcome. So in this example, people who took vitamin supplements might also have exercised more than people who did not take supplements, and it could have been the exercise rather than the supplements that was protective against heart disease.
Why Was This Study Done?
It isn't always possible to check the results of observational studies in randomized controlled trials, so epidemiologists have developed other ways to minimize confounding. One approach is known as mendelian randomization. Several gene variants have been identified that affect risk factors. For example, variants in a gene called APOE affect the level of cholesterol in an individual's blood, a risk factor for heart disease. People inherit gene variants randomly from their parents to build up their own unique genotype (total genetic makeup). Consequently, a study that examines the associations between a gene variant and a disease can indicate whether the risk factor affected by that gene variant causes the disease. There should be no confounding in this type of study, the argument goes, because different genetic variants should not be associated with each other or with nongenetic variables that typically confound directly assessed associations between risk factors and disease. But is this true? In this study, the researchers have tested whether nongenetic risk factors are confounded by each other and also whether genetic variants are confounded by nongenetic risk factors and by other genetic variants.
What Did the Researchers Do and Find?
Using data collected in the British Women's Heart and Health Study, the researchers calculated how many pairs of nongenetic variables (for example, frequency of eating meat, alcohol intake) were significantly correlated with each other, that is, the number of pairs in which the two variables were correlated more strongly than would be expected by chance. They compared this number with the number of correlations that would occur by chance if all the variables were totally independent. When the researchers assumed that 1 in 100 combinations of pairs of variables would have been correlated by chance, the ratio of observed to expected significant correlations was 45; that is, significant correlations were seen 45 times more frequently than would be expected by chance. When the researchers repeated this exercise with genetic variants, the ratio of observed to expected significant correlations was 1.58, a figure not significantly different from 1. Similarly, the ratio of observed to expected significant correlations when pairwise combinations between genetic and nongenetic variables were considered was 1.22.
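The observed-to-expected comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's actual analysis: the function names (`pearson_r`, `approx_p_value`, `observed_to_expected`) are hypothetical, the variables are assumed to be numeric, and the significance test uses a Fisher z-transform normal approximation chosen here for simplicity. The expected count is simply the significance threshold (1 in 100, i.e. 0.01) multiplied by the number of variable pairs.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r (assumption: Fisher z normal approximation, n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def observed_to_expected(columns, alpha=0.01):
    """Return (observed, expected, ratio) of significant pairwise correlations.

    columns: list of equal-length numeric sequences, one per variable.
    alpha:   significance threshold; 0.01 matches "1 in 100" in the text.
    """
    pairs = list(combinations(columns, 2))
    observed = sum(
        1 for x, y in pairs
        if approx_p_value(pearson_r(x, y), len(x)) < alpha
    )
    expected = alpha * len(pairs)  # count expected if all variables were independent
    return observed, expected, observed / expected
```

A ratio near 1 would suggest that the variables behave as if independent, which is what the summary reports for the genetic variants; a ratio far above 1 indicates clustering of the kind reported for the nongenetic variables.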
What Do These Findings Mean?
These findings have two main implications. First, the large excess of observed over expected associations among the nongenetic variables indicates that many nongenetic modifiable factors occur in clusters—for example, people with healthy diets often have other healthy habits. Researchers doing observational studies always try to adjust for confounding but this result suggests that this adjustment will be hard to do, in part because it will not always be clear which factors are confounders. Second, the lack of a large excess of observed over expected associations among the genetic variables (and also among genetic variables paired with nongenetic variables) indicates that little confounding is likely to occur in studies that use mendelian randomization. In other words, this approach is a valid way to identify which environmentally modifiable risk factors cause human disease.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040352.
Wikipedia has pages on epidemiology and on mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages).
Epidemiology for the Uninitiated is a primer from the British Medical Journal.
Information is available on the British Women's Heart and Health Study.
doi:10.1371/journal.pmed.0040352
PMCID: PMC2121108  PMID: 18076282
25.  Using Pre-existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance 
PLoS Computational Biology  2010;6(3):e1000718.
Although they have become a widely used experimental technique for identifying differentially expressed (DE) genes, DNA microarrays are notorious for generating noisy data. A common strategy for mitigating the effects of noise is to perform many experimental replicates. This approach is often costly and sometimes impossible given limited resources; thus, analytical methods are needed which increase accuracy at no additional cost. One inexpensive source of microarray replicates comes from prior work: to date, data from hundreds of thousands of microarray experiments are in the public domain. Because these data assay a wide range of conditions, they cannot be used directly to inform any particular experiment and are thus ignored by most DE gene methods. We present the SVD Augmented Gene expression Analysis Tool (SAGAT), a mathematically principled, data-driven approach for identifying DE genes. SAGAT increases the power of a microarray experiment by using observed coexpression relationships from publicly available microarray datasets to reduce uncertainty in individual genes' expression measurements. We tested the method on three well-replicated human microarray datasets and demonstrate that use of SAGAT increased effective sample sizes by as many as 2.72 arrays. We applied SAGAT to unpublished data from a microarray study investigating transcriptional responses to insulin resistance, resulting in a 50% increase in the number of significant genes detected. We evaluated 11 (58%) of these genes experimentally using qPCR, confirming the directions of expression change for all 11 and statistical significance for three. Use of SAGAT revealed coherent biological changes in three pathways: inflammation, differentiation, and fatty acid synthesis, furthering our molecular understanding of a type 2 diabetes risk factor.
We envision SAGAT as a means to maximize the potential for biological discovery from subtle transcriptional responses, and we provide it as a freely available software package that is immediately applicable to any human microarray study.
Author Summary
Though the use of microarrays to identify differentially expressed (DE) genes has become commonplace, it is still not a trivial task. Microarray data are notorious for being noisy, and current DE gene methods do not fully utilize pre-existing biological knowledge to help control this noise. One such source of knowledge is the vast number of publicly available microarray datasets. To leverage this information, we have developed the SVD Augmented Gene expression Analysis Tool (SAGAT) for identifying DE genes. SAGAT extracts transcriptional modules from publicly available microarray data and integrates this information with a dataset of interest. We explore SAGAT's ability to improve DE gene identification on simulated data, and we validate the method on three highly replicated biological datasets. Finally, we demonstrate SAGAT's effectiveness on a novel human dataset investigating the transcriptional response to insulin resistance. Use of SAGAT leads to an increased number of candidate genes for insulin resistance, and we validate a subset of these with qPCR. We provide SAGAT as an open source R package that is applicable to any human microarray study.
doi:10.1371/journal.pcbi.1000718
PMCID: PMC2845644  PMID: 20361040
