To identify loci for age at menarche, we performed a meta-analysis of 32 genome-wide association studies in 87,802 women of European descent, with replication in up to 14,731 women. In addition to the known loci at LIN28B (P=5.4×10−60) and 9q31.2 (P=2.2×10−33), we identified 30 novel menarche loci (all P<5×10−8) and found suggestive evidence for a further 10 loci (P<1.9×10−6). New loci included four previously associated with BMI (in/near FTO, SEC16B, TRA2B and TMEM18), three in/near other genes implicated in energy homeostasis (BSX, CRTC1, and MCHR2), and three in/near genes implicated in hormonal regulation (INHBA, PCSK2 and RXRG). Ingenuity and MAGENTA pathway analyses identified coenzyme A and fatty acid biosynthesis as biological processes related to menarche timing.
Analysis of the biological gene networks involved in a disease may lead to the identification of therapeutic targets. Such analysis requires exploring network properties, in particular the importance of individual network nodes (i.e., genes). There are many measures that consider the importance of nodes in a network and some may shed light on the biological significance and potential optimality of a gene or set of genes as therapeutic targets. This has been shown to be the case in cancer therapy. A dilemma exists, however, in finding the best therapeutic targets based on network analysis since the optimal targets should be nodes that are highly influential in, but not toxic to, the functioning of the entire network. In addition, cancer therapeutics targeting a single gene often result in relapse since compensatory, feedback and redundancy loops in the network may offset the activity associated with the targeted gene. Thus, multiple genes reflecting parallel functional cascades in a network should be targeted simultaneously, which first requires that such targets be identified. We propose a methodology that exploits centrality statistics characterizing the importance of nodes within a gene network that is constructed from the gene expression patterns in that network. We consider centrality measures based on both graph theory and spectral graph theory. We also consider the origins of a network topology, and show how different available representations yield different node importance results. We apply our techniques to tumor gene expression data and suggest that the identification of optimal therapeutic targets involving particular genes, pathways and sub-networks based on an analysis of the nodes in that network is possible and can facilitate individualized cancer treatments.
The proposed methods also have the potential to identify candidate cancer therapeutic targets that are not thought to be oncogenes but nonetheless play important roles in the functioning of a cancer-related network or pathway.
network analysis; centrality; cancer; pathway; drug targets; personalized treatment; gene expression
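The centrality idea in the abstract above can be illustrated with a minimal sketch. It assumes a co-expression network built by thresholding absolute Pearson correlations, and it uses degree centrality (a graph-theoretic measure) and eigenvector centrality (a spectral measure) as stand-ins. The abstract does not say which measures, thresholds, or network construction the authors used, so the function names, the 0.8 threshold, and the toy expression values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_network(expr, threshold=0.8):
    """Link two genes when |r| >= threshold (threshold is an illustrative assumption)."""
    genes = list(expr)
    adj = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                adj[g].add(h)
                adj[h].add(g)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {g: len(nb) / (n - 1) for g, nb in adj.items()}


def eigenvector_centrality(adj, iters=500, tol=1e-12):
    """Power iteration on A + I; the identity shift keeps the leading
    eigenvector of A but avoids oscillation on bipartite graphs."""
    score = {g: 1.0 for g in adj}
    for _ in range(iters):
        new = {g: score[g] + sum(score[h] for h in adj[g]) for g in adj}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {g: v / norm for g, v in new.items()}
        delta = max(abs(new[g] - score[g]) for g in adj)
        score = new
        if delta < tol:
            break
    return score


if __name__ == "__main__":
    # Hypothetical expression values for four genes across five samples.
    expr = {
        "G1": [1, 2, 3, 4, 5],
        "G2": [2, 4, 6, 8, 10],
        "G3": [5, 4, 3, 2, 1],
        "G4": [3, 1, 4, 2, 5],
    }
    network = coexpression_network(expr)
    print(degree_centrality(network))
    print(eigenvector_centrality(network))
```

Different network representations (for example a different correlation threshold) change the adjacency structure and therefore the rankings, which is the sensitivity the abstract points to.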
Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it is not likely to amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.
Sequencing; functional analysis; computer modeling; genomic variation
Dental caries remains a significant public health problem and is considered pandemic worldwide. The prediction of dental caries based on profiling of microbial species involved in disease and, equally important, the identification of species conferring dental health have proven more difficult than anticipated due to high interpersonal and geographical variability of dental plaque microbiota. We have used RNA-Seq to perform global gene expression analysis of dental plaque microbiota derived from 19 twin pairs that were either concordant (caries-active or caries-free) or discordant for dental caries. The transcription profiling allowed us to define a functional core microbiota consisting of nearly 60 species. Similarities in gene expression patterns allowed a preliminary assessment of the relative contribution of human genetics, environmental factors and caries phenotype on the microbiota's transcriptome. Correlation analysis of transcription allowed the identification of numerous functional networks, suggesting that inter-personal environmental variables may co-select for groups of genera and species. Analysis of functional role categories allowed the identification of dominant functions expressed by dental plaque biofilm communities that highlight the biochemical priorities of dental plaque microbes to metabolize diverse sugars and cope with the acid and oxidative stress resulting from sugar fermentation. The wealth of data generated by deep sequencing of expressed transcripts enables a greatly expanded perspective concerning the functional expression of dental plaque microbiota.
caries; oral microbiota; dental plaque; biofilm; transcriptome
Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome wide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2,907 cases with AN from 14 countries (15 sites) and 14,860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery datasets. Seventy-six (72 independent) SNPs were taken forward for in silico (two datasets) or de novo (13 datasets) replication genotyping in 2,677 independent AN cases and 8,629 European ancestry controls along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication datasets comprised 5,551 AN cases and 21,080 controls. AN subtype analyses (1,606 AN restricting; 1,445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01×10-7) in SOX2OT and rs17030795 (P=5.84×10-6) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76×10-6) between CUL3 and FAM124B and rs1886797 (P=8.05×10-6) near SPATA13. Comparing discovery to replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4×10-6), strongly suggesting that true findings exist but that our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field.
anorexia nervosa; eating disorders; GWAS; genome-wide association study; body mass index; metabolic
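The directional-consistency argument in the abstract above (more effects pointing the same way in discovery and replication than chance would give) is the logic of a sign test. A minimal sketch of a one-sided exact binomial sign test follows. The function name and the counts are hypothetical, and the abstract does not state which test the authors used, so this is not a reproduction of the reported P value.

```python
from math import comb


def sign_test_one_sided(same_direction, total):
    """P(X >= same_direction) for X ~ Binomial(total, 0.5).

    Under the null hypothesis each replication effect is equally likely to
    agree or disagree in sign with its discovery effect.
    """
    tail = sum(comb(total, k) for k in range(same_direction, total + 1))
    return tail / 2 ** total


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the study's data):
    # 15 of 20 independent variants agree in direction.
    print(sign_test_one_sided(15, 20))
```

The test treats variants as independent, which is why only independent signals should be counted.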
Because altered serotonin (5-HT) function appears to persist after recovery from bulimia nervosa (RBN), we investigated the 5-HT1A receptor, which could contribute to regulation of appetite, mood, impulse control, or the response to antidepressants.
Thirteen RBN individuals were compared to 21 healthy control women (CW) using positron emission tomography and [carbonyl-11C]WAY100635 ([11C]WAY).
RBN had a 23–34% elevation of [11C]WAY binding potential (BPP) in subgenual cingulate, mesial temporal, and parietal regions after adjustments for multiple comparisons. For CW, [11C]WAY BPP was related negatively to novelty seeking, whereas for RBN, [11C]WAY BPP was related positively to harm avoidance and negatively related to sensation seeking.
Alterations of 5-HT1A receptor function may provide new insight into efficacy of 5-HT medication in BN, as well as symptoms such as the ability to inhibit or self-control the expression of behaviors related to stimulus seeking, aggression, and impulsivity.
bulimia nervosa; 5-HT1A receptor; positron emission tomography; behavioral inhibition; subgenual cingulate; mesial temporal cortex
The limitations of genome-wide association (GWA) studies that focus on the phenotypic influence of common genetic variants have motivated human geneticists to consider the contribution of rare variants to phenotypic expression. The increasing availability of high-throughput sequencing technology has enabled studies of rare variants, but will not be sufficient for their success since appropriate analytical methods are also needed. We consider data analysis approaches to testing associations between a phenotype and collections of rare variants in a defined genomic region or set of regions. Ultimately, although a wide variety of analytical approaches exist, more work is needed to refine them and determine their properties and power in different contexts.
To evaluate the characteristics of direct-to-consumer (DTC) genomic test consumers who spontaneously shared their test results with their health care provider.
Utilizing data from the Scripps Genomic Health Initiative we compared demographic, behavioral, and attitudinal characteristics of DTC genomic test consumers who shared their results with their physician or health care provider versus those who did not share. We also compared genomic risk estimates between the two groups.
Of 2024 individuals assessed at approximately 6 months post-testing, a total of 540 individuals (26.5%) reported sharing their results with their physician or health care provider. Those who shared were older (p<.001), had a higher income (p=.01), were more likely to be married (p=.005), and more likely to identify with a religion (p=.004). As assessed prior to undergoing testing, sharers also showed higher exercise (p=.003) and lower fat intake (p=.02), and expressed fewer overall concerns about testing (p=.001) and fewer concerns related to the privacy of their genomic information (p=.03). The genomic disease risk estimates disclosed were not associated with sharing.
In a DTC genomic testing context, physicians and other health care providers may be more likely to encounter patients who are more health conscious and have fewer concerns about the privacy of their genomic information. Genomic risk itself does not appear to be a primary determinant of sharing behavior among consumers.
direct-to-consumer; genomic testing; genetic risk assessment; disclosure of genetic results; consumer characteristics; personalized medicine
Individuals with anorexia nervosa (AN) and bulimia nervosa (BN) have alterations of measures of serotonin (5-HT) and dopamine (DA) function, which persist after long-term recovery and are associated with elevated harm avoidance (HA), a measure of anxiety and behavioral inhibition.
Based on theories that 5-HT is an aversive motivational system that may oppose a DA-related appetitive system, we explored interactions of positron emission tomography (PET) radioligand measures that reflect portions of these systems.
Twenty-seven individuals recovered (REC) from eating disorders (EDs) (7 AN-BN, 11 AN, 9 BN) and 9 control women (CW) were analyzed for correlations between [11C]McN5652 and [11C]raclopride binding.
There was a positive correlation between [11C]McN5652 non-displaceable binding potential (BPND) and [11C]raclopride BPND for the dorsal caudate (r(27) = .62; p < .001), antero-ventral striatum (r(27) = .55; p = .003), middle caudate (r(27) = .68; p < .001), ventral (r(27) = .64; p < .001) and dorsal putamen (r(27) = .42; p = .03). No significant correlations were found in CW. [11C]raclopride BPND, but not [11C]McN5652 BPND, was significantly related to HA in REC EDs. A linear regression analysis showed that the interaction between [11C]McN5652 BPND and [11C]raclopride BPND in the dorsal putamen significantly (b = 140.04; t (22) = 2.21; p = .04) predicted HA.
This is the first study using PET and the radioligands [11C]McN5652 and [11C]raclopride to show a direct relationship between 5-HT transporter and striatal DA D2/D3 receptor binding in humans, supporting the possibility that 5-HT and DA interactions contribute to HA behaviors in EDs.
anorexia nervosa; bulimia nervosa; positron emission tomography; dopamine; serotonin; harm avoidance
The determination of the ancestry and genetic backgrounds of the subjects in genetic and general epidemiology studies is a crucial component in the analysis of relevant outcomes or associations. Although there are many methods for differentiating ancestral subgroups among individuals based on genetic markers, only a few of these methods provide actual estimates of the fraction of an individual’s genome that is likely to be associated with different ancestral populations. We propose a method for assigning ancestry that works in stages to refine estimates of ancestral population contributions to individual genomes. The method leverages genotype data in the public domain obtained from individuals with known ancestries. Although we showcase the method in the assessment of ancestral genome proportions leveraging largely continental populations, the strategy can be used for assessing within-continent or more subtle ancestral origins with the appropriate data.
genetic ancestry; admixture; population genetics; admixture proportions
Written and verbal language are neurobehavioral traits vital to the development of communication skills. Unfortunately, disorders involving these traits—specifically reading disability (RD) and language impairment (LI)—are common and prevent affected individuals from developing adequate communication skills, leaving them at risk for adverse academic, socioeconomic, and psychiatric outcomes. Both RD and LI are complex traits that frequently co-occur, leading us to hypothesize that these disorders share genetic etiologies. To test this, we performed a genome wide association study on individuals affected with both RD and LI in the Avon Longitudinal Study of Parents and Children. The strongest associations were seen with markers in ZNF385D (OR=1.81, p=5.45 × 10−7) and COL4A2 (OR=1.71, p=7.59×10−7). Markers within NDST4 showed the strongest associations with LI individually (OR=1.827, p=1.40×10−7). We replicated association of ZNF385D using receptive vocabulary measures in the Pediatric Imaging Neurocognitive Genetics study (p=0.00245). We then used diffusion tensor imaging fiber tract volume data on 16 fiber tracts to examine the implications of replicated markers. ZNF385D was a predictor of overall fiber tract volumes in both hemispheres, as well as global brain volume. Here, we present evidence for ZNF385D as a candidate gene for RD and LI. The implication of transcription factor ZNF385D in RD and LI underscores the importance of transcriptional regulation in the development of higher order neurocognitive traits. Further study is necessary to discern target genes of ZNF385D and how it functions within neural development of fluent language.
ALSPAC; Language Impairment; Reading Disability; Dyslexia GWAS; ZNF385D; PING
The ongoing controversy surrounding direct-to-consumer (DTC) personal genomic tests intensified last year when the U.S. Government Accountability Office (GAO) released results of an undercover investigation of four companies that offer such testing. Among their findings, they reported that some of their donors received DNA-based predictions that conflicted with their actual medical histories. We aimed to more rigorously evaluate the relationship between DTC genomic risk estimates and self-reported disease by leveraging data from the Scripps Genomic Health Initiative (SGHI). We prospectively collected self-reported personal and family health history data for 3,416 individuals who went on to purchase a commercially available DTC genomic test. For 5 out of 15 total conditions studied, we found that risk estimates from the test were significantly associated with self-reported family and/or personal health history. The 5 conditions included Graves’ disease, Type 2 Diabetes, Lupus, Alzheimer’s disease, and Restless Leg Syndrome. To further investigate these findings, we ranked each of the 15 conditions based on published heritability estimates and conducted post-hoc power analyses based on the number of individuals in our sample who reported significant histories of each condition. We found that high heritability, coupled with high prevalence in our sample and thus adequate statistical power, explained the pattern of associations observed. Our study represents one of the first evaluations of the relationship between risk estimates from a commercially available DTC personal genomic test and self-reported health histories in the consumers of that test.
direct-to-consumer; genetic testing; genetic risk estimates; clinical validity; consumer genomics
There have been a number of recent successes in the use of whole genome sequencing and sophisticated bioinformatics techniques to identify pathogenic DNA sequence variants responsible for individual idiopathic congenital conditions. However, the success of this identification process is heavily influenced by the ancestry or genetic background of a patient with an idiopathic condition. This is so because potential pathogenic variants in a patient’s genome must be contrasted with variants in a reference set of genomes made up of other individuals’ genomes of the same ancestry as the patient. We explored the effect of ignoring the ancestries of both an individual patient and the individuals used to construct reference genomes. We pursued this exploration in two major steps. We first considered variation in the per-genome number and rates of likely functional derived (i.e., non-ancestral, based on the chimp genome) single nucleotide variants and small indels in 52 individual whole human genomes sampled from 10 different global populations. We took advantage of a suite of computational and bioinformatics techniques to predict the functional effect of over 24 million genomic variants, both coding and non-coding, across these genomes. We found that the typical human genome harbors ∼5.5–6.1 million total derived variants, of which ∼12,000 are likely to have a functional effect (∼5000 coding and ∼7000 non-coding). We also found that the rates of functional genotypes per the total number of genotypes in individual whole genomes differ dramatically between human populations. We then created tables showing how the use of comparator or reference genome panels comprised of genomes from individuals that do not have the same ancestral background as a patient can negatively impact pathogenic variant identification. Our results have important implications for clinical sequencing initiatives.
clinical sequencing; congenital disease; whole genome sequencing; population genetics
Multivariate distance matrix regression (MDMR) analysis is a statistical technique that allows researchers to relate P variables to an additional M factors collected on N individuals, where P ≫ N. The technique can be applied to a number of research settings involving high-dimensional data types such as DNA sequence data, gene expression microarray data, and imaging data. MDMR analysis involves computing the distance between all pairs of individuals with respect to P variables of interest and constructing an N × N matrix whose elements reflect these distances. Permutation tests can be used to test linear hypotheses that consider whether or not the M additional factors collected on the individuals can explain variation in the observed distances between and among the N individuals as reflected in the matrix. Despite its appeal and utility, properties of the statistics used in MDMR analysis have not been explored in detail. In this paper we consider the level accuracy and power of MDMR analysis assuming different distance measures and analysis settings. We also describe the utility of MDMR analysis in assessing hypotheses about the appropriate number of clusters arising from a cluster analysis.
regression analysis; multivariate analysis; distance matrix; simulation
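The MDMR procedure described in the abstract above (pairwise distances, an N × N matrix, and a permutation test of whether additional factors explain the distances) can be sketched for the simplest case of one quantitative predictor plus an intercept. The sketch assumes the pseudo-F form commonly used in distance-based regression, F = [tr(HGH)/(m−1)] / [tr((I−H)G(I−H))/(N−m)], where G is the Gower-centered matrix and H is the hat matrix. Whether this matches the exact statistic studied in the paper is an assumption, and all function names and data are hypothetical.

```python
import math
import random


def gower_center(dist):
    """Double-center A = -0.5 * d^2 to obtain the Gower matrix G."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in a]  # equals column means since A is symmetric
    grand = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def pseudo_f(g, x):
    """Pseudo-F for a design with an intercept and one predictor x (m = 2).

    Because G is centered, tr(HGH) reduces to xc' G xc / (xc' xc), where xc
    is the mean-centered predictor. Assumes x is not constant and that the
    residual term is positive.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    ss = sum(v * v for v in xc)
    explained = sum(xc[i] * g[i][j] * xc[j]
                    for i in range(n) for j in range(n)) / ss
    total = sum(g[i][i] for i in range(n))
    return explained / ((total - explained) / (n - 2))


def mdmr_permutation_test(dist, x, n_perm=999, seed=0):
    """Permute the predictor across individuals to get an empirical p-value."""
    rng = random.Random(seed)
    g = gower_center(dist)
    observed = pseudo_f(g, x)
    perm = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pseudo_f(g, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: 6 individuals, 3 variables each, one predictor.
    profiles = [[1.0, 0.2, 3.1], [1.2, 0.1, 2.9], [2.0, 1.1, 2.0],
                [2.3, 0.9, 1.8], [3.1, 2.0, 1.1], [2.9, 2.2, 0.7]]
    predictor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    d = [[math.dist(p, q) for q in profiles] for p in profiles]
    print(mdmr_permutation_test(d, predictor, n_perm=199))
```

With Euclidean distances on a single variable, this pseudo-F equals the ordinary regression F statistic, which gives a convenient sanity check; other distance measures change G and therefore the test's behavior.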
The antisaccade task is a widely used technique to measure failure of inhibition, an important cause of cognitive and clinical abnormalities found in schizophrenia. Although antisaccade performance, which reflects the ability to inhibit prepotent responses, is a putative schizophrenia endophenotype, researchers have not consistently reported the expected differences between first-degree relatives and comparison groups. Schizophrenia participants (n=219) from the large Consortium on the Genetics of Schizophrenia (COGS) sample (n=1078) demonstrated significant deficits on an overlap version of the antisaccade task compared to their first-degree relatives (n=443) and community comparison subjects (CCS; n=416). Although mean antisaccade performance of first-degree relatives was intermediate between schizophrenia participants and CCS, a linear mixed-effects model adjusting for group, site, age, and gender found no significant performance differences between the first-degree relatives and CCS. However, admixture analyses showed that two components best explained the distributions in all three groups, suggesting two distinct doses of an etiological factor. Given the significant heritability of antisaccade performance, the effect of a genetic polymorphism is one possible explanation of our results.
Oculomotor; Endophenotype; Antisaccade; Schizophrenia; Family
Human skull and brain morphology are strongly influenced by genetic factors, and skull size and shape vary worldwide. However, the relationship between specific brain morphology and genetically-determined ancestry is largely unknown.
We used two independent data sets to characterize variation in skull and brain morphology among individuals of European ancestry. The first data set is a historical sample of 1,170 male skulls with 37 shape measurements drawn from 27 European populations. The second data set includes 626 North American individuals of European ancestry participating in the Alzheimer's Disease Neuroimaging Initiative (ADNI) with magnetic resonance imaging, height and weight, neurological diagnosis, and genome-wide single nucleotide polymorphism (SNP) data.
We found that both skull and brain morphological variation exhibit a population-genetic fingerprint among individuals of European ancestry. This fingerprint shows a Northwest to Southeast gradient, is independent of body size, and involves frontotemporal cortical regions.
Our findings are consistent with prior evidence for gene flow in Europe due to historical population movements and indicate that genetic background should be considered in studies seeking to identify genes involved in human cortical development and neuropsychiatric disease.
Biological anthropology; Cortex; Craniometry; Genetic drift; Imaging genomics; Neuroimaging; Population genetics
Hypertension is a common hereditary syndrome with unclear pathogenesis. Chromogranin A (Chga), which catalyzes formation and cargo storage of regulated secretory granules in neuroendocrine cells, contributes to blood pressure homeostasis centrally and peripherally. Elevated Chga occurs in spontaneously hypertensive rat (SHR) adrenal glands and plasma, but central expression is unexplored. In this report, we measured SHR and Wistar–Kyoto rat (control) Chga expression in central and peripheral nervous systems, and found Chga protein to be decreased in the SHR brainstem, yet increased in the adrenal and the plasma. By re-sequencing, we systematically identified five promoter, two coding and one 3′-untranslated region (3′-UTR) polymorphism at the SHR (versus WKY or BN) Chga locus. Using HXB/BXH recombinant inbred (RI) strain linkage and correlations, we demonstrated genetic determination of Chga expression in SHR, including a cis-quantitative trait locus (QTL) (i.e. at the Chga locus), and such expression influenced biochemical determinants of blood pressure, including a cascade of catecholamine biosynthetic enzymes, catecholamines themselves and steroids. Luciferase reporter assays demonstrated that the 3′-UTR polymorphism (which disrupts a microRNA miR-22 motif) and promoter polymorphisms altered gene expression consistent with the decline in SHR central Chga expression. Coding region polymorphisms did not account for changes in Chga expression or function. Thus, we hypothesized that the 3′-UTR and promoter mutations lead to dysregulation (diminution) of Chga in brainstem cardiovascular control nuclei, ultimately contributing to the pathogenesis of hypertension in SHR. Accordingly, we demonstrated that in vivo administration of miR-22 antagomir to SHR causes substantial (∼18 mmHg) reductions in blood pressure, opening a novel therapeutic avenue for hypertension.
Cortical thickness is a highly heritable structural brain measurement and reduced thickness has been associated with both schizophrenia and bipolar disorder as well as decreased cognitive performance among healthy controls. Identifying genes that contribute to variation in cortical thickness provides a path to elucidate some of the biological mechanisms underlying these diseases as well as general cognitive abilities.
To identify common genetic variants that affect cortical thickness in schizophrenia, bipolar disorder, and controls and secondarily to test these variants for association with cognitive performance.
597,198 single nucleotide polymorphisms (SNPs) were tested for association with average cortical thickness in a genome-wide association study (GWAS). Significantly associated SNPs were tested for their effect on several measures of cognitive performance.
Four major hospitals in Oslo, Norway.
The GWAS included controls (n = 181) and individuals with DSM-IV diagnosed schizophrenia spectrum disorder (n = 94), bipolar spectrum disorder (n = 97), and other psychotic and affective disorders (n = 49). The follow-up cognitive study included an additional 622 cases and controls.
Main Outcome Measures
Cortical thickness measured with magnetic resonance imaging and cognitive performance as assessed by several neuropsychological tests.
Two closely linked genetic variants (rs4906844 and rs11633924) within the Prader-Willi/Angelman syndrome region on chromosome 15q12 showed genome-wide significant association (p = 1.08 × 10−8) with average cortical thickness as well as modest association with cognitive performance (p = 0.028) specifically among subjects diagnosed with schizophrenia.
This is the first GWAS to identify a common genetic variant that contributes to the heritable reduction of cortical thickness in schizophrenia. These results highlight the utility of cortical thickness as an intermediate phenotype for neuropsychiatric diseases. Future independent replication studies are required to confirm these findings.
African-American (AA) women have earlier menarche on average than women of European ancestry (EA), and earlier menarche is a risk factor for obesity and type 2 diabetes among other chronic diseases. Identification of common genetic variants associated with age at menarche has a potential value in pointing to the genetic pathways underlying chronic disease risk, yet comprehensive genome-wide studies of age at menarche are lacking for AA women. In this study, we tested the genome-wide association of self-reported age at menarche with common single-nucleotide polymorphisms (SNPs) in a total of 18 089 AA women in 15 studies using an additive genetic linear regression model, adjusting for year of birth and population stratification, followed by inverse-variance weighted meta-analysis (Stage 1). Top meta-analysis results were then tested in an independent sample of 2850 women (Stage 2). First, while no SNP passed the pre-specified P < 5 × 10−8 threshold for significance in Stage 1, suggestive associations were found for variants near FLRT2 and PIK3R1, and conditional analysis identified two independent SNPs (rs339978 and rs980000) in or near RORA, strengthening the support for this suggestive locus identified in EA women. Secondly, an investigation of SNPs in 42 previously identified menarche loci in EA women demonstrated that 25 (60%) of them contained variants significantly associated with menarche in AA women. The findings provide the first evidence of cross-ethnic generalization of menarche loci identified to date, and suggest a number of novel biological links to menarche timing in AA women.
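The inverse-variance weighted meta-analysis mentioned in the abstract above combines per-study effect estimates for one SNP by weighting each estimate by the reciprocal of its squared standard error. A minimal fixed-effect sketch follows. The function name and the per-study numbers are hypothetical, and the abstract does not describe the software or any further corrections the authors applied.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis for one SNP.

    betas: per-study effect estimates (e.g. years of menarche age per allele)
    ses:   per-study standard errors
    Returns (pooled beta, pooled SE, z score, two-sided p-value).
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical per-study estimates for a single SNP.
    study_betas = [0.08, 0.12, 0.05]
    study_ses = [0.04, 0.06, 0.03]
    print(inverse_variance_meta(study_betas, study_ses))
```

More precise studies (smaller standard errors) dominate the pooled estimate, and the pooled standard error is never larger than that of the most precise study.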
The enormous advances in genetics and genomics of the past decade have the potential to revolutionize health care, including mental health care, and bring about a system predominantly characterized by the practice of genomic and personalized medicine. We briefly review the history of genetics and genomics and present heritability estimates for major chronic diseases of aging and neuropsychiatric disorders. We then assess the extent to which the results of genetic and genomic studies are currently being leveraged clinically for disease treatment and prevention and identify priority research areas in which further work is needed. Pharmacogenomics has emerged as one area of genomics that already has had notable impacts on disease treatment and the practice of medicine. However, little evidence is available for the clinical validity and utility of predictive testing based on genomic information, and this lack of evidence has, to some extent, hindered broader-scale preventive efforts for common, complex diseases. Furthermore, although other disease areas have had greater success in identifying genetic factors responsible for various conditions, progress in identifying the genetic basis of neuropsychiatric diseases has lagged behind. We review social, economic, and policy issues relevant to genomic medicine, and find that a new model of health care based on proactive and preventive health planning and individualized treatment will require major advances in health care policy and administration. Specifically, incentives for relevant stakeholders are critical, as are realignment of incentives and education initiatives for physicians, and updates to pertinent legislation. Moreover, the translational behavioral and public health research necessary for fully integrating genomics into health care is lacking, and further work in these areas is needed.
In short, while the pace of advances in genetic and genomic science and technology has been rapid, more work is needed to fully realize the potential for impacting disease treatment and prevention generally, and mental health specifically.
genomics; genetic testing; genetic risk assessment; public health genomics; pharmacogenomics
Available statistical preprocessing or quality control analysis tools for gene expression microarray datasets are known to greatly affect downstream data analysis, especially when degraded samples, unique tissue samples, or novel expression assays are used. It is therefore important to assess the validity and impact of the assumptions built into preprocessing schemes for a dataset. We developed and assessed a data preprocessing strategy for use with the Illumina DASL-based gene expression assay with partially degraded postmortem prefrontal cortex samples. The samples were obtained from individuals with autism as part of an investigation of the pathogenic factors contributing to autism. Using statistical analysis methods and metrics such as those associated with multivariate distance matrix regression and mean inter-array correlation, we developed a DASL-based assay gene expression preprocessing pipeline to accommodate and detect problems with microarray-based gene expression values obtained with degraded brain samples. Key steps in the pipeline included outlier exclusion, data transformation and normalization, and batch effect and covariate corrections. Our goal was to produce a clean dataset for subsequent downstream differential expression analysis. We ultimately settled on available transformation and normalization algorithms in the R/Bioconductor package lumi based on an assessment of their use in various combinations. A log2-transformed, quantile-normalized, and batch and seizure-corrected procedure was likely the most appropriate for our data. We empirically tested different components of our proposed preprocessing strategy and believe that our results suggest that a preprocessing strategy that effectively identifies outliers, normalizes the data, and corrects for batch effects can be applied to all studies, even those pursued with degraded samples.
gene expression; microarray; data preprocessing; quality control
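As an illustration of the transformation, normalization, and array-level quality metrics named in the preceding abstract, the following is a minimal sketch in plain Python. It is not the lumi implementation and not the authors' pipeline: the function names, the `offset` parameter, and the tie-handling rule (ties ranked in order of appearance) are assumptions made for illustration. Rows are taken to be probes and columns to be arrays.

```python
import math


def log2_transform(matrix, offset=1.0):
    """Return log2(value + offset) for every entry.

    The offset is an illustrative assumption that avoids log2(0).
    """
    return [[math.log2(v + offset) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every array (column) the same empirical distribution.

    The reference distribution is the mean of the sorted columns; each
    value is replaced by the reference value at its within-column rank.
    Ties are ranked in order of appearance (a simplification).
    """
    n_probes, n_arrays = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_arrays)]
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[i] for col in sorted_cols) / n_arrays for i in range(n_probes)
    ]
    out = [[0.0] * n_arrays for _ in range(n_probes)]
    for j, col in enumerate(columns):
        # sorted() is stable, so tied values keep their original order.
        order = sorted(range(n_probes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            out[i][j] = reference[rank]
    return out


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mean_inter_array_correlation(matrix):
    """For each array, the mean Pearson correlation with all other arrays."""
    n_arrays = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_arrays)]
    return [
        sum(pearson(columns[j], columns[k]) for k in range(n_arrays) if k != j)
        / (n_arrays - 1)
        for j in range(n_arrays)
    ]


# Toy intensities: 4 probes (rows) by 3 arrays (columns), made-up values.
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(log2_transform(raw))
iac = mean_inter_array_correlation(normalized)
```

An array whose mean inter-array correlation is well below that of the others would be a candidate outlier; the cutoff for "well below" is a study-specific choice that this sketch does not set. Batch and covariate correction are not shown here.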
The Marine Resiliency Study (MRS) is a prospective study of factors predictive of posttraumatic stress disorder (PTSD) among approximately 2,600 Marines in 4 battalions deployed to Iraq or Afghanistan. We describe the MRS design and predeployment participant characteristics. Starting in 2008, our research team conducted structured clinical interviews on Marine bases and collected data 4 times: at predeployment and at 1 week, 3 months, and 6 months postdeployment. Integrated with these data are medical and career histories from the Career History Archival Medical and Personnel System (CHAMPS) database. The CHAMPS database showed that 7.4% of the Marines enrolled in MRS had at least 1 mental health diagnosis. Of enrolled Marines, approximately half (51.3%) had prior deployments. We found a moderate positive relationship between deployment history and PTSD prevalence in these baseline data.
Plasmodium vivax infects a hundred million people annually and endangers 40% of the world's population. Unlike Plasmodium falciparum, P. vivax parasites can persist as a dormant stage in the liver, known as the hypnozoite, and these dormant forms can cause malaria relapses months or years after the initial mosquito bite. Here we analyze whole genome sequencing data from parasites in the blood of a patient who experienced consecutive P. vivax relapses over 33 months in a non-endemic country. By analyzing patterns of identity, read coverage, and the presence or absence of minor alleles in the initial polyclonal and subsequent monoclonal infections, we show that the parasites in the three infections are likely meiotic siblings. We infer that these siblings are descended from a single tetrad-like form that developed in the infecting mosquito midgut shortly after fertilization. In this natural cross we find the recombination rate for P. vivax to be 10 kb per centimorgan and we further observe areas of disequilibrium surrounding major drug resistance genes. Our data provide new strategies for studying multiclonal infections, which are common in all types of infectious diseases, and for distinguishing P. vivax relapses from reinfections in malaria endemic regions. This work provides a theoretical foundation for studies that aim to determine if new or existing drugs can provide a radical cure of P. vivax malaria.
Plasmodium vivax is capable of remaining dormant in the human liver for months to years after an initial infection, creating an asymptomatic human reservoir. This unique aspect of parasite biology makes eliminating P. vivax distinctly different from P. falciparum elimination, and yet very little is known about this dormant parasite stage. Lack of knowledge about the dormant liver stage prevents the creation of new drugs and public health interventions directed at P. vivax. In order to better understand this particular parasite stage, we used whole genome sequencing to analyze three sequential P. vivax infections, two of which could be definitively categorized as having arisen from dormant liver stages. Our whole genome sequencing data demonstrate that dormant liver stage parasites are closely related yet not, as had previously been postulated, identical. These data highlight the need for a new paradigm to investigate P. vivax dormant liver stages in order to design the next generation of P. vivax drugs and effective global health interventions.