The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations – changes specific to a tumor and not within an individual’s germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific.
We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity.
We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic.
Cancer genomics; Next generation sequencing; Somatic mutation detection
We developed a generalized framework for multiplexed resequencing of targeted regions of the human genome on the Illumina Genome Analyzer using degenerate indexed DNA sequence barcodes ligated to fragmented DNA prior to sequencing. Using this method, the DNA of multiple HapMap individuals was simultaneously sequenced at several ENCODE (ENCyclopedia of DNA Elements) regions. We then evaluated the use of Bayes factors for discovering and genotyping polymorphisms from aligned sequenced reads. If we required that predicted polymorphisms be either previously identified by dbSNP or be visually evident upon reinspection of archived ENCODE traces, we observed a false-positive rate of 11.3% using strict thresholds (Ks>1,000) for predicting variants and 69.6% for lax thresholds (Ks>10). Conversely, false-negative rates ranged from 10.8% to 90.8%, with those at stricter cut-offs occurring at lower coverage (< 10 aligned reads). These results suggest that >90% of genetic variants are discoverable using multiplexed sequencing provided sufficient coverage at the polymorphic base.
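The use of a Bayes factor threshold for calling a polymorphism from aligned reads can be illustrated with a minimal sketch. The abstract does not state the underlying likelihood model, so everything below is an assumption made for illustration: a binomial read model, a fixed per-base error rate of 0.01, and a comparison of only two hypotheses (heterozygous versus homozygous reference). The function names are hypothetical and are not taken from the study.

```python
from math import comb


def binomial_likelihood(alt_reads, depth, alt_fraction):
    """P(alt_reads | depth) under a binomial model with the given alt-allele fraction."""
    return (comb(depth, alt_reads)
            * alt_fraction ** alt_reads
            * (1.0 - alt_fraction) ** (depth - alt_reads))


def bayes_factor_het_vs_ref(alt_reads, depth, error_rate=0.01):
    """Ratio of the heterozygous likelihood to the homozygous-reference likelihood.

    Assumption: under homozygous reference, alt reads arise only from
    sequencing error at rate `error_rate` (an illustrative value).
    """
    het = binomial_likelihood(alt_reads, depth, 0.5)
    ref = binomial_likelihood(alt_reads, depth, error_rate)
    return het / ref


def call_variant(alt_reads, depth, threshold, error_rate=0.01):
    """Call a variant when the Bayes factor exceeds the chosen threshold."""
    return bayes_factor_het_vs_ref(alt_reads, depth, error_rate) > threshold


# Low coverage: 2 of 4 reads support the alternate allele.
low_cov_lax = call_variant(2, 4, threshold=10)
low_cov_strict = call_variant(2, 4, threshold=1000)

# Higher coverage: 10 of 20 reads support the alternate allele.
high_cov_strict = call_variant(10, 20, threshold=1000)
```

Under these assumptions, a heterozygous site covered by only four reads clears the lax threshold but not the strict one, which is consistent with the observation that false negatives at strict cut-offs occur mainly at low coverage.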
Pancreatic adenocarcinoma (PAC) is among the most lethal malignancies. While research has implicated multiple genes in disease pathogenesis, identification of therapeutic leads has been difficult and the majority of currently available therapies provide only marginal benefit. To address this issue, our goal was to genomically characterize individual PAC patients to understand the range of aberrations that are occurring in each tumor. Because our understanding of PAC tumorigenesis is limited, evaluation of separate cases may reveal aberrations that are less common but may provide relevant information on the disease, or that may represent viable therapeutic targets for the patient. We used next generation sequencing to assess global somatic events across 3 PAC patients to characterize each patient and to identify potential targets. This study is the first to report whole genome sequencing (WGS) findings in paired tumor/normal samples collected from 3 separate PAC patients. We generated on average 132 billion mappable bases across all patients using WGS, and identified 142 somatic coding events including point mutations, insertions/deletions, and chromosomal copy number variants. We did not identify any significant somatic translocation events. We also performed RNA sequencing on 2 of these patients' tumors for which tumor RNA was available to evaluate expression changes that may be associated with somatic events, and generated over 100 million mapped reads for each patient. We further performed pathway analysis of all sequencing data to identify processes that may be the most heavily impacted by somatic and expression alterations. As expected, the KRAS signaling pathway was the most heavily impacted pathway (P<0.05), along with tumor-stroma interactions and tumor suppressive pathways.
While sequencing of more patients is needed, the high resolution genomic and transcriptomic information we have acquired here provides valuable information on the molecular composition of PAC and helps to establish a foundation for improved therapeutic selection.
Human induced pluripotent stem cells (iPSCs) have become an intriguing approach for neurological disease modeling, because neural lineage-specific cell types that retain the donors' complex genetics can be established in vitro. The statistical power of these iPSC-based models, however, is dependent on accurate diagnoses of the somatic cell donors; unfortunately, many neurodegenerative diseases are commonly misdiagnosed in live human subjects. Postmortem histopathological examination of a donor's brain, combined with premortem clinical criteria, is often the most robust approach to correctly classify an individual as a disease-specific case or unaffected control. In this study, we describe iPSCs generated from a skin biopsy collected postmortem during the rapid autopsy of a 75-year-old male, whole body donor, defined as an unaffected neurological control by both clinical and histopathological criteria. These iPSCs were established in a feeder-free system by lentiviral transduction of the Yamanaka factors, Oct3/4, Sox2, Klf4, and c-Myc. Selected iPSC clones expressed both nuclear and surface antigens recognized as pluripotency markers of human embryonic stem cells (hESCs) and were able to differentiate in vitro into neurons and glia. Statistical analysis also demonstrated that fibroblast proliferation was significantly affected by biopsy site, but not donor age (within an elderly cohort). These results provide evidence that autopsy donor-derived fibroblasts can be successfully reprogrammed into iPSCs, and may provide an advantageous approach for generating iPSC-based neurological disease models.
induced pluripotent stem cells; genetic disease models; diagnostics; neurodegenerative diseases; postmortem; autopsy; neural differentiation
Recent advances in the treatment of cancer have focused on targeting genomic aberrations with selective therapeutic agents. In rare tumors, where large-scale clinical trials are daunting, this targeted genomic approach offers a new perspective and hope for improved treatments. Cancers of the ampulla of Vater are rare tumors that comprise only about 0.2% of gastrointestinal cancers. Consequently, they are often treated as either distal common bile duct or pancreatic cancers.
We used whole genome sequencing to analyze DNA from a resected cancer of the ampulla of Vater and whole blood DNA from a 63-year-old man who underwent a pancreaticoduodenectomy, achieving 37× and 40× coverage, respectively. We determined somatic mutations and structural alterations.
We identified relevant aberrations, including deleterious mutations of KRAS and SMAD4 as well as a homozygous focal deletion of the PTEN tumor suppressor gene. These findings suggest that these tumors have a distinct oncogenesis from either common bile duct cancer or pancreatic cancer. Furthermore, this combination of genomic aberrations suggests a therapeutic context for dual mTOR/PI3K inhibition.
Whole genome sequencing can elucidate an oncogenic context and expose potential therapeutic vulnerabilities in rare cancers.
Access to genetic data across studies is an important aspect of identifying new genetic associations through genome-wide association studies (GWAS). Meta-analysis across multiple GWAS with combined cohort sizes of tens of thousands of individuals often uncovers many more genome-wide associated loci than the original individual studies, which emphasizes the importance of tools and mechanisms for data sharing. However, even sharing summary-level data, such as allele frequencies, inherently carries some degree of privacy risk to study participants. Here we discuss mechanisms and resources for sharing data from GWAS, particularly focusing on approaches for assessing and quantifying privacy risks to participants from sharing of summary-level data.
Amyloid imaging with [11C]Pittsburgh Compound-B (PiB) provides in vivo data on plaque deposition in those with, or at risk for, Alzheimer’s disease (AD). We performed a gene-based association analysis of 15 quality-controlled amyloid-pathway associated candidate genes in 103 Alzheimer’s Disease Neuroimaging Initiative participants. The mean normalized PiB uptake value across four brain regions known to have amyloid deposition in AD was used as a quantitative phenotype. The minor allele of an intronic SNP within DHCR24 was identified and associated with a lower average PiB uptake. Further investigation at whole-brain voxel-wise level indicated that non-carriers of the minor allele had higher PiB uptake in frontal regions compared to carriers. DHCR24 has been previously shown to confer resistance against beta-amyloid and oxidative stress-induced apoptosis, thus our findings support a neuroprotective role. Pathway-based genetic analysis of targeted molecular imaging phenotypes appears promising to help elucidate disease pathophysiology and identify potential therapeutic targets.
Alzheimer’s disease; ADNI; Pathway-based gene analysis; PiB-PET; Endophenotype; Voxel-based analysis
Copy number variants (CNVs) are DNA sequence alterations, resulting in gains (duplications) and losses (deletions) of genomic segments. They often overlap genes and may play important roles in disease. Only one published study has examined CNVs in late-onset Alzheimer's disease (AD), and none have examined mild cognitive impairment (MCI). CNV calls were generated in 288 AD, 183 MCI, and 184 healthy control (HC) non-Hispanic Caucasian Alzheimer's Disease Neuroimaging Initiative participants. After quality control, 222 AD, 136 MCI, and 143 HC participants were entered into case/control association analyses, including candidate gene and whole genome approaches. Although no excess CNV burden was observed in cases (AD and/or MCI) relative to controls (HC), gene-based analyses revealed CNVs overlapping the candidate gene CHRFAM7A, as well as CSMD1, SLC35F2, HNRNPCL1, NRXN1, and ERBB4 regions, only in cases. Replication in larger samples is important, after which regions detected here may be promising targets for resequencing.
We recently reported evidence for an association between the individual variation in normal human episodic memory and a common variant of the KIBRA gene, KIBRA rs17070145 (T-allele). Since memory impairment is a cardinal clinical feature of Alzheimer’s disease (AD), we investigated the possibility of an association between the KIBRA gene and AD using data from neuronal gene expression, brain imaging studies, and genetic association tests. KIBRA was significantly over-expressed and 3 of its 4 known binding partners under-expressed in AD-affected hippocampal, posterior cingulate and temporal cortex regions (p<0.010, corrected) in a study of laser capture microdissected neurons. Using positron emission tomography in a cohort of cognitively normal, late-middle-aged persons genotyped for KIBRA rs17070145, KIBRA T non-carriers exhibited lower glucose metabolism than did carriers in posterior cingulate and precuneus brain regions (P<0.001, uncorrected). Lastly, non-carriers of the KIBRA rs17070145 T-allele had increased risk of late-onset AD in an association study of 702 neuropathologically verified expired subjects (p=0.034; OR=1.29) and in a combined analysis of 1026 additional living and expired subjects (p=0.039; OR=1.26). Our findings suggest that KIBRA is associated with both individual variation in normal episodic memory and predisposition to AD.
genetics; imaging; expression profiling; memory
Family history is a significant risk factor for prostate cancer, although the molecular basis for this association is poorly understood. Linkage studies have implicated chromosome 17q21-22 as a possible location of a prostate-cancer susceptibility gene.
We screened more than 200 genes in the 17q21-22 region by sequencing germline DNA from 94 unrelated patients with prostate cancer from families selected for linkage to the candidate region. We tested family members, additional case subjects, and control subjects to characterize the frequency of the identified mutations.
Probands from four families were discovered to have a rare but recurrent mutation (G84E) in HOXB13 (rs138213197), a homeobox transcription factor gene that is important in prostate development. All 18 men with prostate cancer and available DNA in these four families carried the mutation. The carrier rate of the G84E mutation was increased by a factor of approximately 20 in 5083 unrelated subjects of European descent who had prostate cancer, with the mutation found in 72 subjects (1.4%), as compared with 1 in 1401 control subjects (0.1%) (P = 8.5×10−7). The mutation was significantly more common in men with early-onset, familial prostate cancer (3.1%) than in those with late-onset, nonfamilial prostate cancer (0.6%) (P = 2.0×10−6).
The novel HOXB13 G84E variant is associated with a significantly increased risk of hereditary prostate cancer. Although the variant accounts for a small fraction of all prostate cancers, this finding has implications for prostate-cancer risk assessment and may provide new mechanistic insights into this common cancer. (Funded by the National Institutes of Health and others.)
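The reported carrier rates and the approximately 20-fold increase can be checked directly from the counts given above (72 carriers among 5083 case subjects, 1 carrier among 1401 control subjects). The short sketch below only redoes that arithmetic; the variable names are illustrative, and the odds ratio is an additional derived quantity that the abstract does not itself report.

```python
# Counts taken from the abstract.
case_carriers, case_total = 72, 5083
control_carriers, control_total = 1, 1401

# Carrier rates in each group.
case_rate = case_carriers / case_total
control_rate = control_carriers / control_total

# Ratio of carrier rates (the "factor of approximately 20").
rate_ratio = case_rate / control_rate

# Odds ratio computed from the same 2x2 table (derived here, not reported in the abstract).
odds_ratio = ((case_carriers / (case_total - case_carriers))
              / (control_carriers / (control_total - control_carriers)))

print(f"case carrier rate:    {case_rate:.1%}")
print(f"control carrier rate: {control_rate:.1%}")
print(f"rate ratio:           {rate_ratio:.1f}")
print(f"odds ratio:           {odds_ratio:.1f}")
```

With a single carrier among the controls, both ratios are sensitive to that one observation, so they are best read as order-of-magnitude estimates.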
We tested whether telomere length is altered in the brains of patients diagnosed with major depression (MD), bipolar disorder (BD) and schizophrenia (SZ) by measuring mean telomere length (mTL) with real-time PCR. The samples are cerebellar gray matter from 46 SZ, 46 BD, and 15 MD patients, and 48 healthy controls. We found no difference in mTL between SZ and controls, BD and controls, MD and controls, or all cases and controls; no correlation between mTL and age was observed either. This suggests that brain gray matter is unlikely to be related to the telomere length shortening reported in blood of psychiatric patients. White matter deserves further investigation as it has been reported to have a different mTL dynamic from gray matter. Since mTL has been reported to be a heritable quantitative trait, we also carried out genome-wide mapping of genetic factors for mTL, treating mTL as a quantitative trait. No association survived correction for multiple testing for the number of SNPs studied. The previously reported rs2630578 (BICD1) association was not replicated. This suggests that telomere length of cerebellar gray matter is determined by multiple loci with “weak effects.”
Mean telomere length; Bipolar disorder; Major depression; Schizophrenia; Mapping; Quantitative trait
A causal role of mutations in multiple general transcription factors in neurodevelopmental disorders including autism suggested that alterations in global levels of gene expression regulation might also relate to disease risk in sporadic cases of autism. This premise can be tested by evaluating changes in the overall distribution of gene expression levels. For instance, in mice, variability in hippocampal-dependent behaviors was associated with variability in the pattern of the overall distribution of gene expression levels, as assessed by variance in the distribution of gene expression levels in the hippocampus. We hypothesized that a similar change in variance might be found in children with autism. Gene expression microarrays covering greater than 47,000 unique RNA transcripts were performed on RNA from peripheral blood lymphocytes (PBL) of children with autism (n = 82) and controls (n = 64). Variance in the distribution of gene expression levels from each microarray was compared between groups of children. Also tested was whether a risk factor for autism, increased paternal age, was associated with variance. A decrease in the variance in the distribution of gene expression levels in PBL was associated with the diagnosis of autism and a risk factor for autism, increased paternal age. Traditional approaches to microarray analysis of gene expression suggested a possible mechanism for decreased variance in gene expression. Gene expression pathways involved in transcriptional regulation were down-regulated in the blood of children with autism and children of older fathers. Thus, results from global and gene specific approaches to studying microarray data were complementary and supported the hypothesis that alterations at the global level of gene expression regulation are related to autism and increased paternal age.
Global regulation of transcription, thus, represents a possible point of convergence for multiple etiologies of autism and other neurodevelopmental disorders.
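The global measure described above, the variance of the distribution of expression levels within each microarray, can be sketched as follows. This is a minimal illustration on toy numbers invented for the example, not study data; the function names are hypothetical, and the abstract does not specify the statistical test used to compare groups, so the sketch stops at the group means.

```python
from statistics import mean, variance


def per_array_variance(arrays):
    """Sample variance of the expression levels on each array (one value per subject)."""
    return [variance(levels) for levels in arrays]


def mean_variance_by_group(case_arrays, control_arrays):
    """Average the per-array variances within each group."""
    return (mean(per_array_variance(case_arrays)),
            mean(per_array_variance(control_arrays)))


# Toy data: each inner list stands in for the expression levels on one array.
autism_arrays = [[6.0, 7.0, 8.0], [6.5, 7.0, 7.5]]
control_arrays = [[5.0, 7.0, 9.0], [4.0, 7.0, 10.0]]

case_mean_var, control_mean_var = mean_variance_by_group(autism_arrays, control_arrays)
```

In a real analysis each array would contribute tens of thousands of expression values, and the per-array variances would be compared with a formal test and adjusted for covariates such as paternal age.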
A genome-wide, whole brain approach to investigate genetic effects on neuroimaging phenotypes for identifying quantitative trait loci is described. The Alzheimer's Disease Neuroimaging Initiative 1.5 T MRI and genetic dataset was investigated using voxel-based morphometry (VBM) and FreeSurfer parcellation followed by genome-wide association studies (GWAS). One hundred forty-two measures of grey matter (GM) density, volume, and cortical thickness were extracted from baseline scans. GWAS, using PLINK, were performed on each phenotype using quality-controlled genotype and scan data including 530,992 of 620,903 single nucleotide polymorphisms (SNPs) and 733 of 818 participants (175 AD, 354 amnestic mild cognitive impairment, MCI, and 204 healthy controls, HC). Hierarchical clustering and heat maps were used to analyze the GWAS results and associations are reported at two significance thresholds (p<10−7 and p<10−6). As expected, SNPs in the APOE and TOMM40 genes were confirmed as markers strongly associated with multiple brain regions. Other top SNPs were proximal to the EPHA4, TP63 and NXPH1 genes. Detailed image analyses of rs6463843 (flanking NXPH1) revealed reduced global and regional GM density across diagnostic groups in TT relative to GG homozygotes. Interaction analysis indicated that AD patients homozygous for the T allele showed differential vulnerability to right hippocampal GM density loss. NXPH1 codes for a protein implicated in promotion of adhesion between dendrites and axons, a key factor in synaptic integrity, the loss of which is a hallmark of AD. A genome-wide, whole brain search strategy has the potential to reveal novel candidate genes and loci warranting further investigation and replication.
The structure of the human brain is highly heritable, and is thought to be influenced by many common genetic variants, many of which are currently unknown. Recent advances in neuroimaging and genetics have allowed collection of both highly detailed structural brain scans and genome-wide genotype information. This wealth of information presents a new opportunity to find the genes influencing brain structure. Here we explore the relation between 448,293 single nucleotide polymorphisms in each of 31,622 voxels of the entire brain across 740 elderly subjects (mean age±s.d.: 75.52±6.82 years; 438 male) including subjects with Alzheimer's disease, Mild Cognitive Impairment, and healthy elderly controls from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We used tensor-based morphometry to measure individual differences in brain structure at the voxel level relative to a study-specific template based on healthy elderly subjects. We then conducted a genome-wide association at each voxel to identify genetic variants of interest. By studying only the most associated variant at each voxel, we developed a novel method to address the multiple comparisons problem and computational burden associated with the unprecedented amount of data. No variant survived the strict significance criterion, but several genes worthy of further exploration were identified, including CSMD2 and CADPS2. These genes have high relevance to brain structure. This is the first voxelwise genome wide association study to our knowledge, and offers a novel method to discover genetic influences on brain structure.
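The idea of retaining only the most associated variant at each voxel can be sketched as below. The data structure and function names are hypothetical. The adjustment shown, which treats the smallest of n independent p-values as the minimum of n uniform draws, is one standard way to account for having selected the minimum; it is an assumption here and may differ from the correction actually used in the study, particularly because neighboring SNPs are not independent.

```python
def top_variant_per_voxel(pvalues):
    """Keep the SNP with the smallest p-value at each voxel.

    `pvalues` maps a voxel id to a dict of {snp id: p-value}.
    """
    top = {}
    for voxel, by_snp in pvalues.items():
        best_snp = min(by_snp, key=by_snp.get)
        top[voxel] = (best_snp, by_snp[best_snp])
    return top


def min_p_adjusted(p_min, n_tests):
    """Probability that the minimum of n independent uniform p-values is <= p_min."""
    return 1.0 - (1.0 - p_min) ** n_tests


# Toy input: two voxels, three SNPs each (values invented for illustration).
toy_pvalues = {
    "voxel_1": {"snp_a": 0.20, "snp_b": 0.01, "snp_c": 0.50},
    "voxel_2": {"snp_a": 0.03, "snp_b": 0.40, "snp_c": 0.90},
}

top_hits = top_variant_per_voxel(toy_pvalues)
adjusted = {voxel: min_p_adjusted(p, n_tests=3) for voxel, (_, p) in top_hits.items()}
```

Storing one variant per voxel reduces the output from (number of SNPs × number of voxels) statistics to a single map over voxels, which is what makes the downstream multiple comparisons step tractable.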
The role of the Alzheimer’s Disease Neuroimaging Initiative Genetics Core is to facilitate the investigation of genetic influences on disease onset and trajectory as reflected in structural, functional, and molecular imaging changes; fluid biomarkers; and cognitive status. Major goals include (1) blood sample processing, genotyping, and dissemination, (2) genome-wide association studies (GWAS) of longitudinal phenotypic data, and (3) providing a central resource, point of contact and planning group for genetics within Alzheimer’s Disease Neuroimaging Initiative. Genome-wide array data have been publicly released and updated, and several neuroimaging GWAS have recently been reported examining baseline magnetic resonance imaging measures as quantitative phenotypes. Other preliminary investigations include copy number variation in mild cognitive impairment and Alzheimer’s disease and GWAS of baseline cerebrospinal fluid biomarkers and longitudinal changes on magnetic resonance imaging. Blood collection for RNA studies is a new direction. Genetic studies of longitudinal phenotypes hold promise for elucidating disease mechanisms and risk, development of therapeutic strategies, and refining selection criteria for clinical trials.
Alzheimer’s Disease Neuroimaging Initiative (ADNI); Alzheimer’s disease; Mild cognitive impairment (MCI); Genome-wide association studies (GWAS); Copy number variation (CNV); Magnetic resonance imaging (MRI); Cerebrospinal fluid (CSF)
As a first step in analyzing high-throughput data in genome-wide studies, several algorithms are available to identify and prioritize candidate lists for downstream fine-mapping. The prioritized candidates could be differentially expressed genes, aberrations in comparative genomics hybridization studies, or single nucleotide polymorphisms (SNPs) in association studies. Different analysis algorithms are subject to various experimental artifacts and analytical features that lead to different candidate lists. However, little research has been carried out to theoretically quantify the consensus between different candidate lists and to compare the study-specific accuracy of the analytical methods based on a known reference candidate list. Within the context of genome-wide studies, we propose a generic mathematical framework to statistically compare ranked lists of candidates from different algorithms with each other or, if available, with a reference candidate list. To cope with the growing need for intuitive visualization of high-throughput data in genome-wide studies, we describe a complementary customizable visualization tool. As a case study, we demonstrate application of our framework to the comparison and visualization of candidate lists generated in a DNA-pooling based genome-wide association study of CEPH data in the HapMap project, where prior knowledge from individual genotyping can be used to generate a true reference candidate list. The results provide a theoretical basis to compare the accuracy of various methods and to identify redundant methods, thus providing guidance for selecting the most suitable analysis method in genome-wide studies.
genome-wide association studies; candidate lists
We conducted genome-wide association studies of non-Hodgkin lymphoma using Illumina HumanHap550 BeadChips to identify subtype-specific associations in follicular, diffuse large B-cell and chronic lymphocytic leukemia/small lymphocytic lymphomas. We found that rs6457327 on 6p21.33 was associated with susceptibility to follicular lymphoma (FL, N=189 cases/592 controls) with validation in an additional 456 FL cases and 2,785 controls (combined allelic p-value=4.7×10−11). The region of strongest association overlaps C6orf15 (STG), located near psoriasis susceptibility region 1 (PSORS1).
We conducted a genome-wide association pooling study for cutaneous melanoma and performed validation in samples totalling 2019 cases and 2105 controls. Using pooling we identified a novel melanoma risk locus on chromosome 20 (rs910873, rs1885120), with replication in two further samples (combined P < 1×10−15). The odds ratio is 1.75 (1.53, 2.01), with evidence for stronger association in early onset cases.
For late onset Alzheimer's disease (LOAD), the only confirmed genetic association is with the apolipoprotein E (APOE) locus on chromosome 19. Meta-analysis is often employed to sort the true associations from the false positives. LOAD research has the advantage of a continuously updated meta-analysis of candidate gene association studies in the web-based AlzGene database. The top 30 AlzGene loci on May 1st, 2007 were investigated in our whole genome association data set consisting of 1411 LOAD cases and neuropathologically verified controls genotyped at 312,316 SNPs using the Affymetrix 500K Mapping Platform. Of the 30 “top AlzGenes”, 32 SNPs in 24 genes had odds ratios (OR) whose 95% confidence intervals did not include 1. Of these 32 SNPs, six were part of the Affymetrix 500K Mapping panel and another ten had proxies on the Affymetrix array that had >80% power to detect an association with α=0.001. Two of these 16 SNPs showed significant association with LOAD in our sample series. One was rs4420638 at the APOE locus (uncorrected p-value=4.58E-37) and the other was rs4293, located in the angiotensin converting enzyme (ACE) locus (uncorrected p-value=0.014). Since this result was nominally significant, but did not survive multiple testing correction for 16 independent tests, this association at rs4293 was verified in a geographically distinct German cohort (p-value=0.03). We present the results of our ACE replication along with a discussion of the statistical limitations of multiple test corrections in whole genome studies.
Late-onset Alzheimer disease; single nucleotide polymorphism; genome-wide association study; meta-analysis; ACE
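The statement that rs4293 was nominally significant but did not survive correction for 16 independent tests can be illustrated with a Bonferroni calculation. Two assumptions are made here that the abstract does not state explicitly: that the correction was of the Bonferroni type, and that the family-wise significance level was 0.05. The p-values are those reported above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05   # assumed family-wise level (not stated in the abstract)
N_TESTS = 16   # number of independent tests reported in the abstract

reported_pvalues = {"rs4420638": 4.58e-37, "rs4293": 0.014}

threshold = bonferroni_threshold(ALPHA, N_TESTS)
nominally_significant = {snp: p < ALPHA for snp, p in reported_pvalues.items()}
survives_correction = {snp: p < threshold for snp, p in reported_pvalues.items()}

for snp, p in reported_pvalues.items():
    print(f"{snp}: p={p:g}, nominal={nominally_significant[snp]}, "
          f"corrected={survives_correction[snp]}")
```

Under these assumptions the per-test threshold is 0.05/16 = 0.003125, so the APOE-locus SNP passes easily while rs4293 (p = 0.014) passes only the uncorrected threshold, which is why independent replication was sought.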
The apolipoprotein E (APOE) ε4 allele is the best established genetic risk factor for late-onset Alzheimer’s disease (LOAD). We conducted genome-wide surveys of 502,627 single-nucleotide polymorphisms (SNPs) to characterize and confirm other LOAD susceptibility genes. In ε4 carriers from neuropathologically verified discovery, neuropathologically verified replication, and clinically characterized replication cohorts of 1411 cases and controls, LOAD was associated with six SNPs from the GRB-associated binding protein 2 (GAB2) gene and a common haplotype encompassing the entire GAB2 gene. SNP rs2373115 (p = 9 × 10−11) was associated with an odds ratio of 4.06 (confidence interval 2.81–14.69), which interacts with APOE ε4 to further modify risk. GAB2 was overexpressed in pathologically vulnerable neurons; the Gab2 protein was detected in neurons, tangle-bearing neurons, and dystrophic neuritis; and interference with GAB2 gene expression increased tau phosphorylation. Our findings suggest that GAB2 modifies LOAD risk in APOE ε4 carriers and influences Alzheimer’s neuropathology.
Multiple sclerosis is a chronic inflammatory demyelinating disease of the central nervous system with an important genetic component and strongest association driven by the HLA genes. We performed a pooling-based genome-wide association study of 500,000 SNPs in order to find new loci associated with the disease. After applying several criteria, 320 SNPs were selected from the microarrays and individually genotyped in a first and independent Spanish Caucasian replication cohort. The 8 most significant SNPs validated in this cohort were also genotyped in a second US Caucasian replication cohort for confirmation. The most significant association was obtained for SNP rs3129934, which neighbors the HLA-DRB/DQA loci and validates our pooling-based strategy. The second strongest association signal was found for SNP rs1327328, which resides in an unannotated region of chromosome 13 but is in linkage disequilibrium with nearby functional elements that may play important roles in disease susceptibility. This region of chromosome 13 has not been previously identified in MS linkage genome screens and represents a novel risk locus for the disease.
High throughput microarray-based single nucleotide polymorphism (SNP) genotyping has revolutionized the way genome-wide linkage scans and association analyses are performed. One of the key features of the array-based GeneChip® Mapping 10K Array from Affymetrix is the automated SNP calling algorithm. The Affymetrix algorithm was trained on a database of ethnically diverse DNA samples to create SNP call zones that are used as static models to make genotype calls for experimental data. We describe here the implementation of clustering algorithms on large training datasets resulting in improved SNP call rates on the 10K GeneChip.
A database of 948 individuals genotyped on the GeneChip® Mapping 10K 2.0 Array was used to identify 822 SNPs that were called consistently less than 75% of the time. These SNPs represent on average 8.25% of the total SNPs on each chromosome with chromosome 19, the most gene-rich chromosome, containing the highest proportion of poor performers (18.7%). To remedy this, we created SNiPer, a new application which uses two clustering algorithms to yield increased call rates and equivalent concordance to Affymetrix called genotypes. We include a training set for these algorithms based on individual genotypes for 705 samples. SNiPer has the capability to be retrained for lab-specific training sets. SNiPer is freely available for download.
The correct calling of poor performing SNPs may prove to be key in future linkage studies performed on the 10K GeneChip. It would prove particularly invaluable for those diseases that map to chromosome 19, known to contain a high proportion of poorly performing SNPs. Our results illustrate that SNiPer can be used to increase call rates on the 10K GeneChip® without sacrificing accuracy, thereby increasing the amount of valid data generated.
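The criterion used to flag poorly performing SNPs, a call rate below 75% across genotyped individuals, can be sketched as follows. This illustrates only the filtering step, not SNiPer's clustering algorithms, which the abstract does not describe in detail. The data layout, the use of `None` for a no-call, and the function names are assumptions made for the example.

```python
NO_CALL = None  # assumed marker for a genotype the calling algorithm could not assign


def call_rate(calls):
    """Fraction of individuals with a genotype call at one SNP."""
    called = sum(1 for genotype in calls if genotype is not NO_CALL)
    return called / len(calls)


def poorly_performing_snps(calls_by_snp, min_call_rate=0.75):
    """Return the SNP ids whose call rate falls below the threshold, sorted by id."""
    return sorted(snp for snp, calls in calls_by_snp.items()
                  if call_rate(calls) < min_call_rate)


# Toy genotype calls for four individuals at three SNPs (invented for illustration).
toy_calls = {
    "snp_a": ["AA", "AB", NO_CALL, "BB"],      # call rate 0.75, not flagged
    "snp_b": ["AA", NO_CALL, NO_CALL, "AB"],   # call rate 0.50, flagged
    "snp_c": ["BB", "BB", "AB", "AA"],         # call rate 1.00, not flagged
}

flagged = poorly_performing_snps(toy_calls)
```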
Pooling genomic DNA samples within clinical classes of disease, followed by genotyping on whole-genome SNP microarrays, allows for rapid and inexpensive genome-wide association studies. Key to the success of these studies are the accuracy of the allelic frequency calculations, the ability to identify false-positives arising from assay variability, and the ability to better resolve association signals through analysis of neighbouring SNPs.
We report the accuracy of allelic frequency measurements on pooled genomic DNA samples by comparing these measurements to the known allelic frequencies as determined by individual genotyping. We describe modifications to the calculation of k-correction factors from relative allele signal (RAS) values that remove biases and result in more accurate allelic frequency predictions. Our results show that the least accurate SNPs, those most likely to give false-positives in an association study, are identifiable by comparing their frequencies to both those from a known database of individual genotypes and those of the pooled replicates. In a disease with a previously identified genetic mutation, we demonstrate that one can identify the disease locus through the comparison of the predicted allelic frequencies in case and control pools. Furthermore, we demonstrate improved resolution of association signals using the mean of individual test-statistics for consecutive SNPs windowed across the genome. A database of k-correction factors for predicting allelic frequencies for each SNP, derived from several thousand individually genotyped samples, is provided. Lastly, a Perl script for calculating RAS values for the Affymetrix platform is provided.
Our results illustrate that pooling of DNA samples is an effective initial strategy to identify a genetic locus. However, it is important to eliminate inaccurate SNPs prior to analysis by comparing them to a database of individually genotyped samples as well as by comparing them to replicates of the pool. Lastly, detection of association signals can be improved by incorporating data from neighbouring SNPs.
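Two of the calculations mentioned above can be sketched in a few lines. The first applies a k-correction factor to a relative allele signal (RAS) value, using the commonly cited form in which k is the A-to-B signal ratio observed in heterozygotes; this is an assumption and is not necessarily the modified calculation the authors describe. The second computes the mean of test statistics over windows of consecutive SNPs. Function names and toy values are hypothetical.

```python
def k_corrected_frequency(ras, k):
    """Estimate the A-allele frequency in a pool from a RAS value.

    Assumes RAS = A / (A + B) and k = A/B signal ratio in heterozygotes,
    giving the corrected frequency A / (A + k * B).
    """
    return ras / (ras + k * (1.0 - ras))


def windowed_mean(statistics, window):
    """Mean test statistic over each run of `window` consecutive SNPs."""
    return [sum(statistics[i:i + window]) / window
            for i in range(len(statistics) - window + 1)]


# A heterozygote whose A probe is twice as bright as its B probe has RAS = 2/3
# and k = 2; the corrected frequency should come back as 0.5.
het_frequency = k_corrected_frequency(2.0 / 3.0, k=2.0)

# Toy per-SNP test statistics in genomic order (invented for illustration).
toy_statistics = [1.0, 2.0, 3.0, 4.0]
smoothed = windowed_mean(toy_statistics, window=2)
```

Averaging over neighbouring SNPs dampens single-SNP assay noise while preserving signals that are supported by several adjacent markers, which is the rationale given above for windowing.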
Converging lines of evidence point to the existence of immune dysfunction in autism spectrum disorder (ASD), which could directly affect several key neurodevelopmental processes. Previous studies have shown higher cytokine levels in patients with autism compared with matched controls or subjects with other developmental disorders. In the current study, we used plasma-cytokine profiling for 25 discordant sibling pairs to evaluate whether these alterations occur within families with ASD.
Plasma-cytokine profiling was conducted using an array-based multiplex sandwich ELISA for simultaneous quantitative measurement of 40 unique targets. We also analyzed the correlations between cytokine levels and clinically relevant quantitative traits (Vineland Adaptive Behavior Scale in Autism (VABS) composite score, Social Responsiveness Scale (SRS) total T score, head circumference, and full intelligence quotient (IQ)). In addition, because of the high phenotypic heterogeneity of ASD, we defined four subgroups of subjects (those who were non-verbal, those with gastrointestinal issues, those with regressive autism, and those with a history of allergies), which encompass common and/or recurrent endophenotypes in ASD, and tested the cytokine levels in each group.
None of the measured parameters showed significant differences between children with ASD and their related typically developing siblings. However, specific target levels did correlate with quantitative clinical traits, and these were significantly different when the ASD subgroups were analyzed. It is notable that these differences seem to be attributable to a predisposing immunogenetic background, as no other significant differences were noticed between discordant sibling pairs. Interleukin-1β appears to be the cytokine most involved in quantitative traits and clinical subgroups of ASD.
In the present study, we found a lack of significant differences in plasma-cytokine levels between children with ASD and their related non-autistic siblings. Thus, our results support the evidence that the immune profiles of children with autism do not differ from those of their typically developing siblings. However, the significant association of cytokine levels with the quantitative traits and the clinical subgroups analyzed suggests that altered immune responses may affect core features of ASD.