Alzheimer's disease (AD) is the most common type of dementia, characterized by a severe form of memory loss and deterioration of other cognitive functions. It currently affects 30 million people worldwide, and this number is expected to quadruple by 2050.1
Great strides have been made in AD research to unveil genetic underpinnings and provide a foundation for a personal genomics solution to treat the disease. Preceding the findings of common variants in ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP by Hollingworth et al in their recent genome-wide association study (GWAS, GERAD+) involving over 50 000 patients, other GWAS have identified a total of 19 genes encompassing single-nucleotide polymorphisms (SNPs) associated with an increased risk of developing AD.2
Additionally, 15 Online Mendelian Inheritance in Man (OMIM) genes showing single-gene inheritance associated with AD have been annotated (http://www.ncbi.nlm.nih.gov/omim). Polymorphisms in the APOE gene are the most well documented genetic risk factors for developing early-onset AD.3
In particular, APOE4 accounts for 50% of cases and has been found to increase the relative risk for early-onset AD in line with its allelic prevalence—E4/E4 (14.9), E3/E4 (3.2), E2/E4 (2.6), and E2/E2 (0.6); conversely, APOE2 appears to have a protective effect against AD.5
With the accumulation of new genetic insights into AD, systems biology and systems medicine approaches are poised to derive new meanings from interactions among genetic variants.
Following these recent developments in AD GWAS, we identified a large void in the interpretation of these susceptibility loci and in their integration with higher scales of biology. Namely, previous reductionist genetic approaches have not sufficiently characterized AD as a complex disease. Indeed, the pathogenesis of sporadic AD has been widely attributed to both genetic and environmental factors, while pure autosomal dominant Mendelian transmission accounts for a smaller proportion of cases (10%).7
While APOE, APP, PSEN1 and PSEN2 have been characterized as true deterministic genes, other genetic loci increase the risk of developing AD. Thus, being able to characterize the connectivity and directionality of the relationships between the underlying genetics and corresponding functions in high-throughput data may break boundaries between specialized silos of knowledge of gene functions and greatly enhance a holistic approach to understanding the disease. Furthermore, substantive efforts have yet to be made to investigate the functional overlap—of relevance to a personal genome—between AD genes showing classical Mendelian and complex modes of inheritance.
The presentation of the seminal evaluation of incorporating personal genome information into a modern clinical assessment by Ashley et al demonstrated the powerful utility of integrating common polymorphisms to determine risk of disease within a single patient.9
However, the clinical relevance of the majority of ‘unique personal genetic variants’ remains unrecognized.9
Furthermore, aggregating the significance of these unique personal variants within a patient, along with polymorphisms known through disparate modes of inheritance and non-genetic factors, may provide key knowledge about an individual's risk and pathology of disease. For instance, despite the increased risk and higher odds ratio (OR) of AD attributed to APOE4, many E4/E4 homozygotes live to old age with no indication of AD, and up to 50–75% of heterozygotes carrying one E3 allele never develop AD.6
However, numerous studies have established that Aβ deposition in the brain and poorer outcome in terms of neurodegenerative disease after head injury occur more commonly in individuals possessing an APOE4 allele.6
Therefore, the APOE4 variant represents a paradigmatic example of complex inheritance of AD in its intersection between genetic predisposition and a plausible environmental factor associated with the disease.
Traditional GWAS infer biological function from a small set of sequence elements within loci and require multiple patients to establish a prediction. To date, no predictive methods have been applied to establish the association between a trait and unique personal variants in an uncharacterized gene or in uncharacterized polymorphisms of a gene harboring disease-associated polymorphisms. In the past, reverse genetic methods have predicted gene function from molecular similarity of sequence or structure, and could in theory be applied to the prediction of unique personal genomic variants. Conversely, forward genomics examines higher systems properties of high-throughput data and subsequently zooms in on causal genetic roots. Such forward genomics techniques have also proven successful for arriving at new phenotype-associated variants in a variety of contexts.11
For instance, we have shown that the same systems properties of biological processes and molecular functions are consistently enriched among the top 1000 genes of independent GWAS of adult-onset diabetes mellitus.13
Previously, our group has also shown that properties at the protein-interaction level are able to establish overlap between diseases and predict novel candidates involved in molecular mechanisms of disease.14
Additionally, Zhong et al demonstrated that specific edgetic alleles (mutations responsible for specific protein-interaction patterns) are associated with certain Mendelian diseases, as compared with other edgetic or null alleles (mutations responsible for structural alteration and complete loss of protein interactions).16
Further, it has been shown by our group and others that disease trait similarity of complex diseases can be imputed from genetic variants.17
Here we demonstrate the relevance of edgetic properties of Mendelian and complex disease inheritance genes in AD, as well as integrate a forward genomics approach to arrive at new hypotheses for risk inheritance. Analyzing SNPs at the mechanism level of the gene also addresses the current limitation of GWAS described by Goldstein's group—SNPs may be markers of rare or unique personal variants, as opposed to the prevailing belief that they measure a nearby common variant with minor allele frequency >5% (consensus definition of a SNP).19
We thus hypothesized that the clinical significance of unique personal variants could be imputed with increased accuracy by triangulating three established approaches: forward genomics, reverse genetics, and computational biology modeling of systems and networks (online supplementary figure S1). Lee et al and other groups have established the proof of concept for using Gene Ontology (GO) similarity and protein interactions to prioritize genes associated with a disease.20–22
Yet, none of these studies have investigated similarity between disease traits or similarity between polymorphisms according to their mode of inheritance (single gene vs complex). Here we analyze 40 genes shown to be associated with AD, using text-mining techniques to characterize mechanistic commonalities and inter-relationships between GO biological processes (GO-BP) enriched within SNPs from OMIM and confirmed in GWAS (online supplementary tables S1 and S2). We have previously demonstrated the feasibility and utility of a novel information theory-based method for predicting protein functions and building disease–disease networks by exploiting the semantic similarity of GO terms among host genes of validated trait-associated SNPs.13
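The idea of information theory-based semantic similarity between GO terms can be illustrated with a small worked example. The sketch below is not the ITSS method itself; it assumes a generic Resnik-style measure, in which the similarity of two terms is the information content of their most informative common ancestor. The toy ontology, the term names, and the gene annotations are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy ontology (child term -> set of parent terms); not real GO data.
PARENTS = {
    "biological_process": set(),
    "metabolic_process": {"biological_process"},
    "lipid_metabolic_process": {"metabolic_process"},
    "cholesterol_metabolic_process": {"lipid_metabolic_process"},
    "lipid_transport": {"biological_process"},
}

# Hypothetical direct annotations of four placeholder genes.
ANNOTATIONS = {
    "GENE_A": {"cholesterol_metabolic_process"},
    "GENE_B": {"lipid_metabolic_process"},
    "GENE_C": {"lipid_transport"},
    "GENE_D": {"metabolic_process"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the toy ontology."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts genes annotated to t or any descendant."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)  # annotations propagate up to ancestors
        for term in covered:
            counts[term] += 1
    n_genes = len(ANNOTATIONS)
    return {t: math.log(n_genes / c) for t, c in counts.items() if c > 0}


def resnik(term_a, term_b, ic):
    """Similarity = IC of the most informative ancestor shared by both terms."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic.get(term, 0.0) for term in common)


IC = information_content()
close = resnik("cholesterol_metabolic_process", "lipid_metabolic_process", IC)
distant = resnik("cholesterol_metabolic_process", "lipid_transport", IC)
print(close, distant)
```

In this toy setting, two terms that share a specific ancestor score higher than two terms that only share the root, which is the property that lets semantic similarity group genes by mechanism.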
We apply this forward genomic method of GO term enrichment and scoring of AD SNPs, based on the information theory semantic similarity (ITSS) scores which we developed,23 to construct the functional space of AD polymorphism host genes. We further constrain our predictive space by examining the node and edgetic significances within protein-interaction networks (PINs) and domain–domain interactions of AD genes and canonical pathways. Taken together, our mechanism-guided approach to integrating intermediate phenotypes derived from forward genomics lays a foundation for translating unique personal variants into other established networks used for drug repositioning (figure 1).
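As a concrete illustration of what GO term enrichment means in this setting, the following minimal sketch computes a one-sided hypergeometric p value for a single term. This is a standard textbook test and is assumed here for illustration only; the text does not specify which enrichment statistic was used. The function name and all counts other than the 40 AD genes are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a random sample.

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     size of the gene list under study
    hits:       genes in the list that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 40 study genes, 5 of which carry a term that
# annotates 200 of 20000 background genes.
p_value = hypergeom_enrichment_p(population=20000, annotated=200, sample=40, hits=5)
print(p_value)
```

A small p value indicates that the term appears among the study genes more often than expected by chance; in practice such values would also need correction for the many GO terms tested.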
Figure 1 Scalar protein analysis of domains enriched in genetics (SPADE-gen): changing the paradigm of drug repositioning for complex diseases with genetically anchored biological mechanisms. The diagram shows the stepwise process by which the proposed SPADE-gen ...