Using the Phenologs tool to find MP terms overrepresented among the five mouse orthologs of human genes implicated in ASD, we identified six such terms (). The six MP terms are “abnormal social investigation”, “impaired coordination”, “abnormal behavior”, “abnormal cerebellar foliation”, “small cerebellum”, and “abnormal motor capabilities/coordination/movement.” Many of these same MP terms were found among MGI’s annotations of mouse models of autism, supporting the validity of the Phenolog findings.
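The overrepresentation step can be illustrated with a hypergeometric test, a standard choice for this kind of gene-set enrichment. This is a minimal sketch, not the Phenologs implementation. The function names, the toy ortholog map, the placeholder term IDs (`MP:term_A`, `MP:term_B`), and the background size are all hypothetical, and the sketch applies no multiple-testing correction.

```python
from math import comb

def hypergeom_tail(k, population, annotated, drawn):
    """P(X >= k): chance that `drawn` random genes include >= k of `annotated` genes."""
    hi = min(annotated, drawn)
    tail = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(k, hi + 1))
    return tail / comb(population, drawn)

def overrepresented_terms(human_genes, orthologs, term_to_genes, population, alpha=0.05):
    """Map human genes to mouse orthologs, then test every MP term for overrepresentation."""
    mouse = {orthologs[h] for h in human_genes if h in orthologs}
    hits = []
    for term, annotated in term_to_genes.items():
        k = len(mouse & annotated)
        if k == 0:
            continue  # a term with no query genes cannot be overrepresented
        p = hypergeom_tail(k, population, len(annotated), len(mouse))
        if p < alpha:  # no multiple-testing correction in this sketch
            hits.append((term, k, p))
    return sorted(hits, key=lambda hit: hit[2])

# Hypothetical toy data: five human genes with one mouse ortholog each.
human = [f"HG{i}" for i in range(1, 6)]
orthologs = {f"HG{i}": f"mg{i}" for i in range(1, 6)}
term_to_genes = {
    "MP:term_A": {"mg1", "mg2", "mg3", "x1", "x2"},        # small, query-rich term
    "MP:term_B": {"mg1"} | {f"y{i}" for i in range(199)},  # large, generic term
}
results = overrepresented_terms(human, orthologs, term_to_genes, population=1000)
for term, k, p in results:
    print(term, k, f"{p:.2e}")
```

The small term that captures three of the five orthologs is reported. The large term sharing only one ortholog is not, because a single hit is expected by chance in a term that size.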
Phenologs of ASD-implicated genes
We then integrated these results with tools available in the Ontological Discovery Environment by taking the six MP terms from the Phenologs analysis and performing a Phenome Map analysis. This function creates a hierarchical tree that groups all genes annotated to these terms according to the combination of input MP terms each gene is annotated to (). Genes associated with two or more of the MP terms were examined for association with ASD in both the MGI database and the scientific literature. For example, En2, the only gene annotated to all six MP terms, was already annotated to autism in MGI. However, a mouse genotype containing Gabrb3 was not yet annotated to autism in MGI, although we identified research clearly using it as a model () [20]. This process led to the addition to MGI of three mutant alleles with experimental evidence linking them to autism. These are Gabrb3, and Nrcam. One additional gene, Pten, was also identified as a strong candidate but lacked an explicit statement from researchers that it was an appropriate ASD model.
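The sketch below approximates the grouping step as a flat structure. Each gene is keyed by the exact combination of input MP terms it carries, and genes annotated to two or more terms are pulled out for follow-up. It does not draw the hierarchical tree that the Phenome Map produces. The function names are hypothetical, and apart from En2 (annotated to all six terms, per the text) the gene annotations are invented for illustration.

```python
from collections import defaultdict

def phenome_groups(gene_to_terms, input_terms):
    """Group genes by the exact combination of input MP terms they are annotated to."""
    wanted = frozenset(input_terms)
    groups = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        combo = frozenset(terms) & wanted
        if combo:  # ignore genes annotated to none of the input terms
            groups[combo].add(gene)
    return dict(groups)

def multi_term_genes(gene_to_terms, input_terms, min_terms=2):
    """Genes annotated to at least `min_terms` input terms, most terms first."""
    wanted = set(input_terms)
    counts = {g: len(set(t) & wanted) for g, t in gene_to_terms.items()}
    hits = [g for g, c in counts.items() if c >= min_terms]
    return sorted(hits, key=lambda g: (-counts[g], g))

TERMS = [
    "abnormal social investigation", "impaired coordination", "abnormal behavior",
    "abnormal cerebellar foliation", "small cerebellum",
    "abnormal motor capabilities/coordination/movement",
]
annotations = {
    "En2": set(TERMS),                                       # all six, per the text
    "geneX": {"impaired coordination", "small cerebellum"},  # hypothetical
    "geneY": {"abnormal behavior"},                          # hypothetical
}
groups = phenome_groups(annotations, TERMS)
follow_up = multi_term_genes(annotations, TERMS)
print(follow_up)
```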
Of particular interest were genes from this analysis that lack mouse literature linking them to autism-related phenotypes but whose human orthologs have been implicated in ASD. For example, mutations in the Rora gene have been linked to abnormal coordination and abnormal cerebellar development in mice. However, Rora mutants had not been considered models for autism because the human gene had not been implicated in the disease. It was recently demonstrated that patients with idiopathic autism have decreased expression of the RORA protein due to differences in the methylation state of the gene, suggesting that RORA plays a role in the disease phenotype [21].
Intrigued by these findings, we used two tools to compare the 426 genes implicated in autism by the CNV study with mouse genes that are functionally and phenotypically similar to known mouse ASD genes. The first was MouseNET, which builds a functional network from an input list of genes. We entered into MouseNET the eight genes associated with ASD in the MGI database (Cadps2, En2, Gabrb3, Gstm1, Nlgn3, Pten, Ehmt1, and Nrcam). This produced a ranked list of 40 candidate genes from the functional network, scored by a mechanism that incorporates the similarity between genes. A literature review of these candidates found a subset of eight human genes associated with autism: CADPS, PAX3, DMD, MLL1, PKD1, AMPH, CACNA1A, and APC. However, the mouse orthologs of these genes were not recorded as models for autism owing to a lack of experimental evidence. One additional gene, Unc5c, overlapped with the CNV list.
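This sketch does not reproduce MouseNET's own network integration and scoring. As a stand-in, it ranks non-seed genes by the summed weight of their network edges to the seed genes, a simple guilt-by-association score. It then intersects the top of the list with a CNV gene set. The seed genes are the eight MGI ASD genes named above. The function name, edges, weights, candidate names, and CNV set are hypothetical.

```python
from collections import defaultdict

def rank_by_seed_connectivity(edges, seeds, top_n=40):
    """Rank non-seed genes by the summed weight of their edges to seed genes."""
    seeds = set(seeds)
    score = defaultdict(float)
    for a, b, weight in edges:
        if a in seeds and b not in seeds:
            score[b] += weight
        elif b in seeds and a not in seeds:
            score[a] += weight
        # edges between two seeds or two non-seeds are ignored
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:top_n]]

seeds = ["Cadps2", "En2", "Gabrb3", "Gstm1", "Nlgn3", "Pten", "Ehmt1", "Nrcam"]
# Hypothetical weighted edges (gene, gene, confidence); not MouseNET output.
edges = [
    ("En2", "candA", 0.9), ("Pten", "candA", 0.4),
    ("Nrcam", "candB", 0.7), ("candC", "Gabrb3", 0.2),
    ("candA", "candB", 0.99),
]
cnv_genes = {"candB", "candZ"}  # hypothetical CNV list
ranked = rank_by_seed_connectivity(edges, seeds)
print(ranked, set(ranked) & cnv_genes)
```

Under this scoring, the edge between the two non-seed genes candA and candB contributes nothing. Network-propagation methods would let such indirect links count.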
We also analyzed the autism-implicated gene set with the ABBA tool to find genes with similar functional and phenotypic characteristics. ABBA uses data from the ODE database to generate a list of candidate genes that share characteristics with an input set. To assess this approach, we first measured precision and recall against a true-positive list of 133 genes definitively linked to ASD in affected individuals, as determined by a consortium of ASD researchers [8]. Precision and recall were calculated over a range of gene set overlap thresholds (). As expected, recall decreases and precision improves as stringency increases. We compared these results with 1000 randomly drawn ranked gene lists and plotted precision and recall at a 95% confidence level. At very high thresholds, our ABBA analysis performs significantly better than chance on recall but not on precision. At lower thresholds, it consistently performs above chance. This analysis shows that our approach performs well despite the scarcity of existing knowledge about the genetic components of ASD. We expect that as more genes are implicated in ASD and added to the true-positive list, the precision of our analysis will improve.
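The evaluation can be sketched under assumptions the text does not spell out. Each candidate gene carries a score, for example the number of gene sets it shares with the input, and a threshold selects the genes scoring at or above it. The random baseline is built by shuffling the scores across genes 1000 times and taking the 95th percentile of precision and recall. The function names and toy data are hypothetical.

```python
import random

def precision_recall(scores, true_pos, threshold):
    """Precision and recall of the genes whose score is >= threshold."""
    predicted = {g for g, s in scores.items() if s >= threshold}
    hits = len(predicted & true_pos)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(true_pos) if true_pos else 0.0
    return precision, recall

def random_bounds(scores, true_pos, threshold, n_iter=1000, q=0.95, seed=0):
    """q-quantile of precision and recall when scores are shuffled across genes."""
    rng = random.Random(seed)
    genes, values = list(scores), list(scores.values())
    precs, recs = [], []
    for _ in range(n_iter):
        rng.shuffle(values)  # a random reassignment of the same score distribution
        p, r = precision_recall(dict(zip(genes, values)), true_pos, threshold)
        precs.append(p)
        recs.append(r)
    precs.sort()
    recs.sort()
    idx = min(int(q * n_iter), n_iter - 1)
    return precs[idx], recs[idx]

# Hypothetical toy data: 100 genes with scores 0-9 and four true positives.
scores = {f"g{i}": i % 10 for i in range(100)}
true_pos = {"g9", "g19", "g29", "g8"}
for t in (9, 8, 5):
    observed = precision_recall(scores, true_pos, t)
    print(t, observed, random_bounds(scores, true_pos, t))
```

Raising the threshold from 5 to 9 shrinks the predicted set, so precision rises while recall falls. The pattern matches the trade-off described above.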
Entering the eight autism-implicated genes into our ABBA analysis identified 349 candidate genes. Eleven of these orthologous genes overlapped with the CNV set: Unc5c, Dsc3, Ghr, Cask, Fgfr3, Camk4, Chrna3, Anks1b, Chrnb4, Plcb4, and Thrb. Assuming 25,000 protein-coding genes, the probability of 11 or more orthologous genes overlapping by chance is p = 0.0069. Investigation of the ABBA candidate genes within MGI showed that 27 had neurological phenotypes (see ). In particular, mutant alleles of Unc5c and Plcb4 exhibited neurological morphology and behavioral phenotypes potentially indicative of ASD. Apart from the CNV analysis, neither gene has previously been associated with ASD.
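The overlap probability is a hypergeometric tail. It gives the chance that a random set the size of the ABBA candidate list shares at least 11 genes with the CNV set, drawn from a universe of 25,000 genes. The sketch plugs in the raw counts from the text. The exact set sizes behind the reported p = 0.0069 are not stated, for example how many of the 426 CNV genes have mouse orthologs, so this is not guaranteed to reproduce that value. The function name is hypothetical.

```python
from math import comb

def overlap_pvalue(overlap, universe, set_a, set_b):
    """P(|A & B| >= overlap) when B is a uniformly random subset of the universe."""
    hi = min(set_a, set_b)
    tail = sum(comb(set_a, i) * comb(universe - set_a, set_b - i)
               for i in range(overlap, hi + 1))
    return tail / comb(universe, set_b)  # exact big-integer ratio, rounded once

# Raw counts from the text: 25,000 genes, 426 CNV genes, 349 ABBA candidates.
p = overlap_pvalue(overlap=11, universe=25_000, set_a=426, set_b=349)
print(f"p = {p:.4g}")
```

Exact integer arithmetic via `math.comb` avoids the underflow that naive floating-point factorials would hit at these set sizes.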