A rapidly increasing proportion of human genetic diseases are thought to arise from copy number variations (CNVs), defined as >1 kb duplicated or deleted stretches of DNA (1). Phenotypically, some of these disorders manifest as multiple congenital anomalies, which generally include developmental delays (DDs) along with variable secondary features such as cardiac defects and cranio-facial differences (2). Children with DD fail to achieve normal developmental milestones, both physical and intellectual, in early childhood (<5 years), and often have impaired motor function, cognitive ability and/or language skills. Delayed or impaired neurological development frequently leads to learning disabilities (LD; also termed mental retardation).
In the majority of DD cases, however, the identities of the causative genes remain unknown, particularly for large CNVs encompassing many genes. In individual cases, a clinical geneticist may highlight an excellent candidate gene for DD on the basis of prior experience and by sampling the available literature. This process is inevitably subjective and time-consuming, and it necessarily rests on the completeness, availability and easy accessibility of a rapidly increasing corpus of knowledge. Such a process will also fail to discover molecular pathways or processes whose disruption has not previously been reported as being associated with DD. Accurate definition of disease-relevant pathways or processes nevertheless remains far from straightforward, as the available electronic pathway resources, including the Gene Ontology (GO) (6) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (7), do not capture the true complexity of disease-relevant biological pathways or processes. The identification of DD-relevant genes is further complicated by the presence of large numbers of CNVs in the general, apparently healthy, population (8). If, however, it is assumed that such variants do not contribute to the pathoetiology of developmental conditions, then their genes can be excluded when seeking disease-relevant genes.
Our goal in this study was to obtain evidence, using a robust statistical approach, for the causative element(s) underlying each patient's clinical presentation. More specifically, we sought to identify disruptive genetic changes among a large cohort of 87 individuals, providing statistical genetic evidence not only for their DD presentations, but also for their additional phenotypes, such as behavior or eye abnormalities. To identify genes and biological processes that underlie these patients’ phenotypes, we turned to an experimental resource which is orthogonal to, and likely more relevant than, electronic molecular pathways. This is a set of 5329 defined phenotypes associated with 5011 genes disrupted in mouse models that have been organized in a phenotype ontology (11). We hypothesized that each disease-causative CNV region will harbor one or more gene(s) whose mouse ortholog, when disrupted, results in a phenotype that corresponds to that of the human disease under investigation. Furthermore, if sufficient CNV regions with similar disease associations were to be known, particularly those containing relatively few genes, then it might be possible to detect within these regions significant enrichments of genes whose orthologs, when disrupted, result in particular mouse phenotypes that are relevant to that disease. This approach seeks significant associations between patients’ genotypes and what we term ‘model phenotypes’ observed for the knockout models of orthologous mouse genes. Together with their associated mouse knockouts, these model phenotypes can provide useful insights into the molecular and cellular pathoetiology of disease. If this strategy is to be successful, then it must control the rate of false-positive associations that inevitably accrue from the large number of statistical tests (one for each phenotype) that are being applied.
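The enrichment logic described above can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test for each model phenotype and Benjamini-Hochberg adjustment across phenotypes; both are common choices for this kind of analysis and are assumptions made here for illustration, not necessarily the exact statistics used in this study. The function names, the `gene_sets` mapping (phenotype term to the human orthologs of annotated mouse genes) and the `background` gene list are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n carry the phenotype."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def enriched_phenotypes(cnv_genes, background, gene_sets, fdr=0.05):
    """Map each phenotype whose adjusted p-value is below `fdr` to that value.

    cnv_genes:  genes overlapped by the patients' CNVs (hypothetical input)
    background: all genes with a testable mouse ortholog (hypothetical input)
    gene_sets:  phenotype term -> genes annotated with it (hypothetical input)
    """
    background = set(background)
    cnv = set(cnv_genes) & background
    names = sorted(gene_sets)
    pvals = []
    for name in names:
        annotated = set(gene_sets[name]) & background
        overlap = len(cnv & annotated)
        # One test per phenotype: is the overlap larger than chance predicts?
        pvals.append(hypergeom_sf(overlap, len(background), len(annotated), len(cnv)))
    qvals = benjamini_hochberg(pvals)
    return {name: q for name, q in zip(names, qvals) if q < fdr}
```

Because one test is run per phenotype term, the adjustment step is what keeps the expected proportion of spurious associations among the reported phenotypes below the chosen threshold. A production analysis would also need to account for gene clustering within CNVs and for the hierarchical structure of the phenotype ontology, neither of which this sketch attempts.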
In a previous study, we applied a similar but more primitive approach to 148 de novo CNV intervals from LD individuals (12). Among over 200 diverse nervous system phenotypes that were investigated, we identified two mouse model phenotypes that were significantly over-represented with a low false discovery rate (FDR) <5%. Each of these model phenotypes, abnormal axon morphology and abnormal dopaminergic neuron morphology, is of particular relevance to human LD phenotypes. We were also able to demonstrate significant associations between human and model phenotypes for additional clinical features other than LD that were apparent from this patient population (12).
We considered it important to develop our novel methodology further and to apply it more widely to determine whether it provides insights into other patient datasets. We sought to investigate whether the method is effective in highlighting candidate genes and biological processes in seemingly heterogeneous syndromes, which present the greatest challenges for this, and other functional enrichment, approaches. Thus, we wished to know (i) whether the candidate causative elements identified among these large, multigenic CNVs would indicate a single biological process as explaining a shared DD phenotype, or else would stratify the cohort on the basis of different biological processes which would be suggestive of the disorder being heterogeneous in etiology; (ii) whether multiple elements within each patient's CNV(s) contribute additively or multiplicatively to the disorder; (iii) whether functional elements contribute to both primary and secondary features of a patient or whether pleiotropy is indicated; (iv) whether there is a qualitative or a quantitative difference among the functional elements identified for either Gain or Loss CNVs; and finally, (v) whether DD and LD are associated with comparable model phenotypes.
By applying an extended mouse phenotype method to a set of 98 CNVs observed in individuals with DD, we identified significant associations of model phenotypes with CNV genes, which subsequently allowed a set of commonly disrupted biological processes to be identified and 103 candidate genes to be collated. These represent excellent candidates for genes whose copy number change contributes to DD and associated phenotypes. Model phenotypes thus provide valuable insights into the pathoetiology of DD for single individuals, and pinpoint genes whose copy number change likely underlies their, and other DD patients’, phenotypes.