To facilitate the identification of genes associated with cataract and other ocular defects, the authors developed and validated a computational tool termed iSyTE (integrated Systems Tool for Eye gene discovery; http://bioinformatics.udel.edu/Research/iSyTE). iSyTE uses a mouse embryonic lens gene expression data set as a bioinformatics filter to select candidate genes from human or mouse genomic regions implicated in disease and to prioritize them for further mutational and functional analyses.
Microarray gene expression profiles were obtained for microdissected embryonic mouse lens at three key developmental time points in the transition from the embryonic day (E)10.5 stage of lens placode invagination to E12.5 lens primary fiber cell differentiation. Differentially regulated genes were identified by in silico comparison of lens gene expression profiles with those of whole embryo body (WB) lacking ocular tissue.
Gene set analysis demonstrated that this strategy effectively removes highly expressed but nonspecific housekeeping genes from lens tissue expression profiles, allowing identification of less highly expressed lens disease–associated genes. Among 24 previously mapped human genomic intervals containing genes associated with isolated congenital cataract, the mutant gene is ranked within the top two iSyTE-selected candidates in approximately 88% of cases. Finally, in situ hybridization confirmed lens expression of several novel iSyTE-identified genes.
iSyTE is a publicly available Web resource that can be used to prioritize candidate genes within mapped genomic intervals associated with congenital cataract for further investigation. Extension of this approach to other ocular tissue components will facilitate eye disease gene discovery.
Even with the advent of high-throughput sequencing, the discovery of genes associated with congenital birth defects such as eye defects remains a challenge. We sought to develop a straightforward experimental approach that could facilitate the identification of candidate genes for developmental disorders, and, as proof-of-principle, we chose defects involving the ocular lens. Opacification of the lens results in cataract, a leading cause of blindness that affects 77 million persons and accounts for 48% of blindness worldwide.1 Cataracts can be classified as either congenital or age related, and can be expressed as either an isolated or a nonsyndromic phenotype or as part of a larger developmental syndrome.2–4 Approximately one quarter of congenital cataracts are inherited5; all three modes of Mendelian inheritance have been described, with autosomal dominant being the most common.2 Both linkage and mutational analyses of candidate genes have been successfully used to identify genetic causes of congenital cataracts; 24 loci are known to exist for isolated cataracts.2
The identification of genetic mutations, such as those implicated in cataract formation, traditionally follows an initial mapping step that involves linkage analysis or homozygosity mapping, followed by sequence analysis of candidate genes or genomic regions in patient DNA. A similar approach can identify mutant genes in model organisms such as mouse and zebrafish. Nonetheless, linkage and mutational analyses are cumbersome and often involve the exclusion of a large number of candidate genes by DNA sequence analysis before the correct gene is identified. Although the advent of next-generation sequencing makes it possible to rapidly identify a large number of potentially deleterious genetic variants within a sample, it often remains unclear how to identify the actual disease-associated mutation in a cost-effective manner without performing a large cohort case-control study. It is often the case that additional biological knowledge is necessary to resolve disease-producing genetic mutations from sequence variants that are unrelated to the phenotype of interest.
In the case of human developmental disorders, we hypothesized that knowledge of embryonic gene expression patterns, which are often conserved and readily accessible for the homologous mouse genes, could assist in the identification of congenital birth defect genes in humans. Here we describe a straightforward experimental and computational strategy to identify and prioritize candidate disease genes based on microarray gene expression profiles generated from embryonic mouse tissues. As an initial application, we applied this strategy to cataract phenotypes. To make this tool broadly accessible, we concurrently developed a publicly available Web-based resource termed iSyTE (integrated Systems Tool for Eye gene discovery) that can efficiently prioritize candidate genes associated with human congenital cataract.
Mice were treated in accordance with protocols defined in the ARVO Statement for the Use of Animals in Ophthalmic and Vision Research. The Animal Care and Use Committee of Harvard Medical School (Boston, MA) approved all experimental protocols involving mice. Wild-type ICR mice were obtained from Taconic (Albany, NY) and were used for microarray and in situ hybridization analyses. Mice were housed in a 14-hour light/10-hour dark cycle; the morning of vaginal plug discovery was defined as embryonic day (E)0.5.
Total RNA was extracted from manually dissected mouse embryonic day 10.5, 11.5, and 12.5 lenses (approximately 200 lenses per E10.5 replicate, 150 lenses per E11.5 replicate, 100 lenses per E12.5 replicate) or from whole embryonic tissue minus the eye region at stages E10.5, E11.5, and E12.5 using an RNA purification kit (RNeasy Mini Kit; Qiagen, Valencia, CA). RNA from stage-matched whole embryonic tissue minus the eye region, which was removed by microdissection, was pooled in equimolar ratios, denoted the whole body (WB) control, and was processed in parallel. Microarray data from the WB control were later used to achieve in silico enrichment for lens-enriched genes (see Results). We first tested the purity of the dissected lens tissue by analyzing dissected lenses at these stages from P0-3.9-GFPCre reporter mice, in which the lens-specific GFP expression is driven by the Pax6 ectodermal enhancer within the 3.9-kb region upstream of the Pax6 P0 promoter.6 We then used wild-type, in-house, timed-pregnant ICR mice as a resource for collecting the lens tissues used for microarray analysis. Microarray analyses were performed in biological triplicate by hybridization to a microarray (Affymetrix Mouse 430 2.0 chip; Affymetrix, Santa Clara, CA) in the Biopolymers Facility at Harvard Medical School. Standard Affymetrix protocols were used to prepare cDNA and biotin-labeled cRNA using in vitro transcription. Quality of the total RNA was evaluated in a microfluidics-based platform (2100 Bioanalyzer; Agilent Technologies, Inc., Santa Clara, CA) before processing for cDNA preparation by RT-PCR. The cDNA was converted to biotinylated cRNA using modified nucleoside triphosphates in an in vitro transcription reaction. The labeled cRNA was hybridized to the chips for 16 hours and then washed and stained. The chip was irradiated at 488 nm (excitation) and scanned at 570 nm (emission).
Raw probe intensities from all microarray profiles were preprocessed together using the robust multiarray average method,7 implemented in the affy package.8 If a gene was represented by multiple probe sets, we selected the probe set with the highest median expression across all samples to represent the expression of that gene. In this manner, all probe sets were collapsed into 20,460 genes, based on their unique gene symbols. To calculate tissue-specific enrichment, we used a moderated t-test implemented in limma9 to identify differentially expressed genes. False discovery rates were then estimated for this gene list using the method of Benjamini and Hochberg.10 All bioinformatics analyses were carried out using an R statistical environment (http://www.r-project.org). The NCBI Gene Expression Omnibus accession number for all the microarray data reported in this article is GSE32334.
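The probe set collapsing rule and the Benjamini and Hochberg adjustment described above can be sketched in a few lines. This is a minimal illustration in Python, not the R code used in the study; the function names, probe set identifiers, and expression values are hypothetical, and the sketch assumes that expression values are already normalized.

```python
from statistics import median

def collapse_probe_sets(expression, probe_to_gene):
    """For each gene symbol, keep the probe set with the highest median expression.

    expression: probe set id -> list of expression values across all samples
    probe_to_gene: probe set id -> gene symbol
    """
    best = {}  # gene symbol -> (median expression, probe set id)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # skip probe sets without a gene symbol
        m = median(values)
        if gene not in best or m > best[gene][0]:
            best[gene] = (m, probe)
    return {gene: probe for gene, (_, probe) in best.items()}

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Toy input: two hypothetical probe sets map to Pax6, one to Sox2.
toy_expression = {"probe_1": [5, 6, 7], "probe_2": [8, 9, 10], "probe_3": [1, 2, 3]}
toy_annotation = {"probe_1": "Pax6", "probe_2": "Pax6", "probe_3": "Sox2"}
representatives = collapse_probe_sets(toy_expression, toy_annotation)
```

In this toy input, the second Pax6 probe set has the higher median and is therefore the one retained for that gene.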
To perform a comprehensive and unbiased gene set analysis, we used a large compendium of more than 10,000 mouse-specific gene sets composed of Gene Ontology terms, KEGG pathways, MouseCyc pathways, MGI mouse phenotype-associated genes, FANTOM4 mouse tissue-specific transcription factor gene sets, and other custom gene sets related to development, signaling pathways, and stem cell regulation (Supplementary Table S1, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-8839/-/DCSupplemental). Furthermore, we compiled gene sets for lens development and human cataract and, for control purposes, for tooth development, human tooth agenesis, and human orofacial clefting (Supplementary Table S2, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-8839/-/DCSupplemental). For lens development genes, we used a recently curated list of genes that are critically involved in the preplacodal and placodal stages of lens development.11 In addition, we compiled lists of nonsyndromic and syndromic human cataract genes based on a high-quality manual collection of all known human cataract–associated genes, CatMap.2 Tooth development genes, for comparative purposes, were those that cause abnormal tooth development in mouse and human models based on the Mouse Genome Informatics (MGI) database (mammalian phenotype ID, MP:0000116). Similarly, tooth agenesis and orofacial clefting gene lists were taken from a recent review,12 with the addition of one new nonsyndromic tooth agenesis gene, Wnt10a.13 Full details of these gene sets are available in Supplementary Table S2, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-8839/-/DCSupplemental. We tested whether the 200 most highly ranked genes (with or without WB control) were enriched for each gene set independently using Fisher's exact test. The resultant P-values were Bonferroni corrected.
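The enrichment test for a single gene set can be illustrated as follows. This is a sketch under stated assumptions rather than the analysis code itself: it assumes a one-sided test for over-representation (the text does not state the sidedness), computes it as a hypergeometric upper tail, and uses hypothetical function names.

```python
from math import comb

def enrichment_p_value(n_universe, n_gene_set, n_top, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns P(overlap >= n_overlap) when n_top genes are drawn without
    replacement from n_universe genes, n_gene_set of which are in the set.
    """
    total = comb(n_universe, n_top)
    upper = min(n_gene_set, n_top)
    tail = sum(
        comb(n_gene_set, k) * comb(n_universe - n_gene_set, n_top - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For the setting described here, `n_universe` would be the 20,460 collapsed genes and `n_top` would be 200; the Bonferroni factor would be the number of gene sets tested.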
In situ hybridization experiments were performed as previously described.14 In brief, primers containing SP6 or T7 promoter sequences upstream of gene-specific sequences were used to amplify cDNA products that were then analyzed by 1% agarose gel electrophoresis, column purified, and used as templates in in vitro transcription UTP–digoxigenin-labeling reactions. Digoxigenin-labeled probes were then used for in situ hybridization on 13-μm E11.5 mouse lens frozen sections. The following primer pairs were used to amplify mRNA-specific probe sequence from E12.5 mouse embryonic cDNA: 5′-GCTATTTAGGTGACACTATAGTCTACCTGGGCTTTCTGGTG-3′, Fam198b-F; 5′-TTGTAATACGACTCACTATAGGGGCATTCTGCGGATGTCTTCT-3′, Fam198b-R; 5′-GCTATTTAGGTGACACTATAGTCTCAGCTCCCAGCTTTGAT-3′, Ptpru-F; 5′-TTGTAATACGACTCACTATAGGGCTTTGCGGATGATGACAATG-3′, Ptpru-R; 5′-GCTATTTAGGTGACACTATAGAGCTTCACCCAGCCCTTATC-3′, Ng23-F; 5′-TTGTAATACGACTCACTATAGGGTCTGTCTGCAGCTGTTGAGG-3′, Ng23-R; 5′-GCTATTTAGGTGACACTATAGGACCATCGAGGACGACCTAA-3′, Sipa1l3-F; 5′-TTGTAATACGACTCACTATAGGGGAGTGGCTCTTGGAGTCTGG-3′, Sipa1l3-R; 5′-GCTATTTAGGTGACACTATAGTACCTACCCTCCTGCCACAG-3′, Ypel2-F; 5′-TTGTAATACGACTCACTATAGGGCCCAAAGTGGTTTTGCAGTT-3′, Ypel2-R; 5′-GCTATTTAGGTGACACTATAGGAATCATGCAGCCAGGTTTT-3′, Rbm24-F; 5′-TTGTAATACGACTCACTATAGGGTCTGTCTGCAGCTGTTGAGG-3′, Rbm24-R; 5′-GCTATTTAGGTGACACTATAGGGCCAGTTCCACACTCTCTT-3′, Gje1-F; 5′-TTGTAATACGACTCACTATAGGGCTCAAAAACCTCAGCAACACA-3′, Gje1-R; 5′-GCTATTTAGGTGACACTATAGGACACAGGCTCAAGCTACCC-3′, Vit-F; 5′-TTGTAATACGACTCACTATAGGGCCATTGGCTTTGGAAAAGAA-3′, Vit-R. Digitized images were processed using image editing software (Photoshop; Adobe, Mountain View, CA). Reagents and probes are available on request.
To construct the iSyTE database, we identified three critical time points in lens development—at E10.5, E11.5 and E12.5—as the lens transitions from the stage of lens placode invagination (E10.5) to that of lens vesicle formation and the onset of lens fiber cell differentiation (E12.5) (Fig. 1).11,15 This developmental window conforms to when mouse orthologs of many human cataract genes are strongly expressed in the developing mouse lens. To ensure high-quality microarray data, we isolated total RNA from manually microdissected mouse embryonic lenses at these stages in amounts sufficient to use a single-step cDNA amplification protocol (see Methods). Using whole genome transcript profiling on microarrays (Mouse Genome 430 2.0; Affymetrix), we generated a developmental profile of the mouse lens transcriptome over the specified developmental interval. The quality of the processed microarrays was assessed using various diagnostic plots, and no anomalies were found (Supplementary Fig. S1, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-8839/-/DCSupplemental).
To identify genes with lens-enriched expression, we established an in silico subtraction approach by which lens microarray data sets are compared to a developmentally matched microarray data set representing the whole embryonic body from which the ocular tissue was removed by microdissection, denoted WB. This in silico subtraction involves ranking all genes based on the t-statistic when tissue-specific expression profiles are compared to WB profiles. We hypothesized that this control background data set, which we denoted WB for “whole body minus eyes,” represents an optimal averaged gene expression profile for a mixture of tissues and that comparison of tissue-specific profiles against the WB control profile would facilitate identification of genes with lens-specific or lens-enriched expression. We anticipated that the resultant in silico–subtracted mouse lens database would represent a useful tool to identify lens-enriched genes with roles in lens biology with which to prioritize candidate genes within mapped cataract loci for mutational analysis. Although exceptions exist, this is consistent with the hypothesis that tissue-enriched gene expression more likely reflects a function for the gene in that tissue than if a gene exhibits ubiquitous or widespread expression. The ranked lists of lens-enriched genes are what we refer to as the iSyTE database.
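The ranking step behind the in silico subtraction can be sketched as follows. As a simplification, the sketch substitutes an ordinary Welch t-statistic for the moderated t-statistic from limma described in Methods, and the gene labels and expression values are invented toy data chosen only to show why a ubiquitously expressed gene ranks below a weakly expressed but tissue-enriched one.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(tissue, control):
    """Welch t-statistic for tissue versus control replicates.

    Simplification: limma's moderated t additionally shrinks the per-gene
    variances toward a common value, which is not done here.
    Assumes at least two replicates per group and nonzero variance.
    """
    se = sqrt(variance(tissue) / len(tissue) + variance(control) / len(control))
    return (mean(tissue) - mean(control)) / se

def rank_by_enrichment(tissue_profiles, control_profiles):
    """Return gene labels ordered from most to least tissue-enriched."""
    scores = {
        gene: welch_t(tissue_profiles[gene], control_profiles[gene])
        for gene in tissue_profiles
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy log-scale values (hypothetical): three lens and three WB replicates.
lens = {
    "high_but_ubiquitous": [12.0, 12.1, 11.9],
    "low_but_enriched": [6.0, 6.2, 5.8],
}
wb = {
    "high_but_ubiquitous": [12.0, 12.2, 11.8],
    "low_but_enriched": [3.0, 3.1, 2.9],
}
ranking = rank_by_enrichment(lens, wb)
```

Ranking on raw lens intensity alone would place the ubiquitous gene first; ranking on the comparison with WB reverses that order.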
We tested the usefulness of this approach to identify genes associated with lens development and human cataract by first identifying the gene sets that are enriched in the top 200 highly ranked genes (representing ~1% of the total number of genes in the genome), with or without WB control, using Fisher's exact test with Bonferroni-corrected P values. The top 200 highly ranked genes from the lens data set with WB subtraction were highly enriched for gene sets for eye and lens biology, without enrichment for gene sets for miscellaneous housekeeping factors (Fig. 2A). We also identified the most highly enriched gene sets for the top 200 highly ranked genes from the lens data set without WB subtraction and found that they consisted primarily of ribosomal components. Therefore, the in silico subtraction method specifically identifies lens-enriched genes, both with high expression and low expression in the lens, while filtering out genes with high expression that are not lens specific. We further found that the top 200 lens-enriched genes from the WB subtraction data set consist primarily of genes associated with lens development, isolated or nonsyndromic cataract and, interestingly, with syndromic cataract as well (Figs. 2B, 2C; Supplementary Fig. S2, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-8839/-/DCSupplemental). Analysis using different numbers of top lens-enriched genes (such as n = 100, 300, 500 genes) produced similar results (data not shown).
To test the potential of iSyTE to identify cataract-associated genes, we analyzed 24 previously mapped intervals that contain genes associated with human isolated or nonsyndromic congenital cataract. On manual inspection of these mapped genomic intervals, iSyTE successfully identified the correct mutant gene as the top candidate within a locus in approximately 70% of cases (17/24), and in approximately 88% of cases (21/24) it ranked the mutant gene within the top two candidates among all candidate genes in the locus, where each locus spans on average 12.3 Mb and contains approximately 80 genes (Table 1). Moreover, the effectiveness of mutant gene identification remained high even when the highly lens-specific crystallin encoding genes were removed from the analysis. These data reflect the ability of iSyTE to identify genes that are expressed at relatively low levels but that are highly enriched in the lens. This group includes the genes FOXE3, HSF4, MAF, and PITX3, which encode transcription factors, as well as BFSP2, LIM2, and MIP that encode cytoskeletal proteins (Table 2).
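The interval-level prioritization and the retrospective success rates quoted above can be reproduced with a short sketch. The helper names and the gene labels GENE_A to GENE_C with their ranks are hypothetical; the within-interval positions are the ones reported in the text (17 genes ranked first, 4 ranked second, and FYCO1, GCNT2, and CHMP4B ranked 21, 7, and 34).

```python
def prioritize(interval_genes, enrichment_rank):
    """Order the genes of a mapped interval by genome-wide enrichment rank.

    enrichment_rank maps gene -> rank, where 1 is the most lens-enriched.
    Genes absent from the ranking (e.g., not on the array) are dropped.
    """
    ranked = [g for g in interval_genes if g in enrichment_rank]
    return sorted(ranked, key=lambda g: enrichment_rank[g])

def top_k_rate(positions, k):
    """Fraction of loci in which the causative gene is among the top k."""
    return sum(1 for p in positions if p <= k) / len(positions)

# Hypothetical interval with made-up genome-wide ranks.
candidates = prioritize(
    ["GENE_A", "GENE_B", "GENE_C"],
    {"GENE_A": 5400, "GENE_B": 12, "GENE_C": 880},
)

# Within-interval positions of the 24 known cataract genes, as reported.
positions = [1] * 17 + [2] * 4 + [21, 7, 34]
top1 = top_k_rate(positions, 1)  # 17/24
top2 = top_k_rate(positions, 2)  # 21/24
```

Here 17/24 is about 0.71 and 21/24 is 0.875, matching the approximately 70% and 88% figures given above.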
In addition to the identification of known cataract genes, iSyTE can also identify novel cataract genes. We successfully used a preliminary version of iSyTE to identify the genes involved in two separate cataract cases.38,39 In the first case, the patient presented with bilateral, progressive cataracts with posterior lenticonus as the primary phenotype and carried the balanced paracentric inversion 46,XY,inv(9)(q22.33q34.11).38 The iSyTE database identified TDRD7 as the most probable candidate among 108 genes within a 10-Mb interval around the q22.33 breakpoint. Subsequently, disruption and haploinsufficiency of TDRD7 in the patient was confirmed, and an additional independent 3-bp coding region deletion mutation in TDRD7 was identified in a consanguineous case. In the second case, we applied iSyTE to another independent case of human congenital cataract in which a translocation breakpoint ostensibly responsible for the proband's phenotype was located within a relatively gene-poor genomic interval in which no gene was directly interrupted.39 Nonetheless, iSyTE correctly identified PVRL3 as the gene responsible for the proband's cataract phenotype, most likely on the basis of a position effect, as subsequently proven by the analysis of multiple mouse Pvrl3 mutant alleles.
As yet another validation of iSyTE, we used section in situ analysis for several iSyTE-identified genes on mouse embryonic lens sections to confirm that some of the novel genes that iSyTE ranked as lens enriched were indeed expressed in the expected fashion (Fig. 3). This analysis demonstrated highly enriched lens expression of all 8 of 8 randomly chosen genes that were ranked within the top 250 lens-enriched genes, establishing the validity of the database (Fig. 3). Moreover, human orthologs of two of these genes (SIPA1L3 and PTPRU) fall within or near mapped human cataract loci.40,41 Gje1 (previously known as Gjf1; Fig. 3) has been recently identified as a novel cataract-associated gene in a mouse model.42 Besides these eight relatively uncharacterized genes, evidence for lens enrichment and association with cataract in mouse models has also recently been documented for other iSyTE lens-enriched genes (e.g., Aldh1a1).43 These results further support the usefulness of iSyTE as a cataract gene prioritization resource.
We next sought to use iSyTE to predict promising candidate genes in mapped human cataract loci for which the gene involved has not been identified. We analyzed the latest version of the CatMap data set2 (latest update September 30, 2011) and identified 17 mapped cataract intervals for which a gene has not yet been assigned. We then used iSyTE to predict the most promising candidate genes in these loci. We provide the top candidate genes in each mapped interval based on their high lens-enrichment rank in iSyTE (Table 3; Supplementary Table S3, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-8839/-/DCSupplemental). Based on our result that 88% (21/24) of known cataract genes are within the top two candidate genes within a mapped interval, the gene list in Table 3 of iSyTE-predicted candidate cataract genes can potentially serve as a resource for identifying and prioritizing cataract-associated candidate genes for sequencing.
To understand the basis for the effectiveness of the subtraction strategy in identifying genes of functional significance in lens development, we compared gene expression between the developing lens and WB control (Table 2). As expected, dramatic differences for signal intensities of genes encoding crystallin proteins between the lens and WB control were observed. However, genes with relatively low levels of expression in the lens microarray database, which otherwise would likely be ranked as low-priority candidates (e.g., Hsf4, Bfsp2), are identified by the subtraction strategy; genes encoding developmental transcription factors also appear to be preferentially selected.
Furthermore, the microarray expression patterns in the three developmental stages appear to faithfully reflect the published expression pattern of genes in lens development. For example, Bmp7, Meis1, Sox2, Pax6, and Mab21l1, which function in early lens development, have progressively decreased expression by microarray from E10.5 through E12.5. In contrast, Gja3, Gja8, Sox1, Prox1, Mip, and Lim2, which function in lens fiber cells, have progressively increased expression by microarray from E10.5 through E12.5. Thus, because of its derivation from three temporally distinct stages of lens development, the iSyTE database provides insight into early or late function for the gene of interest.
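A simple way to read early versus late function from the three stages is to label each gene by the direction of its expression change. The sketch below is only an illustration of that idea; the function name, the strict-monotonicity criterion, and the numeric values are assumptions and are not taken from the iSyTE implementation.

```python
def temporal_trend(e10_5, e11_5, e12_5):
    """Classify a three-stage expression profile by its direction of change."""
    if e10_5 > e11_5 > e12_5:
        return "decreasing"  # pattern reported for early genes such as Pax6
    if e10_5 < e11_5 < e12_5:
        return "increasing"  # pattern reported for fiber cell genes such as Mip
    return "mixed"

# Hypothetical expression values at E10.5, E11.5, and E12.5.
early_like = temporal_trend(9.1, 8.4, 7.2)
late_like = temporal_trend(4.0, 6.5, 9.8)
```

In practice one would likely require a minimum fold change between stages so that noise is not read as a trend.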
To investigate whether the in silico subtraction strategy could be generally applied to identify genes associated with other developmental disorders, we generated a microarray data set for the developing molar tooth, which is a well-established system for studying the epithelial–mesenchymal interactions involved in organogenesis. We performed laser capture microdissection to capture mouse E13.5 tooth germ tissue and then extracted sufficient total RNA to perform microarrays after two rounds of in vitro transcription-based amplification (double amplification) (Supplementary Fig. S3, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-8839/-/DCSupplemental). Using the same amplification protocol, we generated a microarray data set from total RNA extracted and pooled in equimolar ratios from mouse WB tissue at E11.5, E12.5, and E13.5. Similar to the lens, the tooth-specific profiles were “subtracted” from the WB control using a moderated t-test, and a tooth enrichment P value was assigned to each gene. t-Statistics were used to rank genes for tooth enrichment.
We next tested the usefulness of this strategy to identify genes associated with tooth development and human tooth and craniofacial defects. Similar to the lens, these analyses demonstrate that the top 200 highly ranked genes after WB subtraction were highly enriched for genes relevant to tooth biology, without being enriched for genes encoding miscellaneous housekeeping factors (Supplementary Fig. S4A, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-8839/-/DCSupplemental). As expected, the top 200 tooth-enriched genes from the WB subtraction data set contained genes associated with syndromic and nonsyndromic tooth agenesis and with orofacial clefting (Supplementary Figs. S4B, S4C, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-8839/-/DCSupplemental). These data are accessible at http://bioinformatics.udel.edu/Research/iSyTE and indicate that in addition to the lens, the in silico subtraction strategy can successfully identify genes associated with tooth development and disease.
We next sought to test the robustness and applicability of the two different WB data sets generated in this study by two experimentalists using two different amplification protocols (single amplification for lens, double amplification for tooth) at slightly different developmental stages (WB at E10.5, 11.5, and 12.5 for lens; WB at E11.5, 12.5, and 13.5 for tooth). We tested whether we could still identify tissue-specific gene enrichment even when WB profiles were generated from a different preparation. Indeed, swapping the different WB profiles generated for the lens and the tooth analysis in the in silico subtraction strategy still robustly identified genes associated with lens and tooth developmental disorders, respectively (Fig. 4).
Finally, we sought to represent the lens enrichment data in user-friendly Genome Browser tracks, allowing our genomewide lens enrichment data to be visualized in the context of the vast amount of genomic annotation already available. We created a custom iSyTE track at the University of California at Santa Cruz (UCSC) Genome Browser, and it is accessible from http://bioinformatics.udel.edu/Research/iSyTE. Each track is color coded to represent the lens enrichment ranking based on WB-subtracted gene expression profiles of E10.5, E11.5, and E12.5 lenses. Thus, iSyTE tracks allow the visualization of genes with their degree of enrichment of expression in the developing lens expressed in a color-coded format, with red indicating highly enriched and blue indicating highly depleted (Fig. 5). To make our resource useful for a wide variety of users, we provide iSyTE custom tracks for two widely used human genome assemblies (hg19 and hg18) and two mouse genome assemblies (mm9 and mm8).
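A color-coded custom track of this kind can be expressed in the BED format accepted by the UCSC Genome Browser, using the `itemRgb` column. The following sketch shows one possible encoding; the linear red-to-blue gradient, the track name, and the gene coordinates are hypothetical, and the actual iSyTE tracks may be built differently.

```python
def rank_to_rgb(rank, n_genes):
    """Map rank 1 (most enriched) to red and rank n_genes (most depleted) to blue."""
    frac = (rank - 1) / (n_genes - 1) if n_genes > 1 else 0.0
    red = round(255 * (1 - frac))
    blue = round(255 * frac)
    return f"{red},0,{blue}"

def bed_lines(track_name, genes, n_genes):
    """Build a 9-column BED custom track with one colored item per gene.

    genes: iterable of (chrom, start, end, symbol, strand, rank) tuples,
    with 0-based start coordinates as required by BED.
    """
    lines = [f'track name="{track_name}" itemRgb="On"']
    for chrom, start, end, symbol, strand, rank in genes:
        fields = [
            chrom, str(start), str(end), symbol,
            "0", strand,            # score (unused here) and strand
            str(start), str(end),   # thickStart and thickEnd
            rank_to_rgb(rank, n_genes),
        ]
        lines.append("\t".join(fields))
    return lines

# Hypothetical genes and coordinates; 20,460 is the number of ranked genes.
track = bed_lines(
    "toy_lens_E10.5",
    [("chr1", 1000, 2000, "GeneA", "+", 1),
     ("chr1", 5000, 6500, "GeneB", "-", 20460)],
    20460,
)
```

One such track per developmental stage would give the three-stage view described in the text.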
Operationally, after opening the UCSC Genome Browser for a specific genome assembly, the user can search for and browse any genomic interval of interest. This representation allows immediate visual detection of the best candidate genes in a given genomic interval and allows one to zoom in or out to visualize the presence of promising candidates within a particular region or proximal to it. The iSyTE tracks can be viewed in the context of other genomic resources that are already available in the UCSC Genome Browser, such as sequence conservation, known SNP locations, and ENCODE histone modification profiles. Visualization of the iSyTE tracks that represent three embryonic stages in one frame provides some appreciation of the dynamic pattern of gene expression during lens development.
Although it has been proposed that tissue-specific gene expression profiling may facilitate disease gene identification59,60 and that gene expression data sets for many tissue and cell types exist, the application of these resources to gene discovery, particularly in the context of disease, has been limited.61 This is primarily because such data sets are large and the route to efficient selection and prioritization of candidate genes is not straightforward, especially in the context of normal development and in the absence of clear control versus mutant gene expression change comparisons. Several gene expression atlases based on in situ hybridization provide insight into developmental gene expression,62 but such information is typically nonquantitative and does not permit facile comparison of tissue-specific gene expression levels. In this work, we developed a strategy to subject tissue-specific microarray data sets to in silico subtraction that involves comparison of a tissue-specific data set with a WB reference data set, which allows the systematic ranking of genes based on their tissue enrichment. Even with high throughput sequencing, mutations that lie outside the coding regions may be difficult to identify. We demonstrate that this filter provides a highly effective way to identify candidate genes associated with the development of specific tissues for which gene expression profiles can be readily obtained.
The development of iSyTE was based on two basic hypotheses. The first is that genes that are highly expressed at critical stages of murine embryonic development in a specific organ are likely associated with mutations in human genes that are linked to an organ-specific birth defect. The second is that in silico subtraction of gene expression profiles for whole embryonic body from those for equivalently staged specific, microdissected embryonic tissue can effectively remove nonspecific but highly expressed genes, thereby revealing tissue-specific genes. Using lens and tooth as examples, we show that this relatively straightforward experimental and computational approach can effectively facilitate the identification of human disease–associated genes.
As with any gene prediction tool, there is a false-negative rate associated with a given prediction, and it is important to consider the potential source of false negatives when interpreting results from iSyTE. Our retrospective analysis of 24 known cataract genes indicates that approximately 10% of the genes do not have high lens expression or enriched expression as measured in the current microarray data, thereby suggesting a false-negative rate of approximately 10%. This could potentially result from the following factors: the sensitivity of the microarray probes for these genes may be poor; the expression of these genes may be restricted to a different developmental stage than those analyzed; and the effect of lens-specific expression is masked by neighboring genes within the candidate interval, which have higher levels of lens-specific expression but which are noncausative.
Indeed, such examples are evident in our present data analysis. For example, in 3 of 24 cases (FYCO1, GCNT2, CHMP4B), iSyTE did not rank the correct gene within the top two candidates in the interval (Table 1). On further analysis, in the case of FYCO1 (ranked 21/191), the mapped interval was large (12.21 Mb) and contained 191 candidate genes, several of which exhibited significantly higher lens-enriched expression than FYCO1. For GCNT2 (ranked 7/21 within a 5.26-Mb interval), we found very low expression of this gene in the microarrays, indicative of either suboptimal probe binding or genuinely low expression at the lens stages analyzed. CHMP4B (ranked 34/43 in a 3.03-Mb mapped interval) is significantly expressed in the lens (signal detection P < 0.002), but it is also significantly expressed in the WB control. As a result, it does not have a high lens-enrichment rank and is therefore not correctly identified by iSyTE as a likely candidate gene.
In some cases, iSyTE does not predict any promising candidate genes based on lens enrichment (e.g., in the mapped human cataract intervals on 2q33 and 17p24) (Table 3). In yet another case (20p11.23-p12.1), iSyTE predicted BFSP1 from 29 candidates in the interval (Table 3). However, in this interval, BFSP1 has been sequenced and found to harbor no exonic or exon junction mutation, suggesting that the mutation resides in a regulatory region or in another gene. Therefore, in all cases, further experimental validation through mutational sequence analysis will be necessary, in addition to the in silico predictions made by iSyTE.
Other genomewide in silico analyses have recently been applied to the interpretation of candidate SNPs in genomewide association studies (GWAS).63 For example, Ernst et al.64 showed that cell-type specific histone modification patterns can identify regulatory regions and that knowledge of the location of these regulatory regions and their associated genes can aid in the interpretation of GWAS by providing potential regulatory mechanisms for each candidate SNP. Similarly, Ozkul et al.65 have devised a strategy based on ChIP-seq data for the transcription factor CRX to rank candidate genes within mapped intervals for retinitis pigmentosa (RP). Combined with exome sequencing, this approach successfully identified a novel mutation in the gene MAK, which is associated with RP. In the work reported here, we demonstrate a cost-effective strategy to prioritize mutations for human disease gene identification. Because embryonic dissections can be readily performed in many research laboratories and because microarray analysis is increasingly affordable, the iSyTE approach should be applicable to other organ- and tissue-specific diseases, as demonstrated by our tooth germ analysis.
In conclusion, we describe a novel strategy for identifying disease-associated genes that is supported by a publicly available Web resource called iSyTE. We recently used a preliminary version of iSyTE to help identify two human genes associated with cataract, TDRD7 and PVRL3. Because there are likely many other candidate cataract-associated genes that have not yet been identified, this Web-based resource should provide a useful tool for the ocular genetics community. Besides serving to identify lens-specific disease genes, future versions of iSyTE that include expression data sets for other ocular components should further help identify additional genes that influence the development and biology of the eye.
The authors thank Sung Choe for preliminary analysis, Shamil Sunyaev for helpful comments, and Hongzhan Huang for help with hosting the Web site.
Supported by National Institutes of Health (NIH) Grants R01EY10123-15 (RLM) and R01EY021505-01 (SAL) and the NIH Common Fund through National Institute of Biomedical Imaging and Bioengineering Grant RL9EB008539 (JWKH and DJO).
Disclosure: S.A. Lachke, None; J.W.K. Ho, None; G.V. Kryukov, None; D.J. O'Connell, None; A. Aboukhalil, None; M.L. Bulyk, None; P.J. Park, None; R.L. Maas, None