Invest Ophthalmol Vis Sci. 2012 March; 53(3): 1617–1627.
Published online 2012 March 21. doi:  10.1167/iovs.11-8839
PMCID: PMC3339920

iSyTE: Integrated Systems Tool for Eye Gene Discovery



To facilitate the identification of genes associated with cataract and other ocular defects, the authors developed and validated a computational tool termed iSyTE (integrated Systems Tool for Eye gene discovery). iSyTE uses a mouse embryonic lens gene expression data set as a bioinformatics filter to select candidate genes from human or mouse genomic regions implicated in disease and to prioritize them for further mutational and functional analyses.


Microarray gene expression profiles were obtained for microdissected embryonic mouse lens at three key developmental time points in the transition from the embryonic day (E)10.5 stage of lens placode invagination to E12.5 lens primary fiber cell differentiation. Differentially regulated genes were identified by in silico comparison of lens gene expression profiles with those of whole embryo body (WB) lacking ocular tissue.


Gene set analysis demonstrated that this strategy effectively removes highly expressed but nonspecific housekeeping genes from lens tissue expression profiles, allowing identification of less highly expressed lens disease–associated genes. Among 24 previously mapped human genomic intervals containing genes associated with isolated congenital cataract, the mutant gene is ranked within the top two iSyTE-selected candidates in approximately 88% of cases. Finally, in situ hybridization confirmed lens expression of several novel iSyTE-identified genes.


iSyTE is a publicly available Web resource that can be used to prioritize candidate genes within mapped genomic intervals associated with congenital cataract for further investigation. Extension of this approach to other ocular tissue components will facilitate eye disease gene discovery.

Even with the advent of high-throughput sequencing, the discovery of genes associated with congenital birth defects such as eye defects remains a challenge. We sought to develop a straightforward experimental approach that could facilitate the identification of candidate genes for developmental disorders, and, as proof-of-principle, we chose defects involving the ocular lens. Opacification of the lens results in cataract, a leading cause of blindness that affects 77 million persons and accounts for 48% of blindness worldwide.1 Cataracts can be classified as either congenital or age related, and can be expressed as either an isolated or a nonsyndromic phenotype or as part of a larger developmental syndrome.2–4 Approximately one quarter of congenital cataracts are inherited5; all three modes of Mendelian inheritance have been described, with autosomal dominant being the most common.2 Both linkage and mutational analyses of candidate genes have been successfully used to identify genetic causes of congenital cataracts; 24 loci are known to exist for isolated cataracts.2

The identification of genetic mutations, such as those implicated in cataract formation, traditionally follows an initial mapping step that involves linkage analysis or homozygosity mapping, followed by sequence analysis of candidate genes or genomic regions in patient DNA. A similar approach can identify mutant genes in model organisms such as mouse and zebrafish. Nonetheless, linkage and mutational analyses are cumbersome and often involve the exclusion of a large number of candidate genes by DNA sequence analysis before the correct gene is identified. Although the advent of next-generation sequencing makes it possible to rapidly identify a large number of potentially deleterious genetic variants within a sample, it often remains unclear how to identify the actual disease-associated mutation in a cost-effective manner without performing a large cohort case-control study. It is often the case that additional biological knowledge is necessary to resolve disease-producing genetic mutations from sequence variants that are unrelated to the phenotype of interest.

In the case of human developmental disorders, we hypothesized that knowledge of embryonic gene expression patterns, which are often conserved and readily accessible for the homologous mouse genes, could assist in the identification of congenital birth defect genes in humans. Here we describe a straightforward experimental and computational strategy to identify and prioritize candidate disease genes based on microarray gene expression profiles generated from embryonic mouse tissues. As an initial application, we applied this strategy to cataract phenotypes. To make this tool broadly accessible, we concurrently developed a publicly available Web-based resource termed iSyTE (integrated Systems Tool for Eye gene discovery) that can efficiently prioritize candidate genes associated with human congenital cataract.


Mouse Husbandry

Mice were treated in accordance with protocols defined in the ARVO Statement for the Use of Animals in Ophthalmic and Vision Research. The Animal Care and Use Committee of Harvard Medical School (Boston, MA) approved all experimental protocols involving mice. Wild-type ICR mice were obtained from Taconic (Albany, NY) and were used for microarray and in situ hybridization analyses. Mice were housed in a 14-hour light/10-hour dark cycle; the morning of vaginal plug discovery was defined as embryonic day (E)0.5.

Microarray Analysis

Total RNA was extracted from manually dissected mouse embryonic day 10.5, 11.5, and 12.5 lenses (approximately 200 lenses per E10.5 replicate, 150 lenses per E11.5 replicate, 100 lenses per E12.5 replicate) or from whole embryonic tissue minus the eye region at stages E10.5, E11.5, and E12.5 using an RNA purification kit (RNeasy Mini Kit; Qiagen, Valencia, CA). RNA from stage-matched whole embryonic tissue minus the eye region, which was removed by microdissection, was pooled in equimolar ratios, denoted the whole body (WB) control, and was processed in parallel. Microarray data from the WB control were later used to achieve in silico enrichment for lens-enriched genes (see Results). We first tested the purity of the dissected lens tissue by analyzing dissected lenses at these stages from P0-3.9-GFPCre reporter mice, in which the lens-specific GFP expression is driven by the Pax6 ectodermal enhancer within the 3.9-kb region upstream of the Pax6 P0 promoter.6 We then used wild-type, in-house, timed-pregnant ICR mice as a resource for collecting the lens tissues used for microarray analysis. Microarray analyses were performed in biological triplicate by hybridization to a microarray (Affymetrix Mouse 430 2.0 chip; Affymetrix, Santa Clara, CA) in the Biopolymers Facility at Harvard Medical School. Standard Affymetrix protocols were used to prepare cDNA and biotin-labeled cRNA using in vitro transcription. Quality of the total RNA was evaluated in a microfluidics-based platform (2100 Bioanalyzer; Agilent Technologies, Inc., Santa Clara, CA) before processing for cDNA preparation by RT-PCR. The cDNA was converted to biotinylated cRNA using modified nucleoside triphosphates in an in vitro transcription reaction. The labeled cRNA was hybridized to the chips for 16 hours and then washed and stained. The chip was irradiated at 488 nm (excitation) and scanned at 570 nm (emission).
Raw probe intensities from all microarray profiles were preprocessed together using the robust multiarray average method,7 implemented in the affy package.8 If a gene was represented by multiple probe sets, we selected the probe set with the highest median expression across all samples to represent the expression of that gene. In this manner, all probe sets were collapsed into 20,460 genes, based on their unique gene symbols. To calculate tissue-specific enrichment, we used a moderated t-test implemented in limma9 to identify differentially expressed genes. False discovery rates were then estimated for this gene list using the method of Benjamini and Hochberg.10 All bioinformatics analyses were carried out in the R statistical environment. The NCBI Gene Expression Omnibus accession number for all the microarray data reported in this article is GSE32334.
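The probe set collapsing rule and the Benjamini and Hochberg adjustment described above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names, probe set identifiers, gene names, and expression values are hypothetical, and the robust multiarray average preprocessing and limma moderated t-test are not reproduced here.

```python
from statistics import median


def collapse_probe_sets(probe_to_gene, expression):
    """For each gene symbol, keep the probe set with the highest median expression."""
    best = {}  # gene symbol -> (median expression, probe set id)
    for probe, gene in probe_to_gene.items():
        med = median(expression[probe])
        if gene not in best or med > best[gene][0]:
            best[gene] = (med, probe)
    return {gene: probe for gene, (_, probe) in best.items()}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: two probe sets map to GeneA, one to GeneB.
probe_to_gene = {"ps1": "GeneA", "ps2": "GeneA", "ps3": "GeneB"}
expression = {
    "ps1": [5.0, 6.0, 7.0],
    "ps2": [8.0, 9.0, 10.0],
    "ps3": [4.0, 4.5, 5.0],
}
selected = collapse_probe_sets(probe_to_gene, expression)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(selected)
print(adjusted)
```

In this toy input, GeneA is represented by ps2 because its median across samples is higher than that of ps1.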

Gene Set Analysis

To perform a comprehensive and unbiased gene set analysis, we used a large compendium of more than 10,000 mouse-specific gene sets composed of Gene Ontology terms, KEGG pathways, MouseCyc pathways, MGI mouse phenotype-associated genes, FANTOM4 mouse tissue-specific transcription factor gene sets, and other custom gene sets related to development, signaling pathways, and stem cell regulation (Supplementary Table S1). Furthermore, we compiled gene sets for lens development and human cataract and, for control purposes, for tooth development, human tooth agenesis, and human orofacial clefting (Supplementary Table S2). For lens development genes, we used a recently curated list of genes that are critically involved in the preplacodal and placodal stages of lens development.11 In addition, we compiled lists of nonsyndromic and syndromic human cataract genes based on a high-quality manual collection of all known human cataract–associated genes, CatMap.2 Tooth development genes, for comparative purposes, were those that cause abnormal tooth development in mouse and human models based on the Mouse Genome Informatics (MGI) database (mammalian phenotype ID, MP:0000116). Similarly, tooth agenesis and orofacial clefting gene lists were taken from a recent review,12 with the addition of one new nonsyndromic tooth agenesis gene, Wnt10a.13 Full details of these gene sets are available in Supplementary Table S2. We tested whether the 200 most highly ranked genes (with or without WB control) were enriched for each gene set independently using Fisher's exact test. The resultant P-values were Bonferroni corrected.
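The enrichment test described above can be sketched in Python as follows. The sketch assumes a one-sided test (over-representation only), which the text does not specify; the function names, the gene set size, and the overlap count are hypothetical. Only the universe of 20,460 genes and the top-200 cutoff come from the text.

```python
from math import comb


def fisher_enrichment_p(overlap, top_n, set_size, universe):
    """One-sided Fisher's exact test: P(X >= overlap) under the hypergeometric null."""
    max_k = min(top_n, set_size)
    total = comb(universe, top_n)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, top_n - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical gene set of 50 genes, 10 of which fall in the top 200 of 20,460 genes.
p_example = fisher_enrichment_p(overlap=10, top_n=200, set_size=50, universe=20460)
print(p_example)
print(bonferroni([0.01, 0.5]))
```

With more than 10,000 gene sets tested, the Bonferroni step multiplies each raw P-value by the number of gene sets, so only strongly enriched sets remain significant.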

In Situ Hybridization

In situ hybridization experiments were performed as previously described.14 In brief, primers containing SP6 or T7 promoter sequences upstream of gene-specific sequences were used to amplify cDNA products that were then analyzed by 1% agarose gel electrophoresis, column purified, and used as templates in in vitro transcription UTP–digoxigenin-labeling reactions. Digoxigenin-labeled probes were then used for in situ hybridization on 13-μm E11.5 mouse lens frozen sections. The following primer pairs were used to amplify mRNA-specific probe sequence from E12.5 mouse embryonic cDNA: 5′-GCTATTTAGGTGACACTATAGTCTACCTGGGCTTTCTGGTG-3′, Fam198b-F; 5′-TTGTAATACGACTCACTATAGGGGCATTCTGCGGATGTCTTCT-3′, Fam198b-R; 5′-GCTATTTAGGTGACACTATAGTCTCAGCTCCCAGCTTTGAT-3′, Ptpru-F; 5′-TTGTAATACGACTCACTATAGGGCTTTGCGGATGATGACAATG-3′, Ptpru-R; 5′-GCTATTTAGGTGACACTATAGAGCTTCACCCAGCCCTTATC-3′, Ng23-F; 5′-TTGTAATACGACTCACTATAGGGTCTGTCTGCAGCTGTTGAGG-3′, Ng23-R; 5′-GCTATTTAGGTGACACTATAGGACCATCGAGGACGACCTAA-3′, Sipa1l3-F; 5′-TTGTAATACGACTCACTATAGGGGAGTGGCTCTTGGAGTCTGG-3′, Sipa1l3-R; 5′-GCTATTTAGGTGACACTATAGTACCTACCCTCCTGCCACAG-3′, Ypel2-F; 5′-TTGTAATACGACTCACTATAGGGCCCAAAGTGGTTTTGCAGTT-3′, Ypel2-R; 5′-GCTATTTAGGTGACACTATAGGAATCATGCAGCCAGGTTTT-3′, Rbm24-F; 5′-TTGTAATACGACTCACTATAGGGTCTGTCTGCAGCTGTTGAGG-3′, Rbm24-R; 5′-GCTATTTAGGTGACACTATAGGGCCAGTTCCACACTCTCTT-3′, Gje1-F; 5′-TTGTAATACGACTCACTATAGGGCTCAAAAACCTCAGCAACACA-3′, Gje1-R; 5′-GCTATTTAGGTGACACTATAGGACACAGGCTCAAGCTACCC-3′, Vit-F; 5′-TTGTAATACGACTCACTATAGGGCCATTGGCTTTGGAAAAGAA-3′, Vit-R. Digitized images were processed using image editing software (Photoshop; Adobe, Mountain View, CA). Reagents and probes are available on request.
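The layout of the primers listed above (a promoter-containing 5′ tail followed by a gene-specific sequence) can be illustrated with a short Python sketch. The two tail constants are simply the prefixes shared by the forward and reverse primers in the list, which contain the SP6 and T7 promoter sequences respectively; the function name is hypothetical.

```python
# 5' tails shared by the listed primers: SP6-containing on forward primers,
# T7-containing on reverse primers (read off the primer list itself).
SP6_TAIL = "GCTATTTAGGTGACACTATAG"
T7_TAIL = "TTGTAATACGACTCACTATAGGG"


def gene_specific_part(primer):
    """Return (promoter name, gene-specific sequence) for a tailed primer."""
    for name, tail in (("SP6", SP6_TAIL), ("T7", T7_TAIL)):
        if primer.startswith(tail):
            return name, primer[len(tail):]
    raise ValueError("primer does not start with a known promoter tail")


# The Fam198b primer pair from the list above.
fam198b_f = "GCTATTTAGGTGACACTATAGTCTACCTGGGCTTTCTGGTG"
fam198b_r = "TTGTAATACGACTCACTATAGGGGCATTCTGCGGATGTCTTCT"

print(gene_specific_part(fam198b_f))
print(gene_specific_part(fam198b_r))
```

A check of this kind can help catch transcription errors in a primer list, since every forward primer should begin with the same SP6-containing tail and every reverse primer with the same T7-containing tail.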


Gene Expression Profiling of the Mouse Embryonic Lens

To construct the iSyTE database, we identified three critical time points in lens development (E10.5, E11.5, and E12.5) as the lens transitions from the stage of lens placode invagination (E10.5) to that of lens vesicle formation and the onset of lens fiber cell differentiation (E12.5) (Fig. 1).11,15 This developmental window conforms to when mouse orthologs of many human cataract genes are strongly expressed in the developing mouse lens. To ensure high-quality microarray data, we isolated total RNA from manually microdissected mouse embryonic lenses at these stages in amounts sufficient to use a single-step cDNA amplification protocol (see Methods). Using whole genome transcript profiling on microarrays (Mouse Genome 430 2.0; Affymetrix), we generated a developmental profile of the mouse lens transcriptome over the specified developmental interval. The quality of the processed microarrays was assessed using various diagnostic plots, and no anomalies were found (Supplementary Fig. S1).

Figure 1.
Strategy for building iSyTE. To identify genes that are specifically expressed in the lens during embryonic development, mouse lens tissue at E10.5, E11.5, and E12.5 was profiled using microarrays. Several hundred lenses at stages E10.5, E11.5, and E12.5 ...

Identification of Lens-Enriched Genes

To identify genes with lens-enriched expression, we established an in silico subtraction approach by which lens microarray data sets are compared to a developmentally matched microarray data set representing the whole embryonic body from which the ocular tissue was removed by microdissection, denoted WB. This in silico subtraction involves ranking all genes based on the t-statistic when tissue-specific expression profiles are compared to WB profiles. We hypothesized that this control background data set, which we denoted WB for “whole body minus eyes,” represents an optimal averaged gene expression profile for a mixture of tissues and that comparison of tissue-specific profiles against the WB control profile would facilitate identification of genes with lens-specific or lens-enriched expression. We anticipated that the resultant in silico–subtracted mouse lens database would represent a useful tool to identify lens-enriched genes with roles in lens biology with which to prioritize candidate genes within mapped cataract loci for mutational analysis. Although exceptions exist, this is consistent with the hypothesis that tissue-enriched gene expression more likely reflects a function for the gene in that tissue than if a gene exhibits ubiquitous or widespread expression. The ranked lists of lens-enriched genes are what we refer to as the iSyTE database.
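The ranking step of the in silico subtraction can be sketched in Python as follows. This sketch substitutes an ordinary Welch t-statistic for the limma moderated t-statistic used in the study, and the gene labels and log-scale expression values are hypothetical toy data chosen to contrast a weakly expressed lens-enriched gene with a strongly expressed ubiquitous one.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(lens, wb):
    """Welch t-statistic for lens versus WB replicates (stand-in for limma's moderated t)."""
    se = sqrt(variance(lens) / len(lens) + variance(wb) / len(wb))
    return (mean(lens) - mean(wb)) / se


def rank_by_enrichment(lens_expr, wb_expr):
    """Order genes from most lens-enriched to most lens-depleted."""
    scores = {gene: welch_t(lens_expr[gene], wb_expr[gene]) for gene in lens_expr}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical triplicate log2 intensities.
lens_expr = {
    "lens_enriched_low": [7.0, 7.2, 6.8],
    "housekeeping_high": [13.0, 13.1, 12.9],
    "wb_enriched": [5.0, 5.2, 4.8],
}
wb_expr = {
    "lens_enriched_low": [4.0, 4.1, 3.9],
    "housekeeping_high": [13.0, 13.2, 12.8],
    "wb_enriched": [9.0, 9.1, 8.9],
}

ranked_subtracted = rank_by_enrichment(lens_expr, wb_expr)
ranked_raw = sorted(lens_expr, key=lambda g: mean(lens_expr[g]), reverse=True)
print(ranked_subtracted)
print(ranked_raw)
```

Ranking by raw lens signal puts the ubiquitous gene first, whereas ranking by the lens-versus-WB statistic moves the weakly expressed but lens-enriched gene to the top.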

We tested the usefulness of this approach to identify genes associated with lens development and human cataract by first identifying the gene sets that are enriched in the top 200 highly ranked genes (representing ~1% of the total number of genes in the genome), with or without WB control, using Fisher's exact test with Bonferroni-corrected P values. The top 200 highly ranked genes from the lens data set with WB subtraction were highly enriched for gene sets for eye and lens biology, without enrichment for gene sets for miscellaneous housekeeping factors (Fig. 2A). We also identified the most highly enriched gene sets for the top 200 highly ranked genes from the lens data set without WB subtraction and found that they consisted primarily of ribosomal components. Therefore, the in silico subtraction method specifically identifies lens-enriched genes, both with high expression and low expression in the lens, while filtering out genes with high expression that are not lens specific. We further found that the top 200 lens-enriched genes from the WB subtraction data set consist primarily of genes associated with lens development, isolated or nonsyndromic cataract and, interestingly, with syndromic cataract as well (Figs. 2B, 2C; Supplementary Fig. S2). Analysis using different numbers of top lens-enriched genes (such as n = 100, 300, 500 genes) produced similar results (data not shown).

Figure 2.
In silico subtraction is an effective tool to identify lens-enriched genes. (A) The 200 most highly ranked genes with WB subtraction and without WB subtraction (No WB) at E10.5, E11.5, and E12.5 were tested against many functional biological gene categories ...

iSyTE Effectively Identifies Known and Novel Genes Associated with Cataract

To test the potential of iSyTE to identify cataract-associated genes, we analyzed 24 previously mapped intervals that contain genes associated with human isolated or nonsyndromic congenital cataract. On manual inspection of these mapped genomic intervals, iSyTE successfully identified the correct mutant gene as the top candidate within a locus in approximately 70% of cases (17/24), and in approximately 88% of cases (21/24) it ranked the mutant gene within the top two candidates among all candidate genes in the locus, where each locus spans on average 12.3 Mb and contains approximately 80 genes (Table 1). Moreover, the effectiveness of mutant gene identification remained high even when the highly lens-specific crystallin-encoding genes were removed from the analysis. These data reflect the ability of iSyTE to identify genes that are expressed at relatively low levels but that are highly enriched in the lens. This group includes the genes FOXE3, HSF4, MAF, and PITX3, which encode transcription factors, as well as BFSP2, LIM2, and MIP, which encode cytoskeletal proteins (Table 2).
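The within-locus ranking and the summary percentages quoted above can be reproduced with a small Python sketch. The gene names and genome-wide ranks are hypothetical placeholders; only the counts 17, 21, and 24 come from the text.

```python
def rank_within_interval(interval_genes, genome_rank, causal_gene):
    """1-based position of a gene after sorting interval genes by genome-wide enrichment rank."""
    ordered = sorted(interval_genes, key=lambda gene: genome_rank[gene])
    return ordered.index(causal_gene) + 1


# Hypothetical interval of four genes with genome-wide lens-enrichment ranks.
genome_rank = {"GeneA": 15, "GeneB": 4210, "GeneC": 87, "GeneD": 11950}
position = rank_within_interval(list(genome_rank), genome_rank, "GeneC")

# Retrospective summary from the text: 17 of 24 ranked first, 21 of 24 in the top two.
top1_pct = 100 * 17 / 24
top2_pct = 100 * 21 / 24
print(position, round(top1_pct, 1), round(top2_pct, 1))
```

Here 17/24 is 70.8% and 21/24 is 87.5%, which the text reports as approximately 70% and approximately 88%.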

Table 1.
iSyTE Rank of Genes Associated with Human Isolated Congenital Cataract
Table 2.
Signal Intensities for Gene Expression in Lens and WB

In addition to the identification of known cataract genes, iSyTE can also identify novel cataract genes. We successfully used a preliminary version of iSyTE to identify the genes involved in two separate cataract cases.38,39 In the first case, the patient presented with bilateral, progressive cataracts with posterior lenticonus as the primary phenotype and carried the balanced paracentric inversion 46,XY,inv(9)(q22.33q34.11).38 The iSyTE database identified TDRD7 as the most probable candidate among 108 genes within a 10-Mb interval around the q22.33 breakpoint. Subsequently, disruption and haploinsufficiency of TDRD7 in the patient was confirmed, and an additional independent 3-bp coding region deletion mutation in TDRD7 was identified in a consanguineous case. In the second case, we applied iSyTE to another independent case of human congenital cataract in which a translocation breakpoint ostensibly responsible for the proband's phenotype was located within a relatively gene-poor genomic interval in which no gene was directly interrupted.39 Nonetheless, iSyTE correctly identified PVRL3 as the gene responsible for the proband's cataract phenotype, most likely on the basis of a position effect, as subsequently proven by the analysis of multiple mouse Pvrl3 mutant alleles.

As yet another validation of iSyTE, we used section in situ analysis for several iSyTE-identified genes on mouse embryonic lens sections to confirm that some of the novel genes that iSyTE ranked as lens enriched were indeed expressed in the expected fashion (Fig. 3). This analysis demonstrated highly enriched lens expression of all 8 of 8 randomly chosen genes that were ranked within the top 250 lens-enriched genes, establishing the validity of the database (Fig. 3). Moreover, human orthologs of two of these genes (SIPA1L3 and PTPRU) fall within or near mapped human cataract loci.40,41 Gje1 (previously known as Gjf1; Fig. 3) has been recently identified as a novel cataract-associated gene in a mouse model.42 Besides these eight relatively uncharacterized genes, evidence for lens enrichment and association with cataract in mouse models has also recently been documented for other iSyTE lens-enriched genes (e.g., Aldh1a1).43 These results further support the usefulness of iSyTE as a cataract gene prioritization resource.

Figure 3.
iSyTE predicts potential candidate genes in mapped cataract loci in human and mouse. Section in situ hybridization on E11.5 to E12.0 mouse embryonic tissue confirms lens expression for Sipa1l3 (human locus 19q13.13, SIPA1L3), Ptpru (human locus 1p35.3, ...

We next sought to use iSyTE to predict promising candidate genes in mapped human cataract loci for which the gene involved has not been identified. We analyzed the latest version of the CatMap data set2 (latest update September 30, 2011) and identified 17 mapped cataract intervals for which a gene has not yet been assigned. We then used iSyTE to predict the most promising candidate genes in these loci. We provide the top candidate genes in each mapped interval based on their high lens-enrichment rank in iSyTE (Table 3; Supplementary Table S3). Based on our result that 88% (21/24) of known cataract genes are within the top two candidate genes within a mapped interval, the gene list in Table 3 of iSyTE-predicted candidate cataract genes can potentially serve as a resource for identifying and prioritizing cataract-associated candidate genes for sequencing.

Table 3.
iSyTE Predicted Candidate Genes in Mapped Intervals for Human Cataract

Basis of the Effectiveness of the Subtraction Strategy

To understand the basis for the effectiveness of the subtraction strategy in identifying genes of functional significance in lens development, we compared gene expression between the developing lens and WB control (Table 2). As expected, dramatic differences for signal intensities of genes encoding crystallin proteins between the lens and WB control were observed. However, genes with relatively low levels of expression in the lens microarray database, which otherwise would likely be ranked as low-priority candidates (e.g., Hsf4, Bfsp2), are identified by the subtraction strategy; genes encoding developmental transcription factors also appear to be preferentially selected.

Furthermore, the microarray expression patterns in the three developmental stages appear to faithfully reflect the published expression pattern of genes in lens development. For example, Bmp7, Meis1, Sox2, Pax6, and Mab21l1, which function in early lens development, have progressively decreased expression by microarray from E10.5 through E12.5. In contrast, Gja3, Gja8, Sox1, Prox1, Mip, and Lim2, which function in lens fiber cells, have progressively increased expression by microarray from E10.5 through E12.5. Thus, because of its derivation from three temporally distinct stages of lens development, the iSyTE database provides insight into early or late function for the gene of interest.
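The early-versus-late reading of the three-stage profiles can be sketched as a simple monotonic-trend check in Python. The function name, the labels, and the example intensities are hypothetical; the study itself does not describe a formal classifier.

```python
def temporal_trend(e10_5, e11_5, e12_5):
    """Label a three-stage expression profile by its direction of change."""
    if e10_5 > e11_5 > e12_5:
        return "early (decreasing)"
    if e10_5 < e11_5 < e12_5:
        return "late (increasing)"
    return "mixed"


# Hypothetical log2 intensities at E10.5, E11.5, and E12.5.
print(temporal_trend(11.0, 10.2, 9.5))
print(temporal_trend(6.0, 8.5, 11.0))
print(temporal_trend(8.0, 9.0, 8.5))
```

Under this rule, a profile like that described for Pax6 would be labeled early and one like that described for Mip would be labeled late.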

Extension of the Subtraction Strategy to Other Tissue Types

To investigate whether the in silico subtraction strategy could be generally applied to identify genes associated with other developmental disorders, we generated a microarray data set for the developing molar tooth, which is a well-established system for studying the epithelial–mesenchymal interactions involved in organogenesis. We performed laser capture microdissection to capture mouse E13.5 tooth germ tissue and then extracted sufficient total RNA to perform microarrays after two rounds of in vitro transcription-based amplification (double amplification) (Supplementary Fig. S3). Using the same amplification protocol, we generated a microarray data set from total RNA extracted and pooled in equimolar ratios from mouse WB tissue at E11.5, E12.5, and E13.5. Similar to the lens, the tooth-specific profiles were “subtracted” from the WB control using a moderated t-test, and a tooth enrichment P value was assigned to each gene. t-Statistics were used to rank genes for tooth enrichment.

We next tested the usefulness of this strategy to identify genes associated with tooth development and human tooth and craniofacial defects. Similar to the lens, these analyses demonstrate that the top 200 highly ranked genes after WB subtraction were highly enriched for genes relevant to tooth biology, without being enriched for genes encoding miscellaneous housekeeping factors (Supplementary Fig. S4A). As expected, the top 200 tooth-enriched genes from the WB subtraction data set contained genes associated with syndromic and nonsyndromic tooth agenesis and with orofacial clefting (Supplementary Figs. S4B, S4C). These data are accessible online and indicate that in addition to the lens, the in silico subtraction strategy can successfully identify genes associated with tooth development and disease.

Use of WB Microarray Data Sets as a Public Resource

We next sought to test the robustness and applicability of the two different WB data sets generated in this study by two experimentalists using two different amplification protocols (single amplification for lens, double amplification for tooth) at slightly different developmental stages (WB at E10.5, 11.5, and 12.5 for lens; WB at E11.5, 12.5, and 13.5 for tooth). We tested whether we could still identify tissue-specific gene enrichment even when WB profiles were generated from a different preparation. Indeed, swapping the different WB profiles generated for the lens and the tooth analysis in the in silico subtraction strategy still robustly identified genes associated with lens and tooth developmental disorders, respectively (Fig. 4).

Figure 4.
In silico subtraction strategy is robust against use of different WB controls. After swapping WB control profiles generated for separate lens and tooth analyses, the in silico subtraction strategy still robustly identifies genes that are specific to ( ...

Construction of a Web-Based Public Resource: iSyTE

Finally, we sought to represent the lens enrichment data in user-friendly Genome Browser tracks, allowing our genomewide lens enrichment data to be visualized in the context of the vast amount of genomic annotation already available. We created a custom iSyTE track at the University of California at Santa Cruz (UCSC) Genome Browser, and it is accessible from the iSyTE Web resource. Each track is color coded to represent the lens enrichment ranking based on WB-subtracted gene expression profiles of E10.5, E11.5, and E12.5 lenses. Thus, iSyTE tracks allow the visualization of genes with their degree of enrichment of expression in the developing lens expressed in a color-coded format, with red indicating highly enriched and blue indicating highly depleted (Fig. 5). To make our resource useful for a wide variety of users, we provide iSyTE custom tracks for two widely used human genome assemblies (hg19 and hg18) and two mouse genome assemblies (mm9 and mm8).
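One way such a color-coded custom track could be encoded is as a nine-column BED file with an itemRgb column, a format the UCSC Genome Browser accepts for custom tracks. The Python sketch below is illustrative only: the linear red-to-blue gradient, the function names, the gene name, and the coordinates are assumptions, and the actual iSyTE track files may be built differently.

```python
def rank_to_rgb(rank, n_genes):
    """Assumed linear gradient: rank 1 -> red (255,0,0), last rank -> blue (0,0,255)."""
    frac = (rank - 1) / (n_genes - 1)
    red = round(255 * (1 - frac))
    blue = round(255 * frac)
    return f"{red},0,{blue}"


def bed_line(chrom, start, end, gene, rank, n_genes):
    """Build one BED9 record: chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb."""
    fields = [chrom, str(start), str(end), gene, "0", ".",
              str(start), str(end), rank_to_rgb(rank, n_genes)]
    return "\t".join(fields)


# Track header enabling per-item colors, followed by one hypothetical gene record.
header = 'track name="lens_enrichment_E10.5" itemRgb="On"'
record = bed_line("chr1", 100, 200, "GeneA", rank=1, n_genes=20460)
print(header)
print(record)
```

A separate file of this kind per stage (E10.5, E11.5, E12.5) would give three stacked tracks that can be browsed alongside other annotation.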

Figure 5.
iSyTE tracks on the UCSC Genome Browser represent a publicly available resource for cataract gene identification. iSyTE custom tracks visualize the ranking of the lens enrichment for E10.5, ...

Operationally, after opening the UCSC Genome Browser for a specific genome assembly, the user can search for and browse any genomic interval of interest. This representation allows immediate visual detection of the best candidate genes in a given genomic interval and allows one to zoom in or out to visualize the presence of promising candidates within a particular region or proximal to it. The iSyTE tracks can be viewed in the context of other genomic resources that are already available in the UCSC Genome Browser, such as sequence conservation, known SNP locations, and ENCODE histone modification profiles. Visualization of the iSyTE tracks that represent three embryonic stages in one frame provides some appreciation of the dynamic pattern of gene expression during lens development.


Although it has been proposed that tissue-specific gene expression profiling may facilitate disease gene identification59,60 and that gene expression data sets for many tissue and cell types exist, the application of these resources to gene discovery, particularly in the context of disease, has been limited.61 This is primarily because such data sets are large and the route to efficient selection and prioritization of candidate genes is not straightforward, especially in the context of normal development and in the absence of clear control versus mutant gene expression change comparisons. Several gene expression atlases based on in situ hybridization provide insight into developmental gene expression,62 but such information is typically nonquantitative and does not permit facile comparison of tissue-specific gene expression levels. In this work, we developed a strategy to subject tissue-specific microarray data sets to in silico subtraction that involves comparison of a tissue-specific data set with a WB reference data set, which allows the systematic ranking of genes based on their tissue enrichment. Even with high throughput sequencing, mutations that lie outside the coding regions may be difficult to identify. We demonstrate that this filter provides a highly effective way to identify candidate genes associated with the development of specific tissues for which gene expression profiles can be readily obtained.

The development of iSyTE was based on two basic hypotheses. The first is that genes that are highly expressed at critical stages of murine embryonic development in a specific organ are likely associated with mutations in human genes that are linked to an organ-specific birth defect. The second is that in silico subtraction of gene expression profiles for whole embryonic body from those for equivalently staged specific, microdissected embryonic tissue can effectively remove nonspecific but highly expressed genes, thereby revealing tissue-specific genes. Using lens and tooth as examples, we show that this relatively straightforward experimental and computational approach can effectively facilitate the identification of human disease–associated genes.

As with any gene prediction tool, there is a false-negative rate associated with a given prediction, and it is important to consider the potential source of false negatives when interpreting results from iSyTE. Our retrospective analysis of 24 known cataract genes indicates that approximately 10% of the genes do not have high lens expression or enriched expression as measured in the current microarray data, thereby suggesting a false-negative rate of approximately 10%. This could potentially result from the following factors: the sensitivity of the microarray probes for these genes may be poor; the expression of these genes may be restricted to a different developmental stage than those analyzed; and the effect of lens-specific expression is masked by neighboring genes within the candidate interval, which have higher levels of lens-specific expression but which are noncausative.

Indeed, such examples are evident in our present data analysis. For example, in 3 of 24 cases (FYCO1, GCNT2, CHMP4B), iSyTE did not rank the correct gene within the top two candidates in the interval (Table 1). On further analysis, in case of FYCO1 (ranked 21/191), the mapped interval was large (12.21 Mb) and contained 191 candidate genes, several of which exhibited significantly higher lens-enriched expression than FYCO1. In GCNT2 (ranked 7/21 within a 5.26-Mb interval), we found very low expression of this gene in the microarrays, indicative of either suboptimal probe binding or genuinely low expression at the lens stages analyzed. In CHMP4B (ranked 34/43 in a 3.03-Mb mapped interval), this gene is significantly expressed in the lens (signal detection P < 0.002), but it is also significantly expressed in the WB control. As a result, it does not have a high lens-enrichment rank and is therefore not correctly identified by iSyTE as a likely candidate gene.

In some cases, iSyTE does not predict any promising candidate genes based on lens enrichment (e.g., in the mapped human cataract intervals on 2q33 and 17p24) (Table 3). In yet another case (20p11.23-p12.1), iSyTE predicted BFSP1 from 29 candidates in the interval (Table 3). However, in this interval, BFSP1 has been sequenced and found to harbor no exonic or exon junction mutation, suggesting that the mutation resides in a regulatory region or in another gene. Therefore, in all cases, further experimental validation through mutational sequence analysis will be necessary, in addition to the in silico predictions made by iSyTE.

Other genomewide in silico analyses have recently been applied to the interpretation of candidate SNPs in genomewide association studies (GWAS).63 For example, Ernst et al.64 showed that cell-type specific histone modification patterns can identify regulatory regions and that knowledge of the location of these regulatory regions and their associated genes can aid in the interpretation of GWAS by providing potential regulatory mechanisms for each candidate SNP. Similarly, Ozkul et al.65 have devised a strategy based on ChIP-seq data for the transcription factor CRX to rank candidate genes within mapped intervals for retinitis pigmentosa (RP). Combined with exome sequencing, this approach successfully identified a novel mutation in the gene MAK, which is associated with RP. In the work reported here, we demonstrate a cost-effective strategy to effectively prioritize mutations for human disease gene identification. Because embryonic dissections can be readily performed in many research laboratories and because microarray is increasingly affordable, the iSyTE approach should be applicable to other organ- and tissue-specific diseases, as demonstrated by our tooth germ analysis.

In conclusion, we describe a novel strategy for identifying disease-associated genes that is supported by a publicly available Web resource called iSyTE. We recently used a preliminary version of iSyTE to help identify two human genes associated with cataract, TDRD7 and PVRL3. Because there are likely many other candidate cataract-associated genes that have not yet been identified, this Web-based resource should provide a useful tool for the ocular genetics community. Besides serving to identify lens-specific disease genes, future versions of iSyTE that include expression data sets for other ocular components should further help identify additional genes that influence the development and biology of the eye.

Supplementary Material



The authors thank Sung Choe for preliminary analysis, Shamil Sunyaev for helpful comments, and Hongzhan Huang for help with hosting the Web site.


Supported by National Institutes of Health (NIH) Grants R01EY10123-15 (RLM) and R01EY021505-01 (SAL) and the NIH Common Fund through National Institute of Biomedical Imaging and Bioengineering Grant RL9EB008539 (JWKH and DJO).

Disclosure: S.A. Lachke, None; J.W.K. Ho, None; G.V. Kryukov, None; D.J. O'Connell, None; A. Aboukhalil, None; M.L. Bulyk, None; P.J. Park, None; R.L. Maas, None


1. Resnikoff S, Pascolini D, Etya'ale D, et al. Global data on visual impairment in the year 2002. Bull World Health Organ. 2004;82:844–851 [PubMed]
2. Shiels A, Bennett TM, Hejtmancik JF. CatMap: putting cataract on the map. Mol Vis. 2010;16:2007–2015 [PMC free article] [PubMed]
3. Hejtmancik JF. Congenital cataracts and their molecular genetics. Semin Cell Dev Biol. 2008;19:134–149 [PMC free article] [PubMed]
4. Graw J. Mouse models of cataract. J Genet. 2009;88:469–486 [PubMed]
5. Rahi JS, Dezateaux C. Congenital and infantile cataract in the United Kingdom: underlying or associated factors. British Congenital Cataract Interest Group. Invest Ophthalmol Vis Sci. 2000;41:2108–2114 [PubMed]
6. Rowan S, Sigger T, Lachke SA, et al. Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev. 2010;24:980–985 [PubMed]
7. Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4:249–264 [PubMed]
8. Gautier L, Cope L, Bolstad BM, Irizarry RA. Affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20:307–315 [PubMed]
9. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article3. doi:10.2202/1544-6115.1027 [PubMed]
10. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B (Methodological). 1995;57:289–300
11. Lachke SA, Maas RL. Building the developmental oculome: systems biology in vertebrate eye development and disease. Wiley Interdiscip Rev Syst Biol Med. 2010;2:305–323 [PubMed]
12. Matalova E, Fleischmannova J, Sharpe PT, Tucker AS. Tooth agenesis: from molecular genetics to molecular dentistry. J Dent Res. 2008;87:617–623 [PubMed]
13. Bohring A, Stamm T, Spaich C, et al. WNT10A mutations are a frequent cause of a broad spectrum of ectodermal dysplasias with sex-biased manifestation pattern in heterozygotes. Am J Hum Genet. 2009;85:97–105 [PubMed]
14. Xu PX, Woo I, Her H, Beier DR, Maas RL. Mouse Eya homologues of the Drosophila eyes absent gene require Pax6 for expression in lens and nasal placode. Development. 1997;124:219–231 [PubMed]
15. Donner AL, Lachke SA, Maas RL. Lens induction in vertebrates: variations on a conserved theme of signaling events. Semin Cell Dev Biol. 2006;17:676–685 [PubMed]
16. Ramachandran RD, Perumalsamy V, Hejtmancik JF. Autosomal recessive juvenile onset cataract associated with mutation in BFSP1. Hum Genet. 2007;121:475–482 [PubMed]
17. Jakobs PM, Hess JF, FitzGerald PG, et al. Autosomal-dominant congenital cataract associated with a deletion mutation in the human beaded filament protein gene BFSP2. Am J Hum Genet. 2000;66:1432–1436 [PubMed]
18. Shiels A, Bennett TM, Knopf HLS, et al. CHMP4B, a novel gene for autosomal dominant cataracts linked to chromosome 20q. Am J Hum Genet. 2007;81:596–606 [PubMed]
19. Litt M, Kramer P, LaMorticella DM, et al. Autosomal dominant congenital cataract associated with a missense mutation in the human alpha crystallin gene CRYAA. Hum Mol Genet. 1998;7:471–474 [PubMed]
20. Berry V, Francis P, Reddy MA, et al. Alpha-B crystallin gene (CRYAB) mutation causes dominant congenital posterior polar cataract in humans. Am J Hum Genet. 2001;69:1141–1145 [PubMed]
21. Padma T, Ayyagari R, Murty JS, et al. Autosomal dominant zonular cataract with sutural opacities localized to chromosome 17q11–12. Am J Hum Genet. 1995;57:840–845 [PubMed]
22. Billingsley G, Santhiya ST, Paterson AD, et al. CRYBA4, a novel human cataract gene, is also involved in microphthalmia. Am J Hum Genet. 2006;79:702–709 [PubMed]
23. Mackay DS, Boskovska OB, Knopf HLS, Lampi KJ, Shiels A. A nonsense mutation in CRYBB1 associated with autosomal dominant cataract linked to human chromosome 22q. Am J Hum Genet. 2002;71:1216–1221 [PubMed]
24. Kramer P, Yount J, Mitchell T, et al. A second gene for cerulean cataracts maps to the beta crystallin region on chromosome 22. Genomics. 1996;35:539–542 [PubMed]
25. Riazuddin SA, Yasmeen A, Yao W, et al. Mutations in betaB3-crystallin associated with autosomal recessive cataract in two Pakistani families. Invest Ophthalmol Vis Sci. 2005;46:2100–2106 [PubMed]
26. Héon E, Liu S, Billingsley G, et al. Gene localization for aculeiform cataract, on chromosome 2q33–35. Am J Hum Genet. 1998;63:921–926 [PubMed]
27. Sun H, Ma Z, Li Y, et al. Gamma-S crystallin gene (CRYGS) mutation causes dominant progressive cortical cataract in humans. J Med Genet. 2005;42:706–710 [PMC free article] [PubMed]
28. Zhang T, Hua R, Xiao W, et al. Mutations of the EPHA2 receptor tyrosine kinase gene cause autosomal dominant congenital cataract. Hum Mutat. 2009;30:E603–E611 [PubMed]
29. Chen J, Ma Z, Jiao X, et al. Mutations in FYCO1 cause autosomal-recessive congenital cataracts. Am J Hum Genet. 2011;88:827–838 [PubMed]
30. Pras E, Raz J, Yahalom V, et al. A nonsense mutation in the glucosaminyl (N-acetyl) transferase 2 gene (GCNT2): association with autosomal recessive congenital cataracts. Invest Ophthalmol Vis Sci. 2004;45:1940–1945 [PubMed]
31. Mackay D, Ionides A, Kibar Z, et al. Connexin46 mutations in autosomal dominant congenital cataract. Am J Hum Genet. 1999;64:1357–1364 [PubMed]
32. Shiels A, Mackay D, Ionides A, et al. A missense mutation in the human connexin50 gene (GJA8) underlies autosomal dominant “zonular pulverulent” cataract, on chromosome 1q. Am J Hum Genet. 1998;62:526–532 [PubMed]
33. Bu L, Jin Y, Shi Y, et al. Mutant DNA-binding domain of HSF4 is associated with autosomal dominant lamellar and Marner cataract. Nat Genet. 2002;31:276–278 [PubMed]
34. Pras E, Levy-Nissenbaum E, Bakhan T, et al. A missense mutation in the LIM2 gene is associated with autosomal recessive presenile cataract in an inbred Iraqi Jewish family. Am J Hum Genet. 2002;70:1363–1367 [PubMed]
35. Jamieson RV, Perveen R, Kerr B, et al. Domain disruption and mutation of the bZIP transcription factor, MAF, associated with cataract, ocular anterior segment dysgenesis and coloboma. Hum Mol Genet. 2002;11:33–42 [PubMed]
36. Berry V, Francis P, Kaushal S, Moore A, Bhattacharya S. Missense mutations in MIP underlie autosomal dominant “polymorphic” and lamellar cataracts linked to 12q. Nat Genet. 2000;25:15–17 [PubMed]
37. Khan K, Rudkin A, Parry DA, et al. Homozygous mutations in PXDN cause congenital cataract, corneal opacity, and developmental glaucoma. Am J Hum Genet. 2011;89:464–473 [PubMed]
38. Lachke SA, Alkuraya FS, Kneeland SC, et al. Mutations in the RNA granule component TDRD7 cause cataract and glaucoma. Science. 2011;331:1571–1576 [PMC free article] [PubMed]
39. Lachke SA, Higgins AW, Inagaki M, et al. The cell adhesion gene PVRL3 is associated with congenital ocular defects. Hum Genet. 2012;131:235–250 [PMC free article] [PubMed]
40. Bateman JB, Richter L, Flodman P, et al. A new locus for autosomal dominant cataract on chromosome 19: linkage analyses and screening of candidate genes. Invest Ophthalmol Vis Sci. 2006;47:3441–3449 [PubMed]
41. Hattersley K, Laurie KJ, Liebelt JE, et al. A novel syndrome of paediatric cataract, dysmorphism, ectodermal features, and developmental delay in Australian Aboriginal family maps to 1p35.3-p36.32. BMC Med Genet. 2010;11:165. [PMC free article] [PubMed]
42. Puk O, Löster J, Dalke C, et al. Mutation in a novel connexin-like gene (Gjf1) in the mouse affects early lens development and causes a variable small-eye phenotype. Invest Ophthalmol Vis Sci. 2008;49:1525–1532 [PubMed]
43. Lassen N, Bateman JB, Estey T, et al. Multiple and additive functions of ALDH3A1 and ALDH1A1: cataract phenotype and ocular oxidative damage in Aldh3a1(−/−)/Aldh1a1(−/−) knock-out mice. J Biol Chem. 2007;282:25668–25676 [PMC free article] [PubMed]
44. Eiberg H, Lund AM, Warburg M, Rosenberg T. Assignment of congenital cataract Volkmann type (CCV) to chromosome 1p36. Hum Genet. 1995;96:33–38 [PubMed]
45. Butt T, Yao W, Kaul H, et al. Localization of autosomal recessive congenital cataracts in consanguineous Pakistani families to a new locus on chromosome 1p. Mol Vis. 2007;13:1635–1640 [PubMed]
46. Wang L, Lin H, Shen Y, et al. A new locus for inherited nuclear cataract mapped to the long arm of chromosome 1. Mol Vis. 2007;13:1357–1362 [PubMed]
47. Gao L, Qin W, Cui H, et al. A novel locus of coralliform cataract mapped to chromosome 2p24-pter. J Hum Genet. 2005;50:305–310 [PubMed]
48. Khaliq S, Hameed A, Ismail M, Anwar K, Mehdi SQ. A novel locus for autosomal dominant nuclear cataract mapped to chromosome 2p12 in a Pakistani family. Invest Ophthalmol Vis Sci. 2002;43:2083–2087 [PubMed]
49. Abouzeid H, Meire FM, Osman I, et al. A new locus for congenital cataract, microcornea, microphthalmia, and atypical iris coloboma maps to chromosome 2. Ophthalmology. 2009;116:154–162 [PubMed]
50. Sabir N, Riazuddin SA, Butt T, et al. Mapping of a new locus associated with autosomal recessive congenital cataract to chromosome 3q. Mol Vis. 2010;16:2634–2638 [PMC free article] [PubMed]
51. Kaul H, Riazuddin SA, Yasmeen A, et al. A new locus for autosomal recessive congenital cataract identified in a Pakistani family. Mol Vis. 2010;16:240–245 [PMC free article] [PubMed]
52. Sabir N, Riazuddin SA, Kaul H, et al. Mapping of a novel locus associated with autosomal recessive congenital cataract to chromosome 8p. Mol Vis. 2010;16:2911–2915 [PMC free article] [PubMed]
53. Dash DP, Silvestri G, Hughes AE. Fine mapping of the keratoconus with cataract locus on chromosome 15q and candidate gene analysis. Mol Vis. 2006;12:499–505 [PubMed]
54. Berry V, Ionides AC, Moore AT, et al. A locus for autosomal dominant anterior polar cataract on chromosome 17p. Hum Mol Genet. 1996;5:415–419 [PubMed]
55. Armitage MM, Kivlin JD, Ferrell RE. A progressive early onset cataract gene maps to human chromosome 17q24. Nat Genet. 1995;9:37–40 [PubMed]
56. Zhao R, Yang Y, He X, et al. An autosomal dominant cataract locus mapped to 19q13-qter in a Chinese family. Mol Vis. 2011;17:265–269 [PMC free article] [PubMed]
57. Zhang S, Liu M, Dong JM, et al. Identification of a genetic locus for autosomal dominant infantile cataract on chromosome 20p12.1-p11.23 in a Chinese family. Mol Vis. 2008;14:1893–1897 [PMC free article] [PubMed]
58. Craig JE, Friend KL, Gecz J, et al. A novel locus for X-linked congenital cataract on Xq24. Mol Vis. 2008;14:721–726 [PMC free article] [PubMed]
59. Blackshaw S, Fraioli RE, Furukawa T, Cepko CL. Comprehensive analysis of photoreceptor gene expression and the identification of candidate retinal disease genes. Cell. 2001;107:579–589 [PubMed]
60. Diehn JJ, Diehn M, Marmor MF, Brown PO. Differential gene expression in anatomical compartments of the human eye. Genome Biol. 2005;6:R74. [PMC free article] [PubMed]
61. Brown JD, Dutta S, Bharti K, et al. Expression profiling during ocular development identifies 2 Nlz genes with a critical role in optic fissure closure. Proc Natl Acad Sci U S A. 2009;106:1462–1467 [PubMed]
62. de Boer BA, Ruijter JM, Voorbraak FPJM, Moorman AFM. More than a decade of developmental gene expression atlases: where are we now? Nucleic Acids Res. 2009;37:7349–7359 [PMC free article] [PubMed]
63. Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB. Bioinformatics challenges for personalized medicine. Bioinformatics. 2011;27:1741–1748 [PMC free article] [PubMed]
64. Ernst J, Kheradpour P, Mikkelsen TS, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49 [PubMed]
65. Ozgül RK, Siemiatkowska AM, Yucel D, et al. Exome sequencing and cis-regulatory mapping identify mutations in MAK, a gene encoding a regulator of ciliary length, as a cause of retinitis pigmentosa. Am J Hum Genet. 2011;89:253–264 [PubMed]

Articles from Investigative Ophthalmology & Visual Science are provided here courtesy of Association for Research in Vision and Ophthalmology