For the past several decades, the standard methods for identifying genes underlying monogenic disease have been candidate gene testing and positional cloning. The candidate gene approach requires less work and costs less because only the candidate gene needs to be sequenced, but it requires prior knowledge of the pathogenesis of the disease for gene selection. This fundamentally slows disease gene identification because the pathogenesis of many diseases has not yet been elucidated. Without such information, the traditional positional cloning strategy can be used first to map the disease gene to a chromosomal interval and then to identify the disease-causing gene within that interval. The pathogenesis of the disease can then be explored based on the identified gene. However, the positional cloning method requires marked locus heterogeneity and the availability of a large family. Focusing on the exome can be especially fruitful in disease gene identification, given that previous studies have indicated that approximately 85% of causal mutations for human diseases are located within the coding region and canonical splice acceptor and donor sites (
http://www.hgmd.cf.ac.uk/ac/index.php). Therefore, by sequencing and comparing the coding regions of affected and unaffected individuals within a family and filtering out benign changes using public databases, such as the 1000 Genomes Project and dbSNP, the mutation in the coding region can be identified even within small families and without knowing the pathway of a disease or requiring marked locus heterogeneity. Currently, the cost of exome sequencing is even lower than that of the positional cloning strategy. This method will therefore not only speed up disease gene identification but also enable us to systematically tackle previously intractable monogenic disorders. In fact, exome sequencing approaches have been successfully used to identify disease genes for Mendelian disorders in recent studies
[21]–
[36]. Unfortunately, compared to the positional cloning strategy, the exome sequencing method may miss mutations in non-coding regions. This limitation promotes the use of the whole genome sequencing method to identify disease genes
[26],
[37]. Theoretically, whole genome sequencing will eventually become the best method of disease gene identification because it combines the advantages of both positional cloning and exome sequencing. It can also identify disease genes disrupted by large indels, inversions, translocations, and other chromosomal structural aberrations. However, at the current stage, whole genome sequencing costs more and requires extensive bioinformatics work, which restricts its use in disease gene identification. At present, exome sequencing is a powerful, low-cost tool for identifying genes that underlie disease. Whole genome sequencing will very likely become the most powerful method for disease gene identification as continued improvements to massively parallel sequencing technologies and the impending massively parallel single-molecule sequencing technologies reduce cost and time barriers
[38]. In practice, the candidate gene approach, the positional cloning strategy, and exome or whole genome sequencing methods have been combined to identify disease-causing genes in humans
[39]–
[43].
Our data here indicate again that exome sequencing can rapidly identify genes causing dominant Mendelian diseases, which can occur in a heterozygous form. We were further able to identify this gene by sequencing the exomes of only two affected patients and using available public databases, such as dbSNP131 and the 1000 Genomes Project. Additionally, second-generation sequencing produces a high level of coverage, and consequently higher accuracy, and allows more regions of a genome to be sequenced in a very cost-effective manner. The 30-fold average coverage we obtained here is a very high sequencing depth. It covered 97% of the target sequence with an accuracy rate of ~96%, and thus allowed us to identify variants with high confidence.
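The filtering logic described above (keep variants shared by the affected individuals, then discard those seen in unaffected relatives or in public databases) can be illustrated with a minimal sketch. This is not the pipeline used in this study. The function name `candidate_variants`, the tuple representation of a variant, and all sample names and coordinates are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def candidate_variants(
    affected: Dict[str, Set[Variant]],
    unaffected: Dict[str, Set[Variant]],
    known_variants: Set[Variant],
) -> Set[Variant]:
    """Return variants shared by all affected individuals that are absent
    from unaffected individuals and from the set of known variants."""
    if not affected:
        return set()
    # Variants present in every affected individual.
    shared = set.intersection(*affected.values())
    # Drop variants already catalogued in public databases.
    novel = shared - known_variants
    # Drop variants also carried by any unaffected individual.
    for calls in unaffected.values():
        novel -= calls
    return novel


# Toy data: all coordinates are invented for illustration only.
affected_calls = {
    "patient_A": {("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("3", 3000, "G", "A")},
    "patient_B": {("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("4", 4000, "T", "C")},
}
unaffected_calls = {"relative_C": {("4", 4000, "T", "C")}}
public_db = {("2", 2000, "C", "T")}  # stands in for dbSNP / 1000 Genomes entries

candidates = candidate_variants(affected_calls, unaffected_calls, public_db)
print(candidates)
```

In this toy example only the variant shared by both patients and absent from the stand-in database survives. A real analysis would additionally account for genotype quality, zygosity, and predicted functional effect.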
Using this technique, we successfully identified a gene for high myopia in an affected family. Several lines of evidence provided support for the mutations in
ZNF644, and thus the mutated
ZNF644 gene, being the cause of high myopia: 1) only the S672G mutation identified in the two affected patients showed complete co-segregation with the disease phenotype in the family studied; 2) our analysis of the
ZNF644 gene in 300 unrelated, sporadic high myopia patients identified an additional three missense mutations and two mutations in the 3′ UTR, which may affect mRNA stability or microRNA interaction; 3) none of the identified mutations were present in the 30 Han Chinese (Beijing) genomes in the 1000 Genomes Project database, the Han Chinese (Beijing) SNPs in the dbSNP131 database, or 600 normal ethnicity-matched controls; and 4) comparative analyses of
ZNF644 in other species showed that I587 is conserved and that S672, R680, and C699 are highly conserved among primates, placental mammals, and other vertebrate species (
http://genome.ucsc.edu/cgi-bin/hgPal). Based on protein structure,
ZNF644 is predicted to be a transcription factor (
http://www.genecards.org/cgi-bin/carddisp.pl?gene=ZNF644), and given that it carries potentially deleterious mutations in patients with high myopia, it may play a role in regulating gene expression in the retina and retinal pigment epithelium (RPE). One important issue in the genetic study of high myopia is the age of disease onset. We would have an informative censoring problem if members of family 951 did not show the disease phenotype only because they were too young. However, the disease phenotype in family 951 is distinctive. Disease onset was at 3–4 years of age for all affected patients in the family, and all affected patients had developed high myopia by the age of seven. The youngest unaffected member of the family (V:2) is now 9 years old and shows no signs of myopia. In addition, all affected patients in the family had severe high myopia, which allowed us to identify them easily. Therefore, there is very little chance that an unaffected family member fails to show the trait by virtue of being too young.
Although ZNF644 is clearly ubiquitously expressed, this is common for genes involved in eye diseases (for example, the retinitis pigmentosa disease-causing gene
PRPC8 is ubiquitously expressed in human tissues
[44]), and its expression in eye tissue is consistent with the
ZNF644 gene being active in the eye. Note that, given that fewer than 4% of the sporadic high myopia cases had mutations in
ZNF644 (we identified 5 different mutations in 11 patients out of 300 cases), the
ZNF644 gene is unlikely to play a major role in sporadic high myopia.
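The "less than 4%" figure follows directly from the counts reported above (11 mutation carriers among 300 sporadic cases). A short check, with variable names chosen here purely for illustration:

```python
# Counts taken from the text: 11 patients with a mutation out of 300 sporadic cases.
patients_with_mutation = 11
sporadic_cases = 300

fraction = patients_with_mutation / sporadic_cases
print(f"{fraction:.2%}")  # prints 3.67%
```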
ZNF644 belongs to the Krüppel C2H2-type zinc-finger protein family and contains 7 C2H2-type zinc fingers. Among the six identified mutations, the four missense mutations clustered in exon 3 of the ZNF644 gene, suggesting that this exon may code for important protein domain structures or have regulatory functions. The other two mutations were located in the 3′ UTR of the ZNF644 gene, a region often important for RNA degradation. The main feature of high myopia is axial elongation of the eye globe. Given that ZNF644 is predicted to be a transcription factor that may regulate genes involved in eye development, a mutant ZNF644 protein may affect normal eye development and thereby underlie the axial elongation of the eye globe in high myopia patients. However, the exact mechanism of ZNF644 action and its role in high myopia pathogenesis remain unclear, and future functional studies will be important. To date, there have been no documented studies on the ZNF644 gene, and the data here indicating its involvement in a devastating eye disease provide strong motivation for future investigation of the gene, which in turn should enable dissection of its relationship with high myopia pathogenesis.