A large-scale bioinformatics and molecular analysis of a 34 Mb interval on chromosome 6q12 was undertaken as part of our ongoing study to identify the gene responsible for an autosomal recessive retinitis pigmentosa (arRP) locus, RP25. Extensive bioinformatics analysis indicated in excess of 110 genes within the region and we also noted unfinished sequence on chromosome 6q in the Human Genome Database, between 58 and 61.2 Mb. Forty-three genes within the RP25 interval were considered as good candidates for mutation screening. Direct sequence analysis of the selected genes in 7 Spanish families with arRP revealed a total of 244 sequence variants, of which 76 were novel but none were pathogenic. This, together with previous reports, excludes 60 genes within the interval (~55%) as disease causing for RP. To investigate if copy number variation (CNV) exists within RP25, a comparative genomic hybridization (CGH) analysis was performed on a consanguineous family. A clone from the tiling path array, chr6tp-19C7, spanning ~100-Kb was found to be deleted in all affected members of the family, leading to a major refinement of the interval. This will eventually have a significant impact on cloning of the RP25 gene.
Retinal dystrophies represent the most common inherited form of human visual handicap, with an estimated prevalence of 1 in 3000. They can be classified according to which photoreceptor system, rods or cones, is primarily affected (Bird, 1995). Disorders primarily affecting the rod system such as retinitis pigmentosa (RP) present initially with night blindness and may progress to involve the peripheral visual field, with relative preservation of central vision. On the other hand, cone dystrophies manifest initially with loss of central visual acuity and defects in colour vision without significant loss of peripheral vision.
Several loci with retinal dystrophy phenotypes have been mapped to the pericentromeric region of chromosome 6 (6q14-q21) (Abd El-Aziz et al. 2005). The genes for most of these loci have been identified. For example, ELOVL4 is the gene for autosomal dominant Stargardt-like disease (STGD3) as well as for autosomal dominant macular dystrophy (adMD) (Zhang et al. 2001). RIM1 and IMPG1 have also been reported as the causative genes for cone-rod dystrophy (CORD7) and benign concentric annular macular dystrophy (BCAMD), respectively (Johnson et al. 2003; van Lith-Verboeven et al. 2004). Recently, the gene for Leber congenital amaurosis type 5 (LCA5), Lebercillin, was also identified (den Hollander et al. 2007). It is interesting to note that the above retinal genes overlap with the autosomal recessive RP locus, RP25, for which the gene still remains to be identified.
The locus for RP25 was originally established by homozygosity mapping through targeting functional candidate genes. This resulted in the mapping of 4 Spanish families with arRP between microsatellite markers D6S257 and D6S1644 (Ruiz et al. 1998). The region flanked by these markers is ~34 Mb in size and contains approximately 110 known and predicted genes. Subsequently, evidence of linkage to the same region has been reported in three additional Spanish families with arRP as well as in other ethnic groups (Khaliq et al. 1999; Barragan et al. 2005a; Abd El-Aziz et al. 2007). Based on these findings, the isolation of the RP25 gene would represent a major achievement in RP research.
Previously we and others reported the exclusion of 17 genes from the RP25 interval (Marcos et al. 2000; Li et al. 2001; Marcos et al. 2002, 2003; Barragan et al. 2005a, 2005b; Abd El-Aziz et al. 2005, 2006; Barragan et al. 2008). Herein we report extensive molecular analysis of the original RP25 interval (~34 Mb) and mutation screening of a further 43 genes; together with previous reports, ~55% of the ~110 genes have therefore been evaluated for a causative role in RP25. In addition, a comparative genomic hybridization (CGH) analysis was performed on one of the RP25 families to investigate whether copy number variations (CNVs) exist and whether they could have an impact on the RP25 phenotype.
Seven Spanish families, two consanguineous (RP5 and RP167) and five non-consanguineous (RP73, RP214, RP299, RP260 and RP235), were included in the study (Fig. 1). Informed consent was obtained from all participants for clinical and molecular genetic studies. The study conformed to the tenets of the Declaration of Helsinki.
Genomic sequence of the screened genes within the interval was accessed through the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/), the ENSEMBL database (http://www.ensembl.org) and the UCSC human genome browser (http://genome.ucsc.edu/). Information on expression pattern and expressed sequence tags (ESTs) was obtained from the NCBI UniGene database. In cases where several alternatively spliced transcripts were documented in ENSEMBL, or the exonic structure was uncharacterised, the BLAST tool (available through the NCBI) and the 6 Frame Translation tool from the Baylor College of Medicine (BCM) Search Launcher (http://searchlauncher.bcm.tmc.edu/seq-util/seq-util.html) were used to compare the sequences from these transcripts and the human genome sequence databases and to define the coding and protein sequences.
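As a rough illustration of what a six-frame translation does, the following minimal sketch translates a DNA fragment in the three forward and three reverse-complement reading frames using the standard genetic code. It is not the BCM Search Launcher implementation; the function names and the toy sequence in the usage note are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate seq from the given offset (0, 1 or 2); '*' marks a stop codon.

    Codons containing characters other than A, C, G, T are reported as 'X'.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames
```

For a toy input such as `six_frame_translation("ATGGCCTAA")`, frame `+1` reads an initiation codon, one alanine and a stop codon. Long stretches without `*` in any frame are the kind of signal that can hint at an uncharacterised exon.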
In order to evaluate the pathogenicity of the novel variants, we employed several software tools that analyse the potential effect of a given variant on the function or structure of the encoded protein based on conservation and homology, physical properties of the amino acids, and prediction of protein disorder (Conseq: http://conseq.tau.ac.il/; PolyPhen: http://coot.embl.de/PolyPhen/; SIFT: http://blocks.fhcrc.org/sift/SIFT.html; Disopred: http://bioinf.cs.ucl.ac.uk/disopred/disopred.html) (Thusberg & Vihinen 2006). In addition, intronic variants were evaluated for possible effects on regulatory processes at the transcriptional or splicing levels (TESS Transcription Element Search System: http://www.cbil.upenn.edu/cgi-bin/tess/tess; http://www.fruitfly.org/seq_tools/splice.html Splice Signal Analysis: http://www.ebi.ac.uk/asd-srv/wb.cgi; Alternative Splicing DataBase: http://hazelton.lbl.gov/~teplitski/alt/; http://www.fruitfly.org/cgi-bin/seq_tools/splice.pl; Splicing Element Annotation: http://genes.mit.edu/acescan2/index.html; http://www.ensembl.org/Homo_sapiens/generegulationview; http://www.cisred.org/content/software; http://regrna.mbc.nctu.edu.tw/html/about.html) (Yeo et al. 2004; Matlin et al. 2005; Wang & Marin 2006).
In total, 530 primer pairs were designed using Primer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) to ensure complete coverage of the entire coding region, the flanking intronic sequences, the regulatory factor binding sites and the 5′ untranslated region (UTR) of the major transcript, as well as the additional exons contributed by other published alternatively spliced isoforms (details of primers are available on request). Polymerase chain reaction (PCR) was carried out as previously reported (Barragan et al. 2005a).
The PCR products were purified by adding ExoSAP [1 U shrimp alkaline phosphatase (SAP, Amersham LifeScience, Buckinghamshire, UK; ExoSAP-IT®, USB Corporation) and 1 U Exonuclease I (United States Biochemicals, Ohio, USA)] to 1 μl of the PCR product, followed by incubation at 37 °C for 15 minutes and then at 80 °C for another 15 minutes to inactivate the enzymes. The sequencing reaction was then performed as previously described (Abd El-Aziz et al. 2005). Direct sequence analysis of one affected and one unaffected individual from each of the Spanish families was performed on an automated fluorescence DNA sequencer (ABI 3730, Applied Biosystems, Warrington, UK), according to the manufacturer's instructions. Subsequently, the data were analysed using SeqMan™ 4.03 software (DNAStar, Wisconsin, USA).
Novel variants were tested in a matching control population by direct sequencing using the same protocol as for the mutation detection. In other cases the changes were genotyped by fragment analysis using primers labelled at the 5′ end with the 6-FAM fluorochrome. PCR products were then injected into an ABI3730 automated genotyper. Subsequently, data collection and allele identification were performed using GeneMapper software (Applied Biosystems).
Comparative genomic hybridization (CGH) was performed on 6 DNA samples (parents, 3 affected and 1 unaffected members) from one of the consanguineous families (RP5) using a whole genome tiling path (WGTP) array. The methods employed were as previously published (Redon et al. 2006). Two pairs of primers, Fragment 1 F and R (ACAATCCCCTGTCATGGCTA and CCTTTCCACCTGTATATTAGGAGA) and Fragment 2 F and R (GATCCACTGACAGCTTGCAC and TCGCAAATCTCATGTACTTCTCA), were designed at the beginning and end of a deleted clone. PCR amplification was then carried out in order to test the existence of this clone in all members of the family.
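The four validation primers listed above can be sanity-checked with a few lines of code. The sketch below computes length, GC fraction and a rule-of-thumb melting temperature for each oligonucleotide. The Wallace rule (2 °C per A/T, 4 °C per G/C) is an assumption introduced here for illustration only; it is a coarse approximation and is not stated in the text as the method used to design these primers.

```python
# Primer sequences exactly as given in the text (5' to 3').
PRIMERS = {
    "Fragment 1 F": "ACAATCCCCTGTCATGGCTA",
    "Fragment 1 R": "CCTTTCCACCTGTATATTAGGAGA",
    "Fragment 2 F": "GATCCACTGACAGCTTGCAC",
    "Fragment 2 R": "TCGCAAATCTCATGTACTTCTCA",
}


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in degrees C (Wallace rule).

    Illustrative assumption only: 2 * (A + T) + 4 * (G + C).
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def summarise(primers):
    """Return {name: (length, GC fraction, Wallace Tm)} for each primer."""
    return {
        name: (len(seq), gc_fraction(seq), wallace_tm(seq))
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, (length, gc, tm) in summarise(PRIMERS).items():
        print(f"{name}: {length} nt, GC {gc:.0%}, approx. Tm {tm} C")
```

Because the assay relies on the absence of a product in affected individuals, such a check only confirms that the primers are well formed; it says nothing about whether a failed amplification reflects a true deletion.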
According to the genetic data obtained from the RP25 linked families, we have divided the RP25 interval, which as stated earlier contains in excess of 110 genes, into four regions (A, B, C and D; Fig. 2). Region A, between microsatellite markers D6S257 and D6S1053, spans approximately 8 Mb and contains 15 genes. It is important to note that within this region a part of the human genome (Ensembl release 48), ~50 Kb between 58 and 59 Mb and ~3 Mb between 59 and 61.2 Mb on chromosome 6q, remains to be sequenced. A ~6 Mb interval between microsatellite markers D6S1053 and D6S1557 constitutes the second region (B) and contains 11 genes. The third region (C), localised between D6S1557 and D6S421, spans only ~1 Mb and contains 7 genes. The last region, D, is the largest since it spans the rest of the RP25 locus (~18 Mb) and contains 79 genes (Fig. 2). Even though a large number of genes are included within the region, some of the sequenced intervals are devoid of genes. These regions could represent either gene deserts in the human genome or regions where newly predicted genes might soon appear.
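The partition of the interval can be tallied in a short sketch to check that the approximate sizes and gene counts quoted above are mutually consistent. The sizes are the rounded values from the text, the distal boundary of region D (D6S1644) is inferred from the original definition of the locus, and the dictionary layout is purely illustrative.

```python
# Approximate figures quoted in the text; sizes in Mb are rounded values.
REGIONS = {
    "A": {"markers": ("D6S257", "D6S1053"), "size_mb": 8, "genes": 15},
    "B": {"markers": ("D6S1053", "D6S1557"), "size_mb": 6, "genes": 11},
    "C": {"markers": ("D6S1557", "D6S421"), "size_mb": 1, "genes": 7},
    # Distal marker inferred: region D spans the rest of the locus.
    "D": {"markers": ("D6S421", "D6S1644"), "size_mb": 18, "genes": 79},
}


def totals(regions):
    """Return (total size in Mb, total gene count) across all regions."""
    size = sum(r["size_mb"] for r in regions.values())
    genes = sum(r["genes"] for r in regions.values())
    return size, genes


def gene_density(regions):
    """Return genes per Mb for each region."""
    return {name: r["genes"] / r["size_mb"] for name, r in regions.items()}


if __name__ == "__main__":
    size, genes = totals(REGIONS)
    print(f"total: ~{size} Mb, {genes} genes")
    for name, density in gene_density(REGIONS).items():
        print(f"region {name}: {density:.1f} genes/Mb")
```

The rounded region sizes sum to about 33 Mb and the gene counts to 112, in line with the ~34 Mb interval and the "in excess of 110 genes" stated in the text.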
A comprehensive bioinformatics analysis of the RP25 interval indicated approximately 110 genes with many of them showing retinal expression. Information about gene function, expression pattern and/or the genetic data was among the criteria for selecting candidate genes from this interval. Here we present data on 43 genes, and a description of the adopted strategies for selecting these genes is detailed below.
Based on the organization of the RP25 interval as described above, region A was evaluated because of the possibility that the Pakistani family, the 3 Chinese families and 5 of the Spanish families may have a common gene. Hence, 12 genes from this region were considered as good candidates for mutation screening (Table 1). Secondly, 5 genes from region B, where all families except the Pakistani and one of the Spanish (RP260) families shared linkage, were also considered as high priority for screening (Fig. 2 and Table 1). Following the assumption that, apart from the Pakistani family, all families could have a common gene, 3 genes from region C were evaluated (Table 1).
Lastly, the possibility that the disease gene lies within region D cannot be ruled out, and the remaining 23 genes were chosen from this region. These genes were selected according to their functional relevance to RP; for example, homologous genes from the ciliated Trypanosome mapping to retinal disease loci were used to identify functional candidates (Table 1) (Broadhead et al. 2006). This was based on the hypothesis that successful transmission of several proteins from the cell body to the outer segment of the photoreceptor cells depends on their transport along a modified cilium and that defective passage of certain molecules results in retinal degeneration associated with RP (Marszalek & Goldstein, 2000). The study of novel genes whose expression can be modulated by a primary defect in a retina-specific gene was also used as an approach to identify functional candidates. For this reason the genes highlighted by the retinal degeneration (rd) mouse model, caused by a defect in the β subunit of rod cGMP-phosphodiesterase, were considered as good candidates (Table 1) (Bowes et al. 1990). Finally, EyeSAGE (Serial Analysis of Gene Expression) and the National Eye Institute Bank (NEIBank) were used as databases comparing the pattern of gene expression in the retina and retinal pigment epithelium (RPE) to that in different tissues, hence highlighting important genes for different eye diseases including RP25 (Table 1).
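The selection figures given above can be cross-checked against the totals reported elsewhere in the text (43 genes screened here, 17 excluded previously, ~110 genes in the interval). The following is a minimal sketch of that bookkeeping; the variable names are illustrative and the total of 110 is the approximate estimate quoted in the text.

```python
# Genes selected for screening in each region, as stated in the text.
SCREENED = {"A": 12, "B": 5, "C": 3, "D": 23}
# Genes reported per region in the description of the interval.
GENES_IN_REGION = {"A": 15, "B": 11, "C": 7, "D": 79}

PREVIOUSLY_EXCLUDED = 17    # genes excluded in earlier reports
APPROX_TOTAL_GENES = 110    # approximate estimate for the whole interval


def screened_total(screened):
    """Number of genes screened in this study across all regions."""
    return sum(screened.values())


def coverage_by_region(screened, genes_in_region):
    """Fraction of each region's genes that were screened in this study."""
    return {r: screened[r] / genes_in_region[r] for r in screened}


def evaluated_percentage(screened, previously_excluded, total):
    """Percentage of the interval's genes evaluated, including earlier reports."""
    return 100 * (screened_total(screened) + previously_excluded) / total


if __name__ == "__main__":
    print("screened in this study:", screened_total(SCREENED))
    for region, frac in coverage_by_region(SCREENED, GENES_IN_REGION).items():
        print(f"region {region}: {frac:.0%} of genes screened")
    pct = evaluated_percentage(SCREENED, PREVIOUSLY_EXCLUDED, APPROX_TOTAL_GENES)
    print(f"evaluated overall: ~{pct:.0f}%")
```

The per-region fractions make the sampling bias explicit: most genes in region A were screened, whereas region D, which holds the majority of the genes, was sampled only for functional candidates.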
The results of bioinformatics analyses for exon-intron structure, gene size and number of transcripts of all the screened genes, apart from 3 genes (PRIM2A, LOC442225 and LOC442226), confirmed the previously published data (Table 1). Both LOC442225 and LOC442226 were uncharacterised and hence the BLAST and the 6 Frame Translation tools were used to identify their exonic structure. Additionally, at the time of screening the PRIM2A gene, only 10 exons were reported in the Ensembl database; the initiation codon (ATG) was predicted but the stop codon had not been identified. Recently 4 additional exons have been predicted; however, the stop codon still remains to be identified and hence the open reading frame (ORF) needs to be fully characterised.
Mutation screening of the selected 43 genes led to the identification of 244 sequence variants, of which 76 were novel (Table 2). All changes were assigned a nucleotide number starting at the first translated base of each gene studied, according to the GenBank entries summarised in Table 2.
A considerable number of novel variants were excluded as pathogenic, based on their non-segregation with the disease phenotype or their presence in the control population.
Of the 76 novel sequence variants, only 5 showed segregation with the disease phenotype and were considered as potentially pathogenic. The first change was a heterozygous amino acid substitution of aspartic acid to glycine (D118G) within the C6orf165 gene and was observed in the proband of a non-consanguineous family (RP73). Even though the D118G variant was not detected in 192 control chromosomes, it could not be considered a pathogenic alteration owing to the absence of a second change, which is necessary to explain the phenotype in a recessive disorder.
Mutation screening of BAG2 revealed 2 heterozygous novel variants, c.596-69_75delGAGAT and c.114-24G>A, in the RP299 and RP73 families, respectively. The deletion in family RP299 was not detected in 192 control chromosomes, whereas the G>A transition identified in family RP73 was detected in a control individual. In silico analysis did not provide enough support to consider these changes as pathogenic in the absence of a second change in either family.
The last 2 changes, a C>A transversion (c.758+16C>A) and a T>C transition in intron 14 (c.1498-3T>C), were identified within the CD109 gene in the RP214 and RP299 families, respectively. Both changes segregated with the disease phenotype and were not detected in 192 control chromosomes. However, as we could not find a second change in either family, these changes are likely to be rare polymorphisms.
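The reasoning applied to the five segregating variants can be summarised as a simple filter for a recessive model: a variant supports causation only if it segregates with disease, is absent from controls, and is either homozygous or accompanied by a second change in the same gene. The sketch below encodes that logic. The class and field names are hypothetical, and the zygosity of the two CD109 changes is an assumption inferred from the search for a second change rather than stated explicitly.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    change: str
    family: str
    segregates: bool
    seen_in_controls: bool
    homozygous: bool
    second_change_found: bool


def supports_recessive_cause(v):
    """True only if v is compatible with causing a recessive disorder."""
    if not v.segregates or v.seen_in_controls:
        return False
    # A recessive phenotype needs two mutant alleles: either the variant is
    # homozygous or a second change is present in the same gene.
    return v.homozygous or v.second_change_found


# The five variants discussed in the text.
VARIANTS = [
    Variant("C6orf165", "D118G", "RP73", True, False, False, False),
    Variant("BAG2", "c.596-69_75delGAGAT", "RP299", True, False, False, False),
    Variant("BAG2", "c.114-24G>A", "RP73", True, True, False, False),
    # Heterozygosity assumed for the CD109 changes (not stated explicitly).
    Variant("CD109", "c.758+16C>A", "RP214", True, False, False, False),
    Variant("CD109", "c.1498-3T>C", "RP299", True, False, False, False),
]

CANDIDATES = [v for v in VARIANTS if supports_recessive_cause(v)]

if __name__ == "__main__":
    print("variants passing the recessive filter:", len(CANDIDATES))
```

Under these inputs none of the five variants passes, which mirrors the conclusion in the text that no pathogenic change was found in the screened genes.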
Interestingly, all the changes within one of the screened genes, PRIM2A, were found to be heterozygous in all family members. This does not fit with the genetic data for the RP5 and RP214 families, since affected individuals in both should be homozygous for any changes. A BLAST search detected no additional copy of this gene elsewhere in the human genome; however, a search of the Database of Genomic Variants (TCAG) showed that CNV has been reported for the PRIM2A gene (Redon et al. 2006). Additionally, CNV has been reported for other genes within this interval, such as EGFL11, FKBP1C, GUSBL2 and LOC442226; in all cases heterozygous SNPs were observed in the affected samples tested in both the RP5 and RP214 families. CGH analysis was undertaken to look for CNV as an explanation for the above finding.
The data obtained from the WGTP analysis revealed an abnormal signal compatible with a deletion of a single clone in affected members of the RP5 family. The clone, chr6tp-19C7, spans ~100 Kb and is located at 65.7 Mb. A 95-Kb region proximal to this clone was not covered by the array used in this experiment. Therefore, it is possible that the deletion extends over this gap. Additionally, a normal signal was observed from the Chr6tp-10D10 clone upstream of the gap. On the other hand, the deleted clone overlaps by 40 Kb with its distal neighbour, Chr6tp-10G7, which also shows a normal signal (Fig. 3). Based on these data it is possible that the deletion could be as much as 200 Kb in size.
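The size estimate for the deletion follows from simple arithmetic on the clone layout. A minimal sketch is given below, using only the approximate figures from the text; the constant names are illustrative, and the calculation deliberately ignores the 40-Kb overlap with the distal clone, whose normal signal may constrain the distal breakpoint further.

```python
# Approximate values quoted in the text, in Kb.
DELETED_CLONE_SPAN_KB = 100        # chr6tp-19C7
UNCOVERED_PROXIMAL_GAP_KB = 95     # region not represented on the array


def deletion_size_bounds(clone_kb, gap_kb):
    """Return (reported minimum, maximum) deletion size in Kb.

    The minimum is the span of the deleted clone as reported; the maximum
    adds the uncovered proximal gap, since the next proximal clone beyond
    the gap shows a normal signal.
    """
    return clone_kb, clone_kb + gap_kb


if __name__ == "__main__":
    low, high = deletion_size_bounds(DELETED_CLONE_SPAN_KB, UNCOVERED_PROXIMAL_GAP_KB)
    print(f"deletion size: between ~{low} and ~{high} Kb")
```

The upper bound of 195 Kb corresponds to the "as much as 200 Kb" quoted in the text.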
The identification of a deletion of at least 100 Kb in family RP5 is interesting and indicates that one of the genes within this interval (Q5T1H1, Q9H557, Q5TEL3, Q5TEL4, Q5VVG4 and Q5T3C8) might be responsible for the RP25 phenotype (Fig. 3). However, we cannot rule out the possibility that this deletion might represent a rare non-pathogenic CNV. The existence of the deletion was further validated by non-amplification of two PCR amplicons designed within the deleted clone in all affected members of the family (Fig. 4). It is noteworthy that the genes within the deleted clone interval share a common epidermal growth factor-like (EGF-like) domain; however, they represent incomplete transcripts that range from 1 to 4 exons, without an initiation or stop codon. Assuming that one of the genes from the deleted interval is responsible for the RP25 phenotype, the families whose linked regions do not overlap with the deletion, such as RP235 and RP260 (Fig. 2), should not possess mutations in this gene. Linkage to the RP25 interval in these two families could therefore be coincidental, since many regions in the genome are expected to show suggestive evidence of linkage given the small size of these families.
Herein, we report the molecular analysis of a 34 Mb interval on chromosome 6q which resulted in the exclusion of forty-three genes as disease-causing for RP25 in 7 Spanish families. In addition, we have identified a ~100 Kb deletion within this interval in one of the RP25 families, leading to a major refinement of the locus that should facilitate the subsequent identification of the disease gene (Figs. 2 and 3).
The RP25 interval spans approximately 34 Mb containing more than 110 genes, and bioinformatics analysis has revealed a sequence gap of ~3 Mb close to the centromere of chromosome 6. Moreover, some of the sequenced intervals within the RP25 region are devoid of any genes, representing either gene desert areas in the human genome or regions containing unpredicted genes that are yet to be characterised. Therefore, the number of genes within the RP25 interval should be considered as a rough estimate. However, given the large number of genes, it was impractical to screen all of them without adopting a specific strategy. As a result 43 genes were selected as good candidates for mutation screening in the Spanish families.
Both positional and functional candidate gene approaches were employed in order to identify the RP25 gene. The two strategies overlapped throughout the study. The positional approach was considered an essential part of the strategy for selection of candidate genes. This is due to the fact that within the linkage interval there was no shared haplotype between the Spanish families and/or the other families originating from diverse geographical regions (Fig. 2). Different mutations within one gene are therefore expected to be responsible for the RP phenotype in all the families. However, the possibility of two genes being responsible for the RP25 phenotype cannot be ruled out, given the presence of different crossovers in different families.
The functional candidate gene approach was also used to select genes for mutation screening. This is based on the fact that a gene could demonstrate a functional relationship with the underlying defect (RP) or there could be human homologs of genes responsible for a similar disease phenotype in other species. Alternatively, some genes might have been identified as members of a gene family of which other members have been implicated in a related disorder. For example, C6orf165 and SLC35A1 were considered as the human orthologs for the protein that was shown to be expressed in the cilia of the Trypanosome proteome, which made these genes excellent candidates (Broadhead et al. 2006).
Genetic databases as well as information about genes expressed in the rd mouse were also used in order to identify functional candidates (Table 1). Interestingly, our strategy of selecting genes from the RP25 interval has been further validated by the identification of 2 genes within this locus, Lebercillin and COL9A1, as responsible for the retinal degeneration phenotype in LCA5 (Leber Congenital Amaurosis) and Stickler syndrome, respectively (den Hollander et al. 2007; Van Camp et al. 2006).
Using the candidate gene approaches described above, 43 genes underwent mutation screening in 7 Spanish families. In total 244 genetic variants were identified, of which 76 were novel. Of the 76 novel variants, only 5 were initially considered significant in terms of segregation with the disease phenotype and/or absence in 192 matching control chromosomes. However, on further investigation they were ruled out as pathogenic, since they were observed in non-consanguineous families, where a second change would be necessary to explain a role in disease causation.
In all of the studied genes the identified SNPs were consistent with the genetic data, apart from 5 genes (PRIM2A, EGFL11, FKBP1C, GUSBL2 and LOC442226) where heterozygous SNPs were observed within an interval of homozygosity. Even though this is in contradiction to the linkage data, it was acceptable in this case since a second copy of the gene was found to exist within chromosome 6q12. It has been reported that CNVs could have a direct effect on transcription regulation, which in turn may influence disease susceptibility and phenotypic variation (Redon et al. 2006). Similarly, we postulated that CNVs existing within the RP25 locus could have an impact on the phenotype of the RP25 families. For this purpose WGTP array analysis was carried out on one family (RP5), and the data obtained revealed that a clone within the RP25 locus was deleted in all of the affected members of the investigated family. It is interesting to note that recently 5 new Spanish families showed suggestive evidence of linkage to the RP25 locus. Moreover, a crossover in one of the families has probably refined the interval from 34 to 16 Mb between microsatellite markers D6S257 and D6S1557 (published in this issue). This complements the findings presented here.
In summary, no pathogenic changes were observed in the screened genes; hence their role in the causation of the RP25 phenotype is excluded. Nevertheless, the exclusion of this significant number of genes from the interval should help in prioritising the remaining genes for mutation screening and hence in identifying the causative gene for RP25. The identification of a deleted clone from the tiling path array has narrowed down the RP25 interval from 34 Mb to only 100 Kb. It is very likely that one of the genes within the deleted clone interval could be responsible for the RP phenotype in the studied families. This confirms our initial assumption of the presence of more than one gene as responsible for RP25. If that is the case, we should expect mutations in this gene to be responsible for the RP phenotype in all RP25 linked families apart from two families (RP235 and RP260). Future work will therefore involve full characterisation and mutation screening of the current and/or novel genes within the deleted interval in the RP25 families.
We would like to thank the families who participated in the study. This study was funded by Fondo de Investigación Sanitaria, Spain (PI050857), Consejería de Salud, Junta de Andalucía, Spain (PI-0334/2007), British Retinitis Pigmentosa Society (Grant ref. GR556), NIHR Biomedical Research Centre for Ophthalmology (BMRC) and Special Trustees of Moorfields Eye Hospital.