The sample selected for this project included a subset of family members from our previously described adRP cohort (Bowne SJ, et al. IOVS 2007;48:ARVO E-Abstract 2334).21,23,24,38,39
Families in the adRP cohort, based on pedigree analysis, have a high likelihood of having the autosomal dominant form of RP. Analysis of pedigrees for the likelihood of dominant versus X-linked or recessive inheritance showed some families with ad:Xl likelihood odds ratios of less than 0, indicating that the disease in a few families could be caused by mutations in an X-linked gene with clinical expression in carrier females.
A proband from each family was tested previously for mutations in the complete coding regions of CA4, CRX, FSCN2, IMPDH1, NRL, PRPF31, RDS, RHO, ROM1, RP9, and TOPORS, and in mutation hot spots of RP1, PRPF3, PRPF8, NR2E3, and SNRNP200. Likely disease-causing mutations were identified in 141 of the 230 families. Only families without previously identified mutations were considered for this study.
Twenty-one families were selected for this project based on pedigree analysis and availability of family member DNAs. Two affected individuals from each family were selected to have the lowest kinship coefficient possible—that is, the most distant, available affected relatives with a common ancestor carrying a putative adRP mutation. This increased the probability that any shared variant identified in this project would be associated with disease, not just identical by chance. The demographic and pedigree characteristics of the families and the kinship coefficient of the two selected family members are shown in .
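The pair-selection step can be illustrated with a minimal sketch, assuming a small hypothetical pedigree (the individual IDs and structure below are invented for illustration and are not taken from the study families). It computes kinship coefficients with the standard recursive definition and picks the related affected pair with the lowest nonzero value.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical pedigree: individual -> (father, mother); founders have (None, None).
PEDIGREE = {
    "I-1": (None, None),
    "I-2": (None, None),
    "II-1": ("I-1", "I-2"),
    "II-2": ("I-1", "I-2"),
    "II-3": (None, None),      # married-in spouse of II-1
    "II-4": (None, None),      # married-in spouse of II-2
    "III-1": ("II-3", "II-1"),
    "III-2": ("II-4", "II-2"),
}


def depth(person):
    """Generation depth: 0 for founders, else 1 + the deepest known parent."""
    parents = [p for p in PEDIGREE[person] if p is not None]
    if not parents:
        return 0
    return 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def kinship(a, b):
    """Probability that alleles drawn at random from a and b are identical by descent."""
    if a is None or b is None:
        return 0.0  # unknown parents are treated as unrelated founders
    if a == b:
        father, mother = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(father, mother))
    # Recurse through the parents of the deeper individual, who cannot be
    # an ancestor of the other one.
    if depth(a) < depth(b):
        a, b = b, a
    father, mother = PEDIGREE[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))


def most_distant_affected_pair(affected):
    """Return (kinship, id1, id2) for the related affected pair with the lowest kinship."""
    related = [
        (kinship(a, b), a, b)
        for a, b in combinations(sorted(affected), 2)
        if kinship(a, b) > 0
    ]
    return min(related)


best = most_distant_affected_pair({"II-1", "II-2", "III-1", "III-2"})
print(best)  # the two first cousins, kinship 0.0625
```

In this toy pedigree the first cousins are chosen over sibling or parent-child pairs, which mirrors the rationale in the text: the more distant the pair, the less of the genome they share by descent.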
Six individuals with known mutations were also selected as positive controls for this study. These individuals had a variety of mutation types in several different genes, thereby testing the identification rate of next-generation sequencing for different classes of DNA variants.
Each of the 46 genes selected for this project was (1) a known cause of adRP, (2) a known cause of other forms of retinal degeneration with phenotypes overlapping adRP, or (3) a potential disease-causing candidate gene selected from sensory cilium proteome studies, EyeSAGE data, and other studies of retinal expression and protein interaction (Liu Q, et al. IOVS 2006;47:ARVO E-Abstract 3725).30–33
The genes selected, their chromosomal location, and the associated diseases, if any, are listed in .
PCR primers corresponding to 1000 amplicons were designed to amplify all the coding and noncoding exonic sequences for each of the 46 genes selected. PCR primers were manufactured in sets of four with each set containing the same genome-specific sequence and one of four different tail sequences (M13, MD1, MD2, and MD3). These four tail sequences allowed PCR product from four individuals to be combined after amplification, while retaining the ability to distinguish the four individuals on sequence assembly.
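As an illustration of how tailed primers allow pooled reads to be sorted by individual, the following is a minimal sketch. The tail sequences are placeholders invented for the example (the actual M13, MD1, MD2, and MD3 sequences are not given here), and the exact-prefix matching is a simplification that ignores sequencing errors and reverse-complement reads.

```python
# Placeholder tail sequences: NOT the real M13/MD1/MD2/MD3 sequences.
TAILS = {
    "M13": "ACGTACGT",
    "MD1": "TGCATGCA",
    "MD2": "GATCGATC",
    "MD3": "CTAGCTAG",
}


def assign_tail(read, tails=TAILS):
    """Return the tail name if exactly one tail matches the read start, else None."""
    matches = [name for name, seq in tails.items() if read.startswith(seq)]
    return matches[0] if len(matches) == 1 else None


def demultiplex(reads, tail_to_individual, tails=TAILS):
    """Split pooled reads into per-individual bins and strip the tail sequence."""
    bins = {individual: [] for individual in tail_to_individual.values()}
    unassigned = []
    for read in reads:
        tail = assign_tail(read, tails)
        if tail is None or tail not in tail_to_individual:
            unassigned.append(read)  # ambiguous or unrecognized tail
        else:
            bins[tail_to_individual[tail]].append(read[len(tails[tail]):])
    return bins, unassigned


# Hypothetical pool of four individuals, one per tail.
pool = {"M13": "ind_A", "MD1": "ind_B", "MD2": "ind_C", "MD3": "ind_D"}
reads = ["ACGTACGTAAAA", "TGCATGCACCCC", "NNNNNNNNGGGG"]
bins, unassigned = demultiplex(reads, pool)
print(bins["ind_A"], bins["ind_B"], unassigned)
```

Reads that match no tail, or more than one, are set aside rather than guessed, which corresponds to the fraction of reads that cannot be identified unambiguously.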
Target Amplification and Library Construction
Genomic DNA from each of the 48 individuals tested in this study was amplified by whole-genome amplification (WGA) before target amplification. A 454GS FLX amplicon efficiency test was performed on a pool of the DNAs to optimize product pooling such that each amplimer was represented approximately equally in the sequencing library.
Eleven patient pool libraries were constructed for analysis on the 454GS FLX sequencing system. Each of these pools corresponded to two sets of affected family pairs that had been amplified individually with the four different, tailed target primers. The six positive control samples were pooled to form one library that was not sorted by MD tail.
One concatenated, paired-end library was constructed for analysis (PE sequencer; Illumina/Solexa). Concatenation and shearing of the PCR products before library construction enabled access to regions that, because of the short read length of GAIIx sequencing, would otherwise not have been sequenced, and also gave a more random distribution of all positions in a particular sequence. This feature introduced less quality bias based on the position of a variation within a given sequence. The paired-end library pooled all 42 unknown affected individuals and the six positive controls. Since this library was not sortable by individual, it was used only for variant confirmation, not for individual variant identification.
454GS FLX Analyses of Positive Controls
Sequence reads corresponding to the pooled positive control library were aligned to Hs36 and analyzed for the presence of each known mutation. Five of the six mutations were detected at a read frequency of 6% to 13%, which is in accordance with the predicted detection rate of 8%. The sixth variant, a large 47-kb deletion of PRPF31 and several flanking genes present in VCH029, was not detected using this technology, as expected.
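The predicted detection rate of 8% is consistent with a simple calculation, sketched below under two assumptions that are not stated explicitly in the text: each of the six pooled individuals contributes equally to the library, and each known mutation is heterozygous in a single individual. One mutant allele among 12 pooled alleles gives about 8.3%.

```python
from fractions import Fraction


def expected_alt_read_fraction(pool_size, carriers=1,
                               alt_copies_per_carrier=1,
                               copies_per_individual=2):
    """Expected fraction of reads carrying a variant in an equimolar pool.

    Assumes equal representation of every individual and no allelic bias.
    """
    total_copies = pool_size * copies_per_individual
    return Fraction(carriers * alt_copies_per_carrier, total_copies)


fraction = expected_alt_read_fraction(pool_size=6)
print(fraction, f"{float(fraction):.1%}")  # 1/12 8.3%
```

The observed range of 6% to 13% brackets this value, as would be expected if amplification efficiency and sampling vary between amplicons.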
Detection of Positive Controls in Pooled 454 FLX Run
Sequence Alignment and SNP Variant Detection
454GS FLX Reads.
Approximately 1.5 million 454GS FLX sequence reads were separated by individual using the primer tails, with typically 90% to 95% of reads identified unambiguously. Alignment of the sequence reads to Hs36 resulted in an average sequence depth of 70×, with 93% of reads mapping to the 46 target genes, and identified more than 9000 variants.
Figure 1. Coverage from 454GS FLX sequencing of individual samples (Roche Diagnostics, Indianapolis, IN). The fraction of targeted positions (~250 kbp total) covered at 0× (red), 1× (green), 10× (light blue), and 20× (dark ...)
Sequence Data Generated on Next-Generation Platforms
The list of unfiltered variants was compared with the nonpathogenic variants identified in positive controls on the assumption that variants other than the one pathogenic mutation would not be disease-causing in other individuals. All variants found in the positive controls were removed, as were any nonpathogenic variants found in dbSNP (http://www.ncbi.nlm.nih.gov/SNP/; provided in the public domain by the National Institutes of Health, Bethesda, MD). Unfortunately, automated removal of variants in dbSNP was not possible, as a small portion of the variants in dbSNP are truly pathogenic.
The remaining 783 intermediate variants were analyzed to remove duplicate variants found in the same family, leaving 420 unique variants. These variants were then annotated and prioritized based on their location in a gene (exon, intron, or splice site), on the predicted transcript or protein alteration, and on manual assessment of the potential of the affected gene to cause RP. The resulting 112 variants were classified as potentially pathogenic, thereby warranting additional analysis.
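The within-family deduplication step can be sketched as follows. This is a minimal illustration, assuming each variant call is recorded as a (family, individual, variant) tuple; the family IDs, individual IDs, and variant keys are hypothetical and the actual data format used in the study is not described.

```python
def unique_family_variants(calls):
    """Collapse calls seen in more than one member of a family into one entry.

    calls: iterable of (family_id, individual_id, variant_key) tuples.
    Returns a list of (family_id, variant_key) in first-seen order.
    """
    seen = set()
    unique = []
    for family_id, _individual_id, variant_key in calls:
        key = (family_id, variant_key)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical calls: the first variant is shared by both members of FAM_A.
calls = [
    ("FAM_A", "A-1", ("chr7", 1000, "G>A")),
    ("FAM_A", "A-2", ("chr7", 1000, "G>A")),
    ("FAM_A", "A-2", ("chr17", 2500, "C>T")),
    ("FAM_B", "B-1", ("chr7", 1000, "G>A")),
]
print(unique_family_variants(calls))
```

The same variant seen in two different families is kept once per family, since it would need to be evaluated separately in each pedigree.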
The 66.7 million 36-bp reads from the pooled GAIIx library (Illumina/Solexa) were aligned to Hs36 with an average sequence depth of 125×. Variants were called and compared with the list of the 454GS FLX variants. Confirmation of a 454GS FLX variant required GAIIx read coverage of at least 100× and at least two variant-supporting reads. These data were confirmatory and were not used in the initial identification of the 112 potential pathogenic variants.
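The confirmation criterion can be expressed as a short rule. The sketch below uses the two thresholds given in the text (coverage of at least 100× and at least two variant-supporting reads); the data structures, field names, and variant keys are assumptions made for illustration only.

```python
from dataclasses import dataclass

MIN_COVERAGE = 100       # minimum GAIIx read depth at the variant position
MIN_VARIANT_READS = 2    # minimum GAIIx reads supporting the variant allele


@dataclass(frozen=True)
class PileupSite:
    """Hypothetical summary of pooled GAIIx reads at one variant position."""
    coverage: int
    variant_reads: int


def is_confirmed(site):
    """Apply the coverage and supporting-read thresholds to one site."""
    return site.coverage >= MIN_COVERAGE and site.variant_reads >= MIN_VARIANT_READS


def confirm_variants(candidates, gaiix_sites):
    """Map each 454 candidate key to True/False; absent sites count as unconfirmed."""
    return {
        key: (key in gaiix_sites and is_confirmed(gaiix_sites[key]))
        for key in candidates
    }


# Hypothetical example data.
sites = {
    ("chr1", 100, "A"): PileupSite(coverage=150, variant_reads=3),
    ("chr2", 200, "T"): PileupSite(coverage=80, variant_reads=5),
    ("chr3", 300, "G"): PileupSite(coverage=200, variant_reads=1),
}
candidates = list(sites) + [("chr4", 400, "C")]
print(confirm_variants(candidates, sites))
```

Only the first candidate passes; the others fail on coverage, on supporting reads, or because the position is absent from the GAIIx data.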
Evaluation of Potential Pathogenic Variants
The 112 potential pathogenic variants were subjected to a series of analyses to determine whether they were true variants, if they segregated with disease in the family in which they were identified, and whether they were present in unaffected controls.
Fluorescent di-deoxy capillary sequencing was used to determine whether the variants identified by next-generation sequencing were actually genomic variants. Genomic DNAs from the original affected family pair and from two additional family members (when available) were tested with the corresponding M13-tailed primers from the original 1000 amplimer amplifications. Traditional Sanger sequencing showed that 55 of the 112 potential pathogenic variants were artifacts of 454GS FLX sequencing. An additional four of the potential pathogenic variants did not amplify within the genomic regions specified for the variant and so were also assumed to be artifacts. With this strategy, 53 of the potential pathogenic variants were confirmed to be present in the identified individuals.
Once a variant was determined to be real, segregation analysis was used to assess the likelihood of its being disease-causing. If initial analyses showed correct segregation in the first set of four family members tested, then all available family DNAs were tested. Forty-three of the 53 confirmed variants did not segregate with disease in the family and hence were determined to be benign. Ten of the variants segregated with disease in all available family members.
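A segregation check of this kind can be sketched as below. The rule shown, that every tested affected member must carry the variant and, by default, no tested unaffected member may carry it, is an assumption for illustration; the text does not specify how unaffected carriers (for example, in cases of incomplete penetrance) were handled, so that behavior is exposed as a hypothetical option.

```python
def segregates_with_disease(carrier_status, affected_status,
                            allow_unaffected_carriers=False):
    """Check whether a variant tracks with disease among tested family members.

    carrier_status:  dict member_id -> True if the member carries the variant.
    affected_status: dict member_id -> True if the member is clinically affected.
    """
    for member, carries in carrier_status.items():
        affected = affected_status[member]
        if affected and not carries:
            return False  # an affected relative lacks the variant
        if carries and not affected and not allow_unaffected_carriers:
            return False  # an unaffected relative carries the variant
    return True


# Hypothetical family with two affected members and one unaffected member.
affected = {"m1": True, "m2": True, "m3": False}
print(segregates_with_disease({"m1": True, "m2": True, "m3": False}, affected))
print(segregates_with_disease({"m1": True, "m2": False, "m3": False}, affected))
```

A single affected relative without the variant is enough to reject segregation, which is why testing additional family members can reclassify a variant that initially appeared to segregate.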
Ten Potential Disease-Causing Variants
Three of the 10 segregating variants, KLHL7 p.G65D, and PRPF31 c.946–1, were identified and characterized in parallel laboratory testing and determined to be pathogenic.40,41 An additional variant in RPGR, p.G738*, was identified among the 10 segregating variants. Although not previously reported, RPGR p.G738*, like many other reported RPGR mutations, produces a premature termination codon in ORF15 and hence is most likely pathogenic. One additional segregating variant in GUCY2D, p.R838C, has also been reported to cause cone–rod dystrophy.42 No further testing was performed for these five disease-causing mutations.
Figure 3. Five families with identified pathogenic mutations. (A) VCH010. The p.A153V mutation in KLHL7 was present in all three affected family members tested. (B) VCH012. All five tested affected members of this family had the R838C mutation in GUCY2D. (C) VCH017. ...
Ethnically matched control population DNAs were tested for the possible presence of the remaining five variants of unknown pathogenicity. Three of the variants, PRPF8 c.1–51G>A, PITPNM3 p.R703W, and TTC26 c.896+73G>T were found in control DNAs and hence are benign. The two remaining variants, PROM1 c.1302+3C>T and MRFP c.641+9G>A, were not found in the controls. The number of immediately available family member DNA samples was low (three and two, respectively) for the PROM1 and MRFP variants. Subsequent collection and testing of three additional VCH008 family members demonstrated that the PROM1 c.1302+3C>T variant does not segregate with disease. Collection and testing of three additional VCH025 family members also demonstrated that the MRFP variant does not segregate with disease. These data demonstrate that both the PROM1 and MRFP variants are benign.
Analysis of the individual reads from 454GS FLX sequencing identified 77 small, high-confidence indels ranging in size from 1 to 3 bp. Indels with ambiguous positions or flanked by homopolymers were removed, and the remaining indels were compared with the GAIIx indel data. A total of 10 indels were also identified in the GAIIx data.
Indels present in 454 FLX and GAIIx Sequence Reads
Indels were evaluated using the same fluorescent capillary sequencing strategy described above for the possibly pathogenic SNP variants. Traditional sequencing failed to confirm the presence of nine of the indels. The 10th indel, a 3-bp deletion in ORF15 of RPGR, was not assessed, since RPGR is located on the X chromosome and the family exhibited male-to-male transmission of RP. Furthermore, 3-bp deletions in ORF15 are common and usually benign.43–46