Considerable technical developments in array design and assay development have led to methods allowing efficient high-throughput SNP genotyping (25). Some of these methods have achieved a level of affordability and ease of use such that genomewide linkage scans can now be performed using dense sets of SNPs. The Affymetrix 10K SNP platform is one of several recently introduced platforms for high-throughput SNP genotyping; others include those developed by Applied Biosystems (http://www.appliedbiosystems.com/), Motorola Life Sciences (http://www.motorola.com/lifesciences/) and Illumina (http://www.illumina.com/). These platforms allow a whole-genome scan for a disease locus to be completed with greater efficiency than most laboratories can currently achieve using conventional marker sets. This improvement in genotyping efficiency is especially relevant in the analysis of complex traits, which require large numbers of families to be continually collected and processed.
Studies utilizing SNPs as markers for genetic linkage searches have until now been predominantly based on the evaluation of specific genomic regions or on comparisons of the informativeness of SNP and microsatellite markers (5). Here, we have evaluated the use of a high-density SNP genotyping platform to identify loci for three Mendelian diseases without any a priori knowledge of the genomic location of the disease-predisposing mutation. Moreover, it is noteworthy that in one of the families (Family 1) a microsatellite-based genomewide linkage search had previously failed to identify a disease locus.
Novel loci were identified for the two recessive diseases studied. First, a locus for neonatal diabetes was identified on chromosome 10p13-p12.1 in Family 1. Forty known genes (based on Swiss-Prot, TrEMBL, mRNA and RefSeq) map to the 7.7 Mb region of linkage (UCSC Genome Browser, http://genome.ucsc.edu/, July 2003 release). Of these, 17 are predicted or hypothetical genes with little or no associated information regarding their biological function. At least three genes, phosphatidylinositol 4-phosphate 5-kinase, type II, alpha (PIP5K2A), pancreatic transcription factor 1 alpha (PTF1A) and calcium channel, voltage-dependent, beta 2 subunit (CACNB2), represent plausible candidates on the basis of either pancreatic and cerebellar expression in mammals or the implied biological function of the expressed protein. Second, a locus for recessive craniosynostosis associated with calcification of the basal ganglia was mapped to chromosome 2p16.3-p14. The phenotype is distinct from previously reported forms of craniosynostosis (16). Potential candidates for the mutated protein causing disease in Family 2 include targets for the transcription factor TWIST or molecules whose biological pathway counteracts the normal fibroblast growth factor receptor signalling pathway (22). Seventy-nine transcripts map to the 16.2 Mb region of linkage. Excluding the 44 predicted or hypothetical genes mapping to the region, there are no obvious candidates at present. The family segregating autosomal dominant renal dysplasia was linked to chromosome 10q23.31-q25.1 and a novel mutation in PAX2 was demonstrated to be causative of disease. Although there is considerable variability in the severity of the disease phenotype within Family 3, previously identified missense mutations within PAX2 have been shown to cause phenotypic differences of equivalent size segregating within single families (OMIM™, MIM Number #120330: 24/03/2003 (http://www.ncbi.nlm.nih.gov/omim/); The Human PAX2 Allelic Variant Database Web Site, http://pax2.hgu.mrc.ac.uk).
The relationship between the information provided by SNP and microsatellite markers is not straightforward; however, it has been estimated that 1.7–2.5 SNP markers provide the same information as one microsatellite marker (9). The information content (IC) averaged across the entire 10K SNP marker set is greater than that of the ABI medium density (MD10) screening set of microsatellite markers (Applied Biosystems) and has been previously shown to contribute significantly to differences between microsatellite and SNP-based genome scans (26). Since expected LOD scores correlate with IC, the 10K SNP array provides at least equal power to detect linkage compared with a search based upon a 5 Mb microsatellite screen.
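The gap in per-marker informativeness between a bi-allelic SNP and a multi-allelic microsatellite can be illustrated with a minimal sketch. It uses single-marker expected heterozygosity, H = 1 − Σ pᵢ², as a simple proxy; this is an assumption made for illustration and is not the multipoint IC reported above. The allele frequencies and the function name are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Return 1 - sum(p_i^2) for the allele frequencies of one marker."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical markers (illustrative frequencies, not data from this study):
# a SNP with two equally frequent alleles, which is the best case for a SNP,
# and a microsatellite with ten equally frequent alleles.
snp_freqs = [0.5, 0.5]
microsatellite_freqs = [0.1] * 10

h_snp = expected_heterozygosity(snp_freqs)              # 0.5, the bi-allelic maximum
h_msat = expected_heterozygosity(microsatellite_freqs)  # approximately 0.9

print(f"SNP heterozygosity:            {h_snp:.2f}")
print(f"Microsatellite heterozygosity: {h_msat:.2f}")
```

Because a bi-allelic marker can never exceed a heterozygosity of 0.5, several closely spaced SNPs are needed to approach the information carried by one highly polymorphic microsatellite, which is in keeping with the 1.7–2.5 estimate cited above.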
While the median inter-marker distance between SNPs in the Affymetrix 10K array is only 104 kb with a mean genetic map distance of 0.31 cM (17), markers are not uniformly spaced through the genome and a number of regions are significantly underrepresented (chromosomes 16p, 17q, 19p and 22q) (Figure ). This is especially problematic if only a single marker maps between two distant regions of higher IC but is itself non-informative within the individuals being genotyped. In each of the three genomewide linkage searches we conducted using the 10K SNP array, it was necessary to ‘infill’ such regions with additional microsatellite markers mapping to the corresponding genomic regions of the UCSC genetic map. As predicted, many of the areas of the genome that are underrepresented by SNPs are in telomeric and centromeric chromosomal regions (Figure ). The underrepresentation of informative SNP markers in these repeat-rich regions of the genome closely mirrors a similar scarcity of markers in conventional commercial microsatellite screening sets, such as the medium density ABI (MD10) set of markers (Figure ), which is a direct consequence of a global paucity of mapped polymorphisms within such regions. As SNP arrays offer the potential to conduct single-experiment genomewide linkage searches, it is highly desirable that marker coverage is improved specifically in the regions of low informativeness, thus obviating the need for subsequent infilling with extra DNA markers. In addition to ensuring that coverage of the genome is more uniform, there is the issue of whether a dense map of uniformly spaced SNPs, rather than a less dense map of SNP clusters, will enhance the overall assay performance for detecting true genetic linkage. This will of course depend on the type of linkage analysis performed and the size of the pedigrees analysed. Clusters of SNPs provide the maximal information for linkage mapping if they exhibit minimal linkage disequilibrium. Selecting a group of clustered SNPs therefore gives the best chance of detecting the greatest number of haplotypes within a sample set (9).
Figure 3. Comparison of the chromosome location and heterozygosity of SNPs within the Affymetrix 10K XbaI SNP array and microsatellites within the ABI medium density (MD10) marker set. Marker locations were taken from the UCSC database (http://genome.ucsc.edu ...)
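Regions that would need infilling, as discussed above, can be located by scanning the sorted physical positions of the markers on a chromosome for unusually long intervals. The following is a minimal sketch under stated assumptions: the positions, the 1 Mb threshold and the function name are hypothetical, and in practice each marker's informativeness within the pedigree would also need to be considered.

```python
def find_coverage_gaps(positions_bp, max_gap_bp):
    """Return (start, end, length) for each interval between adjacent
    markers on one chromosome that is longer than max_gap_bp."""
    ordered = sorted(positions_bp)
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        if right - left > max_gap_bp:
            gaps.append((left, right, right - left))
    return gaps


# Hypothetical marker positions (bp) on a single chromosome.
markers = [100_000, 210_000, 300_000, 2_900_000, 3_000_000]

# Flag any interval longer than 1 Mb (an arbitrary illustrative threshold).
for start, end, length in find_coverage_gaps(markers, max_gap_bp=1_000_000):
    print(f"gap of {length} bp between {start} and {end}")
```

Intervals flagged in this way are candidates for supplementary microsatellite markers.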
The observable genotyping error rate in our genomewide linkage searches using the SNP array was far lower than that achievable in microsatellite searches. However, it is more difficult to detect genotyping errors by checking for Mendelian inconsistencies with bi-allelic SNPs than with multi-allelic microsatellite markers, especially if only single-generation nuclear families are interrogated.
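Why bi-allelic markers conceal errors can be seen in a minimal sketch of a Mendelian consistency check on a parent-child trio. The genotypes below are hypothetical, and the check is a simplified illustration rather than the error-detection procedure of any particular software package.

```python
def is_mendelian_consistent(father, mother, child):
    """Return True if the child's two alleles can be explained by one
    allele inherited from each parent. Genotypes are 2-tuples of alleles."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


# Bi-allelic SNP: both parents heterozygous. Every possible child genotype
# (A/A, A/B, B/B) is compatible, so a miscall cannot be detected.
snp_father, snp_mother = ("A", "B"), ("A", "B")
snp_miscall = ("A", "B")   # hypothetical: true genotype was A/A
print(is_mendelian_consistent(snp_father, snp_mother, snp_miscall))    # True

# Microsatellite: four distinct parental alleles. The same kind of miscall
# now yields a genotype that the parents cannot have transmitted.
msat_father, msat_mother = (1, 2), (3, 4)
msat_miscall = (1, 2)      # hypothetical: true genotype was 1/3
print(is_mendelian_consistent(msat_father, msat_mother, msat_miscall)) # False
```

With only two alleles segregating, most erroneous calls remain compatible with the parental genotypes, so a low rate of observed inconsistencies does not by itself establish a low true error rate.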
The volume of data generated from SNP-based genomewide scans is vast compared with conventional microsatellite-based searches. Robust software for archiving, manipulating and integrating marker data is essential if searches based on SNPs are to become commonplace. The recent upgrade of GDAS™ 3.0 software (Affymetrix Inc.) and the latest release of the pedigree database program ProgenyLab 6.0 (Progeny Software, South Bend, IN) permit automated export of linkage format files with the option of selecting a set number of markers per file for the entire genome. ProgenyLab also allows integration of SNP and microsatellite marker data into a single overlapping set and automated removal of non-Mendelian errors before linkage file export. These programs, combined with the latest release of GENEHUNTER v2.1_r5 (http://www.fhcrc.org/labs/kruglyak/Downloads/), permit processing of more than 300 SNPs in a single run. Such improvements in software go some way towards addressing the inherent problems of using SNPs; however, further advances are required to produce a seamless transition from SNP platforms to linkage statistics.
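The idea of exporting a set number of markers per file can be sketched as a simple partition of an ordered marker list. This is only an illustration: it does not reproduce the export logic of GDAS or ProgenyLab, the marker names are placeholders, and the chunk size of 300 is an assumption chosen to echo the figure quoted above.

```python
def chunk_markers(marker_ids, markers_per_file):
    """Split an ordered list of marker IDs into consecutive chunks of at
    most markers_per_file markers, preserving map order."""
    if markers_per_file < 1:
        raise ValueError("markers_per_file must be at least 1")
    return [
        marker_ids[i:i + markers_per_file]
        for i in range(0, len(marker_ids), markers_per_file)
    ]


# Hypothetical chromosome carrying 700 markers, already sorted by map position.
markers = [f"SNP_{i:04d}" for i in range(700)]

chunks = chunk_markers(markers, markers_per_file=300)
for index, chunk in enumerate(chunks, start=1):
    print(f"file {index}: {len(chunk)} markers ({chunk[0]} to {chunk[-1]})")
```

Each chunk would then be written to its own linkage-format file and analysed in a separate run.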
The utility of a dense SNP marker set for conducting genomewide linkage screens is apparent from our analysis. In addition to the improved genetic resolution offered by these platforms over conventional microsatellite markers, they are inherently convenient, as very small amounts of DNA are required (~250 ng) and additional samples can be readily typed at any stage of a project. It is therefore clear from the data presented here that the platform, with certain caveats, provides a suitable alternative to microsatellite markers for genomewide linkage searches. Our experience supports the view that genotyping platforms based on high-density SNP markers will shortly become the dominant technology for high-throughput linkage analyses.