The apparently dominant pattern of colorectal cancer (CRC) inheritance in several large families without mutations in known CRC susceptibility genes suggests the presence of as yet unidentified genes with a strong or moderate effect on the development of CRC. Linkage analysis could lead to the identification of susceptibility genes in such families. Compared with classical linkage analysis using multi-allelic markers, single nucleotide polymorphism (SNP) arrays offer higher information content and can be processed with higher throughput, which makes them excellent tools for linkage analysis. However, the vast number of SNPs on these arrays, combined with large informative pedigrees (e.g. >35–40 bits), creates a computational complexity that challenges existing statistical packages or even exceeds their capacity. We therefore set up a procedure for linkage analysis in large pedigrees and validated it by SNP-array genotyping of a colorectal cancer family with a known MLH1 germline mutation.
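To give a feel for what a pedigree size in bits means, the sketch below uses the common 2n − f convention (n non-founders, f founders), which we assume is the measure intended here. The function names and the example family size are hypothetical and are not taken from the study.

```python
def pedigree_bits(n_founders: int, n_nonfounders: int) -> int:
    """Pedigree complexity in bits under the assumed 2n - f convention.

    Each non-founder contributes two meioses (one per parent); one bit
    per founder is subtracted because founder phase is arbitrary.
    """
    return 2 * n_nonfounders - n_founders


def inheritance_vectors(bits: int) -> int:
    """Number of inheritance patterns an exact algorithm must consider."""
    return 2 ** bits


# Hypothetical family: 10 founders and 25 non-founders.
bits = pedigree_bits(n_founders=10, n_nonfounders=25)
print(bits, inheritance_vectors(bits))
```

Because the number of inheritance patterns doubles with every added bit, exact multipoint methods quickly become impractical for pedigrees of this size, which is one reason to turn to sampling-based approaches.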
Quality control of the genotype data was performed in Alohomora, Mega2 and SimWalk2 to remove uninformative SNPs, Mendelian inconsistencies and Mendelian-consistent errors, respectively. Linkage disequilibrium was measured with SNPLINK and Merlin. Parametric linkage analysis using two flanking markers was performed with MENDEL. SimWalk2 was used for multipoint parametric linkage analysis and haplotype analysis.
On chromosome 3, in the MLH1 region, parametric linkage analysis using two flanking markers yielded a LOD score of 1.9. On chromosome 11, a small region with a LOD score of 1.1 was also detected. After removal of linkage disequilibrium, multipoint linkage analysis yielded a LOD score of 2.1 in the MLH1 region, whereas the LOD score in the region on chromosome 11 dropped to negative values. Subsequent haplotype analysis in the MLH1 region matched the mutation status of the family members perfectly.
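For readers less familiar with LOD scores, the following is a minimal sketch of the textbook two-point LOD score for phase-known, fully informative meioses. It is a deliberate simplification and not the likelihood computation performed by MENDEL or SimWalk2; the function name and the counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( L(theta) / L(0.5) ), with
    L(theta) = theta**r * (1 - theta)**(n - r).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    nonrecombinants = meioses - recombinants
    return (
        recombinants * math.log10(theta)
        + nonrecombinants * math.log10(1.0 - theta)
        - meioses * math.log10(0.5)
    )


# Hypothetical counts: 10 informative meioses, none recombinant.
for theta in (0.01, 0.1, 0.3, 0.5):
    print(f"theta={theta:.2f}  LOD={lod_score(0, 10, theta):.2f}")
```

A positive score favours linkage at the given recombination fraction and a negative score argues against it. At theta = 0.5 the score is zero by construction.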
We developed a workflow for linkage analysis in large families using high-density SNP arrays and validated it in a family with colorectal cancer. Linkage disequilibrium has to be removed when SNP arrays are used, because it can falsely inflate the LOD score. Haplotype analysis can adequately predict the carrier status of family members.