The initial JH1 interval, spanning about 5 Mb on BTA15 from 11,439,502 to 16,147,383 on the UMD 3.1 map [13], was narrowed to a 15-marker window (15,162,470 to 15,949,175) through analysis of additional SNP50 genotypes submitted to the National Dairy Database [24]. In all, the AIPL database contained 23 animals with both the source haplotype and a crossover haplotype that helped this refinement. Also, 34 haplotypes containing the suspect region from the 75-SNP haplotypes were identified in the fine-mapped region, and animals possessing these haplotypes were labeled as carriers. The frequency of JH1 carriers in the analyzed population increased from 21.7% to 23.3% when carriers diagnosed from crossover haplotypes were used in addition to the source haplotype for diagnosis.
Within the refined JH1 interval defined by 15 SNP50 markers, 17 crossover haplotypes were detected. The carrier status of animals with these crossover haplotypes was unknown. Only crossover haplotypes that included all 15 markers were labeled as carriers, and the remaining haplotypes were labeled as non-carriers. Thus, reported JH1 status is conservative, and some heterozygous JH1 animals may have been reported as non-carriers. The true status of these animals could be discovered by breeding trials or by identification of the causative variation through re-sequencing.
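The conservative labeling rule described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function name `label_crossover_haplotype`, the 0/1/`None` allele encoding, and the toy five-marker haplotype are all assumptions made for the example (the actual refined interval has 15 markers).

```python
from typing import Optional, Sequence


def label_crossover_haplotype(
    observed: Sequence[Optional[int]],
    source: Sequence[int],
) -> str:
    """Label a crossover haplotype under the conservative rule.

    `observed` holds the alleles of the crossover haplotype at each marker of
    the refined interval, with None where the marker lies outside the segment
    inherited from the source haplotype (hypothetical encoding).
    """
    if len(observed) != len(source):
        raise ValueError("haplotypes must cover the same markers")
    # Carrier only if every marker in the interval is present and matches.
    full_match = all(o is not None and o == s for o, s in zip(observed, source))
    return "carrier" if full_match else "non-carrier"


source_hap = [1, 0, 0, 1, 1]         # toy source haplotype (5 markers, not 15)
complete = [1, 0, 0, 1, 1]           # crossover falls outside the interval
partial = [1, 0, 0, None, None]      # crossover falls inside the interval

labels = {
    "complete": label_crossover_haplotype(complete, source_hap),
    "partial": label_crossover_haplotype(partial, source_hap),
}
print(labels)
```

In this sketch the partial haplotype is labeled a non-carrier even though it could still contain the causative variant, which is the sense in which the reported status is conservative.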
Based on this mapping analysis, all heterozygous JH1 animals contained within the CDDR repository were considered as candidates for whole genome re-sequencing. Selection criteria considered both pedigree relationships between JH1 carriers and the haplotype carrying the non-carrier allele. The animals chosen for whole genome sequencing included an Observer Chocolate Soldier son (the most patriarchal JH1 carrier) and 10 more recent carriers with differing non-carrier haplotypes (Table 2). Sequence coverage varied among the 11 animals but was sufficient to identify heterozygous SNP in the refined JH1 interval. Using all data, sequence coverage in this region was extensive: 99.93% of all genomic locations were covered by more than three reads (combined samples), and the entire region was covered by at least one read. Additionally, the top 8 samples each covered at least 91% of the region at a read depth of 3 or more (99.2% of positions at a read depth >1).
| Table 2. The ten JH1 carrier bulls used for next generation sequencing. |
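Coverage figures of the kind quoted above are fractions of positions meeting a depth threshold. The following is a minimal sketch of that calculation; the function name `fraction_covered` and the per-position depths are hypothetical and are not data from the study.

```python
def fraction_covered(depths, min_depth):
    """Fraction of positions whose read depth is at least `min_depth`."""
    if not depths:
        raise ValueError("no positions supplied")
    return sum(d >= min_depth for d in depths) / len(depths)


# Toy per-position depths for a 10-bp stretch; not data from the study.
toy_depths = [5, 4, 0, 3, 7, 1, 4, 6, 2, 9]

at_least_1 = fraction_covered(toy_depths, 1)    # covered by at least one read
more_than_3 = fraction_covered(toy_depths, 4)   # covered by more than three reads
print(at_least_1, more_than_3)
```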
Within the original JH1 interval from 11,439,502 to 16,147,383, 20,805 variants were identified, comprising 17,585 SNPs and 3,220 INDELs. After filtering to retain only heterozygous loci, a total of 262 variants remained: 244 SNPs and 18 INDELs. Within the refined JH1 interval, there were only 36 SNP and 2 INDELs. Repeat masking removed 22 SNP and 1 INDEL from the candidate list, leaving 15 SNP and one INDEL as potential candidates for the JH1 mutation. Functional annotation identified a single high-impact stop-gain SNP located at position 15,707,169 on BTA15. This C-to-T transition changes an arginine codon to a stop codon in exon 3 of CWC15, the bovine homolog of the spliceosome-associated protein CWC15 [25]. This nonsense mutation would reduce the CWC15 protein product from 231 amino acids to only 54 amino acids. An NCBI conserved domains search on the bovine CWC15 protein product reveals that the truncated protein would lack the conserved Cwf_Cwc_15 (pfam04889) domain present in the wild type. None of the other 14 SNP or the INDEL fell within the coding regions of the three genes in the refined JH1 interval, but there was one SNP within the 3′ UTR of A7YY77. A Sequenom panel of 29 SNP assays was designed for the validation test [26] based on the 15 SNPs identified by sequence analysis of the refined JH1 interval (Table S1).
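The filtering cascade described in this paragraph (heterozygous loci, then the refined interval, then repeat masking) can be sketched as below. This is only an illustration under stated assumptions: the `Variant` record, its boolean flags, and the toy variants are hypothetical, and how heterozygosity and repeat overlap were actually determined is not specified here. Only the interval coordinates and the position 15,707,169 come from the text.

```python
from dataclasses import dataclass

# Refined JH1 interval on BTA15 (UMD 3.1 coordinates, taken from the text).
REFINED_START, REFINED_END = 15_162_470, 15_949_175


@dataclass(frozen=True)
class Variant:
    pos: int             # position on BTA15
    kind: str            # "SNP" or "INDEL"
    heterozygous: bool   # hypothetical flag: heterozygous in the sequenced carriers
    in_repeat: bool      # hypothetical flag: overlaps a masked repeat


def candidate_variants(variants):
    """Apply the three filters in the order described in the text."""
    het = [v for v in variants if v.heterozygous]
    refined = [v for v in het if REFINED_START <= v.pos <= REFINED_END]
    return [v for v in refined if not v.in_repeat]


# Toy records; only the position 15,707,169 comes from the text.
toy = [
    Variant(12_000_000, "SNP", True, False),    # heterozygous, outside refined interval
    Variant(15_300_000, "SNP", False, False),   # not heterozygous
    Variant(15_500_000, "INDEL", True, True),   # removed by repeat masking
    Variant(15_707_169, "SNP", True, False),    # survives all filters
]
candidates = candidate_variants(toy)
print([(v.pos, v.kind) for v in candidates])
```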
SNP15707169 was tested as the causative mutation against 749 samples. After correcting for one incorrect JH1 assignment due to a crossover in the region, SNP15707169 was 100% concordant with JH1 status based on haplotype (Table S1). Comparing JH1 status to the other 14 SNP loci revealed two other SNP with 100% concordance; however, neither SNP was within the expressed portion of a gene. The SNP calls from all duplicate (bi-directional) assays were in agreement.
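Concordance here is the fraction of animals for which the SNP-based carrier call agrees with the haplotype-based call. A minimal sketch follows, assuming a simple boolean carrier encoding; the function name `concordance` and the eight toy animals are hypothetical and are not data from the study.

```python
def concordance(haplotype_status, snp_status):
    """Fraction of animals with identical carrier calls from both methods."""
    if len(haplotype_status) != len(snp_status):
        raise ValueError("both call lists must cover the same animals")
    if not haplotype_status:
        raise ValueError("no animals to compare")
    agree = sum(h == s for h, s in zip(haplotype_status, snp_status))
    return agree / len(haplotype_status)


# Toy calls for eight animals (True = carrier); not data from the study.
by_haplotype = [True, False, True, False, False, True, False, True]
by_snp = [True, False, True, False, False, True, False, False]

before = concordance(by_haplotype, by_snp)   # 7 of 8 calls agree

# Suppose the last animal's haplotype call was wrong because of a crossover.
corrected = by_haplotype[:-1] + [False]
after = concordance(corrected, by_snp)
print(before, after)
```

The toy example mirrors the single crossover-related correction mentioned above: one misassigned haplotype is enough to pull concordance below 100% until it is corrected.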
Bovine expression data support the CWC15 stop-gain SNP as the causative mutation. Although quantitative expression data for this gene are limited, Harhay and colleagues [27] found that CWC15 was expressed in all 87 bovine tissues, supporting its role as an essential gene for cell function. Interestingly, comparison of expression levels between tissues revealed a 7-fold difference in normalized tag abundance. Tissues with the highest relative expression of this gene included portions of the placenta and the uterine attachment to the placenta (InnateDB; [28]).
CWC15 is associated in humans with the PRP19/CDC5 complex, which is thought to play an important role in mediating spliceosome activation [29]. Duan et al. [30] have shown that the mouse CWC15 gene is expressed during early embryonic development. Perhaps a bovine conceptus survives in the absence of functional CWC15 protein for a short duration until the presence of this gene becomes completely essential for efficient alternative splicing needed for proper development.
For JH1, the causative mutation in CWC15 is a SNP for which a diagnostic test can easily be developed to identify new carriers. In this study, 730 test results from CWC15 carriers were used to impute test results for 6,784 animals with 50K genotypes, and concordance was 99.3% between JH1 haplotype status and CWC15 gene test status. About 1% of the animals reported as free of JH1 were identified as carriers using the imputed SNP genotypes. Other single mutations with additive or recessive inheritance could be imputed using similar methods. The location of the causative mutation (15,707,169 bp) can be used instead of the haplotype to determine carrier status, although detection from haplotypes will continue to be useful until the causative allele is added to SNP chips. Experience with the BovineHD Genotyping BeadChip (Illumina, Inc., San Diego, CA) has shown that having >1,000 animals genotyped for a polymorphism is sufficient to impute genotypes with close to 99% accuracy for >100,000 other animals [16]. Known genotypes for casein variants [31] or diacylglycerol O-acyltransferase 1 [32] for a modest number of individuals could be used to impute missing phenotypes for all other genotyped animals.
Three previously identified genetic defects in Holstein cattle (bovine leukocyte adhesion deficiency (BLAD; [33]), deficiency of uridine monophosphate synthase (DUMPS; [34]), and mulefoot [35]), as well as three deleterious recessives in Brown Swiss cattle (weaver syndrome [36], spinal dysmyelination [37], and spinal muscular atrophy [38]), were each matched to haplotypes using the same automated procedures used to predict JH1. Table 3 lists the conditions, chromosomal positions, number of tested animals, and number of newly identified carriers detected for each of these single-gene traits using March 2012 genotype data. The newly identified carriers have been genotyped but not yet tested for the defect. Results below indicate that official causative mutation or progeny tests would confirm >95% of these haplotype carrier identifications.
| Table 3. Haplotype vs. official test results for 6 known recessive conditions in Holstein (HO) and Brown Swiss (BS) cattle. |
Haplotype detection is even more accurate if official test results for known mutations are incorporated as an additional SNP within each haplotype. For example, complex vertebral malformation (CVM) [2] could not previously be tracked accurately because two versions of the haplotype exist, one containing and one lacking the causative mutation. Inclusion of the official CVM tests from ancestor bulls allows accurate tracking within the pedigrees of descendants even if they are not tested directly for CVM. However, animals that inherit the equivalent, non-defective haplotype could still be falsely labeled as CVM carriers if pedigree information is missing. Accuracy of carrier status for already genotyped animals should also improve if genotypes for the newly discovered mutations are used in addition to the nearby SNP markers.
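The idea of treating an official test result as one more marker in the haplotype can be illustrated with a short sketch. It assumes a toy tuple encoding in which the tested mutation allele is appended as the last position; the names `augment` and `is_carrier` and the allele values are hypothetical and do not describe the actual evaluation software.

```python
def augment(haplotype, mutation_allele):
    """Append the tested mutation allele (1 = defective, 0 = normal) as an extra marker."""
    return tuple(haplotype) + (mutation_allele,)


def is_carrier(augmented_haplotype):
    """The appended marker is the last position of the augmented haplotype."""
    return augmented_haplotype[-1] == 1


# Two ancestor haplotypes that are identical at the SNP markers (toy alleles).
snp_alleles = (1, 0, 1, 1, 0)
defective = augment(snp_alleles, 1)   # ancestor officially tested as a carrier
normal = augment(snp_alleles, 0)      # equivalent haplotype lacking the mutation

# Without the extra marker the two versions cannot be told apart.
same_without_test = defective[:-1] == normal[:-1]
print(same_without_test, is_carrier(defective), is_carrier(normal))
```

Descendants that inherit either augmented haplotype from a tested ancestor can then be classified without a direct test, whereas an animal with missing pedigree has only the SNP alleles and remains ambiguous.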