Retrotransposition of processed mRNAs is a frequent source of novel sequence acquired during the evolution of genomes. The vast majority of retroposed gene copies are inactive pseudogenes that rapidly acquire mutations that disrupt the reading frame, while precious few are conserved to become new genes. Utilizing a multi-breed association analysis in the domestic dog, we demonstrate that a recently acquired fgf4 retrogene causes chondrodysplasia, a short-legged phenotype that defines several common dog breeds including the dachshund, corgi and basset hound. The discovery that a single evolutionary event underlies a breed-defining phenotype for 19 diverse dog breeds demonstrates the importance of unique mutational events in constraining and directing phenotypic diversity in the domestic dog.
The domestic dog is arguably the most morphologically diverse species of mammal and theories abound regarding the source of its extreme variation (1). Two such theories rely on the structure and instability of the canine genome, either in an excess of rapidly mutating microsatellites (2) or an abundance of overactive SINEs (3), to create increased variability from which to select for new traits. Another theory suggests that domestication has allowed for the buildup of mildly deleterious mutations that, when combined, create the variation observed in the domestic dog (4). The notion of gene duplication as a major cause of morphologic diversity has received little attention.
The majority of phenotypic variation in domestic dogs is found among, rather than within, the over 350 recognized domestic dog breeds. One aspect of interbreed variation is leg length, with some of the most striking short-legged breeds displaying limb morphology characteristic of chondrodysplasia, also known as short-limbed or disproportional dwarfism (Table S1). The trait is a primary requirement in the American Kennel Club (AKC) “breed standard” for over a dozen domestic breeds including the dachshund, Pekingese, and basset hound, where it was found to be dominant and allelic based on arranged crosses (5). The phenotype primarily affects the length of the long bones, with growth plates calcifying early in development, thus producing shortened bones with a curved appearance (Figure 1A) (6, 7).
In order to identify the genetic foundations of breed-defining phenotypes such as canine chondrodysplasia, we developed a multi-breed approach for mapping fixed canine traits. A total of 835 dogs from 76 distinct breeds that provided maximal coverage of phenotypic variation were genotyped using the Affymetrix version 2.0 SNP chip (8, 9). Chondrodysplastic breeds, or “cases”, were defined based on specific morphologic criteria set forth in each breed standard (8, 10) and comprised 95 dogs from eight breeds. The “control” or non-chondrodysplastic group included 702 dogs from 64 breeds lacking the above features (Figure 1A, Table S1).
Single-marker analysis revealed a strong association (odds ratio (OR) = 33.54) between a SNP on chromosome 18 (CFA18) at base position 23,298,242 (CanFam2) and the chondrodysplasia phenotype (χ² = 437; p-value = 9×10⁻¹⁰⁴ uncorrected; Figure 1B). The second-best peak of association was found at position 23,729,786, 431 kb telomeric to the first, with a p-value of 2×10⁻⁵⁷. Because the p-values are inflated due to population structure (4% of p-values < 10⁻⁷), we also performed independent Mann-Whitney U-tests on the distribution of allele frequencies within the chondrodysplastic and control breeds. The two SNPs on CFA18 retained the strongest association, with p-values of 1.15×10⁻⁵ and 2.74×10⁻⁵, respectively. The best haplotype across the chromosome spanned the five SNPs beginning at position 23,298,242 and ending at position 23,729,786 (uncorrected p-value = 1.9×10⁻¹¹¹) (Table S1).
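As an illustration of the kind of single-marker statistics reported above, the following minimal Python sketch computes an odds ratio and an uncorrected Pearson χ² test from a 2×2 table of allele counts. The function name and the example counts are hypothetical and are not taken from the study data. The exact test configuration used in the analysis is not stated in the text, so this shows only one standard way to obtain such values.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds_ratio, chi2, p_value) for a 2x2 table of allele counts.

    Hypothetical helper for illustration only. It raises ZeroDivisionError
    when a cell needed for the odds ratio or a table margin is zero.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Made-up allele counts: cases carry the associated allele far more often.
odds_ratio, chi2, p_value = allelic_association(180, 10, 200, 1200)
print(f"OR = {odds_ratio:.1f}, chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

Because every dog in a breed shares ancestry, a per-allele test of this kind treats correlated observations as independent, which is why the breed-level Mann-Whitney U-test serves as a more conservative check.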
Because registered members of a breed are expected to meet specific morphologic criteria, we hypothesized that breed-defining traits such as chondrodysplasia would be under strong selective pressure. We compared heterozygosity in 139 cases and 173 controls genotyped at an additional 64 SNPs that spanned the associated region (Table S2) and observed 125 kb (23,320,831–23,445,875) in which the cases displayed considerably lower levels of heterozygosity than the controls, indicative of a selective sweep (case average = 1.9%, control = 19.6%, p = 6×10⁻⁶, paired t-test) (11–14).
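The heterozygosity comparison can be sketched as follows. This minimal Python example assumes that the paired t-test pairs case and control heterozygosity at each SNP, which the text does not state explicitly. The function names and the toy genotypes are hypothetical.

```python
import math
from statistics import mean, stdev


def heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous (for example 'AG').

    Missing calls are represented by None and are ignored.
    """
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g[0] != g[1]) / len(called)


def paired_t(xs, ys):
    """Return (t statistic, degrees of freedom) for matched values.

    Assumes the paired differences are not all identical; otherwise the
    standard deviation is zero and the statistic is undefined.
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Toy data: one list of genotype calls per SNP, for each group of dogs.
case_calls = [["AA", "AA", "AA", "AG"], ["CC", "CC", "CC", "CC"], ["TT", "TT", None, "TT"]]
ctrl_calls = [["AA", "AG", "GG", "AG"], ["CC", "CT", "TT", "CC"], ["TT", "TG", "GG", "TG"]]

case_het = [heterozygosity(snp) for snp in case_calls]
ctrl_het = [heterozygosity(snp) for snp in ctrl_calls]
t_stat, dof = paired_t(ctrl_het, case_het)
print(case_het, ctrl_het, t_stat, dof)
```

A run of adjacent SNPs with low heterozygosity in cases but ordinary levels in controls is the pattern interpreted above as a selective sweep.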
Fifty-four amplicons were sequenced in 44 dogs from 20 breeds (nine case and 11 control) with a goal of 1) identifying additional SNPs; 2) searching for causative mutations; and 3) finding the smallest haplotype shared among chondrodysplastic breeds (Table S3). Fifty of the 123 SNPs identified formed a single continuous homozygous haplotype in all 26 chondrodysplastic dogs tested, covering approximately 24 kb (23,422,559 to 23,446,056) (Figure 2A). A portion of the 3’UTR of semaphorin 3c (sema3c), a putative thioredoxin-domain containing 1 (txndc1) pseudogene, and two evolutionarily conserved sequences are contained within the shared haplotype (Figure 2B).
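One way to locate a shared homozygous haplotype of this kind is to scan the ordered SNPs for the longest run at which every case dog carries the same homozygous genotype. The Python sketch below illustrates that idea under simplifying assumptions (no missing calls, unphased two-letter genotypes). The function name and toy data are hypothetical and do not reproduce the procedure or data used in the study.

```python
def longest_shared_homozygous_run(positions, genotypes_by_dog):
    """Return (start_pos, end_pos, n_snps) for the longest run of consecutive
    SNPs at which all dogs are homozygous for the same allele.

    positions: SNP coordinates in ascending order.
    genotypes_by_dog: one list of two-letter genotype calls per dog, in the
    same order as positions. Returns None if no SNP qualifies.
    """
    best = None
    run_start = None
    for i, pos in enumerate(positions):
        calls = {dog[i] for dog in genotypes_by_dog}
        # Shared only if every dog has the identical call and it is homozygous.
        shared = len(calls) == 1 and all(g[0] == g[1] for g in calls)
        if shared:
            if run_start is None:
                run_start = i
            length = i - run_start + 1
            if best is None or length > best[2]:
                best = (positions[run_start], pos, length)
        else:
            run_start = None
    return best


# Toy example with five SNPs and two dogs.
toy_positions = [100, 200, 300, 400, 500]
toy_dogs = [
    ["AG", "AA", "CC", "TT", "GG"],
    ["AA", "AA", "CC", "TT", "AG"],
]
print(longest_shared_homozygous_run(toy_positions, toy_dogs))
```

In this toy input the first and last SNPs break the run because one dog is heterozygous at each, so only the three central SNPs form the shared block.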
An insert of approximately five kb starting at position 23,431,136 (Figure S1) was found by tiling PCR amplicons across the homozygous region. This insert was present in all dogs from the original eight breeds and 11 of 12 additional breeds that fit at least two of the three chondrodysplastic criteria (175 dogs from 19 breeds) (8). Seven of the 175 short-legged dogs were heterozygous for the insert (Table S4). The insert was not found in any of the 204 medium to long-legged dogs from 41 breeds that do not display the trait (Table S4).
Although the insertion was unambiguously associated with chondrodysplasia, the initial analysis did not address whether the position of the insert or its specific content was causative. We therefore sequenced the insert using an Illumina Genome Analyzer. A library was first created from a gel-extracted long-range PCR product that spanned the entire insert from two unrelated chondrodysplastic dogs (dachshund and Scottish terrier). The sequence data were assembled using Velvet algorithms (15). BLAT analysis (genome.ucsc.edu) revealed a single contig with complete alignment at 100% identity to fibroblast growth factor 4 (FGF4), which is located on CFA18 at position 51,439,516; approximately 30 Mb from the insert.
Using Sanger sequencing with primers designed from the annotated FGF4 gene sequence, together with the sequence surrounding the insertion site (Table S5), we were able to demonstrate that the insert contained a conserved fgf4 retrogene. Neither the introns nor the upstream promoter sequences of the gene were present in the insert. However, all exons were present, with no alterations in the coding sequence, as were the 3’ UTR and poly-A tail characteristic of retrotransposition of processed mRNA (Figure 3).
To determine whether the retrogene was expressed, we searched for retrogene-specific sequences in complete cDNA of chondrodysplastic dogs. A single base at a position syntenic to chr18:51441601, 455 bp distal to the coding sequence of FGF4, differed between the retrogene and the source gene in all samples tested, with the former displaying an A nucleotide and the latter a G. Both A and G alleles were observed in cDNA created from articular cartilage of the long bones of chondrodysplastic dogs (Figure 4A), while cDNA and genomic DNA samples from non-chondrodysplastic dogs were homozygous for the G allele, as demonstrated by the restriction enzyme assay in Figure 4B.
Gene duplication through retrotransposition differs from a tandem duplication that may simply double the gene dosage (16), as the retrogene must acquire a new promoter, likely with a different expression profile, in order to be active. To accomplish this, retrogenes often borrow contextual regulatory elements (17). We therefore assessed the expression of thrombospondin receptor (CD36) and Sema3c, which are upstream and downstream of the insert. A PCR-based assay on cDNA from the articular cartilage of fetal and neonatal dogs revealed expression of both genes in the growing limb (Figure S2). Further examination of expression in cartilage tissues from adult dogs showed that, although the surrounding genes were expressed, neither the source FGF4 gene nor the fgf4 retrogene was still expressed (Figure 3C). This suggests that the retrogene neither follows the expression pattern of its surroundings nor is ubiquitously expressed, and implies that it has a specific time-sensitive role. The retrogene is inserted in the middle of a LINE with both LINEs and SINEs upstream (Figure 2B). These transposable elements likely provide the regulatory machinery necessary to promote expression of the fgf4 retrogene (18), with localization and temporal control coming from the intact 3’UTR (19).
We hypothesize that atypical expression of the FGF4 transcript in the chondrocytes may be causing inappropriate activation of one or more of the fibroblast growth factor receptors such as FGFR3. An activating mutation in FGFR3 is responsible for > 95% of achondroplasia cases, the most common form of dwarfism in humans, and 60–65% of hypochondroplasia cases, a human syndrome that is more similar in appearance to breed-defining chondrodysplasia (reviewed in (20)). FGF4 has been shown to induce the expression of sprouty genes, which interfere with the ubiquitin-mediated degradation of the FGF receptors including FGFR3, and over-expression of the sprouty genes can cause chondrodysplastic phenotypes in both mice and humans (21, 22).
The chondrodysplastic breeds were developed in many different countries for a variety of occupations (10). Based on genomic analysis of population structure, they do not share a recent common ancestry (23, 24). However, since we find a common haplotype of 24 kb surrounding the fgf4 retrogene in 19 short-legged breeds, it is likely that the chondrodysplastic phenotype arose only once, before the division of early dogs into modern breeds. Thereafter, the retrogene and its associated phenotype were both maintained and propagated by breeders for purposes specific to each breed.
To further understand the origin of the fgf4 retrogene, we compared haplotypes from the source gene, the retrogene, and the insertion site in both dogs and their wild progenitor, the gray wolf. The ancestor of all chondrodysplastic breeds would have needed to carry both a source gene with the rare haplotype found in the retrogene, and the 24 kb haplotype that defines the insertion site (Figure S3, Table S6). This combination was not found in any of the dogs that we tested but was identified in wolves from Europe and the Middle East, supporting fossil evidence that these populations contributed to the early development of the dog (25, 26).
Though retrogenes are recognized as an important source of novel functional elements found between recently diverged species (27–29), little is known about the relationship between retrotransposition and phenotypic variation within species (29, 30). We have found a single retrotransposition event producing a conserved, expressed retrogene that has strongly focused the evolutionary direction of morphological change in the dog, as at least 12% of American breeds share a common phenotype and the retrogene. This retrogene is actively segregating within the species, has a coding sequence that is identical to that of the source gene, and is the only example of a functional retrogene found in morphologically distinct populations of a single species that is actively maintained by selection. If such rare mutational events or “sports”, as Charles Darwin referred to them in The Origin of Species (31), happen only in the evolution of domestic animals, then these systems may be less informative for understanding the origin of evolutionary novelty in wild species. However, if the type of molecular phenomenon we have observed represents a class of genomic change associated with dramatic phenotypic evolution, such as that characteristic of adaptive radiation (17, 32, 33), then such genetic changes might be keystone molecular innovations.