Detection of structural variants
The Porcine SNP60 BeadChip data from 55 IBMAP animals were analyzed using predictions from three different programs: cnvPartition (Illumina), PennCNV [17] and GADA [18]. The initial number of CNVs called by each program was 94, 84, and 200, respectively. Figure summarizes the CNVs identified and compares the results obtained from the three programs.
Overlapping CNV events from the three programs used in the analysis.
For further analyses, we applied a more stringent criterion and retained only CNV regions (CNVRs) that contained overlapping CNVs called by at least two programs, spanned three or more consecutive SNPs and were detected in a minimum of two animals. A total of 49 CNVRs located on 13 of the 18 analyzed autosomes were identified (Figure ). All of these CNVRs showed Mendelian inheritance across several generations of the IBMAP cross and are therefore unlikely to be artefacts or false positives, suggesting that our empirical criterion for retaining CNVRs is reasonable.
Figure 2 Graphical representation of the CNVRs detected. Red triangles represent loss predicted status, gains are indicated in green and regions with either loss or gain status are represented in blue. X-axis values are chromosome position in Mb. Y-axis values (more ...)
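The retention criterion described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `CnvCall` record, the `build_cnvrs` function and the rule that any positional overlap merges calls into one region are assumptions made for illustration only. The thresholds (two programs, three SNPs, two animals) are those stated in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CnvCall:
    """One CNV call (hypothetical record layout, not a program's output format)."""
    program: str  # e.g. "PennCNV", "GADA", "cnvPartition"
    animal: str   # animal identifier
    chrom: str    # chromosome name
    start: int    # first position covered by the call
    end: int      # last position covered by the call (inclusive)
    n_snps: int   # number of consecutive SNPs spanned


def build_cnvrs(calls, min_programs=2, min_snps=3, min_animals=2):
    """Merge overlapping calls into regions and keep regions passing the filters."""
    by_chrom = defaultdict(list)
    for call in calls:
        if call.n_snps >= min_snps:  # discard calls spanning too few SNPs
            by_chrom[call.chrom].append(call)

    cnvrs = []
    for chrom, chrom_calls in by_chrom.items():
        chrom_calls.sort(key=lambda c: c.start)
        cluster, cluster_end = [], None
        for call in chrom_calls + [None]:  # trailing None flushes the last cluster
            if call is not None and cluster and call.start <= cluster_end:
                cluster.append(call)  # call overlaps the current region
                cluster_end = max(cluster_end, call.end)
                continue
            if cluster:
                programs = {c.program for c in cluster}
                animals = {c.animal for c in cluster}
                if len(programs) >= min_programs and len(animals) >= min_animals:
                    cnvrs.append((chrom, cluster[0].start, cluster_end,
                                  sorted(programs), sorted(animals)))
            if call is not None:
                cluster, cluster_end = [call], call.end  # start a new region
    return cnvrs
```

In this simplified form, support from two programs and two animals is evaluated per merged region rather than per animal, which is a looser reading of the criterion than a per-animal concordance check would be.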
The percentage of CNVRs confirmed by at least two programs was 52.38% for PennCNV, 21% for GADA and 40.42% for cnvPartition. A total of 26 CNVRs (53.06%) were detected by the three algorithms (Figure ). Similar results were reported by Winchester et al. (2009), who compared different algorithms for CNV detection and suggested that PennCNV is the most accurate program for predicting CNVs on the Illumina platform [19]. In a recent study [20], the relative performance of seven methods for CNV identification was evaluated, showing that the PennCNV algorithm has moderate power and the lowest false positive rate. This is likely explained by the unique ability of this algorithm to integrate family relationships and signal intensities from parent-offspring trio data. The low percentage of confirmed CNVs called by the GADA software might be explained by the relatively low coverage of the Porcine SNP60 BeadChip.
The size of the CNVRs detected ranged from 44.7 kb to 10.7 Mb, with a median size of 754.6 kb (Table ). The Porcine SNP60 BeadChip was originally developed for high-throughput SNP genotyping in association studies. Although CNV detection is feasible with this technology, it is impaired by low marker density, non-uniform distribution of SNPs along pig chromosomes and lack of non-polymorphic probes specifically designed for CNV identification [16]. Hence, only the largest CNVRs are expected to be assessed with the Porcine SNP60 BeadChip. This explains the difference in minimum CNV length between our study (44.7 kb) and the work of Fadista et al. (2008) (9.3 kb) using the CGH technique.
Description of the 49 CNVRs detected in the pig genome
Among the first 55 animals analyzed, a single CNVR (CNVR35) was called in two animals whereas the remaining 48 CNVRs were called in three or more animals. A segregation analysis was performed in 372 additional animals from the IBMAP cross and the distribution of the CNVRs was additionally studied in 133 unrelated pig samples from different geographical origins (see Methods). All 49 initially detected CNVRs were segregating in the IBMAP cross and 41 were also detected in American pig populations (Additional file 1, Table S1). The number of animals with alternative alleles for the CNVRs ranged from five (CNVR13, CNVR46) to 270 (CNVR15). The predicted status for the CNVRs was gain for 19 (38.7%), loss for eight (16.3%) and gain or loss in different animals for 22 (45%) (Table ). This proportion may be related to natural selection, as it is assumed that the genome is more tolerant to duplications than to deletions [21]. The high percentage of CNVRs with gain or loss status may be the result of including in the analysis pig breeds with different genetic origins and from different countries. However, to establish the real status of CNVRs, validation by other techniques such as quantitative PCR (qPCR) will be necessary.
Validation by quantitative PCR
Real-time quantitative PCR assays were designed for CNVR validation on seven genomic regions simultaneously detected with the three programs (CNVRs 1, 3, 15, 17, 22, 32, and 36; Table ). Five of these CNVRs (15, 17, 22, 32, and 36) were confirmed by qPCR, although fewer animals were validated for CNVRs 15, 17, and 32 (Additional file 3, Fig. S1). Thus, the false discovery rate (FDR) for the seven analyzed CNVRs was 29%; it should be noted that the percentage of CNVRs validated in this study (71%) is higher than previously reported in pigs (50%) [11]. This result might be explained by the stringent criteria used in our analysis, which were proposed in order to increase confidence and minimize false positives. Nevertheless, we were not able to confirm two of the CNVRs. Several factors may account for the discrepancy in CNVR prediction between the in silico analysis of Porcine SNP60 BeadChip data and the qPCR method. First, the incomplete status of the 4× sequence depth Sscrofa9 assembly and the low probe density of the Porcine SNP60 BeadChip make it difficult to establish the true boundaries of CNVRs and may result in an overestimation of their real size. Therefore, it cannot be ruled out that the primers used to validate the CNVRs by qPCR were designed outside the structurally polymorphic region. Second, polymorphisms such as SNPs and indels may influence the hybridization of the qPCR primers, changing the relative quantification (RQ) values for some animals. Finally, the true CNVR boundaries may also be polymorphic between the analyzed animals.
For the qPCR validation of CNVR36, a PCR protocol for the Cytochrome P4502 C32 Fragment gene [EMBL: ENSSSCG00000010487] was designed. A total of 37 animals were analyzed: 21 with statistical evidence for CNVR and 16 without the CNVR (control group). One of the animals from the control group was selected as reference. Six false positive animals were observed, indicating an FDR of 29% for CNVR36 (Figure ).
Figure 3 Analysis by quantitative PCR (qPCR) of CNVR36 (CYP4502 C32 Fragment gene). Twenty-one animals with statistical evidence for CNVR and eight false negative animals from the control group are shown. The horizontal dashed line represents the relative quantification (more ...)
A qPCR assay with primers located in the SLC16A7 gene [EMBL: ENSSSCG00000000456] was used for CNVR22 validation. A total of 50 animals were analyzed: 21 with statistical evidence for CNVR (12 from the IBMAP cross and nine unrelated individuals belonging to six different breeds of American populations) and 29 without the CNVR (control group). One of the animals from the control group was selected as reference. Nine of the IBMAP cross animals were validated by qPCR (FDR = 25%). Conversely, only three animals from the American populations were validated by qPCR, suggesting a higher FDR (67%) (Figure ). These differences in FDR may be explained by the higher accuracy of the PennCNV algorithm when family information is available and stress the usefulness of including family information in CNV detection. However, this conclusion should be taken with caution due to the limited number of animals analyzed.
Figure 4 Analysis by quantitative PCR (qPCR) of CNVR22 (SLC16A7 gene). Twenty-one animals with statistical evidence for CNVR, three false negative animals and two Iberian animals from the control group are plotted. The horizontal dashed line represents the relative quantification (more ...)
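The FDR values reported for CNVR22 follow directly from the counts given in the text. The short Python sketch below reproduces that arithmetic; the function name `false_discovery_rate` and the dictionary layout are illustrative assumptions, and FDR is taken here as the fraction of predicted carriers that qPCR did not confirm.

```python
def false_discovery_rate(n_predicted, n_validated):
    """Fraction of animals predicted to carry the CNVR that qPCR did not confirm."""
    if n_predicted <= 0:
        raise ValueError("n_predicted must be positive")
    if not 0 <= n_validated <= n_predicted:
        raise ValueError("n_validated must lie between 0 and n_predicted")
    return (n_predicted - n_validated) / n_predicted


# (animals predicted to carry CNVR22, animals validated by qPCR), from the text
cnvr22 = {
    "IBMAP cross": (12, 9),
    "American populations": (9, 3),
}

for group, (predicted, validated) in cnvr22.items():
    print(f"{group}: FDR = {false_discovery_rate(predicted, validated):.0%}")
# IBMAP cross: FDR = 25%
# American populations: FDR = 67%
```

With only 12 and nine animals per group, a single additional validated or unvalidated animal shifts these percentages by roughly 8 to 11 points, which is consistent with the caution expressed above.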
For CNVRs 22 and 36, copy number changes were also identified by qPCR in animals where CNVs were not detected initially in the statistical analysis (three and eight animals, respectively). This represents a false negative rate of 10% (3/29) for CNVR22 and 50% (8/16) for CNVR36. The three false negative animals for CNVR22 were classified as deletions by the qPCR protocol. A similar situation, but with a different copy number status, was observed for CNVR36, where the eight false negative animals showed a duplication pattern by qPCR. False negative identification is common in CNV detection, and has been reported previously using the CGH technique in pigs and other mammalian species [5].
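For reference, the false negative rates quoted above can be written out explicitly. In this worked example the denominator is the number of control animals tested for each CNVR, which is the convention implied by the fractions 3/29 and 8/16 in the text; the symbol FNR is introduced here only for illustration.

```latex
\[
\mathrm{FNR} = \frac{\text{control animals showing a copy number change by qPCR}}{\text{control animals tested}},
\qquad
\mathrm{FNR}_{\mathrm{CNVR22}} = \frac{3}{29} \approx 10\%,
\qquad
\mathrm{FNR}_{\mathrm{CNVR36}} = \frac{8}{16} = 50\%.
\]
```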
Three of the validated CNVRs (17, 22, and 36) showed differential patterns of copy number variants between breeds. For instance, CNVR22 showed a loss (deletion) in Landrace and in animals from other breeds (Figure ). Assuming that Iberian pigs have two copies of CNVR22 (qPCR RQ = 1), five animals showing an RQ = 0 by qPCR are predicted to be homozygous for a deletion on this genomic region. In CNVR36, a loss was found in Iberian pigs relative to Landrace animals (Figure ). In agreement with the Mendelian segregation of this CNVR, hybrid animals show intermediate RQ values. The RQ mean values were 0.49 for Iberian, 2.51 for Landrace and 1.2 for hybrid animals.
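The interpretation of RQ values used above (RQ = 1 corresponding to two copies, RQ = 0 to a homozygous deletion) can be sketched as a simple scaling. The Python example below assumes that RQ is proportional to copy number and that the reference animal carries two copies; the function names and the tolerance of 0.25 are hypothetical choices for illustration and are not the thresholds used in this study.

```python
def estimated_copy_number(rq, reference_copies=2):
    """Scale a relative quantification (RQ) value by the reference copy number."""
    if rq < 0:
        raise ValueError("RQ cannot be negative")
    return rq * reference_copies


def classify_rq(rq, reference_copies=2, tolerance=0.25):
    """Label an animal as loss, normal or gain relative to the reference.

    tolerance is a hypothetical margin, expressed as a fraction of the
    reference copy number, within which an animal is called normal.
    """
    copies = estimated_copy_number(rq, reference_copies)
    margin = tolerance * reference_copies
    if copies < reference_copies - margin:
        return "loss"
    if copies > reference_copies + margin:
        return "gain"
    return "normal"


for rq in (0.0, 0.5, 1.0, 1.5):
    print(rq, estimated_copy_number(rq), classify_rq(rq))
```

Under these assumptions an RQ of 0 maps to zero copies (homozygous deletion) and an RQ of 0.5 to a single copy (heterozygous deletion). The mapping is only as reliable as the assumption about the reference animal, so a reference carrying a different copy number would shift every estimate.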
CNVR36 contains a miRNA gene [EMBL: ENSSSCG00000019484] and the Cytochrome P4502 C32 Fragment gene (Additional file 2, Table S2), which is a member of the Cytochrome P450 gene family (CYTP450). Proteins coded by this gene family constitute the major catalytic component of the liver mixed-function oxidase system and play a pivotal role in the metabolism of many endogenous and exogenous compounds [27]. Interestingly, CNVs comprising genes of the CYTP450 family have been described in humans and dogs [5], but had not been previously reported in pigs. In humans, copy number variations of CYTP450 genes have been associated with variation in drug metabolism phenotypes [29]. Differential expression of genes of the CYTP450 family has been correlated with androsterone levels in pigs from Duroc and Landrace breeds [32]. It has also been demonstrated that the total CYTP450 activity was slightly higher in minipigs compared to conventional pigs [33]. CNVR36 lies close to the peak position of a QTL for androsterone levels described in a cross between Large White and Chinese Meishan [34]. This suggests a possible role of this structural variation in determining androsterone levels; however, more studies will be necessary to validate this hypothesis.
CNVR22, also validated by qPCR, comprises the SLC16A7 gene. This gene belongs to the solute carrier family 16 gene family, which encodes 14 proteins that are largely known as monocarboxylate transporters (MCTs). The human SLC16A7 gene encodes the MCT2 protein [35] and is expressed in several normal human tissues. In pigs, MCT2 may function as a housekeeping lactate transporter, regulating the acidification of glycolytic muscles [36]. Remarkably, CNVR22 is located in the middle of the confidence interval of a QTL for meat pH described in four pig populations [37].
Duplication events have also been validated by qPCR for SOX14 and INSC. Copy number changes have not been previously reported in either of them in pigs. SOX14 is a member of the SOX gene family [38] of transcription factors involved in the regulation of embryo development and cell fate determination. SOX14 may have a major role in the regulation of nervous system development and it is a mediator of the neuronal death process [39]. SOX14 is an intronless gene that may have arisen by duplication from an ancestral SOX B gene, which likely was the product of a retrotransposition event [40]. INSC (Inscuteable) was first described in Drosophila; it plays a central role in the molecular machinery for mitotic spindle orientation and regulates cell polarity for asymmetric division [41]. Inscuteable homologs have been found in several species, including vertebrates and insects [43]. In mammals, INSC is functionally conserved and is required for correct orientation of the mitotic spindle in retina [43] and skin [44] precursor cells.
The qPCR assay for CNVR17 validation was designed over the sequence of one expressed sequence tag [EMBL: EW037329]. Of the four Cuban creole pigs tested, three showed a deletion and one a duplication event (Additional file 3, Fig. S1).