The non-specificity of different assays for CCL3L1 gene copy number raises questions about the gene specificity reported in earlier studies. Until the genetic architecture of this complex region is understood, it will be difficult to evaluate the genes individually and to determine whether an association reflects the copy numbers of all the genes in the cluster or is specific to the copy number of only one gene [18]. Further, there may be situations in which a higher copy number of one gene is detrimental while a higher copy number of another is beneficial. Based on existing assays, the distribution of CCL3L1/CCL3L3 copy numbers differs among ethnic groups (e.g., median copy numbers of two, four, and six in Caucasians, Asians, and Africans, respectively) [12, 13]. In our own results, while this appears to hold at the population level, there appear to be inconsistencies between the different assays at the individual level, as shown specifically by the low concordance rates (Figure ). Overall, CCL3L_PP4 has a higher mean than the other assays in both African Americans and European Americans, which could suggest that it measures all three genes, as previously described [1]; however, the sequence alignment shows that the assays are specific to CCL3L1 and CCL3L3 and not to the truncated CCL3L2. Nevertheless, CCL3L_PP4 is not consistently larger across samples. On the contrary, CCL3L_PP4, which binds to all three genes, actually has the lowest mean copy number. This clearly suggests non-specificity of the assays regardless of the sequence alignment.
If CCL3L1 and CCL3L3 are both diploid, then an individual should possess four copies of these genes according to the assays specific to these two genes (Table : CCL3L_PP1, CCL3L_PP2, CCL3L_PP4, and CCL3L_PP5), so it is puzzling that several individuals have only one or two copies, especially among Caucasians, as shown in Figure . Do they lack one of these genes completely, do they have one copy of each, or do they carry different copy numbers of each gene? Any such discrepancy may confound associations that arise from the biological function of the expressed proteins rather than from the number of sequence copies in the genome. Thus, further investigation is needed to clarify how these totals are distributed among the individual genes and to understand how immunity with respect to the effect of each of these genes has evolved in different populations, including non-human primates, which tend to have higher copy numbers. In theory, any assay designed to determine total CCL3L1 and CCL3L3 copies (even one that does not distinguish the dose of CCL3L1 from that of CCL3L3 in each individual) should give the same results, so the choice of assay should not affect the association. However, as our results show, the concordance rates are low. Without a gold standard, it is not possible to reliably assess which assay is better, but any misclassification could lead to incorrect associations. Additionally, several other factors, such as dye chemistry, reaction specificity and conditions, and DNA concentration, might also affect the assays [19]. Even with exact copy numbers, and even without differences at the sequence level, there may be differences in expression levels, which could confound the overall association at the protein level rather than at the nucleotide-sequence level. For instance, CCL3L1 has its strongest affinity for CCR5, so it may be important to know how many copies of CCL3L1 an individual has, as opposed to the total number of CCL3L1 and CCL3L3 copies. It remains to be shown how these two genes are differentially expressed and how expression of one gene might be affected by differing copy numbers of the others; such differences could enhance the affinity, reduce it, or have no effect. However, while expression and protein levels are important, the structural variants of these genes at the sequence level (copy numbers) need to be understood and assayed properly to determine which copies are functional and at what levels.
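Under the diploid assumption, an assay that counts CCL3L1 and CCL3L3 together should report 2 + 2 = 4 copies, and a lower total is compatible with several different configurations. The sketch below enumerates every (CCL3L1, CCL3L3) pair consistent with a given total; it assumes the assay reports only the sum of the two genes and ignores CCL3L2 and any partial copies. `possible_splits` is a hypothetical helper.

```python
def possible_splits(total: int) -> list[tuple[int, int]]:
    """All (CCL3L1, CCL3L3) copy-number pairs that sum to `total`.

    Assumption: the assay reports only the combined count of the two genes,
    so every pair returned here is equally consistent with the measurement.
    """
    if total < 0:
        raise ValueError("copy number cannot be negative")
    return [(n1, total - n1) for n1 in range(total + 1)]


# Totals of 1 and 2 (observed in some individuals) versus the diploid expectation of 4.
for total in (1, 2, 4):
    print(total, possible_splits(total))
```

A total of 2, for example, is consistent with (0, 2), (1, 1), and (2, 0); the total alone cannot separate these configurations.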
Since the extent of the non-specificity of the current RT-PCR based assays has not been well defined, comparing sequences to delineate the homologous regions provides basic information for assay development. The present data show that other genes, such as CCL18 and CCL24, which may not be in the gene cluster, also have overlapping regions (30-34% in CCL18 and 27-44% in CCL24 with genes in the CCL3L cluster). At present, different bioinformatics tools are needed to examine these sequences and understand their complexity. Our inferences about the RT-PCR based assays rest on a single reference genome assembly; because all previous assays were likely designed from the reference sequence as well, our approach gives a less conservative estimate of specificity (the true specificity may be lower), since polymorphisms between and within genes in a population could further confound the specificity of the primers and probes. For example, although the exonic sequences of CCL3L1 and CCL3L3 are identical, at least one SNP in the UTR and two in the introns of CCL3L1, and one in the intron of CCL3L3 (additional file 1), are uniquely polymorphic (based on the reference genome) and could be used to develop assays. However, it remains to be determined whether these variants differ within genes or between genes, which a more reliable assay would require. Additionally, variants between and within copies of the CCL3L-related genes might influence the function of these genes. For example, SNPs in the CCL3 and CCL3L1 genes determine their production [20]. Thus, both SNPs and copy numbers are important in examining the production and expression levels of these genes, and both should be assayed appropriately.
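To illustrate why identical target sites defeat gene specificity, the sketch below scans a primer along each candidate sequence and reports its best ungapped placement (fewest mismatches). The sequences and the names gene_A, gene_B, and gene_C are toy placeholders, not real CCL3L sequences, and this simplified forward-strand scan is not the alignment method used in our analysis; it ignores gaps, the reverse complement, and 3'-end weighting.

```python
def min_mismatches(primer: str, target: str) -> tuple[int, int]:
    """Best ungapped placement of `primer` on the forward strand of `target`.

    Returns (mismatch count, start position). Gaps and the reverse
    complement are ignored, so this is only a first-pass screen.
    """
    primer, target = primer.upper(), target.upper()
    if len(primer) > len(target):
        raise ValueError("primer longer than target")
    best = (len(primer) + 1, -1)  # worse than any real placement
    for start in range(len(target) - len(primer) + 1):
        window = target[start:start + len(primer)]
        # Count positions where primer and target bases differ.
        mismatches = sum(p != t for p, t in zip(primer, window))
        if mismatches < best[0]:
            best = (mismatches, start)
    return best


# Hypothetical toy sequences, not real CCL3L gene sequences.
primer = "ACGTTGCA"
paralogs = {
    "gene_A": "TTACGTTGCAGG",
    "gene_B": "GGACGTTGCATT",
    "gene_C": "ACGATGCTAAAA",
}
for name, seq in paralogs.items():
    print(name, min_mismatches(primer, seq))
```

A primer with zero mismatches in two paralogs, as with gene_A and gene_B here, cannot distinguish them, whereas mismatches such as those against gene_C (including SNP sites) are the kind of difference that gene-specific assay design would need to exploit.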
Recently, as an alternative to RT-PCR, a method for CCL3L1 copy number determination based on a paralogue ratio test (PRT) has been developed [21]. However, its primers are non-specific and align with both CCL3L1 and CCL3L3. A main assumption of current methodologies for determining gene copy number is that the copy number derived from specific probes represents the whole gene. This may not always hold, since only parts of the gene might be duplicated: such a partial duplication may be missed if the probe does not target that segment, or may yield a false count when other segments of the gene, especially the functional ones, are not amplified. In some cases, there might be a complete gene copy that nonetheless differs subtly at the sequence level. The gene might be in the opposite orientation, so that expression would not be the same, or there might be differences in single-nucleotide polymorphisms (SNPs) between copies. To account for these variables, complete sequence data for all copies could be required.
In summary, we report homology at the nucleotide sequence level between the different CCL3L-related gene clusters and the primers/probes used in the RT-PCR based assays. The currently used assays for CCL3L1 gene copies are evidently non-specific and thus could overestimate copy numbers. Given the overlapping and non-specific sequences between these genes, current gene copy assays, such as gene-specific RT-PCR, pyrosequencing, paralogue ratio tests (PRT), multiplex amplifiable probe hybridization (MAPH), or multiplex ligation-dependent probe amplification (MLPA), could be fine-tuned with broad-range nested PCR methods to avoid redundant sequences, and new assays could be developed. Special precautions, however, are needed to avoid the homologous sequences. Non-specificity of laboratory methods for CNVs should not be overlooked as we develop different analytical methods to account for heterogeneity in association results.
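The overestimation mentioned above can be illustrated with the ratio logic behind PRT, in which copy number is inferred from the ratio of test to reference products. The sketch assumes an idealized model in which product signal is proportional to the number of template copies amplified, with equal efficiency for the test and reference products, and a reference locus present in two copies; `prt_estimate` and `simulated_signal` are hypothetical helpers and the copy numbers are invented.

```python
def prt_estimate(test_signal: float, reference_signal: float,
                 reference_copies: int = 2) -> float:
    """Idealized paralogue-ratio estimate: reference copies x (test / reference signal).

    Assumption: product signal is proportional to the number of template copies
    amplified, with equal amplification efficiency for test and reference.
    """
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return reference_copies * test_signal / reference_signal


def simulated_signal(copies_amplified: int, signal_per_copy: float = 100.0) -> float:
    """Hypothetical signal model: linear in the number of amplified copies."""
    return copies_amplified * signal_per_copy


# Hypothetical individual: 2 copies of CCL3L1, 2 of CCL3L3, diploid reference locus.
ccl3l1, ccl3l3, reference = 2, 2, 2
# Primers that amplify only CCL3L1.
specific = prt_estimate(simulated_signal(ccl3l1), simulated_signal(reference))
# Primers that also amplify CCL3L3, so both genes contribute to the test signal.
non_specific = prt_estimate(simulated_signal(ccl3l1 + ccl3l3), simulated_signal(reference))
print(specific, non_specific)
```

Under these assumptions, primers specific to CCL3L1 return 2.0, while primers that also amplify CCL3L3 return 4.0, i.e., the combined count of both genes reported as if it were CCL3L1 alone.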