We have found that both the absolute and relative concentrations of ‘carbohydrate-deficient’ isoforms of transferrin are subject to genetic variation. Indeed, comparison of the heritability and repeatability estimates suggests that all or nearly all of the repeatable variation is genetic. As explained in the Materials and Methods section, we tried to correct for all relevant environmental factors (first two principal components (PCs), sex, smoking status and alcohol consumption) to maximize our chances of identifying the most important genetic components determining CDT%. Other factors, such as pregnancy, haemodialysis and liver disease, are either uncommon in the general population (in the CoLaus sample we had <11 pregnant women, 7 participants with liver disease and 9 participants with alcohol problems, <0.5% of the total) or have a negligible effect on CDT levels, and thus could not have considerably influenced our results.
In terms of specific SNPs and genes, there is evidence that variation near to the PGM1 gene on chromosome 1, and in the TF gene on chromosome 3, contributes to the overall genetic effect. This leads us to consider how these genes fit into the processes of synthesis, release or circulation of the isoforms of transferrin.
The association with PGM1 may be related to synthesis of the carbohydrate sidechains, which are attached to the protein and then modified by removal and addition of carbohydrate groups (7). The phosphoglucomutase enzyme converts glucose-6-phosphate into glucose-1-phosphate needed for synthesis of UDP-glucose, a precursor of the carbohydrate which is eventually transferred to the protein (8). However, phosphoglucomutase catalyses a very early step in this pathway and it is by no means clear how variation in the PGM1 gene would affect the carbohydrate structure of the glycoprotein. Major defects in PGM1 are known (http://www.ncbi.nlm.nih.gov/omim/171900), but they lead to a form of glycogen storage disease rather than a carbohydrate-deficient glycoprotein syndrome. However, PGM1 expression in human liver (9) is significantly affected by rs4643 at 63 898 026 bp (expression P = 1.0 × 10−6), and this SNP is one of a group of three affecting CDT% at P < 10−8. It is reasonable to accept that the SNPs on chromosome 1 showing significant association with CDT affect expression of PGM1 in the liver (the site of transferrin synthesis), but an understanding of how phosphoglucomutase activity affects transferrin isoforms will require experiments which are beyond the scope of this study.
The associations on chromosome 3 are more readily related to relevant genes, but the details are complex. Although rs1534166 in SRPRB showed the strongest association (P = 4.3 × 10−46) with CDT% in the initial combined analysis, more detailed examination of this region showed two independent effects at neighbouring loci in different exons of TF, rs1799899 (Gly277Ser, P = 3.96 × 10−39), and rs1049296 (Pro589Ser, P = 5.45 × 10−43). These associations were identified using step-wise model selection; hence, each variant is associated when conditioned on the other one. The SNP rs1799899 had a substantially greater allelic effect than rs1049296, but was less common [minor allele frequency (MAF) = 5.8%]. A third non-synonymous SNP in TF, rs8177318 or Ser55Arg, may also contribute, but this could not be shown at the genome-wide significance level. Through this example we illustrate how computational methods can help in tracking down potentially causal variants, revealing allelic heterogeneity and substantially (from 3.6 to 5.8%) increasing the explained variance compared with univariate analysis.
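The idea that a joint model with several independent variants can explain more variance than the best single-SNP model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the genotypes and phenotype are simulated, the allele frequencies and effect sizes are hypothetical, and the selection criterion (a minimum gain in R², `min_gain`) is a simplification that stands in for whatever step-wise criterion was actually used in the analysis.

```python
import numpy as np

def r_squared(X, y):
    """Fraction of variance in y explained by an OLS fit on the columns of X (plus intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()

def forward_stepwise(G, y, min_gain=0.005):
    """Greedy forward selection: add the SNP giving the largest R^2 gain, conditional
    on the SNPs already selected, until the best gain drops below min_gain (hypothetical cutoff)."""
    selected, current = [], 0.0
    remaining = list(range(G.shape[1]))
    while remaining:
        gains = {j: r_squared(G[:, selected + [j]], y) - current for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current += gains[best]
    return selected, current

# Simulated toy data (not study data): two causal SNPs and one null SNP.
rng = np.random.default_rng(0)
n = 5000
G = np.column_stack(
    [rng.binomial(2, maf, n) for maf in (0.06, 0.20, 0.30)]  # hypothetical MAFs
).astype(float)
# Rarer SNP has the larger per-allele effect (hypothetical effect sizes).
y = 0.8 * G[:, 0] + 0.4 * G[:, 1] + rng.normal(size=n)

selected, joint_r2 = forward_stepwise(G, y)
best_single_r2 = max(r_squared(G[:, [j]], y) for j in range(G.shape[1]))
print("selected SNP columns:", selected)
print("best single-SNP R^2:", round(best_single_r2, 4))
print("joint R^2 of selected SNPs:", round(joint_r2, 4))
```

In this toy setting each causal SNP remains associated when conditioned on the other, so both are retained, and the joint R² exceeds that of the best univariate model.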
The effects of these TF polymorphisms remain unchanged even in association with transferrin-corrected CDT%, so the significant effects on CDT% cannot be explained as being secondary to changes in total transferrin concentration. They may be due to changes in tertiary structure of transferrin because of the amino acid differences (Pro589Ser from rs1049296 and/or Gly277Ser from rs1799899), even though these changes are not at the N-glycosylation sites (which are asparagine residues at positions 413 and 611).
The heterogeneity of allelic effects across methods at the TF locus can be ascribed to these non-synonymous coding SNPs and the difference in analytical principles between the methods for measurement of CDT. The CDTect method relied on elution of isoforms with a pI >5.7 from an anion-exchange column followed by measurement of transferrin in the eluate. Any changes in the pI of transferrin isoforms resulting from changes in the amino acid sequence will either increase or decrease the amount of transferrin eluting from the column. The effects of the Pro589Ser (or C1/C2) polymorphism in the transferrin protein on the pI of transferrin and estimated CDT concentrations were examined by Stibler et al. (10). Although the effect was not significant with their limited number of samples, they found mean CDT concentrations of 51, 53 and 58 mg/l for C1, C1/C2 and C2 samples, respectively. The effects of this variant on the pI of disialotransferrin were illustrated by Helander et al. (11). They showed a shift in disialotransferrin towards the pI of the more usual forms of mono- and asialotransferrin in participants with the C2 variant. Because the CDTect assay measured the sum of asialo-, monosialo- and part of disialotransferrin, the difference in pI leads to a greater proportion of the disialotransferrin being measured as CDT and an increase in the apparent CDT concentration. However, both the capillary electrophoresis and the N-Latex methods showed effects of rs1049296 and rs1799899 on CDT which cannot be explained by alteration in isoelectric point. The N-Latex method is based on a monoclonal antibody, which specifically recognizes transferrin glycoforms that lack one or both of the complete N-glycans (disialo-, monosialo- and asialotransferrins) (12) and does not rely on a charge-based separation of the isoforms.
Apart from questions of how these polymorphisms affect serum CDT% or concentration, there may be practical implications for the diagnostic use of CDT measurement. CDT% is confirmed to be a robust marker of various aspects of alcohol consumption in our cohorts: the consumption of beer, wine and spirits each strongly influences CDT% (P < 10−58, P < 10−57, P < 10−6, respectively), as do alcohol consumption frequency (P < 10−67) and total weekly alcohol consumption (P < 10−134). As for any clinical test, it is necessary to establish a reference range against which patients' results can be compared. Problems with CDT as a marker for alcohol consumption have been reported, so a definition of genotype-dependent reference intervals may improve the sensitivity or specificity of this test; this aspect will be examined in a further paper.
In summary, we have identified two loci (and three independent SNPs) affecting the degree of glycosylation of circulating transferrin through a genome-wide association approach. They account for 5.8% of phenotypic variation in CDT% or CDT, which is a high proportion compared with most GWAS outcomes. Further studies of these loci are required to clarify the detailed mechanisms. The types of variation that we illustrate for transferrin, which affect the process of oligosaccharide synthesis and show the importance of protein structure, may well be relevant to other biologically important glycoproteins.