We recently conducted the Diabetes Genetics Initiative (DGI) genome-wide association study for type 2 diabetes and 18 other traits, including blood lipoprotein and lipid concentrations⁶. Here, we focus on replication analyses related to three traits: concentrations of LDL cholesterol, HDL cholesterol and triglycerides. In DGI, we analyzed the association of 389,878 markers with blood lipoproteins and lipids in 2,758 individuals. From these results, we selected an initial 196 SNPs for replication on the basis of the strength of statistical evidence. We then combined the DGI results with those from two other genome-wide association studies, the Finland–United States Investigation of NIDDM Genetics (FUSION) and the SardiNIA Study of Aging (see companion manuscript for meta-analytic methods⁴), and selected an additional 30 SNPs for replication on the basis of the combined evidence (see Supplementary Fig. 1 online for study design). The 226 SNPs selected for replication were tested in up to 18,554 separate participants from three studies. Statistical evidence from the DGI genome-wide association study and the three replication studies was summarized using a variance-weighted meta-analysis⁷. We pre-specified P < 5 × 10⁻⁸ as the statistical significance threshold for each new locus.
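The combination of per-study evidence described above can be sketched as follows. This is a minimal illustration that assumes the standard fixed-effects inverse-variance weighting; the text names the method only as "variance-weighted" and does not spell out the formula, so the function name, its inputs (per-study effect estimates and standard errors) and the normal approximation for the P value are assumptions made here for illustration.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # significance threshold pre-specified in the text


def inverse_variance_meta(betas, ses):
    """Combine per-study effect estimates with inverse-variance weights.

    betas: per-study effect estimates for one SNP (same allele, same units)
    ses:   matching standard errors
    Returns (combined_beta, combined_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    combined_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)
    z = combined_beta / combined_se
    # Two-sided P value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return combined_beta, combined_se, z, p_value


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """Apply the pre-specified threshold to a combined P value."""
    return p_value < threshold
```

Larger studies contribute smaller standard errors and therefore dominate the combined estimate, which is why adding replication cohorts can push a modest discovery signal past the threshold.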
At 18 independent genomic loci, common DNA sequence variants were reproducibly related to at least one blood lipoprotein or lipid trait. Previous studies have identified nine loci with compelling evidence for association between common variants and lipoprotein or lipid concentrations (including APOB), and we confirmed eight of these loci. Additionally, we recently identified GCKR⁶ as a locus associated with triglyceride concentration.
Genetic loci where common SNPs are associated with blood lipoproteins or lipids
Prior work has provided suggestive but not definitive evidence for common variants at three loci (including LDLR and HMGCR). We found compelling evidence for common variants at each of the three loci.
Six of the 18 loci have not been previously reported to relate to lipoprotein or lipid traits in humans. For these six newly identified loci, the statistical evidence for association was robust, ranging from P = 3 × 10⁻⁸ to P = 5 × 10⁻⁴² in the combined analysis of the DGI genome-wide association study and three replication cohorts.
For LDL cholesterol, we identified two new loci and confirmed five loci with prior evidence. The first new locus for LDL cholesterol is located on chromosome 1p13. SNPs rs599839 and rs646776 were robustly associated with LDL cholesterol (combined P = 3 × 10⁻²¹ and 3 × 10⁻²⁹, respectively). Both SNPs are located in a 97-kb region of linkage disequilibrium containing four genes, including CELSR2, PSRC1 and MYBPHL. The two SNPs are highly correlated with one another (r² = 0.89 in the HapMap population of European ancestry). Each copy of the minor allele at either SNP (24% frequency) decreased LDL cholesterol concentrations by ~5–8 mg/dl (Supplementary Table 1).
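For readers unfamiliar with the r² statistic quoted above, the sketch below shows how pairwise linkage disequilibrium is conventionally computed from phased haplotype frequencies. The function name and argument layout are illustrative assumptions, not the HapMap pipeline, and any frequencies passed to it here are hypothetical rather than values from this study.

```python
def ld_r_squared(f_AB, f_Ab, f_aB, f_ab):
    """Squared correlation (r^2) between two biallelic SNPs.

    Arguments are the population frequencies of the four haplotypes formed
    by alleles A/a at the first SNP and B/b at the second; they must sum to 1.
    """
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # allele frequency of A at the first SNP
    p_B = f_AB + f_aB  # allele frequency of B at the second SNP
    denominator = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denominator == 0.0:
        raise ValueError("both SNPs must be polymorphic")
    # D measures the excess of the AB haplotype over its expectation
    # under independence of the two SNPs.
    d = f_AB - p_A * p_B
    return d * d / denominator
```

An r² close to 1 means that one SNP is nearly a perfect proxy for the other, which is why two highly correlated SNPs cannot be distinguished by association evidence alone.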
Genetic loci where common SNPs are associated with blood low-density lipoprotein cholesterol
The second new locus for LDL cholesterol is located on chromosome 19p13 in an intergenic region between CILP2 and PBX4. SNP rs16996148 replicated for association with LDL cholesterol (combined P = 3 × 10⁻⁸). Two copies of the minor allele at SNP rs16996148 decreased LDL cholesterol concentrations by ~16 mg/dl (Supplementary Table 1).
Besides the 1p13 and 19p13 loci, we confirmed five loci with prior evidence for association with LDL cholesterol concentrations (including APOB, LDLR and HMGCR). We found that an intronic LDLR SNP was strongly related to LDL cholesterol. In the cardiovascular cohort of the Malmö Diet and Cancer Study, LDL cholesterol values varied by ~7 mg/dl per copy of the minor allele at the LDLR SNP (combined P = 2 × 10⁻⁵¹; Supplementary Table 1). Similarly, an intronic SNP at HMGCR was associated with LDL cholesterol (combined P = 1 × 10⁻²⁰).
For HDL cholesterol, we identified one new locus and confirmed six loci for which there was prior evidence of association. The new locus for HDL cholesterol is located at 1q42 in an intron of GALNT2 (SNP rs4846914, combined P = 2 × 10⁻¹³ for association). Each copy of the minor allele decreased HDL cholesterol concentrations by ~1.5 mg/dl (Supplementary Table 1). In addition, we confirmed six loci with prior evidence (including ABCA1, APOA1-APOC3-APOA4-APOA5, CETP, LIPC and LIPG).
Genetic loci where common SNPs are associated with blood high-density lipoprotein cholesterol
For triglycerides, we identified five new loci. The five replicated SNPs are located at 7q11 near TBL2, 8q24 near TRIB1, 1q42 in GALNT2, 19p13 near CILP2-PBX4 and 1p31 near ANGPTL3 (P < 5 × 10⁻⁸ for each SNP). Of these, the SNP at 7q11 near TBL2 had the strongest effect size, with each copy of the minor allele increasing triglyceride concentrations by ~8 mg/dl (Supplementary Table 1). In addition, we confirmed four loci with prior evidence (including APOA1-APOC3-APOA4-APOA5, APOB and GCKR).
Genetic loci where common SNPs are associated with blood triglycerides
We observed that SNPs at four of the newly identified loci (19p13 near CILP2, 1q42 in GALNT2, 7q11 near TBL2 and 8q24 near TRIB1) were associated with multiple lipoprotein or lipid traits (Supplementary Table 2 online). We did not require the associations with the second and/or third trait to meet a genome-wide association threshold of P < 5 × 10⁻⁸. We find these secondary associations to be of interest, as the patterns of association may provide clues to how the locus affects lipoprotein metabolism.
Effects of loci on multiple lipoprotein or lipid traits
The minor allele at SNP rs16996148 on 19p13 near CILP2 and PBX4 was associated with lower concentrations of both LDL cholesterol and triglycerides. This pattern of association is similar to that of APOB coding SNP rs693, in which a variant allele is associated with both LDL cholesterol and triglycerides in the same direction.
The minor alleles of GALNT2 SNP rs4846914 as well as SNP rs17145738 on 7q11 near TBL2 and MLXIPL were associated with both triglyceride and HDL cholesterol concentrations: rs4846914 is associated with lower HDL concentrations and higher triglyceride concentrations, and rs17145738 is associated with higher HDL concentrations and lower triglyceride concentrations. These patterns of association are similar to that of the common LPL nonsense mutation S447X (rs328).
SNP rs17321515 at 8q24 near TRIB1 was strongly associated with triglycerides and was also associated with LDL cholesterol and HDL cholesterol. The minor G allele at this SNP was associated with lower triglycerides, lower LDL cholesterol and higher HDL cholesterol. This pattern of association has not been previously described for any lipid-modulating SNP.
Of note, for the 23 associations from 18 common alleles in this study, we found that the effect size of an allele varied inversely with allele frequency (r = −0.49, P = 0.01). For example, lower-frequency alleles, such as the 1% frequency allele at PCSK9, affected LDL cholesterol concentrations by ~0.5 s.d. units, whereas a 48% frequency allele at APOB affected LDL cholesterol by ~0.1 s.d. units. Such an inverse relationship is predicted if alleles perturbing physiology are deleterious during evolution, as such alleles would not rise to a high frequency in the population.
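One way to see what the inverse relationship implies is to convert each allele's frequency and per-allele effect into an approximate share of trait variance. The sketch below uses the textbook additive-model expression 2p(1 − p)β², which assumes Hardy-Weinberg proportions and an effect β expressed in s.d. units; this formula and the function name are assumptions introduced here, not a calculation reported in the text. The two inputs are the approximate PCSK9 and APOB values quoted above.

```python
def variance_explained(allele_freq, beta_sd):
    """Approximate fraction of trait variance explained by one additive SNP.

    allele_freq: frequency p of the effect allele (0 to 1)
    beta_sd:     per-allele effect in trait standard-deviation units
    Assumes Hardy-Weinberg proportions, so genotype variance is 2p(1 - p).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta_sd ** 2


# Approximate values quoted in the text for LDL cholesterol.
pcsk9_share = variance_explained(0.01, 0.5)  # rare allele, large effect
apob_share = variance_explained(0.48, 0.1)   # common allele, small effect
```

Under these assumptions both alleles account for roughly 0.5% of LDL cholesterol variance: the rare allele's larger effect is offset by how few people carry it.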
Having observed that common variants at 18 loci are convincingly associated with lipoprotein- or lipid-related traits, we next addressed the extent to which these alleles explain inter-individual variability in lipoprotein or lipid concentrations. In the cardiovascular cohort of the Malmö Diet and Cancer Study, after accounting for age, age², gender and diabetes status, we found that, in sum, seven SNPs explained an additional 5.7% of the residual LDL cholesterol variance. Meanwhile, seven SNPs explained an additional 5.2% of the residual HDL cholesterol variance and nine SNPs explained an additional 4.5% of the residual triglyceride level variance.
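The idea of "residual variance explained" can be illustrated with a two-stage regression: first remove the covariates from the trait, then ask how much of what remains is captured by the SNP genotypes. The sketch below is one plausible reading of that procedure, not the authors' actual analysis; all function names are hypothetical, the solver is a plain normal-equations implementation suitable only for small well-conditioned examples, and the toy data are synthetic.

```python
def ols_fit(design, y):
    """Least-squares coefficients via normal equations (Gauss-Jordan)."""
    n, k = len(design), len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(design[r][i] * design[r][j] for r in range(n)) for j in range(k)]
           + [sum(design[r][i] * y[r] for r in range(n))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def residuals(design, y):
    """Observed minus fitted values from an ordinary least-squares fit."""
    beta = ols_fit(design, y)
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(design, y)]


def r_squared(design, y):
    """Fraction of the variance of y explained by the design matrix."""
    mean_y = sum(y) / len(y)
    ss_res = sum(e * e for e in residuals(design, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def residual_variance_explained(covariates, genotypes, trait):
    """Stage 1: adjust the trait for covariates. Stage 2: R^2 of SNPs on residuals."""
    adjusted = residuals([[1.0] + list(row) for row in covariates], trait)
    return r_squared([[1.0] + list(row) for row in genotypes], adjusted)


# Synthetic toy data: one covariate, one SNP coded 0/1, noiseless trait.
toy_covariates = [[a] for a in (1, 2, 3, 4, 1, 2, 3, 4)]
toy_genotypes = [[g] for g in (0, 0, 0, 0, 1, 1, 1, 1)]
toy_trait = [1.0 + 2.0 * c[0] + 3.0 * g[0] for c, g in zip(toy_covariates, toy_genotypes)]
toy_share = residual_variance_explained(toy_covariates, toy_genotypes, toy_trait)
```

In a real cohort the covariate rows would hold age, age², gender and diabetes status, and each genotype row would hold minor-allele counts for all SNPs tested against that trait.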
Though these common alleles explain an appreciable fraction of variance, it is likely that these values are underestimates of the impact of each validated locus. As nine of the loci with common variants (including ABCA1) have also been shown to cause mendelian syndromes or harbor multiple rare alleles that contribute to trait variation¹⁹, sequencing of each validated locus will be required to discover all common and rare variants and determine the full impact of each locus.
It is not yet clear what the causal variants or even the causal genes are at the new loci. Each of the six associated SNPs is noncoding. The genes nearest to the associated SNPs have been annotated.
However, the linkage disequilibrium pattern and the genes in the associated intervals suggest functional hypotheses. At 19p13, the variant associated with LDL cholesterol (located between CILP2 and PBX4) is in high linkage disequilibrium with a nonsynonymous coding SNP in the CSPG3 gene encoding neurocan (rs2228603, 329 kb upstream, r² = 0.85 in the HapMap population of European ancestry), suggesting that CSPG3 may be the causal gene at the locus. At the 1q42 locus for HDL cholesterol and triglycerides, GALNT2 encodes polypeptide N-acetylgalactosaminyltransferase 2, an enzyme involved in O-linked glycosylation and transfer of N-acetylgalactosamine to the serine or threonine residues on proteins. O-linked glycosylation has a regulatory role for many proteins²⁰. This suggests the hypothesis that enzymatic glycosylation of any of a number of proteins involved in HDL cholesterol and triglyceride metabolism may lead to the observed pattern of association. At the 7q11 locus for triglycerides, the associated interval includes MLXIPL, encoding a transcription factor (also called carbohydrate response element binding protein or ChREBP) recently described to connect carbohydrate flux with fatty-acid synthesis in the liver²¹. Finally, inactivating mutations in ANGPTL3 (encoding angiopoietin-like 3) have already been demonstrated to lead to low triglycerides in mice²².
We next considered one mechanism by which SNPs (and particularly noncoding SNPs) may relate to traits, namely, the regulation of local gene expression. We analyzed the correlation of lipid-associated SNPs with mRNA transcript levels of nearby genes in 60 human liver samples. At five of the six newly identified loci, lipid-associated SNPs showed no effect on expression of local genes (P > 0.05).
However, SNP rs646776 at the 1p13 locus was strongly associated with transcript concentrations of not just one gene but three neighboring genes: SORT1 (P = 3 × 10⁻²⁶), CELSR2 (P = 2 × 10⁻¹²) and PSRC1 (P = 3 × 10⁻¹²) (Supplementary Fig. 2 online). SNP rs646776 explained 86%, 58% and 58% of the inter-individual variability in SORT1, CELSR2 and PSRC1 transcript concentrations, respectively. In analyses conditioning on either the CELSR2 or PSRC1 transcript levels, rs646776 remained associated with SORT1 transcript concentration (P = 1 × 10⁻⁵ and 1 × 10⁻⁵, respectively). Conversely, after SORT1 transcript level was accounted for, rs646776 was weakly or not associated with PSRC1 or CELSR2 transcript concentrations (P = 0.04 and 0.81, respectively). Overall, our results suggest that variation at the 1p13 interval may have a regional effect on gene expression.
SORT1, or sortilin, functions both as a sorting protein and as a cell-surface receptor, and it is abundant in skeletal muscle and adipocytes²³,²⁴. As a sorting protein, sortilin enables insulin-mediated glucose uptake by catalyzing the biogenesis of insulin-sensitive vesicles that transport the glucose transporter GLUT4 to the plasma membrane. In addition, as a multiligand receptor, sortilin can bind several proteins, including lipoprotein lipase, and potentially facilitate lipoprotein uptake. Overall, these observations suggest a mechanism by which increased sortilin expression seen with the C allele (at SNP rs646776) could lead to lower circulating LDL cholesterol concentrations.
Notably, a proxy for SNP rs646776 at the 1p13 locus, SNP rs599839, was recently reported to affect risk of coronary artery disease⁵. SNP rs599839 was also related to LDL cholesterol in our study, and the allele associated with lower LDL cholesterol (G allele, 24% frequency) was the same as that correlated with lower risk of coronary artery disease (odds ratio 0.78; P = 4.0 × 10⁻⁹).
As participants in the initial and replication studies were of European ancestry, it remains to be shown whether the new loci will be associated with lipid-related traits in individuals of other ancestries. In a pilot study, we tested whether the six SNPs from the six new loci identified in those of European ancestry would be associated with lipoprotein or lipid traits in a multiethnic sample. We studied 4,259 participants from the Singapore National Health Survey 98 comprising ethnic Chinese, Indians and Malays²⁵. SNPs at two of the six loci (1p13 near CELSR2-PSRC1-SORT1 associated with LDL cholesterol; 7q11 near TBL2-MLXIPL associated with triglycerides) replicated for association in each of the three ethnic groups (Supplementary Table 3 online). Because of well-known differences in linkage disequilibrium structure and allele frequencies across populations of different ancestries, a comprehensive testing of genetic variation at each new locus is needed for each ethnic group.
We have obtained definitive evidence for six new independent loci at which common genetic variation influences one or more lipoprotein or lipid traits. By establishing these loci as relevant to lipoprotein metabolism in humans, we nominate these as high-priority targets for further investigation. Before considering these loci as targets for pharmacological therapy, it will be critical to assess whether causal alleles at each locus affect risk for cardiovascular disease. If alleles are convincingly associated with risk of cardiovascular disease (as has been shown for PCSK9), this would provide in vivo human evidence that the locus is a valid target and support a path forward.