We used a candidate-gene approach to identify novel associations between variation in 24 clotting genes and the risk of incident VT. Based on the biology of clotting, we selected 24 genes that code for proteins affecting coagulation (factors II, V, VII, VIII, IX, X, XI, XII, XIIIa1, and XIIIb; α-, β-, and γ-fibrinogen; and tissue factor), anti-coagulation (antithrombin, proteins C and S, endothelial protein C receptor, thrombomodulin, and tissue factor pathway inhibitor [TFPI]), fibrinolysis (plasminogen and tissue-type plasminogen activator [TPA]), and antifibrinolysis (type 1 plasminogen activator inhibitor [PAI-1] and thrombin activatable fibrinolysis inhibitor [TAFI]).
Source of Genetic Variation
We used publicly available data on gene-wide variation derived from sequencing efforts. These data came from the Seattle SNPs Program for Genomic Applications (PGA), which conducted complete sequencing of all 24 genes in 23 individuals (46 alleles) of European ancestry and 24 individuals (48 alleles) of African ancestry (http://pga.gs.washington.edu/). These data include all SNPs and insertion-deletion polymorphisms identified in the 94 alleles (haplotypes). From these data we chose SNPs that had a minor allele frequency of at least 5% in either population and used the computer program ldSelect (http://droog.gs.washington.edu/ldSelect.html), which considers correlation information between all SNPs in a region, to select a set of SNPs, known as tag SNPs, that efficiently characterize the common haplotypes.8
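The tag-SNP selection step can be illustrated with a minimal sketch. It is not the ldSelect code itself: it assumes phased chromosomes coded 0/1, a single population, a greedy binning rule based on pairwise r², and an r² threshold of 0.8, none of which is specified in the text above. The function names and the toy data are hypothetical.

```python
def maf(alleles):
    """Minor allele frequency of one SNP coded 0/1 across chromosomes."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def r_squared(a, b):
    """Squared correlation (r^2) between two SNPs coded 0/1 on the same chromosomes."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = pab - pa * pb
    return d * d / denom


def greedy_tag_snps(snps, min_maf=0.05, r2_threshold=0.8):
    """Greedy LD binning (illustrative, assumed rule).

    snps: dict mapping SNP name -> list of 0/1 alleles.
    Returns a list of (tag_snp, bin_members) tuples.
    """
    # Keep only common SNPs (the 5% cut-off comes from the text).
    remaining = {k: v for k, v in snps.items() if maf(v) >= min_maf}
    bins = []
    while remaining:
        # For every remaining SNP, list the SNPs it would capture.
        partners = {
            s: [t for t in remaining
                if t == s or r_squared(remaining[s], remaining[t]) >= r2_threshold]
            for s in remaining
        }
        # Pick the SNP that captures the most SNPs; ties go to the first name.
        tag = max(sorted(partners), key=lambda s: len(partners[s]))
        bins.append((tag, sorted(partners[tag])))
        for t in partners[tag]:
            del remaining[t]
    return bins


# Toy data: 24 chromosomes, 4 hypothetical SNPs.
toy = {
    "snpA": [1] * 12 + [0] * 12,
    "snpB": [1] * 12 + [0] * 12,   # in perfect LD with snpA
    "snpC": [1, 0] * 12,           # uncorrelated with snpA
    "snpD": [1] + [0] * 23,        # MAF below 5%, filtered out
}
for tag, members in greedy_tag_snps(toy):
    print(tag, "tags", members)
```

In this sketch one genotyped SNP per bin stands in for every SNP correlated with it, which is why far fewer SNPs than the full set of variants need to be genotyped.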
As an example, the accompanying figure provides a diagram of the genetic structure and variation in the factor IX gene (F9). The Seattle SNPs PGA sequenced 100% of the 33,000 base pairs that encompass the F9 gene and identified 77 variants in the 2 populations. Using ldSelect, we chose 10 SNPs to tag haplotypes for F9 in subjects of European ancestry and 20 SNPs to tag haplotypes for F9 in subjects of African ancestry. For some genes, such as factor VIII (F8), Seattle SNPs was not able to sequence the full gene; certain gene segments, such as exons and introns, were omitted and did not contribute to SNP selection or haplotype estimation.
The study population came from a population-based, case-control study of 349 women with incident events of deep vein thrombosis or pulmonary embolism and 1,680 control subjects. All women were 30–89 years of age, post-menopausal, and not related to one another.9 All events were verified by medical record review. Genomic DNA was collected from whole blood and genotyping was performed using a custom Illumina GoldenGate panel (Illumina Inc., San Diego, California).
We used the PHASE computer program (http://www.stat.washington.edu/stephens/software.html) to infer full-gene haplotypes for each subject based on our tag-SNP data. Haplotypes with a minor allele frequency (MAF) of less than 2% in the study population were combined into 1 ‘rare haplotype’ category. The factor V (F5) gene was large and there were few common haplotypes with a MAF of 2% or greater. For this reason, we split the F5 gene in 2 at a locus of historical recombination when creating haplotypes.
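The pooling of rare haplotypes can be sketched as follows. This is an illustration only, assuming that inferred haplotypes are available as one string per chromosome; the function name, the label, and the example haplotypes are hypothetical, and only the 2% cut-off comes from the text.

```python
from collections import Counter


def collapse_rare_haplotypes(haplotypes, min_freq=0.02, rare_label="rare"):
    """Replace haplotypes below min_freq with a single pooled label.

    haplotypes: list with one inferred haplotype (a string) per chromosome.
    """
    counts = Counter(haplotypes)
    total = len(haplotypes)
    common = {h for h, c in counts.items() if c / total >= min_freq}
    return [h if h in common else rare_label for h in haplotypes]


# Toy example: 100 chromosomes, one haplotype seen only once (1%).
observed = ["AAG"] * 60 + ["ACG"] * 39 + ["TCG"] * 1
print(Counter(collapse_rare_haplotypes(observed)))
```

Pooling keeps the number of haplotype terms in each gene model small, at the cost of treating all rare haplotypes as if they shared one effect.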
Our approach was to test global variation in a gene using haplotype information and to test for SNP and haplotype associations independently. We conducted tests on 25 genes (F5 was split), 170 SNPs, and 173 haplotypes. Additive genetic models were used to test the association between additional SNP and haplotype copies and VT risk. Global testing compared the haplotype model that characterizes all common haplotypes within the gene with a model that included no haplotype information. We excluded from hypothesis testing the 2 well-characterized SNPs, FV Leiden and prothrombin (F2) 20210A, and their tagged haplotypes.10, 11
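An additive genetic model codes each subject by the number of copies (0, 1, or 2) of a SNP allele or haplotype and estimates one odds ratio per additional copy. The sketch below is a minimal, unadjusted illustration of that idea using a hand-written Newton-Raphson logistic regression; it is not the analysis code used in the study, it ignores any covariates, and the function names and toy counts are hypothetical.

```python
import math


def additive_dosage(hap_pair, target):
    """Number of copies (0, 1, or 2) of the target haplotype a subject carries."""
    return sum(1 for h in hap_pair if h == target)


def fit_additive_logistic(dosages, is_case, iterations=25):
    """Fit logit P(case) = b0 + b1 * dosage by Newton-Raphson.

    Returns (b0, b1); exp(b1) is the odds ratio per additional copy.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(dosages, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score (2x2 case).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data (hypothetical counts): the odds of being a case rise with each copy.
dosages = [0] * 50 + [1] * 60 + [2] * 40
is_case = [1] * 10 + [0] * 40 + [1] * 20 + [0] * 40 + [1] * 20 + [0] * 20
b0, b1 = fit_additive_logistic(dosages, is_case)
print("odds ratio per copy:", round(math.exp(b1), 3))
```

A global test of a gene, as described above, would compare a model containing all common haplotype terms with a model containing none; that comparison is not shown here.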
To deal with the issue of multiple testing, we used the false discovery rate (FDR) q-statistic to guide statistical significance.12 Briefly, FDR methods output a set of hypothesis-testing decisions among which the expected proportion of false positives is controlled below a specified value. Use of the FDR is appropriate when the number of true positives is not expected to be small. Our threshold of significance was set at 0.2, meaning that no more than 20% of the results reported as significant are expected to be false positives.13
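To make the thresholding concrete, the sketch below computes Benjamini-Hochberg adjusted p-values, which can be read as conservative q-values (they correspond to assuming that all tested hypotheses are null). This is a swapped-in, simplified procedure for illustration and may differ from the q-statistic method cited above; the function name and the p-values are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values from five tests.
p = [0.001, 0.01, 0.03, 0.04, 0.5]
q = bh_q_values(p)
significant = [i for i, value in enumerate(q) if value < 0.2]
print(q, significant)
```

With a threshold of 0.2, the tests whose q-value falls below 0.2 form the reported set, and roughly one in five of them at most is expected to be a false positive.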
The results from our candidate-gene approach for incident VT have been published elsewhere.9 Briefly, of the 25 gene-wide variation tests performed, 1 was associated with a q-value <0.2 (TFPI) and 2 other genes were associated with a p-value <0.05 (F5 upper half and protein C). Of the 170 SNPs tested, 5 SNPs had a q-value <0.2 among the 21 (12%) SNPs that had a p-value <0.05. Of the 173 haplotypes tested, 1 haplotype was associated with a q-value <0.2 among the 20 (12%) haplotypes that had a p-value <0.05. Among the 5 SNPs with an associated q-value <0.2, FV K858R (rs4524, MAF=27%, OR=0.7, p-value = 0.003) and FXI 22771 (rs2289252, MAF=40%, OR=1.3, p-value = 0.002) were novel findings. The 3 remaining SNPs were in the protein C gene (PROC 2583: rs1799810, MAF=42%, OR=1.3, p-value = 0.001; PROC 4919: rs2069915, MAF=41%, OR=0.8, p-value = 0.005; and PROC 11310: rs5937, MAF=32%, OR=1.4, p-value <0.001), the latter being a novel discovery.14, 15
The haplotype associated with a q-value <0.2 was in the PROC gene and was uniquely marked by the 11310 variant.
In summary, 1 gene (TFPI) was globally associated with risk, although neither its haplotypes nor its SNPs reached our threshold for significance. Five SNPs in 3 genes (F5, F11, and PROC) were significantly associated with risk. Overall, few of the associations suggested more than a doubling or halving of risk, and most variants were commonly occurring. Of note, few haplotype effects were detected. Diplotype analyses were not conducted.
For candidate-gene approaches to discovering novel risk factors for VT, gene coverage is only as good as the source of the variation data. Genetic regions or variants that are rare, that are not genotyped, or that are not in sufficiently high linkage disequilibrium (LD) with genotyped SNPs would be overlooked. Publicly available sequence data, like those available from the Seattle SNPs PGA, are unlikely to contain most rare SNPs, such as those with a MAF less than 2%. For example, the F2 20210A variant (MAF 0.02) was not detected in the 2 panels of subjects sequenced by the Seattle SNPs PGA. When investigating multiple genes, or genes with a large amount of genetic variation, multiple testing needs to be accounted for when interpreting p-values. Like all association studies, the candidate-gene approach does not provide functional information or definitive conclusions on causality.
Candidate-gene studies use knowledge from biology to identify and investigate gene-centric variation associated with novel genetic risk factors. This approach can be used to take advantage of findings from genome-wide association studies of VT and provide further insight into new candidate genes that influence the risk of thrombosis.