To investigate the value of genomic information in prediction of individual serum uric acid concentrations.
Three population samples were investigated: from the isolated Adriatic island communities of Vis (n=980) and Korčula (n=944), and from the general population of the city of Split (n=507). Serum uric acid concentration was correlated with the genetic risk score based on 8 previously described genes: PDZK1, GCKR, SLC2A9, ABCG2, LRRC16A, SLC17A3, SLC16A9, and SLC22A12, represented by a total of 16 single-nucleotide polymorphisms (SNP). The data were analyzed using classification and regression tree (CART) and general linear modeling.
The most important variables for uric acid prediction with CART were genetic risk score in men and age in women. The percent of variance explained by any single SNP in predicting serum uric acid concentration ranged from 0.0% to 2.0%. The use of genetic risk score explained 0.1%-2.5% of uric acid variance in men and 3.9%-4.9% in women. The highest percent of variance was obtained when age, sex, and genetic risk score were used as predictors, with a total of 30.9% of variance in pooled analysis.
Despite the overall low percent of explained variance, uric acid seems to be among the most predictable human quantitative traits based on the currently available SNP information. The use of genetic risk scores is a valuable approach in genetic epidemiology and increases the predictability of human quantitative traits based on genomic information compared with the single-SNP approach.
Uric acid is the end product of purine metabolism in humans and higher primates and was initially believed to be a highly potent risk factor for disease development (1). This was confirmed in different studies that reported increased levels of serum uric acid in metabolic disorders and cancer (2-4). However, recent studies have reported possible beneficial effects of uric acid, especially in cardiovascular and neurological diseases of late onset (5,6). Possible protective effects are due to the strong antioxidant properties of uric acid (7). All these findings make uric acid a highly interesting research target in both epidemiology and genetics.
Efforts in the search for the genetic basis of uric acid concentrations have been focused on several candidate genes (8,9), and the recently described SLC2A9 has been found to be a gene with a major effect in humans, especially in women (10,11). SLC2A9, also known as Glut 9, is a member of the SLC2A facilitative glucose transporter family, which has an important role in sugar metabolism (12). The gene encodes a putative transporter of 540 amino acids, which is closely related to several other members of the same gene family, with 44% and 38% resemblance to Glut5 and Glut1, respectively (12). The protein consists of 12 transmembrane domains, sugar motifs, and other signatures that facilitate sugar transport (13). The long form of this protein in humans is most strongly expressed in basolateral membranes of proximal renal tubular cells, the liver, and placenta, while the short isoform shows the strongest expression in the apical membranes of polarized renal tubular cells and in the placenta (12,14). Furthermore, it is expressed in chondrocytes from human articular cartilage, functioning as the key link between uric acid in serum and its deposits in gout (15). Three mRNA species have been described: a major transcript of 1.9 kb and 2 other transcripts of 3.1 and 5.0 kb, found primarily in the kidney and liver, but present at low levels in several other tissues (12). SLC2A9 was localized to chromosome 4 (4p15.3-p16) using a monochromosomal human/rodent somatic cell hybrid mapping panel (12). A recently published meta-analysis based on 28141 samples from 14 populations confirmed the strongest association of uric acid with SLC2A9 and also indicated several more genes that made a smaller contribution to the overall uric acid variance, but reached the genome-wide significance level (16).
The initial discovery of SLC2A9 and its association with uric acid was made in samples from the isolated community of the island of Vis (10,17). This is a well-characterized research population, with a great amount of information on its history and demographics (18-21), basic population genetics, and other genetic properties (22-24). One such study reported increased odds of nephrolithiasis in isolated populations (25), which was attributed to the existence of consanguinity in some of the island populations (18,26-30). Subsequent efforts in gene mapping in two of the Croatian Adriatic islands (Vis and Korčula) yielded a number of interesting results, including findings on genetic influence on height determination (31,32), lung function (33), chronic kidney disease (34), and levels of fasting glucose (35), serum lipids (36-38), creatinine (39), and plasma N-glycans (40), supporting the initial proposal that isolated populations present a very valuable and fundamentally advantageous research resource (17,18,41).
The aim of this study was to investigate the role of a set of 8 previously described genes in predicting the serum concentrations of uric acid in samples from 3 populations in Croatia.
This study was conducted on samples from 3 populations. The first sample included 986 unselected participants from the isolated community of the Croatian Adriatic island of Vis, recruited in 2003 and 2004. The second included 944 unselected adult participants from the eastern parts of the island of Korčula, recruited in 2007. The third sample comprised 535 unselected adult participants from the city of Split, as a mainland control for the isolated communities. All participants were informed of the study aims and signed the informed consent before entering the study. Relevant ethics committees in Scotland and Croatia approved the study.
We conducted serum uric acid measurements at each of the 3 study sites (Vis, Korčula, and Split). After the blood extraction, blood samples were promptly centrifuged, aliquoted, and stored at -70°C. Subsequently, the samples were transported into the single certified laboratory in Zagreb, where all samples were analyzed using the uricase UV photometry by Olympus AU400 (Olympus Corp, Tokyo, Japan).
Extensive genetic information is available for all 3 samples. The Vis island sample was genotyped with the Illumina HumanHap 300 chip, with a total of 317503 SNPs, while the Korčula and Split samples were genotyped with the Human CNV370 chip, containing 346027 markers (32,42-44).
All 3 data sets were used to extract SNPs from the genes that were associated with serum uric acid concentrations in previous studies (16). Some of these SNPs were not available in all 3 samples. In these cases, we used the closest available SNP from the same gene that was present in all 3 data sets, resulting in a total of 16 SNPs associated with uric acid, belonging to a total of 8 genes (Table 1). Based on this information, a genetic risk score was calculated for each individual. The score was defined as the weighted sum of the information from the selected alleles, with weights proportional to the percent of variance explained in a previously published meta-analysis (16). Each allele within the same gene was given the same weight, equal to the percent of variance reported for that gene. Genetic risk scores were calculated using PLINK version 1.00 (Center for Human Genetic Research, Massachusetts General Hospital and the Broad Institute of Harvard & MIT, Cambridge, MA, USA), using the “score” option (45).
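The weighted-sum definition above can be illustrated with a minimal Python sketch. It is not the PLINK procedure used in the study: the SNP identifiers, the weights, and the function name `genetic_risk_score` are hypothetical, and the sketch assumes that each SNP is coded as a count of 0, 1, or 2 scored alleles.

```python
# Hypothetical SNP identifiers and weights, for illustration only; they are
# not the 16 SNPs or the meta-analysis weights used in the study.
EXAMPLE_WEIGHTS = {"snp_1": 0.5, "snp_2": 0.25, "snp_3": 0.25}


def genetic_risk_score(allele_counts, weights):
    """Return the weighted sum of allele counts (0, 1, or 2 per SNP)."""
    unknown = set(allele_counts) - set(weights)
    if unknown:
        raise KeyError(f"no weight defined for: {sorted(unknown)}")
    for snp, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1, or 2")
    # Each SNP contributes its weight once per copy of the scored allele.
    return sum(weights[snp] * count for snp, count in allele_counts.items())


# One hypothetical individual: two scored alleles at snp_1, one at snp_2.
person = {"snp_1": 2, "snp_2": 1, "snp_3": 0}
score = genetic_risk_score(person, EXAMPLE_WEIGHTS)
print(score)
```

Because SNPs from the same gene share one weight, a gene represented by several SNPs contributes more to the sum than a gene represented by a single SNP.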
Classification and regression tree (CART), a simple predictive data mining technique, was used in predicting the uric acid concentration. The model was built with untransformed uric acid values and a total of 20 predictor variables: age, sex, study site, genetic risk score, and 16 selected SNPs. Ten cross-validations were used in order to avoid model over-fitting. The main result of this approach is the variable importance, a numerical representation of the strength of association between the predictors and the predicted variable. Importance is also expressed as the normalized estimate for each predictor, where the strongest predictor is assigned a value of 100%, while the others are ranked according to their decreasing relative importance. Additionally, CART was used to define the age limit at which the strongest change in the uric acid concentration is recorded. This model was calculated for women only. Furthermore, general linear modeling was used to estimate the percent of variance explained (adjusted R²) by either genetic risk scores or a single SNP, and also to estimate the maximum percent of variance explained by age, sex, and genomic information. Four general linear models were built with different predictors in order to investigate the percent of variance of serum uric acid attributable to these predictors. The initial model was built with age and sex as predictor variables in order to establish the predictive value of these 2 variables. The second model was built for every sub-sample and SNP, and the range of R² was recorded (the smallest and the greatest values are reported in the table), in order to establish whether any single SNP could be used in uric acid concentration prediction. The third model was built with genetic risk scores only in order to allow comparison with the models that included a single SNP only. The final, fourth model was based on age, sex, study site, and genetic risk score.
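Two of the quantities described above can be written out explicitly. The Python sketch below is only an illustration under stated assumptions: the function names and the input values are hypothetical, and the adjusted R² uses the standard textbook formula rather than anything taken from the statistical package used in the study.

```python
def normalize_importance(raw_importance):
    """Rescale importances so that the strongest predictor equals 100%."""
    strongest = max(raw_importance.values())
    if strongest <= 0:
        raise ValueError("at least one importance must be positive")
    return {name: 100.0 * value / strongest
            for name, value in raw_importance.items()}


def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical raw importances, not results from the study.
raw = {"sex": 4.0, "age": 2.0, "genetic_risk_score": 1.0}
normalized = normalize_importance(raw)
print(normalized)

# Hypothetical model: R^2 of 0.5 with 11 samples and 2 predictors.
print(adjusted_r2(0.5, n_samples=11, n_predictors=2))
```

The adjustment matters little when the sample is large relative to the number of predictors, but it penalizes models that add predictors without improving the fit.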
In order to obtain the normal data distribution, serum uric acid was log transformed. The same predictor set was used in both CART and general linear modeling. The analysis was performed in SPSS (SPSS Inc, Chicago, IL, USA), with significance set at P<0.05.
A total of 2431 samples were included in the analysis, with the most samples from Vis Island and the fewest from Split (Table 2). The breakdown according to basic characteristics indicated a number of significant differences across these samples (Table 2).
The use of CART in the initial model indicated that sex was the most influential predictor variable for the entire data set, without an indication of differences across samples (web extra material 1). The subsequent models were made for each sex separately, indicating that the most influential variable in men was genetic risk score, while in women it was age (web extra material 1). Because we observed a strong deflection point in women, we used another CART model in which only age was forced into the classification in order to find the age at which the strongest change is recorded. It was recorded between 51 and 52 years, which was also confirmed by the scatterplot, whereas men had more or less similar values of uric acid across age groups (web extra material 2). The final CART model was made for 2 groups of women: those aged ≤52 years and those older than 52 years. In the younger group, the most influential variable was genetic risk score, while in the older it was age (web extra material 1).
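The search for the age at which the strongest change occurs corresponds to the first split of a regression tree on a single predictor. The Python sketch below shows this idea under stated assumptions: it uses the common least-squares splitting criterion, which may differ from the criterion of the software used in the study, and the ages and uric acid values are synthetic, not study data.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_split(ages, outcomes):
    """Return the age threshold that minimizes the within-group squared error."""
    pairs = sorted(zip(ages, outcomes))
    best_threshold, best_error = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical ages
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sum_squared_error(left) + sum_squared_error(right)
        if error < best_error:
            # Place the threshold midway between the two neighboring ages.
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_error = error
    return best_threshold


# Synthetic example: outcome values jump between ages 51 and 52.
ages = [30, 40, 50, 51, 52, 60, 70]
uric_acid = [250, 250, 250, 250, 300, 300, 300]
threshold = best_single_split(ages, uric_acid)
print(threshold)
```

With real data the split is rarely this clean, which is why a scatterplot is a useful check on the threshold returned by the tree.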
General linear models indicated that age and sex were strong predictors, with as much as 20.1% of explained variance in the pooled model (Table 3). Genetic risk score alone explained up to 4.9% of variance in women and 2.5% in men, while single SNPs explained up to 2% in women and 1.1% in men (Table 3). Finally, the model with 4 predictors explained a total of 30.9% of variance in pooled analysis (Table 3).
This study demonstrated that serum uric acid had a relatively high percent of explained variance based on the SNP information. Similar studies report a very low percent of variance, usually less than a single percent (46). This suggests that SNPs might not be the best approach to utilize genomic information or that there is greater likelihood of the common disease rare variant hypothesis, which implies that most of genetic variation is due to rare genetic variants. Regardless of the mechanism, this study also shows interesting age- and sex-modifying effects. Classification and regression tree results in our study showed that genetic risk score was the most important variable for prediction of serum uric acid concentration in men, while general linear modeling suggested lower overall variance. We also found a strong modifying effect of age in women, meaning that for women 52 years old or younger the most important variable was genetic risk score, while in the older group it was age. Such results were reported before for uric acid, suggesting strong sex-related and sex-and-age-related stratification and likely differential metabolism determinants and uric acid role (10,47-49). Furthermore, previous studies often reported rather complex metabolic pathways of uric acid with numerous links, including high correlation with some other traditional cardiovascular risks (50).
The use of any single SNP in this study suggested that most of the associations between SNPs and uric acid had a very low percent of explained variance, despite the significant association of some of these SNPs with uric acid reported in previous studies (10). The application of genetic risk score increased these figures, suggesting that it could substantially improve the estimates of variance used in genetic epidemiology (51). The main advantage of genetic risk score use is closely related to the predicted genetic architecture of complex traits. Currently we believe that complex traits are determined by a large number of genetic influences of small effect. The association of any single SNP with the complex trait is likely to explain just a small amount of variance, usually not more than a few percent (even for the SNPs that show significant associations with the investigated trait) (10). A methodological step forward is the use of a large number of SNPs, which are expected to increase the amount of variance explained when compared with the single-SNP approach (51). However, the existing genotyping and computational resources might still be insufficient to use genetic risk scores at a sufficient level (52). A simulation study suggested that even the use of a rather broad set of 1000 markers in a case-control study with 10000 samples provided a relative risk of only 1.04 (52), which is currently considered a very low risk in traditional epidemiological studies. Therefore, it seems that the predictability of uric acid based on the currently available genomic information is very low, which is similar to most human quantitative traits (51). One of the exciting possibilities could be to calculate the risk scores on all available genomic information, thus producing a hypothetical maximum percent of variance for any given trait.
However, the main disadvantage of this approach is that genetic risk score calculation completely ignores possible interactions between genes, which are likely present and important in the human genome (53).
Two analytic approaches were employed in this study: classification and regression tree and general linear modeling. The first method is often used for its speed, logical simplicity, and general data mining tasks, in cases when there is little a priori knowledge or any coherent set of theories or predictions about which variables are related and how (54). This is an often-encountered situation in genetic epidemiology, and even more in the new approaches of systems biology, where a lot of studies are hypothesis-free and hypothesis-generating. This is the case even with the genome-wide association study, which is actually a data mining equivalent since no prior hypothesis is set and a number of available genetic markers are associated with an outcome variable of interest. Such an approach offers a unique opportunity to discover completely new and unexpected pathways and other modifiers, as it provides a measure for association of the investigated trait with every available SNP marker. However, each genome-wide association study must be considered as hypothesis-generating only, and requires at least 2 subsequent steps: a replication study and further follow-up (bioinformatics or a functional genomics study). This study was based on 3 similar populations, which allowed both stratified and pooled analysis. The similarity of the general pattern of uric acid prediction in all 3 samples makes it likely that this is very close to the real uric acid pattern, thus supporting the results and conclusion presented here.
The second approach, general linear modeling, provides a more traditional measure that should always be reported with genome-wide association results: the percent of variance explained by genetic factor(s). Although this method offers a range of advantages over bivariate statistics, it also has certain limitations, such as the problem of shared variance, when 2 or more loci may act through the same pathway and the introduction of additional loci does not bring in further useful information. Shared variance is indicated in cases where the sum of variance explained by all individual loci does not add up to the total variance explained by the genetic risk scores. This study also shows a substantial amount of shared variance, which may be attributable to the fact that some genes were represented by more than one SNP, or it could be due to similar metabolic actions of some of these genes. A step forward in clarifying the amount of shared variance could be a multi-level approach, where percent of variance is correlated with linkage disequilibrium between markers and then compared with their interaction, thus helping to understand the pathways between and among various loci in serum uric acid determination.
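The effect of shared variance can be illustrated for the simplest case of 2 correlated predictors. The Python sketch below uses the standard formula for the R² of a linear regression on 2 predictors expressed through pairwise correlations; the function name and the correlation values are hypothetical and are not estimates from this study.

```python
def joint_r2(r_y1, r_y2, r_12):
    """R^2 of a linear model with 2 predictors, from pairwise correlations.

    r_y1, r_y2: correlation of each predictor with the outcome
    r_12: correlation between the 2 predictors (e.g. 2 SNPs in linkage
          disequilibrium); must satisfy |r_12| < 1
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical case: each predictor alone explains 4% of the variance.
separate_sum = 0.2 ** 2 + 0.2 ** 2           # sum of single-predictor R^2
independent = joint_r2(0.2, 0.2, r_12=0.0)   # uncorrelated predictors
correlated = joint_r2(0.2, 0.2, r_12=0.8)    # strongly correlated predictors
print(separate_sum, independent, correlated)
```

When the predictors are uncorrelated, the joint R² equals the sum of the individual values; when they are strongly correlated, much of the explained variance is shared and the joint value falls well below that sum.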
The genetic determination of uric acid seems to fit well into the polygenic trait theory, where a number of genes are expected to influence a single trait (55). However, uric acid also seems to be under the strong single-gene effect of SLC2A9, which was described to explain up to 5% of variance of uric acid in women (10). Due to this, uric acid seems to be a good candidate for genetic studies that describe the determination of complex human traits. Furthermore, it offers increased chances for the development of various diagnostic and treatment opportunities for individuals who suffer symptoms of gout (11).
The limitations of this study include possible differences in some features of the 3 samples, which could have affected the results (in terms of dietary pattern, possible other exposures, and random effects). There was also a much smaller interquartile range of genetic risk score in the Korčula sample than in the other 2 samples. The calculation of the genetic risk score could have overestimated the percent of variance due to possible shared variance by 2 or more alleles. Furthermore, the genetic risk scores were based on an uneven number of SNPs per gene, which could have affected the results. However, similarity of the general pattern of variance explained among all 3 samples suggests that there might be similar mechanisms that include the genetic effects on serum concentrations of uric acid. Finally, the results could have been affected by the varying extent of relatedness among participants, since it is likely that individuals from isolated populations share a higher percent of genetic and environmental factors than those from the outbred, control population. Despite the limitations, this study confirms some previous suggestions of strong sex-related differences in genetic determination of serum uric acid concentration. Although the overall level of explained variance and also the predictive value of genetic factors was low, uric acid seems to be among the human quantitative traits that show a high percent of variance, reaching values close to 5% in women.
The studies in the Croatian islands were supported through the grants from the Medical Research Council UK to Alan F. Wright, Harry Campbell, and Igor Rudan; and Ministry of Science, Education and Sport of the Republic of Croatia to I.R. (No. 216-1080315-0302). The authors collectively thank a large number of individuals for their individual help in obtaining funding support, organizing, planning, and carrying out the field work related to the project, and for assistance with data management, analysis, and interpretation: the teams from the Medical Research Council Human Genetics Unit in Edinburgh and The University of Edinburgh, UK, led by Professors Nicholas D. Hastie, Alan Wright, and Harry Campbell, including Caroline Hayward and Veronique Vitart (for funding support, intellectual input in study design, and their role in data management, analysis, and interpretation); Professor Pavao Rudan and the staff of the Institute for Anthropological Research in Zagreb, Croatia (organization of the field work, anthropometric and physiological measurements, and DNA extraction in Vis); Professor Stipan Janković and the staff of the University of Split Medical School (organization of the field work, anthropometric and physiological measurements, and DNA extraction in Korčula and Split); Professor Ariana Vorko-Jović and the staff and medical students of the Andrija Štampar School of Public Health of the Medical School, University of Zagreb, Croatia (questionnaires, genealogical reconstruction, and data entry in Vis and Korčula); Dr Branka Salzer from the biochemistry lab “Salzer,” Croatia (measurements of biochemical traits in Vis and Korčula); local general practitioners and nurses (recruitment and communication with the study population); and the employees of several other Croatian institutions who participated in the field work, including but not limited to the University of Rijeka; Croatian Institute of Public Health; Institutes of Public Health in Split and Dubrovnik. 
SNP Genotyping of the Vis samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, WGH, Edinburgh; of the Korčula samples by the Genotyping Institute in Munich, Germany; and of the Split samples by AROS company in Aarhus, Denmark.
I. R. is the executive editor in the CMJ. To ensure that any possible conflict of interest has been addressed, this article was reviewed according to best practice guidelines of international editorial organizations.