The results of the psychophysics measurements reported here, both in their average values and in the observed distribution of values, are in agreement with recent observations by others. One feature that distinguishes our study from most recent studies is the replication of the psychophysics test on the same panel of subjects within a relatively brief period of just two weeks. Over this period, threshold determination was performed in duplicate and supra-threshold intensity curves were recorded in triplicate.
The intra-subject variability between those repeat measurements is substantial, both for the detection thresholds and for the supra-threshold intensity ratings. In the case of the detection thresholds, the average difference between the two measurements is 0.32 log10 units, which corresponds to a greater than two-fold change in the detection threshold. For 17% of subjects, the difference between replicate measurements was >0.62 log10 units, corresponding to a greater than four-fold change in detection threshold. As a consequence, some subjects who would have been classified as tasters based on the first measurement later displayed detection thresholds that would classify them as non-tasters, and vice versa.
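As a worked illustration of the conversion used above, a difference of d log10 units corresponds to a fold change of 10^d. The short Python sketch below is not part of the original analysis; the function name `fold_change` is introduced here purely for illustration.

```python
def fold_change(delta_log10):
    """Convert a difference in log10 units into a fold change."""
    return 10 ** abs(delta_log10)

# Differences quoted in the text, in log10 units
print(round(fold_change(0.32), 2))  # slightly above 2
print(round(fold_change(0.62), 2))  # slightly above 4
```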
This intra-subject variation in phenotype far exceeds the nominal resolution of the staircase method that was used to measure this phenotype. Therefore, it is unlikely that this variation is simply the result of measurement error. Instead, other factors that are not easily controlled appear to have a substantial effect on day-to-day variations in taste sensitivity.
The intra-subject variability we observed in our study over a rather short period of just a few days is comparable in magnitude to the intra-subject variability reported in studies with substantially longer time periods of weeks to months between repeat measurements.
Together, the data underline that neither a subject's detection threshold nor their taster/non-taster status should be viewed as a single, fixed parameter that can be ascertained in a single measurement session. Instead, a subject's detection threshold appears to be a dynamic parameter that is subject to substantial variation over the period of just a few days.
The same caveat applies to the supra-threshold intensity ratings. In fact, intra-subject variability of the supra-threshold ratings was even greater than that of the detection thresholds. Intensity ratings involve a larger degree of subjectivity than do the threshold determinations via the staircase method. Correspondingly, the gLMS scores were somewhat less powerful in identifying the TAS2R38-PROP association in our GWA studies. Still, the GWA studies based on the gLMS scores easily identified the correct SNPs, and gLMS scores have key advantages that, despite this reduced power, may make them attractive phenotyping parameters for future GWA studies in the field of psychophysics. First, after an initial training of participants, gLMS intensity curves can be recorded much faster and with lower personnel effort than detection thresholds. In cases where the costs of genotyping and subject recruitment are not limiting, but testing time is at a premium, gLMS intensity measurements may be particularly attractive. Such a situation may be encountered, for example, when psychophysics measurements are performed as an “add on” to another study. Second, gLMS intensity curves provide both a rough estimate of a compound's detection threshold and supra-threshold intensity ratings. In cases where variations in detection thresholds and variations in the perceived intensity of supra-threshold stimuli are driven by two distinct genetic variations, gLMS intensity curves may then allow the identification of both of these genetic variations.
Other genetic factors of PROP taste perception
Past studies have suggested that genetic factors other than TAS2R38 may also play a role in driving the observed natural variation in the bitter taste perception of PTC and PROP. The accompanying table lists all 34 SNPs that generated genome-wide significant or “suggestive” p-values (<10^−5) in the GWAS on the PROP detection threshold phenotype. Only four of these SNPs have genome-wide significant associations (p<5×10^−8), and all four are located within the TAS2R38 gene itself or in its immediate vicinity. Incidentally, the same four SNPs (rs713598, rs10246939, rs4726481 and rs1726866) were the only SNPs that achieved genome-wide significance for any of our studied phenotype representations.
PROP detection threshold-associated SNPs with p<10^−5.
An additional 14 SNPs are located very near TAS2R38. To test whether these SNPs are simply in linkage disequilibrium with the causal TAS2R38 SNPs or make an independent functional contribution, we corrected each subject's phenotype for their genotype at the top TAS2R38 SNP and then used this corrected phenotype in an additional GWAS. In this second GWAS, the association signals for these 14 SNPs dropped well below statistical significance, indicating that they do not make an independent contribution to the PROP detection phenotype.
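The correction step described above can be sketched as a simple residualization. The Python sketch below assumes an additive genotype coding (0, 1 or 2 copies of one allele) and an ordinary least-squares fit; this section does not state the exact model that was used, so the function name `residualize` and the example values are hypothetical.

```python
from statistics import fmean

def residualize(phenotype, genotype):
    """Remove the linear effect of one SNP (coded 0/1/2) from a phenotype.

    Fits phenotype = intercept + slope * genotype by least squares and
    returns the residuals, which can serve as the corrected phenotype
    in a second association scan.
    """
    mean_y = fmean(phenotype)
    mean_g = fmean(genotype)
    sxx = sum((g - mean_g) ** 2 for g in genotype)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotype, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    return [y - (intercept + slope * g) for g, y in zip(genotype, phenotype)]

# Hypothetical example: six subjects, genotype at the top SNP and a phenotype score
top_snp = [0, 1, 2, 0, 1, 2]
score = [1.0, 2.0, 3.0, 1.2, 2.1, 2.9]
corrected = residualize(score, top_snp)
```

By construction, the corrected phenotype is uncorrelated with the conditioning SNP, so any SNP whose signal arose only through linkage disequilibrium with it should lose its association in the second scan.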
The remaining 16 SNPs with suggestive p-values of <10^−5 are distributed throughout the genome, with 13 of them within 100 kb of an annotated gene. None of these genes suggested a mechanistic link to taste perception.
Previous studies had specifically implicated loci on chromosome 16p and chromosome 5p15. With 10165 and 6330 quality-controlled SNPs, respectively, these two chromosomal regions are well covered by the SNP chip used in our experiment. We generated local Manhattan and qq-plots for both of these regions (data not shown). In both cases the distribution of p-values is fully consistent with purely random associations.
On the p-arm of chromosome 16, the minimal p-value of 2.8×10^−4 is observed for rs9933117, which is located in an intron of XYLT1, a xylosyltransferase gene. After Bonferroni correction for the large number of SNPs in this region, the p-value is non-significant (p_corr = 1). A similar situation is observed for the 5p15 region. Here the minimum p-value of 3.6×10^−4 is observed for SNP rs1395093. This SNP falls into a gene-free region, and after Bonferroni correction the p-value is also non-significant (p_corr = 1).
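The Bonferroni arithmetic for the two regions can be checked with a few lines of Python. The sketch assumes the standard form of the correction (the raw p-value multiplied by the number of tests, capped at 1) and uses the regional SNP counts quoted above as the number of tests; the helper name `bonferroni` is illustrative.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Minimum regional p-values and SNP counts quoted in the text
p_corr_16p = bonferroni(2.8e-4, 10165)
p_corr_5p15 = bonferroni(3.6e-4, 6330)
```

In both regions the product of the raw p-value and the number of SNPs exceeds 1, so the corrected value is capped at 1.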
We further find that variations in both detection threshold and supra-threshold intensities are dominated by the same genetic variations in the TAS2R38 receptor. These observations are entirely consistent with recent results from in vitro experiments on functionalized bitter receptors that indicate that both PROP and PTC activate only TAS2R38 and no other bitter taste receptor.
At the same time, we would like to make clear that the failure of our study to detect additional genetic variants that impact PROP perception in no way precludes the existence of such variants. It is quite possible that more powerful studies (e.g. studies on larger panels or different cohorts) may identify such associations in the future. Still, we can use the data from the current study to get a rough idea of the likely upper limit for the impact that such genetic variants could have. Power-to-detect calculations show that, with a panel of 225 subjects and the genome-wide significance cutoff (p<5×10^−8), a genetic variation that explains more than 11.5% of overall phenotype variance would have been detected in our GWAS with greater than 50% probability. In other words, failure to find additional genetic factors in our study makes the existence of multiple genetic factors with substantial impact on PROP taste perception (r-square >> 11.5%) seem unlikely.
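A power estimate of this kind can be approximated as follows. This section does not state how the power calculation was carried out, so the sketch below is an assumption: it treats the single-SNP test statistic as a normal deviate with mean sqrt(N·r²/(1−r²)) and applies a two-sided cutoff at p = 5×10^−8. The function name `gwas_power` is hypothetical.

```python
from statistics import NormalDist

def gwas_power(n_subjects, r_squared, alpha=5e-8):
    """Approximate power of a 1-df association test.

    n_subjects : panel size
    r_squared  : fraction of phenotype variance explained by the variant
    alpha      : two-sided significance cutoff
    """
    norm = NormalDist()
    z_crit = -norm.inv_cdf(alpha / 2)  # critical |z| for the cutoff
    # Expected z-score (square root of the non-centrality parameter)
    shift = (n_subjects * r_squared / (1 - r_squared)) ** 0.5
    # Probability of exceeding the cutoff in either tail
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

power_at_limit = gwas_power(225, 0.115)
```

Under these assumptions, a variant explaining 11.5% of the variance in a panel of 225 subjects comes out close to one-half power, in line with the figure quoted above; smaller effects fall off steeply.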
Conversely, we can conclude that our current data set is not powerful enough to detect genetic variations with subtle impacts (r-square <11.5%) on PROP perception via the hypothesis-free GWAS approach. Still, this data set may enable other researchers to study such subtle genotype-PROP-taste associations using a target-gene approach. The complete set of the association p-values for the PROP detection threshold phenotype for all genotyped SNPs is included in Supporting Information S1. The SNPs contained in this data set have passed our quality control criteria; the p-values were calculated based on phenotype scores corrected for covariates and have undergone genomic control as described in the methods section.
Effect of genetic ancestry on PROP taste perception
Given the very substantial differences in the frequency of the relevant TAS2R38 genotypes in different ethnic groups (see HapMap database), one might have expected an association between genetic ancestry principal components and PROP taste sensitivity. However, no such association was observed. We attribute this lack of association to the population structure of our panel. The ancestry principal component plot of our panel (data not shown) indicates a continuous admixture of genotypes from different ethnic origins. As a result of this admixture, the link between the specific TAS2R38 genotype, which drives PROP taste sensitivity, and the overall genetic ancestry, which represents ethnicity, will be weakened, apparently to the point where a statistical association can no longer be observed.
Non-genetic factors influencing PROP taste perception
Approximately 50% of the observed variation in PROP detection thresholds is accounted for by genetic variation in the TAS2R38 gene. From replicate measurements we can estimate that measurement error accounts for another 20% of the observed phenotype variability. This leaves approximately 30% of the phenotypic variation we observed to be explained by other factors. Consistent with reports in the literature, neither gender nor body mass index explains a significant portion of this remaining 30% of variation. Even age, which is known to influence both overall and PROP-specific taste sensitivity, explains only 3.8% of the overall variation in PROP detection thresholds.
Conclusions for GWAS studies on other chemosensory phenotypes
The results of the current study, together with the fact that common genetic variations with strong impact on sensory phenotype seem to exist beyond the PROP-TAS2R38 association, indicate that GWAS studies have the potential to yield important discoveries in the field of human chemosensory perception. Specifically, given a sufficiently strong influence of the genotype on the sensory phenotype, the large and, we think, inherent uncertainty in the measurement of chemosensory phenotypes will not pose a fundamental obstacle to the success of the GWAS strategy.
Our study shows that careful selection of the measured phenotype and equally careful processing of the data can significantly boost the power of an association signal. The optimal phenotyping strategy will depend critically on a study's setting. For example, in our study detection thresholds determined with the staircase method were clearly the most powerful phenotype for identifying the PROP-TAS2R38 association. However, due to its complexity, the staircase protocol required both a greater time commitment from panelists and a substantially greater personnel effort than the measurement of gLMS taste intensity curves. Taking this difference in effort into account, the gLMS phenotype, which is slightly less powerful than the detection threshold phenotype, is clearly the more efficient phenotyping method. Therefore, if phenotyping is carried out on a large, already genotyped cohort, the simpler gLMS approach would appear to be highly attractive. In fact, recent GWASs on quinine taste perception and specific anosmias have shown that even extremely streamlined phenotyping approaches can be successful as long as this simplicity in the phenotyping method enables access to very large, already-genotyped subject panels.
Below are 5 points of concrete advice that summarize the findings from our study. Taking these 5 points into account in the design of future chemosensory GWAS studies should help boost the genotype-phenotype association signal and with it boost the chance to identify genome-wide significant associations:
- Conversion of a continuous phenotype (e.g. detection threshold) into a binary phenotype (e.g. taster/non-taster status) generally leads to a substantial loss of statistical power and should be avoided. While this loss of power is well-known and well-understood from a statistical perspective, conversion of taste thresholds into taster/non-taster status continues to be such a common practice in the taste field that we feel compelled to point out the resulting loss of statistical power in this context.
- Investing phenotyping resources into the testing of additional panelists adds substantially more power to an association signal than replicate measurements on existing panelists. If information from replicate measurements is desirable (e.g. to obtain information on phenotype accuracy or on the variation of a phenotype over time), we recommend performing such replicate measurements only on a subset of the panelists. Typically, a rather modest number of subjects will already be sufficient to get a reliable estimate of intra-subject variability, so that the bulk of the phenotyping effort can be dedicated to the inclusion of additional subjects.
- Detection thresholds measured via the staircase method are a more powerful phenotype than suprathreshold intensity ratings obtained with the gLMS approach. Threshold measurements represent an attractive phenotyping strategy when panel size is limited.
- Determination of suprathreshold intensity ratings using the gLMS approach is a more cost effective phenotyping strategy. Use gLMS-based approaches for phenotyping when large, already genotyped subject panels are available.
- Detection threshold data should be log-transformed and then, if replicate measurements were obtained, averaged to generate the input phenotype for the GWAS. This log-transformation accounts for the log-linear relationship between taste stimulus and taste response, leads to normally-distributed residuals in genotype-phenotype regression during GWAS and ultimately boosts the power to find genotype-phenotype associations.
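The last point in the list above, log-transforming thresholds before averaging replicates, can be sketched as follows. The function name `threshold_phenotype` and the example concentrations are hypothetical and in arbitrary units; only the order of operations (transform first, then average) is taken from the text.

```python
import math
from statistics import fmean

def threshold_phenotype(replicate_thresholds):
    """log10-transform each replicate threshold, then average.

    Averaging after the transform yields the log of the geometric mean,
    which differs from the log of the arithmetic mean of raw thresholds.
    """
    return fmean(math.log10(t) for t in replicate_thresholds)

# Hypothetical duplicate thresholds for one subject (arbitrary concentration units)
replicates = [1e-4, 1e-5]
phenotype = threshold_phenotype(replicates)   # mean of -4 and -5
naive = math.log10(fmean(replicates))         # dominated by the larger threshold
```

Averaging on the log scale weights the two replicates equally, whereas averaging the raw concentrations lets the higher threshold dominate.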
Given that taste perception of PROP bitterness follows the prototypical stimulus-response relationship found across much of psychophysics, the above points will be directly transferable to a wide range of sensory phenotypes.
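The first point in the list above, the loss of power from converting a continuous phenotype into a binary one, can be illustrated with a small simulation. Everything in the sketch is an assumption made for illustration: the panel is simulated, the SNP effect is additive, the allele frequency is 0.5, and the binary split is placed at the median rather than at an empirically chosen taster/non-taster cutoff.

```python
import random
from statistics import fmean, median

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def simulate(n_subjects=5000, effect=0.5, seed=1):
    """Variance explained by one SNP for a continuous vs. a binarized phenotype."""
    rng = random.Random(seed)
    # Genotype = number of copies of one allele (0, 1 or 2), allele frequency 0.5
    genotype = [(rng.random() < 0.5) + (rng.random() < 0.5)
                for _ in range(n_subjects)]
    # Continuous phenotype: additive SNP effect plus unit-variance noise
    continuous = [effect * g + rng.gauss(0.0, 1.0) for g in genotype]
    # Binary phenotype: split the continuous scores at their median
    cutoff = median(continuous)
    binary = [1.0 if y > cutoff else 0.0 for y in continuous]
    return r_squared(genotype, continuous), r_squared(genotype, binary)

r2_continuous, r2_binary = simulate()
```

In this simulated setting the SNP explains a smaller share of the variance of the binary phenotype than of the continuous one, which translates directly into a weaker association signal at a fixed panel size.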
In conclusion, we have used the well-known association between variations in the TAS2R38 taste receptor gene and variations in taste perception of the bitter compound PROP to evaluate the relative power and efficiency of different chemosensory phenotyping strategies for genome-wide association studies (GWAS). The performance of the GWAS was surprisingly robust to the specific choice of data processing procedures, and both detection thresholds and supra-threshold intensity ratings reproduced the TAS2R38 association unequivocally. Still, careful choice of phenotyping method and parameter representation can provide a substantial boost in the strength of the genotype-phenotype associations. We anticipate that the lessons learned in this study will be valuable for future GWA studies on chemosensory phenotypes where associations between genotype and phenotype are less pronounced than is the case for TAS2R38 and PROP detection.