When literature-based meta-analyses involve outcomes with skewed distributions, the best available data can sometimes be a mixture of results presented on the raw scale and results presented on the logarithmic scale. We review and develop methods for transforming between these results for two-group studies, such as clinical trials and prospective or cross-sectional epidemiological studies. These allow meta-analyses to be conducted using all studies and on a common scale. The methods can also be used to produce a meta-analysis of ratios of geometric means when skewed data are reported on the raw scale for every study. We compare three methods, two of which have alternative standard error formulae, in an application and in a series of simulation studies. We conclude that an approach based on a log-normal assumption for the raw data is reasonably robust to different true distributions; and we provide new standard error approximations for this method. An assumption can be made that the standard deviations in the two groups are equal. This increases precision of the estimates, but if incorrect can lead to very misleading results. Copyright © 2008 John Wiley & Sons, Ltd.
Meta-analyses of clinical trials, epidemiological studies and other types of study may involve continuous outcome data. Continuous data can be skewed, typical examples being concentrations (e.g. of plasma triglycerides), other ratio or reciprocal measures (e.g. percentage reduction), measures related to resource use (e.g. recovery time) or assessment scales when there is a large proportion of ‘normal’ participants with scores towards one extreme of the scale (e.g. measures of cognition in population-based studies). Standard inferences on the means of skewed data are valid for large sample sizes due to the central limit theorem, which determines that the mean of the outcome measurements is approximately normally distributed with standard deviation given by the standard error (obtained by dividing the standard deviation by the square-root of the sample size). Since standard meta-analytic methods assume normality in the distribution of the means, but not the raw data, they are valid when sample sizes within individual studies are sufficient to enable the central limit theorem to hold approximately.
Focussing on the raw outcome measurements is problematic when the sample size is small, as the standard deviation and mean are affected by extreme values in one direction. It may also lead to loss of efficiency regardless of the sample size. This is well recognized by authors of primary research studies. A common approach to dealing with skewed outcome data is to take a logarithmic transformation of each observation and to conduct the analysis using log-transformed values. This yields, for example, a mean of the log-concentration levels together with a standard deviation of the log-concentration levels, leading directly to a confidence interval for the mean log-concentration level. The mean of the logs can be readily transformed to a geometric mean along with a confidence interval. A logarithmic transformation can offer further advantages, including a focus of the analysis on clinically more appropriate measures of effect.
A practical complication in meta-analyses based on summary data is that some studies may present means and standard deviations on the log scale (or geometric means on the raw scale), while other studies present means and standard deviations on the raw scale. It can be difficult to determine exactly which data have been presented. It may, for example, be unclear whether a mean is an arithmetic mean or a geometric mean. Furthermore, some papers may present inappropriate results, such as the exponential of the standard deviation of log-transformed values. Even when the results have been correctly interpreted, however, there remains the problem of combining results on different scales. Here, we present straightforward and approximate transformations that enable meta-analyses on either the raw or the log-transformed scale, irrespective of how results are presented. We do assume, however, that the nature of all results extracted from papers is known, and we focus on making inferences concerning the comparison of two groups.
Consider first a single group; say an intervention or a control group from a clinical trial, or a specific exposure group in an observational epidemiological study. Let n be the sample size in this single group. Let $\bar{x}$ and $s_x$ represent the arithmetic mean and standard deviation of raw (not log-transformed) measurements. Lower and upper limits of a 95 per cent confidence interval for the mean are obtained as
\[ \bar{x} \pm t \, \frac{s_x}{\sqrt{n}}, \]
where t is the 97.5 percentage point of the t-distribution with (n − 1) degrees of freedom.
Let $\bar{z}$ and $s_z$ represent the arithmetic mean and standard deviation of log-transformed measurements. Lower and upper limits of a 95 per cent confidence interval for the mean of the log-transformed measurements are obtained as
\[ \bar{z} \pm t \, \frac{s_z}{\sqrt{n}}. \]
The geometric mean may be obtained as $\exp(\bar{z})$. A 95 per cent confidence interval for the geometric mean is given by
\[ \exp\left( \bar{z} \pm t \, \frac{s_z}{\sqrt{n}} \right). \]
Data available to a meta-analyst might be in one of the following formats, although the list is not exhaustive:
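The formulae above can be used to convert any of (1)–(8) to either the mean and standard deviation of the raw measurements ($\bar{x}$ and $s_x$) or the mean and standard deviation of the log-transformed measurements ($\bar{z}$ and $s_z$).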
This should be undertaken before applying the transformation methods below.
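The back-calculation described above can be sketched in code. The following is a minimal illustration, assuming one hypothetical reporting format (a geometric mean with its 95 per cent confidence interval); the function names are ours, and the t quantile is passed in as `t_crit` so that only the Python standard library is needed.

```python
import math

def raw_mean_ci(mean_x, sd_x, n, t_crit):
    """95% CI for a mean: mean_x +/- t_crit * sd_x / sqrt(n)."""
    half_width = t_crit * sd_x / math.sqrt(n)
    return mean_x - half_width, mean_x + half_width

def geometric_mean_ci(mean_z, sd_z, n, t_crit):
    """Geometric mean and its 95% CI from the mean and SD of the logs."""
    lower_z, upper_z = raw_mean_ci(mean_z, sd_z, n, t_crit)
    return math.exp(mean_z), (math.exp(lower_z), math.exp(upper_z))

def log_summary_from_gm_ci(gm, lower, upper, n, t_crit):
    """Recover the mean and SD of the logs from a geometric mean and its CI.

    Assumes the interval was built as exp(mean_z +/- t_crit * sd_z / sqrt(n)).
    """
    mean_z = math.log(gm)
    sd_z = math.sqrt(n) * (math.log(upper) - math.log(lower)) / (2 * t_crit)
    return mean_z, sd_z

# Illustrative values only: n = 25, so t_crit is about 2.064 (24 degrees of freedom).
gm, (gm_lower, gm_upper) = geometric_mean_ci(0.2, 0.5, 25, 2.064)
recovered_mean_z, recovered_sd_z = log_summary_from_gm_ci(gm, gm_lower, gm_upper, 25, 2.064)
```

The round trip recovers the log-scale mean and standard deviation only if the reported interval really is a t-based interval for the geometric mean, which is exactly the kind of assumption about the nature of the results that must be checked at data extraction.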
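For a variable X with a log-normal distribution, such that
\[ Z = \ln(X) \sim N(\mu, \sigma^2), \]
it is a standard result that the mean and variance of X are given by
\[ E[X] = \exp\left(\mu + \frac{\sigma^2}{2}\right), \qquad \mathrm{Var}[X] = \left(\exp(\sigma^2) - 1\right) \exp\left(2\mu + \sigma^2\right). \]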
We consider three methods for transforming between log-transformed and raw scales, that is, for estimating the mean and variance of X from the sample mean and variance of Z, or vice versa. The first two methods exploit the result above. In Method 1, we transform the mean and standard deviation within each group, and then make the comparison across groups. The standard deviations are thus allowed to differ in the two groups. Method 2 follows the same approach as Method 1, but assumes a common standard deviation underlying both groups. This assumption of common standard deviation could be made on either the raw or the log-transformed scale; we choose the latter as a generally more plausible assumption. Method 3 targets the difference between the groups rather than the group means separately. It does not assume a log-normal distribution for the raw data, and is applicable to other transformations as well as the log transformation.
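The standard log-normal result exploited by Methods 1 and 2 can be written as a pair of functions, one in each direction. This is a minimal sketch for population quantities; the function names are ours.

```python
import math

def lognormal_moments(mu, sigma):
    """Mean and variance of X when ln(X) ~ N(mu, sigma^2)."""
    mean_x = math.exp(mu + sigma**2 / 2)
    var_x = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean_x, var_x

def lognormal_parameters(mean_x, var_x):
    """Inverse result: mu and sigma of ln(X) from the mean and variance of X."""
    sigma2 = math.log(var_x / mean_x**2 + 1)
    mu = math.log(mean_x) - sigma2 / 2
    return mu, math.sqrt(sigma2)

# Round trip for a standard log-normal variable (mu = 0, sigma = 1).
mean_x, var_x = lognormal_moments(0.0, 1.0)
mu_back, sigma_back = lognormal_parameters(mean_x, var_x)
```

Note that the variance on the log scale depends on the raw-scale moments only through the squared coefficient of variation, var_x / mean_x**2.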
We also derive expressions for the standard errors of the estimators. One possibility for Methods 1 and 2 is to apply standard methods to the converted means and standard deviations for the two groups to obtain a difference in means and its standard error: we call this the ‘ad hoc’ approach. However, estimators based on the mean and standard deviation on the log scale are more efficient (have smaller standard errors); hence, the resulting standard errors are too small for conversions from raw to logarithm and too large for conversions from logarithm to raw. We therefore derive alternative standard errors from asymptotic Taylor series approximations. All the estimators below are ‘plug-in’ estimators derived by replacing the population parameters with their estimates; they are therefore likely to be unbiased in large samples but biased in small samples. Further work would be required to remove the small-sample bias.
An approximate transformation from Z to X is obtained by substituting estimates for the unknown quantities in the standard result above. Solving the formulae for μ and σ² yields the expressions for the opposite conversions. This moment-based approach has been described previously by Whitehead et al. For this method and Method 2, we denote the two exposure (or treatment) groups as i = 1 and i = 2.
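From raw to logarithm: To convert $\bar{x}_i$ and $s_{x,i}$ to an approximate mean and standard deviation on the log-transformed scale, take
\[ \bar{z}'_i = \ln(\bar{x}_i) - \frac{1}{2} \ln\left( \frac{s_{x,i}^2}{\bar{x}_i^2} + 1 \right) \]
(where the single dash on $\bar{z}'_i$ denotes transformation using Method 1), and
\[ s'_{z,i} = \sqrt{ \ln\left( \frac{s_{x,i}^2}{\bar{x}_i^2} + 1 \right) }. \]
The required difference in means on the log scale from Method 1 is given by
\[ d'_z = \bar{z}'_1 - \bar{z}'_2. \]
The standard error is given by
\[ \mathrm{SE}(d'_z) = \sqrt{ \mathrm{Var}(\bar{z}'_1) + \mathrm{Var}(\bar{z}'_2) }. \]
The ‘ad hoc’ estimator of $\mathrm{Var}(\bar{z}'_i)$ uses the t-test formula:
\[ \mathrm{Var}_B(\bar{z}'_i) = \frac{s'^2_{z,i}}{n_i}. \]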
However, this wrongly assumes that the transformed mean $\bar{z}'_i$ has been computed as an arithmetic mean. The alternative standard error is given by the Taylor series approximation
The last two expressions were obtained by approximating whose asymptotic accuracy was confirmed by simulation. Then we computed
by expanding and using $E[X^n] = E[e^{nZ}] = e^{n\mu + n^2\sigma^2/2}$. A similar argument applies for the covariance.
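The raw-to-logarithm conversion of Method 1 can be sketched as follows, using the moment-based expressions obtained by solving the log-normal result for μ and σ². Only the ‘ad hoc’ (t-test) standard error is computed; the Taylor series standard error is not attempted here. The function names, the direction of the difference (group 1 minus group 2) and the example numbers are our assumptions.

```python
import math

def method1_raw_to_log(mean_x, sd_x):
    """Convert one group's raw-scale mean and SD to the log scale."""
    log_term = math.log(sd_x**2 / mean_x**2 + 1)
    mean_z = math.log(mean_x) - log_term / 2
    sd_z = math.sqrt(log_term)
    return mean_z, sd_z

def method1_log_difference(mean_x1, sd_x1, n1, mean_x2, sd_x2, n2):
    """Difference in log-scale means with the 'ad hoc' standard error."""
    z1, s1 = method1_raw_to_log(mean_x1, sd_x1)
    z2, s2 = method1_raw_to_log(mean_x2, sd_x2)
    diff = z1 - z2
    # 'Ad hoc' SE: treats the converted values as an ordinary mean and SD.
    se_ad_hoc = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff, se_ad_hoc

# Hypothetical summary data: (mean, SD, n) for each group on the raw scale.
diff_z, se_z = method1_log_difference(1.5, 0.9, 120, 1.3, 0.7, 400)
ratio_of_geometric_means = math.exp(diff_z)
```

Exponentiating the difference gives an estimated ratio of geometric means, which is the quantity a log-scale meta-analysis would pool.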
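From logarithm to raw: To convert $\bar{z}_i$ and $s_{z,i}$ to an approximate mean and standard deviation on the raw scale, take
\[ \bar{x}'_i = \exp\left( \bar{z}_i + \frac{s_{z,i}^2}{2} \right), \qquad s'_{x,i} = \sqrt{ \left( \exp(s_{z,i}^2) - 1 \right) \exp\left( 2\bar{z}_i + s_{z,i}^2 \right) }. \]
The required difference in means is now
\[ d'_x = \bar{x}'_1 - \bar{x}'_2, \]
with standard error
\[ \mathrm{SE}(d'_x) = \sqrt{ \mathrm{Var}(\bar{x}'_1) + \mathrm{Var}(\bar{x}'_2) }. \]
The ‘ad hoc’ standard error is estimated using
\[ \mathrm{Var}_B(\bar{x}'_i) = \frac{s'^2_{x,i}}{n_i}, \]
and the alternative standard error by the Taylor series approximation
\[ \mathrm{Var}_A(\bar{x}'_i) = \frac{1}{n_i} \exp\left( 2\bar{z}_i + s_{z,i}^2 \right) \left( s_{z,i}^2 + \frac{s_{z,i}^4}{2} \right). \]
It can be seen that $s_{z,i}^2 + s_{z,i}^4/2 < \exp(s_{z,i}^2) - 1$, and so the alternative standard error is smaller than the ‘ad hoc’ one.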
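A sketch of the logarithm-to-raw conversion of Method 1 follows. The ‘ad hoc’ standard error uses the converted standard deviations in the t-test formula. The Taylor-type standard error shown is a delta-method approximation that assumes normally distributed logs (variance of the log-scale mean s²/n and of the log-scale variance about 2s⁴/n); it is our reconstruction and may differ in detail from the published expression. Function names and example values are hypothetical.

```python
import math

def method1_log_to_raw(mean_z, sd_z):
    """Convert one group's log-scale mean and SD to the raw scale."""
    mean_x = math.exp(mean_z + sd_z**2 / 2)
    sd_x = math.sqrt((math.exp(sd_z**2) - 1) * math.exp(2 * mean_z + sd_z**2))
    return mean_x, sd_x

def _taylor_variance(mean_x, sd_z, n):
    """Delta-method variance of the converted raw-scale mean (assumption)."""
    return mean_x**2 * (sd_z**2 + sd_z**4 / 2) / n

def method1_raw_difference(mean_z1, sd_z1, n1, mean_z2, sd_z2, n2):
    """Raw-scale difference (group 1 minus group 2) with both standard errors."""
    x1, sx1 = method1_log_to_raw(mean_z1, sd_z1)
    x2, sx2 = method1_log_to_raw(mean_z2, sd_z2)
    diff = x1 - x2
    se_ad_hoc = math.sqrt(sx1**2 / n1 + sx2**2 / n2)
    se_taylor = math.sqrt(_taylor_variance(x1, sd_z1, n1)
                          + _taylor_variance(x2, sd_z2, n2))
    return diff, se_ad_hoc, se_taylor

# Group 1 has the lower mean of logs but the larger SD of logs.
diff_x, se_ad_hoc, se_taylor = method1_raw_difference(0.00, 0.6, 100, 0.02, 0.5, 100)
```

With these illustrative inputs the difference in log-scale means is negative while the converted raw-scale difference is positive, showing how unequal standard deviations can reverse the direction of effect when moving between scales.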
Method 2 is similar to Method 1, but assumes a pooled standard deviation on the log-transformed scale.
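From raw to logarithm: To convert $\bar{x}_i$ and $s_{x,i}$ to an approximate mean and standard deviation on the logarithmic scale, we first transform the standard deviations and then pool them:
\[ s'_{z,i} = \sqrt{ \ln\left( \frac{s_{x,i}^2}{\bar{x}_i^2} + 1 \right) }, \qquad s''_z = \sqrt{ \frac{(n_1 - 1) s'^2_{z,1} + (n_2 - 1) s'^2_{z,2}}{n_1 + n_2 - 2} }, \qquad \bar{z}''_i = \ln(\bar{x}_i) - \frac{s''^2_z}{2} \]
(where the double dash denotes transformation using Method 2). The required difference in means on the logarithmic scale is given by
\[ d''_z = \bar{z}''_1 - \bar{z}''_2 = \ln(\bar{x}_1) - \ln(\bar{x}_2). \]
The ‘ad hoc’, t-test-type, standard error is given by
\[ \mathrm{SE}_B(d''_z) = s''_z \sqrt{ \frac{1}{n_1} + \frac{1}{n_2} }, \]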
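and the standard error, based on Taylor approximation, is given by
\[ \mathrm{SE}_A(d''_z) = \sqrt{ \left( \exp(s''^2_z) - 1 \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) }, \]
where $s''_z$ is the pooled standard deviation on the log scale.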
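The raw-to-logarithm step of Method 2 can be sketched as below. The pooling weights (n_i − 1) are an assumption on our part, as are the function name and the example numbers; only the ‘ad hoc’ standard error is computed.

```python
import math

def method2_raw_to_log(mean_x1, sd_x1, n1, mean_x2, sd_x2, n2):
    """Log-scale difference assuming a common SD on the log scale."""
    # Step 1: transform each group's SD to a log-scale variance.
    var_z1 = math.log(sd_x1**2 / mean_x1**2 + 1)
    var_z2 = math.log(sd_x2**2 / mean_x2**2 + 1)
    # Step 2: pool the log-scale variances (weights n_i - 1 are assumed).
    pooled_var = ((n1 - 1) * var_z1 + (n2 - 1) * var_z2) / (n1 + n2 - 2)
    # Step 3: convert the means using the pooled variance.
    z1 = math.log(mean_x1) - pooled_var / 2
    z2 = math.log(mean_x2) - pooled_var / 2
    diff = z1 - z2
    se_ad_hoc = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff, se_ad_hoc, math.sqrt(pooled_var)

# Hypothetical summary data on the raw scale.
diff_z2, se_z2, pooled_sd_z = method2_raw_to_log(1.5, 0.9, 120, 1.3, 0.7, 400)
```

Because the same pooled variance is subtracted from both log means, the point estimate reduces to the log of the ratio of arithmetic means; the pooled standard deviation affects only the standard error.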
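From logarithm to raw: To convert $\bar{z}_i$ and $s_{z,i}$ to an approximate mean and standard deviation on the raw scale, we first pool the standard deviations:
\[ s_z = \sqrt{ \frac{(n_1 - 1) s_{z,1}^2 + (n_2 - 1) s_{z,2}^2}{n_1 + n_2 - 2} }, \qquad \bar{x}''_i = \exp\left( \bar{z}_i + \frac{s_z^2}{2} \right), \qquad s''_{x,i} = \sqrt{ \left( \exp(s_z^2) - 1 \right) \exp\left( 2\bar{z}_i + s_z^2 \right) }. \]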
The required difference in means, an ‘ad hoc’ standard error and a standard error by Taylor series approximation are given respectively by
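A sketch of the logarithm-to-raw step of Method 2 is given below, again with assumed (n_i − 1) pooling weights, hypothetical names and illustrative inputs. Only the difference and the ‘ad hoc’ standard error are computed.

```python
import math

def method2_log_to_raw(mean_z1, sd_z1, n1, mean_z2, sd_z2, n2):
    """Raw-scale difference assuming a common SD on the log scale."""
    # Pool the log-scale variances first (weights n_i - 1 are assumed).
    pooled_var = ((n1 - 1) * sd_z1**2 + (n2 - 1) * sd_z2**2) / (n1 + n2 - 2)
    # Convert each group's mean and SD using the pooled variance.
    x1 = math.exp(mean_z1 + pooled_var / 2)
    x2 = math.exp(mean_z2 + pooled_var / 2)
    sx1 = x1 * math.sqrt(math.exp(pooled_var) - 1)
    sx2 = x2 * math.sqrt(math.exp(pooled_var) - 1)
    diff = x1 - x2
    se_ad_hoc = math.sqrt(sx1**2 / n1 + sx2**2 / n2)
    return diff, se_ad_hoc

# Same illustrative inputs as a case where the SDs of the logs differ.
diff_x2, se_x2 = method2_log_to_raw(0.00, 0.6, 100, 0.02, 0.5, 100)
```

Under the common-variance assumption the raw-scale difference always has the same sign as the difference in log-scale means, so Method 2 cannot reverse the direction of effect in the way Method 1 can.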
Our third method follows from the following general result and applies directly to the difference between groups rather than the two group means separately. Let A = g(B) be the transformation of interest. Then, for example, g(B) = ln(B) or g(B) = exp(B) for the current application. Suppose the data have been analysed under a linear model for B:
\[ E[B_k] = \alpha + \beta T_k, \]
where $T_k$ represents covariates for individual k. For the simple comparison of two groups, $T_k$ represents only group allocation, and β is the difference in means. Now let $\mu_B$ be the overall mean, across values of T. Then a first-order Taylor series expansion about $\mu_B$ gives
\[ E[A_k] \approx g(\mu_B) + (\alpha + \beta T_k - \mu_B) \, g'(\mu_B). \]
The difference between the means of the two groups can then be estimated, by subtraction, as $g'(\hat{\mu}_B) \hat{\beta}$. The standard error is obtained similarly as $g'(\hat{\mu}_B) \, \mathrm{SE}(\hat{\beta})$. This first-order approximation neglects terms involving β² and beyond, and neglects the term involving the variance of B. The former should be acceptable for small effect size β, and the latter if the variance does not depend on T, i.e. if the spread of the distribution is similar across groups. The derivatives turn out to be the overall geometric mean when transforming from logarithm to raw, and the reciprocal of the overall arithmetic (raw) mean when transforming from raw to logarithm.
From raw to logarithm: To convert a difference in means on the raw scale to an approximate difference on the logarithmic scale, take $\bar{x}$ to be the overall arithmetic mean across groups on the raw scale, and use
\[ d'''_z = \frac{d_x}{\bar{x}}, \qquad \mathrm{SE}(d'''_z) = \frac{\mathrm{SE}(d_x)}{\bar{x}}, \]
where $d_x$ and $\mathrm{SE}(d_x)$ are the difference in means and its standard error from raw means.
From logarithm to raw: To convert a difference in means on the logarithmic scale to an approximate difference on the raw scale, take $\exp(\bar{z})$ to be the geometric mean of the geometric means across groups (equivalent to the exponential of the arithmetic mean of the means of log-transformed values), and use
\[ d'''_x = \exp(\bar{z}) \, d_z, \qquad \mathrm{SE}(d'''_x) = \exp(\bar{z}) \, \mathrm{SE}(d_z), \]
where $d_z$ and $\mathrm{SE}(d_z)$ are the difference in means and its standard error from log-transformed values.
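Method 3 amounts to rescaling the difference and its standard error by a single derivative, as the following sketch illustrates. The function names are ours. The overall raw-scale mean is passed in directly because how it is formed across groups is left to the analyst; for the opposite direction the unweighted mean of the two log-scale means is used, following the description above.

```python
import math

def method3_raw_to_log(diff_x, se_diff_x, overall_mean_x):
    """Rescale a raw-scale difference by 1 / (overall arithmetic mean)."""
    return diff_x / overall_mean_x, se_diff_x / overall_mean_x

def method3_log_to_raw(diff_z, se_diff_z, mean_z1, mean_z2):
    """Rescale a log-scale difference by the overall geometric mean."""
    # Geometric mean of the two geometric means.
    overall_gm = math.exp((mean_z1 + mean_z2) / 2)
    return overall_gm * diff_z, overall_gm * se_diff_z

# Illustrative values: raw difference 0.2 (SE 0.05) around an overall mean of 2.0.
d_log, se_log = method3_raw_to_log(0.2, 0.05, 2.0)
d_raw, se_raw = method3_log_to_raw(0.1, 0.02, 0.0, 0.0)
```

Because the same multiplier is applied to the estimate and its standard error, the ratio of estimate to standard error is unchanged by the conversion.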
Sagoo et al. conducted a systematic review of association between polymorphisms in the lipoprotein lipase (LPL) gene and coronary heart disease, and also studied plasma levels of cholesterol and triglycerides. We address one particular meta-analysis of 14 studies of the association between triglyceride level and being a carrier or non-carrier of the D9N polymorphism in the LPL gene. Triglyceride levels are typically skewed, and are sometimes presented on the log scale. Through a combination of data extraction from the published reports and correspondence with the original investigators, the review authors obtained means and standard deviations on both logarithmic and raw scales for five studies, on the logarithmic scale only for one study and on the raw scale only for eight studies (Table I). Results for individual studies and meta-analyses are provided in Table II and Figures 1 and 2, for available (‘true’) data and for transformations using our various methods.
Available data on the raw scale allowed meta-analysis of 13 of the studies. We also undertook meta-analyses of all 14 studies, making transformations from the logarithmic to the raw scale wherever this was possible. For five studies, the ‘true’ results can be compared directly with transformations from logarithmic data, and the results are similar in all cases (Figure 1). Furthermore, there are no substantial differences across the different transformation methods (Table II, Figure 1). It is possible for the effect direction to change on transforming between metrics when assuming separate standard deviations. For example, the Boer 2003b study, transformed to the raw scale using Method 1 (Table II, first row), produces a point estimate that indicates a higher mean (by 0.009) in the carriers than in the non-carriers, compared with a lower mean (by 0.022) of logs in the observed data. This is because of the larger standard deviation of carriers than non-carriers on the log scale. However, the change in the point estimate is trivial in the context of its confidence interval.
Available data on the log scale allowed meta-analysis of six of the studies. We also undertook meta-analyses of all 14 studies, making transformations from the raw to the logarithmic scale wherever this was possible. Again, for five studies, the ‘true’ and transformed results can be compared directly (Figure 2). One notable discrepancy is in the Copenhagen study, in which the ‘true’ mean difference is smaller than the values estimated by our transformations, and has a somewhat smaller standard error. The bias in the transformations may be because the standard deviations of raw triglyceride levels are relatively large compared with their means, combined with sample size imbalance (see also later simulation results, Table V), or because the data depart more substantially from a log-normal distribution in this study. Point estimates for Method 3 agree well with those for Method 2. In three studies (EARS, FOS and Reykjavik), the assumption of a common standard deviation has a more noticeable effect on the point estimate, so that Method 1 differs from Methods 2 and 3. The studies are also responsible for introducing heterogeneity into the meta-analyses and increasing the summary effect estimate for Method 1. These three studies have substantially different observed standard deviations between carriers and non-carriers (see also later simulation results, Table VI).
We undertook a simulation study to compare the methods. Continuous outcome data were simulated for a single, two-group study, according to various distributions, and subjected to the three transformation methods, both converting the raw simulated data to the logarithmic scale and converting the logs of the simulated data to the raw scale. Since we knew the means and standard deviations on both scales (either theoretically or empirically), we could compare the estimated differences in means (and their standard errors) with those that would have been obtained had the data been analysed on the desired scale.
Our initial set of simulations used log-normally distributed data with equal standard deviations across groups (on the log scale), thus the distributional assumptions underlying all methods hold exactly, and only the asymptotic approximations would affect results. Each group had a sample size of 100. We then evaluated, with further simulations, (i) small sample sizes (10 rather than 100); (ii) imbalance in sample sizes across the two groups; (iii) different standard deviations in the two groups; (iv) a different skewed distribution (gamma distribution); and (v) lack of serious skew (normal distribution, with negative values rejected). The gamma and normal distributions were chosen to have (before rejection of samples) identical means and standard deviations on the raw scale to the initial log-normally distributed data. Full details of the data generation and the parameter values are provided in Table III. Illustrations of all distributions simulated are included in Figure 3.
For each scenario and parameter set (each row in Table III), we undertook 10 000 simulations. Each simulation produced three estimates ($d'$, $d''$ and $d'''$) with five standard errors ($\mathrm{SE}_A(d')$, $\mathrm{SE}_B(d')$, $\mathrm{SE}_A(d'')$, $\mathrm{SE}_B(d'')$ and $\mathrm{SE}(d''')$) for transformations from the raw to the log scale, and the corresponding numbers for transformations from the log to the raw scale. We summarized them using measures of bias, precision and coverage as follows, where d represents one of the three estimates.
Bias: Bias was defined as mean estimated difference in means (d) minus true difference in means. For log-normal simulations and gamma simulations (raw scale only), the true values were known theoretically. For the others, the true mean differences were estimated empirically across simulations. We present mean bias for log-normal simulations and median bias for gamma and normal simulations due to some extreme and influential values.
Precision: We present mean values of estimated standard errors across simulations, separately for the Taylor series method, $\mathrm{SE}_A(d)$, and the t-test (‘ad hoc’) method, $\mathrm{SE}_B(d)$. We also present empirical standard errors of the estimated mean differences. For the log-normal simulations, these are calculated as empirical standard deviations over all 10 000 simulations. For the gamma and normal simulations, we present the difference between the 69th and 31st percentiles as an approximately equivalent measure (for a normal distribution, this difference is very nearly equal to the standard deviation).
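The percentile-based measure of spread can be checked numerically. The sketch below, using only the Python standard library, computes the distance between the 69th and 31st percentiles of a standard normal distribution and provides a small helper (the name `percentile_spread` is ours) that applies the same measure to a list of simulated estimates.

```python
from statistics import NormalDist, quantiles

# Theoretical spread for a standard normal distribution (SD = 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)
normal_spread = std_normal.inv_cdf(0.69) - std_normal.inv_cdf(0.31)

def percentile_spread(values):
    """Difference between the 69th and 31st percentiles of a sample."""
    cut_points = quantiles(values, n=100)  # 99 cut points: 1st to 99th percentile
    return cut_points[68] - cut_points[30]

# Example on evenly spaced values 1..1000.
example_spread = percentile_spread(list(range(1, 1001)))
```

For the standard normal distribution the result is slightly below 1, so the measure closely tracks the standard deviation while being insensitive to extreme values in the tails.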
Coverage: Coverage was defined as the percentage of simulations in which a 95 per cent confidence interval, obtained as d ± 1.96 × SE(d), included the true difference in means (theoretically or empirically obtained).
Monte Carlo errors for each reported value were calculated as $\mathrm{SD}(d)/\sqrt{10\,000}$ for mean bias, as $\mathrm{SD}(\mathrm{SE}(d))/\sqrt{10\,000}$ for estimated standard errors, as $\sqrt{P(1-P)/10\,000}$ for estimated coverage P, and from confidence intervals for medians.
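The performance measures defined above can be illustrated with a much reduced simulation. The sketch below is not the authors' simulation code: it generates log-normal data for two equal-sized groups, converts the raw-scale summaries to the log scale with the moment-based (Method 1) expressions, uses the ‘ad hoc’ standard error, and reports bias, coverage and their Monte Carlo errors. The function name, parameter values, seed and number of simulations are all our assumptions.

```python
import math
import random
import statistics

def simulate_method1_raw_to_log(mu1, mu2, sigma, n, n_sims, seed=1):
    """Bias and coverage of the Method 1 raw-to-log estimate ('ad hoc' SE)."""
    rng = random.Random(seed)
    true_diff = mu1 - mu2  # true difference in means on the log scale
    estimates = []
    covered = 0
    for _ in range(n_sims):
        converted = []
        for mu in (mu1, mu2):
            raw = [rng.lognormvariate(mu, sigma) for _ in range(n)]
            mean_x = statistics.fmean(raw)
            sd_x = statistics.stdev(raw)
            var_z = math.log(sd_x**2 / mean_x**2 + 1)
            converted.append((math.log(mean_x) - var_z / 2, var_z))
        (z1, v1), (z2, v2) = converted
        d = z1 - z2
        se = math.sqrt(v1 / n + v2 / n)
        estimates.append(d)
        if abs(d - true_diff) <= 1.96 * se:
            covered += 1
    bias = statistics.fmean(estimates) - true_diff
    mc_error_bias = statistics.stdev(estimates) / math.sqrt(n_sims)
    coverage = covered / n_sims
    mc_error_coverage = math.sqrt(coverage * (1 - coverage) / n_sims)
    return bias, mc_error_bias, coverage, mc_error_coverage

# Small illustrative run: SD of logs 0.5, 100 per group, 300 simulations.
bias, mc_bias, coverage, mc_cov = simulate_method1_raw_to_log(0.2, 0.0, 0.5, 100, 300)
```

A run of this size is far too small to reproduce the published tables; it is intended only to show how the bias, coverage and Monte Carlo error formulas fit together.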
Distributional assumptions met (log-normal distribution): Table IV. For log-normally distributed data with equal standard deviations (on the log scale) and equal sample size, all methods work well when the standard deviation is small (Table IV, Sets 2 and 4). With a large standard deviation, however, three potential problems are apparent (Table IV, Sets 1 and 3). First, there is bias towards the null in Method 3 for the transformation from the log to the raw scale when the means are not equal. This is because of the omitted third and higher-order terms in the Taylor series. Indeed, we can show that for small difference between the groups and equal standard deviations, Method 3 estimates a fraction $\exp(-\sigma^2/2)$ of the true difference. Second, standard errors using the Taylor approximation are inflated when transforming from the raw to the log scale for Method 1. We believe this is because the asymptotic formula requires very large samples to be valid in this case, perhaps because of the large exponential terms. Third, t-test-based standard errors are too low for raw to log, and too high for log to raw, with corresponding under- or over-coverage. The conversion is in reality less efficient in the former direction and more efficient in the latter direction than is reflected in these ‘naïve’ standard errors. Empirical standard errors for large standard deviations are larger for Method 1 than for Method 2 converting from log to raw (Table IV, Sets 1 and 3) since in Method 1 the two standard deviations (which are not pooled) are subject to greater variability than the pooled standard deviation in Method 2; empirical standard errors are much smaller for Method 3 from log to raw due to the bias towards the null.
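The attenuation of Method 3 in the log-to-raw direction can be sketched as follows, assuming log-normal data with a common log-scale standard deviation σ, group means μ₁ and μ₂ on the log scale, and a small difference between them. The notation $\bar{\mu} = (\mu_1 + \mu_2)/2$ is ours.

```latex
\begin{align*}
\text{True raw-scale difference:}\quad
  & e^{\mu_1 + \sigma^2/2} - e^{\mu_2 + \sigma^2/2}
    \approx e^{\bar{\mu} + \sigma^2/2}\,(\mu_1 - \mu_2), \\
\text{Method 3 estimate:}\quad
  & e^{\bar{\mu}}\,(\mu_1 - \mu_2), \\
\text{Ratio:}\quad
  & \frac{e^{\bar{\mu}}\,(\mu_1 - \mu_2)}{e^{\bar{\mu} + \sigma^2/2}\,(\mu_1 - \mu_2)}
    = e^{-\sigma^2/2}.
\end{align*}
```

Under these assumptions the fraction is about 0.97 when σ = 0.25 but only about 0.61 when σ = 1, which is consistent with the bias towards the null appearing only for large standard deviations.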
Small sample sizes: Results not shown; and imbalanced sample sizes: Table V. Findings were very similar for small sample sizes. The only identifiable sample-size-related problem is an increase in the standard errors for the Taylor approximation method from raw to log, resulting in lower coverage compared with the larger sample size (although in fact producing coverage around 95 per cent for Method 1). When sample sizes are unbalanced, there is bias for large standard deviations in all methods for both transformations (Table V, Sets 9 and 11). This is likely due to a small-sample bias that cancels out across groups when the sample sizes are equal. Coverage for Method 3, which is adequate when sample sizes are the same, is reduced for unbalanced sample sizes when transforming from raw to log with large standard deviations.
Different standard deviations: Table VI. Bias in Methods 2 and 3 (which assume a common standard deviation on the log scale) can be considerable (Table VI, Sets 13 and 15) when the standard deviations are genuinely different. Coverage drops to as low as 1 per cent in one scenario. Method 1 has broadly similar properties to the case of equal standard deviations, although there is a small bias in the point estimate.
Alternative skew (gamma distribution); and no skew (normal distribution): results not shown. The transformation from raw to log scales is associated with very little bias. Taylor approximation standard errors are again high for Method 1 when the standard deviation is large. The t-test-based standard errors are again low for this transformation for both Methods 1 and 2. The transformation from log to raw produces some large biases in all Methods. The t-test standard errors for this transformation are over-estimated considerably.
Transformations of normally distributed data to the logarithmic scale have good properties in the scenarios simulated. The opposite transformation produced some bias for all three Methods for one scenario with non-zero effect and large standard deviations.
In meta-analysis, it is desirable to combine effects measured on a common scale from as many studies as possible. One obstacle to achieving this is when results are reported on a log-transformed scale for some studies, but on the raw scale for other studies. We have presented several methods for transforming data from two-group studies presented on a logarithmic scale to a raw scale and from a raw scale to a logarithmic scale, thus enabling meta-analyses of all studies to be conducted on one or other scale. The methods also allow a meta-analysis to be undertaken on a log-transformed scale even if all studies report data on the raw scale. This enables estimation of a meta-analytic ratio of geometric means. Such a metric may provide a natural ‘standardization’ across studies, hence reducing heterogeneity, and provides an alternative to the ratio of arithmetic means that is sometimes used.
Our first method (Method 1) assumes log-normal distributions with different standard deviations, Method 2 assumes log-normal distributions with a common standard deviation (on the log scale), while Method 3 assumes no particular distribution, but requires similar distributional shapes in the two groups and small effect sizes. On application of the methods to an example, in which most data were reported on the raw scale, we observed some differences between the three methods. Some studies gave substantially different results for Method 1 because of a difference in standard deviations across groups; other studies gave different results for Method 3 because its associated standard errors can be different. In one study, all transformations produced a biased result.
We evaluated the properties of the three methods in a simulation study. This did not reveal a uniformly preferable method. All methods were reasonably robust to data having distributions other than the log-normal. The most serious threat to validity from among the scenarios we simulated was when the standard deviations differed between the groups. Method 1 offers clear advantages in this situation. When standard deviations are large compared with means, biased estimates (in either direction) can be obtained and there is a variation in the precision with which the three methods estimate differences in means: Method 3 produces the most precise estimates when transforming from the log to the raw scale; methods are similar when transforming from the raw to the log scale. We derived a Taylor approximation to the standard error and compared it with a t-test-based approach. The Taylor approximation can overestimate standard errors (particularly for raw to log transformations with large standard deviations), but otherwise seems to perform well. The more naïve t-test-based approach performs less well, as it treats transformed means and standard deviations as if they were simple arithmetic means. However, it can be implemented more readily in commonly used meta-analysis software such as RevMan, metan (for Stata) and Comprehensive Meta-analysis. Its performance is probably adequate for most meta-analytic purposes.
One possible extension to our proposed methods would be to replace our estimators, which are maximum likelihood and therefore may have small-sample bias, with bias-corrected estimators. However, no closed-form standard error is available to our knowledge.
In conclusion, we recommend the use of Method 1 whenever standard deviations are likely to be different in the two groups, with Taylor approximation standard errors for the log to raw transformation. For transformations from raw to log scales, the Taylor approximation standard errors can be large, resulting in down-weighting of these studies in a meta-analysis. When standard deviations are similar, greater precision can be obtained using Method 2, especially when transforming to the raw scale. Method 3 offers a general framework that can be used for different data transformations.
Since the methods allow meta-analyses to be conducted on either the raw or the log-transformed scale, decisions on which scale to use will be required. Several considerations may guide the choice of scale, including (i) fidelity to the data available, by using the scale most frequently reported; (ii) best meeting meta-analytic assumptions, by using the scale believed to have less skew; (iii) minimizing inconsistency (heterogeneity) of results; and (iv) applying the results to another problem (for example, if the results are to feed into a further analysis that requires data on a specific scale). The simulation study did not indicate consistently better properties of one direction of transformation over the other.
We thank Gurdeep Sagoo for sharing knowledge and data for the application, and two referees for helpful comments. This work was funded by MRC Grants U. 1052.00.011 and U. 1052.00.006 and with support from the PHG Foundation.