To establish indirect reference intervals from patient results obtained during routine laboratory work, as an alternative to the laborious and expensive process of producing a laboratory's own reference values according to international guidelines.
All results for thyrotropin (TSH) and free thyroxine (T4) that were stored in our laboratory information system between 2004 and 2008 were included in this study. After a logarithmic transformation of the raw data, outliers were excluded. Non-parametric reference intervals were estimated statistically after visual observation of the distribution using stem-and-leaf plots and histograms. A standard normal deviation test was performed to test the significance of differences between sub-groups.
There was no significant difference in serum TSH or free T4 concentrations between male and female participants. Because no differences were found within the time span of the study, combined reference intervals were calculated. Indirect reference values were 0.43-3.93 mU/L for TSH and 11.98-21.33 pmol/L for free T4.
Using patient laboratory data values is a relatively easy and cheap method of establishing laboratory-specific reference values if skewness and kurtosis of the distribution are not too large.
Reference values are medical decision-making tools that are provided by a clinical laboratory to aid the physician in differentiating a diseased patient from a healthy individual. Establishing correct reference values and intervals is an important task for a clinical laboratory. However, a major problem that the laboratory faces is obtaining a sufficient number of specimens from healthy individuals representative of the population that the laboratory serves.
The complexity, cost, and labor involved in establishing reference values are additional difficulties. Most of the European learned societies of clinical chemistry have concluded that each clinical laboratory should produce its own reference values (1). However, very few laboratories actually do so. To make things easier, the International Federation of Clinical Chemistry (IFCC) published recommendations for the transferability of reference values from one institute to another. Even in this case, the laboratories involved should obtain comparable results, and this can only be achieved by conducting long-term inter-laboratory studies of the precision and accuracy of the analytical methods in use (2,3). It may well be true that the best reference values for an individual are so-called “subject-based reference values,” derived from their own prior data, but again, such data are not often available.
Additionally, healthy ambulatory individuals, as defined by the IFCC, may not be optimal references for hospitalized patients because of differences in physical activity, diet, level of stress, diurnal rhythms, or other factors related to hospital stay. From this point of view, a hospitalized patient who is not affected by the disease in question but is subject to the same conditions would be a better reference for a patient with a certain disease (4).
For all these reasons, some scientists working in the area have investigated the possibility of establishing reference values from large collections of laboratory data, using sophisticated laboratory information systems and statistical programs (5). The major advantage of using such an approach is that it saves a significant amount of money and work by using data that already exist.
Because subclinical thyroid dysfunction has few or no definitive clinical signs or symptoms, it is essentially a laboratory diagnosis. Thus, the standardization of normal reference ranges of thyroid function tests is as important as the sensitivity of the tests and appropriate quality control procedures (6). This study was designed to estimate indirect reference values for two important thyroid function tests: thyrotropin (TSH) and free thyroxine (T4), using data stored in our laboratory information system from 2004 to 2008.
The results of TSH and free T4 from all the hospitals and health centers associated with Acibadem Health Group were included in the study. All diagnostic data have been stored in our laboratory information system (Tenay, Istanbul, Turkey) since 2004. Among them, the data from all patients above the age of 20 were selected to get adult thyroid hormone concentrations. Thyroid patients were not excluded from the study because they were expected to have very high or low thyroid function test results and would possibly be excluded while detecting the outliers. Only the first result for each patient was included. Quality control results and extreme values, such as those >200 or <0.005 mU/L for TSH and >100 pmol/L for free T4, were excluded from the study without any statistical analysis, because these values represent the detection limits of the respective methods. All measurements were performed using Elecsys 2010 analyzers (Roche Diagnostics, Mannheim, Germany) at all locations. Two levels of internal quality control were conducted daily and the tests were covered by an external quality assessment scheme (DGKL, Bonn, Germany); the laboratories included in the study had also been accredited by Deutsche Akkreditierungsstelle Chemie GmbH (http://www.dach-gmbh.de/), according to ISO 15189 standards, since 2005.
All data were analyzed using SPSS, version 11.5 for Windows (SPSS Inc., Chicago, IL, USA). Values were sorted in ascending order. Because the raw TSH values showed a skewed distribution with a long tail toward higher values, they were first log-transformed. The same treatment was applied to the free T4 values. Histograms were then visually inspected again, because a normal distribution is unlikely to be obtained even after transformation if the skewness and kurtosis of the distribution are too large. Outliers were identified and omitted using stem-and-leaf and box plots in SPSS. The software identifies outliers by first computing the interquartile range (IQR) between the lower and upper quartiles of the distribution and then flagging the data lying more than 3 IQRs from the upper or lower edge of the box. The procedure was repeated until no extremes were left. The values were then converted back to the original scale by taking the antilogarithm, and extremes were again excluded.
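The outlier procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS implementation: the function names are hypothetical, the quartiles come from the "inclusive" method of the standard library (SPSS box plots use their own hinge definition, so fences may differ slightly), and base-10 logarithms are assumed because the text does not state the base.

```python
import math
import statistics


def remove_extremes(values, k=3.0):
    """Drop values lying more than k IQRs beyond the quartiles, repeating
    until a pass removes nothing (assumed reading of the procedure)."""
    kept = list(values)
    while len(kept) >= 4:
        # Quartile definition is an assumption; SPSS may compute hinges differently.
        q1, _, q3 = statistics.quantiles(kept, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        filtered = [v for v in kept if low <= v <= high]
        if len(filtered) == len(kept):
            break  # no extremes left
        kept = filtered
    return kept


def clean_on_log_scale(raw_values, k=3.0):
    """Log-transform, exclude extremes, back-transform, and exclude extremes again."""
    logs = [math.log10(v) for v in raw_values if v > 0]
    back_transformed = [10 ** v for v in remove_extremes(logs, k)]
    return remove_extremes(back_transformed, k)
```

Calling `clean_on_log_scale` on a list of raw results returns the values that survive both exclusion passes; the threshold `k=3.0` mirrors the 3-IQR rule in the text.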
For both analytes, a non-parametric method was applied to estimate the indirect reference intervals. The rank numbers of the 2.5th and 97.5th percentiles were computed as 0.025(n + 1) and 0.975(n + 1), respectively. The reference intervals were then also estimated for men and women separately. The 90% confidence intervals for lower and upper limits were calculated according to the recommendations of the IFCC.
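As a rough illustration of the rank-based estimate, the sketch below computes the 2.5th and 97.5th percentiles from the ranks 0.025(n + 1) and 0.975(n + 1). The function names are hypothetical, and linear interpolation between neighboring observations for fractional ranks is an assumption; the text does not say how fractional ranks were handled. The 90% confidence intervals of the limits are not computed here.

```python
import math


def value_at_rank(sorted_values, rank):
    """Return the value at a 1-based, possibly fractional rank,
    interpolating linearly between neighbors (an assumption)."""
    n = len(sorted_values)
    rank = min(max(rank, 1.0), float(n))  # clamp to the observed range
    lower = int(math.floor(rank))
    if lower >= n:
        return sorted_values[-1]
    fraction = rank - lower
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)


def nonparametric_reference_interval(values):
    """Estimate the lower and upper reference limits from ranks p(n + 1)."""
    data = sorted(values)
    n = len(data)
    lower_limit = value_at_rank(data, 0.025 * (n + 1))
    upper_limit = value_at_rank(data, 0.975 * (n + 1))
    return lower_limit, upper_limit
```

With n = 199 ordered observations, for example, the ranks are 5 and 195, so the limits are simply the 5th and 195th values.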
The standard normal deviation test (z-test) was performed to test the significance of differences between the sub-groups (eg, between the years 2004-2005, 2005-2006, etc., and between sexes) according to the formula below (7):

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the calculated means of the two subgroups, $s_1^2$ and $s_2^2$ are the variances, and $n_1$ and $n_2$ are the number of patients in each group. The calculated z value is compared with a “critical” z value (7):

$$z^{*} = 3\sqrt{\frac{n_{\text{average}}}{120}} = 3\sqrt{\frac{n_1 + n_2}{240}}$$

If the calculated z was greater than the critical z, then the means of the groups were considered to be different from each other. The critical value is scaled so that $z^{*} = 3$ corresponds to a sample size of n = 120 in each subgroup; z = 3 represents a difference between subgroups just large enough to justify separate ranges (8).
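The comparison can be sketched as follows. The z statistic follows the definitions given in the text (subgroup means, variances, and sizes). The critical value is assumed to scale as 3·sqrt((n1 + n2)/240), which equals 3 when each subgroup has 120 members, in line with the description above; the function names are hypothetical.

```python
import math


def z_statistic(mean1, var1, n1, mean2, var2, n2):
    """Standard normal deviate for the difference between two subgroup means."""
    return abs(mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


def critical_z(n1, n2):
    """Assumed critical value: 3 at n = 120 per subgroup, growing with sample size."""
    return 3.0 * math.sqrt((n1 + n2) / 240.0)


def needs_separate_intervals(mean1, var1, n1, mean2, var2, n2):
    """True when the calculated z exceeds the critical z for these sample sizes."""
    z = z_statistic(mean1, var1, n1, mean2, var2, n2)
    return z > critical_z(n1, n2)
```

Because the critical value grows with the square root of the sample size, very large subgroups (tens of thousands of results, as in this study) need a correspondingly large z before separate intervals are justified.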
The distributions of both analytes were positively skewed, which means that the mean of the distribution was to the right of the median. The total number of patients initially included in the study was 73465 for TSH and 73387 for free T4; 18.1% of TSH and 7.7% of free T4 results were then excluded, either because they lay outside the detection limits of the methodology or because they were identified as outliers after statistical treatment (Table 1). The indirect reference limits and 90% confidence intervals for TSH and free T4 were determined separately for each year between 2004 and 2008 and by sex (Table 2 and Table 3). For the two subclasses (men and women), the statistical significance of the difference between subclass means was tested by the standard normal deviation test (z-test). No difference was found in serum TSH or free T4 concentrations between male and female participants (Table 4). Thus, both sexes were combined for further calculations. The standard normal deviation test was also applied to see whether any significant difference occurred between consecutive years; no difference was found (Table 5). Finally, reference and confidence intervals for TSH (n=55318; 21.3% male, 78.7% female) and free T4 (n=62713; 21.4% male, 78.6% female) were calculated from patient data (indirect method) and compared with transferred data, according to IFCC recommendations, which we currently use in our laboratory (direct method) and the manufacturer’s recommended reference intervals (Table 6).
We established indirect reference intervals for TSH and free T4, using data stored in our laboratory information system. Reference intervals predict that approximately 95% of the values in the population will lie within the range given by the lower and upper limits; of the remaining 5%, half will be higher and half will be lower than the limits of this range if the variable being measured has a normal distribution. Skewed distributions are found for most analytes, but they can often be mathematically transformed to a normal distribution. Data collected from a patient must be interpreted in comparison with reference data. Therefore, reference values and intervals are important tools in providing a basis for interpreting laboratory results. However, producing high-quality reference values according to the recommendations of the IFCC for all relevant analytes is far beyond the capacity of a single laboratory. The values for most analytes, including thyroid hormones, are to some degree population-, method-, and instrument-dependent (9-14). For example, thyroid hormone values would be quite different between radioimmunoassay and chemiluminescence methods. Thus, one could review the thyroid reference data that have been published for different analytical methods and instruments and choose the set that best suits the laboratory's situation. However, differences between populations are also an issue and can reduce the validity of reference values (15). Even the transfer of reference intervals is not an easy task for laboratories. A well-known method to determine whether another laboratory's reference interval can be used is to measure the analyte in 20 healthy people and compare these values with the provided 95% interval. If 3 or more of the 20 values lie outside the interval, then that reference interval cannot be transferred.
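The 20-sample transference check mentioned above amounts to a simple count. The sketch below is illustrative only; the function name and the `max_outside` parameter are hypothetical, and the rule coded is the one stated in the text (3 or more of 20 values outside the interval means the interval cannot be transferred).

```python
def can_transfer_interval(values, lower_limit, upper_limit, max_outside=2):
    """Count results outside the candidate interval and apply the
    'no more than 2 of 20 outside' rule described in the text."""
    outside = sum(1 for v in values if v < lower_limit or v > upper_limit)
    return outside <= max_outside, outside
```

The function returns both the decision and the number of values outside the interval, so that a borderline result (for example, exactly 2 outside) can be reviewed rather than accepted blindly.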
Additionally, gathering and collecting sufficient numbers of samples from sub-populations, for example, pediatric groups or pregnant women, is not possible for a clinical laboratory that faces a typical load of daily routine work.
All these factors have forced clinical laboratories to use “somebody else’s” reference values, specifically those provided by equipment or reagent manufacturers, or values obtained from the literature. In either case, the reference range given on a laboratory report may actually be little more than a “good guess” for a normal population.
Various indirect methods have been developed for establishing reference intervals from the results of unselected hospital patients. One is the Bhattacharya method, which assumes the distribution to be Gaussian. The modified Bhattacharya method gives higher upper reference limits for most of the analytes examined (16). This method is also applied to internal quality control data and compared with the average of normals (AoN) method (17). Other mathematical formulas have also been proposed (18).
Reference intervals for TSH reported in different populations and different analyzers show significant differences in the lower and upper limits: lower limits ranged from 0.17 to 0.6 and upper limits from 3.63 to 5.95 mU/L (9-13). These discrepancies arise because of the use of different analyzers in different locations and also the methodology used to establish reference intervals. Ethnic differences should also be considered when establishing reference intervals (19,20).
From the clinical point of view, it is important to decide whether the patient has a subclinical thyroid condition, because this determines what to do when a patient is found to have mild abnormalities in TSH levels. Recent recommendations on subclinical hyperthyroidism give different advice on its management, depending on whether the TSH concentration is lower than 0.1 mIU/L or 0.1-0.4 mIU/L (21). This makes the lower reference limit as important as the upper reference limit for TSH. Moreover, there are published laboratory guidelines indicating that more than 95% of normal individuals have TSH levels below 2.5 mU/L (14). Establishing a more precise and accurate reference interval for TSH has important implications for both screening and treatment of thyroid disease. Although clinical data indicate that TSH values greater than 2.5 mU/L are predictive of progression to overt hypothyroidism (22), there are also contradictory opinions claiming that no firm evidence is available that lowering the upper limit will provide any short- or long-term benefit for the patient. Additionally, lowering the limit may increase the risk of thyroxine over-treatment, possibly resulting in subclinical hyperthyroidism (23). However, cut-off limits for diagnosing or considering treatment of subclinical thyroid dysfunction are a separate issue and should not be confused with reference ranges. Such cut-offs may be lower than the upper limit of the reference interval (24). Other studies have calculated reference limits from hospitalized patients for the catalytic activity of serum enzymes (25) or for other clinical chemistry analytes (26).
Our overall limits for TSH were 0.43-3.93 (90% CI was 0.425-0.435 and 3.887-3.974 for lower and upper limits, respectively); our upper limit for TSH was slightly lower than the transferred and manufacturer’s values. While our lower reference limit agreed well with the transferred values, it was significantly higher than the manufacturer’s limit. There were no differences between the years, showing that the method we used to estimate the reference intervals was robust and not affected by environmental factors during the years; the overall number of participants’ results in the study exceeded 50000. Additionally, to avoid bad practices in the calculation of reference intervals, such as computing the intervals without visual inspection of the data, stem-and-leaf plots and histograms were used, and they revealed heavily skewed data. Also, to avoid arbitrary truncation of data, another common mistake, an outlier detection program was used until no outlier was left. Although there are some other algorithms for the detection of outliers in reference distributions, they still need reliable statistical evidence before being used (27). The reference interval for free T4 was 11.98-21.33 pmol/L, with a 90% confidence interval of 11.67-12.30 for the lower, and 20.777-21.898 for the upper reference limits, calculated from more than 60000 results. There was again no statistical difference by year or sex.
Confidence intervals were calculated for the lower and upper limits of both analytes. Instead of estimating the parameter by a single value, an interval likely to include the parameter is given. Thus, confidence intervals indicate the reliability of an estimate by showing how likely the interval is to contain the parameter.
In conclusion, establishing reference values from patients’ results has many advantages, including being the cheapest and easiest way to collect data. Because they are derived from patients under the same conditions, they are likely to match clinical results better. There are also several arguments against using a hospitalized population, including the objection that all such methods are “indirect” and not in accordance with the IFCC recommendations. Although little is known about the participants from whom the values are derived, and “normal” values may vary between hospitals, this is an advantage rather than a disadvantage, because using hospital data makes it possible to develop reference values for each laboratory and hospital. Finally, establishing more precise and accurate reference intervals for both analytes would give a better basis for diagnosing or considering treatment of thyroid dysfunction than using the manufacturer’s values or transferred intervals.