Croat Med J. 2010 April; 51(2): 124–130.
PMCID: PMC2859416

Indirect Reference Intervals Estimated from Hospitalized Population for Thyrotropin and Free Thyroxine



To establish indirect reference intervals from patient results obtained during routine laboratory work, as an alternative to the laborious and expensive production of laboratory-specific reference values according to international recommendations.


All results for thyrotropin (TSH) and free thyroxine (T4) that were stored in our laboratory information system between 2004 and 2008 were included in this study. After a logarithmic transformation of the raw data, outliers were excluded. Non-parametric reference intervals were estimated statistically after visual observation of the distribution using stem-and-leaf plots and histograms. A standard normal deviation test was performed to test the significance of differences between sub-groups.


There was no significant difference in serum TSH or free T4 concentrations between male and female participants. Because no differences were found within the time span of the study, combined reference intervals were calculated. Indirect reference values were 0.43-3.93 mU/L for TSH and 11.98-21.33 pmol/L for free T4.


Using patient laboratory data values is a relatively easy and cheap method of establishing laboratory-specific reference values if skewness and kurtosis of the distribution are not too large.

Reference values are medical decision-making tools that are provided by a clinical laboratory to aid the physician in differentiating a diseased patient from a healthy individual. Establishing correct reference values and intervals is an important task for a clinical laboratory. However, a major problem that the laboratory faces is obtaining a sufficient number of specimens from healthy individuals representative of the population that the laboratory serves.

The complexity of establishing reference values and the cost and labor are additional difficulties. It has been concluded by most of the European learned societies on clinical chemistry that each clinical laboratory should produce its own reference values (1). However, very few laboratories actually do so. To make things easier, the International Federation of Clinical Chemistry (IFCC) published recommendations for the transferability of the reference values from one institute to another, but even in this case, the laboratories involved should obtain comparable results and this can only be achieved by conducting long-term inter-laboratory studies of the analytical methods in use, in terms of precision and accuracy (2,3). It may well be true that the best reference values for an individual are so-called “subject-based reference values,” derived from their own prior data, but again, such data are not often available.

Additionally, the IFCC definition of healthy ambulatory individuals may not be optimal references for hospitalized patients, because of differences in physical activity, diet, level of stress, diurnal rhythms, or other factors related to hospital stay. From this point of view, a hospitalized patient, not affected by the disease in question, but subject to the same conditions would be a better reference for a patient having a certain disease (4).

For all these reasons, some scientists working in the area have investigated the possibility of establishing reference values from large collections of laboratory data, using sophisticated laboratory information systems and statistical programs (5). The major advantage of using such an approach is that it saves a significant amount of money and work by using data that already exist.

Because subclinical thyroid dysfunction has few or no definitive clinical signs or symptoms, it is essentially a laboratory diagnosis. Thus, the standardization of normal reference ranges of thyroid function tests is as important as the sensitivity of the tests and appropriate quality control procedures (6). This study was designed to estimate indirect reference values for two important thyroid function tests: thyrotropin (TSH) and free thyroxine (T4), using data stored in our laboratory information system from 2004 to 2008.



The TSH and free T4 results from all the hospitals and health centers associated with Acibadem Health Group were included in the study. All diagnostic data have been stored in our laboratory information system (Tenay, Istanbul, Turkey) since 2004. From these, the data from all patients above the age of 20 were selected to obtain adult thyroid hormone concentrations. Patients with thyroid disease were not excluded beforehand, because their very high or very low thyroid function test results were expected to be removed during outlier detection. Only the first result for each patient was included. Quality control results and extreme values, such as those >200 or <0.005 mU/L for TSH and >100 pmol/L for free T4, were excluded from the study without any statistical analysis, because these values represent the detection limits of the respective methods. All measurements were performed using Elecsys 2010 analyzers (Roche Diagnostics, Mannheim, Germany) at all locations. Two levels of internal quality control were run daily and the tests were covered by an external quality assessment scheme (DGKL, Bonn, Germany); the laboratories included in the study had also been accredited by Deutsche Akkreditierungsstelle Chemie GmbH, according to ISO 15189 standards, since 2005.

Estimation of reference intervals

All data were analyzed with SPSS, version 11.5 for Windows (SPSS Inc., Chicago, IL, USA). Values were sorted in ascending order and, because the raw values showed a skewed curve with a long tail toward higher TSH values, the original TSH values were first log-transformed. The same treatment was applied to free T4 values. Histograms were then visually inspected again, because a normal distribution was unlikely to be obtained even after transformation if the skewness and kurtosis of the distribution were too large. Outliers were identified and omitted using stem-and-leaf and box plots in SPSS. The software identifies outliers by first computing the interquartile range (IQR) between the lower and upper quartiles of the distribution and then flagging the data lying more than 3 IQRs beyond the upper or lower edge of the box. The procedure was repeated until no extremes were left. The values were then converted back to the original scale by taking antilogarithms, and extremes were again excluded.
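The log transformation and repeated 3-IQR exclusion described above can be sketched as follows. This is a minimal illustration, not the SPSS procedure itself: the function name `exclude_outliers_log_iqr`, the base-10 logarithm, the quartile definition, and the sample values are all assumptions, and the second exclusion pass on the back-transformed values is not shown.

```python
import math
import statistics


def exclude_outliers_log_iqr(values, k=3.0):
    """Iteratively drop values lying more than k IQRs beyond the quartiles
    of the log-transformed data; return the survivors on the original scale."""
    logs = sorted(math.log10(v) for v in values if v > 0)
    while len(logs) >= 4:
        # Quartile definition is an assumption; SPSS box plots may use a
        # slightly different rule (Tukey's hinges).
        q1, _, q3 = statistics.quantiles(logs, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        kept = [x for x in logs if low <= x <= high]
        if len(kept) == len(logs):  # no extremes left, so stop
            break
        logs = kept
    return [10 ** x for x in logs]  # antilogarithm back to original units


# Hypothetical TSH results (mU/L), not data from the study.
tsh = [0.8, 1.1, 1.3, 1.6, 1.9, 2.2, 2.5, 3.0, 3.4, 150.0]
cleaned = exclude_outliers_log_iqr(tsh)
```

In this made-up sample only the value of 150 mU/L is removed; the second pass finds no further extremes and the loop ends.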

For both analytes, a non-parametric method was applied to estimate the indirect reference intervals. The rank numbers of the 2.5th and 97.5th percentiles were computed as 0.025(n + 1) and 0.975(n + 1), respectively. The reference intervals were then also estimated for men and women separately. The 90% confidence intervals for lower and upper limits were calculated according to the recommendations of the IFCC.
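As a rough sketch of the rank-based calculation, the snippet below takes the values at ranks 0.025(n + 1) and 0.975(n + 1) in the sorted data. Linear interpolation between neighbouring observations for non-integer ranks is an assumption, as are the function name and the sample data.

```python
def nonparametric_reference_limits(values, lower_p=0.025, upper_p=0.975):
    """Return the values at ranks p(n + 1), interpolating between neighbours."""
    data = sorted(values)
    n = len(data)

    def value_at_rank(rank):
        # Ranks are 1-based; clamp to the available range for small samples.
        rank = min(max(rank, 1.0), float(n))
        below = int(rank)        # integer part of the rank
        frac = rank - below      # fractional part used for interpolation
        if below >= n:
            return data[-1]
        return data[below - 1] + frac * (data[below] - data[below - 1])

    return value_at_rank(lower_p * (n + 1)), value_at_rank(upper_p * (n + 1))


# Hypothetical sample of 199 equally spaced values: 0.1, 0.2, ..., 19.9.
sample = [i / 10 for i in range(1, 200)]
low, high = nonparametric_reference_limits(sample)
```

With n = 199 the ranks are 5 and 195, so the limits are simply the 5th and 195th sorted values.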

The standard normal deviation test (Z-test) was performed to reveal the significance of differences between the sub-groups (eg, between years 2004-2005, 2005-2006, etc., and sex) according to the formula below (7):

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the calculated means of the two subgroups, $s_1^2$ and $s_2^2$ are the variances, and $n_1$ and $n_2$ are the numbers of patients in each group. The calculated z value is to be compared with a “critical” z value (7):

$$z_{\text{critical}} = 3\sqrt{\frac{n_1 + n_2}{240}}$$

If the calculated z was greater than the critical z, the means of the two groups were considered to be different from each other. The critical z rescales a threshold of z = 3, which applies to a sample size of n = 120 in each subgroup, to the actual subgroup sizes; z = 3 represents a difference between subgroups just large enough to justify separate ranges (8).
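A minimal sketch of the subgroup comparison is given below. The critical value is written as 3 times the square root of the average subgroup size divided by 120, which is how the threshold described above (z = 3 at n = 120 per subgroup) is usually scaled; treat that form, the use of the absolute difference, the function names, and the summary statistics as assumptions rather than the exact calculation used in the study.

```python
import math


def z_statistic(mean1, var1, n1, mean2, var2, n2):
    """Absolute standard normal deviate for the difference of two subgroup means."""
    return abs(mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


def critical_z(n1, n2, base_z=3.0, base_n=120):
    """Scale the threshold z = 3 (defined for n = 120 per subgroup)
    to the average size of the two subgroups."""
    return base_z * math.sqrt((n1 + n2) / (2 * base_n))


def needs_separate_ranges(mean1, var1, n1, mean2, var2, n2):
    """True when the calculated z exceeds the critical z."""
    return z_statistic(mean1, var1, n1, mean2, var2, n2) > critical_z(n1, n2)


# Hypothetical subgroup summaries (not taken from the study's tables).
z = z_statistic(1.60, 0.64, 120, 1.70, 0.81, 120)
z_crit = critical_z(120, 120)
separate = needs_separate_ranges(1.60, 0.64, 120, 1.70, 0.81, 120)
```

Because the critical value grows with the square root of the sample size, very large subgroups need a correspondingly large z before separate ranges are justified.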


The distributions of both analytes were positively skewed, which means that the mean of the distribution lay to the right of the median. The total number of patients initially included in the study was 73 465 for TSH and 73 387 for free T4; 18.1% of TSH and 7.7% of free T4 results were then excluded, either because they lay outside the detection limits of the method or because they were detected as outliers after statistical treatment (Table 1). The indirect reference limits and 90% confidence intervals for TSH and free T4 were determined separately for each year between 2004 and 2008 and by sex (Table 2 and Table 3). For the two subclasses (men and women), the statistical significance of the difference between subclass means was tested by the standard normal deviation test (z-test). No difference was found in serum TSH or free T4 concentrations between male and female participants (Table 4). Thus, both sexes were combined for further calculations. The standard normal deviation test was also applied to see whether any significant difference occurred between consecutive years; no difference was found (Table 5). Finally, reference and confidence intervals for TSH (n = 55 318; 21.3% male, 78.7% female) and free T4 (n = 62 713; 21.4% male, 78.6% female) were calculated from patient data (indirect method) and compared with the transferred intervals that we currently use in our laboratory according to IFCC recommendations (direct method) and with the manufacturer’s recommended reference intervals (Table 6).

Table 1
The number of patients included in study and the numbers and percentages of excluded patients, detected as outliers calculated for each test and year*
Table 2
Indirect reference limits and confidence intervals for free thyroxine from the patient data according to years
Table 3
Indirect reference limits and confidence intervals for thyrotropin from the patient data according to years
Table 4
The z-scores for thyrotropin and free thyroxine through 2004-2008 between men and women
Table 5
The z-scores for the comparisons between the consecutive years
Table 6
Reference and confidence intervals for thyrotropin and free thyroxine calculated from patient data (indirect method).


We established indirect reference intervals for TSH and free T4, using data stored in our laboratory information system. Reference intervals predict that approximately 95% of the values in the population will lie within the range given by the lower and upper limits; of the remaining 5%, half the values will be higher and half will be lower than the limits of this range if the variable being measured has a normal distribution. Skewed distributions are also found for most of the analytes, but can often be mathematically transformed to a normal distribution. Data collected from a patient must be interpreted in comparison with reference data. Therefore, reference values and intervals are important tools in providing a basis for interpreting laboratory results. However, producing high-quality reference values according to the recommendations of the IFCC for all relevant analytes is far beyond the capacity of a single laboratory. The values for most analytes, including thyroid hormones, are to some degree population-, method-, and instrument-dependent (9-14). For example, thyroid hormone values would be quite different between radioimmunoassay and chemiluminescence methods. Thus, a laboratory could review the thyroid reference data published for different analytical methods and instruments and choose the set that best suits its situation. However, differences between populations are also an issue and can reduce the validity of reference values (15). Even the transfer of reference intervals is not an easy task for laboratories. A well-known method to determine whether one can use another laboratory’s reference interval is to measure the analyte in 20 healthy people and compare these values with the provided 95% interval. If 3 or more of the 20 values lie outside the interval, then one cannot transfer that reference interval.
Additionally, collecting sufficient numbers of samples from sub-populations, for example, pediatric groups or pregnant women, is not possible for a clinical laboratory that faces a typical load of daily routine work.
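The 20-sample transfer check mentioned above amounts to a simple count, sketched here. The function name and the volunteer results are hypothetical; the interval used in the example is the indirect TSH interval reported in this study.

```python
def can_transfer_interval(results, lower, upper, max_outside=2):
    """Return True when at most max_outside results fall outside [lower, upper]."""
    outside = sum(1 for r in results if r < lower or r > upper)
    return outside <= max_outside


# Hypothetical TSH results (mU/L) from 20 healthy volunteers.
volunteers = [0.9, 1.2, 1.4, 1.5, 1.7, 1.8, 2.0, 2.1, 2.2, 2.3,
              2.4, 2.6, 2.7, 2.9, 3.0, 3.2, 3.5, 3.8, 4.2, 0.3]

# Two values (4.2 and 0.3) lie outside 0.43-3.93, so the interval passes.
transferable = can_transfer_interval(volunteers, 0.43, 3.93)
```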

All these factors have forced clinical laboratories to use “somebody else’s” reference values, specifically those supplied by equipment or reagent manufacturers, or values obtained from the literature. In either case, the reference range given on a laboratory report may actually be little more than a “good guess” for a normal population.

Various indirect methods have been developed for establishing reference intervals from the results of unselected hospital patients. One is the Bhattacharya method, which assumes the distribution to be Gaussian. The modified Bhattacharya method gives higher upper reference limits for most of the analytes examined (16). This method is also applied to internal quality control data and compared with the average of normals (AoN) method (17). Other mathematical formulas have also been proposed (18).

Reference intervals for TSH reported in different populations and different analyzers show significant differences in the lower and upper limits: lower limits ranged from 0.17 to 0.6 and upper limits from 3.63 to 5.95 mU/L (9-13). These discrepancies arise because of the use of different analyzers in different locations and also the methodology used to establish reference intervals. Ethnic differences should also be considered when establishing reference intervals (19,20).

From the clinical point of view, it is important to decide whether the patient has a subclinical thyroid condition, because this determines what to do when a patient is found to have mild abnormalities in TSH levels. Recent recommendations regarding subclinical hyperthyroidism give different advice on its management, depending on whether the TSH concentration is lower than 0.1 mIU/L or 0.1-0.4 mIU/L (21). This makes the lower reference limit as important as the upper reference limit for TSH. Moreover, there are published laboratory guidelines indicating that more than 95% of normal individuals have TSH levels below 2.5 mU/L (14). Establishing a more precise and accurate reference interval for TSH has important implications for both screening and treatment of thyroid disease. Although clinical data indicate that TSH values greater than 2.5 mU/L are predictive of progression to overt hypothyroidism (22), there are also contradictory opinions claiming that no firm evidence is available that lowering the upper limit will provide any short- or long-term benefit for the patient. Additionally, lowering the upper limit may increase the risk of thyroxine over-treatment, possibly resulting in subclinical hyperthyroidism (23). However, cut-off limits for diagnosing or considering treatment of subclinical thyroid dysfunction are a separate issue and should not be confused with reference ranges. Such cut-offs may be lower than the upper limit of the reference interval (24). Other studies have calculated reference limits from hospitalized patients for the catalytic activity of serum enzymes (25) or for other clinical chemistry analytes (26).

Our overall limits for TSH were 0.43-3.93 mU/L (90% CI, 0.425-0.435 for the lower and 3.887-3.974 for the upper limit); our upper limit for TSH was slightly lower than the transferred and manufacturer’s values. While our lower reference limit agreed well with the transferred values, it was significantly higher than the manufacturer’s limit. There were no differences between the years, suggesting that the method we used to estimate the reference intervals was robust and not affected by environmental factors over the years; the overall number of results in the study exceeded 50 000. Additionally, to avoid bad practices in the calculation of reference intervals, such as computing the intervals without visual inspection of the data, stem-and-leaf plots and histograms were used, and they revealed heavily skewed data. Also, to avoid arbitrary truncation of data, another common mistake, an outlier detection program was run until no outlier was left. Although there are other algorithms for the detection of outliers in reference distributions, they still need reliable statistical evidence before being used (27). The reference interval for free T4 was 11.98-21.33 pmol/L, with a 90% confidence interval of 11.67-12.30 for the lower and 20.777-21.898 for the upper reference limit, calculated from more than 60 000 results. There was again no statistical difference by year or sex.

Confidence intervals were calculated for the lower and upper limits of both analytes. Instead of estimating the parameter by a single value, an interval likely to include the parameter is given. Thus, confidence intervals indicate the reliability of an estimate by showing how likely the interval is to contain the parameter.
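To illustrate how a confidence interval for a percentile-based limit can be obtained from ranks, the sketch below uses a large-sample normal approximation to the binomial distribution. This is one common approximation and is not necessarily the procedure in the IFCC recommendations followed in this study; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def percentile_rank_ci(n, p, confidence=0.90):
    """Approximate 1-based ranks bounding a confidence interval for the
    p-th quantile, using the normal approximation to the binomial."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(n * p * (1 - p))
    lower_rank = max(1, math.floor(n * p - half_width))
    upper_rank = min(n, math.ceil(n * p + half_width))
    return lower_rank, upper_rank


# n matches the TSH sample size reported in the text; p is the lower limit.
lower_rank, upper_rank = percentile_rank_ci(55318, 0.025)
```

The sorted observations at these two ranks would bracket the lower reference limit; with tens of thousands of results the ranks are close together relative to n, which is why the reported confidence intervals are narrow.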

In conclusion, establishing reference values from patients’ results has many advantages, including being the cheapest and easiest way to collect data. Because the values are derived from patients under the same conditions, they are likely to match clinical results better. There are also several arguments against using a hospitalized population, including the objection that all such methods are “indirect” and not in accordance with the IFCC recommendations. Although little is known about the participants from whom the values are derived, and “normal” values may vary between hospitals, this is an advantage rather than a disadvantage, because using hospital data makes it possible to develop reference values for each laboratory and hospital. Finally, establishing more precise and accurate reference intervals for both analytes would give a better basis for diagnosing or considering treatment of thyroid dysfunction than using the manufacturer’s values or transferred intervals.


1. Jones G, Barker A. Reference intervals. Clin Biochem Rev. 2008;29:S93–7.
2. Solberg HE, Stamm D. International Federation of Clinical Chemistry, Scientific Division: approved recommendation on the theory of reference values. Part 4. Control of analytical variation in the production, transfer and application of reference values. Eur J Clin Chem Clin Biochem. 1991;29:531–5.
3. Solberg HE, PetitClerc C. International Federation of Clinical Chemistry (IFCC), Scientific Committee, Clinical Section, Expert Panel on Theory of Reference Values. Approved recommendation (1988) on the theory of reference values. Part 3. Preparation of individuals and collection of specimens for the production of reference values. J Clin Chem Clin Biochem. 1988;26:593–8.
4. Solberg HE. Using a hospitalized population to establish reference intervals: pros and cons. Clin Chem. 1994;40:2205–6.
5. Kouri T, Kairisto V, Virtanen A, Uusipaikka E, Rajamäki A, Finneman H, et al. Reference intervals developed from data for hospitalized patients: computerized method based on combination of laboratory and diagnostic data. Clin Chem. 1994;40:2209–15.
6. Surks MI, Ortiz E, Daniels GH, Sawin CT, Col NF, Cobin RH, et al. Subclinical thyroid disease: scientific review and guidelines for diagnosis and management. JAMA. 2004;291:228–38. doi: 10.1001/jama.291.2.228.
7. Horn PS, Pesce AJ. Reference intervals: an update. Clin Chim Acta. 2003;334:5–23. doi: 10.1016/S0009-8981(03)00133-5.
8. Harris EK, Boyd JC. On dividing reference data into subgroups to produce separate reference ranges. Clin Chem. 1990;36:265–70.
9. Friis-Hansen L, Hilsted L. Reference intervals for thyreotropin and thyroid hormones for healthy adults based on the NOBIDA material and determined using a Modular E170. Clin Chem Lab Med. 2008;46:1305–12. doi: 10.1515/CCLM.2008.258.
10. Dhatt GS, Griffin G, Agarwal MM. Thyroid hormone reference intervals in an ambulatory Arab population on the Abbott Architect i2000 immunoassay analyzer. Clin Chim Acta. 2006;364:226–9. doi: 10.1016/j.cccn.2005.07.003.
11. Gonzalez-Sagrado M, Martin-Gil FJ. Population-specific reference values for thyroid hormones on the Abbott ARCHITECT i2000 analyzer. Clin Chem Lab Med. 2004;42:540–2. doi: 10.1515/CCLM.2004.091.
12. Hubl W, Schmieder J, Gladrow E, Demant T. Reference intervals for thyroid hormones on the architect analyser. Clin Chem Lab Med. 2002;40:165–6. doi: 10.1515/CCLM.2002.028.
13. Taimela E, Kairisto V, Koskinen P, Leino A, Irjala K. Reference intervals for serum thyrotropin, free thyroxine and free triiodothyronine in healthy adults in Finland, measured by an immunoautomate based on time-resolved fluorescence (AutoDELFIA). Eur J Clin Chem Clin Biochem. 1997;35:889–90.
14. Demers LM, Spencer CA, editors. Laboratory support for diagnosis and monitoring of thyroid disease. Laboratory Medicine Practice Guidelines. Washington (DC): National Academy of Clinical Biochemistry; 2002.
15. Zophel K, Wunderlich G, Kotzerke J. Should we really determine a reference population for the definition of thyroid-stimulating hormone reference interval? Clin Chem. 2006;52:329–30. doi: 10.1373/clinchem.2005.060111.
16. Oosterhuis WP, Modderman TA, Pronk C. Reference values: Bhattacharya or the method proposed by the IFCC? Ann Clin Biochem. 1990;27:359–65.
17. Oosterhuis WP, Modderman TA, Dinkelaar RB, Zwinderman AH, van der Helm HJ. Bhattacharya: a new application for quality control. Ann Clin Biochem. 1991;28:386–92.
18. Ferre-Masferrer M, Fuentes-Arderiu X, Puchal-Ane R. Indirect reference limits estimated from patients' results by three mathematical procedures. Clin Chim Acta. 1999;279:97–105. doi: 10.1016/S0009-8981(98)00164-8.
19. Boucai L, Surks MI. Reference limits of serum TSH and free T4 are significantly influenced by race and age in an urban outpatient medical practice. Clin Endocrinol (Oxf). 2009;70:788–93. doi: 10.1111/j.1365-2265.2008.03390.x.
20. Dhatt GS, Jayasundaram R, Wareth LA, Nagelkerke N, Jayasundaram K, Darwish EA, et al. Thyrotrophin and free thyroxine trimester-specific reference intervals in a mixed ethnic pregnant population in the United Arab Emirates. Clin Chim Acta. 2006;370:147–51. doi: 10.1016/j.cca.2006.02.008.
21. Goichot B, Sapin R, Schlienger JJ. Subclinical hyperthyroidism: considerations in defining the lower limit of the thyrotropin reference interval. Clin Chem. 2009;55:420–4. doi: 10.1373/clinchem.2008.110627.
22. Wartofsky L, Dickey RA. The evidence for a narrower thyrotropin reference range is compelling. J Clin Endocrinol Metab. 2005;90:5483–8. doi: 10.1210/jc.2005-0455.
23. Brabant G, Beck-Peccoz P, Jarzab B, Laurberg P, Orgiazzi J, Szabolcs I, et al. Is there a need to redefine the upper normal limit of TSH? Eur J Endocrinol. 2006;154:633–7. doi: 10.1530/eje.1.02136.
24. Waise A, Price HC. The upper limit of the reference range for thyroid-stimulating hormone should not be confused with a cut-off to define subclinical hypothyroidism. Ann Clin Biochem. 2009;46:93–8. doi: 10.1258/acb.2008.008113.
25. Schumann G, Klauke R. New IFCC reference procedures for the determination of catalytic activity concentrations of five enzymes in serum: preliminary upper reference limits obtained in hospitalized subjects. Clin Chim Acta. 2003;327:69–79. doi: 10.1016/S0009-8981(02)00341-8.
26. Ilcol YO, Aslan D. Use of total patient data for indirect estimation of reference intervals for 40 clinical chemical analytes in Turkey. Clin Chem Lab Med. 2006;44:867–76. doi: 10.1515/CCLM.2006.139.
27. Solberg HE, Lahti A. Detection of outliers in reference distributions: performance of Horn's algorithm. Clin Chem. 2005;51:2326–32. doi: 10.1373/clinchem.2005.058339.

Articles from Croatian Medical Journal are provided here courtesy of Medicinska Naklada