The theory of reference values was developed more than 30 years ago, but its application in most clinical laboratories remains incomplete today. There are several reasons for this, the most relevant being the lack of standardisation of analytical methods, resulting in method-dependent values, and the difficulty of recruiting the proper number of reference subjects for the establishment of reference intervals. With recent progress in method standardisation the first problem is diminishing, while the second can be addressed optimally via multicentre collaborative studies that aim to establish common reference intervals. To be effective this approach requires the following prerequisites: 1) the existence of a reference measurement system for the analyte; 2) field methods producing results traceable to the reference system; and 3) a carefully planned multicentre reference interval study. Such a procedure will produce results traceable to the reference measurement system for a large number of reference subjects, under controlled pre-analytical conditions. It will also enable a better understanding of the various sources of population variability, of whether partitioning of a reference interval is needed, and of any limitations to adopting the established reference intervals on a national or global scale. Once reference intervals are determined, clinical laboratories can adopt a common reference interval provided: 1) the population that the laboratory serves is similar to the one studied; 2) methods producing traceable results are used; and 3) analytical quality is within defined targets of precision and bias. Moreover, some validation of the interval using a small sample of reference individuals from the laboratory’s population is advisable.
In his 1960 paper entitled “Some thoughts on normal, or standard, values in clinical medicine”, Schneider states: “… practical medicine is basically founded on comparison. If medicine is to be scientific, we must not only understand the structural, functional and chemical relations operating in individuals, but we must also understand the basis of our comparisons”.1
Today, almost 50 years later, in the age of evidence-based medicine, and in contrast with the enormous developments in the field of medicine, a sound basis for these comparisons is often lacking in the clinical laboratory. Nevertheless, according to Horn and Pesce: “the reference interval is the most widely used medical decision-making tool”, even if its practical usefulness is lower than its theoretical power.2 This is due to the fact that obtaining a “good” reference interval is a very demanding activity, in terms of time, money and knowledge.
But what is meant by a “good” reference interval? It is an interval that, when applied to the population serviced by the laboratory, correctly includes most of the subjects with characteristics similar to the reference group and excludes the others. Usually we consider “health-related” reference intervals to mean that the subjects with values within the interval have a lower probability of being affected by a specific disease, while those outside the interval have a higher statistical probability of having the disease or, at least, that the observed value is not normal for a healthy person. The percentage of unhealthy people included in the reference interval or, vice versa, the percentage of healthy subjects outside the interval, defines the “goodness” of the interval.
The factors responsible for this misclassification were already recognised by Schneider who identified the three contributing causes namely, intraindividual, interindividual and analytical variability.1 Intra- and interindividual variability are inextricably bound. Nevertheless, the relative sizes of these two sources of variation can substantially affect the usefulness of the reference interval as a guide to the status of an individual. Eugene Harris in 1974 demonstrated that only when intraindividual variability (CVI) is greater than interindividual variability (CVG), i.e. CVI/CVG>1.4, does the distribution of values from any individual cover much of the reference interval.3 But this is an uncommon situation. In contrast, when CVI/CVG is <0.6, which occurs quite commonly, the dispersion of values for an individual will span only a part of the population-based reference interval. In this case the reference interval will not be sensitive to changes for that individual and, on average, for any individual.4
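Harris’s criterion above can be expressed as a small computation: the index of individuality is simply the ratio CVI/CVG, and the conventional cut-offs of 1.4 and 0.6 indicate whether a population-based interval is informative for an individual. The sketch below is illustrative only; the numerical CV values are hypothetical, not taken from any cited study.

```python
def index_of_individuality(cv_intra: float, cv_inter: float) -> float:
    """Harris's index of individuality: CVI / CVG."""
    return cv_intra / cv_inter

def reference_interval_utility(ii: float) -> str:
    """Interpret the index using Harris's conventional cut-offs (1.4, 0.6)."""
    if ii > 1.4:
        return "high: the individual's values span much of the reference interval"
    if ii < 0.6:
        return "low: the individual's values span only part of the interval"
    return "intermediate"

# Hypothetical analyte with tight intraindividual control (illustrative values)
ii = index_of_individuality(cv_intra=4.3, cv_inter=12.9)
print(round(ii, 2), "-", reference_interval_utility(ii))
```

When the index is low, as in this example, a result can move well outside an individual’s usual range while still lying inside the population-based interval, which is exactly why such intervals are insensitive to change in that individual.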
One way to improve the usefulness of reference intervals is to reduce the interindividual variability by partitioning the intervals as much as possible. Stratification by age and gender is the minimum prerequisite, but further partitioning may be by race, ethnic group, body mass index or nutritional habits (e.g. vegetarians). Herein lies the problem of selecting an appropriate number of reference subjects, properly screened to exclude relevant pathologies, and subdivided by gender, age, race, ethnicity and lifestyle. Diagnostic manufacturers routinely perform reference interval studies for hundreds of analytes using their different assays and platforms, producing method-specific results that may not be comparable across space and time, or equivalent between methods, depending upon calibration traceability. Thus it is difficult to provide evidence for the existence of authentic and clinically significant differences among races or ethnic groups, even for the most common analytes.
As a shortcut to circumvent these problems several authors have proposed so-called “indirect methods” to define reference intervals.5–16 These methods are based on statistical manipulation of existing data to select the “healthy” group from the entire population. There are two main arguments against this approach. Firstly, it is not in keeping with the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) requirements for reference intervals, which stipulate that the characteristics of the reference subjects be clearly defined. With the data mining approach very little is known about the subjects, and the method relies upon statistical techniques to exclude the unhealthy group. Only if relevant medical history information can be combined with the analytical data are the requirements fulfilled,10,11 but this is an “a posteriori” approach, not an indirect one. Secondly, there is insufficient control of the pre-analytical and analytical conditions. Although the indirect approach can be very useful for local situations or “difficult” groups of subjects such as neonates and children, or as a means to confirm the “goodness” of a selected reference interval, it cannot be used to set common reference intervals.
Given the enormous effort required to recruit sufficient numbers of reference subjects, the requirement that each clinical laboratory produce its own reference intervals is a practical impossibility. The effort needed for a single analyte, let alone multiplied hundreds of times for all analytes, is beyond the routine laboratory’s capacity, and repeating this process continuously to keep the intervals updated with assay and platform changes is completely unrealistic. The difficulties with the current approach to reference intervals are clearly described in the commentary by Jones et al.17
To address these difficulties a concerted approach is required by all interested parties.
Defining common physiological (health-related) reference intervals through a multicentre collaborative experiment can be an efficient way to fulfil the expectations of all stakeholders. This approach overcomes the problem of recruiting large numbers of reference individuals and, at the same time, enables investigation of the influences of race, environmental conditions and life style habits on reference intervals. Sharing the work among multiple sites reduces the cost at each site and the time needed to complete the task.
The main obstacle preventing the adoption of common reference intervals, besides the real existence of local population differences, was (and still is) the lack of comparable values between routine methods due to insufficient standardisation. This is true in particular for the enzymes but also for many other common analytes such as creatinine. To improve the analytical quality it is necessary to establish a sound accuracy base and, to obtain this, a reference measurement system is required.20 Up to now, complete reference measurement systems have been established for only a minority of analytes, but much work is in progress and further standardisation efforts will allow the spread of common reference intervals.21
The concept of adopting common reference intervals is simple. If the analytical method is the same or yields identical results because it is correctly standardised, and the population has the same characteristics or it is known that a specific analyte is not significantly influenced by ethnicity or the environment, then common reference intervals can be used.
Unfortunately the practical application of this simple concept is not as easy as it would appear. A number of prerequisites must be in place before adopting common reference intervals (Table).
Assuming that a reference measurement system exists, the most demanding task is the definition of an adequate set of reference values. This should include subjects from different races and ethnic groups and from various environments in order to document whether clinically significant differences really exist. If they do, partitioning may be worthwhile; otherwise the application of a common reference interval cannot be recommended unless the interval is widened, which may reduce its utility. The best way to gain this information is a multicentre study involving several clinical laboratories. This approach has been taken by a Spanish group and further developed by the Nordic countries.22–31
A multicentre study for the establishment of common reference intervals must be carefully organised to produce results that can be adopted by any clinical laboratory operating under similar pre-analytical and analytical conditions.
To be able to apply common reference intervals a clinical laboratory has to verify similarity of the pre-analytical conditions, the analytical method used and its performance, and the characteristics of the population to be serviced.
The reference intervals can be used only if the same pre-analytical conditions are applied (e.g. use of serum, fasting subjects), or if it can be demonstrated that any modification has no effect, e.g. that analyte levels are not modified by meals, or that lithium heparin plasma and serum give equivalent results.
The method in use must produce results traceable to the reference measurement system for the specified analyte. For European countries this should be the rule if the reagent is ‘CE’ marked and used strictly according to the manufacturer’s specifications. The European IVD Directive makes traceability an essential requirement but, as demonstrated by the measurement of some enzyme activities, traceability does not necessarily ensure result comparability.6,43 The analytical specificity of the method is a key point, especially for the measurement of enzymes. If the measurand is not the same as that of the reference method, traceability cannot be obtained, e.g. measurement of transaminases in the absence of pyridoxal phosphate, or the use of different substrates for α-amylase.
The analytical quality of the method in use should be controlled in order to keep the imprecision and the bias within stated limits. Targets for maximal imprecision can be derived from criteria related to biological variability.44 According to Gowans et al., the maximum acceptable analytical bias, for use of the same reference interval, is defined as bias < 0.25 × (CVI² + CVG²)^½ (population-based variability), which is equivalent to the confidence interval of the reference limit for a sample size of 120 individuals.45 Gradation of the specifications to 0.125, 0.25 and 0.375 times the population-based variability corresponds to optimum, desirable and minimum quality, respectively.46 A list of estimated within- and between-subject biological variations and analytical quality specifications can be found at Westgard’s website.47 The presence of a bias and its magnitude can be verified by method comparison on fresh patient samples, from External Quality Assurance Scheme (EQAS) results, or from interlaboratory internal quality control programs.
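The Gowans specification and its graded variants can be computed directly from estimates of within-subject (CVI) and between-subject (CVG) biological variation. The sketch below implements the bias formula quoted above; the imprecision rule CVA < 0.5 × CVI is the commonly used desirable specification from the biological-variation literature, not stated explicitly in the text, and the CV figures are illustrative only.

```python
import math

def bias_limit(cv_i: float, cv_g: float, level: float = 0.25) -> float:
    """Maximum allowable bias (Gowans et al.): level * sqrt(CVI^2 + CVG^2).
    level = 0.125, 0.25 or 0.375 gives optimum, desirable or minimum quality."""
    return level * math.sqrt(cv_i ** 2 + cv_g ** 2)

def imprecision_limit(cv_i: float, level: float = 0.5) -> float:
    """Commonly used maximum allowable analytical CV: level * CVI
    (desirable quality uses level = 0.5)."""
    return level * cv_i

# Hypothetical analyte with CVI = 5% and CVG = 12% (illustrative values)
print("desirable bias limit (%):", round(bias_limit(5.0, 12.0), 2))        # 3.25
print("desirable imprecision limit (%):", round(imprecision_limit(5.0), 2))  # 2.5
```

A laboratory would compare its observed bias (from EQAS or method comparison) and its internal quality control CV against these limits before adopting a common interval.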
These can vary according to the analyte. If race or life habits are known not to significantly influence the reference intervals, it is sufficient to verify the pre-analytical and analytical aspects (e.g. electrolytes).
If race, ethnic group or lifestyle are known to influence reference intervals, or if no information is available on these, it is advisable that the clinical laboratory validates the intervals on a small sample group of its own population. This validation can be done according to the CLSI Document C28-A2, paragraph 8.2.33 The Document suggests examining 20 reference individuals from a laboratory’s own subject population. They should represent a healthy population and satisfy the selection criteria. After discarding any apparent outlier, if no more than 2 of the 20 tested values fall outside the interval, the interval can be adopted. If 3 or more fall outside these limits, the experiment should be repeated with another 20 subjects. If this second time no more than 2 of the 20 tested values fall outside the interval, adopt the interval; if again 3 or more fall outside, the population probably differs and a specific reference interval is needed (provided that all the pre-analytical and analytical aspects are correct). This binomial test works well when the reference values have a Gaussian-like distribution, but is very insensitive when the distribution is highly skewed (J. Boyd, personal communication). More powerful statistical tests than the binomial test described above can be carried out, e.g. the Kolmogorov-Smirnov test, which compares the 20 reference specimens from a given laboratory with the full dataset of tested reference individuals. Finally, a further approach is to calculate the reference intervals from the 20 subjects using a robust algorithm like the one proposed by Horn et al. to check whether the obtained limits are within the confidence limits of the common reference interval.42
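The CLSI 20-subject binomial rule described above is simple enough to express as code. The following is a minimal sketch of a single round of the check; the function name, the potassium-like candidate interval and the sample values are all hypothetical, chosen only to illustrate the accept/retest decision.

```python
def validate_reference_interval(values, lower, upper):
    """One round of the CLSI C28-A2 transference check: test 20 reference
    individuals against a candidate interval; accept if no more than 2
    values fall outside it, otherwise repeat with 20 new subjects."""
    if len(values) != 20:
        raise ValueError("the rule is defined for exactly 20 subjects")
    outside = sum(1 for v in values if not (lower <= v <= upper))
    # 3 or more outside -> repeat; a second failure suggests the local
    # population differs and a laboratory-specific interval is needed.
    return "adopt" if outside <= 2 else "retest"

# Hypothetical candidate interval 3.5-5.1 mmol/L and 20 local results
sample = [3.6, 4.0, 4.2, 4.5, 4.8, 3.9, 4.1, 4.3, 4.7, 5.0,
          3.8, 4.4, 4.6, 4.9, 4.0, 4.2, 5.2, 3.7, 4.5, 4.1]
print(validate_reference_interval(sample, 3.5, 5.1))  # one value outside -> adopt
```

Apparent outliers should be examined and discarded before applying the rule, as the CLSI document specifies; this sketch assumes that step has already been done.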
The processes described are not easy, fast or straightforward. The reality is that development of the reference measurement systems and compliance by manufacturers to calibration traceability happen slowly. The time, effort and money required to establish reference intervals are large and clinical laboratories are disinclined to modify reference intervals as this is a demanding task also requiring education of clinicians and patients. Large multicentre studies are needed for the definition of common reference intervals and are probably the only way to make real progress in this field and bridge the large gap now existing between a very nice theory (IFCC and CLSI documents) and a very poor practice.
The difficulties are related to the need to verify traceability by the distribution of frozen sera and to co-ordinate several centres to perform thousands of tests and enrol hundreds or thousands of individuals, all of which have considerable costs. One objection is whether such an effort is needed for something that is outdated and will be superseded by decision limits. The answer is that if a common reference interval is established following stringent scientific criteria (both biological and analytical), in principle it is established once and need not be repeated over and over by an infinite number of laboratories around the world. Moreover, to define a decision limit properly requires that a reference limit is correctly calculated. In fact, excluding the peculiar situation of total cholesterol and lipoproteins, the definition of a decision limit implies a comparison between two populations, namely the healthy reference population and the unhealthy or ill one (or more, depending on the test and on the type of decision limit to be set). Thus reference intervals are only the beginning; once developed, work has to be done with patients in various pathological conditions to define cut-offs and decision limits according to the diagnostic sensitivity and specificity of the test.
The challenge for the future is the development of intraindividual reference values, as already foreseen by Harris.3 This is a concrete possibility today with the progress in information technology and the improvements in analytical standardisation.48,49 The computation of individualised reference limits should eventually fulfil Schneider’s plea for more reliable means of comparisons with which to judge the health of a patient.
Competing Interests: None declared.