As an outcome of the 2010 Asian Pacific Conference for Chromatography and Mass Spectrometry in Hong Kong, a collaborative working group was formed to promote the harmonisation of mass spectrometry methods. The Mass Spectrometry Harmonisation Working Group resides under the combined auspices of the Asia-Pacific Federation for Clinical Biochemistry and Laboratory Medicine (APFCB) and the Australasian Association of Clinical Biochemists (AACB). A decision was made to initially focus attention on serum steroids due to the common interest of members in this area; with the first steroid to assess being testosterone.
In principle, full standardisation with traceability should be achievable for all steroids as they are small compounds with defined molecular weight and structure. In order to achieve this we need certified reference materials, reference methods, reference laboratories, reference intervals and external quality assurance programs; each being an important pillar in the process. When all the pillars are present, such as for serum testosterone, it is feasible to fully standardise the liquid chromatography – tandem mass spectrometry (LC-MS/MS) methods. In a collaborative process with interested stakeholders, we commenced on a pathway to provide ongoing assessment and seek opportunities for improvement in the LC-MS/MS methods for serum steroids. Here we discuss the outcomes to date and major challenges related to the accurate measurement of serum steroids with a focus on serum testosterone.
The history of mass spectrometry in medical testing now spans over fifty years, with the majority of this period seeing its confinement to specialist laboratories.1 The breakthrough of coupling the liquid phase to the mass spectrometer provided the stimulus for this methodology to expand its reach into routine laboratories.2 The broad implementation of LC-MS/MS in clinical diagnostic laboratories now encompasses a range of techniques from expanded newborn screening programs, to toxicology screening, therapeutic drug monitoring, and quantification of biogenic amines, vitamins and hormones; all of which relate to small molecules with defined molecular weights and structures.3 In addition, methods for larger molecular weight compounds have more recently emerged in translational research laboratories and are likely to move broadly into the clinical diagnostic area in the near future.4 Irrespective of the analytical group, accurate quantification is essential to ensure appropriate clinical interpretation.
Initially, LC-MS/MS entered the clinical arena to resounding accolades as the new “gold standard” that would solve the problems of immunoassay sensitivity and accuracy. Many LC-MS/MS techniques in the early 2000s were simply “dilute and shoot”, with minimal sample cleanup and chromatographic separation. These ideas were still being discussed at the second AACB Chromatography Mass Spectrometry meeting (themed “Coming together to separate”), held in Sydney in 2007.5 However, from around the mid 2000s the literature was also consistently reporting limitations in methods.6,7 This led to questions of accuracy, with differences seen between the home brew methods. This disillusionment culminated in many ways with the now infamous retraction of LC-MS/MS vitamin D results by a major laboratory in 2008.8 Hence by the time of the next regional Chromatography Mass Spectrometry conference in Hong Kong in 2010 there was significant discussion about how we could ensure the reliability of our LC-MS/MS assays as they expanded into the repertoire of methods offered by an increasing number of diagnostic laboratories.9
As an outcome of the 2010 conference in Hong Kong a collaborative working group was formed; the Mass Spectrometry Harmonisation Working Group (MSHWG).9 The group resides under the combined auspices of the APFCB and the AACB.10,11 The goal of the MSHWG is to promote harmonisation, and where practicable, standardisation of mass spectrometry methods through a consensus approach with laboratories; principally in the Asia and Pacific area. A decision was made to initially focus attention on serum steroids due to the common interest of members in this area.12
In principle, full standardisation with traceability should be achievable for all steroids as they are small defined molecular weight compounds. In order to achieve this, certified reference materials, reference methods, reference laboratories, reference intervals and external quality assurance programs are required; each being an important pillar in the process.13,14 The Joint Committee for Traceability in Laboratory Medicine (JCTLM) was established to support this process worldwide through the development of a database to recognise primary reference materials, methods and laboratories. Currently some (e.g. serum cortisol, estradiol, progesterone and testosterone) but not all steroids (e.g. serum 17-hydroxyprogesterone, androstenedione, cortisone and dihydrotestosterone) measured routinely by mass spectrometry have complete listings in the JCTLM database; see Table 1 for detail.15 When all the JCTLM pillars are present, such as for serum testosterone, it is feasible to fully standardise our LC-MS/MS methods.
The terms ‘standardisation’ and ‘harmonisation’ when related to laboratory medicine define two distinct, albeit closely linked concepts. Yet both are based on traceability principles described in the International Organization for Standardization (ISO) standard 17511,16 in which the term ‘standardisation’ is used when results for a measurand are equivalent, and the results are traceable to the International System of Units (SI) through a high-order primary reference material and/or a reference measurement procedure (RMP).17 By contrast, the term ‘harmonisation’ is generally used when results are equivalent, being either traceable to a reference material or based on a consensus approach, namely in agreement with the mean values obtained with different methods, but neither a suitable high-order primary reference material nor a RMP is available. However, the term “harmonisation” can also be used more broadly to relate to the overall testing process, encompassing the pre-analytical phase, methods of analysis, calibration materials, reporting units and reference intervals. Harmonisation in this broader sense can be applied to the critical aspects that should be aligned to promote agreement. The development of implementation guidelines and best practice statements form part of this process. In relation to steroids, we use the term harmonisation in this broader context.
In a collaborative process with interested stakeholders, we commenced on a pathway to provide ongoing assessment and seek opportunities for improvement in the LC-MS/MS methods for serum steroids. Here we discuss the outcomes to date and major challenges related to the accurate measurement of serum steroids with a focus on serum testosterone.
Harmonisation of the total testing process, and where practicable standardisation of the analytical method with established trueness, is fundamental to the delivery of quality pathology. Whilst this goal is not new, advances in information technology, the move towards the electronic health record and the recognition of patients as part of the global village, have led to the appreciation that discordance in results between laboratories and between methods is no longer acceptable practice. There are many different aspects to harmonisation which encompass the total testing process. Often the first step in the recognition of discordance between laboratory results is through assessment against an External Quality Assurance (EQA) scheme. As such, EQA is recognised as the fifth pillar and in many aspects is central to this process.18 In this section we discuss some of the strategies used to gain a further understanding of method performance and improve agreement.
Eight laboratories, including four laboratories from Australia, two laboratories in Hong Kong, one laboratory from Austria and the National Measurement Institute of Australia (NMIA) participated in this process. In addition, the group included scientific staff from the RCPAQAP.
In order to have ongoing evaluation of performance, participation in an EQA program provides a basis for objective peer comparison. Hence, a mandatory requirement to participate in this activity is to enrol and submit results to a common EQA program. In Australasia, the Royal College of Pathologists of Australasia Quality Assurance Programs (RCPAQAP) Endocrine Program is the obvious common denominator (hence the mutually agreed program) for these and future initiatives for the harmonisation of serum steroids as it provides an ongoing mechanism to objectively assess analytical performance.19 Participation in this process was (and still is) by open invitation based on the analytical expertise and interest in overall method improvement. Of note there were a minority of laboratories (Australian research based) who did not wish to participate in this common peer review process (i.e. submit results to the RCPAQAP) and hence were deemed ineligible for inclusion.
The RCPAQAP Endocrine material consists of six linearly related levels; each level is analysed twice in a cycle. There are two cycles per annum. This material is lyophilised and during the manufacturing process the base material may be charcoal stripped and supplemented with analytes of interest. The program uses analytical performance goals to assess the quality of results. These goals, called Allowable Limits of Performance (ALP), are quality standards which allow participating laboratories to assess their performance and respond accordingly.
For analytes in the Endocrine program, these ALP goals are set using the internationally agreed hierarchy, of which biological variability is the highest level applicable.20,21 These goals can be applied for monitoring and/or diagnosis of a patient; in the latter case both imprecision and bias are included in the calculation. When monitoring is the aim, as for testosterone, the calculation is based on imprecision. Professor Callum Fraser’s fitness for purpose definitions are then applied to fine tune the biological variation, where CVa is the analytical imprecision and CVi is the intra-individual biological variation: optimal, CVa < 0.25 × CVi; desirable, CVa < 0.50 × CVi; minimum, CVa < 0.75 × CVi.
Then the ALP is calculated as two times the CVa set for the program. The level set for the program (minimum, desirable or optimal) is based on at least 80% of participants being able to achieve the performance. The ALP for testosterone is based on the intra-individual biological variation of 9.25% obtained from the Ricos database,23 interpreted against the minimum specification; the CVa is therefore 6.9375% (i.e. 9.25% × 0.75 = 6.9375%). This is then multiplied by two for the 95% range (uncertainty of measurement), giving 13.875%, and finally rounded to ±15%. In practice the RCPAQAP ALP for testosterone is applied as ±0.4 nmol/L up to 2.7 nmol/L, then ±15%.13
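The arithmetic behind the testosterone ALP can be sketched in a few lines of Python. This is an illustrative sketch only, not the RCPAQAP's own software: the function names, the choice to switch limits on the target value, and the example results are assumptions made here. The two target values are borrowed from the Unknown samples described later purely as convenient low and high concentrations.

```python
def minimum_cva(cvi_percent: float, factor: float = 0.75) -> float:
    """Analytical CV goal as a fraction of intra-individual variation (0.75 = minimum)."""
    return cvi_percent * factor


def within_alp(result: float, target: float,
               abs_limit: float = 0.4, cutoff: float = 2.7,
               pct_limit: float = 15.0) -> bool:
    """Fixed limit (nmol/L) up to the cutoff target, percentage limit above it.

    Assumption: the cutoff is applied to the target value, not the result.
    """
    if target <= cutoff:
        allowed = abs_limit
    else:
        allowed = target * pct_limit / 100.0
    return abs(result - target) <= allowed


cva = minimum_cva(9.25)      # 9.25% x 0.75 = 6.9375%
unrounded_alp = 2 * cva      # 13.875%, which the program rounds to 15%
print(cva, unrounded_alp)

# Hypothetical results against a low and a high target (nmol/L).
print(within_alp(0.90, 0.57))    # difference 0.33 <= 0.4 nmol/L
print(within_alp(19.0, 16.19))   # difference 2.81 > 15% of 16.19
```

Note that 0.4 nmol/L is approximately 15% of 2.7 nmol/L, so the fixed and percentage limits meet at the cutoff.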
Target values are considered preferable to the use of medians, particularly when there are significant differences between method or instrument groups, such as mass spectrometry compared to immunoassay methods. Target values are ideally set by higher order methods with established traceability and trueness. Practically, target values of the RCPAQAP material are assigned by one or more reference laboratories for at least levels 2, 4 and 6, and the other values are then determined by linear regression. This process can vary slightly depending on whether level 1 is part of the linear range, the cost involved in the setting process and the data returned to the RCPAQAP. In 2012, as an example, target values for serum testosterone material were assigned by WEQAS for levels 2, 4 and 6, followed by linear regression to obtain the targets for levels 1, 3 and 5.
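A minimal sketch of the interpolation step follows. It assumes the six levels are equally spaced so that the level number can serve as the x variable; the regression variable actually used by the RCPAQAP is not stated here. The assigned concentrations are hypothetical and `fit_line` is an illustrative helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical reference-laboratory values (nmol/L) for levels 2, 4 and 6.
assigned = {2: 4.0, 4: 10.0, 6: 16.0}

slope, intercept = fit_line(list(assigned.keys()), list(assigned.values()))

# Keep the assigned values; fill levels 1, 3 and 5 from the fitted line.
targets = {
    level: assigned.get(level, slope * level + intercept)
    for level in range(1, 7)
}
print(targets)
```

If level 1 falls outside the linear range, as the text notes can happen, it would need to be assigned directly rather than taken from the line.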
The RCPAQAP collects method details from participants based on method principle, sample preparation technique (where relevant), instrument brand and calibrator source as part of the enrolment process. Previously, members of this group have successfully developed and utilised participant questionnaires to provide further insight into the analytical methods used by RCPAQAP participants.24–26 Hence in 2013, through the RCPAQAP, a detailed questionnaire was developed and sent to participating laboratories to look more closely at the serum steroid methods with a focus on testosterone. The basis for the development of this questionnaire was information obtained from an initial questionnaire generated from all group participants in 2011 (unpublished). It addressed the pre-analytical, analytical and post-analytical components of each laboratory’s serum testosterone method and was designed to understand in more detail the similarities and differences that may exist between LC-MS/MS serum testosterone methods.
At the top of the traceability chain are primary certified reference materials (CRM). These materials are often made by metrology institutes and have stated uncertainty of measurement. This provides the anchor for trueness provided commutability is established. In Australia, the NMIA makes a number of steroid reference materials; as do other metrology institutes around the world. This includes a variety of CRM for steroids including testosterone (NMIA M914B).27
For the investigation of serum testosterone harmonisation, NMIA developed two reference methods as anchors for the studies conducted; an Ultra-High Performance LC-MS/MS method and a gas chromatography high resolution mass spectrometry (GC-HRMS) method. This was a significant undertaking requiring many months to fully validate the approach. These methods are listed in the BIPM metrology data base28 and once published will be submitted to the JCTLM database. Briefly, the methods consisted of the following (details provided by Dr Veronica Vamathevan from NMIA):
The NMIA M914B material was obtained from the Chemical Reference Materials Facility at NMIA and was certified with a purity of 99.7% ± 1.7%. Stock solutions of testosterone were prepared using this certified pure substance reference material. Working standard solutions of testosterone were prepared at concentrations of approximately 0.1, 0.4, 1.0, 1.8, 4.8, 16, 35 and 55 ng/g in methanol. Deuterated (D3−) testosterone (NMIA D644) was also obtained from NMIA. Internal standard solutions of deuterated testosterone were prepared gravimetrically in 20% methanol/water at similar concentrations to the native testosterone solutions. Calibration and internal standard solutions were stored at −20°C in the dark.
Sample and calibration blends were prepared as follows for isotope dilution analysis.
Samples were prepared as described above. Testosterone in the sample and calibration blends was separated from matrices using two-dimensional Ultra-High Performance LC-MS/MS (Thermo TSQ Vantage/TLX1). A Waters CSH Phenyl-Hexyl column with an acetonitrile/formic acid (0.02%, aqueous) mobile phase was employed in the first dimension and coupled with a Waters BEH Shield column and a methanol/formic acid (0.02%, aqueous) mobile phase in the second dimension. A narrow window containing testosterone was transferred from the first dimension to the second dimension by means of a dual valve switching system for additional chromatographic separation of compounds in the sample extract. The mass spectrometer was operated in the positive ion mode with electrospray ionisation and multiple reaction monitoring (MRM) of fragment ions. The MRM transitions monitored were 289.2 > 109.1 and 289.2 > 97.1 m/z for native testosterone and 292.2 > 109.1 and 292.2 > 100.2 m/z for deuterated testosterone.
A second reference measurement procedure was developed for confirmatory analysis. Serum samples were prepared and extracted as described above and subjected to preparative HPLC clean-up. A C18 Alltima column (4 × 250 mm, 5 µm, Grace) was used with a mobile phase of acetonitrile/water. The elution of testosterone during chromatography was monitored using a UV-Visible detector at a wavelength of 245 nm. A fraction containing testosterone was collected in a glass tube and evaporated to dryness at 50°C under nitrogen. The dried extracts were derivatised with trimethylsilyl iodosilane (TMIS) reagent and then analysed by GC-HRMS (Finnigan MAT95). Samples were chromatographed on an Agilent VF-17MS column (0.25 mm × 30 m, 0.25 µm film thickness). The GC-HRMS was operated in the Multiple Ion Detection (MID) mode at a resolution of approximately 3000.
Many LC-MS/MS laboratories currently gravimetrically prepare their own calibrators and purchase primary materials to check their prepared standards.19 On a small scale this works and provides an excellent foundation for trueness, but as serum steroid MS methods become more common, it would be more practical (and probably more robust) to use a secondary commercial calibrator.
There are advantages as a group to use a common calibrator for harmonisation and peer support. Certainly laboratories have been doing this in an ad hoc fashion for many years as it also provides leverage for trouble shooting issues. Therefore, as a group we decided to investigate the first secondary calibrator for MS based serum steroid methods commercially available (i.e. that could be purchased independently, as distinct from being part of a full mass spectrometry method kit) to see if it was a potential candidate as a secondary calibrator.
Following an extensive search and subsequent discussions, an established commercial calibrator was selected to trial as a “Common Calibrator” (CC) for this project. This calibrator is supplied as a seven level set (containing 17 steroids), which is in line with the published 2011 recommendations from Honour.29 The AbsoluteIDQ® Steroid Calibrators (lot number 388421, BIOCRATES Life Sciences AG, Innsbruck, Austria) were supplied as lyophilised materials (calibrator 1–7, separate calibrator matrix for reconstitution). The Steroid Calibrator set (labelled as “Research Use Only”) is designed for LC-MS/MS based analysis of steroid hormones. In addition to testosterone, the CC set contains the following steroid hormones: aldosterone, androstenedione, androsterone, corticosterone, cortisol, cortisone, 11-deoxycorticosterone, 11-deoxycortisol, dehydroepiandrosterone (DHEA), dehydroepiandrosterone sulfate (DHEAS), dihydrotestosterone (DHT), 17β-estradiol (E2), estrone (E1), etiocholanolone, 17α-hydroxyprogesterone (17OHP), and progesterone.30
Values for these calibrators are routinely assigned by weighing in pure steroids purchased from Sigma-Aldrich. The density of the Biocrates calibrator is 1.006 g/mL. These calibrators have a proficiency test certificate awarded on a quarterly basis (based on their use with the Biocrates kit steroid method) for the accredited proficiency test program HM (hormone group 1, testosterone, aldosterone, cortisol, 17β-estradiol, progesterone, DHEAS, and 17α-hydroxyprogesterone) of the German Reference Institute for Bioanalytics (RfB, Referenzinstitut für Bioanalytik) under the umbrella of the Deutsche Vereinte Gesellschaft für klinische Chemie und Laboratoriumsmedizin (DGKL).31 For testosterone the average relative bias through 11 proficiency tests, each with two test samples, was −1.5%; that is, the average accuracy of the 22 reported values against target values was 98.5%. The standard deviation of these measurement accuracies was 4.7%.
For the Common Calibrator (CC) study, indicative values for testosterone were assigned by NMIA using the methods described above.
Two common calibrator sets, one EQA sample set (2012 RCPAQAP material) and two sets of de-identified, “Unknown”, fresh frozen human serum samples (one male and one female) were sent to each laboratory. The two Unknown serum samples were collected from two project team members (a male aged 36 years and a female aged 24 years) and stored in 1 mL frozen aliquots. Samples were analysed in duplicate by each laboratory’s (n=8) routine LC-MS/MS method on two separate occasions. As only one set of EQA material was distributed, these samples were frozen after the first analysis and then thawed for the second analytical run. An outline of the protocol is provided in Figure 1.
For the routine diagnostic laboratories, the RCPAQAP and CC material were reconstituted as per the manufacturer’s instructions on the day of analysis. The CC was supplied with a separate lyophilised matrix vial and reconstituted with 10 mL high-purity water (Milli-Q water). This matrix was then used to reconstitute the seven levels of the lyophilised CC; each with 1.2 mL of matrix solution. The “Unknown” fresh frozen serum samples were analysed as supplied by each laboratory.
As one of the eight laboratories, NMIA determined target or indicative values for the RCPAQAP, CC and Unknown material using its two-dimensional LC-MS/MS method. As proof of concept (i.e. when sufficient sample volume was available to achieve sensitivity) the target values were cross-checked against the NMIA’s GC-HRMS serum testosterone method. GC-MS has the advantage of not suffering from the potential matrix effects from phospholipids that can confound LC-MS/MS serum analysis.32 All target values provided by NMIA for this project were by weight (denominator in grams) and then converted to volume (denominator in litres) for the clinical laboratory comparison. To ensure alignment with other harmonisation initiatives, NIST SRM971, which is a serum matrix matched commutable material, was used as a QC material.
For NMIA a more rigorous approach was applied for the reconstitution of the material as follows:
The process of value assignment by metrology institutes is an extensive, exact and time consuming task. Target values are assigned in mass units by metrology institutes. Density is then approximated for the material in order to calculate the concentration in volume units. The volume used by NMIA for determination depends on the expected approximate concentration of the measurand. In this process the metrology institute aims to analyse a consistent mass and hence will vary the sample volume used in the process of value assignment. In many instances four replicates are performed for each measurement. This is a higher order approach to that used by the clinical diagnostic laboratory.
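The mass-to-volume conversion described above can be illustrated as follows. This is a sketch under stated assumptions rather than NMIA's procedure: the molar mass of testosterone is taken as 288.42 g/mol, the density of 1.02 g/mL is a hypothetical value for a serum-like material, and the function names are invented for the example.

```python
TESTOSTERONE_MOLAR_MASS = 288.42  # g/mol, assumed value for C19H28O2


def mass_fraction_to_nmol_per_l(ng_per_g: float, density_g_per_ml: float,
                                molar_mass: float = TESTOSTERONE_MOLAR_MASS) -> float:
    """Convert a mass fraction (ng/g) to a molar concentration (nmol/L)."""
    ng_per_ml = ng_per_g * density_g_per_ml   # ng/g x g/mL = ng/mL
    ng_per_l = ng_per_ml * 1000.0             # 1000 mL per litre
    return ng_per_l / molar_mass              # ng/L divided by g/mol = nmol/L


def nmol_per_l_to_mass_fraction(nmol_per_l: float, density_g_per_ml: float,
                                molar_mass: float = TESTOSTERONE_MOLAR_MASS) -> float:
    """Inverse conversion, from nmol/L back to ng/g."""
    return nmol_per_l * molar_mass / 1000.0 / density_g_per_ml


# Hypothetical example: 4.8 ng/g in a material of density 1.02 g/mL.
conc = mass_fraction_to_nmol_per_l(4.8, 1.02)
print(round(conc, 2), "nmol/L")
```

Because the density is only approximated, any error in it carries through proportionally to the volume-based target.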
The assignment of reference and indicative values for the RCPAQAP, CC and Unknown samples serves as an example of the process used by metrology institutes (i.e. NMIA).
Target values for testosterone in the RCPAQAP samples were determined from multiple mass fraction determinations made on four different bottles of each sample. At least two sub-samples from each bottle were analysed in independent experiments performed on different days. As the results of LC-MS/MS and GC-HRMS analysis were in excellent agreement, the testosterone mass fractions determined using both reference measurement procedures were used in the calculation of reference values. Measurement uncertainties in the reference values were estimated as described in ISO/IEC Guide 98-3.33 The associated absolute and relative expanded uncertainties in the reference values were determined at a level of confidence of 95%.
Information (i.e. indicative) values were determined for the mass fractions of testosterone in the seven CC solutions and the Unknown fresh human sera samples. Due to their limited sample volume, only two mass fraction determinations were possible on these samples and thus the target values provided for these samples are information values only. The two mass fraction determinations were performed in two separate experiments conducted on different days. The indicative values are the average mass fractions of the two determinations made on each sample. For CC levels 1 and 2, only one mass fraction determination was possible. The two vials of CC levels 1 and 2 supplied were combined following reconstitution to enable approximately 2.4 g of sample to be used for analysis. This was necessary due to the low testosterone concentrations present in these samples. These indicative values did not have their uncertainty determined.
Comparison of method details: The questionnaire was informative in nature. Results were collated and summarised using a Microsoft Excel spreadsheet. Interpretation of the data was qualitative.
Comparison of methods using the common calibrator: The values determined by NMIA were utilised as the assigned value of the common calibrator set, RCPAQAP material and Unknown samples.
To assess whether the adjustment of results with the CC was statistically significant for an individual laboratory, an unpaired two-tailed t-test was used to compare the results returned for each sample (RCPAQAP levels and Unknown samples) pre and post adjustment with the CC. Consistent standard deviation was not assumed and statistical significance was determined using the Holm-Sidak method, with alpha = 5%.
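For readers unfamiliar with the Holm-Sidak correction, the following sketch shows one common step-down formulation applied to a set of per-sample p-values. It is not a reproduction of the GraphPad Prism implementation, whose details are assumed rather than confirmed here; the p-values are hypothetical and the t-tests that would produce them are not shown.

```python
def holm_sidak(p_values, alpha=0.05):
    """Step-down Holm-Sidak adjustment.

    Returns a list of (adjusted_p, is_significant) in the original order.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment over the (m - rank) hypotheses still in play.
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Enforce monotonicity so adjusted p-values never decrease.
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return [(p, p < alpha) for p in adjusted]


# Hypothetical p-values, one per sample compared pre and post adjustment.
raw = [0.001, 0.04, 0.03, 0.20]
results = holm_sidak(raw)
for p_raw, (p_adj, sig) in zip(raw, results):
    print(f"raw={p_raw:.3f} adjusted={p_adj:.4f} significant={sig}")
```

In this example only the smallest p-value remains significant after correction, although three of the four raw values are below 0.05.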
To determine whether there was a statistically significant change for the group as a whole pre and post adjustment with the CC, results were compared for the group of laboratories by two-way ANOVA, with p<0.05 indicating a statistically significant difference for the group. Bland-Altman difference plots were also developed to visually characterise the percentage difference across all levels compared to the target value and compared to the desirable total allowable error (see below).
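The quantities plotted in a percentage difference plot against the target can be computed as in the sketch below. The target and result values are hypothetical. The 1.96 × SD limits of agreement are the conventional Bland-Altman summary and are assumed here, and the 13.61% limit is the desirable total allowable error for testosterone quoted in this article.

```python
from statistics import mean, stdev

TEA_PERCENT = 13.61  # desirable total allowable error for testosterone


def percent_differences(results, targets):
    """Percentage difference of each result from its assigned target value."""
    return [100.0 * (r - t) / t for r, t in zip(results, targets)]


def bland_altman_summary(diffs):
    """Mean difference with conventional 95% limits of agreement."""
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical targets and one laboratory's results (nmol/L).
targets = [1.0, 5.0, 10.0, 20.0]
results = [1.1, 4.8, 10.5, 19.0]

diffs = percent_differences(results, targets)
bias, lower, upper = bland_altman_summary(diffs)
outside = [d for d in diffs if abs(d) > TEA_PERCENT]

print(f"mean difference {bias:.2f}%, limits {lower:.2f}% to {upper:.2f}%")
print(f"{len(outside)} of {len(diffs)} results exceed the allowable error")
```

Plotting `diffs` against `targets`, with horizontal lines at the mean difference and at ±13.61%, gives the visual comparison described in the text.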
Microsoft Excel and GraphPad Prism version 6 software were used for assessment of the data.34
Biological variation and fitness for purpose: To assess the limits of performance in the CC trial, biological variation data and the desirable specification for fitness for purpose compared with the NMIA assigned values were used as the criteria for acceptance of results.23
The desirable imprecision can be determined based on:
CVa < 0.5 × CVi
where CVa is the analytical imprecision and CVi is the intra-individual biological variation.
The desirable bias (Ba) can be determined based on:
Ba < 0.25 × √(CVi² + CVg²)
where CVg is the between subject biological variation. Hence the total allowable error (TEa) will be the combination of the CVa and the Ba calculated from the desirable specifications using the equation
TEa < 1.65 × CVa + Ba
Given the assignment of values was made by NMIA, bias was assumed to be zero for the CC trial; hence imprecision was the major error of measurement considered. With regard to serum testosterone, the Ricos biological variation data base for desirable specifications gives the CVi and CVg as 9.25% and 22.05% respectively. This is used to calculate the CVa as 4.63%, Ba as 5.98%, and TEa as 13.61%. Table 2 provides the current biological variation data available for all steroids.23
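The quoted figures can be reproduced with the widely used desirable specifications derived from biological variation: CVa = 0.5 × CVi, Ba = 0.25 × √(CVi² + CVg²) and TEa = 1.65 × CVa + Ba. The sketch below assumes these are the formulas applied, which is consistent with the values given; the function name is illustrative.

```python
from math import sqrt


def desirable_specs(cvi: float, cvg: float):
    """Desirable imprecision, bias and total allowable error (all in %)."""
    cva = 0.5 * cvi                          # desirable analytical imprecision
    bias = 0.25 * sqrt(cvi ** 2 + cvg ** 2)  # desirable bias
    tea = 1.65 * cva + bias                  # total allowable error
    return cva, bias, tea


# Testosterone values from the Ricos database as quoted in the text.
cva, ba, tea = desirable_specs(cvi=9.25, cvg=22.05)
print(f"CVa = {cva:.3f}%, Ba = {ba:.3f}%, TEa = {tea:.3f}%")
```

The unrounded results are approximately 4.625%, 5.978% and 13.609%, matching the reported 4.63%, 5.98% and 13.61%.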
Comparison of results over time: In 2010 the RCPAQAP commenced target setting of the serum testosterone material. From 2010 to 2015 the RCPAQAP end of cycle reports were compared for the number of participating laboratories, the LC-MS/MS peer group median bias and the median imprecision of the participants. Trends in imprecision and bias (from 2010) were used to assess between laboratory performance over time.19
Whilst there are some significant differences between the LC-MS/MS methods for the RCPAQAP participants, there are also areas of commonality demonstrated from the questionnaire. In brief, all participants responded to this questionnaire, i.e. a return rate of 100%. Eight laboratories (including NMIA) returned nine sets of results, with NMIA reporting results for two methods (LC-MS/MS and GC-HRMS). Most laboratories prepared samples by liquid-liquid extraction (LLE) and one used solid phase extraction (SPE). Consistent MRMs were seen (289>109 and 289>97) for the testosterone quantifier and qualifier, respectively. The source of the laboratory’s calibrator, the number of calibration levels and the deuterated sites on the internal standard differed (Table 3).
The NMIA methods and materials were used to assign target values to the RCPAQAP, and provide indicative values for the CC and Unknown samples throughout all studies. Initially the 2010 RCPAQAP Endocrine material’s target values were obtained from another metrology institute (WEQAS). In 2012 the WEQAS target values were compared to the target values obtained by NMIA for the RCPAQAP material; good agreement was demonstrated between NMIA and WEQAS targets (based on Bland-Altman difference plots where the 95% confidence interval included zero); results are not provided due to confidentiality. NMIA also assigned targets for the seven levels of the CC material and the values demonstrated good agreement with the gravimetric values supplied by the manufacturer of the CC (Table 4).
The values determined by NMIA were used as the assigned value for the CC trial. To determine if there was a statistically significant difference in results pre and post adjustment with the CC, results for the group of laboratories were compared by ANOVA and found to be statistically significant (p<0.05). ANOVA was also applied to the recalculation of the human testosterone samples against the CC, which demonstrated a significant change (p<0.05) in the group results for the male serum (NMIA assigned value 16.19 nmol/L), whereas the change for the female serum (NMIA assigned value 0.57 nmol/L) was not statistically significant for the group. The Bland-Altman plots pre and post adjustment with the common calibrator did not appear to tighten performance for the group as a whole; however, the group results did improve for the Unknown Female sample in terms of overall percentage difference (Figure 2). Additional detail of the results from the CC pilot is provided in the Appendix for the interested reader. This lack of significant change in the CC pilot is important, as it indicates that the different calibrators employed, and how they are prepared and used, all lead to the same results; i.e. performance was not improved by the CC. This is an area of strength for current laboratory practice and calibrator quality overall.
For the first time, in cycle 43 (first half of 2015), the median imprecision for the LC-MS/MS group (median CV of 4.2%) met the desirable imprecision for testosterone; i.e. >50% of LC-MS/MS participants achieved the biological variation imprecision target of 4.63%. On this median imprecision, the LC-MS/MS group is also the best performing analytical method group in the RCPAQAP, out-performing immunoassay methods. In addition, the median high level bias (based on RCPAQAP level 6, i.e. adult male concentration level) for the LC-MS/MS group has demonstrated at least desirable performance since cycle 35. Regression analysis has also improved for the LC-MS/MS group, with the linear regression for cycle 43 in 2015 being y = 0.99x + 0.314. The visual comparison of the median bias and median imprecision for the LC-MS/MS testosterone method group over time also demonstrates an overall improvement in performance (Figure 3).
There has been a flurry of LC-MS/MS methods in the peer review published literature over the past decade which attest to the accurate and precise analysis of serum steroids, particularly testosterone.30,35–50 LC-MS/MS technology does offer a number of significant advantages compared to immunoassay methods for many small molecular weight measurands. Some of these advantages include the simultaneous analysis of multiple steroids in the same run as well as improved specificity and sensitivity of the target analytes. A purported disadvantage is the current level of technical expertise required to run and interpret the data generated, however this is likely to present less of a problem in the near future as we move towards improved sample processing and reach agreement, with evidence, on best laboratory practice. The broad implementation of this technique into routine clinical biochemistry laboratories is now at hand and it is timely to ensure the methods are harmonised and where practicable fully standardised to safeguard their optimum clinical use.51,52
In this review we have provided a practical snapshot of the current routine LC-MS/MS serum testosterone methods used by medical testing laboratories in the Asia-Pacific Region. To provide ongoing comparison of method performance it is essential for laboratories to participate in a common EQA scheme; and here we have highlighted a number of benefits with the RCPAQAP Endocrine Program serving as an example. The review of participant methods, through the distribution of a questionnaire, determined that there are areas of commonality but also significant areas of difference. Even so, the group as a whole has seen an improvement in imprecision and bias in the last five years. This demonstrates the practical application of working together to improve harmonisation.
As part of the traceability chain, primary reference methods are required to provide the link to the working methods. Mass spectrometry is generally considered a superior technique compared to immunoassay for steroid hormones and as such is used as a reference anchor. LC-MS/MS methods also need this traceability anchor for steroids. GC-MS / GC-MS/MS / GC-HRMS verification provides a separate methodological platform for this purpose. This approach is ideal for steroids (which can be made volatile) as they often demonstrate improved separation by GC, and do not suffer from the significant problem of ion alteration due to phospholipid interference. GC analysis also has some disadvantages, namely the potential alteration of the target analyte during the derivatisation process, which is necessary for making the steroids “volatile”. When there is agreement between the results generated by LC-MS/MS and GC-MS, a solid anchor is provided for method harmonisation. The development by NMIA of a GC-MS/MS method to complement the LC-MS/MS methods demonstrates the value of this approach to support “trueness” of results.
Even with the aid of primary reference materials and methods, the use of a common secondary calibrator significantly improved only the distribution of the female Unknown human sample results relative to the indicative values provided by NMIA. The lack of change with the CC (other than for the female sample) can be viewed positively, in that good quality is already being delivered; indicating that value assignment of calibrators, commutability, and calibrator preparation and handling were done well. This is important as it indicates that the different calibrators in use, and how they are prepared and used, all lead to the same results (i.e. results not improved by a common calibrator), and hence reflects the strength of current laboratory and calibrator quality.
Trials to improve the standardisation of methods through the application of a common calibrator have been reported previously.53–55 The results from our CC pilot were consistent with these earlier studies. The change in bias did vary between laboratories. As expected, imprecision was not affected. Hence we postulate that the routine methods themselves (as distinct from the calibrator) may be influencing the imprecision and bias, with the likely influences including the areas of difference observed in our participant questionnaire. In particular we speculate that assay imprecision could be affected by: 1) instrument maintenance, especially of the ion source, and on rare occasions also cross-talk in the collision cell; 2) phospholipid interference, which may have contributed to this variation for some laboratories; and 3) the choice of isotope-labelled internal standard.6 Additional information and problems raised by our investigations related to the biological variation data and the practical approach to maintaining ongoing traceability.
To determine if the bias and imprecision in results were acceptable, biological variation data was used to establish clinically significant differences.23
In determining the acceptability of these new “recalculated” values, the acceptable limits were determined against the desirable total allowable error from the Ricos biological variation database.22,23 This data is, however, potentially problematic for serum testosterone as it is primarily generated from studies conducted on adult males.56–62 There is currently no biological variation data for serum testosterone available for children. There is, however, one study that provides combined information for adult male (n=13) and female (n=13) plasma testosterone levels analysed by immunoassay, which demonstrates the intra-individual and inter-individual CV for plasma testosterone to be 12.6% and 40.8% respectively.63 Hence the use of the biological variation database may not be appropriate to determine acceptability of analytical performance for serum testosterone levels in women and children. Further studies are therefore needed to determine the CVi and CVg in women and children using assays that are appropriate for the task.
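To illustrate how CVi and CVg translate into analytical goals, the sketch below applies the conventional "desirable" specification formulas commonly used with biological variation data: imprecision ≤ 0.5 CVi, bias ≤ 0.25 √(CVi² + CVg²), and total allowable error ≤ 1.65 × imprecision + bias. These formulas are not spelled out in the text above, so their use here is an assumption, and the function name is hypothetical. Applying them to the combined male and female plasma values (12.6% and 40.8%) is for illustration only and is not a proposed set of limits.

```python
import math


def desirable_specifications(cv_i, cv_g):
    """Desirable analytical goals from biological variation.

    cv_i: within-subject (intra-individual) CV, in %.
    cv_g: between-subject (inter-individual) CV, in %.
    Returns (imprecision, bias, total allowable error), all in %.
    """
    imprecision = 0.5 * cv_i
    bias = 0.25 * math.sqrt(cv_i ** 2 + cv_g ** 2)
    total_error = 1.65 * imprecision + bias
    return imprecision, bias, total_error


if __name__ == "__main__":
    # CVi and CVg from the combined male/female plasma study cited above.
    imp, bias, tea = desirable_specifications(12.6, 40.8)
    print(f"imprecision <= {imp:.1f}%, bias <= {bias:.1f}%, TEa <= {tea:.1f}%")
```

With these inputs the limits are considerably wider than the 4.63% imprecision target applied to the EQA data, which illustrates how strongly the acceptance criteria depend on the population from which the biological variation data were derived.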
Other harmonisation initiatives have also shown that LC-MS/MS methods do not fully meet the desirable imprecision, bias and total allowable error performance criteria. In a study of the certified assays in the Centers for Disease Control and Prevention (CDC) HoSt program, five “certified” laboratories (four LC-MS/MS and one immunoassay) were challenged with 40 specimens that had assigned testosterone values based on the CDC reference method. These laboratories were compared over a one-year period against the CDC reference method. Biological variation data was used to determine acceptable performance. None of the LC-MS/MS laboratories achieved desirable imprecision or bias 100% of the time, with only one laboratory showing 100% desirable imprecision for males. Only one LC-MS/MS laboratory met the desirable TEa 100% of the time. Even when the minimum performance criteria were applied, only one out of four LC-MS/MS laboratories met the minimum imprecision level and two out of four met the bias and TEa performance criteria.64 “As has been pointed out previously, simply implementing a mass spectrometry-based assay does not equate with accuracy and precision; it is essential that all assays are rigorously validated”.64 Having an appreciation of the methods used by these laboratories would provide insight into the similarities and differences that may be influencing LC-MS/MS method performance.
The choice of isotope labelling for the internal standard for LC-MS/MS analysis of serum testosterone can potentially have a significant influence on the results. The outcome of our method questionnaire demonstrated that the choice of internal standard varied between laboratories, with four (50%) laboratories using D2. It is generally considered better to use a D3 or higher deuterium-labelled internal standard as there are fewer isotope effects compared with D2.29 A study by Owen and colleagues comparing D2 with D5 and also C13 internal standards for testosterone demonstrates the influence of the choice of internal standard on patient results, with this variability being consistent across the male and female serum testosterone range.65 In Owen’s study, the D2 results were reported to “give results close to the reference method using conditions described here, but this may not give the best results using different sample clean-up procedures and chromatography columns”.65 The D5 and C13 results (compared to D2) were generally lower.65 The selection of D2 is not ideal as it is only two daltons heavier than the target analyte, which may lead to interference from the target analyte at high concentrations due to the presence of 13C2 isotopomers of the target.66 On the other hand, stable isotope-labelled internal standards can only compensate for ion alteration effects if they co-elute with the compound and hence are present in the ion source at the same time. The greater the number of deuterium atoms, the less likely the internal standard is to co-elute; however, D5 is usually acceptable for low resolution chromatography. C13-labelled testosterone is considered a more stable and acceptable, albeit more expensive, alternative to deuterium-labelled internal standards. It is now commercially available and overall may be a better choice to support serum testosterone harmonisation.
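The size of the 13C2 contribution mentioned above can be estimated with a simple binomial calculation. The sketch below is a rough illustration only: it considers carbon alone for testosterone (C19H28O2), uses an approximate natural 13C abundance of 1.07%, and ignores 18O and 2H contributions as well as the effect of fragmentation, so the overlap observed in a particular MS/MS transition may differ. The function names are hypothetical.

```python
from math import comb

C13_ABUNDANCE = 0.0107   # approximate natural abundance of carbon-13
N_CARBONS = 19           # testosterone is C19H28O2


def fraction_with_k_c13(k, n=N_CARBONS, p=C13_ABUNDANCE):
    """Binomial probability that exactly k of n carbon atoms are 13C."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def m_plus_2_relative_to_monoisotopic():
    """Ratio of the 13C2 species to the all-12C species (carbon only)."""
    return fraction_with_k_c13(2) / fraction_with_k_c13(0)


if __name__ == "__main__":
    ratio = m_plus_2_relative_to_monoisotopic()
    print(f"13C2 species relative to monoisotopic: {100 * ratio:.1f}%")
```

Under these assumptions, the 13C2 species amounts to roughly 2% of the monoisotopic signal. This is small in absolute terms, but it may become relevant when the analyte concentration greatly exceeds that of a D2-labelled internal standard.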
The collaborative process of harmonisation of LC-MS/MS measurement of serum testosterone highlights the advantages of, and also the issues related to, establishing and maintaining standardisation with traceability. These background studies also provide a mechanism to generate initial recommendations and to identify associated gaps in knowledge related to the LC-MS/MS analysis of serum testosterone, which are presented in Table 5.
The determination of commutability of the calibrators and EQA material is vital. In this study we have presumed commutability on the basis of the slopes observed for the two Unknown samples compared to those for the RCPAQAP and CC material (Appendix). In addition, the data generated from the RCPAQAP Liquid Serum Chemistry Program also supports commutability of the material for the MS based methods (data not shown). Even so, there is more work to be performed in this area, which could include the sharing of more native samples to: (a) demonstrate the between-method assay performance and (b) validate (or not) the commutability of the EQA material for LC-MS/MS methods. However, the validation of each laboratory’s calibrator through sample sharing is not an easy process. Hence, the formal validation of commutability of the RCPAQAP (and other EQA program) material is essential to monitor and interpret data related to harmonisation.
The findings of our harmonisation project provide directions for further harmonisation of mass spectrometry based serum steroid methods. Even with these initial recommendations in place there is still need to address a number of other issues, which include:
Since 2010, when we commenced the formal collaborative process to improve agreement of our LC-MS/MS methods, we have achieved an overall group imprecision for serum testosterone that meets the desirable specifications based on biological variation. Whilst the selection of serum testosterone as the ‘test case’ model for this collaborative process was based on the common interest of the founding group membership, it also proved to be the ideal steroid to trial. This was because there was already clear information related to testosterone in the JCTLM and biological variation databases, and because the common EQA program used traceable targets. Unfortunately, this is not the case for a number of other steroids listed in the RCPAQAP Endocrine program. Hence, alternative processes will also need to be considered to aid the progression of harmonisation of all common steroids measured by LC-MS/MS.
Obtaining between-laboratory agreement of results is a challenging process. Here we have highlighted the current challenges of standardisation for serum steroids measured by LC-MS/MS, focusing on serum testosterone. This is the first such report to fully characterise the LC-MS/MS methods for serum testosterone in common use by clinical diagnostic laboratories. In all aspects, participation in a common EQA program is essential to ensure ongoing agreement of results, and the activities presented here provide a practical demonstration of the central importance of EQA in this process. It is gratifying to know that over time, as a group, performance in the common EQA scheme has significantly improved in terms of both imprecision and bias.
First and foremost we wish to thank Ms Danny Sampson for her impetus and encouragement for the formation of this working group. We wish to thank Drs Veronica Vamathevan, John Murby and Lindsey Mackay from the National Measurement Institute in North Ryde, NSW for their tireless work in promoting traceability and developing the serum testosterone reference methods. In relation to this project, there were many additional people who have supported this work and we gratefully extend our thanks to the following individuals: Mr Michael Rennie from PM Separations for his coordination of distribution of the CC study materials to each participating laboratory; Mr Ian Farrance for his input into the statistical analysis; Dr Christa Cobbaert for her insights into the practical assessment of commutability; Ms Jill Tate for her vast unassuming knowledge and support of this project as chair of the AACB Harmonisation Committee; and Dr Leslie Lai for his unyielding support for this project as President of the APFCB.
|Laboratory||RCPAQAP Material Slope||Common Calibrator Material Slope||Unknown Fresh frozen serum Slope|
|Summary of slopes|
|SD of slope||0.046||0.045||0.049|
|Minus 2SD of slope||0.873||0.908||0.878|
|Plus 2SD of slope||1.058||1.089||1.074|
|Potential overall difference (95% range) between slopes||19%||18%||20%|
Note: Laboratory 8 was the reference laboratory, i.e. NMIA LC-MS/MS method.
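The summary rows of the table can be cross-checked with simple arithmetic. The sketch below takes the tabulated mean ± 2SD limits and computes the width of each interval and the mean slope it implies. Reading the "potential overall difference" row as the width of the ± 2SD interval expressed in percent is an assumption, although it does reproduce the tabulated values to within rounding. The function names are hypothetical.

```python
# Summary rows from the table above: (minus 2SD, plus 2SD) of slope.
SLOPE_LIMITS = {
    "RCPAQAP material": (0.873, 1.058),
    "Common calibrator material": (0.908, 1.089),
    "Unknown fresh frozen serum": (0.878, 1.074),
}


def slope_range_percent(lower, upper):
    """Width of the mean +/- 2SD interval, in percentage points of slope."""
    return 100.0 * (upper - lower)


def implied_mean_slope(lower, upper):
    """Midpoint of the interval, i.e. the mean slope it implies."""
    return (lower + upper) / 2.0


if __name__ == "__main__":
    for material, (lower, upper) in SLOPE_LIMITS.items():
        width = slope_range_percent(lower, upper)
        mean = implied_mean_slope(lower, upper)
        print(f"{material}: mean slope {mean:.3f}, 95% range {width:.1f}%")
```

The three materials give similar interval widths (about 18% to 20%), which is the observation used in the text to support the presumption of commutability.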
Competing Interests: None declared (RFG, CSH, KEH, JJ, JPG, BCM, CF, YPI, BRC, CB, HTP, LMJ). TK and HTP are employees of Biocrates Life Sciences AG.