This study clearly demonstrates the unsatisfactory agreement between commonly used diagnostic tests for LM. The major methodological problem is the lack of a "gold standard" for the diagnosis of LM. Jejunal biopsy for assessment of lactase activity is unreliable because lactase is unevenly distributed in the intestine. The available genetic test does not detect all genetic disorders related to LM and does not diagnose secondary LM. Breath tests are highly dependent on the microflora throughout the gut, and serum glucose depends on glucose absorption and metabolism [15]. Therefore, this study primarily assesses the agreement between the test variables. However, because assessing agreement between tests requires each test result to be classified as positive or negative, we had to diagnose LM in each patient. The final diagnosis of LM was based on an overall evaluation of all tests performed in each subject; this is the only applicable method when no formal "gold standard" is available. Once the diagnosis was established, the best cut-off levels (normal values) were chosen for each of the continuous variables. High sensitivity was preferred, to avoid false-negative results at the expense of lower specificity.
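The preference for high sensitivity when choosing a cut-off can be expressed as a simple search. The sketch below is an illustration only, not the procedure used in the study: it assumes that a value at or above the cut-off counts as a positive test, and the function name `choose_cutoff` and the data are hypothetical. Among observed values, it keeps the highest cut-off whose sensitivity still meets a required minimum, which gives the best specificity available at that sensitivity.

```python
from typing import Optional, Sequence


def choose_cutoff(values: Sequence[float], has_lm: Sequence[bool],
                  min_sensitivity: float = 1.0) -> Optional[float]:
    """Return the highest observed cut-off (value >= cut-off counts as a
    positive test) whose sensitivity is still >= min_sensitivity.

    Sensitivity can only fall as the cut-off rises, while specificity can
    only rise, so the highest qualifying cut-off gives the best specificity
    attainable at the required sensitivity.
    """
    n_diseased = sum(has_lm)
    if n_diseased == 0:
        raise ValueError("at least one subject with LM is required")
    best = None
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, d in zip(values, has_lm) if d and v >= cutoff)
        if true_pos / n_diseased >= min_sensitivity:
            best = cutoff  # later (higher) qualifying cut-offs overwrite this
    return best


# Hypothetical data: one continuous test result per subject and the final
# (overall-evaluation) diagnosis of LM for that subject.
values = [4, 9, 15, 22, 30, 35, 48, 61, 75]
has_lm = [False, False, False, True, False, True, True, True, True]

print(choose_cutoff(values, has_lm))       # 22: every LM case is positive
print(choose_cutoff(values, has_lm, 0.8))  # 35: one missed case is tolerated
```

Lowering `min_sensitivity` trades false negatives for fewer false positives; requiring 100% sensitivity mirrors the choice described above of avoiding false-negative results.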
The lactose breath test with measurement of H2 + CH4x2 was judged the best test. It was superior to H2 + CH4 because of better sensitivity and a somewhat larger area under the ROC curve (tables and ). Sensitivity and specificity were 100% at cut-off levels (normal values) of < 18 ppm and < 53 ppm, respectively. Results from 18 ppm to 52 ppm therefore require further testing to reach a conclusive diagnosis. As expected, the agreement between H2 + CH4 and H2 + CH4x2 was very good, because most subjects with LM predominantly produce H2 and the two variables are slight modifications of each other. Nevertheless, H2 + CH4x2 seems preferable in clinical use and has satisfactory diagnostic properties.
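The two cut-offs define three zones for a breath-test result. The sketch below shows that reading. The text does not state how the H2 + CH4x2 value is derived from the samples, so `composite_rise` is a hypothetical derivation: it assumes that "CH4x2" means the CH4 concentration multiplied by 2, that the first sample is the fasting baseline, and that the variable is the largest rise above that baseline. The sample values are invented.

```python
from typing import Sequence

# Cut-offs for H2 + CH4x2 as reported in the text (ppm).
RULE_OUT_BELOW = 18  # cut-off at which sensitivity was 100%: below it, LM is unlikely
RULE_IN_FROM = 53    # cut-off at which specificity was 100%: from here up, LM is likely


def composite_rise(h2_ppm: Sequence[float], ch4_ppm: Sequence[float]) -> float:
    """Hypothetical derivation of the H2 + CH4x2 variable: the largest rise of
    (H2 + 2 * CH4) above the fasting sample. Assumes the first sample is the
    fasting baseline and the rest follow the lactose dose in time order."""
    if len(h2_ppm) != len(ch4_ppm) or len(h2_ppm) < 2:
        raise ValueError("need matching H2/CH4 series with a baseline and a later sample")
    combined = [h2 + 2 * ch4 for h2, ch4 in zip(h2_ppm, ch4_ppm)]
    return max(combined[1:]) - combined[0]


def classify(value: float) -> str:
    """Three-zone reading of the composite value."""
    if value < RULE_OUT_BELOW:
        return "negative"
    if value >= RULE_IN_FROM:
        return "positive"
    return "inconclusive"  # 18-52 ppm: further tests needed


# Hypothetical five-sample series (baseline plus four post-dose samples).
rise = composite_rise([4, 6, 15, 30, 28], [2, 2, 3, 5, 6])
print(rise, classify(rise))  # 32 inconclusive
```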
Breath tests measuring only H2 have been judged reliable tests for LM [16,17]. The recently published Rome Consensus Conference report states that measurement of breath CH4 excretion is not currently recommended for improving the diagnostic accuracy of the H2 breath test because evidence is lacking, and that further studies of gases other than H2 (mainly CH4) should be encouraged [15]. In this study, the agreement between H2 and any combination of H2 and CH4 was very good, but H2 alone was inferior to H2 + CH4x2 because of its lower sensitivity. Since about 30% of adults are so-called CH4 producers, and methanogenesis consumes large quantities of H2 to produce CH4, it is reasonable to measure CH4 in addition to H2. This study showed that measuring CH4 in addition to H2 increased the diagnostic accuracy of the breath test, and that H2 + CH4x2 was the best variable even though the concentration of CH4 varies both in the fasting state and after meals [18].
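Why methanogenesis removes H2 from the breath can be seen from the overall reaction of hydrogenotrophic methanogenesis, in which four molecules of H2 are consumed for every molecule of CH4 formed. This is standard stoichiometry rather than a result of this study, and it is given only to illustrate the H2 loss; it does not by itself derive the factor of 2 used in H2 + CH4x2.

$$4\,\mathrm{H_2} + \mathrm{CO_2} \longrightarrow \mathrm{CH_4} + 2\,\mathrm{H_2O}$$

In a CH4 producer, H2 that is converted in this way never appears as H2 in the breath, so a test that measures H2 alone can underestimate colonic fermentation of lactose.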
S-glucose is an alternative to the breath test. Its agreement with the breath test was modest and clinically unsatisfactory, but its diagnostic properties (sensitivity, specificity, PPV, NPV, and LR) were identical to those of H2 (table ). Poor agreement combined with identical diagnostic properties arises because the two methods classify different subjects as healthy or diseased. Since no gold standard is available, it is impossible to judge between them. However, because of the low specificity of s-glucose and its smaller area under the ROC curve, we conclude, in accordance with other publications, that it is inferior to the breath test with measurement of H2 or H2 + CH4x2 (tables and ) [17,19].
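How two tests can share identical diagnostic properties yet agree poorly can be illustrated with a small sketch. Everything here is hypothetical: the 20 subjects and the results are invented, and Cohen's kappa is assumed as the agreement statistic because this passage does not name one. Tests A and B each detect 4 of 5 LM cases and give 3 false positives among 15 non-LM subjects, but they miss different cases and flag different healthy subjects.

```python
from typing import Dict, Sequence


def diagnostic_properties(test: Sequence[bool], reference: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV and likelihood ratios from a 2x2 table."""
    tp = sum(t and r for t, r in zip(test, reference))
    fp = sum(t and not r for t, r in zip(test, reference))
    fn = sum(not t and r for t, r in zip(test, reference))
    tn = sum(not t and not r for t, r in zip(test, reference))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "LR+": sens / (1 - spec) if spec < 1 else float("inf"),
        "LR-": (1 - sens) / spec,
    }


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Chance-corrected agreement between two dichotomous tests."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if expected == 1:
        raise ValueError("kappa is undefined when both tests are constant")
    return (observed - expected) / (1 - expected)


# Hypothetical cohort: subjects 0-4 have LM, subjects 5-19 do not.
reference = [i < 5 for i in range(20)]
test_a = [i in {0, 1, 2, 3, 5, 6, 7} for i in range(20)]
test_b = [i in {1, 2, 3, 4, 8, 9, 10} for i in range(20)]

print(diagnostic_properties(test_a, reference) == diagnostic_properties(test_b, reference))  # True
print(round(cohens_kappa(test_a, test_b), 2))  # 0.12
```

Both tests reach sensitivity and specificity of 0.8 against the reference, yet they agree on only 12 of 20 subjects, and kappa is close to zero.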
Registration of symptoms after intake of lactose has been used as a simple test for LM [14]. Evaluation of the onset, severity and duration of symptoms for 8 hours has been recommended [15]. In accordance with previous reports, this study shows that symptoms in general are highly unreliable and unfit for clinical use [2]. Symptom questionnaires and symptom-based criteria such as "Early and Long Lasting (ELL)-symptoms" have better diagnostic properties [2,16]. In this study, ELL-symptoms were superior to unspecified symptoms after intake of lactose. These findings agree with the Rome Consensus Conference recommendation that symptoms should be evaluated during and for some hours after the test, and that onset and duration are important [15]. However, even ELL-symptoms showed unacceptable diagnostic properties and poor agreement with all of the other tests (table and ). This fits with the clinical observations that the prevalence of perceived lactose intolerance, which is also related to visceral hypersensitivity, is significantly higher than that of LM, and that subjects with LM can consume a variable but limited amount of lactose without developing symptoms [9,11,20].
In this study, the genetic test was performed in only ten subjects with discrepant results at the first combined test, so the results give limited information about the usefulness of the test. The test is probably highly indicative of lactase non-persistence in adults [12,13]. However, because LM may also be caused by other genetic abnormalities and by organic disorders of the gastrointestinal tract, the clinical utility of the test is limited [21,22].
The selection criteria were pragmatic and based on perceived milk intolerance or on symptoms that the doctor judged to be possible LM. The selection was not strictly scientific but reflected everyday practice. The prevalence of LM was rather low, even though most patients had symptoms related to intake of milk or lactose and were referred with suspected LM. This is in accordance with other Scandinavian studies showing a low prevalence of LM both in the general population and in patients with FGID [2,3]. A somewhat lower BMI was the only clinical characteristic distinguishing subjects with LM; this has also been reported in other trials [23].
The way breath tests are performed varies. In this study, the tests were performed according to recently published guidelines concerning devices for breath sampling, stationary and immediate analyses, prolonged expiration and correction for alveolar CO2, use of antibiotics, diet, cigarette smoking and physical exercise [15]. However, no mouth washing was performed, and prior colonic cleansing was not sufficiently taken into account, although it was never performed shortly before the test. A three-sample H2 breath test is preferable to a two-sample test [24], so the five-sample test used in this trial strengthens the results. The test lasted three hours, whereas four hours has been recommended because some subjects have slow transit [15]. Overall, these minor deviations from the recently published recommendations are unlikely to have had any significant influence on the results. The dose of lactose also varies. The dose used in this trial, 25 g lactose (equivalent to 500 mL of milk), is the recommended dose and seems reasonable [15]. This amount causes symptoms in most subjects with LM and is within the range of normal consumption [9,11,25].
Practical and correct dietary advice to patients with FGID and food intolerance is impossible without valid and reliable tests for food intolerance, and such tests are largely lacking. Patients with food intolerance often make unnecessary dietary changes, which in some cases result in malnutrition [6,26]. Further improvement of the diagnostic armamentarium for food intolerance is needed to improve dietary treatment.