Inspired by the international consensus on defining and grading of bruxism (Lobbezoo F, Ahlberg J, Glaros AG, Kato T, Koyano K, Lavigne GJ et al. J Oral Rehabil. 2013;40:2), this commentary examines its contribution and underlying assumptions for defining sleep bruxism (SB). The consensus’ parsimonious redefinition of bruxism as a behaviour is an advance, but we explore an implied question: might SB be more than behaviour? Behaviours do not inherently require clinical treatment, making the consensus-proposed ‘diagnostic grading system’ inappropriate. However, diagnostic grading might be useful, if SB were considered a disorder. Therefore, to fully appreciate the contribution of the consensus statement, we first consider standards and evidence for determining whether SB is a disorder characterised by harmful dysfunction or a risk factor increasing probability of a disorder. Second, the strengths and weaknesses of the consensus statement’s proposed ‘diagnostic grading system’ are examined. The strongest evidence-to-date does not support SB as disorder as implied by ‘diagnosis’. Behaviour alone is not diagnosed; disorders are. Considered even as a grading system of behaviour, the proposed system is weakened by poor sensitivity of self-report for direct polysomnographic (PSG)-classified SB and poor associations between clinical judgments of SB and portable PSG; reliance on dichotomised reports; and failure to consider SB behaviour on a continuum, measurable and definable through valid behavioural observation. To date, evidence for validity of self-report or clinician report in placing SB behaviour on a continuum is lacking, raising concerns about their potential utility in any bruxism behavioural grading system, and handicapping future study of whether SB may be a useful risk factor for, or itself a disorder requiring treatment.
Noted as the top cited paper in all of 2013 in the Journal of Oral Rehabilitation, ‘Bruxism defined and graded: an international consensus’ (1) meets a pressing clinical and research need by presenting a novel definition and assessment method for a controversial phenomenon that is rarely directly observable and measurable, that is repetitive jaw muscle activity involving grinding or clenching of the teeth either during sleep or when awake. Recently, the authors of this commentary had multiple discussion sessions in which they considered the implications of the consensus statement for viewing SB as a behaviour, disorder or risk factor. The resulting commentary represents the product of their discussions. We considered reconvening the original consensus group to try to create a new consensus, but concluded that it would be inefficient and that a new consensus might never be achieved. Instead, we present our ideas as a commentary, acknowledging paradigm-breaking aspects of the 2013 paper, but also noting critical areas requiring more discussion and revision. Even though a reader may consider it precedent setting for the first author of the consensus paper to critique ‘his own’ prior paper, we believe that it provides an encouraging and forward-thinking model of the process by which opinions and positions can evolve.
The consensus paper raises insightful points, particularly the provision of a parsimonious revised definition emphasising that bruxism is a behaviour or activity, not clearly a ‘habit’ and not clearly defined as a ‘disorder’. It provides a definition that is circumscribed and specific in its description. In doing so, it moves us away from the concept of bruxism as an abnormality, as even a statistical abnormality is not a clinical abnormality unless it is clearly associated with a negative health outcome. In this commentary, we also outline how this advance may be undermined by the consensus statement’s proposal for a ‘diagnostic grading system’, implying that bruxism is a disorder.
Finally, the consensus paper advances the field by clearly separating sleep bruxism from awake bruxism. Unfortunately, the knowledge base concerning awake bruxism is more limited than the literature concerning sleep bruxism (SB). Thus, our subsequent comments focus on SB.
Behaviours of all types, including SB, can be deemed worthy of research study. On the other hand, considering SB to be a disorder or risk factor for a disorder has important clinical implications. Defining a behaviour as a disorder (or a strong risk factor for a disorder) implies a need for clinical treatment and management. If no effective treatment is available, it implies the need to develop effective treatment. Diagnosing harmless behaviours as disorders wastes clinician and patient time, resources and effort, and potentially causes unnecessary patient distress and risk of the possible negative side effects of unnecessary treatment.
When we move beyond simple classification or description of SB behaviour into the realm of diagnosis of the disorder of SB, we need some standard for deciding that SB is part of a disorder.
As proposed by Jerome Wakefield, the criterion of ‘harmful dysfunction’ can be used to identify a characteristic or behaviour for which a disorder may be defined, that is:
…dysfunction is a scientific and factual term based in evolutionary biology that refers to the failure of an internal mechanism to perform a natural function for which it was designed, and harmful is a value term referring to the consequences that occur to the person because of the dysfunction and are deemed negative by sociocultural standards (2).
Deviations that do not harm an individual are not labelled as disorders; only if the deviation occurs because a regulatory function of the masticatory system is not working as it should and the dysfunction affects the overall well-being of the person in some way, then it may be conceptualised as a disorder. Moreover, in dentistry, some dysfunctions that are not statistical deviations (e.g. dental caries or periodontal disease) are nevertheless diagnosable disorders (2).
For SB, the literature on negative oral health outcomes associated with it has been overwhelmingly based on assessment methods that do not represent the current gold standard of assessment for the behaviour of SB: polysomnography (PSG) with audiovisual recordings designed to aid differentiation of other sleep movements from SB.
In general, when SB is based on the gold standard rather than self or clinical report, the diagnosis-required association between SB and negative health outcomes becomes weak or even non-existent, according to a number of critical reviews (3–5). Thus, it can be easily argued that studies using current state-of-the-art methods to assess SB have not yet consistently identified a negative health outcome that can be attributed to behaviour mirroring the consensus statement’s definition of bruxism.
From a paleoanthropological perspective (6–9), tooth grinding that is largely characteristic of SB may have had an adaptive purpose in keeping teeth sharp, and the behaviour may have been maintained over the course of evolution. The consensus paper (1) even speculates that SB may have positive physiological functions, such as sustaining unobstructed airflow. In a related manner, SB may aid salivary lubrication during sleep, thereby protecting health of the upper alimentary tract (10–13).
In contrast to viewing SB as a disorder, might it be a risk factor? Viewing SB as a risk factor would mean that, when it occurs at a certain frequency or intensity, it increases likelihood of an individual developing a health disorder. A risk factor need not meet the standards of being a disorder itself, that is inherently indicating a harmful dysfunction, but it must show at least a regular, statistically significant association with a health disorder. Again, the critical reviews of the literature on SB (3–5) fail to support existence of the required associations between SB and oral health disorders.
Is there a point at which SB is associated with harmful consequences? Might we eventually establish a useful cut-point for defining SB as a risk factor? The specific cut-point to consider SB as a risk factor may vary, depending upon the specific harmful health outcome that might ultimately be empirically demonstrated to be a consequence of SB, for example tooth damage, temporomandibular disorders (TMDs) and periodontal damage. Similarly, various vulnerability factors such as genetic propensities (14) may eventually be shown to interact with SB activity characterised by these cut-points to produce harm. To the extent that SB becomes a risk factor only in certain vulnerable individuals, we would need to be able to efficiently obtain a marker of the vulnerability factor(s).
A special digital edition of the British Medical Journal (15) expresses concern that we spend so much time in clinical practice managing the proliferation of risk factors for disease, so-called incidentalomas (16), that we get into the bad habit of overdiagnosing by labelling a risk factor for a disease/disorder as a disorder itself, when it is a characteristic or behaviour rather than a harmful dysfunction. Of course, one disease or disorder could be a risk factor for another disease or disorder, but a risk factor is not inherently a disorder. A risk factor could be a behaviour or activity, such as SB, or a static or predictable characteristic of an individual such as gender or age.
Moreover, even a statistically significant predictive risk factor is only worth assessing clinically if it is reliably and practically assessed (17). Behaviours such as SB are inherently going to be more difficult to assess than static, stable or easily predictable characteristics. In addition, the most clinically useful risk factors are not just predictors of a disorder but are also modifiable. Thus, continuously distributed characteristics such as blood pressure and bone density have established cut-points (i.e. for defining hypertension and osteoporosis, respectively) that maximise their association with disorders such as cardiovascular disease and hip fracture, respectively. Treatments for these risk factors have been developed that have been shown to reduce the risk of occurrence of their associated disorder or disease. Consider the knowledge base needed to develop cut-point defined risk factors for hypertension and osteoporosis. For SB, we are concerned that, other than some type of PSG recording, there is no practicable method for assessing SB on a large scale, let alone demonstration of a reliable cut-point that maximises its relationship with an oral health outcome. Such a method would need to be identified in order for SB to be considered a clinically useful risk factor, one worthy of routine clinical assessment. Some automated home-based systems have been suggested as efficient algorithmic alternatives to the laboratory-based PSG gold standard for SB, but these methods, although less cumbersome than a sleep laboratory study, are still cumbersome for large-scale use.
We have argued that the data have not yet clearly identified SB as a disorder itself or a risk factor for negative oral health outcomes. Practical constraints make it even less likely that it will eventually be considered a clinically useful risk factor. Nevertheless, this position does not discount the possibility that there are extreme cases in which a plausible post hoc explanation for a severe oral health disorder is that the patient engaged in severe SB. However, the occurrence of unusual situations in which SB might be a risk factor for a major negative health outcome occasionally does not justify routinely viewing it as a disorder or routinely treating it, nor does it replace the need to gather more evidence examining the unusual situation in which SB appears to have major health consequences.
Lavigne et al’s (18) Research Diagnostic Criteria for SB (RDC/SB) are generally considered to represent the gold standard for diagnosis of SB behaviour as a disorder. The original RDC/SB criteria (18) were based on quantitative/frequency cut-points intended to maximise sensitivity and specificity relative to the American Sleep Disorders Association (ASDA) International Classification definition of SB (19). The ASDA definition and the RDC/SB PSG-based quantification of behaviour, designed to maximise correspondence with the definition, examine the characteristics of SB that would indicate a harmful dysfunction. Lavigne et al.’s RDC/SB were later revised (20) to include a ‘moderate SB’ group that appeared at higher risk of having masticatory muscle pain than high-frequency sleep bruxers. Although the RDC/SB should be acknowledged as a valiant and well-cited initial approach, the validity of the diagnostic rules was inextricably linked to the ASDA definition which is confounded with health outcomes by assumption rather than evidence. ASDA required evidence of either abnormal tooth wear, sounds associated with bruxism or jaw muscle discomfort, in addition to self-reported SB. It does not consider whether gold-standard PSG-assessed SB is actually associated with these harms. The authors of the ASDA definition appear to assume that use of a behavioural assessment method known or presumed to be weak (i.e. self-report) can be hypothetically improved by only paying attention to the unreliable behavioural report when it is accompanied by a variety of health consequences. Of course, this approach is problematic, because it fails to consider that such consequences have only been reported when the behaviour is poorly measured, not when it is measured by better PSG methods. 
Thus, it is no surprise that those using the tautological ASDA definition to test a relation between SB the behaviour and facial pain (21) or between SB and tooth wear (22) occasionally find an association between SB and those health outcomes. It is actually startling to acknowledge that, given the definition with which the PSG-based RDC/SB is intended to maximise correspondence, only one study found an association between TMD status and RDC/SB (23), which was not apparent in an earlier subsample (24) and which found such high rates of SB in both case (63%) and control (33%) samples that either standards for SB or overall sample selection (or both) raise serious questions about research design. In contrast, multiple other studies have failed to find a relation between RDC/SB-diagnosed SB and TMD pain (3, 20, 25–27). Hence, whether the behaviour of SB bears on prognosis and therapy (28) cannot be assessed using the ASDA definition. The PSG-based RDC/SB system with cut-points or cut bands (29) designed to maximise correspondence with the ASDA diagnostic criteria is also somewhat limited for this purpose.
We are aware of no studies that compare clinical diagnoses of SB with the current gold standard measures of SB. However, clinical diagnoses of SB have failed to significantly relate to portable electromyographic (EMG)-based diagnoses of SB (30) using the most promising portable EMG system (31). Moreover, clinical ratings of SB based on augmented stone casts (i.e. gold-plated molar casts, with fine attrition detail) have poor inter-rater and intra-rater reliability (32), which is unrelated to the clinician’s confidence that bruxism can be assessed via tooth wear on stone casts. More recent studies using stone casts (33) reach similar conclusions. Note that these conclusions do not contradict findings that tooth wear, especially if wear is advanced and dentin is exposed (34, 35), can be scored reliably in clinical settings, but it nevertheless requires training (35) and/or knowledge of a formalised grading system (34, 36). Neither of these resources is likely to be used by most dentists making judgments about SB in usual clinical settings. Furthermore, self-reported and clinically based diagnoses of SB in TMD patients have unacceptably low levels of agreement. Most recently, a study examining the ability of a variety of signs and symptoms to predict the gold standard of PSG-based RDC/SB (18) concluded that none was able to identify those with SB accurately (37). Even the more strongly associated symptoms such as temporal headaches and muscle fatigue, all had positive predictive values below 30%, when properly adjusting for an estimated population prevalence of PSG-based RDC/SB of 10% or less (38) rather than the artificially constructed case–control sample RDC/SB prevalence rate of 50%. As SB has yet to be demonstrated as reliably associated with a clinical condition, we have no reason to anticipate higher than a 10% RDC/SB prevalence rates in clinical samples.
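The effect of adjusting for prevalence can be illustrated with Bayes' rule. The following is a minimal sketch only: the sensitivity and specificity values are hypothetical placeholders chosen for illustration, not figures from the cited study (37), and the function name is our own. It shows how the same sign can look useful in a 50% case–control sample yet have a positive predictive value below 30% at a 10% population prevalence.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a person with a positive sign truly has PSG-based SB."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical accuracy of a clinical sign (assumed values, not from the source).
SENSITIVITY = 0.60
SPECIFICITY = 0.70

# Artificially balanced case-control sample: half of participants have SB.
ppv_case_control = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.50)

# Estimated population prevalence of PSG-based RDC/SB of about 10%.
ppv_population = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.10)

print(f"PPV at 50% prevalence: {ppv_case_control:.2f}")
print(f"PPV at 10% prevalence: {ppv_population:.2f}")
```

Under these assumed values, most positive signs in a population sample would be false positives, even though the sign's sensitivity and specificity are unchanged.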
In theory, any behaviour can be classified as present or absent, or measured on some multipoint or continuous scale. Classification or taxonomic rules can be applied to any activity, characteristic or object, whether or not the activity represents a disorder. Classification systems are based on observable bundles of traits or characteristics. In medicine and dentistry, our taxonomic rules must meet an additional standard: for diagnosis of a disorder, we focus on classifications which bear on health-related prognosis and, perhaps, choice of therapy (28). Frameworks like Wakefield’s (2) harmful dysfunction criteria are particularly useful to set standards for determining whether a disorder is present, one requiring a method for diagnosis.
Although the consensus statement paper never explicitly discusses the concept of SB as a disorder, it proposes a systematic ‘diagnostic grading system’. This is puzzling, because one does not diagnose mere behaviour; one diagnoses disorders. The consensus paper may have created a grading system to order the strength of evidence from different methods for assessing bruxism, but use of ‘diagnostic’ terminology is misleading.
In the diagnostic grading system, the consensus paper (1) proposes that using only self-report questionnaires can identify ‘possible’ bruxism, while the addition of clinical examination is required to identify ‘probable’ bruxism. ‘Definite’ SB is proposed as requiring both of these as well as PSG.
Neither the PSG-based RDC/SB (18) nor the diagnostic grading system in the consensus paper move us efficiently in the direction of determining whether SB behaviour might be associated with negative health outcomes and be either a disorder itself or a risk factor for a disorder. Diagnosis implies a dichotomous classification at a clinically meaningful cut-point that is designed to identify levels or frequencies of behaviours likely to represent a harmful dysfunction. Clearly, the diagnostic grading system in the consensus paper considers direct behavioural observation to be superior, but no reference to clinical prognosis or appropriate therapy is incorporated into the diagnostic grading system. The international consensus grading system inherently encourages weighing more heavily evidence of bruxism from methods that have produced strong evidence of reliability and direct observation suggestive of validity, but fails to reject the use of weaker, potentially error-filled methods.
Moreover, the ‘stackable’ grading system turns out not to be tenable. It is based on the assumption that self-report is fully sensitive but insufficiently specific compared to clinical report and that clinical report is fully sensitive but insufficiently specific compared to ‘gold standard’ direct PSG observation. However, recent and historical evidence informs us that these assumptions are incorrect. In fact, some individuals engage in considerable SB activity during PSG studies but do not self-report the behaviour. For example, when the RDC/SB PSG standard is set at a cut-point considered to represent moderate SB (20), more than 30% of TMD cases and nearly 85% of controls who meet the PSG standard fail to self-report that they were ever told they grind their teeth at night (39). Thus, sensitivity of self-report (the complement of the false negative rate) is well below the presumed level of 100%. In general, correspondence between evidence from PSG studies and various questions related to self-report of SB does not exceed chance levels (39). Moreover, both self-report and clinical reports tend to be dichotomised as ‘yes’ or ‘no’, sometimes, but not always, differentiating between awake bruxism and SB or referencing a specific time period (e.g. ever vs. last 2 weeks). Of course, when considered simply as a behaviour, SB is best viewed as quantifiable activity occurring on a continuum, with cut-points or cut bands only developed if the cut-point maximises association of SB as a risk factor for a specific health outcome better than a continuously scored bruxism severity measure.
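The sensitivity implied by the self-report figures above can be written out explicitly. In this short worked calculation, TP and FN denote, among individuals who meet the PSG standard, those who do and do not self-report SB, and FNR is the false negative rate; the approximate values simply restate the percentages reported in (39).

```latex
\[
\text{sensitivity} = \frac{TP}{TP + FN} = 1 - \text{FNR}
\]
\[
\text{TMD cases: } \text{FNR} > 0.30 \;\Rightarrow\; \text{sensitivity} < 0.70,
\qquad
\text{controls: } \text{FNR} \approx 0.85 \;\Rightarrow\; \text{sensitivity} \approx 0.15
\]
```

Both values fall well short of the sensitivity of 1.0 that a stackable system would need self-report to have.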
Even if the true intent of the diagnostic grading system was ‘grading evidence of bruxism’ as behaviour rather than ‘diagnostic grading’, we are still left with a major problem: how do we classify an individual who shows high levels of SB on PSG examination, but fails to self-report it? How do we classify an individual whose clinician is confident that the patient bruxes during sleep, the patient does not report it, and even a two-night PSG study finds extremely low or non-existent evidence of SB activity? According to the proposed grading system, their classifications would be indeterminate.
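The indeterminate cases can be made concrete with a small sketch of the stackable logic as we read it. This is an illustration under stated assumptions, not the consensus authors' algorithm: the function name, the boolean inputs, and the labels ‘none’ and ‘indeterminate’ are our own hypothetical choices.

```python
def grade_sleep_bruxism(self_report, clinical_exam, psg):
    """Grade SB under a strictly stacked reading of the consensus system.

    Each argument is True when that method gives positive evidence of SB.
    Higher grades require every lower tier to be positive as well.
    """
    if self_report and clinical_exam and psg:
        return "definite"
    if self_report and clinical_exam and not psg:
        return "probable"
    if self_report and not clinical_exam and not psg:
        return "possible"
    if not (self_report or clinical_exam or psg):
        return "none"  # assumed label when no method gives positive evidence
    # Positive evidence at a higher tier without the lower tier(s).
    return "indeterminate"


# High SB activity on PSG, but the patient does not self-report it.
print(grade_sleep_bruxism(self_report=False, clinical_exam=False, psg=True))

# Clinician is confident, patient does not report it, PSG finds no SB.
print(grade_sleep_bruxism(self_report=False, clinical_exam=True, psg=False))
```

Both example patients fall outside the stacked tiers, which is the classification gap described above.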
In a provocatively titled paper, ‘SB Etiology: The Evolution of a Changing Paradigm’ (40), the authors discuss changing concepts of aetiology, but paradoxically cling to the concept of the need to identify effective treatments. In contrast, a recent review on management of SB (41) warns of the danger of overtreatment and the need to only treat SB with clinical consequences. Unfortunately, the international consensus paper (1) may have overlooked its own paradigm-shifting position, when it proposed a ‘diagnostic grading system’ for bruxism.
The consensus paper states that bruxism is difficult to manage because evidence-based treatment requires a clear definition. Is this really a problem of behavioural definition? Instead, we now argue here that SB is difficult to manage, because it need not be routinely managed and does not necessarily represent a harmful dysfunction. The above noted publication (40) is a single example among many of how the most critical paradigm shift from the consensus paper (1) was quickly forgotten. The problem is not lack of a bruxism definition, but conflation of SB behaviour or activity with a disorder requiring treatment. Thus, to the extent that the ‘diagnostic grading system’ seems to endorse the concept of bruxism as a disorder requiring treatment, we risk attempting to treat SB behaviours which, according to current literature using best available assessment methods, appear to be largely benign.
As PSG recordings of SB activity are expensive and labour intensive, future research is needed to identify acceptable alternatives. One of the devices suggested to be most promising (42) was evaluated in a small sample of individuals pre-selected as likely extreme bruxers versus non-bruxers. The continuously measured score of SB using the ‘Bruxoff’ device compared well to a comprehensive portable PSG device (r = 0.95, P < 0.0001). Future research on this device and other portable devices needs to be conducted, with SB initially scored as a continuous measure of activity.
Although an extreme-group strategy is appropriate at an early research stage when disorder is presumed, more population-representative samples that are not pre-selected to represent either extreme of the continuum of SB are needed to better understand SB behaviour. The best test of a proxy measure is whether it is associated with a gold standard behavioural observation measure in a representative group containing all ranges of behaviour, not whether it can perform well at the simpler tasks of discriminating between behavioural extremes (e.g. no SB vs extreme SB).
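Why an extreme-group design flatters a proxy measure can be shown with a small simulation. This is a sketch with simulated data only: the normal distributions, the noise level, the sample size and the 10% tails are arbitrary assumptions, and nothing here models any real device. The same noisy proxy is correlated with a simulated gold standard, first in the full sample and then in the two extremes alone.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
N = 5000

# Simulated gold-standard SB activity and a proxy with measurement noise.
gold = [rng.gauss(0.0, 1.0) for _ in range(N)]
proxy = [g + rng.gauss(0.0, 1.0) for g in gold]

# Representative sample: all levels of the behaviour are included.
r_full = pearson_r(gold, proxy)

# Extreme-group sample: only the lowest and highest 10% on the gold standard.
pairs = sorted(zip(gold, proxy))
k = N // 10
extremes = pairs[:k] + pairs[-k:]
r_extreme = pearson_r([g for g, _ in extremes], [p for _, p in extremes])

print(f"r in representative sample: {r_full:.2f}")
print(f"r in extreme groups only:   {r_extreme:.2f}")
```

Because discarding the middle of the distribution inflates the variance of the gold standard relative to the proxy's noise, the extreme-group correlation exceeds the representative-sample correlation even though the proxy itself is unchanged.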
Furthermore, the call for selection of more representative samples of individuals without regard to symptoms assumed a priori to be associated with SB reminds us that, as we move forward, we should consider SB to be a behaviour which requires a continuous distribution for initial assessment. If specific cut-points on the continuum of SB behaviour are shown to relate to specific orofacial problems, then a cut-point on the continuum can be established above which the individual exhibits a level of SB that may be a risk factor for a disorder rather than merely a definable behaviour. The cut-point (or cut band) may differ between individuals for different orofacial health problems, or even within an individual for different orofacial health problems.
A large body of research using self-reported SB, clinician-assessed SB and polysomnographically assessed (RDC/SB) SB was likely necessary to reach the conclusion that a paradigm shift in conceptualisation of SB behaviour is required. The concern about use of self-report and clinician reports of SB is definitely not a call to cease bruxism research but rather to accelerate it. Using the best possible measures, we need to know whether SB behaviour is stable over short or long periods. Right now, we only know that extremes of SB behaviour are relatively stable (43), but we do not know whether SB behaviour is stable in more representative population samples. Is a single night of best-method observation sufficient to ‘score’ an individual’s SB behaviour? Do we need several nights of observation? If there is instability over time or when social context changes, failure to assess it over adequate periods of time or in a variety of situations could theoretically attenuate the behaviour’s relation with negative health outcomes.
Until additional studies of the nature proposed above are completed, we consider it premature to consider SB more than a behaviour that may lead to harm, but as yet cannot be considered a harmful dysfunction itself (i.e. disorder) or even a risk factor for harmful oral health outcomes. SB behaviours should be identified using the best possible assessment methods. Development of gold standard methods for SB assessment still requires better understanding of the trajectory and variability of SB over time and context. At present, given the evidence of poor correspondence between self-report, clinical evaluations, and current state-of-the-art direct observation or recording methods, only the last set of methods is recommended for purposes of both research and clinical practice.
Finally, we remind readers that a commentary serves a different purpose than a consensus statement by a large group of experts. A commentary authored by a small group of individuals is intended to provoke scientific and clinical discussion and debate, and to set new agendas for further research in which important but unresolved issues can move towards resolution.
No ethical approvals are relevant for this commentary.
This article was supported in part by R01DE018569 and by financial support from the University of Amsterdam to support Dr. Lobbezoo’s Visiting Professorship at the New York University College of Dentistry, USA, during the academic year 2014–2015.
Conflict of interest
Dr. Lobbezoo reports other support from Sunstar Suisse, grants from Sunstar Suisse, and grants from Somnomed, during the conduct of the study. The other authors have stated explicitly that there are no conflicts of interest in connection with this article.