
1.  Use of Expert Panels to Define the Reference Standard in Diagnostic Research: A Systematic Review of Published Methods and Reporting 
PLoS Medicine  2013;10(10):e1001531.
Loes C. M. Bertens and colleagues survey the published diagnostic research literature for use of expert panels to define the reference standard, characterize components and missing information, and recommend elements that should be reported in diagnostic studies.
Please see later in the article for the Editors' Summary
In diagnostic studies, a single and error-free test that can be used as the reference (gold) standard often does not exist. One solution is the use of panel diagnosis, i.e., a group of experts who assess the results from multiple tests to reach a final diagnosis in each patient. Although panel diagnosis, also known as consensus or expert diagnosis, is frequently used as the reference standard, guidance on preferred methodology is lacking. The aim of this study is to provide an overview of methods used in panel diagnoses and to provide initial guidance on the use and reporting of panel diagnosis as reference standard.
Methods and Findings
PubMed was systematically searched for diagnostic studies applying a panel diagnosis as reference standard published up to May 31, 2012. We included diagnostic studies in which the final diagnosis was made by two or more persons based on results from multiple tests. General study characteristics and details of panel methodology were extracted. Eighty-one studies were included, of which most reported on psychiatry (37%) and cardiovascular (21%) diseases. Data extraction was hampered by incomplete reporting; one or more pieces of critical information about panel reference standard methodology were missing in 83% of studies. In most studies (75%), the panel consisted of three or fewer members. Panel members were blinded to the index test results in 31% of studies. Reproducibility of the decision process was assessed in 17 (21%) studies. Reported details on panel constitution, information for diagnosis and methods of decision making varied considerably between studies.
Methods of panel diagnosis varied substantially across studies and many aspects of the procedure were either unclear or not reported. On the basis of our review, we identified areas for improvement and developed a checklist and flow chart for initial guidance for researchers conducting and reporting studies involving panel diagnosis.
Editors' Summary
Before any disease or condition can be treated, a correct diagnosis of the condition has to be made. Faced with a patient with medical problems and no diagnosis, a doctor will ask the patient about their symptoms and medical history and generally will examine the patient. On the basis of this questioning and examination, the clinician will form an initial impression of the possible conditions the patient may have, usually with a most likely diagnosis in mind. To support or reject the most likely diagnosis and to exclude the other possible diagnoses, the clinician will then order a series of tests and diagnostic procedures. These may include laboratory tests (such as the measurement of blood sugar levels), imaging procedures (such as an MRI scan), or functional tests (such as spirometry, which tests lung function). Finally, the clinician will use all the data s/he has collected to reach a firm diagnosis and will recommend a program of treatment or observation for the patient.
Why Was This Study Done?
Researchers are continually looking for new, improved diagnostic tests and multivariable diagnostic models—combinations of tests and characteristics that point to a diagnosis. Diagnostic research, which assesses the accuracy of new tests and models, requires that each patient involved in a diagnostic study has a final correct diagnosis. Unfortunately, for most conditions, there is no single, error-free test that can be used as the reference (gold) standard for diagnosis. If an imperfect reference standard is used, errors in the final disease classification may bias the results of the diagnostic study and may lead to a new test being adopted that is actually less accurate than existing tests. One widely used solution to the lack of a reference standard is “panel diagnosis” in which two or more experts assess the results from multiple tests to reach a final diagnosis for each patient in a diagnostic study. However, there is currently no formal guidance available on the conduct and reporting of panel diagnosis. Here, the researchers undertake a systematic review (a study that uses predefined criteria to identify research on a given topic) to provide an overview of the methodology and reporting of panel diagnosis.
What Did the Researchers Do and Find?
The researchers identified 81 published diagnostic studies that used panel diagnosis as a reference standard. 37% of these studies reported on psychiatric diseases, 21% reported on cardiovascular diseases, and 12% reported on respiratory diseases. Most of the studies (64%) were designed to assess the accuracy of one or more diagnostic tests. Notably, one or more critical pieces of information on methodology were missing in 83% of the studies. Specifically, information on the constitution of the panel was missing in a quarter of the studies and information on the decision-making process (whether, for example, a diagnosis was reached by discussion among panel members or by combining individual panel members' assessments) was incomplete in more than two-thirds of the studies. In three-quarters of the studies for which information was available, the panel consisted of only two or three members; different fields of expertise were represented in the panels in nearly two-thirds of the studies. In a third of the studies for which information was available, panel members made their diagnoses without access to the results of the test being assessed. Finally, the reproducibility of the decision-making process was assessed in a fifth of the studies.
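One of the decision-making options mentioned above is to combine individual panel members' assessments rather than to reach a diagnosis by discussion. The sketch below illustrates two such combination rules, majority and unanimity. The function name, the rule labels, and the convention of returning None when the rule is not met (for example, to flag a case for panel discussion) are hypothetical choices for illustration, not a method described in the review.

```python
from collections import Counter


def combine_assessments(votes, rule="majority"):
    """Combine individual panel members' diagnoses (hypothetical helper).

    Returns the combined diagnosis, or None when the chosen rule is not
    satisfied (e.g. so the case could be referred for discussion).
    """
    if not votes:
        raise ValueError("at least one assessment is required")
    counts = Counter(votes)
    top, n_top = counts.most_common(1)[0]
    if rule == "unanimous":
        return top if n_top == len(votes) else None
    if rule == "majority":
        # Strict majority: more than half of the panel must agree.
        return top if n_top > len(votes) / 2 else None
    raise ValueError(f"unknown rule: {rule}")


# Usage: a three-member panel with one dissenting member.
print(combine_assessments(["disease", "disease", "no disease"]))
print(combine_assessments(["disease", "disease", "no disease"], "unanimous"))
```

A strict majority is used so that an evenly split two-member panel yields no decision instead of an arbitrary one.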
What Do These Findings Mean?
These findings indicate that the methodology of panel diagnosis varies substantially among diagnostic studies and that reporting of this methodology is often unclear or absent. Both the methodology and reporting of panel diagnosis could, therefore, be improved substantially. Based on their findings, the researchers provide a checklist and flow chart to help guide the conduct and reporting of studies involving panel diagnosis. For example, they suggest that, when designing a study that uses panel diagnosis as the reference standard, the number and background of panel members should be considered, and they provide a list of options that should be considered when planning the decision-making process. Although more research into each of the options identified by the researchers is needed, their recommendations provide a starting point for the development of formal guidelines on the methodology and reporting of panel diagnosis for use as a reference standard in diagnostic research.
Additional Information
Please access these Web sites via the online version of this summary at
Wikipedia has a page on medical diagnosis (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The Equator Network is an international initiative that seeks to improve the reliability and value of medical research literature by promoting transparent and accurate reporting of research studies; its website includes information on a wide range of reporting guidelines, including the STAndards for the Reporting of Diagnostic accuracy studies (STARD), an initiative that aims to improve the accuracy and completeness of reporting of studies of diagnostic accuracy
PMCID: PMC3797139  PMID: 24143138
2.  Expanding Disease Definitions in Guidelines and Expert Panel Ties to Industry: A Cross-sectional Study of Common Conditions in the United States 
PLoS Medicine  2013;10(8):e1001500.
Financial ties between health professionals and industry may unduly influence professional judgments, and some researchers have suggested that widening disease definitions may be one driver of over-diagnosis, bringing potentially unnecessary labeling and harm. We aimed to identify guidelines in which disease definitions were changed, to assess whether any proposed changes would increase the numbers of individuals considered to have the disease, whether potential harms of expanding disease definitions were investigated, and the extent of members' industry ties.
Methods and Findings
We undertook a cross-sectional study of the most recent publication between 2000 and 2013 from national and international guideline panels making decisions about definitions or diagnostic criteria for common conditions in the United States. We assessed whether proposed changes widened or narrowed disease definitions, rationales offered, mention of potential harms of those changes, and the nature and extent of disclosed ties between members and pharmaceutical or device companies.
Of 16 publications on 14 common conditions, ten proposed changes widening and one narrowing definitions. For five, impact was unclear. Widening fell into three categories: creating “pre-disease”; lowering diagnostic thresholds; and proposing earlier or different diagnostic methods. Rationales included standardising diagnostic criteria and new evidence about risks for people previously considered to not have the disease. No publication included rigorous assessment of potential harms of proposed changes.
Among 14 panels with disclosures, the average proportion of members with industry ties was 75%. Twelve were chaired by people with ties. For members with ties, the median number of companies to which they had ties was seven. Companies with ties to the highest proportions of members were active in the relevant therapeutic area. Limitations arise from reliance on only disclosed ties, and exclusion of conditions too broad to enable analysis of single panel publications.
For the common conditions studied, a majority of panels proposed changes to disease definitions that increased the number of individuals considered to have the disease, none reported rigorous assessment of potential harms of that widening, and most had a majority of members disclosing financial ties to pharmaceutical companies.
Please see later in the article for the Editors' Summary
Editors' Summary
Health professionals generally base their diagnosis of physical and mental disorders among their patients on disease definitions and diagnostic thresholds that are drawn up by expert panels and published as statements or as part of clinical practice guidelines. These disease definitions and diagnostic thresholds are reviewed and updated in response to changes in disease detection methods, treatments, medical knowledge, and, in the case of mental illness, changes in cultural norms. Sometimes, the review process widens disease definitions and lowers diagnostic thresholds. Such changes can be beneficial. For example, they might ensure that life-threatening conditions are diagnosed early when they are still treatable. But the widening of disease definitions can also lead to over-diagnosis—the diagnosis of a condition in a healthy individual that will never cause any symptoms and won't lead to an early death. Over-diagnosis can unnecessarily label people as ill, harm healthy individuals by exposing them to treatments they do not need, and waste resources that could be used to treat or prevent “genuine” illness.
Why Was This Study Done?
In recent years, evidence for widespread financial and non-financial ties between pharmaceutical companies and the health professionals involved in writing clinical practice guidelines has increased, and concern that these links may influence professional judgments has grown. As a result, a 2011 report from the US Institute of Medicine (IOM) recommended that, whenever possible, guideline developers should not have conflicts of interest, that no more than a minority of the panel members involved in guideline development should have conflicts of interest, and that the chairs of these panels should be free of conflicts. Much less is known, however, about the ties between industry and the health professionals involved in reviewing disease definitions and whether these ties might in some way contribute to over-diagnosis. In this cross-sectional study (an investigation that takes a snapshot of a situation at a single time point), the researchers identify panels that have recently made decisions about definitions or diagnostic thresholds for conditions that are common in the US and describe the industry ties among the panel members and the changes in disease definitions proposed by the panels.
What Did the Researchers Do and Find?
The researchers identified 16 publications in which expert panels proposed changes to the disease definitions and diagnostic criteria for 14 conditions that are common in the US, such as hypertension (high blood pressure) and Alzheimer disease. The proposed changes widened the disease definition in ten publications, narrowed it in one, and had an unclear impact in five. Reasons given in the publications for changing disease definitions included new evidence of risk for people previously considered normal (pre-hypertension) and the emergence of new biomarkers, tests, or treatments (Alzheimer disease). Only six of the panels mentioned possible harms of the proposed changes and none appeared to rigorously assess the downsides of expanding definitions. Of the 15 panels involved in the publications (one panel produced two publications), 12 included members who disclosed financial ties to multiple companies. Notably, the commonest industry ties among these panels were to companies marketing drugs for the disease being considered by that panel. On average, 75% of panel members disclosed industry ties (range 0% to 100%) to a median of seven companies each. Moreover, similar proportions of panel members disclosed industry ties in publications released before and after the 2011 IOM report.
What Do These Findings Mean?
These findings show that, for the conditions studied, most panels considering disease definitions and diagnostic criteria proposed changes that widened disease definitions and that financial ties with pharmaceutical companies with direct interests in the therapeutic area covered by the panel were common among panel members. Because this study does not include a comparison group, these findings do not establish a causal link between industry ties and proposals to change disease definitions. Moreover, because the study concentrates on a subset of common diseases in the US setting, the generalizability of these findings is limited. Despite these and other study limitations, these findings provide new information about the ties between industry and influential medical professionals and raise questions about the current processes of disease definition. Future research, the researchers suggest, should investigate how disease definitions change over time, how much money panel members receive from industry, and how panel proposals affect the potential market of sponsors. Finally, it should aim to design new processes for reviewing disease definitions that are free from potential conflicts of interest.
Additional Information
Please access these Web sites via the online version of this summary at
A PLOS Medicine Research Article by Knüppel et al. assesses the representation of ethical issues in general clinical practice guidelines on dementia care
Wikipedia has a page on medical diagnosis (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
An article on over-diagnosis by two of the study authors is available; an international conference on preventing over-diagnosis will take place this September
The 2011 US Institute of Medicine report Clinical Practice Guidelines We Can Trust is available
A PLOS Medicine Essay by Lisa Cosgrove and Sheldon Krimsky discusses the financial ties with industry of panel members involved in the preparation of the latest revision of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders (DSM), which provides standard criteria for the classification of mental disorders
PMCID: PMC3742441  PMID: 23966841
3.  Radiological progression and its predictive risk factors in silicosis 
OBJECTIVES—To investigate the risk factors predicting radiological progression in silicosis in a prospective cohort study of patients with silicosis who were previously exposed to silica from granite dust.
METHODS—From among a total of 260 patients with silicosis contracted from granite work, 141 with serial chest x ray films of acceptable quality taken over a period of 2 to 17 (mean 7.5) years were selected for study. Ninety-four (66.7%) had ended exposure 5 or more years previously (mean 10.1 years, maximum 28 years). Radiological progression was assessed by paired comparison of the initial and most recent radiographs and was defined as an increase of two or more steps in the profusion of small opacities on the 12 point scale of the International Labour Organisation (ILO) classification of radiographs of pneumoconiosis, based on the majority reading of a panel of three independent readers.
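The progression criterion above can be sketched in code. The sketch assumes the standard 12 point ILO profusion scale (0/-, 0/0, 0/1, 1/0 through 3/+) and takes the panel reading as the median of the three ordinal readings, which equals the majority reading whenever at least two readers agree; how the study handled three-way disagreements is not stated, so the median fallback is an assumption. Function names are hypothetical.

```python
# The 12 point ILO scale of profusion of small opacities, in rank order.
ILO_SCALE = ["0/-", "0/0", "0/1", "1/0", "1/1", "1/2",
             "2/1", "2/2", "2/3", "3/2", "3/3", "3/+"]
RANK = {category: i for i, category in enumerate(ILO_SCALE)}


def panel_reading(readings):
    """Combine three independent readings into one panel reading.

    The median of three ordinal readings equals the majority reading
    whenever two readers agree (assumed fallback when all three differ).
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    ranks = sorted(RANK[r] for r in readings)
    return ILO_SCALE[ranks[1]]


def has_progressed(initial_readings, latest_readings, min_steps=2):
    """True if profusion rose by at least `min_steps` points on the scale."""
    start = RANK[panel_reading(initial_readings)]
    end = RANK[panel_reading(latest_readings)]
    return end - start >= min_steps


# Usage: panel readings move from 1/0 to 1/2, i.e. two steps.
print(has_progressed(["1/0", "1/0", "1/1"], ["1/1", "1/2", "1/2"]))
```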
RESULTS—Overall, 37% of patients with silicosis had radiological evidence of progression. From the initial radiographs, 24 (31.6%) of those with radiological profusion category 1, 15 (37.5%) of those with radiological profusion category 2, and 13 (52%) of those with complicated silicosis (including all seven with category 3 profusion of small opacities) showed radiological progression. As expected, progression was more likely to be found after longer periods of follow up (the interval between the two chest x ray films), with a 20% increase in the odds of progression for every additional year of follow up. After adjustment for varying intervals of follow up, radiological progression was significantly more likely if large opacities were present in the initial chest x ray film. Progression was also less likely to be found among those who had ended exposure to silica longer ago, although the result was of borderline significance (p=0.07). Tuberculosis was also associated with an increased likelihood of progression (borderline significance).
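A 20% increase in the odds per additional year compounds multiplicatively if, as is usual in logistic regression, follow up time enters the model as a linear term; that modelling form is an assumption here, not stated in the abstract. The sketch converts the per-year odds ratio of 1.2 into an odds multiplier over several years and applies it to a hypothetical baseline probability.

```python
def odds_multiplier(extra_years, or_per_year=1.2):
    """Odds ratio for `extra_years` of additional follow up (log-linear model)."""
    return or_per_year ** extra_years


def adjusted_probability(p_baseline, extra_years, or_per_year=1.2):
    """Shift a hypothetical baseline probability by the compounded odds ratio."""
    odds = p_baseline / (1.0 - p_baseline)
    odds *= odds_multiplier(extra_years, or_per_year)
    return odds / (1.0 + odds)


# Five extra years of follow up multiply the odds by 1.2 ** 5 (about 2.5).
print(odds_multiplier(5))
# Hypothetical baseline probability of 0.3, followed up for five more years.
print(adjusted_probability(0.3, 5))
```

The odds, not the probability, scale by 1.2 per year, so the implied change in probability depends on the baseline.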
CONCLUSIONS—There is a high probability of radiological progression in silicosis after high levels of exposure to granite dust among workers who were followed up for up to 17 years. A significant risk factor is the extent of radiological opacities in the initial chest x ray film. The probability of progression is also likely to be reduced with longer periods after the end of exposure.

Keywords: silicosis; radiological progression; granite quarry
PMCID: PMC1740153  PMID: 11404452
4.  Classification of chest radiographs for epidemiological purposes by people not experienced in the radiology of pneumoconiosis. 
Under controlled conditions 16 people (eight non-medical) inexperienced in the radiology of occupational lung diseases repeatedly classified 300 selected chest radiographs using the 1971 ILO U/C International Classification of Radiographs of Pneumoconioses. Eight experienced medical readers had previously classified 220 of the selected radiographs for profusion of small rounded opacities. Variability among readers was greater in experimental panels than among the experienced readers. But the average consistency between pairs of novice readers in their use of the 12 categories of profusion for the same radiographs was similar (about 29%) to the average consistency among the experienced readers. Subsequent work with nine of the participants showed that eight of them were able to produce classifications of coal miners' chest radiographs that correlated well with estimates of the miners' exposures to respirable coal mine dust. It is concluded that the ILO classification scheme provides a sound descriptive system for recording the appearances of chest radiographs. Under controlled conditions the scheme may be used for epidemiological studies by those with no specialist knowledge or clinical experience. This presupposes that the radiographs concerned will have been examined previously for diagnostic purposes by a suitably qualified physician.
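The average consistency between pairs of readers can be illustrated as the proportion of radiographs on which two readers chose exactly the same profusion category, averaged over every pair of readers. The abstract does not give the exact formula used, so exact-agreement averaging is an assumption, and the readings in the usage example are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(readings_by_reader):
    """Mean proportion of exact agreement over all pairs of readers.

    `readings_by_reader` holds one list of category labels per reader,
    all covering the same radiographs in the same order.
    """
    pairs = list(combinations(range(len(readings_by_reader)), 2))
    if not pairs:
        raise ValueError("need at least two readers")
    total = 0.0
    for a, b in pairs:
        ra, rb = readings_by_reader[a], readings_by_reader[b]
        total += sum(x == y for x, y in zip(ra, rb)) / len(ra)
    return total / len(pairs)


# Hypothetical readings of four radiographs by three readers.
readers = [["0/0", "1/0", "1/1", "2/2"],
           ["0/0", "1/1", "1/1", "2/2"],
           ["0/1", "1/0", "1/1", "2/1"]]
print(pairwise_agreement(readers))
```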
PMCID: PMC1008883  PMID: 7272238
5.  The Study of Observer Variation in the Radiological Classification of Pneumoconiosis 
In a long-term investigation such as the National Coal Board's Pneumoconiosis Field Research (P.F.R.), it is essential to establish satisfactory and stable procedures for making the necessary observations and measurements. It is equally important regularly to apply suitable methods of checking the accuracy and consistency of the various observations and measurements. One aspect of vital importance in the P.F.R. is the classification of the series of chest radiographs taken, at intervals, of all the men under observation. This is inevitably a subjective process, and (as with other similar fields of work) it is desirable to obtain some understanding of the basic process behind the operation. This can usefully be done with the help of “models” designed to describe the process, if necessary in simplified terms. The problem of the radiological classification of pneumoconiosis has been studied hitherto in terms of coefficients of disagreement (inter-observer variation) and inconsistency (intra-observer variation), but for various reasons the method was not considered entirely satisfactory. New methods of approach were therefore developed for studying the performance of the two doctors responsible for the film reading in the Research, and two distinct “models” were derived. The advantages and disadvantages of each are described in the paper, together with the applications of the two models to the study of some of the problems arising in the course of the investigation.
The first model is based on the assumption that if a film is selected at random from a batch representing a whole colliery population, and that if the film is of “true” category i, the chance of its being read as another category (j) is a constant, Pij, which depends upon the observer concerned, the particular batch of films being read, and the values of i and j. This model enables the performance of the readers to be monitored satisfactorily, and it has also been used to investigate different methods for arriving at an agreed, or “definitive”, assessment of radiological abnormality. The Pij model suffers from the disadvantage of applying only to “average” films, and the assumptions made are such that it manifestly does not provide an entirely realistic representation of the reading process on any particular film.
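In matrix form, the first model treats Pij as the probability that a film of true category i is read as category j. A minimal sketch, assuming the true categories are known (for example, from an agreed “definitive” reading) and that categories are coded as integers 0..K-1, estimates the matrix by row-normalising counts of (true, read) pairs and uses it to predict the distribution of one observer's readings for a batch. The function names and data are hypothetical.

```python
def estimate_pij(pairs, n_categories):
    """Estimate P[i][j] = Pr(read as j | true category i) from (i, j) pairs."""
    counts = [[0] * n_categories for _ in range(n_categories)]
    for true_cat, read_cat in pairs:
        counts[true_cat][read_cat] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no films of that true category are left as zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def expected_reading_distribution(P, true_dist):
    """Expected share of readings in each category: sum_i true_dist[i] * P[i][j]."""
    k = len(P)
    return [sum(true_dist[i] * P[i][j] for i in range(k)) for j in range(k)]


# Hypothetical (true, read) pairs for three categories.
pairs = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
P = estimate_pij(pairs, 3)
print(P)
print(expected_reading_distribution(P, [0.5, 0.3, 0.2]))
```

Because each row of P sums to one, the predicted reading distribution also sums to one; this is the “average film” property the text identifies as the model's limitation.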
The second “improved” model was therefore developed to overcome this limitation of the Pij model. Briefly, it is considered that each film is representative of a unique degree of abnormality, located on a continuum, or abnormality scale, which covers the whole range of simple pneumoconiosis. The scale of abnormality is then chosen in such a way that, whatever the true degree of abnormality of the film, the observer's readings will be normally distributed about the true value with constant bias and variability at all points along the scale. The very large number of readings available has been analysed to determine the optimum positions of the category boundaries on the abnormality scale and in this way the scale has been unambiguously defined. The model enables the routine reading standards to be monitored, and it has also been used to investigate the underlying distribution of abnormality at individual collieries. Its chief disadvantage is the extensive computational work required.
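Under the continuum model, the probability that a film of true abnormality x is read into a given category is the normal probability mass between that category's two boundaries, after shifting by the observer's bias and scaling by the observer's standard deviation. The sketch below computes these probabilities; the boundary positions, bias, and standard deviation are illustrative parameters, and the fitting procedure used in the Research to locate the boundaries is not reproduced.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probabilities(x, boundaries, bias, sigma):
    """Probability of each category given true abnormality `x`.

    `boundaries` are the sorted inner cut points b_1 < ... < b_{K-1};
    readings are assumed to be Normal(x + bias, sigma ** 2).
    """
    edges = [-math.inf] + list(boundaries) + [math.inf]
    probs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p_hi = 1.0 if hi == math.inf else normal_cdf((hi - x - bias) / sigma)
        p_lo = 0.0 if lo == -math.inf else normal_cdf((lo - x - bias) / sigma)
        probs.append(p_hi - p_lo)
    return probs


# Illustrative: three categories with cut points at -1 and 1, an unbiased
# observer with unit variability, and a film at the centre of the scale.
print(category_probabilities(0.0, [-1.0, 1.0], bias=0.0, sigma=1.0))
```

Keeping bias and sigma constant along the scale mirrors the text's assumption that the scale is chosen so observer behaviour is the same at every point.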
The “fit” of both models to the data collected in the Research is shown to be satisfactory and on balance it appears that both have applications in this field of study. The method chosen in any given circumstance will depend upon the particular requirement and the facilities available for computational work.
PMCID: PMC1038082  PMID: 13698433
6.  Assessment of radiological progression of simple pneumoconiosis in individual miners 
Liddell, F. D. K. (1974). British Journal of Industrial Medicine, 31, 185-195. Assessment of radiological progression of simple pneumoconiosis in individual miners. The studies reported aimed to determine the best method of assessing radiological progression of simple pneumoconiosis in the individual, so that his progression score could be related to other known information about him. The main concern was with subjects for whom three serial posteroanterior chest radiographs were available at approximately quinquennial intervals.
As in other investigations, the 12-point scale of the National Coal Board elaboration led to markedly lower observer error and variability than did the 4-point International Labour Office classification, without distorting levels of progression.
Side-by-side reading led to substantially lower observer error and variability than did independent reading. Although the levels of progression in side-by-side reading were on average a little lower than in independent reading, the effect varied between readers and sessions, being frequently reversed.
Of the three possible methods of side-by-side assessment, the only one without contraindications was that in which all three films for each subject were viewed together, and there were some specific indications for this approach. Viewing only the first and last films led to some loss of information; viewing all three possible pairs was very time-consuming, both in organization and in actual reading, and was not entirely consistent (additive); and disguise of temporal order of the films proved impractical.
It is concluded that the method of choice for assessing progression in the individual from serial films at roughly quinquennial intervals is to view all films together in known temporal order, recording into the most detailed classification available.
PMCID: PMC1009583  PMID: 4412101
7.  Functional similarities of asbestosis and cryptogenic fibrosing alveolitis. 
Thorax  1988;43(9):708-714.
The pathological features in the lung in asbestosis and cryptogenic fibrosing alveolitis are similar. Patients with asbestosis, however, appear to have less severe impairment of transfer factor (TLCO) than those with fibrosing alveolitis for a given level of radiographic abnormality when assessed on the basis of the International Labour Organisation (ILO) profusion score. The impairment of lung function in the two disorders has been compared in more detail in 29 patients with asbestosis and 25 with fibrosing alveolitis, arterial oxygen desaturation during exercise being used to define the severity of the disorders. Arterial oxygen saturation (ear oximeter) and oxygen uptake were measured during incremental exercise on a cycle ergometer. TLCO (single breath technique) and total lung capacity (TLC, plethysmograph) were measured. Chest radiographs were graded for profusion according to the ILO international classification. Patients with asbestosis had significantly higher mean values for TLCO and TLC and lower mean profusion scores than those with fibrosing alveolitis. When stratified for the degree of arterial oxygen desaturation, however, no significant differences were found in TLCO, TLC, or profusion score between the two disorders. To the extent that arterial oxygen desaturation with exercise reflects the morphological severity of the disease, these results suggest that, for a given degree of interstitial lung disease, asbestosis and cryptogenic fibrosing alveolitis are functionally and radiologically similar.
PMCID: PMC461460  PMID: 3194877
8.  Improving Melanoma Classification by Integrating Genetic and Morphologic Features 
PLoS Medicine  2008;5(6):e120.
In melanoma, morphology-based classification systems have not been able to provide relevant information for selecting treatments for patients whose tumors have metastasized. The recent identification of causative genetic alterations has revealed mutations in signaling pathways that offer targets for therapy. Identifying morphologic surrogates that can identify patients whose tumors express such alterations (or functionally equivalent alterations) would be clinically useful for therapy stratification and for retrospective analysis of clinical trial data.
Methodology/Principal Findings
We defined and assessed a panel of histomorphologic measures and correlated them with the mutation status of the oncogenes BRAF and NRAS in a cohort of 302 archival tissues of primary cutaneous melanomas from an academic comprehensive cancer center. Melanomas with BRAF mutations showed distinct morphological features such as increased upward migration and nest formation of intraepidermal melanocytes, thickening of the involved epidermis, and sharper demarcation to the surrounding skin; and they had larger, rounder, and more pigmented tumor cells (all p-values below 0.0001). By contrast, melanomas with NRAS mutations could not be distinguished based on these morphological features. Using simple combinations of features, BRAF mutation status could be predicted with up to 90.8% accuracy in the entire cohort as well as within the categories of the current World Health Organization (WHO) classification. Among the variables routinely recorded in cancer registries, we identified age < 55 y as the single most predictive factor of BRAF mutation in our cohort. Using age < 55 y as a surrogate for BRAF mutation in an independent cohort of 4,785 patients of the Southern German Tumor Registry, we found a significant survival benefit (p < 0.0001) for patients who, based on their age, were predicted to have BRAF mutant melanomas in 69% of the cases. This group also showed a different pattern of metastasis, more frequently involving regional lymph nodes, compared to the patients predicted to have no BRAF mutation, who more frequently displayed satellite, in-transit, and visceral metastases (p < 0.0001).
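Using a single registry variable as a surrogate for mutation status amounts to a threshold classifier whose performance can be summarised by accuracy and positive predictive value. The sketch evaluates the age < 55 y rule against known BRAF status. The records are hypothetical and are not data from the study, and the function name is an illustration only.

```python
def evaluate_age_rule(records, threshold=55):
    """Evaluate 'predict BRAF mutant if age < threshold'.

    `records` is a list of (age_in_years, braf_mutant) tuples.
    Returns (accuracy, positive predictive value).
    """
    tp = fp = tn = fn = 0
    for age, mutant in records:
        predicted = age < threshold
        if predicted and mutant:
            tp += 1
        elif predicted:
            fp += 1
        elif mutant:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(records)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return accuracy, ppv


# Hypothetical patients: (age, BRAF mutant?).
records = [(40, True), (50, True), (52, False),
           (60, False), (70, True), (65, False)]
print(evaluate_age_rule(records))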
Refined morphological classification of primary melanomas can be used to improve existing melanoma classifications by forming subgroups that are genetically more homogeneous and likely to differ in important clinical variables such as outcome and pattern of metastasis. We expect this information to improve classification and facilitate stratification for therapy as well as retrospective analysis of existing trial data.
Boris Bastian and colleagues present a refined morphological classification of primary melanomas that can be used to improve existing melanoma classifications by defining genetically homogeneous subgroups.
Editors' Summary
Skin cancers—the most commonly diagnosed cancers worldwide—are usually caused by exposure to ultraviolet (UV) radiation in sunlight. UV radiation damages the DNA in skin cells and can introduce permanent genetic changes (mutations) into the skin cells that allow them to divide uncontrollably to form a tumor, a disorganized mass of cells. Because there are many different cell types in the skin, there are many types of skin cancer. The most dangerous type—melanoma—develops when genetic changes occur in melanocytes, the cells that produce the skin pigment melanin. Although only 4% of skin cancers are melanomas, 80% of skin cancer deaths are caused by melanomas. The first signs of a melanoma are often a change in the appearance or size of a mole (a pigmented skin blemish that is also called a nevus) or a newly arising pigmented lesion that looks different from the other moles (an “ugly duckling”). If this early sign is noticed and the melanoma is diagnosed before it has spread from the skin into other parts of the body, surgery can sometimes provide a cure. But, for more advanced melanomas, the outlook is generally poor. Although radiation therapy, chemotherapy, or immunotherapy (drugs that stimulate the immune system to kill the cancer cells) can prolong the life expectancy of some patients, these treatments often fail to remove all of the cancer cells.
Why Was This Study Done?
Now, however, scientists have identified some of the genetic alterations that cause melanoma. For example, they know that many melanomas carry mutations in either the BRAF gene or the NRAS gene, and that the proteins made from these mutated genes (“oncogenes”) help cancer cells to grow uncontrollably. The hope is that targeted drugs designed to block the activity of oncogenic BRAF or NRAS might stop the growth of those melanomas that make these altered proteins. But how can the patients with these specific tumors be identified in the clinic? The expression of altered proteins is likely to affect the microscopic growth patterns (“histomorphology”) of melanomas. However, the current histomorphology-based classification system for melanomas, which distinguishes four main types of melanoma, does not help clinicians choose the best treatment for their patients. In this study, the researchers have tried to improve melanoma classification by looking for correlations between histomorphological features and genetic alterations in a large collection of melanomas.
What Did the Researchers Do and Find?
The researchers examined several histomorphological features in more than 300 melanoma samples and used statistical methods to correlate these features with the mutation status of BRAF and NRAS in the tumors. They found that some individual histomorphological features were strongly associated with the BRAF (but not the NRAS) mutation status of the tumors. For example, melanomas with BRAF mutations had more melanocytes in the upper layers of the epidermis (the outermost layer of the skin) than did those without BRAF mutations (melanocytes usually live at the bottom of the epidermis). Then, by combining several individual histomorphological features, the researchers built a model that correctly predicted the BRAF mutation status of more than 90% of the melanomas. They also found that, among the variables routinely recorded in cancer registries, being younger than 55 years old was the single most predictive factor for BRAF mutations. Finally, in another large group of patients with melanoma, the researchers found that those patients predicted to have a BRAF mutation on the basis of their age survived longer than those patients predicted not to have a BRAF mutation using the same criterion.
What Do These Findings Mean?
These findings suggest that an improved classification of melanomas that combines an analysis of known genetic factors with histomorphological features might divide melanomas into subgroups that are likely to differ in terms of their clinical outcome and responses to targeted therapies when they become available. Additional studies are needed to investigate whether the histomorphological features identified here can be readily assessed in clinical settings and whether different observers will agree on the scoring of these features. The classification model defined by the researchers also needs to be validated and refined in independent groups of patients. Nevertheless, these findings represent an important first step toward helping clinicians improve outcomes for patients with melanoma.
Additional Information
A related PLoS Medicine Research in Translation article is available
The MedlinePlus encyclopedia provides information for patients about melanoma
The US National Cancer Institute provides information for patients and health professionals about melanoma (in English and Spanish)
Cancer Research UK also provides detailed information about the causes, diagnosis, and treatment of melanoma
PMCID: PMC2408611  PMID: 18532874
9.  Irregularly shaped small shadows on chest radiographs, dust exposure, and lung function in coalworkers' pneumoconiosis. 
The predominant shapes of small opacities on the chest radiographs of 895 British coalminers have been studied. The aims were to determine whether irregular (as distinct from rounded) small opacities can be identified reproducibly, whether their occurrence is related to dust exposure, and whether they are associated with excess prevalence of respiratory symptoms or impairments of lung function. Six of the doctors responsible for regular radiological surveys of all British coalminers each classified all 895 radiographs twice and independently, using the International Labour Organisation's 1980 classification system. The majority view was that 39 films showed predominantly irregular small opacities, 131 showed predominantly small rounded opacities, and 587 showed no small opacities. Readers' opinions varied about the presence and shapes of shadows on the other 138 films. In general, consistency between readers (and within readers on repeated viewings) was satisfactory. The occurrence and profusion of irregular shadows were related significantly both to the men's ages and additionally to their cumulative exposure to respirable coalmine dust as determined from 15 years' dust monitoring close to where the miners had worked. For any given level of exposure, the average level of profusion of the small irregular opacities was less than the corresponding profusion of small rounded opacities. The prevalence rates of chronic cough and phlegm, and of breathlessness, were higher in those with small irregular opacities than in those with no small opacities (category 0/0), but the differences were not statistically significant after adjustment for other factors including smoking habits. The presence of irregular (but not rounded) small shadows was associated with an impairment in respiratory function averaging about 190 ml deficits in both FEV1 and FVC. 
These deficits were not explicable in terms of the men's ages, body sizes, and smoking habits and they were in addition to the lung function losses attributable to the miners' dust exposure as such. It is concluded that the presence and profusion of small irregular opacities should be taken into consideration when assessing the severity of coalworkers' simple pneumoconiosis.
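Each of the six readers classified every film twice, and films were then grouped by the majority view. The abstract does not state the exact majority rule, so the following Python sketch assumes one: a film receives the shape label chosen in more than half of its readings, and otherwise falls into the "opinions varied" group. The labels and the function names `majority_view` and `tally` are hypothetical.

```python
from collections import Counter


def majority_view(readings, threshold=0.5):
    """Return the label chosen in more than `threshold` of a film's readings,
    or None when opinions are too divided (an assumed rule, not the paper's)."""
    if not readings:
        raise ValueError("need at least one reading")
    label, count = Counter(readings).most_common(1)[0]
    return label if count / len(readings) > threshold else None


def tally(films):
    """Count films per consensus label; the None key collects films on which
    readers' opinions varied."""
    return Counter(majority_view(readings) for readings in films)
```

With six readers each reading a film on two occasions, `readings` holds 12 labels per film, such as "irregular", "rounded", or "none".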
PMCID: PMC1007944  PMID: 3342187
10.  A study of Spanish sepiolite workers. 
Thorax  1993;48(4):370-374.
BACKGROUND--Sepiolite is an absorbent clay that is used as pet litter. It forms thin crystals, which are a transition between chain and layered silicates. Inhalation studies in animals have shown no evidence of pulmonary damage. This paper reports a cross-sectional study of the total work force of the largest sepiolite production plant in the world. METHODS--Two hundred and eighteen workers (210 men and eight women) were studied. Height, age, and smoking history were recorded. Chest radiographs were read according to the International Labour Office (ILO) classification by two readers. Readings were used to construct a numerical score, which was then used in statistical analyses. Forced expiratory volume in one second (FEV1) and forced vital capacity (FVC) were divided by the square of the height. Casella size-selective personal samplers were used on randomly selected operatives to collect dust eight years before the rest of the study was carried out. These samples were evaluated gravimetrically. Total dust samples were examined by optical and electron microscopy. Results were analysed by bivariate linear regression, χ² tests, and analysis of variance. RESULTS--When allowance was made for smoking habit, workers exposed to dry dust showed a significantly greater decline in FEV1 with age than workers with little exposure to dry dust. A similar pattern applied to FVC. Radiographic score deteriorated with age but showed no clear differences with respect to other variables. High concentrations of dust were found in the bagging department and also in the classifier shed. CONCLUSIONS--The major finding was that lung function deteriorated more rapidly in those who had had more exposure to dust, but there was no evidence of any accompanying radiographic change.
PMCID: PMC464435  PMID: 8511734
11.  Mortality from lung cancer among Sardinian patients with silicosis. 
The mortality of 724 subjects with silicosis, first diagnosed in 1964-70 in the Sardinia region of Italy, was followed up through to 31 December 1987. Smoking, occupational history, chest x ray films, and data on lung function were available from clinical records for each member of the cohort. The overall cohort accounted for 10,956.5 person-years. The standardised mortality ratios (SMRs) for selected causes of death (International Classification of Diseases (ICD), eighth revision) were based on the age specific regional death rates for each calendar year. An excess of deaths from all causes (SMR = 1.40) was found, mainly due to chronic obstructive lung disease, silicosis, and tuberculosis, with an upward trend of the SMR with increasing severity of the International Labour Office (ILO) radiological categories. Twenty-two subjects died from lung cancer (SMR = 1.29, 95% confidence interval (95% CI) = 0.8-2.0). The risk increased after latency periods of 10 and 15 years, but the SMR never reached statistical significance. No correlation was found between lung cancer and severity of the radiological category, the type of silica exposure (coal or metalliferous mines, quarries, etc.), or the degree of exposure to silica dust. A significant excess of deaths from lung cancer was found among heavy smokers (SMR = 4.11) and subjects with airflow obstruction (SMR = 2.83). A nested case-control study was planned to investigate whether the association between lung cancer and airway obstruction was due to confounding by smoking. No association was found with the ILO categories of silicosis or the estimated cumulative exposure to silica. The risk estimate for lung cancer by airflow obstruction after adjusting for cigarette consumption was 2.86 for a mild impairment and 7.23 for a severe obstruction. The results do not show any clear association between exposure to silica, severity of silicosis, and mortality from lung cancer.
Other environmental or individual factors may act as confounders in the association between silicosis and lung cancer. Among them, attention should be given to chronic airways obstruction as an independent risk factor for lung cancer in patients with silicosis.
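An SMR is the ratio of observed to expected deaths, and its confidence interval comes from treating the observed count as Poisson. The abstract does not say how its interval was computed, so this Python sketch assumes Byar's approximation to the exact Poisson limits. The expected count is not reported either; here it is back-calculated as 22 / 1.29 purely for illustration, and the function names are hypothetical.

```python
import math


def smr(observed, expected):
    """Standardised mortality ratio: observed deaths / expected deaths."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


def smr_ci_byar(observed, expected, z=1.96):
    """Approximate confidence limits for an SMR using Byar's approximation
    to the exact Poisson limits on the observed count."""
    o = observed
    if o > 0:
        lower = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3
    else:
        lower = 0.0
    o1 = o + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return lower / expected, upper / expected


if __name__ == "__main__":
    expected = 22 / 1.29  # hypothetical: back-calculated, not reported
    lo, hi = smr_ci_byar(22, expected)
    print(f"SMR={smr(22, expected):.2f}, 95% CI {lo:.2f}-{hi:.2f}")
    # prints: SMR=1.29, 95% CI 0.81-1.95
```

Rounded to one decimal place, these limits agree with the reported interval of 0.8-2.0, which illustrates why 22 lung cancer deaths were too few for the excess to reach statistical significance.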
PMCID: PMC1035332  PMID: 1998606
12.  An elaboration of the I.L.O. classification of simple pneumoconiosis 
Liddell, F. D. K., and Lindars, D. C. (1969). Brit. J. industr. Med., 26, 89-100. An elaboration of the I.L.O. classification of simple pneumoconiosis. Simple pneumoconiosis in chest radiographs presents a continuum of increasing abnormality. Liddell (1963) introduced a 12-point scale obtained by dividing each of the four I.L.O. categories (International Labour Office, 1959) into one central and two marginal zones. In this system, which has come to be known as the N.C.B. elaboration, readers record for each radiograph the I.L.O. category of choice (0, 1, 2 or 3), followed by an adjacent I.L.O. category if that had been seriously considered; otherwise, the same category is repeated. Very clear normals are denoted as 0/-, and 'high' category 3 films as 3/4.
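Because each reading is a category of choice followed by an adjacent category, the 12 points can be held as an ordered list and mapped to ranks for analysis. The Python sketch below assumes integer categories and string codes in the "choice/considered" form described above; the function names are hypothetical. Collapsing a reading to its first figure recovers the I.L.O. category, so the ordinal scale sits on top of the original classification.

```python
# The 12 points of the N.C.B. elaboration, in order of increasing abnormality.
NCB_SCALE = (
    "0/-", "0/0", "0/1",
    "1/0", "1/1", "1/2",
    "2/1", "2/2", "2/3",
    "3/2", "3/3", "3/4",
)
# Ordinal rank (0-11) of each valid reading.
_RANK = {code: i for i, code in enumerate(NCB_SCALE)}


def ncb_reading(choice, considered=None):
    """Build a reading from the I.L.O. category of choice (0-3) and the
    adjacent category that was seriously considered. Use '-' for a very clear
    normal and 4 for a 'high' category 3 film; None repeats the choice."""
    second = choice if considered is None else considered
    code = f"{choice}/{second}"
    if code not in _RANK:
        raise ValueError(f"not a point on the N.C.B. scale: {code}")
    return code


def ncb_rank(code):
    """Position of a reading on the 12-point continuum (0 = 0/-, 11 = 3/4)."""
    return _RANK[code]


def ilo_category(code):
    """Collapse a reading back to its I.L.O. category (the first figure)."""
    return int(code.split("/")[0])
```

For example, a reader who chose category 1 but seriously considered category 2 records `ncb_reading(1, 2)`, giving "1/2", which ranks above "1/1" and collapses back to I.L.O. category 1.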
This paper reviews the evidence from seven reading trials in which 12 National Coal Board (N.C.B.) film readers have taken part. About 28,000 assessments on a total of well over 2,000 single radiographs have been analysed. (The reading of serial radiographs to assess progression is dealt with elsewhere.)
All readers used the elaboration successfully, but they differed in the extents to which they placed films in central and in marginal zones; they were more consistent when preliminary briefing had been given. Film quality had little influence on the use of the zones, except that 0/- tended to be reserved for films of good quality.
Despite the variation in the use of the zones, marked improvements accrued from the use of the elaboration in both intra- and inter-observer error for all readers, and for films of poor quality as well as for good films. The validity of expressing simple pneumoconiosis prevalence rates in terms of I.L.O. categories derived from N.C.B. elaboration readings was confirmed. Although the exact widths of the zones along the continuum remain to be determined, all the evidence suggests that they represent steadily increasing abnormality.
Thus, the N.C.B. elaboration is a practical procedure which amplifies, but neither distorts nor supplants, the I.L.O. classification. It is reported to be easier to use.
PMCID: PMC1008901  PMID: 5780111
13.  Main Report 
Genetics in Medicine  2006;8(Suppl 1):12S-252S.
States vary widely in their use of newborn screening tests, with some mandating screening for as few as three conditions and others mandating as many as 43 conditions, including varying numbers of the 40+ conditions that can be detected by tandem mass spectrometry (MS/MS). There has been no national guidance on the best candidate conditions for newborn screening since the National Academy of Sciences report of 1975¹ and the United States Congress Office of Technology Assessment report of 1988², despite rapid developments since then in genetics, in screening technologies, and in some treatments.
In 2002, the Maternal and Child Health Bureau (MCHB) of the Health Resources and Services Administration (HRSA) of the United States Department of Health and Human Services (DHHS) commissioned the American College of Medical Genetics (ACMG) to (1) conduct an analysis of the scientific literature on the effectiveness of newborn screening; (2) gather expert opinion to delineate the best evidence for screening for specified conditions and develop recommendations focused on newborn screening, including but not limited to the development of a uniform condition panel; and (3) consider other components of the newborn screening system that are critical to achieving the expected outcomes in those screened.
A group of experts in various areas of subspecialty medicine and primary care, health policy, law, and public health, together with consumers, worked with a steering committee and several expert work groups, using a two-tiered approach to assess and rank conditions. A first step was developing a set of principles to guide the analysis. This was followed by developing criteria by which conditions could be evaluated, and then identifying the conditions to be evaluated. A large and broadly representative group of experts was asked to provide their opinions on the extent to which particular conditions met the selected criteria, relying on supporting evidence and references from the scientific literature. The criteria were distributed among three main categories for each condition: (1) the availability and characteristics of the screening test; (2) the availability and complexity of diagnostic services; and (3) the availability and efficacy of treatments related to the conditions. A survey process utilizing a data collection instrument was used to gather expert opinion on the conditions in the first tier of the assessment. The data collection format and survey provided the opportunity to quantify expert opinion and to obtain the views of a diverse set of interest groups (necessary due to the subjective nature of some of the criteria). Statistical analysis of data produced a score for each condition, which determined its ranking and initial placement in one of three categories (high scoring, moderately scoring, or low scoring/absence of a newborn screening test). In the second tier of these analyses, the evidence base related to each condition was assessed in depth (e.g., via systematic reviews of reference lists including MedLine, PubMed and others; books; Internet searches; professional guidelines; clinical evidence; and cost/economic evidence and modeling). The fact sheets reflecting these analyses were evaluated by at least two acknowledged experts for each condition.
These experts assessed the data and the associated references related to each criterion and provided corrections where appropriate, assigned a value to the level of evidence and the quality of the studies that established the evidence base, and determined whether there were significant variances from the survey data. Survey results were subsequently realigned with the evidence obtained from the scientific literature during the second-tier analysis for all objective criteria, based on input from at least three acknowledged experts in each condition. The information from these two tiers of assessment was then considered with regard to the overriding principles and other technology or condition-specific recommendations. On the basis of this information, conditions were assigned to one of three categories as described above: (1) Core Panel; (2) Secondary Targets (conditions that are part of the differential diagnosis of a core panel condition); and (3) Not Appropriate for Newborn Screening (either no newborn screening test is available or there is poor performance with regard to multiple other evaluation criteria).
ACMG also considered features of optimal newborn screening programs beyond the tests themselves by assessing the degree to which programs met certain goals (e.g., availability of educational programs, proportions of newborns screened and followed up). Assessments were based on the input of experts serving in various capacities in newborn screening programs and on 2002 data provided by the programs of the National Newborn Screening and Genetics Resource Center (NNSGRC). In addition, a brief cost-effectiveness assessment of newborn screening was conducted.
Uniform panel
A total of 292 individuals determined to be generally representative of the regional distribution of the United States population and of areas of expertise or involvement in newborn screening provided a total of 3,949 evaluations of 84 conditions. For each condition, the responses of at least three experts in that condition were compared with those of all respondents for that condition and found to be consistent. A score of 1,200 on the data collection instrument provided a logical separation point between high scoring conditions (1,200–1,799 of a possible 2,100) and low scoring (<1,000) conditions. A group of conditions with intermediate scores (1,000–1,199) was identified, all of which were part of the differential diagnosis of a high scoring condition or apparent in the result of the multiplex assay. Some are identified by screening laboratories and others by diagnostic laboratories. This group was designated as a “secondary target” category for which the program must report the diagnostic result.
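The first-tier placement by score can be sketched as a simple threshold function. This Python sketch assumes that the reported boundaries (1,200 and 1,000 on a possible 2,100) act as inclusive lower bounds; the function name and category labels are hypothetical, and the report's final assignments also weighed the second-tier evidence review, which this does not model.

```python
MAX_SCORE = 2100  # maximum possible score on the data collection instrument


def initial_category(score):
    """Initial first-tier placement of a condition from its survey score
    (assumed inclusive thresholds at 1,200 and 1,000)."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError(f"score must lie between 0 and {MAX_SCORE}")
    if score >= 1200:
        return "high"          # candidate for the core panel
    if score >= 1000:
        return "intermediate"  # reviewed as a possible secondary target
    return "low"
```

Under these assumptions a condition scoring 1,199 falls into the intermediate group, while one scoring 1,200 is placed with the high-scoring conditions.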
Using the validated evidence base and expert opinion, each condition that had previously been assigned to a category based on scores gathered through the data collection instrument was reconsidered. Again, the factors taken into consideration were: 1) available scientific evidence; 2) availability of a screening test; 3) presence of an efficacious treatment; 4) adequate understanding of the natural history of the condition; and 5) whether the condition was either part of the differential diagnosis of another condition or whether the screening test results related to a clinically significant condition.
The conditions were then assigned to one of three categories as previously described (core panel, secondary targets, or not appropriate for Newborn Screening).
Among the 29 conditions assigned to the core panel are three hemoglobinopathies associated with a Hb/S allele, six amino acidurias, five disorders of fatty acid oxidation, nine organic acidurias, and six unrelated conditions (congenital hypothyroidism (CH), biotinidase deficiency (BIOT), congenital adrenal hyperplasia (CAH), classical galactosemia (GALT), hearing loss (HEAR) and cystic fibrosis (CF)). Twenty-three of the 29 conditions in the core panel are identified with multiplex technologies such as tandem mass spectrometry (MS/MS) or high pressure liquid chromatography (HPLC). On the basis of the evidence, six of the 35 conditions initially placed in the core panel were moved into the secondary target category, which expanded to 25 conditions. Test results not associated with potential disease in the infant (e.g., carriers) were also placed in the secondary target category. When newborn screening laboratory results definitively establish carrier status, the result should be made available to the health care professional community and families. Twenty-seven conditions were determined to be inappropriate for newborn screening at this time.
Conditions with limited evidence reported in the scientific literature were more difficult to evaluate, quantify, and place in one of the three categories. In addition, many conditions were found to occur in multiple forms distinguished by age of onset, severity, or other features. Further, unless a condition was already included in newborn screening programs, the information related to some criteria was potentially biased. In such circumstances, the placement of conditions into particular categories was determined by the quality of the studies underlying the data, such as expert opinion that considered case reports and reasoning from first principles.
Newborn screening program optimization
Assessment of the activities of newborn screening programs, based on program reports, was done for the six program components: education, screening, follow-up, diagnostic confirmation, management, and program evaluation. Considerable variation was found between programs with regard to whether particular aspects (e.g., prenatal education program availability, tracking of specimen collection and delivery) were included and the degree to which they were provided. Newborn screening program evaluation systems were also assessed to determine their adequacy and uniformity, with the goal of improving interprogram evaluation and comparison so that the expected outcomes of identification through screening are realized.
The state of the published evidence in the fast-moving worlds of newborn screening and medical genetics has not kept up with the implementation of new technologies, thus requiring the considerable use of expert opinion to develop recommendations about a core panel of conditions for newborn screening. Twenty-nine conditions were identified as primary targets for screening, for which all components of the newborn screening system should be maximized. An additional 25 conditions were listed that could be identified in the course of screening for core panel conditions. Programs are obligated to establish a diagnosis and communicate the result to the health care provider and family. It is recognized that screening may not have been maximized for the detection of these secondary conditions but that some proportion of such cases may be found among those screened for core panel conditions. With additional screening, greater training of primary care health care professionals and subspecialists will be needed, as will the development of an infrastructure for appropriate follow-up and management throughout the lives of children who have been identified as having one of these rare conditions. Recommended actions to overcome barriers to an optimal newborn screening system include: (1) the establishment of a national role in the scientific evaluation of conditions and the technologies by which they are screened; (2) standardization of case definitions and reporting procedures; (3) enhanced oversight of hospital-based screening activities; (4) long-term data collection and surveillance; and (5) consideration of the financial needs of programs to allow them to deliver the appropriate services to the screened population.
PMCID: PMC3109899
14.  UK Naval Dockyards Asbestosis Study: radiological methods in the surveillance of workers exposed to asbestos 
In a survey of the effects of exposure to asbestos in the UK Naval Dockyards, small- and large-film chest radiographs of 674 men have been examined. These films have been read under survey conditions by two readers using a simple screening classification, and also in a controlled trial by five readers using the full ILO U/C classification. Comparison between the reading methods showed a deficiency, independent of the size of film, of at least 30% in the detection of asbestos-related radiographic abnormalities when the screening classification was used. For adequate diagnostic sensitivity the ILO U/C classification appears to be essential. There was a deficiency of 43% in significant abnormalities observed by a majority of readers in the small films when directly compared with large film readings. This deficiency could be reduced to 7% by using readings of the small films at any level of abnormality by any of the five readers. When the ILO U/C readings were related to the clinical diagnoses, the only abnormality missed was a small pleural plaque. Films with previously agreed coding were inserted at intervals during the reading trial and helped to maintain the consistency of reading. Right oblique views were taken for 1884 men, in addition to the full-sized postero-anterior view, but the contribution provided by this view proved insufficient to justify its use in large surveys. The cost of a survey when small films are used as a screening method is reduced to between one-third and one-half of the cost when large films are used, assuming that the abnormality rate is not more than 5%. However, this cost advantage for small films is likely to be overtaken by the development of automatic large-film units. The radiation dose when small films are used is increased by a factor of about 20, but is within the prescribed safety level. It is concluded that at least three readers should be involved, using the full ILO U/C classification.
Small films may be of particular use in a large-scale survey, in which the abnormality rate is expected to be low, and which might otherwise be too expensive. A sensitive reading method and a high standard of film quality are essential factors in the use of this technique.
PMCID: PMC1008405  PMID: 698132
15.  An elaboration of small opacity types (p, m, and n) in simple pneumoconiosis 
Lindars, D. C. (1971). Brit. J. industr. Med., 28, 131-142. An elaboration of small opacity types (p, m, and n) in simple pneumoconiosis. According to the I.L.O. classification (International Labour Office, 1959), radiographs showing pneumoconiosis may be classified as p, m or n according to the greatest diameter of predominant (small) opacities. Recent work has revealed pathological and physiological differences associated with these appearances.
In response to a request for some refinement of the classification for correlation with pathological data, a 'pmn elaboration' has been devised, analogous to the N.C.B. elaboration of the I.L.O. classification (Liddell and Lindars, 1969). It has the following form: o/-, o/o, o/p; p/o, p/p, p/m; m/p, m/m, m/n; n/m, n/n, n/A. Instructions to readers are similar to those for use with the N.C.B. elaboration. The significance of o/- and o/o is identical in the two elaborations; n/A indicates predominance of opacity size close to that of large opacities in the I.L.O. classification.
Two hundred and forty-seven radiographs have been read, on two occasions by four film readers, using both elaborations. Analysis of the results showed that the readers had a slightly greater observer error, in terms of variance, when using the pmn elaboration than with the N.C.B. elaboration, but reading bias was less. Calculation of information transmitted showed a gain in information in the pmn elaboration over conventional p, m, n typing, comparable to the gain in information achieved by the N.C.B. elaboration.
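Information transmitted is a Shannon measure of how much one classification tells you about another. The abstract does not give the calculation, so the following Python sketch assumes the usual mutual-information definition applied to a contingency table of counts, for example a reader's first-occasion readings (rows) against second-occasion readings (columns); `information_transmitted` is a hypothetical name.

```python
import math


def information_transmitted(table):
    """Mutual information, in bits, between the row and column
    classifications of a contingency table of counts (list of rows)."""
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table must contain at least one observation")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    info = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count:  # empty cells contribute nothing
                p_ij = count / n
                # p_ij / (p_i * p_j) simplifies to count * n / (row * col)
                info += p_ij * math.log2(count * n / (row_totals[i] * col_totals[j]))
    return info
```

Perfect agreement between two equally used classes transmits 1 bit, while readings that are independent of each other transmit none. A finer scale such as the pmn elaboration can transmit more information than three-way p, m, n typing only if readers use its extra points consistently.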
Marginal zones were not used as frequently as in the N.C.B. elaboration. The radiograph series contained too few normal or near-normal radiographs for the lower end of the scale to be adequately studied. Improved results can be expected with increased experience and more careful framing of reading instructions for prior briefing.
It is recommended that the pmn elaboration should be used whenever typing is required for correlation between radiographic appearance and pathological or physiological data.
PMCID: PMC1009257  PMID: 5572681
16.  Moving from Data on Deaths to Public Health Policy in Agincourt, South Africa: Approaches to Analysing and Understanding Verbal Autopsy Findings 
PLoS Medicine  2010;7(8):e1000325.
Peter Byass and colleagues compare two methods of assessing data from verbal autopsies, review by physicians or probabilistic modeling, and show that probabilistic modeling is the most efficient means of analyzing these data.
Cause of death data are an essential source for public health planning, but their availability and quality are lacking in many parts of the world. Interviewing family and friends after a death has occurred (a procedure known as verbal autopsy) provides a source of data where deaths otherwise go unregistered; but sound methods for interpreting and analysing the ensuing data are essential. Two main approaches are commonly used: either physicians review individual interview material to arrive at probable cause of death, or probabilistic models process the data into likely cause(s). Here we compare and contrast these approaches as applied to a series of 6,153 deaths which occurred in a rural South African population from 1992 to 2005. We do not attempt to validate either approach in absolute terms.
Methods and Findings
The InterVA probabilistic model was applied to a series of 6,153 deaths which had previously been reviewed by physicians. Physicians used a total of 250 cause-of-death codes, many of which occurred very rarely, while the model used 33. Cause-specific mortality fractions, overall and for population subgroups, were derived from the model's output, and the physician causes coded into comparable categories. The ten highest-ranking causes accounted for 83% and 88% of all deaths by physician interpretation and probabilistic modelling respectively, and eight of the highest ten causes were common to both approaches. Top-ranking causes of death were classified by population subgroup and period, as done previously for the physician-interpreted material. Uncertainty around the cause(s) of individual deaths was recognised as an important concept that should be reflected in overall analyses. One notably discrepant group involved pulmonary tuberculosis as a cause of death in adults aged over 65, and these cases are discussed in more detail, but the group only accounted for 3.5% of overall deaths.
There were no differences between physician interpretation and probabilistic modelling that might have led to substantially different public health policy conclusions at the population level. Physician interpretation was more nuanced than the model, for example in identifying cancers at particular sites, but did not capture the uncertainty associated with individual cases. Probabilistic modelling was substantially cheaper and faster, and completely internally consistent. Both approaches characterised the rise of HIV-related mortality in this population during the period observed, and reached similar findings on other major causes of mortality. For many purposes probabilistic modelling appears to be the best available means of moving from data on deaths to public health actions.
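Cause-specific mortality fractions from a probabilistic model can carry per-death uncertainty into the population totals by attributing each death fractionally across its likely causes. The Python sketch below assumes each death is represented as a mapping from cause to likelihood (summing to at most 1), with any unassigned remainder counted as indeterminate; this input format and the function names are hypothetical and are not the InterVA model's actual output format.

```python
from collections import defaultdict


def csmf(deaths):
    """Cause-specific mortality fractions from per-death cause likelihoods.

    deaths: list of dicts mapping cause -> likelihood for one death.
    Returns cause -> fraction of all deaths; unassigned likelihood is
    attributed to 'indeterminate' (an assumption of this sketch)."""
    if not deaths:
        raise ValueError("need at least one death")
    totals = defaultdict(float)
    for likelihoods in deaths:
        assigned = sum(likelihoods.values())
        if assigned > 1 + 1e-9:
            raise ValueError("likelihoods for one death exceed 1")
        for cause, p in likelihoods.items():
            totals[cause] += p
        totals["indeterminate"] += max(0.0, 1 - assigned)
    n = len(deaths)
    return {cause: total / n for cause, total in totals.items()}


def top_share(fractions, k=10):
    """Share of all deaths accounted for by the k highest-ranking causes."""
    return sum(sorted(fractions.values(), reverse=True)[:k])
```

`top_share(csmf(deaths))` corresponds to the "ten highest-ranking causes" figure quoted above; because fractions are summed rather than counted, a death split between two causes contributes partly to each.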
Please see later in the article for the Editors' Summary
Editors' Summary
Whenever someone dies in a developed country, the cause of death is determined by a doctor and entered into a “vital registration system,” a record of all the births and deaths in that country. Public-health officials and medical professionals use this detailed and complete information about causes of death to develop public-health programs and to monitor how these programs affect the nation's health. Unfortunately, in many developing countries dying people are not attended by doctors and vital registration systems are incomplete. In most African countries, for example, less than one-quarter of deaths are recorded in vital registration systems. One increasingly important way to improve knowledge about the patterns of death in developing countries is “verbal autopsy” (VA). Using a standard form, trained personnel ask relatives and caregivers about the symptoms that the deceased had before his/her death and about the circumstances surrounding the death. Physicians then review these forms and assign a specific cause of death from a shortened version of the International Classification of Diseases, a list of codes for hundreds of diseases.
Why Was This Study Done?
Physician review of VA forms is time-consuming and expensive. Consequently, computer-based, “probabilistic” models have been developed that process the VA data and provide a likely cause of death. These models are faster and cheaper than physician review of VAs and, because they do not rely on the views of local doctors about the likely causes of death, they are more internally consistent. But are physician review and probabilistic models equally sound ways of interpreting VA data? In this study, the researchers compare and contrast the interpretation of VA data by physician review and by a probabilistic model called the InterVA model by applying these two approaches to the deaths that occurred in Agincourt, a rural region of northeast South Africa, between 1992 and 2005. The Agincourt health and sociodemographic surveillance system is a member of the INDEPTH Network, a global network that is evaluating the health and demographic characteristics (for example, age, gender, and education) of populations in low- and middle-income countries over several years.
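Here is a minimal naive-Bayes sketch of the kind of calculation a probabilistic VA model performs. Each cause's prior probability is multiplied by the probability of each reported indicator given that cause, and the products are normalised so they sum to 1. Every cause, indicator, probability, and the `threshold` for calling a death indeterminate are invented placeholders. This is not the InterVA model or its probability matrix.

```python
from math import prod

# Hypothetical prior probabilities of each cause (placeholders, not InterVA values).
PRIORS = {
    "HIV/AIDS": 0.30,
    "tuberculosis": 0.10,
    "road traffic injury": 0.05,
    "other": 0.55,
}

# Hypothetical P(indicator reported | cause) for a few VA indicators.
LIKELIHOODS = {
    "weight loss": {"HIV/AIDS": 0.8, "tuberculosis": 0.7, "road traffic injury": 0.01, "other": 0.1},
    "chronic cough": {"HIV/AIDS": 0.4, "tuberculosis": 0.9, "road traffic injury": 0.01, "other": 0.1},
    "injury": {"HIV/AIDS": 0.01, "tuberculosis": 0.01, "road traffic injury": 0.95, "other": 0.05},
}


def cause_probabilities(reported, priors=PRIORS, likelihoods=LIKELIHOODS):
    """Posterior P(cause | reported indicators), assuming indicators are independent given the cause."""
    scores = {
        cause: prior * prod(likelihoods[ind][cause] for ind in reported)
        for cause, prior in priors.items()
    }
    total = sum(scores.values())
    return {cause: score / total for cause, score in scores.items()}


def most_likely_cause(reported, threshold=0.5):
    """Return the top cause, or 'indeterminate' if its posterior is below a hypothetical threshold."""
    posteriors = cause_probabilities(reported)
    cause, p = max(posteriors.items(), key=lambda item: item[1])
    return cause if p >= threshold else "indeterminate"


print(most_likely_cause(["weight loss", "chronic cough"]))  # HIV/AIDS (posterior about 0.58)
```

The same inputs always give the same answer. This is the internal consistency the summary attributes to probabilistic models.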
What Did the Researchers Do and Find?
The researchers applied the InterVA probabilistic model to 6,153 deaths that had been previously reviewed by physicians. They grouped the 250 cause-of-death codes used by the physicians into categories comparable with the 33 cause-of-death codes used by the InterVA model. From the output of both approaches, they derived cause-specific mortality fractions (the proportions of all deaths attributable to specific causes) for the whole population and for subgroups (for example, deaths in different age groups and deaths occurring over specific periods of time). The ten highest-ranking causes of death accounted for 83% and 88% of all deaths by physician interpretation and by probabilistic modelling, respectively. Eight of the most frequent causes of death (HIV, tuberculosis, chronic heart conditions, diarrhea, pneumonia/sepsis, transport-related accidents, homicides, and indeterminate) were common to both interpretation methods. Both methods coded about a third of all deaths as indeterminate, often because of incomplete VA data. Generally, there was close agreement between the methods for the five principal causes of death for each age group and for each period of time. One notable discrepancy was pulmonary (lung) tuberculosis. In one age group, it accounted for 6.4% of deaths according to the physicians but 21.3% according to the model. However, these deaths accounted for only 3.5% of all the deaths.
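The mapping-and-counting step described above can be sketched as follows. Detailed codes are collapsed into broader categories, and each category's share of all deaths gives its cause-specific mortality fraction (CSMF). Sorting the fractions then gives the share captured by the top-ranked causes. The ICD-10 codes and the category mapping are illustrative assumptions, not the study's actual 250-to-33 crosswalk.

```python
from collections import Counter

# Illustrative crosswalk from ICD-10 codes to broad categories (hypothetical, not the study's).
CODE_TO_CATEGORY = {
    "B20": "HIV/AIDS",
    "B24": "HIV/AIDS",
    "A15": "Pulmonary tuberculosis",
    "A16": "Pulmonary tuberculosis",
    "I50": "Chronic heart conditions",
    "R99": "Indeterminate",
}


def csmf(codes, mapping=CODE_TO_CATEGORY):
    """Fraction of all deaths in each category; unmapped codes fall into 'Other'."""
    if not codes:
        raise ValueError("no deaths to summarise")
    counts = Counter(mapping.get(code, "Other") for code in codes)
    total = len(codes)
    return {category: n / total for category, n in counts.items()}


def top_n_share(fractions, n=10):
    """Combined fraction of deaths covered by the n highest-ranking categories."""
    return sum(sorted(fractions.values(), reverse=True)[:n])


deaths = ["B20", "B24", "A15", "R99", "R99", "I50", "X99"]  # hypothetical coded deaths
fractions = csmf(deaths)
print(fractions["HIV/AIDS"])        # 2 of 7 deaths
print(top_n_share(fractions, n=2))  # HIV/AIDS + Indeterminate: 4 of 7 deaths
```

Running the same two functions on physician-coded and model-coded deaths gives the paired CSMFs that the study compares.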
What Do These Findings Mean?
These findings reveal no differences between the cause-specific mortality fractions determined from VA data by physician interpretation and by probabilistic modelling that might have led to substantially different public-health programs being initiated in this population. Importantly, both approaches clearly chart the rise of HIV-related mortality in this South African population between 1992 and 2005 and reach similar findings on other major causes of mortality. The researchers note that preparing this amount of VA data for entry into the probabilistic model took several days. The model itself, however, runs very quickly and always gives consistent answers. Given these findings, the researchers conclude that in many settings probabilistic modelling represents the best means of moving from VA data to public-health actions.
Additional Information
The importance of accurate data on death is further discussed in a PLoS Medicine Perspective by Colin Mathers and Ties Boerma
The World Health Organization (WHO) provides information on the vital registration of deaths and on the International Classification of Diseases; the WHO Health Metrics Network is a global collaboration focused on improving sources of vital statistics; and the WHO Global Health Observatory brings together core health statistics for WHO member states
The INDEPTH Network is a global collaboration that is collecting health statistics from developing countries; it provides more information about the Agincourt health and socio-demographic surveillance system and access to standard VA forms
Information on the Agincourt health and sociodemographic surveillance system is available on the University of the Witwatersrand Web site
The InterVA Web site provides resources for interpreting verbal autopsy data; the InterVA model was developed at the Umeå Centre for Global Health Research
A recent PLoS Medicine Essay by Peter Byass, lead author of this study, discusses The Unequal World of Health Data
PMCID: PMC2923087  PMID: 20808956
17.  Crystalline silica exposure, radiological silicosis, and lung cancer mortality in diatomaceous earth industry workers 
Thorax  1999;54(1):56-59.
BACKGROUND—The role of silicosis as either a necessary or incidental condition in silica associated lung cancer remains unresolved. To address this issue a cohort analysis of dose-response relations for crystalline silica and lung cancer mortality was conducted among diatomaceous earth workers classified according to the presence or absence of radiological silicosis.
METHODS—Radiological silicosis was determined for 1809 of 2342 white male workers in a diatomaceous earth facility in California, using the median of readings by a panel of three "B" readers under the 1980 International Labour Organisation classification system. Standardised mortality ratios (SMR) for lung cancer, based on United States rates for 1942-94, were calculated separately for workers with and without radiological silicosis. These were calculated according to cumulative exposure to respirable crystalline silica (milligrams per cubic meter × years; mg/m3-years), lagged 15 years.
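Cumulative exposure lagged 15 years counts only exposure accrued at least 15 years before the point at which risk is assessed. The sketch below assumes a hypothetical job history given as (start year, end year, mean respirable crystalline silica concentration in mg/m3) intervals. The function name and data layout are illustrative assumptions.

```python
def lagged_cumulative_exposure(job_history, at_year, lag_years=15.0):
    """Sum of concentration x duration (mg/m3-years), ignoring the last `lag_years` before at_year.

    job_history: iterable of (start_year, end_year, concentration_mg_m3) tuples.
    """
    cutoff = at_year - lag_years
    total = 0.0
    for start, end, concentration in job_history:
        # Truncate each interval at the lag cutoff; intervals starting after it contribute nothing.
        effective_end = min(end, cutoff)
        if effective_end > start:
            total += concentration * (effective_end - start)
    return total


history = [(1950, 1960, 0.5), (1960, 1970, 1.0)]  # hypothetical worker
print(lagged_cumulative_exposure(history, at_year=1980))  # 0.5*10 + 1.0*5 = 10.0 mg/m3-years
```

Lagging discards recent exposure that is too close in time to have plausibly caused a cancer death being counted.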
RESULTS—Eighty-one cases of silicosis were identified, including 77 with small opacities of ⩾1/0 and four with large opacities. A slightly larger excess of lung cancer was found among the subjects with silicosis (SMR 1.57, 95% confidence interval (CI) 0.43 to 4.03) than in workers without silicosis (SMR 1.19, 95% CI 0.87 to 1.57). An association between silica exposure and lung cancer risk was detected among those without silicosis. A statistically significant (p = 0.02) increasing trend of lung cancer risk was seen with cumulative exposure, with SMR reaching 2.40 (95% CI 1.24 to 4.20) at the highest exposure level (⩾5.0 mg/m3-years). A similar statistically significant (p = 0.02) dose-response gradient was observed among non-silicotic subjects when follow-up was truncated at 15 years after the final negative radiograph (SMR 2.96, 95% CI 1.19 to 6.08 at ⩾5.0 mg/m3-years). This indicates that the association among non-silicotic subjects was unlikely to be accounted for by undetected radiological silicosis.
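An SMR is the observed number of deaths divided by the number expected when reference (here, United States) rates are applied to the cohort's person-years. The sketch below takes the expected count as already computed. It attaches an approximate confidence interval using Byar's approximation to the Poisson distribution. This method is an assumption for illustration; the paper's intervals may have been derived differently, for example from exact Poisson limits. The counts in the usage line are hypothetical.

```python
from statistics import NormalDist


def smr_with_ci(observed, expected, level=0.95):
    """Return (SMR, lower, upper) using Byar's approximation for a Poisson count."""
    if expected <= 0 or observed < 0:
        raise ValueError("expected must be positive and observed non-negative")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    o = observed
    # Lower limit for the observed count (zero when no deaths were observed).
    lower = 0.0 if o == 0 else o * (1 - 1 / (9 * o) - z / (3 * o ** 0.5)) ** 3
    # Upper limit uses o + 1, as in Byar's formula.
    o1 = o + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * o1 ** 0.5)) ** 3
    return o / expected, lower / expected, upper / expected


smr, lo, hi = smr_with_ci(observed=10, expected=5.0)  # hypothetical counts
print(f"SMR {smr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A lower limit below 1, as for the silicosis group above, means the excess is compatible with chance at that confidence level.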
CONCLUSIONS—The dose-response relation observed between cumulative exposure to respirable crystalline silica and lung cancer mortality among workers without radiological silicosis suggests that silicosis is not a necessary co-condition for silica related lung carcinogenesis. However, the relatively small number of silicosis cases in the cohort and the absence of radiographic data after employment limit interpretations.

PMCID: PMC1745344  PMID: 10343633
18.  Methods of Blinding in Reports of Randomized Controlled Trials Assessing Pharmacologic Treatments: A Systematic Review 
PLoS Medicine  2006;3(10):e425.
Blinding is a cornerstone of therapeutic evaluation because lack of blinding can bias treatment effect estimates. An inventory of the blinding methods would help trialists conduct high-quality clinical trials and readers appraise the quality of results of published trials. We aimed to systematically classify and describe methods to establish and maintain blinding of patients and health care providers and methods to obtain blinding of outcome assessors in randomized controlled trials of pharmacologic treatments.
Methods and Findings
We undertook a systematic review of all reports of randomized controlled trials assessing pharmacologic treatments with blinding published in 2004 in high impact-factor journals from Medline and the Cochrane Methodology Register. We used a standardized data collection form to extract data. The blinding methods were classified according to whether they primarily (1) established blinding of patients or health care providers, (2) maintained the blinding of patients or health care providers, and (3) obtained blinding of assessors of the main outcomes. We identified 819 articles, with 472 (58%) describing the method of blinding. Methods to establish blinding of patients and/or health care providers concerned mainly treatments provided in identical form, specific methods to mask some characteristics of the treatments (e.g., added flavor or opaque coverage), or use of double dummy procedures or simulation of an injection. Methods to avoid unblinding of patients and/or health care providers involved use of active placebo, centralized assessment of side effects, patients informed only in part about the potential side effects of each treatment, centralized adapted dosage, or provision of sham results of complementary investigations. The methods reported for blinding outcome assessors mainly relied on a centralized assessment of complementary investigations, clinical examination (i.e., use of video, audiotape, or photography), or adjudication of clinical events.
This review classifies blinding methods and provides a detailed description of methods that could help trialists overcome some barriers to blinding in clinical trials and help readers interpret the quality of pharmacologic trials.
Following a systematic review of all reports of randomized controlled trials assessing pharmacologic treatments involving blinding, a classification of blinding methods is proposed.
Editors' Summary
In evidence-based medicine, good-quality randomized controlled trials are generally considered to be the most reliable source of information about the effects of different treatments, such as drugs. In a randomized trial, patients are assigned to receive one treatment or another by the play of chance. This technique helps make sure that the two groups of patients receiving the different treatments are equivalent at the start of the trial. Proper randomization also prevents doctors from controlling or affecting which treatment patients get, which could distort the results. An additional tool used to make trials less prone to bias is “blinding.” Blinding involves taking steps to prevent patients, doctors, or other people involved in the trial (e.g., those people recording measurements) from finding out which patients got what treatment. Properly done, blinding should make the results of a trial more accurate. In an unblinded study, participants may respond better if they know they have received a promising new treatment (or worse if they only got placebo or an old drug). Doctors may “want” a particular treatment to do better in the trial, and unthinking bias could creep into their measurements or actions. The same applies to practitioners and researchers who record patients' outcomes in the trial. However, blinding is not a simple, single step; the people carrying out the trial often have to set up a variety of different procedures that depend on the type of trial that is being done.
Why Was This Study Done?
The researchers here wanted to thoroughly examine different methods that have been used to achieve blinding in randomized trials of drug treatments, and to describe and classify them. They hoped that a better understanding of the different blinding methods would help people doing trials to design better trials in the future, and also help readers to interpret the quality of trials that had been done.
What Did the Researchers Do and Find?
This group of researchers conducted what is called a “systematic review.” They systematically searched the published medical literature to find all randomized, blinded drug trials published in 2004 in a number of different “high-impact” journals (journals whose articles are frequently cited in other articles). Then, the researchers classified information from the published trial reports. The researchers ended up with 819 trial reports, and nearly 60% of them described how blinding was done. Their classification of blinding was divided into three main areas. First, they detailed methods used to hide which drugs are given to particular patients. Examples include preparing identically appearing treatments, using strong flavors to mask taste, matching the colors of pills, and using saline injections. Second, they described a number of methods that could be used to reduce the risk of unblinding (of doctors or patients). One example is an “active placebo” (a placebo that mimics some of the expected side effects of the drug treatment). Finally, they defined methods for blinded measurement of outcomes (such as using a central committee to collect data).
What Do These Findings Mean?
The researchers' classification will help people to work out how different techniques can be used to achieve, and keep, blinding in a trial. This will assist others to understand whether any particular trial was likely to have been blinded properly, and therefore work out whether the results are reliable. The researchers also suggest that, generally, blinding methods are not described in enough detail in published scientific papers, and recommend that guidelines for describing results of randomized trials be improved.
Additional Information
The James Lind Library has been created to help patients and researchers understand fair tests of treatments in health care by illustrating how fair tests have developed over the centuries
ClinicalTrials.gov, a trial registry created by the US National Institutes of Health, has an introduction to understanding clinical trials
The National Electronic Library for Health has an introduction to controlled clinical trials
PMCID: PMC1626553  PMID: 17076559
19.  Influence of tumor necrosis factor α in rheumatoid arthritis  
Rheumatoid arthritis (RA) is the most prevalent inflammatory rheumatic disorder. It is a chronic and incurable disease that leads to painful inflammation, often irreversible joint damage, and eventually to functional loss.
Conventional treatment is based on nonspecific immunosuppressive agents, e.g. Methotrexate, Azathioprine or Gold. However, the long-term outcomes of these approaches have been poor, with frequently ongoing inflammatory disease activity, functional decline, and temporary or permanent work disability. More recently, antagonists of the human cytokine Tumor Necrosis Factor α (TNF-α) have been introduced that are potent suppressors of inflammatory processes. Infliximab is a chimeric antibody against TNF-α. Etanercept is a soluble human TNF-α receptor.
The report assesses the efficacy of TNF-α-antagonists to down-regulate inflammation, improve functional status and prevent joint damage in RA with particular regard to the following indications: Treatment of severe, refractory and ongoing disease activity despite adequate use of conventional antirheumatic agents; and treatment of early RA before conventional treatment failure has been demonstrated.
A systematic review of the literature is performed using established electronic databases. The literature search is supplemented by a hand search of journals and publications relevant to RA, reviews of the websites of national and international rheumatological expert societies, and contacts with manufacturers. Inclusion and exclusion criteria defined a priori are used for literature selection. Analysis and evaluation of included publications are based on standardised criteria sets and checklists of the German Scientific Working Group for Technology Assessment in Health Care.
No Health Technology Assessment reports or meta-analyses could be identified. A total of 12 clinical trials are analysed, as well as national and international expert recommendations and practice guidelines. Numerous non-systematic reviews are found and analysed for additional sources of information not identified through the systematic search. Case reports and safety assessments are considered as well. A total of 137 publications are included.
The primary outcome measures in clinical trials are suppression of inflammatory disease activity and slowing of structural joint damage. Clinical response is usually measured by standardised response criteria that allow a semi-quantitative classification of improvement from baseline by 20%, 50%, or 70%.
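The best-known standardised response criteria of this kind are the American College of Rheumatology (ACR) 20/50/70 definitions. Under these, tender and swollen joint counts must each improve by at least the given percentage. At least three of five further core measures must improve by the same amount. The simplified sketch below assumes every measure is scored so that lower is better and is positive at baseline. The dictionary keys are hypothetical names, and the report does not say which criteria each trial used.

```python
# Hypothetical keys for the five core measures other than joint counts.
OTHER_MEASURES = (
    "patient_global",
    "physician_global",
    "pain",
    "disability",
    "acute_phase_reactant",
)


def pct_improvement(before, after):
    """Percentage improvement from baseline for a measure where lower is better."""
    if before <= 0:
        raise ValueError("baseline value must be positive")
    return 100.0 * (before - after) / before


def acr_response(baseline, follow_up, levels=(70, 50, 20)):
    """Highest level met (e.g. 50 for an ACR50-style response), or 0 if none."""
    imp = {key: pct_improvement(baseline[key], follow_up[key]) for key in baseline}
    for level in sorted(levels, reverse=True):
        joints_ok = imp["tender_joints"] >= level and imp["swollen_joints"] >= level
        others_ok = sum(imp[key] >= level for key in OTHER_MEASURES) >= 3
        if joints_ok and others_ok:
            return level
    return 0


baseline = {key: 10 for key in ("tender_joints", "swollen_joints", *OTHER_MEASURES)}
follow_up = dict(baseline, tender_joints=4, swollen_joints=4,
                 patient_global=4, physician_global=4, pain=5)
print(acr_response(baseline, follow_up))  # 50
```

Here the joints improve by 60%, enough for the 50% level but not the 70% level. Only three of the other measures improve by at least 50%.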
In patients with RA refractory to conventional treatment, TNF-α-antagonists are unequivocally superior to Methotrexate with regard to disease activity, functional status and prevention of structural damage. In patients with early RA, TNF-α-antagonists show a more rapid onset of anti-inflammatory effects than Methotrexate. However, differences in clinical response rates and radiologic progression disappear after a few months of treatment and are no longer statistically significant. Serious adverse events are rare in clinical trials and do not occur significantly more often than in the control groups. However, case reports and surveillance registries show an increased risk for serious infectious complications, particularly tuberculosis. Expert panels recommend the use of TNF-α-antagonists in patients with active refractory RA after failure of conventional treatment. Studies that compare Infliximab and Etanercept are lacking.
There are no pharmacoeconomic studies, although decision analytic models of TNF-α-antagonists for the treatment of RA exist. Based on the results of the models, two options can be considered cost-effective treatments for Methotrexate-resistant RA. These are combination therapy with Hydroxychloroquine (HCQ), Sulfasalazine (SASP) and Methotrexate, and combination therapy with Etanercept and Methotrexate.
TNF-α-antagonists are clearly effective in RA patients with no or incomplete response to Methotrexate and are superior to continued use of Methotrexate. This applies both to the reduction of inflammatory disease activity, including pain relief and improved functional status, and to the prevention of structural joint damage. Therefore, TNF-α-antagonism is an important new approach in the treatment of RA. There is still insufficient evidence that early use of TNF-α-antagonists in RA, prior to standard agents, is beneficial; further studies are needed.
An analytic model suggests that TNF-α-antagonists are a cost-effective alternative to the therapies commonly used in these patient subgroups, owing to their clinical effectiveness in patients with no or incomplete response to Methotrexate. Nevertheless, the acquisition costs of TNF-α-antagonists lead to high incremental costs and cost-effectiveness (C/E) ratios. These exceed the range commonly accepted when assessing the cost-effectiveness of medical methods and technologies. Hence, society's willingness to pay is the critical determinant of whether TNF-α-antagonists should be reimbursed, and of which criteria should govern reimbursement. Changes in quality of life attributable to TNF-α-antagonists in RA, which might assist this decision, have not yet been assessed.
With respect to the questions mentioned above and the potential financial effect of a systematic use of TNF-α-antagonists in the treatment of RA, we conclude that TNF-α-antagonists should not be introduced as a standard benefit reimbursed by the statutory health insurers in Germany.
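The incremental C/E ratio and the role of willingness to pay can be made concrete with a small calculation. The ratio divides the extra cost of the new treatment by its extra health effect. The treatment is judged cost-effective only if that ratio does not exceed what the payer is willing to pay per unit of effect. The costs, effects, and willingness-to-pay values below are hypothetical. They do not come from the models the report reviewed.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost per additional unit of health effect (e.g. per quality-adjusted life year)."""
    delta_effect = effect_new - effect_old
    if delta_effect <= 0:
        raise ValueError("new treatment is not more effective; the ratio is not interpretable")
    return (cost_new - cost_old) / delta_effect


def is_cost_effective(icer_value, willingness_to_pay):
    """Cost-effective if the incremental ratio does not exceed the willingness-to-pay threshold."""
    return icer_value <= willingness_to_pay


ratio = icer(cost_new=25_000, effect_new=1.5, cost_old=5_000, effect_old=1.0)
print(ratio)  # 40000.0 per unit of effect
print(is_cost_effective(ratio, willingness_to_pay=50_000))  # True
print(is_cost_effective(ratio, willingness_to_pay=30_000))  # False
```

The same ratio is judged cost-effective or not depending only on the threshold. This is why the report treats society's willingness to pay as the deciding factor.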
PMCID: PMC3011313  PMID: 21289933
health economics; tumor necrosis factor; TNF-alpha; treatment; rheumatoid arthritis; cost-effectiveness
20.  Universal Definition of Loss to Follow-Up in HIV Treatment Programs: A Statistical Analysis of 111 Facilities in Africa, Asia, and Latin America 
PLoS Medicine  2011;8(10):e1001111.
Based on a statistical analysis of 111 facilities in Africa, Asia, and Latin America, Benjamin Chi and colleagues develop a standard loss-to-follow-up (LTFU) definition that can be used by HIV antiretroviral programs worldwide.
Although patient attrition is recognized as a threat to the long-term success of antiretroviral therapy programs worldwide, there is no universal definition for classifying patients as lost to follow-up (LTFU). We analyzed data from health facilities across Africa, Asia, and Latin America to empirically determine a standard LTFU definition.
Methods and Findings
At a set “status classification” date, patients were categorized as either “active” or “LTFU” according to different intervals from time of last clinic encounter. For each threshold, we looked forward 365 d to assess the performance and accuracy of this initial classification. The best-performing definition for LTFU had the lowest proportion of patients misclassified as active or LTFU. Observational data from 111 health facilities—representing 180,718 patients from 19 countries—were included in this study. In the primary analysis, for which data from all facilities were pooled, an interval of 180 d (95% confidence interval [CI]: 173–181 d) since last patient encounter resulted in the fewest misclassifications (7.7%, 95% CI: 7.6%–7.8%). A secondary analysis that gave equal weight to cohorts and to regions generated a similar result (175 d); however, an alternate approach that used inverse weighting for cohorts based on variance and equal weighting for regions produced a slightly lower summary measure (150 d). When examined at the facility level, the best-performing definition varied from 58 to 383 d (mean = 150 d), but when a standard definition of 180 d was applied to each facility, only slight increases in misclassification (mean = 1.2%, 95% CI: 1.0%–1.5%) were observed. Using this definition, the proportion of patients classified as LTFU by facility ranged from 3.1% to 45.1% (mean = 19.9%, 95% CI: 19.1%–21.7%).
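The threshold-selection procedure can be sketched under simplifying assumptions. Each patient record holds the days since the last clinic encounter at the status classification date. It also records whether the patient was seen at any point in the following 365 days. Deaths and transfers, which a real analysis would need to handle, are ignored. The `Patient` record, the cohort, and the candidate thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    days_since_last_visit: int  # at the status classification date
    returned_within_365d: bool  # any clinic visit in the following 365 days


def misclassification_rate(patients, threshold):
    """Share of patients misclassified when 'LTFU' means days_since_last_visit >= threshold."""
    if not patients:
        raise ValueError("no patients")
    errors = 0
    for p in patients:
        classified_ltfu = p.days_since_last_visit >= threshold
        if classified_ltfu and p.returned_within_365d:
            errors += 1  # false positive: called LTFU but came back
        elif not classified_ltfu and not p.returned_within_365d:
            errors += 1  # false negative: called active but never came back
    return errors / len(patients)


def best_threshold(patients, candidates):
    """Candidate threshold (in days) with the fewest misclassifications; ties go to the first."""
    return min(candidates, key=lambda t: misclassification_rate(patients, t))


cohort = [
    Patient(30, True), Patient(60, True), Patient(90, True), Patient(150, True),
    Patient(100, False), Patient(200, False), Patient(250, False), Patient(400, False),
]
print(best_threshold(cohort, [60, 120, 180, 240]))  # 180
```

Short thresholds inflate false positives and long thresholds inflate false negatives. The selected value is the one that balances the two for the data at hand.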
Based on this evaluation, we recommend the adoption of ≥180 d since the last clinic visit as a standard LTFU definition. Such standardization is an important step to understanding the reasons that underlie patient attrition and establishing more reliable and comparable program evaluation worldwide.
Editors' Summary
Since 1981, AIDS has killed more than 25 million people, and about 33 million people (mostly in low- and middle-income countries) are now infected with HIV, the virus that causes AIDS. Because HIV destroys immune system cells, HIV-positive individuals are very susceptible to other infections, and, early in the AIDS epidemic, most HIV-infected people died within ten years of contracting the virus. Then, in 1996, antiretroviral therapy (ART)—a cocktail of drugs that keeps HIV in check—became available. For people living in developed countries, HIV infection became a chronic condition. However, for people living in developing countries, ART was prohibitively expensive, and HIV/AIDS remained a fatal illness. In 2003, this situation was declared a global emergency, and governments, international agencies, and funding bodies began to implement plans to increase ART coverage in resource-limited countries. By the end of 2009, more than a third of people living in these countries who needed ART were receiving it.
Why Was This Study Done?
Because ART does not cure HIV infection, patients have to take antiretroviral drugs regularly for the rest of their lives. But in some ART programs, more than a third of patients are lost to follow-up (LTFU) within three years of starting treatment; that is, they stop coming for treatment. Patient attrition threatens the success of ART programs, but to understand why it occurs, a standardized method for classifying patients as LTFU is essential. Classification of patients as LTFU relies on an interval-based definition: a patient who fails to attend a clinic within a specified interval after a previous visit is classified as LTFU. If this interval is too short, many patients will be accurately identified as LTFU, but there will also be a high false-positive rate. Some patients classified as LTFU will actually return to the clinic later. Conversely, if the interval is too long, some patients who are truly LTFU will be misclassified as active (a false-negative classification). In this study, the researchers analyzed data from health facilities across Africa, Asia, and Latin America to determine a standard definition for LTFU that minimizes patient misclassification.
What Did the Researchers Do and Find?
The researchers used data collected from 111 health facilities by the International Epidemiologic Databases to Evaluate AIDS (IeDEA) Collaboration. They categorized patients receiving ART at each facility as active or LTFU at a “status classification” date (12 months before the facility's last data export to IeDEA), using a range of intervals (thresholds) since their last clinic visit. For example, for a test interval of 200 days, patients who had not been seen at the clinic for 200 days or more at the status classification date were classified as LTFU. All other patients were classified as active. The researchers then looked forward 365 days from the status classification date to assess the performance and accuracy of these classifications. Thus, an “LTFU” patient who visited the clinic at any time during the year after the status classification date represented a false-positive classification. An “active” patient who did not return within the ensuing year represented a false-negative classification. When data from all the facilities were pooled, a threshold of 180 days produced the fewest misclassifications. At the facility level, the best-performing threshold for patient classification ranged from 58 to 383 days (with an average of 150 days). Nevertheless, applying a 180-day threshold to individual facilities only slightly increased misclassifications. Finally, using the 180-day threshold, average LTFU at individual facilities was 19.9%.
What Do These Findings Mean?
Based on these findings, the researchers recommend that patients be classified as LTFU when 180 days or more have elapsed since their last clinic visit. Given the wide range of best-performing definitions among facilities, however, they recognize that local, national, or regional definitions of LTFU may be more appropriate in certain contexts. Adoption of a standard definition for LTFU, the researchers note, should facilitate harmonization of monitoring and evaluation of ART programs across the world and should help to identify “best practices” associated with low LTFU rates. Importantly, it should also provide the necessary framework for research designed to improve patient retention in ART programs, thereby helping to maximize and sustain the health gains from HIV treatment programs.
Additional Information
Information is available from the US National Institute of Allergy and Infectious Diseases on HIV infection and AIDS
NAM/aidsmap provides basic information about HIV/AIDS and summaries of recent research findings on HIV care and treatment
Information is available from Avert, an international AIDS charity on many aspects of HIV/AIDS, including information on HIV/AIDS treatment and care and on universal access to AIDS treatment (in English and Spanish)
The World Health Organization provides information about universal access to AIDS treatment (in several languages)
Information about the IeDEA Collaboration is available
Patient stories about living with HIV/AIDS are available through Avert and through the charity website Healthtalkonline
PMCID: PMC3201937  PMID: 22039357
21.  Respiratory symptoms, lung function, and pneumoconiosis among self employed dental technicians. 
From the registry of self-employed workers living in Paris, a group of 105 dental technicians was studied to evaluate occupational exposure, to determine respiratory manifestations, and to investigate immune disturbances. Seventy-one dental technicians (age range 43-68: group D), 34 dental technicians younger than 43 or older than 68 (group d), and 68 control workers (age range 43-66: group C) were investigated. The demographic characteristics and the smoking habits of groups D and C did not differ significantly. The dental technicians often worked alone (43.7%) or in small laboratories without adequate dust control. The mean duration of their exposure was long (group D 34.0 (SD 8.4) years). The prevalence of respiratory symptoms did not differ between groups D and C, with one exception. Increased cough and phlegm lasting for three weeks or more over the past three years was more common in the dental technicians (group D 16.9%, group C 2.9%, p < 0.007). The effect of cigarette smoking on respiratory symptoms and lung function was obvious. All mean values of lung function for dental technicians and controls were within normal limits. However, all mean lung function values were significantly lower among smokers than among non-smokers, and a positive interaction with occupational exposure was established. The x-ray films of dental technicians (n = 102, groups D and d) were read independently by four readers. The readings were recorded according to the International Labour Office classification of pneumoconioses. The prevalence of small opacities greater than 1/0 was 11.8%, with a significant increase with duration of exposure. The prevalence among dental technicians with 30 years of exposure or more was significantly higher (22.2%) than among those with less than 30 years (3.5%, p < 0.004). The prevalence of autoantibodies (rheumatoid factors, antinuclear antibodies, and antihistone antibodies) was not significantly different in groups D and C.
When positive, autoantibodies only occurred at low concentrations. This finding contrasts with previous reports on the occurrence of autoantibodies and even of connective tissue diseases in dental technicians. In conclusion, the study confirms an increased risk of pneumoconiosis among dental technicians. Moreover, there may be other lung disorders such as impairment of lung function especially in association with cigarette smoking.
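A two-sided Fisher exact test on a 2x2 table is one way to test a prevalence comparison like the cough-and-phlegm result. The sketch below implements it using only the standard library. The counts (12 of 71 technicians, 2 of 68 controls) are back-calculated from the reported 16.9% and 2.9% and are an assumption. The abstract does not state which test the authors used, so this p-value need not match theirs.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def table_prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding.
    p = sum(table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_observed * (1 + 1e-7))
    return min(p, 1.0)


# Rows: dental technicians, controls; columns: symptom present, absent (assumed counts).
print(fisher_exact_two_sided(12, 71 - 12, 2, 68 - 2))
```

The exact test avoids the large-sample assumption behind a chi-square test, which matters here because only two controls reported the symptom.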
PMCID: PMC1012163  PMID: 8507597
22.  The characteristics of respiratory ill health of wool textile workers. 
The relations of lung function and chest radiographic appearances with exposure to inspirable dust were examined in 634 workers in five wool textile mills in west Yorkshire. The mills were randomly selected to represent fully the range of current exposures to wool mill dust. Most of these workers could be categorised into three large sex and ethnic groups: European men, Asian men, and Asian women. Exposures to inspirable dust had been measured at a previous survey. Time spent in the current job and in the industry was used as a surrogate for lifetime cumulative exposure. Chest radiographs were interpreted on the International Labour Office (ILO) scale by three medically qualified readers, and the results combined. Profusions of small opacities of 0/1 on the ILO scale, or greater, were present in only 6% of the population. They were not positively associated with current exposure to wool mill dust, or with duration of exposure. In general, statistically significant relations between exposure and lung function indices were not found. The exception was an inverse relation between the forced expiratory volume/forced vital capacity ratio and dust concentration in European women. A suggestive but not statistically significant inverse relation between FVC and current dust concentration was seen in Asian men. Substantial differences were found between mills in mean values of lung function variables after adjustment for other factors. However, these were not apparently related to the differences in dust concentrations between these mills. Dyeworkers and wool scourers (mostly European men in relatively dust-free jobs) on average had an FEV1 251 ml lower than other workers, after age, height, smoking habits, and occupational factors had been taken into account. Twenty-four per cent of the workforce responded to intracutaneous application of one or more common allergens (weal diameter at least 4 mm), only 12 (7.9%) of these responding to wool extracts.
Atopic subjects did not appear to have an increased susceptibility to the effects of inspirable wool dust on lung function. These studies suggest that exposure to wool mill dust may cause functional impairment in some workers but there is little indication from these data of frequent or severe dust related functional deficits. More detailed estimates of cumulative dust exposure by reconstruction of exposure histories might clarify associations between exposure to dust and lung function. These chest radiographic findings provide no evidence that exposure to wool mill dust is related to lung fibrosis.
PMCID: PMC1035359  PMID: 2025586
23.  A scoring system for the assessment of clinical severity in osteogenesis imperfecta 
Osteogenesis imperfecta (OI) is a genetic disorder characterized by bone fragility and fractures. Patients with OI have clinical features that may range from mild symptoms to severe bone deformities and neonatal lethality. Numerous approaches to the classification of OI have been published, of which the Sillence classification is the most commonly used. In this study, we aimed to develop a more refined sub-classification by applying a proposed scoring system for the quantitative assessment of clinical severity in different types of OI.
Subjects and methods
This study included 43 patients with OI. Clinical examination and radiological studies were conducted for all patients. Cases were classified into types I–IV according to the Sillence classification. The proposed scoring system included five major criteria of high clinical value: number of fractures per year, motor milestones, long bone deformities, length/height standard deviation score (SDS), and bone mineral density (BMD). Each criterion was scored from 1 to 4, so each patient received a total score between 5 and 20.
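The scoring arithmetic can be sketched in Python. This is a minimal illustration, not the authors' instrument: the class and field names are hypothetical, and the abstract does not give the rules for converting raw findings (fracture counts, SDS, BMD values) into 1 to 4 scores, so the sketch takes the per-criterion scores as inputs. It assumes, consistent with the reported results, that higher totals indicate more severe disease.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OIClinicalScore:
    """Hypothetical container for the five criterion scores (each 1-4)."""
    fractures_per_year: int
    motor_milestones: int
    long_bone_deformities: int
    length_height_sds: int
    bone_mineral_density: int

    def __post_init__(self) -> None:
        # Reject any criterion score outside the 1-4 range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 4:
                raise ValueError(f"{f.name} must be between 1 and 4, got {value}")

    def total(self) -> int:
        # Five criteria scored 1-4 each give a total between 5 and 20.
        return sum(getattr(self, f.name) for f in fields(self))


# Example patient (values chosen for illustration only).
patient = OIClinicalScore(2, 1, 2, 2, 1)
print(patient.total())  # 8
```

A total of 8 would fall within the 6 to 10 range that the study reports for its mildly affected type I patients.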
Applying the proposed clinical scoring system showed that all 11 patients with Sillence type I (100%) had a score between 6 and 10, denoting mild disease. The only patient with Sillence type II had a score of 19, denoting severe disease. In Sillence type III, 7 patients (31.8%) were moderately affected and 15 patients (68.2%) were severely affected. Most patients with Sillence type IV (88.9%) were moderately affected.
The proposed scoring system can quantitatively reflect the degree of clinical severity in patients with OI and can be used to complement the Sillence classification and molecular studies.
PMCID: PMC3303020  PMID: 23449141
Osteogenesis imperfecta; Sillence classification; Clinical scoring system; Radiological manifestations; Genetics
24.  Setting a Fair Performance Standard for Physicians’ Quality of Patient Care 
Assessing physicians’ clinical performance using statistically sound, evidence-based measures is challenging. Little research has focused on methodological approaches to setting performance standards to which physicians are being held accountable.
We sought to determine whether a rigorous approach to setting an objective, credible standard of minimally acceptable performance could be used for practicing physicians caring for patients with diabetes.
This was a retrospective cohort study.
Participants were 957 physicians from the United States with time-limited certification in internal medicine or a subspecialty.
Main Measures
The American Board of Internal Medicine (ABIM) Diabetes Practice Improvement Module was used to collect data on ten clinical measures and two patient experience measures. A panel of eight internists and subspecialists representing essential perspectives of clinical practice applied an adaptation of the Angoff method, judging how physicians who provide minimally acceptable care would perform on each measure, to establish performance thresholds. Panelists then rated the relative importance of each measure, and the Dunn–Rankin method was applied to establish scoring weights for the composite measure. Physician characteristics were then examined as supporting evidence for the resulting standard.
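The two standard-setting steps can be sketched in Python under stated assumptions. Each measure's threshold is taken as the mean of the panelists' Angoff estimates (expressed as proportions from 0 to 1), and the standard on the composite is the weighted sum of those thresholds on a 0 to 100 scale. The Dunn–Rankin scaling itself is not reproduced: the weights, measure names, and numbers are all hypothetical inputs, and the study's actual adaptation and point allocation may differ.

```python
from statistics import mean


def angoff_thresholds(judgments: dict[str, list[float]]) -> dict[str, float]:
    """Threshold per measure = mean of panelists' estimates (proportion 0-1)
    of how a minimally acceptable physician would perform on that measure."""
    return {measure: mean(estimates) for measure, estimates in judgments.items()}


def composite_score(performance: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted composite on a 0-100 scale; weights are rescaled to sum to 100."""
    total_weight = sum(weights.values())
    return sum(100 * weights[m] / total_weight * performance[m] for m in weights)


# Hypothetical panel judgments for two illustrative measures.
judgments = {
    "hba1c_control": [0.60, 0.70, 0.65],
    "bp_control": [0.50, 0.55, 0.60],
}
# Hypothetical importance weights (stand-ins for Dunn-Rankin output).
weights = {"hba1c_control": 60, "bp_control": 40}

thresholds = angoff_thresholds(judgments)
standard = composite_score(thresholds, weights)  # about 61.0

# A hypothetical physician's observed performance on the same measures.
physician = {"hba1c_control": 0.72, "bp_control": 0.48}
meets = composite_score(physician, weights) >= standard
```

Here the physician's composite (about 62.4) exceeds the standard (about 61.0), so they would meet it. A compensatory composite of this kind lets strong performance on one measure offset weaker performance on another.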
Key Results
Physicians abstracted 20,131 patient charts, and patients completed 18,974 surveys. The panel established reasonable performance thresholds and importance weights, yielding a standard of 48.51 (out of 100 possible points) on the composite measure with high classification accuracy (0.98). The 38 (4%) outlier physicians who did not meet the standard had lower ratings of overall clinical competence and professional behavior/attitude from former residency program directors (p = 0.01 and p = 0.006, respectively), lower Internal Medicine certification and maintenance of certification examination scores (p = 0.005 and p < 0.001, respectively), and primarily worked as solo practitioners (p = 0.02).
The standard-setting method yielded a credible, defensible performance standard for diabetes care based on informed judgment that resulted in a reasonable, reproducible outcome. Our method represents one approach to identifying outlier physicians for intervention to protect patients.
Electronic supplementary material
The online version of this article (doi:10.1007/s11606-010-1572-x) contains supplementary material, which is available to authorized users.
PMCID: PMC3077491  PMID: 21104453
clinical performance assessment; standard setting; composite measures; diabetes care
25.  A new high resolution computed tomography scoring system for pulmonary fibrosis, pleural disease, and emphysema in patients with asbestos related disease. 
The aim of this study was to describe a scoring system for high resolution computed tomographic (HRCT) scans, analogous to the International Labour Office (ILO) scoring system for plain chest radiographs, in patients with asbestos related disease. Interstitial fibrosis, pleural disease, and emphysema were scored; the reproducibility and interobserver agreement of this scoring system were examined; and the extent of each type of disease was correlated with measurements of lung function. Sixty asbestos workers (five women and 55 men; mean age 59 years, range 34-78) were studied. The lungs were divided into upper, middle, and lower thirds, and an HRCT score for the extent of pleural and pulmonary disease in each third was recorded in a way analogous to the ILO method of scoring pleural and parenchymal disease on chest radiographs. A CT score for the extent of emphysema was also recorded. Pleural disease and interstitial fibrosis on the plain chest radiographs were assessed according to the ILO scoring system, and a chest radiographic score for emphysema analogous to that used for HRCT was also recorded. HRCT scores assigned by two independent readers differed by two categories or less in 96%, 92%, and 85% of cases for fibrosis, emphysema, and pleural disease, respectively, compared with 90%, 78%, and 79% of cases for chest radiographs. Intraobserver repeatability was better for the HRCT scores than for the chest radiograph scores for all disorders. Multiple regression analysis showed that scores for interstitial fibrosis, emphysema, and pleural disease on chest radiographs and on HRCT correlated to a similar degree with impairment of lung function.
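The interobserver statistic reported here (the share of cases in which two readers' scores differ by no more than a given number of categories) is simple to compute. The following is a minimal Python sketch, assuming each reader's scores have been coded as integer positions on the ordinal scale. The function name and the example data are hypothetical.

```python
def agreement_within(reader_a: list[int], reader_b: list[int], max_diff: int) -> float:
    """Proportion of cases in which two readers' ordinal category scores
    differ by at most `max_diff` categories."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must score the same, non-empty set of cases")
    within = sum(abs(a - b) <= max_diff for a, b in zip(reader_a, reader_b))
    return within / len(reader_a)


# Hypothetical category indices for five cases (positions on an ordinal scale).
reader_1 = [0, 1, 2, 3, 5]
reader_2 = [0, 3, 2, 6, 5]
print(agreement_within(reader_1, reader_2, max_diff=2))  # 0.8
```

Percentage agreement within a tolerance is easy to interpret but does not correct for chance agreement. A weighted kappa is the usual chance-corrected alternative for ordinal scores.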
PMCID: PMC1012071  PMID: 1536823
