Psoriasis is a common disease frequently studied in large databases. To date the validity of psoriasis information has not been established in The Health Improvement Network (THIN).
To investigate the validity of THIN for identifying psoriasis patients and to determine if the database can be used to determine the natural history of disease.
First, we conducted a cross-sectional study to determine if psoriasis prevalence in THIN is similar to expected. Second, we created a cohort of 4900 patients, aged 45–65, with a psoriasis diagnostic Read Code and surveyed their GPs to confirm the diagnosis clinically. Third, we created models to determine if psoriasis descriptors (extent, severity, duration, and dermatologist confirmation) could be accurately captured from database records.
Psoriasis prevalence was 1.9%, and showed the characteristic age distribution expected. GP questionnaires were received for 4,634 of 4,900 cohort patients (95% response rate), and psoriasis diagnoses were confirmed in 90% of patients. Duration of disease in the database showed substantial agreement with physician query (kappa = 0.69). GPs confirmed that the psoriasis diagnosis was corroborated by a dermatologist in 91% of patients whose database records contained a dermatology referral code associated with a psoriasis code. We achieved good discrimination between patients with and without extensive disease based on the number of psoriasis codes received per year (Area Under Curve, AUC = 0.8).
THIN is a valid data resource for studying psoriasis and can be used to identify characteristics of the disease such as duration and confirmation by a dermatologist.
Psoriasis is a common, chronic inflammatory disease affecting 1–4% of the adult population1–6. Recently, a variety of studies have used large databases to show that psoriasis patients, especially those with severe disease, are at increased risk of myocardial infarction (MI), stroke, all-cause mortality, and cardiovascular mortality7–15. Although psoriasis is increasingly studied in administrative and medical records databases, rigorous validation studies to confirm that electronic psoriasis records accurately reflect the patient’s clinical state are frequently not performed16–18.
The most extensive psoriasis validation studies were completed in the General Practice Research Database (GPRD). They involved epidemiological confirmation and questionnaires sent to general practitioners (GPs) to demonstrate that psoriasis diagnostic codes accurately reflect the patient’s clinical state4,5,19–21.
In recent years a new population-based medical records database, The Health Improvement Network (THIN), has become available. It offers many desirable features including a substantial patient population, patient-level laboratory and prescribing information, and a large group of contributing practices that have agreed to answer patient-related queries22. Initial studies in THIN have suggested that it validly captures data on conditions such as peptic ulcer disease, basal cell carcinoma, and death23–25. To our knowledge, studies evaluating the validity of psoriasis in THIN have not been conducted. Therefore, the goals of this study were to validate the accuracy of psoriasis diagnostic codes in THIN, and to determine if disease severity, disease duration, and dermatologist confirmation of psoriasis diagnoses are accurately captured in the database.
This validation study involved three discrete stages. First, to measure convergent validity (i.e. the degree to which two measures of the same construct yield similar findings) we conducted a cross-sectional study to determine if psoriasis prevalence in the THIN population is similar to expected. Second, to measure criterion validity (i.e. the extent to which a measure agrees with a gold standard) we surveyed GPs to confirm that patients with an electronic psoriasis code in fact had psoriasis clinically16,18. This survey was designed by dermatologists and epidemiologists with the guidance of UK-based general practitioners. Third, we attempted to derive information about the patient’s psoriasis (i.e. extent, severity, duration, and confirmation of the diagnosis by a dermatologist) from database codes and compared these to GP surveys. This study was approved by the University of Pennsylvania institutional review board and the Cambridgeshire Research Ethics Committee and was funded by the National Heart Lung and Blood Institute of the NIH.
THIN is an electronic medical records database that includes more than 7.5 million people, and broadly represents 4.6% of the UK population across more than 400 practices. It contains patient demographics, medical diagnoses, laboratory results, and prescriptions as recorded by GPs, who serve as the primary point of medical contact in the UK. THIN utilizes monetary incentives, quality targets, and training to ensure accurate and complete records which should include information from hospital and specialty care22. It is estimated that approximately half of the practices in THIN are also participating in the GPRD database27.
The entire THIN population was included in the prevalence study. Patients were considered to have psoriasis if they had ever received an electronic psoriasis Read Code. THIN utilizes more than 100,000 unique hierarchical alpha-numeric Read Codes to capture medical information22. Four researchers, two of whom were dermatologists, identified 26 psoriasis Read Codes by employing a keyword search and then examining affiliated codes in hierarchical proximity. The list of codes along with the percent of patients assigned each code is available online in Table 5.
In order to validate the psoriasis diagnostic codes and test the psoriasis descriptor models (extent, severity, etc.), we created a cohort of psoriasis patients in THIN and gathered additional information about them via a GP questionnaire. Patients were eligible for the cohort if they were 45–64 years of age; this age range was selected because this cohort will be followed prospectively for cardiovascular outcomes. They were further required to be alive at the time of sampling, to have received at least one psoriasis Read Code in the two years prior to sampling, and to be registered to a practice with an Additional Information Services (AIS) contract. AIS practices agree to complete questionnaires in exchange for compensation. More than half (55%, n=228) of all THIN practices belonged to AIS at the time of sampling. For these portions of the study, patients were defined as having psoriasis if their GP confirmed the diagnosis.
To ensure data integrity, we calculated the proportion of internally inconsistent survey responses from each practice, as determined by comparing two questions about Body Surface Area (BSA) involved. If more than a quarter of a practice’s responses were inconsistent the practice was excluded from the analysis.
There are 7,520,293 patients registered in THIN, of whom 6,985,418 had at least one recorded visit to their GP. We randomly sampled 5000 of the eligible 5008 patients who met the cohort criteria above. The sample size of 5000 was selected to ensure appropriate power to detect cardiovascular outcomes in subsequent work. A pilot group of 100 randomly selected patients was excluded from this study because it had been drawn under different selection criteria. Between June and September of 2009 a questionnaire was mailed to the remaining 4900 patients’ GPs to confirm their psoriasis diagnosis and describe the disease. Practices that did not respond were reminded with follow-up phone calls and letters. Survey collection closed in June 2010. The questionnaire is available online.
The first stage of the study involved analysing patient demographics and psoriasis prevalence to determine if our findings are consistent with previous epidemiological research. Prevalence was calculated by dividing the number of psoriasis patients in each age group by the total number of THIN patients in that age group. A sensitivity analysis involved excluding patients who had never received a medical code, and thus may have been registered but never seen by a practice.
Subsequently, electronic psoriasis diagnostic codes were validated by comparing them to GP survey responses which were utilised as a gold standard. Basic demographics were summarised for patients with confirmed and refuted diagnoses to determine if there were any systematic differences between the groups. The positive predictive value (PPV, the probability of having psoriasis confirmed by questionnaire given the presence of an electronic psoriasis code) was calculated. We then determined if we could discriminate between patients with and without GP-confirmed psoriasis using Receiver Operating Characteristic curves (ROC, described below) based upon the number of psoriasis diagnoses and/or treatments. For illustration purposes the PPV, sensitivity (the percent of GP-confirmed psoriasis patients the test captures), specificity (the percent of GP-denied patients the test excludes), and the correct classification rate (probability that the classification result [i.e., psoriasis or no psoriasis] matches the GP designation) are shown for the first three cut points of the model with the highest value for the Area Under the ROC Curve (AUC).
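The four quantities reported for each cut point all come from the 2×2 table of test result against GP designation. As a minimal illustration (in Python, although the analyses themselves were run in SAS; the counts below are hypothetical, not study data):

```python
def classification_stats(tp, fp, fn, tn):
    """Summary statistics for a binary test against a gold standard.

    tp: test positive, gold standard positive (true positives)
    fp: test positive, gold standard negative (false positives)
    fn: test negative, gold standard positive (false negatives)
    tn: test negative, gold standard negative (true negatives)
    """
    return {
        "ppv": tp / (tp + fp),                  # P(disease | test positive)
        "sensitivity": tp / (tp + fn),          # true-positive rate
        "specificity": tn / (tn + fp),          # true-negative rate
        "correct_classification": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts for a single cut point (e.g. ">= 2 psoriasis codes")
stats = classification_stats(tp=80, fp=20, fn=10, tn=90)
print(stats["ppv"])  # -> 0.8
```

Raising the cut point (requiring more codes) moves counts from the test-positive to the test-negative row, which typically raises PPV while lowering sensitivity, the trade-off the cut-point tables in this study illustrate.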
Finally, we sought to determine if attributes of the patient’s psoriasis (i.e., extent, severity, duration, and diagnosis corroboration via dermatology consultation) could be determined from the database. THIN does not contain information directly referring to these attributes so potential surrogate markers were investigated.
If the GP confirmed the psoriasis diagnosis, s/he was asked about the BSA the disease typically involves (≤2%, 3–10%, or >10%). Demographics and descriptive data were compared for patients with extensive (>10% BSA) and non-extensive disease (≤10% BSA). Multiple algorithms were developed to predict extensive disease. Models with the highest AUCs are presented, and cut points are shown at the 10th, 25th, 50th, 75th, 90th and 95th percentiles of all GP responses for the model with the highest AUC.
An ROC curve is constructed by plotting the true positive rate of the test on the y-axis as a function of the false positive rate on the x-axis for each possible cutoff value of a diagnostic test (e.g., 0 codes, 1 code, 2 codes, etc.). The AUC is the probability that the test will rank a randomly chosen psoriasis patient with extensive disease higher than a randomly chosen psoriasis patient without extensive disease. A value of 0.5 would indicate that the test has no predictive power and assigns patients at random, while a value of 1.0 would indicate that the test is always correct28,29.
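This ranking interpretation of the AUC can be computed directly by comparing every positive–negative pair of scores (ties count as half). A minimal Python sketch, using hypothetical yearly code counts rather than study data:

```python
def auc_by_ranking(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen positive case
    (e.g. a patient with extensive psoriasis) scores higher than a
    randomly chosen negative case; tied scores count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical psoriasis codes per year for patients with and
# without extensive disease
extensive = [16, 12, 9, 20]
non_extensive = [3, 1, 4, 2, 9]
print(auc_by_ranking(extensive, non_extensive))  # -> 0.975
```

Sweeping a cutoff through these scores traces out the ROC curve itself; the pairwise comparison above is equivalent to integrating under that curve (the Mann–Whitney interpretation of the AUC).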
GPs were also asked if the diagnosis was corroborated by a dermatologist. THIN physicians are expected to record hospital referrals as a dated NHS specialty code. We calculated descriptive statistics (PPV, negative predictive value [NPV], sensitivity, and specificity) to determine whether dermatology NHS codes, alone or in conjunction with psoriasis codes, could predict dermatology confirmation.
To determine agreement between duration of disease from the database, as determined by the first psoriasis code (which GPs can backdate to reflect the time of disease onset), and duration of disease according to the GP questionnaire, a Cohen’s kappa statistic was calculated. The kappa statistic is a measure of agreement: a value of 1 indicates perfect agreement, 0 indicates the agreement expected from chance alone, and negative values indicate less agreement than expected by chance. Duration of disease was broken into 10-year increments and linear weights were applied. Ten years was selected because it was the smallest increment into which the GP questionnaire responses could be grouped while allowing the same number of years in each category.
Descriptive statistics, and ROC curves were calculated using SAS (Version 9.2, SAS Institute Inc, Cary, NC). ROC curves were graphed using STATA (Version 10, Stata Corporation, College Station, TX). Statistical significance was determined using two-sided p-values at the 0.05 level (95% confidence level). Missing responses on the surveys were excluded from the analysis.
Table 1 describes the prevalence and demographics of psoriasis patients within the THIN database, and compares them to the THIN population as a whole. Prevalence rates are shown overall, by decade, and by sex. Overall prevalence was found to be 1.9% (2.2% in adults 20 years or older) and did not differ substantially by sex. The prevalence is low in the pediatric population, increases sharply in young adulthood, gradually increases until late adulthood and then decreases in older individuals. In a sensitivity analysis, patients who had never received a medical code were removed from the calculation (534,875 patients). Prevalence rates increased only slightly, with overall prevalence becoming 2.0%. The frequency of various subtypes of psoriasis based on Read Codes is shown in Table 5 (available online). Psoriasis unspecified (55.5%) or NOS (58.7%) was most common (note that patients could receive more than one code and therefore the sum is greater than 100%). Variants of psoriasis were coded less frequently and include guttate (7.4%), scalp (6.8%), pustular (2.8%), palmar and plantar (0.8% and 0.5%, respectively) and erythrodermic (0.3%).
At the close of the survey collection period we had received 4,634 of the 4,900 surveys, yielding a response rate of 95%. One practice (39 surveys, 1% of the sample) was removed from the survey-based analysis owing to a high proportion (52%) of internally inconsistent surveys. The demographics of the sampled patients and survey respondents were nearly identical. Amongst those with returned surveys, 51% (n= 2361) of patients were male and the median age was 55 years (IQR: 49.9–60.9).
To validate the electronic psoriasis Read Codes, the questionnaire asked GPs to confirm or refute the psoriasis diagnosis. The diagnosis of psoriasis was confirmed in 90% (n = 4543, 95% CI 89–91%) of patients who met the criteria for entry into our cohort. Table 2A compares patients with confirmed and refuted diagnoses, and describes the best models for predicting an accurate psoriasis diagnosis. Patient demographics did not differ significantly between the confirmed and refuted groups. The model producing the highest AUC was the number of electronic psoriasis diagnostic codes (AUC=0.75). On average, patients with a confirmed diagnosis had 6 psoriasis codes while those with a refuted diagnosis had only 2. Patients whose diagnosis was confirmed also received significantly more treatments specific for psoriasis (8 vs 2); however, incorporating psoriasis treatments did not improve the AUC or PPV of the models.
Table 2B shows descriptive statistics of the best model at several cut points along the ROC. One or more electronic psoriasis codes yields a PPV of 90%. Requiring more than one code increases the PPV to 95% but will exclude 26% of patients with a valid diagnosis.
If the GP confirmed the psoriasis diagnosis s/he was further asked to identify the percent of the patient’s body surface area the psoriasis typically involves. In our cohort, 12% (n = 478, 95%CI = 11–13%) had extensive disease covering more than 10% of their body. Table 3A compares the extensive and non-extensive psoriasis patients. Patients with extensive disease were slightly younger and more likely to be male.
We developed models to determine if electronic codes could reliably identify patients with extensive disease. Though the AUCs of the models presented were similar, the model producing the highest AUC was the number of electronic psoriasis codes (treatment or diagnostic) the patient received per year of psoriasis since their prospective data collection began in THIN (marked with ‘**’ in Table 3A and shown in Fig. 1). On average, patients with extensive disease received 16 codes per year while those with less involved disease received only 3.
Table 3B shows attributes of the model at several cut points. Even with the strictest test criteria we were only able to achieve a PPV of 45%. Though this cut point correctly classified 87% of patients, the sensitivity was only 18.9%. Less strict cut points were able to achieve a substantially higher sensitivity at the expense of a lower PPV.
Models were also developed to determine if disease severity (as defined by skin disease that would require a systemic agent to achieve clearance per the GP’s opinion) could be predicted from the database. Results were similar to the results for extensive disease and are not shown.
If the GP confirmed the psoriasis diagnosis s/he was asked if the diagnosis was also corroborated by a dermatologist. In our cohort, 46% (n = 1,816, 95%CI 44–47%) of psoriasis patients had their diagnosis corroborated. We tested whether a dermatology NHS code in the database could be used as a surrogate marker to indicate that a dermatologist confirmed the psoriasis diagnosis. Results are shown in Table 4. More than three-quarters (77%) of psoriasis patients with a dermatology NHS code had their diagnosis corroborated, and 75% of those without a dermatology NHS code did not have their diagnosis corroborated. If we further require the NHS code to be entered in conjunction with a psoriasis code the PPV increases to 91%, but only a third of patients with a dermatologist confirmation are captured.
Finally, GPs were queried regarding the number of years the patient has had psoriasis. This response was compared to the number of years since their first psoriasis code in the database. When the duration of disease was broken into 10-year increments the linearly weighted Cohen’s kappa was 0.69 (95%CI 0.67–0.71, 0.62 without weighting). The physician survey and database date matched exactly in 76.5% of cases.
The data from this study suggest that THIN is a valid data resource for the study of psoriasis. The epidemiology of psoriasis in THIN is similar to findings from previous population-based studies providing a measure of convergent validity1–6. Moreover, the PPV for one or more psoriasis Read Codes was 90%, demonstrating criterion validity. Of note, the PPV of psoriasis codes in THIN is similar to that observed in validation studies of psoriasis in GPRD5,19,21. Moreover, given that THIN and GPRD cover similar populations and have similar methods for capturing electronic medical data, it is likely that the results of this study generalize to GPRD.
In addition to validly identifying patients with psoriasis, the database can yield important information about the disease. Comparison of disease duration between the survey and the database yielded a kappa of 0.69, indicating substantial agreement30. This will be valuable for modeling outcomes based on disease duration, though we should expect some misclassification. Electronic codes in THIN can also be used to determine if the psoriasis diagnosis was confirmed by a dermatologist. For example, if a patient has a dermatology referral code in conjunction with a psoriasis code then the PPV is 91% (95% CI: 88–93%) for the patient having their diagnosis confirmed by a dermatologist. This will be valuable for sensitivity analyses where dermatologist confirmation can serve as the ultimate gold standard for an accurate psoriasis diagnosis. Moreover, we were able to achieve good discrimination (AUC = 0.80) between those with and without extensive skin disease based on ROC analysis of number of yearly psoriasis-related codes in THIN; however, the prevalence of extensive psoriasis in the general population is low and therefore the PPVs of various cut points of our model are not high enough to serve as an accurate surrogate marker of extensive involvement. Nevertheless, these data can be used to enrich a cohort with patients that have more severe disease.
Particular strengths of this study include its use of a variety of epidemiological methods to examine psoriasis validity. Furthermore this study is broadly representative of psoriasis patients ages 45–64. With nearly 5000 patients sampled, this is, to our knowledge, the largest validation study ever completed for psoriasis, and demonstrates that information above and beyond what is contained in the database can be collected for large cohorts with an excellent response rate. We have also demonstrated the ability to use electronic codes to measure psoriasis duration and confirmation by a dermatologist.
We required patients to have received a psoriasis code within the last two years for inclusion in the cohort. This may have led to under-representation of patients with mild disease who may not have discussed psoriasis with their GP in the last 2 years, or may not have seen their GP in that timeframe. In addition, our cohort was limited to patients 45–65 years of age, so our findings may not generalize to the entire psoriasis population. Moreover, we did not evaluate the NPV of psoriasis in THIN (i.e., the probability that a patient without a psoriasis Read Code does not have psoriasis) as this disease has a low prevalence at baseline.
In all validation studies the possibility of a tarnished gold standard for measuring criterion validity should be addressed. We compared our outcomes to responses from GP surveys which served as our gold standard. Though our survey instrument was designed under the guidance of expert opinion in the fields of epidemiology, dermatology, and primary care medicine in order to ensure face and content validity, it was not formally tested for psychometric properties such as test-retest reliability and criterion validity. Furthermore, compared to dermatologists, GPs are more prone to error when making diagnoses of skin disease though a previous study suggested that GPs from the UK are reasonably accurate when it comes to diagnosing psoriasis31. Moreover, our data suggest that coding algorithms can be utilised to identify patients whose psoriasis diagnosis was confirmed by a dermatologist which may allow for a more stringent case definition.
THIN is a valid data resource for the study of psoriasis and can be used to identify characteristics of the disease such as duration and confirmation by a dermatologist. These newly identified and validated characteristics can be utilised in future studies that evaluate the natural history of psoriasis in population-based settings.
Sponsor/Grant: This study was supported by an R01 grant (R01HL089744, JMG) and a Graduate Research Supplement (NMS) from the National Heart, Lung, and Blood Institute of the NIH.
Conflicts of Interest: Dr Gelfand has received grants from Amgen, Pfizer, Novartis, and Abbott, and is a consultant for Amgen, Celgene, and Centocor. Dr Margolis is on the data safety monitoring boards for Abbott, Astellas, and Centocor. No other authors have conflicts of interest.