

Pharmacoepidemiol Drug Saf. Author manuscript; available in PMC 2013 June 1.
Published in final edited form as:
PMCID: PMC3371176

Accuracy of Pneumonia Hospital Admissions in a Primary Care Electronic Medical Record Database



When using electronic medical record (EMR) data to study drug use, hospitalizations are markers of severe outcomes. To identify events within a specified time window, it is important to validate hospitalization diagnoses and dates. Our objective was to validate pneumonia hospitalizations and their dates identified using hospitalization codes in The Health Improvement Network (THIN), a UK primary care EMR.


This cross-sectional study used a cohort of THIN adult visits for acute nonspecific respiratory infections from 6/1985-8/2006. Pneumonia hospitalizations within 14 days after the visit were identified using THIN diagnosis and hospitalization codes; 60 were randomly selected for validation. Patients' general practitioners (GPs) returned deidentified hospital summaries and consultants' letters regarding overnight hospitalizations within a 180-day window around the THIN hospitalization. Positive predictive value (PPV) was the number of GP-validated hospitalizations divided by the number of THIN-documented hospitalizations.


GPs returned 59/60 patient records; 52 had confirmed hospitalizations. PPV of THIN hospitalization documentation was 88% (95% c.i. 77–95). One admission was not for pneumonia; PPV of THIN-documented pneumonia admission was 86% (75–94). Of 52 valid THIN hospitalizations, 50 were actually admitted within 14 days of the documented THIN date (range −2 to +18). The median absolute difference between THIN and validated admission dates was 0.5 days; the mean absolute difference was 3.1 days. In 16 of 52 admitted patients, the THIN admission date was the actual discharge date.


THIN hospitalization codes performed well in identifying acute pneumonia hospitalizations and their timing. Admission date validity might be better for conditions associated with shorter vs. longer hospitalizations.

Keywords: pneumonia, health services research, validation studies, electronic medical records, drug safety, treatment outcomes


The importance of enhanced post-marketing drug safety surveillance is increasingly recognized.1 Observational studies can take advantage of accumulating electronic medical record data to enhance post-marketing outcome studies. The growing availability and comprehensiveness2 of ambulatory electronic medical records such as the General Practice Research Database (GPRD) and The Health Improvement Network (THIN, CSD MR)3, 4 are adding breadth and depth to our ability to explore treatment-outcome relationships at the individual level with improved ability to adjust for confounders.5–8 While taking advantage of these observational electronic data to support post-marketing comparative effectiveness and drug safety studies, it is important to validate the outcomes that will be used in such studies. In particular, hospital admission diagnoses are important markers of serious adverse events, and therefore assessment of the validity of such codes is critical for drug studies. Also, to identify adverse events within a specified exposure time window, it is important to ascertain the precision of the hospitalization dates associated with the hospital diagnosis codes.

This study evaluates hospital admission data in THIN, a U.K. primary care electronic medical record. Like many electronic medical record databases, THIN contains rich longitudinal clinical data at the individual patient level; however, inpatient data regarding hospital admissions are not directly linked to the outpatient record. Hospital admission data are entered manually by patients' general practitioners after they review patients' hospital discharge summaries, converting discharge diagnoses from the discharge summaries into diagnostic codes supported in THIN.

There are three main areas of uncertainty to be addressed if THIN hospital admission data are to be useful for drug outcome research. First, when THIN hospital admission codes indicate a patient was hospitalized, did the patient truly have an overnight admission to a hospital? Second, if the patient was indeed hospitalized, is the primary discharge diagnosis recorded in THIN the true primary discharge diagnosis from the hospital admission? Third, what is the relationship between the recorded hospital admission date and the true hospital admission date? If the hospital admission is recorded after receipt of the discharge summary, it may be recorded with a later date than the true admission date. Inaccuracy in recording of hospital admission dates may lead investigators to miss associations between drug exposures and adverse outcomes if the miscoded admission date falls outside the exposure window of interest when the true admission date did not.

The objective of this study was to validate hospital admissions in the THIN database. Our hypotheses were that: 1) the positive predictive value (PPV) of a hospital admission identified using THIN hospital admission codes was greater than or equal to 90%, 2) the PPV of an identified hospital admission specifically for community acquired pneumonia was greater than or equal to 90%, and 3) 100% of THIN hospital admissions would be recorded as occurring within a 14-day window of the true hospital admission date.


This validation study was part of a larger retrospective cohort study evaluating outcomes of antibiotic vs. non-antibiotic treatment for patients with office visits for acute nonspecific respiratory tract infections, including hospital admission for community-acquired pneumonia. For the parent study, we utilized data from THIN, a large longitudinal observational database of anonymized computerized primary care medical records from the U.K. THIN collects de-identified patient data records from general practices throughout the U.K. to create a longitudinal medical research database. Within the U.K., approximately 98% of the population is registered with a general practitioner (GP). THIN data include anonymized demographics and clinical data regarding visits and diagnoses, consultations, and hospital admissions.9 Diagnoses are recorded with Read diagnostic codes, a hierarchical coding system; hospital admissions are recorded using admission identifier codes, in addition to the relevant Read codes, that are meant to distinguish between overnight admissions vs. evaluation in an emergency department or surgery. Practitioners are trained in data entry and their data are reviewed on an ongoing basis for quality and completeness.10, 11

We utilized THIN data as of September 2007, including all permanently registered patients of computerized THIN practices. Using Read diagnostic codes, we identified a cohort of adults with ambulatory primary care visits for acute nonspecific respiratory tract infections from June 1985 through August 2006 (Appendix).

Study Outcome

We focused on overnight hospital admissions for community acquired pneumonia for several reasons. First, hospital admission for community acquired pneumonia is a relatively common event following acute non-specific respiratory infection, giving us a robust sample of reasonably similar outcomes to be able to correctly estimate the measurement error. Additionally, the outcome of community-acquired pneumonia was a primary outcome of interest for the parent study described above. Within the described cohort, adults with overnight hospital admissions for community acquired pneumonia within 30 days of an ambulatory encounter for acute respiratory tract infection in the THIN database were identified using Read diagnostic codes for acute pneumonia (Table 1) and THIN hospital admission codes. We included a broad list of codes for community-acquired pneumonia, including organism specific codes and non-specific pneumonia codes.

Table 1
THIN Pneumonia Diagnostic Codes

In the total cohort of 814,283 adults seen for 1,646,229 non-specific acute respiratory tract infections, we identified 387 patients with Read codes for pneumonia associated with a hospitalization within 30 days of the index visit. Of these patients, 283 were patients of THIN practices participating in validation research, and 199 of these patients' paper charts were unavailable because the patient had either transferred out of the practice or died. Of the remaining 84 active patients in participating practices, we randomly sampled 60 hospitalizations coded in the database for validation. We estimated that a sample of 60 hospitalizations for validation would provide approximately 80% power to detect a PPV within a 95% confidence interval of 0.68 to 0.89. Even with a PPV as low as 50% we would have 99% power to detect an absolute difference in admission dates as small as one or two days.

Gold Standard Outcome

Each subject's de-identified THIN patient and practice identification codes, and a date window including 90 days before and following the acute respiratory infection visit were sent to the THIN Additional Information Service (AIS); THIN AIS in turn forwarded the patient identification code and date window to the subject's GP. The GPs identified records from their patients' charts that were supplementary to the electronic THIN data. For the specified patients, GPs returned to THIN AIS de-identified photocopies of all hospitalization discharge summaries, written chart notes, consultants' letters, and any additional material related to any overnight hospital admissions within the specified date window.

THIN AIS ensured that these photocopied records were completely de-identified prior to forwarding them to investigators. One of the investigators (SM) then reviewed all patient records and extracted the following information:

  • a) Did the subject have an overnight hospital admission within the window?
  • b) What was the hospital admission date?
  • c) What were the primary and additional discharge diagnoses?

Confirmed hospital admissions for pneumonia were defined as overnight hospital admissions if the patient's forwarded records documented either: 1) a hospital discharge summary with a primary diagnosis of pneumonia, 2) other documented evidence, supplementary to the electronic THIN data, that the patient had a hospital admission with a primary diagnosis of pneumonia, including the specific date of admission, or 3) a hospital discharge summary with a primary diagnosis of chest infection or acute infective pulmonary exacerbation of chronic lung disease with documented radiologic evidence of pneumonia on hospital admission, and/or treatment with antibiotics through the patient's hospital stay.

This study was approved by the University of Pennsylvania THIN User Committee, the University of Pennsylvania Institutional Review Board, and the Medical Research Ethics Committee, National Research Ethics Service of the U.K. National Health Service.


For our first aim, the PPV of a THIN hospital admission for any diagnosis during the 30 days following the acute respiratory infection visit was calculated as the number of patients with GP-confirmed hospital admissions divided by the total number of THIN hospital admissions, with exact binomial confidence intervals. For our second aim, the PPV of a THIN hospital admission for the specific diagnosis of pneumonia was calculated as the number of patients with GP-confirmed pneumonia diagnoses divided by the total number of THIN hospital admissions, with exact 95% binomial confidence intervals.
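The PPV calculation with an exact binomial interval can be sketched as follows. This is a minimal illustration, assuming the "exact" interval is the Clopper-Pearson interval; the function names are hypothetical, and the original analysis was run in Stata, not with this code. The counts passed in the usage lines are the ones reported in the Results (52 and 51 confirmed of 59 returned charts).

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Bisection for a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials."""
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: the p at which P(X >= k) equals alpha / 2.
        lower = _root_of_decreasing(
            lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p))
        )
    if k == n:
        upper = 1.0
    else:
        # Upper limit: the p at which P(X <= k) equals alpha / 2.
        upper = _root_of_decreasing(lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


def ppv_with_ci(confirmed, documented, alpha=0.05):
    """PPV = confirmed / documented, with an exact binomial interval."""
    lower, upper = exact_binomial_ci(confirmed, documented, alpha)
    return confirmed / documented, lower, upper


# First aim: any confirmed admission; second aim: confirmed pneumonia admission.
any_admission = ppv_with_ci(52, 59)
pneumonia_admission = ppv_with_ci(51, 59)
```

Both aims share the denominator of THIN-documented admissions with returned charts; only the numerator changes.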

For our third aim, analysis included data only for those patients with GP-confirmed overnight hospital admissions. The mean and median difference in dates between the THIN recorded and the actual hospital admission date were calculated, along with 95% confidence intervals. Considering that combining these positive and negative values may underestimate true differences, we also calculated the mean and median absolute difference in dates between the THIN recorded and the actual hospital admission date. Stata, versions 9.2 and 10.0, were used for all analyses (StataCorp College Station TX, 29-Jan 2007 and 1 Oct 2009).
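The distinction between signed and absolute date differences can be illustrated with a short sketch. The date pairs below are hypothetical and are not study data; the function name and the 14-day default are assumptions chosen to mirror the third hypothesis.

```python
from datetime import date
from statistics import mean, median


def summarize_date_differences(pairs, window_days=14):
    """Summarize recorded-minus-actual admission date differences, in days.

    Each pair is (recorded_date, actual_date). A positive difference means
    the recorded date falls after the actual admission date.
    """
    signed = [(recorded - actual).days for recorded, actual in pairs]
    absolute = [abs(d) for d in signed]
    return {
        "mean_signed": mean(signed),
        "median_signed": median(signed),
        "mean_absolute": mean(absolute),
        "median_absolute": median(absolute),
        "within_window": sum(d <= window_days for d in absolute),
        "n": len(signed),
    }


# Hypothetical pairs for illustration only.
example_pairs = [
    (date(2001, 3, 10), date(2001, 3, 10)),  # recorded on the admission day
    (date(2001, 3, 15), date(2001, 3, 12)),  # recorded 3 days late
    (date(2002, 1, 5), date(2002, 1, 7)),    # recorded 2 days early
    (date(2003, 6, 20), date(2003, 6, 2)),   # recorded 18 days late
]
summary = summarize_date_differences(example_pairs)
```

In this toy example the early and late recordings partly cancel in the signed mean, which is why the absolute differences are summarized separately.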


The sixty patients selected for pneumonia hospital admission validations were, in general, from later years during the study period (Figure, median 2000, interquartile range 1996 to 2003 vs. median 1995, interquartile range 1992 to 2000, respectively); they were younger (median 49 vs. 76 years) and were less likely to have a history of any comorbidities (mean 43 vs. 63%), compared with unselected patients (all p < 0.01, Wilcoxon rank sum). Among the 60 requests sent out to GPs for validation, 59 photocopied chart records were returned.

THIN Charts Selected for Validation by Year

Predictive value of a THIN pneumonia hospitalization

Fifty-two of these 59 patients had medical record documentation of a hospital admission within the 30-day window of the acute respiratory infection index visit date, giving a PPV of a THIN hospital admission of 88% (95% c.i. 77% to 95%, Table 2). One of these admissions did not have a discharge diagnosis of pneumonia according to the GP's chart records, giving a PPV for THIN pneumonia hospitalization coding of 51/59 or 86% (95% c.i. 75% to 94%). All of the remaining 51 admissions had pneumonia as the primary discharge diagnosis. For the one patient who was admitted to the hospital but did not have pneumonia, the GP records indicated that the true hospital diagnosis was bronchitis with wheezing; that patient did not receive antibiotics during that 6-day hospital stay.

Table 2
Pneumonia Hospital Admission Validation Results

Difference between THIN hospitalization date and true hospital admission date

Of the 52 patients with valid THIN hospitalizations, 50 were actually admitted within 14 days of the date recorded in THIN, with a range of −2 to +18 days. The median of the difference between the THIN recorded and actual admission dates was 0 days (95% c.i. 0 to +2 days after the actual admission date) and the median absolute difference was 0.5 days (95% c.i. 0 to 2 days). The mean difference was +2.9 days after the actual admission date (95% c.i. +1.6 to +4.2 days) and the mean absolute difference was 3.1 days (95% c.i. 1.7 to 4.3 days).

In 16 of the 52 admitted patients, the THIN admission date was the discharge date listed on the GP hospital discharge notes.


Electronic medical records are a potentially vast and rich source of data to examine, evaluate, and compare clinical outcomes in comparative effectiveness studies of drugs. Such large datasets can provide impressive results; however, size is of little value if the data are of poor quality, and proceeding to analysis without validating important study parameters can corrupt the value of any results and lead us to erroneous conclusions. To take advantage of the increasingly available observational electronic medical record data to support post-marketing drug safety surveillance, it is important to validate the outcomes that will be used in such studies.12–14 Hospital admission diagnoses are particularly important markers of adverse event severity. To be able to identify acute events within a specified time window related to acute exposure, it is also important to be able to ascertain the precision of hospital admission dates.

Our PPV for the THIN pneumonia hospital admission codes was as good as or better than the PPV for acute care date estimation methods described in other studies. McPhee found PPVs of 69% and 75% for Pap smear and mammogram self-reports, respectively.15 Our results are similar to PPVs reported in studies of diagnosis validation in the GPRD, a database that shares some practices with THIN and that utilizes the same software.13, 14, 16 For example, Van Staa et al reported a PPV of 97.2% (41/42) for hospital admissions for respiratory conditions; PPV for pneumonia specifically was not noted.17 There have been two recent systematic reviews of diagnosis recording in the GPRD. Herrett et al found a median PPV of 82.7% for studies using record requests from GPs as an external gold standard, although the diagnoses considered included prevalent as well as incident and acute conditions.13 The PPV was 88.0% for respiratory diagnoses, although the PPV for the two studies using specifically external GP record review as the gold standard, and the PPVs specifically for community acquired pneumonia, were not reported. Khan et al found similar PPVs for diagnosis of acute conditions.14 Hammad et al addressed the timing of GPRD diagnosis of acute myocardial infarction, and found that 90% of dates were accurate within 15 days.18 Virtually all (50/52) of the THIN recorded hospital admission dates were accurate within a 14-day window, providing support for our ability to identify the timing of hospital admissions for studies where this level of precision is adequate to answer the study question. Our finding that 16 of the 52 admitted patients had the true hospital discharge date as the recorded THIN admission date implies that the accuracy of admission dates might be better for conditions that are associated with shorter vs. longer hospitalizations.
For studies of conditions that usually require longer hospital stay, and/or requiring finer precision of admission dates, it might be wise to perform further studies to validate the outcome/s of interest.


Limitations of this study include that our study addresses the issue of positive predictive value, but not negative predictive value, of hospitalization codes in THIN. We did not have adequate resources to estimate the sensitivity or specificity of THIN pneumonia hospital admission coding. Future population-based studies could compare estimates of admission rates for the population covered by THIN with analogous U.K. population-based rates, for example, using the U.K. National Health Service Hospital Episode Statistics (HES). We did not address outcomes in addition to the included pneumonia hospital admission diagnoses in adults. The validity of other outcomes, for example, death, was not addressed, nor were results for children. All of our pneumonia cases occurred within 30 days of a primary care visit for acute nonspecific respiratory infection; our results are not necessarily generalizable to hospital admissions in general after drug exposure, or specifically to patients with different ages, different underlying conditions, or different hospital admission diagnoses than those included in this study. Our validations were restricted to living active patients of THIN practices participating in research; validated pneumonia hospital admissions were from patients in different practices, who made visits in later study years, who were younger, and had fewer comorbidities compared with patients without validated admissions. The accuracy of hospital admission documentation might be different for patients who have died, for non-participating practices, or for patients who had transferred out of the practice; this is a limitation common to many validation studies. We were limited by the validity of our presumed gold standard data from the GP charts.
The GPs were highly unlikely to find discharge summaries when a hospital admission did not actually take place (unlikely to misclassify false positives as true positives). However, if the charts were missing discharge summaries from true hospital admissions, or if the GPs were unable to find them, we may have misclassified some true positive hospital admission diagnoses in THIN as false positives. This differential misclassification of our outcome would have tended to bias us toward underestimating the PPV. This project is strengthened by the fact that THIN GPs were not just recruited for this study, but have a longitudinal relationship with CSD MR. Responding to research queries is part of this relationship and they are financially compensated for their time and effort. We assumed that the single non-responding physician was missing at random, equally likely to be a true as a false pneumonia admission. If the missing chart was actually more likely to be a non-verified admission, then we may have slightly overestimated the PPV.
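The possible effect of the single unreturned chart can be bounded with simple arithmetic. The sketch below is an illustration and not an analysis reported by the study; the function name is hypothetical, and the counts are those given in the Results (60 requested, 59 returned, 52 confirmed).

```python
def ppv_bounds_for_missing(confirmed, returned, requested):
    """Bound the PPV when some requested charts were never returned.

    worst: every unreturned chart is treated as an unconfirmed admission.
    observed: unreturned charts are excluded (the missing-at-random assumption).
    best: every unreturned chart is treated as a confirmed admission.
    """
    missing = requested - returned
    worst = confirmed / requested
    observed = confirmed / returned
    best = (confirmed + missing) / requested
    return worst, observed, best


# Counts from the Results: 60 charts requested, 59 returned, 52 confirmed.
worst, observed, best = ppv_bounds_for_missing(52, 59, 60)
```

With one missing chart the three values differ by less than two percentage points, which is consistent with the statement that any overestimate would be slight.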

We had limited power to detect differences in PPV and date differences between the different hospital admission diagnoses included in this study. In addition, we had more power to validate the PPV of any hospital admission than we did to validate diagnosis-specific hospital admission.

In summary, THIN hospital admission codes performed well in identifying the timing of hospital admission events of interest. This study supports observational THIN studies regarding hospital admission outcomes for community acquired pneumonia. Future studies should pursue validating additional THIN outcomes, including sensitivity and specificity, and including children, further increasing the generalizability of our findings.

It is likely that electronic medical records will become increasingly complex, potentially integrating patients' ambulatory and inpatient data. While this may improve the precision of admission diagnoses and dates, it could also introduce additional misclassification. We will need to continue to consider the precision of these clinical measures as we look forward to using these increasingly available data to help improve health outcomes.

Take-home messages

  • When using electronic medical record data to study drug use, hospital admissions are markers of severe outcomes.
  • To identify acute events within a specified time window related to acute exposure, it is important to be able to ascertain the accuracy of hospital admission diagnoses and the precision of their dates.
  • THIN hospital admission codes performed well in identifying pneumonia hospital admissions and their timing.


Support for this project was provided by a grant from CSD MR, NIAID grants F32-AI-073015 and K24 AI073957, NCRR grant UL1-RR02-4134, and grant U18 HS016946 from the Agency for Healthcare Research and Quality. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Agency for Healthcare Research and Quality.


Acute Nonspecific Respiratory Tract Infection Diagnostic Codes

THIN Read Code    THIN Read Code Description
H05..00           Other acute upper respiratory infections
H051.00           Acute upper respiratory tract infection
H05z.00           Upper respiratory infection NOS
H05z.11           Upper respiratory tract infection NOS
H00..00           Acute nasopharyngitis
H02..00           Acute pharyngitis
H02..13           Throat infection – pharyngitis
H02z.00           Acute pharyngitis NOS
H02..11           Sore throat NOS
H060.00           Acute bronchitis
H30..00           Bronchitis unspecified


The authors have no conflicts of interest to disclose.


1. Strom BL. How the US drug safety system should be changed. JAMA. 2006;295(17):2072–5.
2. Shea S, Hripcsak G. Accelerating the use of electronic health records in physician practices. N Engl J Med. 2010;362(3):192–5.
3. Bourke A, Dattani H, Robinson M. Feasibility study and methodology to create a quality-evaluated database of primary care data. Inform Prim Care. 2004;12(3):171–7.
4. Nebeker JR, Hurdle JF, Bair BD. Future history: medical informatics in geriatrics. J Gerontol A Biol Sci Med Sci. 2003;58(9):M820–5.
5. Hunter D. First, gather the data. N Engl J Med. 2006;354(4):329–31.
6. Ray WA. Population-based studies of adverse drug effects. N Engl J Med. 2003;349(17):1592–4.
7. Classen D. Medication safety: moving from illusion to reality. JAMA. 2003;289(9):1154–6.
8. Horsfall L, Rait G, Walters K, Swallow DM, Pereira SP, Nazareth I, Peterson I. Serum bilirubin and risk of respiratory disease and death. JAMA. 2011;305(7):691–7.
9. Gelfand J, Margolis DJ, Dattani H. The UK General Practice Research Database. In: Strom BL, editor. Pharmacoepidemiology. John Wiley & Sons, Ltd.; Chichester: 2005. pp. 337–46.
10. GPRD . Excellence in Public Health Research: Facts and Figures. Medicines and Healthcare Product Regulatory Agency; United Kingdom: 2006. [Accessed March 17, 2006]. , at
11. Margolis DJ, Bowe WP, Hoffstad O, Berlin JA. Antibiotic treatment of acne may be associated with upper respiratory tract infections. Arch Dermatol. 2005;141(9):1132–6.
12. Murray CJ, Frenk J. Health metrics and evaluation: strengthening the science. Lancet. 2008;371(9619):1191–9.
13. Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the General Practice Research Database: a systematic review. Br J Clin Pharmacol. 2010;69(1):4–14.
14. Khan NF, Harrison SE, Rose PW. Validity of diagnostic coding within the General Practice Research Database: a systematic review. Br J Gen Pract. 2010;60(572):e128–36.
15. McPhee SJ, Nguyen TT, Shema SJ, et al. Validation of recall of breast and cervical cancer screening by women in an ethnically diverse population. Prev Med. 2002;35(5):463–73.
16. Lewis JD, Brensinger C, Bilker WB, Strom BL. Validity and completeness of the General Practice Research Database for studies of inflammatory bowel disease. Pharmacoepidemiol Drug Saf. 2002;11(3):211–8.
17. Van Staa T, Abenhaim L. The Quality of Information Recorded on a UK Database of Primary Care Records: A Study of Hospitalizations due to Hypoglycemia and Other Conditions. Pharmacoepidemiol Drug Saf. 1994;3(1):15–21.
18. Hammad TA, McAdams MA, Feight A, Iyasu S, Dal Pan GJ. Determining the predictive value of Read/OXMIS codes to identify incident acute myocardial infarction in the General Practice Research Database. Pharmacoepidemiol Drug Saf. 2008;17(12):1197–201.