Am J Epidemiol. 2010 January 1; 171(1): 123–128.
Published online 2009 December 6. doi:  10.1093/aje/kwp352
PMCID: PMC2796985

Identification of Patients With Nonmelanoma Skin Cancer Using Health Maintenance Organization Claims Data


Cancer registries usually exclude nonmelanoma skin cancers (NMSC), despite the large population affected. Health maintenance organization (HMO) and health system administrative databases could be used as sampling frames for ascertaining NMSC. NMSC patients diagnosed between January 1, 1988, and December 31, 2007, from such defined US populations were identified by using 3 algorithms: NMSC International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, NMSC treatment Current Procedural Terminology (CPT) codes, or both codes. A subset of charts was reviewed to verify NMSC diagnosis, including all records from HMO-enrollee members in 2007. Positive predictive values for NMSC ascertainment were calculated. Analyses of data from 1988–2007 ascertained 11,742 NMSC patients. A random sample of 965 cases was selected for chart review, and NMSCs were validated in 47.0% of ICD-9-CM–identified patients, 73.4% of CPT-identified patients, and 94.9% identified with both codes. All charts from HMO–health plan enrollees in 2007 were reviewed (n = 1,116). Cases of NMSC were confirmed in 96.5% of ICD-9-CM–identified patients, 98.3% of CPT-identified patients, and 98.7% identified with both codes. HMO administrative data can be used to ascertain NMSC with high positive predictive values with either ICD-9-CM or CPT code, but both codes may be necessary among non-HMO patient populations.

Keywords: claims analysis, databases, factual, health maintenance organizations, insurance claim review, neoplasms, basal cell, neoplasms, squamous cell, population surveillance, skin neoplasms

Nonmelanoma skin cancers (NMSC), including squamous cell carcinoma and basal cell carcinoma, were estimated to account for more than 1 million cases of cancer in the United States in 2008 (1). The total direct and indirect cost of NMSC in the United States, for 2004, has been estimated to be $2.4 billion (2). Despite the large population affected and resultant personal, societal, and economic burdens, the epidemiology of NMSC is understudied.

Many factors limit investigation of NMSC. Large cancer registries, including the National Cancer Institute–sponsored Surveillance, Epidemiology, and End Results (SEER) Program, often exclude NMSC partly because of the burden imposed in ascertaining the large number of cases and partly because, in most instances, these cancers are associated with low mortality. Secondary data analysis has also been limited, and the few estimates of NMSC incidence in the United States have utilized information from large surveys or data from contained health systems (3–6). No established methods are defined in the literature for ascertaining NMSC in secondary data. In an attempt to facilitate and further the investigation of NMSC, we designed this study to define and compare algorithms for identifying NMSC using the computerized administrative claims–based data set of a large health care system provider and its affiliated health maintenance organization (HMO) with chart-review validation of diagnosis.


As a member of the Cancer Research Network, a consortium of 14 US integrated health care systems sponsored by the National Cancer Institute that collects health care information for more than 13 million patient-members, the Henry Ford Health System (Detroit, Michigan) utilizes a data resource utility called the Virtual Data Warehouse, which is based on common structure programming files that can be shared across Cancer Research Network sites (7). This research was designed as part of the development of an NMSC case ascertainment algorithm using electronic databases that comply with data standards as defined in the Cancer Research Network's Virtual Data Warehouse (

A computerized administrative database of members of the Health Alliance Plan, a large HMO in southeastern Michigan owned by the Henry Ford Health System, and an outpatient database of individuals with other means of paying medical expenses seen by salaried physicians of the Henry Ford Medical Group, were used to identify NMSC patients. The health system reflects the diverse sociodemographic background of the metropolitan area, with 10 hospitals and 70 clinics dispersed throughout the tricounty metropolitan area, with a few exceptions.

For the Health Alliance Plan, as with many HMOs, our population includes a high proportion of individuals less than 65 years of age, who are likely to have at least one employed family member, with corresponding modest increases in employment status, income, and general health status. Approximately 30% of Health Alliance Plan enrollees are 24 years of age or younger, 56% are 25–64 years of age, and 13% are older than 65 years of age. Slightly more than half of the enrollees are female (55%); 67% are Caucasian, 28% are African American, and 5% are all other races and ethnicities. The Henry Ford Health System contains health information on more than 450,000 such members during the period of interest, and virtually all health care received by these members should be captured because this health system group is closed. As of 2006, the staff model health plan used for this study had an enrollment of 295,000, with a 1-year retention rate of 84% and a 5-year retention rate of 56% (8).

Patients who were diagnosed with NMSC, as identified with diagnosis and procedure codes, were selected from individuals seen by the staff of the Henry Ford Medical Group between January 1, 1988, and December 31, 2007. Generally, NMSCs require either destruction or excision and no chemotherapy or radiation therapy and hence are treated primarily in the outpatient setting. For this reason, the database was restricted to only outpatient encounters. However, note that, at the Henry Ford Health System, both inpatient and outpatient histopathology is processed and reviewed similarly; no inpatient cases were identified.

Analyses examined the entire cohort of all patients (regardless of health plan) and HMO members only. Two sets of electronic medical records were individually reviewed by a trained abstractor to confirm biopsy-proven NMSC diagnosis: a random subset drawn by selecting 25 cases per year from the entire cohort, regardless of payer and including HMO members (n = 1,503), and all records from Health Alliance Plan members enrolled in 2007 (n = 1,116). We selected one year (2007) to conserve costs yet comprehensively review a contiguous time period that represented the most recent experience. Records including appointment history, pathology reports, and all office visit documentation within 90 days of the date of service were reviewed for confirmation of the diagnosis of NMSC. If documentation of NMSC either by pathology report or as recorded in office visit dictation was not found in the electronic medical record for that date, the case was considered missing (n = 538) and was excluded from analyses.

Three different algorithms were defined, with subsequent manual electronic medical record abstraction for a subset of cases to evaluate the accuracy of ascertainment of NMSC. Only initial cases that occurred during the study period were included (subsequent diagnoses in the same individual, which are common with NMSC, were excluded). The first algorithm used International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis codes alone to identify cases. ICD-9-CM is a clinical modification of the World Health Organization's International Classification of Diseases, Ninth Revision (9). ICD-9-CM classifies morbidity and mortality information for the purposes of health statistics, indexing of hospital records by disease and operations, and storage and retrieval of health data. ICD-9-CM was developed in 1977 from earlier versions starting in the 1950s, and it continues to be a useful tool for disease classification, indexing, and epidemiologic evaluation. ICD-9-CM codes used for identifying NMSC included 173.0–173.9 (Table 1).

Table 1.
ICD-9-CM Diagnostic and CPT Treatment Codes Utilized for Claims-based Identification of Nonmelanoma Skin Cancer in HMO Claims Files

The second algorithm examined Current Procedural Terminology (CPT) codes for treatment of nonmelanoma cancers of the skin alone to identify NMSC cases. CPT is a set of 5-digit codes, published annually by the American Medical Association, used to describe procedures and services performed by health care providers and simplify the reporting of services delivered for health insurance reimbursement purposes (10). First printed in 1966, CPT is reviewed annually and modified as necessary. This study utilized the most current version of the CPT, the Fourth Edition, first printed in 1977. CPT codes for malignancies are defined in Table 1.

The third algorithm required both a recorded ICD-9-CM code(s) and CPT code(s) to define cases. For the “ICD-9-CM/CPT combined” method, the ICD-9-CM codes included 173.0–173.9, and the CPT codes were identical to those used in the second algorithm (Table 1).
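
The three algorithms described above can be sketched as set operations over claims. This is a minimal illustration, not the study's actual program: the `Claim` record, the `identify_patients` function, and the placeholder CPT code are hypothetical, the real CPT codes are those listed in Table 1 and must be supplied by the caller, and matching at the patient level (rather than within a single encounter) is an assumption the text does not settle.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict, Iterable, Set, Tuple

# ICD-9-CM codes 173.0 through 173.9, as stated in the text.
NMSC_ICD9_CODES = frozenset(f"173.{digit}" for digit in range(10))


@dataclass(frozen=True)
class Claim:
    """Hypothetical outpatient claim record (field names are assumptions)."""
    patient_id: str
    icd9_codes: Tuple[str, ...]
    cpt_codes: Tuple[str, ...]


def identify_patients(
    claims: Iterable[Claim], nmsc_cpt_codes: AbstractSet[str]
) -> Dict[str, Set[str]]:
    """Return the patient IDs flagged by each of the three algorithms."""
    has_icd9: Set[str] = set()
    has_cpt: Set[str] = set()
    for claim in claims:
        if any(code in NMSC_ICD9_CODES for code in claim.icd9_codes):
            has_icd9.add(claim.patient_id)
        if any(code in nmsc_cpt_codes for code in claim.cpt_codes):
            has_cpt.add(claim.patient_id)
    return {
        "icd9": has_icd9,           # algorithm 1: ICD-9-CM code recorded
        "cpt": has_cpt,             # algorithm 2: CPT treatment code recorded
        "both": has_icd9 & has_cpt  # algorithm 3: both codes required
    }


# Placeholder only; substitute the CPT treatment codes from Table 1.
PLACEHOLDER_CPT = {"CPT-PLACEHOLDER-1"}

example_claims = [
    Claim("p1", ("173.3",), ("CPT-PLACEHOLDER-1",)),
    Claim("p2", ("173.9",), ()),
    Claim("p3", ("216.5",), ("CPT-PLACEHOLDER-1",)),  # diagnosis outside 173.x
]
flagged = identify_patients(example_claims, PLACEHOLDER_CPT)
```

Because the combined algorithm is the intersection of the other two, it can never flag more patients than either code alone.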

The positive predictive value (PPV; the probability that a person with a positive “test” actually has the condition of interest) of each algorithm was then calculated (the number of true NMSC cases divided by the sum of both true NMSC cases and falsely identified patients), along with corresponding 95% confidence intervals for both the HMO population and the general any-payer population. False-positive cases were ascertained and classified by type of coding error.
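
As a worked illustration of this definition, the sketch below computes the PPV for the three algorithms from the 2007 HMO counts reported in the Results. The function name is hypothetical, and the Wilson score interval is an assumption chosen for illustration; the text does not state which confidence interval method was used, so the printed limits are not expected to reproduce the published intervals.

```python
from math import sqrt
from typing import Tuple


def ppv_with_wilson_ci(
    confirmed: int, flagged: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (PPV, lower, upper) using a Wilson score interval.

    confirmed: chart-confirmed NMSC cases (true positives)
    flagged:   all patients flagged by the algorithm
               (true positives plus false positives)
    """
    if flagged <= 0 or not 0 <= confirmed <= flagged:
        raise ValueError("need 0 <= confirmed <= flagged and flagged > 0")
    p = confirmed / flagged
    denom = 1 + z * z / flagged
    center = (p + z * z / (2 * flagged)) / denom
    half_width = z * sqrt(p * (1 - p) / flagged + z * z / (4 * flagged ** 2)) / denom
    return p, center - half_width, center + half_width


# (confirmed, flagged) counts for 2007 HMO enrollees, taken from the Results.
hmo_2007 = {
    "ICD-9-CM": (1071, 1110),
    "CPT": (1061, 1079),
    "ICD-9-CM and CPT": (1059, 1073),
}

for algorithm, (confirmed, flagged) in hmo_2007.items():
    ppv, lower, upper = ppv_with_wilson_ci(confirmed, flagged)
    print(f"{algorithm}: PPV = {ppv:.1%} (95% CI: {lower:.3f}, {upper:.3f})")
```

The point estimates round to 96.5%, 98.3%, and 98.7%, matching the reported values.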


From January 1, 1988, to December 31, 2007, 11,742 unique NMSC patients were identified by NMSC identification methods among the all-insurer, including HMO, population; 10,782 of these patients could be ascertained with NMSC by ICD-9-CM codes, 8,246 patients by CPT codes, and 7,285 patients by both ICD-9-CM and CPT codes. A random sample of 965 patients’ charts was selected for review, which demonstrated that 47.0% of patients identified by ICD-9-CM codes had NMSC verified on histopathology report (95% confidence interval: 0.418, 0.522), and 73.4% of patients identified by CPT codes had pathologically confirmed NMSC (95% confidence interval: 0.686, 0.781). Pathologic confirmation was demonstrated for 94.9% of patients identified by using both ICD-9-CM and CPT codes (95% confidence interval: 0.925, 0.974) (Table 2).

Table 2.
Identified and Confirmed Nonmelanoma Skin Cancer Cases and Positive Predictive Values, by Algorithm, 1988–2007 All-Payer Claims Data, Henry Ford Health System, Detroit, Michigan

When the population was limited to HMO members enrolled from January 1, 2007, to December 31, 2007, only, 1,116 unique patients were identified: 1,110 of them could be identified by ICD-9-CM code, 1,079 by CPT code, and 1,073 when both ICD-9-CM and CPT codes were required. After electronic medical record review, the diagnosis of NMSC was confirmed by histologic criteria for 96.5% of ICD-9-CM–identified patients (n = 1,071), 98.3% of CPT-identified patients (n = 1,061), and 98.7% of patients identified by using both codes (n = 1,059) (Table 3).
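
The counts in the paragraph above can be cross-checked with simple inclusion-exclusion arithmetic. The sketch below is only an illustration with a hypothetical function name; it assumes, as the wording suggests, that the ICD-9-CM and CPT counts each include the patients who had both codes.

```python
from typing import Dict


def partition_counts(n_icd9: int, n_cpt: int, n_both: int) -> Dict[str, int]:
    """Split overlapping algorithm counts into mutually exclusive groups."""
    icd9_only = n_icd9 - n_both  # ICD-9-CM code without a CPT code
    cpt_only = n_cpt - n_both    # CPT code without an ICD-9-CM code
    return {
        "icd9_only": icd9_only,
        "cpt_only": cpt_only,
        "both": n_both,
        "total": icd9_only + cpt_only + n_both,
    }


# 2007 HMO enrollee counts from the text: 1,110 by ICD-9-CM, 1,079 by CPT,
# and 1,073 by both codes.
hmo_2007_groups = partition_counts(1110, 1079, 1073)
print(hmo_2007_groups)
```

Under that assumption, the three groups sum to the 1,116 unique patients reported for 2007.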

Table 3.
Identified and Confirmed Nonmelanoma Skin Cancer Cases and Positive Predictive Values, by Algorithm, 2007 HMO Health Plan Enrollee Claims Data, Henry Ford Health System, Detroit, Michigan

The reasons for misclassification of the noncases varied by identification algorithm (Tables 4 and 5). In the 1988–2007 all-payer sample, the most common reason for misidentification by either ICD-9-CM alone (not in the presence of CPT) or CPT alone (not in the presence of ICD-9-CM) was use of the code for benign lesions. Surgical defect repair incorrectly linked to neoplasm accounted for the majority of misidentification when both ICD-9-CM and CPT were used. In 2007, the majority of misidentification by ICD-9-CM alone was due to follow-up/postoperative or other visit of an individual with a history of NMSC, whereas the misclassifications related to the CPT algorithm were due to the use of these codes in treatment of melanocytic lesions, including atypical nevi, melanoma, and melanoma in situ. The majority of lesions identified by both ICD-9-CM and CPT that were classified as inaccurate were visually diagnosed because they were destroyed with no histologic evidence of malignancy (i.e., no biopsy was conducted).

Table 4.
Characteristics of False-Positive Cases of Nonmelanoma Skin Cancer Identified in a Random Sample of 1988–2007 All-Payer Claims Data, by Identification Method, Henry Ford Health System, Detroit, Michigan
Table 5.
Characteristics of False-Positive Cases of Nonmelanoma Skin Cancer Identified in 2007 HMO Health Plan Enrollee Claims Data, by Identification Method, Henry Ford Health System, Detroit, Michigan


Computerized claims-based administrative databases have the potential to ascertain cases of NMSC with high PPVs. Our findings suggest that NMSC patient identification by using an algorithm requiring that both a CPT and ICD-9-CM code be recorded may be more accurate compared with either ICD-9-CM or CPT code identification, especially in the all-payer population, but that it also identifies fewer cases than either method alone. Identification of true cases may be improved when this algorithm is applied to a closed health plan. These findings are of particular interest because they may help ascertain NMSC cases in administrative data by providing investigators with information on case selection criteria possibilities and limitations. It is also hoped that this information may facilitate the study of NMSC, the most common malignancy in the United States but one that is excluded from most large cancer registries, hampering investigation and epidemiologic surveillance.

The utility of using ICD-9-CM or CPT codes to identify NMSC, similar to other cancers, has limitations, including the potential for misclassification (11). Unlike more commonly studied cancers such as breast cancer, for which multiple identification algorithms have been defined and compared, there are no published methods for identifying NMSC cases (12–16). There is no validated, established, or accepted method for ascertaining cases of NMSC with secondary data analysis. Thus, this study makes an important contribution in filling this void.

Because of the void in the literature regarding NMSC ascertainment methodology, our findings must be compared with other, more commonly studied cancer ascertainment results. Koroukian et al. (17) investigated breast cancer ascertainment in Ohio Medicaid claims data and compared several ICD-9-CM and CPT algorithms, alone and in varying combinations, for identifying cases. Similar to our findings, Koroukian et al. reported differing PPVs based on the algorithm used. The use of CPT codes for mastectomy/lumpectomy alone resulted in very low PPVs for breast cancer (PPV = 0), as did ICD-9-CM alone for breast cancer (PPV = 15.1). The investigators found much higher PPVs when both breast cancer CPT codes and ICD-9-CM codes were required for case ascertainment, with the highest PPV found for breast cancer ICD-9-CM plus mastectomy/lumpectomy CPT codes (PPV = 86.6). These differences in PPV based on algorithm are consistent with our findings.

The lack of investigation of NMSC ascertainment strategies is compounded by the difficulty in studying these cancers by any means. Studies of NMSC are challenging because these cancers are not routinely included in cancer registries throughout the world (18). In the few, mostly European, registries that include NMSC, there is discordance and disagreement regarding reporting (19). Epidemiologic estimates of NMSC are often based on surveys not performed regularly; the last US national survey was conducted in 1977–1978 (20). Information from contained health systems may provide the most cost-effective means of studying and tracking descriptive trends for these skin cancers, as well as serving as bases for case-control and cohort studies. The HMO Cancer Research Network has extensive experience in creating and using such databases for research purposes, and further investigation of these methods in other Cancer Research Network participant sites is needed.

The need for a means to ascertain NMSC for study intensifies because skin cancer is increasing in terms of societal burden (2, 4, 21). NMSC is a significant public health concern, and the available data suggest that, in the last 30 years, the incidence of squamous cell carcinoma has risen 3%–10% per year while the incidence of basal cell carcinomas has risen 20%–80% (21–24). It is difficult to determine the cost of these cutaneous malignancies, although estimates suggest that the economic burden is high. It has been estimated that the United States spends more than $2 billion each year treating cutaneous malignancies other than melanoma (25). Although mortality is low, the morbidity, including disability and disfigurement, that may result from these malignancies and their treatment can have economic and psychosocial implications (26).

There are notable limitations to this analysis. In terms of generalizability, this study was limited to a single institution. It is possible that, because the non-HMO enrollee cohort spans a larger time period, temporal changes may also play a role in our findings. Administrative claims data cannot distinguish subtypes of NMSC, including the main categories of squamous cell versus basal cell carcinoma of the skin, which have very different proposed etiologies and risk factors. We were unable to validate our data against an existing tumor registry and hence were unable to identify false negatives or calculate the sensitivity or specificity of NMSC ascertainment. However, we believe that the void in investigation of NMSC, including case identification methodology, stems from many of the limitations described above, and, with this study, we propose to begin the difficult, although not impossible, process of improving systematic ascertainment of NMSC.

We noted differences in PPVs between the HMO enrollees and all-payer patients that are likely explained by incomplete claims. Incomplete claims can result from patients referred from outside providers into the health system (e.g., referral from an outside physician to a medical-group Mohs dermatologic surgeon) or patients who elect to be treated elsewhere (e.g., patients diagnosed by a medical-group provider who elect treatment by an outside, non-HMO provider and facility). However, these findings of lower PPV with incomplete claims are consistent with those from previous studies. In an investigation of ICD-9-CM and CPT codes identification of breast cancer in Medicaid claims data, Koroukian et al. (17) found differing PPVs based on completeness of claims data. Sensitivity rates were lower for partial-year Medicaid enrollment compared with full-year enrollment, consistent with Medicare data analysis that also has shown improved rates with more complete data (17, 27).

In conclusion, this study was designed to define and compare algorithms for identifying nonmelanoma cancers of the skin by using the computerized databases of a large health system and to validate these algorithms through careful chart review of electronic medical records. We found that, in the health system setting, identifying incident NMSC is possible by using administrative data with high PPVs. These algorithms need to be evaluated in other settings with similar data, and ideally in a population with an associated tumor registry that includes NMSC and could be used as a “gold standard” to determine sensitivity and specificity.


Author affiliations: Department of Dermatology, Henry Ford Hospital, Detroit, Michigan (Melody J. Eide, Henry W. Lim); Department of Biostatistics and Research Epidemiology, Henry Ford Hospital, Detroit, Michigan (Melody J. Eide, Richard Krajenta, Dayna Johnson, Jordan J. Long, Gordon Jacobsen, Christine C. Johnson); and Division of Research, Kaiser Permanente Northern California, Oakland, California (Maryam M. Asgari).

M. J. E.’s work was supported by a Dermatology Foundation Career Development Award in Health Care Policy. Funding from the National Cancer Institute to the Cancer Research Network (U19 CA079689) supported the involvement of R. K., D. J., and C. C. J.; and funding by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (K23 AR 051037) supported the work of M. M. A.

Conflict of interest: none declared.



Abbreviations: CPT, Current Procedural Terminology; HMO, health maintenance organization; ICD-9-CM, International Classification of Diseases, Ninth Revision, Clinical Modification; NMSC, nonmelanoma skin cancers; PPV, positive predictive value.


1. Jemal A, Siegel R, Ward E, et al. Cancer statistics, 2008. CA Cancer J Clin. 2008;58(2):71–96.
2. Bickers DR, Lim HW, Margolis D, et al. The burden of skin diseases: 2004. A joint project of the American Academy of Dermatology Association and the Society for Investigative Dermatology. J Am Acad Dermatol. 2006;55(3):490–500.
3. Hoy WE. Nonmelanoma skin carcinoma in Albuquerque, New Mexico: experience of a major health care provider. Cancer. 1996;77(12):2489–2495.
4. Athas WF, Hunt WC, Key CR. Changes in nonmelanoma skin cancer incidence between 1977–1978 and 1998–1999 in Northcentral New Mexico. Cancer Epidemiol Biomarkers Prev. 2003;12(10):1105–1108.
5. Karagas MR, Greenberg ER, Spencer SK, et al. Increase in incidence rates of basal cell and squamous cell skin cancer in New Hampshire, USA. New Hampshire Skin Cancer Study Group. Int J Cancer. 1999;81(4):555–559.
6. Miller DL, Weinstock MA. Nonmelanoma skin cancer in the United States: incidence. J Am Acad Dermatol. 1994;30(5 pt 1):774–778.
7. Hornbrook MC, Hart G, Ellis JL, et al. Building a virtual cancer research organization. J Natl Cancer Inst Monogr. 2005;35:12–25.
8. The HMO Cancer Research Network: Capacity, Collaboration, and Investigation. Bethesda, MD: National Cancer Institute; 2008. (NIH publication 08-6448)
9. Hart AC, Ford B, editors. ICD-9-CM Expert for Physicians Volumes 1 & 2. 7th ed. Amsterdam, the Netherlands: Elsevier Health Sciences; 2007.
10. Beebe M, Dalton JA, Espronceda M, editors. Current Procedural Terminology CPT 2008 Standard Edition. 4th ed. Chicago, IL: American Medical Association; 2007.
11. Bivens MM, Bhosle M, Balkrishnan R, et al. Nonmelanoma skin cancer: is the incidence really increasing among patients younger than 40? A reexamination using 25 years of U.S. outpatient data. Dermatol Surg. 2006;32(12):1473–1479.
12. Baldi I, Vicari P, Di Cuonzo D, et al. A high positive predictive value algorithm using hospital administrative data identified incident cancer cases. J Clin Epidemiol. 2008;61(4):373–379.
13. Gold HT, Do HT. Evaluation of three algorithms to identify incident breast cancer in Medicare claims data. Health Serv Res. 2007;42(5):2056–2069.
14. Leung KM, Hasan AG, Rees KS, et al. Patients with newly diagnosed carcinoma of the breast: validation of a claim-based identification algorithm. J Clin Epidemiol. 1999;52(1):57–64.
15. Nattinger AB, Laud PW, Bajorunaite R, et al. An algorithm for the use of Medicare claims data to identify women with incident breast cancer. Health Serv Res. 2004;39(6 pt 1):1733–1749.
16. Rolnick SJ, Hart G, Barton MB, et al. Comparing breast cancer case identification using HMO computerized diagnostic data and SEER data. Am J Manag Care. 2004;10(4):257–262.
17. Koroukian SM, Cooper GS, Rimm AA. Ability of Medicaid claims data to identify incident cases of breast cancer in the Ohio Medicaid population. Health Serv Res. 2003;38(3):947–960.
18. Parkin DM, Whelan SL, Ferlay J, editors. Cancer Incidence in Five Continents. Vol. VII. Lyon, France: International Agency for Research on Cancer; 1997. (IARC scientific publication no. 143)
19. Marcil I, Stern RS. Risk of developing a subsequent nonmelanoma skin cancer in patients with a history of nonmelanoma skin cancer: a critical review of the literature and meta-analysis. Arch Dermatol. 2000;136(12):1524–1530.
20. Fears TR, Scotto J. Changes in skin cancer morbidity between 1971–72 and 1977–78. J Natl Cancer Inst. 1982;69(2):365–370.
21. Housman TS, Feldman SR, Williford PM, et al. Skin cancer is among the most costly of all cancers to treat for the Medicare population. J Am Acad Dermatol. 2003;48(3):425–429.
22. Mikkilineni R, Weinstock MA. Epidemiology. In: Sober AJ, Haluska FG, editors. Atlas of Clinical Oncology: Skin Cancer. London, United Kingdom: BC Decker, Inc; 2001. pp. 1–15.
23. Cook J, Zitelli JA. Mohs micrographic surgery: a cost analysis. J Am Acad Dermatol. 1998;39(5 pt 1):698–703.
24. Jemal A, Murray T, Samuels A, et al. Cancer statistics, 2003. CA Cancer J Clin. 2003;53(1):5–26.
25. Chuang TY. Skin cancer II: nonmelanoma skin cancer. In: Williams HC, Strachan DP, editors. The Challenge of Dermato-Epidemiology. Boca Raton, FL: CRC Press, Inc; 1997. pp. 209–222.
26. Blackford S, Roberts D, Salek MS, et al. Basal cell carcinomas cause little handicap. Qual Life Res. 1996;5(2):191–194.
27. Cooper GS, Yuan Z, Stange KC, et al. The sensitivity of Medicare claims data for case ascertainment of six common cancers. Med Care. 1999;37(5):436–444.

Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press