

Proc AMIA Symp. 2002:757–761.
PMCID: PMC2244274

Identification of patient name references within medical documents using semantic selectional restrictions.


De-identification of a patient's personal data in medical records is a protective legal requirement that must be met before medical documents can be used for research or shared with other parties (e.g., teachers, students, or tele-consultation services). Performed manually, de-identification is tedious, and direct search-and-replace strategies are known to be error-prone [9]. In this paper, we report on the identification step of this process: locating references to the patient's name. The proposed algorithm estimates how well each candidate patient name reference fits a set of semantic selectional restrictions. These restrictions place tight contextual requirements on candidate words in the report text and are learned automatically from a manually tagged corpus of training reports. Maximum entropy classifiers provide a probabilistic measure of the belief that a given candidate token satisfies a given semantic restriction. We report on the design and preliminary evaluation of the system in the domain of pediatric urology.
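The abstract does not specify the feature set or the restrictions themselves, so the following is only a sketch under stated assumptions. It uses a two-class maximum entropy model (equivalently, L2-regularized logistic regression), trained by stochastic gradient ascent on hand-tagged candidate tokens. A few hypothetical contextual features (previous word, next word, capitalization) stand in for a single semantic restriction. The names `context_features`, `sigmoid`, `BinaryMaxEnt`, and `TAGGED`, along with the toy sentences, are illustrative inventions, not the authors' system.

```python
import math
from collections import defaultdict


def context_features(tokens, i):
    """Hypothetical binary context features for the candidate token tokens[i]."""
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    feats = {"bias", "prev=" + prev_tok, "next=" + next_tok}
    if tokens[i][:1].isupper():
        feats.add("capitalized")
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BinaryMaxEnt:
    """Two-class maximum entropy model over binary features (same form as logistic regression)."""

    def __init__(self):
        self.weights = defaultdict(float)

    def prob(self, feats):
        """Estimated P(candidate satisfies the restriction | context features)."""
        return sigmoid(sum(self.weights.get(f, 0.0) for f in feats))

    def train(self, examples, epochs=200, lr=0.5, l2=0.01):
        # Stochastic gradient ascent on the conditional log-likelihood,
        # with an L2 penalty applied to the features active in each example.
        for _ in range(epochs):
            for feats, label in examples:
                err = label - self.prob(feats)
                for f in feats:
                    self.weights[f] += lr * (err - l2 * self.weights[f])


# Toy hand-tagged corpus (hypothetical): (sentence, candidate index, 1 if patient name else 0).
TAGGED = [
    ("Patient Smith presented with left hydronephrosis .", 1, 1),
    ("Dr. Jones reviewed the ultrasound .", 1, 0),
    ("Patient Lee presented with reflux .", 1, 1),
    ("Dr. Brown recommended follow-up .", 1, 0),
    ("Patient Garcia was seen by Dr. Chen .", 1, 1),
    ("Patient Garcia was seen by Dr. Chen .", 6, 0),
]

model = BinaryMaxEnt()
model.train([(context_features(s.split(), i), y) for s, i, y in TAGGED])

# Score unseen candidates: a patient name and a clinician name.
for sentence, idx in [("Patient Nguyen presented with hematuria .", 1),
                      ("Dr. Patel reviewed the scan .", 1)]:
    toks = sentence.split()
    print(toks[idx], round(model.prob(context_features(toks, idx)), 3))
```

Each candidate's probability could then be thresholded (for example, at 0.5) or combined with the scores for other restrictions. The abstract does not state how the paper combines its restrictions, so this step is left open here.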


Selected References

  • Bui AAT, Dionisio JDN, Morioka CA, Sinha U, Taira RK, Kangarloo H. DataServer: an infrastructure to support evidence-based radiology. Acad Radiol. 2002 Jun;9(6):670–678.
  • Metz CE. Basic principles of ROC analysis. Semin Nucl Med. 1978 Oct;8(4):283–298.
  • Quantin C, Bouzelat H, Allaert FA, Benhamiche AM, Faivre J, Dusserre L. Automatic record hash coding and linkage for epidemiological follow-up data confidentiality. Methods Inf Med. 1998 Sep;37(3):271–277.

Articles from Proceedings of the AMIA Symposium are provided here courtesy of American Medical Informatics Association