Cell line identification is emerging as an essential method for every cell line user in the research community to avoid using misidentified cell lines for experiments and publications. IGRhCellID (http://igrcid.ibms.sinica.edu.tw) is designed to integrate eight cell identification methods: seven methods (STR profile, gender, immunotypes, karyotype, isoenzyme profile, TP53 mutation and mutations of cancer genes) available in various public databases, and our own method of profiling genome alterations of human cell lines. With data validated on 11 genes carrying small deletions in human cancer cell lines, the profiles of genomic alterations further allow users to search for human cell lines with a deleted gene to serve as indigenous knock-out cell models (such as SMAD4 in gene view), with an amplified gene to serve as cell models for testing therapeutic efficacy (such as ERBB2 in gene view) and with overlapping aberrant chromosomal loci for revealing common cancer genes (such as the 9p21.3 homozygous deletion with co-deleted CDKN2A, CDKN2B and MTAP in chromosome view). IGRhCellID provides not only the available methods for cell identification, to help eradicate concerns about using misidentified cells, but also designated genetic features of human cell lines for experiments.
Cell lines are important and essential reagents for almost every experiment in biomedical research. Because of their capacity for indefinite growth and their extensive use in the research community, cell lines are frequently contaminated and misidentified in the literature (1–3). It has been recommended that researchers authenticate their experimental cell lines before publication (4–6).
The problem of misidentified cell lines in the published literature has been recognized for several decades. International cell repositories have estimated the incidence of cell line misidentification at between 16% and 36%, based on analysis of submitted cell lines (4). The most notorious example is the HeLa cervical cancer cell line, which has been falsely described under several different origins and cell types, including Chang liver as ‘normal liver cell’, KB as ‘oral cancer cell’, HEp-2 as ‘laryngeal cancer cell’, Int-407 as ‘non-transformed intestinal epithelial cell’ and so on (7–9). Using misidentified and contaminated cell lines not only generates erroneous and misleading results but also wastes research funding and delays scientific progress. Currently, the international cell repositories publish on their websites a list of contaminated or misidentified cell lines to help researchers avoid using incorrect cell lines for experiments (5).
For cell identification, several methods have been developed to detect cell line contamination or misidentification, including isoenzyme analysis, karyotyping, human leukocyte antigen (HLA)-typing, immunotyping, DNA fingerprinting and short tandem repeat (STR) profiling. These methods can match a cell line's identity to cell line-specific profiles, but with various levels of ambiguity and limitations (4). Among these methods, STR profiling of human cell lines, adopted from routine assays for paternity testing and forensic analysis, is becoming the most recommended method for cell identification (4,10). A database collecting the STR profiles of human cell lines, Cell Line Integrated Molecular Authentication (CLIMA), is currently available to the scientific community (11). However, STR profiling has disadvantages when applied to cell identification. First, its major limitation is that it cannot detect contamination by cells from other species, because the PCR primers for the STR markers were designed from human sequence. Second, since the majority of cell lines were established from malignant tissues, gains and losses in cancer genomes increase the ambiguity and reduce the power to match a specific STR profile. Third, STR profiling is unable to distinguish sub-lines derived from the same cell line because they share identical STR alleles. Finally, STR profiling requires commercial reagents, expensive instruments and genotyping software for data interpretation. Unless the cost of genotyping is dramatically reduced and genotyping experiments become accessible in a nearby core facility, current methods, including STR profiling, might not be able to ease the persistent problem of cell misidentification in our research community when used for routine cell identification.
Since efforts to eradicate the misidentification problem have so far been unsuccessful, IGRhCellID was established to provide not only STR profiles of human cell lines, the most recommended assay for cell authentication, but also other authentication tools based on conventional laboratory PCR or DNA sequencing assays for routine verification of cell identity. In addition, IGRhCellID allows researchers to search for available cell lines with designated genetic features and to identify commonly altered loci and genes that overlap in multiple cell lines.
The IGRhCellID database contains integrated genomic information on 520 human cell lines annotated with eight different methods for cell identification. Cell line information for the conventional methods (STR profile, gender, immunotypes, karyotype and isoenzyme profile) was downloaded from common international cell repositories including ATCC, DSMZ, JCRB, ECACC and RIKEN BioResource Center. In addition, we provide TP53 mutation data from the UMD TP53 mutation database (12,13), somatic mutation data from the Catalogue of Somatic Mutations in Cancer (COSMIC) database (14) and genome-wide amplicon and homozygous deletion (HD) profiles from our laboratory. We recently established a comprehensive protocol to analyze copy number alterations (CNA) in cancer genomes using high-density single nucleotide polymorphism (SNP) arrays with non-paired reference genomes (15). Based on this protocol, we were able to validate known and identify novel amplicons and HDs in 23 cancer cell lines and further identify novel cancer genes in hepatocellular carcinomas.
For annotation of genome-wide amplicons and HDs in human cell lines, we downloaded 520 independent genotyping data sets from Affymetrix 250K, 500K and 6.0 (1800K) SNP GeneChip arrays of human cell lines from the Array Data Management System at the National Cancer Institute, USA (caArray, 338 cancer cell lines) (16) and the Gene Expression Omnibus database (GEO, 182 cell lines) (17), together with International HapMap project data as the normal reference control. Using dChip analysis software (18), we smoothed the intensity of each SNP over three consecutive SNPs to create an inferred copy number (ICN) for each SNP, and defined amplicons and HDs as ICN greater than 3 and less than 0.5, respectively. Together, we identified a total of 13840927 amplified SNPs and 182343 lost SNPs located in amplicons and HDs, respectively, in the 520 human cell lines. To check our analysis protocol and data quality, we examined 11 known genes (LRP1B, FHIT, PARK2, PTPRD, CDKN2B, CDKN2A, PTEN, WWOX, CREBBP, TP53 and SMAD4) with small deletions reported in the published literature (19–29). Of 49 previously reported deletion events in these 11 genes in cancer cell lines, we validated 47, a rate of 96% (47/49) (Supplementary Table S1). Furthermore, we downloaded and integrated 950 microarray gene expression data sets of the 358 cell lines from caArray to show the concordance of altered gene expression with the corresponding amplicons and HD regions.
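The smoothing and thresholding step can be illustrated with a minimal sketch. This is not the dChip implementation: the function names, the centered three-SNP moving average and the truncated windows at the ends of the series are all assumptions made for illustration. Only the two ICN cut-offs (greater than 3 for amplicons, less than 0.5 for HDs) come from the text.

```python
from typing import List

AMP_THRESHOLD = 3.0  # ICN above this is called amplified (cut-off from the text)
HD_THRESHOLD = 0.5   # ICN below this is called homozygously deleted (cut-off from the text)


def smooth_icn(raw: List[float], window: int = 3) -> List[float]:
    """Centered moving average over `window` consecutive SNPs.

    Assumption: windows are truncated at the two ends of the series,
    so the first and last SNPs are averaged over fewer values.
    """
    half = window // 2
    smoothed = []
    for i in range(len(raw)):
        lo = max(0, i - half)
        hi = min(len(raw), i + half + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return smoothed


def call_snps(icn: List[float]) -> List[str]:
    """Label each SNP as 'amp', 'hd' or 'normal' from its inferred copy number."""
    labels = []
    for value in icn:
        if value > AMP_THRESHOLD:
            labels.append("amp")
        elif value < HD_THRESHOLD:
            labels.append("hd")
        else:
            labels.append("normal")
    return labels
```

A call such as `call_snps(smooth_icn(raw_copy_numbers))` would then label each SNP along a chromosome. Both thresholds are strict, so an ICN of exactly 3 or exactly 0.5 is labeled normal.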
Cell lines can be selected by quick search, by browsing names in alphabetical order or by browsing categories of cell line origin based on the body locations/systems used by the National Cancer Institute. After selecting a cell line in cell line view, users can, in addition to the common STR profile and other conventional methods, search its genome-wide altered SNPs in amplicons and HDs along the physical and cytogenetic locations of the chromosomes (Figure 1). In the genomic alteration profile of a selected cell line, users can click through an alteration (green lines stand for HD and red lines stand for amplification) and retrieve SNPs for cell identification with multiple PCR or quantitative PCR reactions. Furthermore, the presence of somatic mutations in TP53 (UMD TP53 database) or other cancer genes (COSMIC) can provide additional support for authenticity.
The profiles of genome-wide amplicons and HDs not only provide a new method for authenticating a cell line but also allow selection of a cell line with requested genetic features (in gene or intergenic view). Moreover, common somatic alterations shared by multiple cell lines across cancer types can be displayed in chromosome view. For instance, after selecting cell lines in gene view, users can search for a gene and align its genetic alterations in one or multiple cell lines. Using the quick examples provided in IGRhCellID, users can find that SMAD4, a tumor suppressor gene in pancreatic carcinoma, is deleted in FaDu (a pharynx squamous carcinoma cell line), COLO 201 (a colorectal adenocarcinoma cell line) and SW 1573 (a lung alveolar carcinoma cell line), with concordant down-regulation of gene expression. Only the SMAD4 HD in FaDu had been reported before; those in COLO 201 and SW 1573 cells had not (19). The integrated information identifies these SMAD4 HD cell lines as indigenous human knock-out models for biological studies.
In intergenic view, after selecting cell lines, users can zoom out from the aberrant amplicon or HD region to examine the boundary of the aberrant locus and the neighboring genes it affects. For instance, ERBB2, also called HER2 or NEU, is a known oncogene in human breast cancer. When searching for ERBB2 in 26 breast cancer cell lines, users can observe amplification in the region in only about 23% (6/26) of the cell lines. After selecting the six amplified cell lines and zooming out to a 1 Mb region surrounding ERBB2, users will observe the amplicon boundaries in some breast cancer cell lines, including the smallest amplicon, in HCC2218 cells, containing eight genes and the largest amplicon, in UACC-812 cells, containing 29 genes. The amplification of oncogenic ERBB2 in these cell lines provides not only positive control cell lines for ERBB2 expression but also natural cell models for studying the therapeutic efficacy of drugs against ERBB2 over-expression in breast cancer cells.
In chromosome view, IGRhCellID also allows users to search a cytogenetic region for overlapping somatic alterations in selected human cell lines across cancer types, at a resolution down to the gene and SNP levels. As shown in our tutorial example, chromosome 9 alterations in six cell lines classified under the endocrine system (five thyroid carcinoma cell lines and one adrenal gland carcinoma cell line) were selected and aligned in chromosome view. One overlapping HD region on 9p21.3 was detected in three cell lines. After clicking on the chromosome to enlarge the view, three known genes, CDKN2A, CDKN2B and MTAP, were found to be co-deleted in the HD region.
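Finding a region deleted in several aligned cell lines amounts to an interval-overlap computation, which the following minimal sketch illustrates. It is not a description of how IGRhCellID computes the chromosome view: the function name, the half-open coordinate convention and the sweep-line approach are assumptions, and the sketch further assumes that the HD intervals within a single cell line do not overlap one another.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) positions on one chromosome


def shared_regions(hd_by_line: Dict[str, List[Interval]], min_lines: int) -> List[Interval]:
    """Return the regions covered by HD intervals in at least `min_lines` cell lines.

    Assumption: intervals belonging to the same cell line do not overlap,
    so the sweep depth equals the number of distinct cell lines deleted.
    """
    events = []
    for intervals in hd_by_line.values():
        for start, end in intervals:
            events.append((start, 1))   # a deletion opens
            events.append((end, -1))    # a deletion closes
    # At equal positions, closing events (-1) sort before opening events (+1),
    # so intervals that merely touch are not counted as overlapping.
    events.sort()

    regions = []
    depth = 0
    region_start = None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_lines <= depth:
            region_start = pos
        elif depth < min_lines <= previous:
            if pos > region_start:
                regions.append((region_start, pos))
            region_start = None
    return regions
```

With hypothetical coordinates such as `{"A": [(100, 500)], "B": [(300, 700)], "C": [(400, 600)]}`, asking for regions shared by all three lines returns the single interval where every deletion overlaps.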
In addition to providing STR profiles of 520 human cell lines, IGRhCellID integrates other authentication tools and genomic alterations to help researchers validate their experimental cell lines using conventional methods, select suitable cell lines with the proper genetic background for their experimental designs and support the long-standing efforts to eradicate concerns about using misidentified cell lines. Using authentication tools to maintain correct cell identification is even more critical in the fields that establish and use stem cell lines for therapeutic applications. Although the integration of available authentication tools and profiles of genomic alterations of human cell lines in IGRhCellID provides convenient and accessible methods for cell identification, further work is needed to apply statistical analysis to determine the discrimination power of cell authentication using either one method or a combination of methods. Nevertheless, IGRhCellID will continue to collect available authentication data for all available cell lines derived from human and other species. Our efforts should help resolve the crisis of using misidentified cell lines and improve the selection of proper cell lines for designated experiments.
Supplementary Data are available at NAR Online.
National Research Program for Genomic Medicine (NRPGM), National Science Council, Taiwan (grant numbers NSC98-3112-B-001-004 and NSC98-3112-B-001-031). Funding for open access charge: Institute of Biomedical Sciences, Academia Sinica.
Conflict of interest statement. None declared.