BMC Genomics. 2009; 10: 566.
Published online Nov 30, 2009. doi: 10.1186/1471-2164-10-566
PMCID: PMC2791105
Evaluating annotations of an Agilent expression chip suggests that many features cannot be interpreted
E Michael Gertz,1* Kundan Sengupta,2 Michael J Difilippantonio,2 Thomas Ried,2 and Alejandro A Schäffer1
1Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, DHHS, Bethesda, MD-20892, USA
2Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, Bethesda, MD-20892, USA
*Corresponding author.
E Michael Gertz: gertz/at/ncbi.nlm.nih.gov; Kundan Sengupta: senguptak/at/mail.nih.gov; Michael J Difilippantonio: difilipm/at/mail.nih.gov; Thomas Ried: riedt/at/mail.nih.gov; Alejandro A Schäffer: schaffer/at/ncbi.nlm.nih.gov
Received August 5, 2009; Accepted November 30, 2009.
Abstract
Background
While attempting to reanalyze published data from Agilent 4 × 44 human expression chips, we found that some of the 60-mer oligonucleotide features could not be interpreted as representing single human genes. For example, some of the oligonucleotides align with the transcripts of more than one gene. We therefore used bioinformatics methods to check the annotations of reporters on all autosomes and the X chromosome systematically.
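One symptom described above, a 60-mer that aligns to transcripts of more than one gene, can be detected once alignment hits are grouped by reporter. The Python sketch below assumes a hypothetical input of (reporter ID, gene symbol) pairs, one per alignment to an annotated transcript; it is an illustration of the idea, not the authors' pipeline, and the reporter IDs and gene names are made up.

```python
from collections import defaultdict


def multi_gene_reporters(hits):
    """Return reporter IDs whose alignment hits cover more than one distinct gene.

    hits: iterable of (reporter_id, gene_symbol) pairs, one per alignment of a
    60-mer to an annotated transcript (hypothetical input format).
    """
    genes_per_reporter = defaultdict(set)
    for reporter_id, gene in hits:
        genes_per_reporter[reporter_id].add(gene)
    # A reporter is ambiguous if its hits span two or more different genes.
    return sorted(r for r, genes in genes_per_reporter.items() if len(genes) > 1)


# Toy example with made-up reporter IDs and gene names.
example_hits = [
    ("R1", "GENE_A"),
    ("R1", "GENE_A"),  # two transcripts of the same gene: still unambiguous
    ("R2", "GENE_B"),
    ("R2", "GENE_C"),  # transcripts of two different genes: ambiguous
]
print(multi_gene_reporters(example_hits))  # ['R2']
```

Grouping by gene rather than by transcript matters here: a reporter that hits several splice variants of one gene can still be interpreted as measuring that gene.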
Results
Out of 42683 reporters, we found that 25505 (60%) passed all our tests and are considered "fully valid". For 9964 (23%) reporters, we could not correlate the expression values with a unique annotated human gene because the reporter lacked a meaningful identifier, mapped to the wrong chromosome, or failed basic alignment tests. The remaining 7214 (17%) reporters could be associated with either a unique gene or a unique intergenic location, but could not be mapped to a transcript in RefSeq; these 7214 reporters are further partitioned into three levels of validity.
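The three groups in the Results partition the full set of reporters: 25505 + 9964 + 7214 = 42683. The short Python check below recomputes that sum and the rounded percentages from the counts stated in the abstract; the group labels are paraphrases for readability, not the paper's category names.

```python
# Reporter counts as stated in the abstract.
TOTAL = 42683
counts = {
    "fully valid": 25505,
    "no identifier, wrong chromosome, or failed alignment": 9964,
    "unique gene or intergenic location, no RefSeq transcript": 7214,
}


def percentages(group_counts, total):
    """Percentage of the total in each group, rounded to whole numbers."""
    return {label: round(100 * n / total) for label, n in group_counts.items()}


# The three groups should add up to the full array.
assert sum(counts.values()) == TOTAL

for label, pct in percentages(counts, TOTAL).items():
    print(f"{label}: {counts[label]} ({pct}%)")
```

The unrounded shares are about 59.8%, 23.3%, and 16.9%, which round to the 60%, 23%, and 17% reported above.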
Conclusion
Expression array studies should evaluate the annotations of reporters and remove those reporters that have suspect annotations. This evaluation can be done systematically and semi-automatically, but one must recognize that data sources are frequently updated, which leads to slightly different validation results over time.
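The recommendation to remove reporters with suspect annotations amounts to a filter applied before any downstream analysis. The Python sketch below is a minimal illustration under stated assumptions: the `Reporter` fields, the order of the checks, and the strict rule of keeping only reporters that pass every check are all hypothetical simplifications of the kinds of tests the abstract names (meaningful identifier, alignment, correct chromosome, RefSeq mapping), not the authors' actual procedure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Reporter:
    # Hypothetical annotation record for one 60-mer reporter.
    reporter_id: str
    gene_id: Optional[str]        # None if there is no meaningful identifier
    annotated_chrom: Optional[str]
    aligned_chrom: Optional[str]  # None if the 60-mer failed alignment tests
    in_refseq: bool               # True if it maps to a RefSeq transcript


def is_fully_valid(r: Reporter) -> bool:
    """Hypothetical check loosely mirroring the tests named in the abstract."""
    if r.gene_id is None:
        return False  # no meaningful identifier
    if r.aligned_chrom is None:
        return False  # failed basic alignment tests
    if r.aligned_chrom != r.annotated_chrom:
        return False  # maps to the wrong chromosome
    return r.in_refseq  # must also map to a RefSeq transcript


def filter_expression(values: Dict[str, float],
                      reporters: Iterable[Reporter]) -> Dict[str, float]:
    """Keep expression values only for reporters that pass every check."""
    valid = {r.reporter_id for r in reporters if is_fully_valid(r)}
    return {rid: v for rid, v in values.items() if rid in valid}


# Toy data with made-up IDs, genes, and expression values.
reporters = [
    Reporter("R1", "GENE_A", "chr1", "chr1", True),
    Reporter("R2", None, "chr2", "chr2", True),       # no identifier
    Reporter("R3", "GENE_C", "chr3", "chr5", True),   # wrong chromosome
    Reporter("R4", "GENE_D", "chrX", "chrX", False),  # no RefSeq transcript
]
values = {"R1": 2.5, "R2": 1.1, "R3": 0.7, "R4": 3.0}
print(filter_expression(values, reporters))  # {'R1': 2.5}
```

A study might reasonably keep some reporters that lack a RefSeq transcript, such as those in the higher of the three validity levels mentioned in the Results; relaxing the final check is one way to do that. Because the annotation sources change, the check would need to be rerun against current data.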
Articles from BMC Genomics are provided here courtesy of BioMed Central.