Here, we report on a gene-centric approach to experimentally annotate all protein-coding genes of human chromosome 21 using antibody-based profiling. In release 59, genome sequence analysis by the Ensembl group identified 192 non-keratin-associated putative protein-coding genes on this chromosome. These genes have been characterized on the protein level by antibody-based profiling, and the status was reported in a matrix. The overall aim is to fill this matrix with information on all levels to generate experimental evidence for molecular characterization, isoforms, subcellular localization, tissue profiles, and cell and tissue specificity, and to contribute to the functional annotation of the proteome by identifying incorrectly annotated genes that do not code for proteins.
The study presented here has yielded several insights of both general and specific interest. Five genes with no previous evidence on the protein level have been identified by molecular characterization (b), and the level of protein modifications has been studied using a new approach for isoelectric focusing-based Western blot analysis. Although this analysis was performed on only a small number of genes, the results indicate that a large fraction of the analyzed proteins have multiple isoforms or post-translational modifications. In addition, tissue profiling using immunohistochemistry has revealed several proteins with highly selective expression patterns.
The protein analysis has been complemented with transcript profiling using next-generation sequencing. This analysis is a useful tool for providing evidence for protein-coding genes, as demonstrated by the ratio of reads across introns and exons for a number of putative chromosome 21 genes with no previous evidence on the protein level. The RNA-seq method can also be extended to define and characterize the alternative splice variants from each gene locus and to determine the quantitative levels of RNA expression in different cells, tissues, and organs.
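The idea of comparing reads across exons and introns can be illustrated with a minimal sketch. The exact metric used in this study is not specified here, so the code below assumes one plausible definition: read density (reads per kilobase) in exons divided by read density in introns, with a pseudocount to avoid division by zero. The class name, field names, and the pseudocount are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Hypothetical per-gene read summary (not a format used in the study)."""
    gene: str
    exon_reads: int
    exon_length_bp: int
    intron_reads: int
    intron_length_bp: int


def exon_intron_ratio(g: GeneCounts, pseudocount: float = 1.0) -> float:
    """Return exon read density divided by intron read density.

    Densities are expressed as reads per kilobase. The pseudocount is an
    assumption added so that genes with zero intronic reads do not cause
    a division by zero.
    """
    if g.exon_length_bp <= 0 or g.intron_length_bp <= 0:
        raise ValueError("exon and intron lengths must be positive")
    exon_density = (g.exon_reads + pseudocount) / (g.exon_length_bp / 1000)
    intron_density = (g.intron_reads + pseudocount) / (g.intron_length_bp / 1000)
    return exon_density / intron_density


# Illustrative, made-up counts: reads concentrated in exons versus
# reads spread evenly across the whole locus.
spliced = GeneCounts("gene_A", exon_reads=99, exon_length_bp=1000,
                     intron_reads=9, intron_length_bp=10000)
uniform = GeneCounts("gene_B", exon_reads=49, exon_length_bp=1000,
                     intron_reads=499, intron_length_bp=10000)

for g in (spliced, uniform):
    print(g.gene, round(exon_intron_ratio(g), 2))
```

Under this assumed definition, a ratio well above 1 means reads are enriched in annotated exons, which is consistent with a spliced transcript, whereas a ratio near 1 suggests reads that do not respect the exon boundaries. Any threshold for calling a gene transcribed would have to be calibrated on genes with known protein evidence.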
At present, we report annotated protein expression using two or more (paired) antibodies for 22% of the genes on chromosome 21. An important priority for the future is to add antibodies so that the results from one antibody can be validated by another. It will also be important to extend the analysis with renewable antibodies, such as monoclonal antibodies or recombinant affinity binders, to complement the polyclonal antibodies generated within the Human Protein Atlas program. In this context, it is reassuring that several programs have been initiated recently to develop new methods for systematic generation of renewable binders to human proteins (30, 31). Another important objective is to extend the validation of the molecular and subcellular localization to include analysis of cell lines in which the gene has been knocked down using siRNA technology. The combination of gene knockdowns and antibody-based profiling is a powerful approach for generating profiling data with high reliability.
In conclusion, we describe a human proteome project to perform a systematic characterization of all the protein-coding genes on human chromosome 21 using antibody-based protein profiling. Through collaboration with research groups utilizing several complementary technologies, this effort can be integrated with similar efforts as part of a Human Proteome Project to characterize the proteins in normal cells, tissues, and organs and to generate a proteome-wide knowledge-based resource. The ultimate objective is to create an experimentally validated resource covering all proteins encoded by the human genome.