Eukaryotic cells contain numerous compartments, such as cytoplasm, mitochondria, Golgi apparatus, and peroxisomes, all of which contain different protein constituents and have different functions. Proteins are typically directed to these compartments by short peptide sequences that act as targeting signals. For example, secretory, chloroplast and mitochondrial targeting peptides are located at the N terminus, whereas signals for other compartments can be within the amino acid sequence. Terminal signal peptides are typically cleaved during the protein translocation process.
Protein function depends on numerous factors. One important but often neglected property of a protein is its subcellular localization. Translocation to the proper compartment allows a protein to form the necessary interactions with its partners and take part in biological networks. For example, signalling and metabolic pathways depend on the location of their constituent proteins. Failure to reach the correct intracellular compartment can be detrimental in two ways. Either the reaction performed or the information carried by the protein does not reach its proper site, which inactivates central reactions or misregulates, eg, signalling cascades; or the mislocalized protein remains active but causes harm by acting in the wrong place.
Subcellular localization of proteins and peptides has long been investigated using numerous methods. Recently, high-throughput methods have been developed based either on reporter genes/tags or on purification, fractionation and analysis of cellular compartments [1]. Information on protein localization is scattered throughout publications and numerous databases. Fortunately, central resources such as the Human Protein Reference Database (HPRD) [3], UniProt [4] and the Gene Ontology [5] now exist to integrate information from several sources. A problem with these databases, however, is that data quality and experimental methods vary. Further, some databases contain only experimentally validated localization information, whereas others also contain localization predictions. The picture is further complicated by the fact that a protein can be localized in more than one compartment, often depending on the state of the cell. Thus, databases that contain only experimentally validated data may not provide complete information for all proteins.
Numerous methods have been developed to predict protein subcellular localization (for review, see eg, [6]). The very first methods, developed in the 1970s, identified microbial signal peptides [7]. Now, methods and protocols exist for the prediction of over 10 cellular compartments and subcompartments. Although the actual prediction algorithms and methods differ, all are based on sequence signature patterns. Some general predictors are useful for all subcompartments, but the majority of methods are specific for individual compartments and organisms or groups of organisms. The reliability of individual methods is relatively high, close to 90% (see, eg, [9]).
Disease-causing mutations result in abnormal cellular function through numerous mechanisms. To date, pathological mechanisms have been revealed for only a fraction of all known mutations. Mutation information has been collected and stored in locus-specific (eg, [12]) and general (such as Online Mendelian Inheritance in Man (OMIM) and Human Gene Mutation Database (HGMD)) databases. Many experimental methods for studying the effects of mutations are tedious, expensive and difficult to use. Disease-causing mutations are identified for diagnostic purposes, and thus most medical centers identify the genetic mutation(s) without acquiring further information about the protein. We and others have applied numerous bioinformatic methods to predict and explain the consequences of mutations. Recently, we discussed the applicability of some 40 analysis and prediction methods [14]. The effects and consequences vary depending on the site and type of mutation, with insertions and deletions usually leading to truncated proteins. These cases are easy to explain if a substantial part of the protein is missing. To understand protein structure and function, however, missense mutations are most interesting because they often indicate residues that are critical for, and changes that are deleterious to, structure and/or function. Most mutations reduce protein activity, but increasing numbers of gain-of-function mutations [16] are also being identified. Relatively few detailed investigations have described protein mislocalization due to disease-related mutations or introduced genetic alterations. In addition, all such publications report a limited number of mutations in a single protein.
Targeting signals tend to be conserved and thus sensitive to alterations; therefore, we can assume that localization prediction methods can be applied to the analysis of point mutations. Here we use bioinformatics to investigate the effects of known disease-related mutations on protein targeting and localization by analyzing 22,416 missense mutations. Several hundred putative localization mutations were identified with two complementary multiprediction approaches. The results indicate that, although alterations to localization signals are rare, localization predictors should be added to the arsenal of methods used in mutation analysis. Our results also suggest pathological mechanisms for a number of mutations and highlight cases for further experimental investigation.