Structural biology projects are highly variable, so there is no universally applicable target optimisation strategy. However, certain criteria are generally useful. Target optimisation frequently draws upon overlapping information for the evaluation of both alternative constructs and putative homologues. Although NMR is an important technique for structure determination, as of January 2008, 85% of all structures in the PDB (
18) had been solved by X-ray crystallography. As a consequence, obtaining crystals is a key stage in most structural biology pipelines. Modifying the construct sequence may influence crystallisation propensity, and alternative homologues may be examined since protein families commonly have members with a wide range of estimated crystallisation propensity (
3). The OB-Score (
3), ParCrys (
4) and Hydrophobicity/pI clustering (
43) are all harnessed by TarO to estimate crystallisation propensity, and so guide the evaluation of homologues. Proteins with transmembrane regions or significant disordered sequence are frequently problematic (
1,
17). Also, posttranslational modifications (PTMs) are commonly associated with protein disorder (
44). TarO assists with identification of sequences that are likely to contain these potentially troublesome, but biologically interesting, features. Transmembrane regions are predicted by TMHMM2 (
45), whilst protein disorder predictions are obtained from Disembl, GlobPlot and RONN (
36–38). Phosphorylation sites, O-linked glycosylation sites and N-linked glycosylation sites are predicted by the programs NetPhos (
35), NetOglyc (
34) and NetNglyc (
http://www.cbs.dtu.dk/services/NetNGlyc/).
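The way such predictions can be combined to compare homologues may be illustrated with a minimal sketch. It is not TarO's implementation: the `Candidate` fields, the `max_disorder` threshold, the example values and the assumption that a higher `propensity` value means higher predicted crystallisation propensity are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    propensity: float         # hypothetical crystallisation score; higher assumed better
    tm_helices: int           # number of predicted transmembrane helices
    disorder_fraction: float  # fraction of residues predicted to be disordered


def rank_candidates(candidates, max_disorder=0.3):
    """Sort candidates so that unflagged sequences come first, best score first.

    A candidate is flagged if it has any predicted transmembrane helix or
    if its predicted disorder exceeds max_disorder (an illustrative cutoff).
    Flagged candidates are kept, not discarded, because such features may be
    biologically interesting.
    """
    def is_flagged(c):
        return c.tm_helices > 0 or c.disorder_fraction > max_disorder

    # False sorts before True, so unflagged candidates lead the list.
    return sorted(candidates, key=lambda c: (is_flagged(c), -c.propensity))


# Made-up example values, not real predictions.
candidates = [
    Candidate("input", 1.2, 0, 0.10),
    Candidate("homologue_A", 4.5, 0, 0.05),
    Candidate("homologue_B", 6.0, 2, 0.08),
    Candidate("homologue_C", 3.0, 0, 0.45),
]
ranked = rank_candidates(candidates)
for c in ranked:
    print(c.name, c.propensity, c.tm_helices, c.disorder_fraction)
```

Flagged candidates are moved to the end of the list rather than removed, since the choice of whether to pursue a transmembrane or partly disordered protein depends on the project.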
TarO also assists with the identification of protein domain boundaries, facilitated by an annotated MSA that is viewed in Jalview (
20,
21). The MSA annotations include matched domains from Pfam (
9,
10) and the conserved domains database (CDD) (
26,
27), combined with predicted protein disorder. Predicted transmembrane regions, signal peptide [SignalP (
33)], PTMs and secondary structure [JPred (
39,
40)] are also annotated on the MSA. Other useful information associated with the MSA is provided by the Jalview program. For example, Jalview automatically provides a display of residue conservation at each position of the alignment. In addition, Jalview provides the facility to query numerous Distributed Annotation System (
46) servers, and to display any returned annotation on the MSA. The various annotations associated with the MSA are useful to assist with the design of optimised constructs and identification of functionally important residues. Building upon this, a likely future development in TarO is the automated design and ranking of optimised construct sequences. Of course, the design of optimised construct sequences may also benefit from information provided by experimental methods such as limited proteolysis (
47).
Retaining the functional features that originally stimulated interest in the target is an important consideration during target optimisation. For example, removing part of an enzyme's active site might make crystals easier to obtain, but the resulting protein structure would be of little use for studying the molecular mechanism of catalysis. The range of functional information provided by TarO aims to assist with identification and comparison of functional regions in protein sequences. A possible future direction is the automated evaluation of sequence features to provide more sophisticated prediction and analysis of functional conservation for a given protein pair. These predictions could be useful in the context of target optimisation, for example by enabling more advanced protein ranking systems. Different projects require different sets of functional properties to be retained in the optimised target sequence. However, all putative orthologues and homologues currently identified in TarO pass thresholds that aim to preserve a reasonable level of structural similarity (
24).
As a screening mechanism to avoid duplication of effort, the input protein and its associated sequences are searched against the PDB (
18) and TargetDB (
25). The discovery of a similar structure in the PDB or TargetDB may be sufficient grounds to eliminate a potential target. On the other hand, identification of a known and related structure could be important; this may provide a model for molecular replacement calculations, or inform on components of multi-domain or multi-subunit systems.
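The distinction drawn above, between a hit that suggests duplicated effort and a hit that is merely a useful related structure, can be sketched as follows. This is an illustration under stated assumptions, not TarO's method: the `Hit` record, the `classify_hits` function, the identity and coverage cutoffs and the example hits are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str       # name of the input or homologous sequence
    database: str    # "PDB" or "TargetDB"
    identity: float  # percent sequence identity of the hit
    coverage: float  # fraction of the query covered by the alignment


def classify_hits(hits, identity_cutoff=90.0, coverage_cutoff=0.8):
    """Label each query by its strongest hit.

    A hit above both (illustrative) cutoffs marks the query as a possible
    duplicate; any weaker hit marks it as having a related structure that
    might serve, for example, as a molecular replacement model.
    """
    status = {}
    for h in hits:
        if h.identity >= identity_cutoff and h.coverage >= coverage_cutoff:
            label = "possible duplicate"
        else:
            label = "related structure"
        # Never downgrade a query already marked as a possible duplicate.
        if status.get(h.query) != "possible duplicate":
            status[h.query] = label
    return status


# Made-up example hits, not real search results.
hits = [
    Hit("input", "PDB", 98.0, 0.95),
    Hit("homologue_A", "TargetDB", 42.0, 0.90),
    Hit("input", "TargetDB", 35.0, 0.50),
]
status = classify_hits(hits)
print(status)
```

Queries with no hits are absent from the returned dictionary, so they can be treated as having no known similar structure.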
In summary, TarO enables selection of sequences that are likely to be more amenable to structural studies and that share functional similarity with the input sequence. Additionally, TarO provides information relevant to many stages of the structure determination pipeline, including the design of optimised constructs. The use of TarO accelerates progress in structural proteomics by efficiently providing bioinformatics data to inform decision-making on the prioritisation and optimisation of potential targets. TarO simplifies the gathering, storage and retrieval of data and so frees up research time to make use of the information and to think creatively. Please cite TarO as well as the underlying algorithms and databases, as appropriate. Active development of TarO continues and will include further analysis steps, improvements to the user interface, and integration with the Protein Information Management System (PIMS), a sister project in the BBSRC Structural Proteomics of Rational Targets (SPoRT) initiative. We also plan to make available a distribution of the TarO source code. We feel that community interactions with the TarO project can lead to further advancement and dissemination of best practices for target optimisation. TarO is available at www.compbio.dundee.ac.uk/taro, and we welcome feedback from users.