The recent explosion of sequence and structure information in the public databases has made it possible to survey proteins on a genomic scale. Structural genomics initiatives have rapidly increased our knowledge of protein structures, and this has provided a foundation for better elucidating reaction mechanisms and ligand binding, for understanding substrate specificity, and for creating many new three-dimensional (3D) structural models. One use of such information is to inform drug discovery. For example, Vidovic et al. examined human protein-tyrosine phosphatases as targets for rational drug design. Rational drug design has been shown to produce selective drugs: for example, a number of effective cancer drugs have been produced that have fewer side effects than traditional, cytotoxic drugs. Evaluating potential off-target effects is an important consideration in the process, and surveying homologs of target proteins can reveal unanticipated interactions. Conversely, some drugs show efficacy against unanticipated targets, making them useful in treating diseases other than those for which they were designed. For example, Kinnings et al. used a computational approach to compare proteins with binding sites similar to those of the targets of commercially available drugs and found that a drug approved for Parkinson's disease may be effective for treating tuberculosis. Thus, the larger context provided by examining the differences between structurally related proteins may aid in the design of more selective drugs, while study of their similarities can give clues for new starting points in drug design.
While the public databases are rich with sequence and structure data, retrieving specific data and synthesizing the information into views that are intuitively interpretable is not a trivial task, even for an experienced user of bioinformatics tools. The central idea behind our study is to take the approach illustrated by Vidovic et al. one step further and construct genome-wide views for more than one organism; specifically, for a host and parasite, allowing cross-species comparisons. We constructed these views using sequences and structures of the proteases of the pathogen Trypanosoma brucei and its human host Homo sapiens. The full diversity of a set of sequences or structures is often termed “sequence space” or “structure space.” To visualize the information, we used similarity networks, whereby sequences or structures are clustered graphically by similarity. Such networks represent a powerful way to visualize relationships across large sets of sequences and structures. To construct the structure similarity networks, existing crystal structures and homology models, as well as newly created models, were utilized.
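As a rough illustration of the similarity-network idea, the sketch below treats each protein as a node, draws an edge between two proteins only when their pairwise similarity score meets a chosen cutoff, and reports the resulting connected groups. It is a minimal sketch under stated assumptions, not the procedure used in this study: the protein labels, the scores, the cutoff values, and the function names are all hypothetical, and the scores are assumed to be "higher means more similar."

```python
from collections import defaultdict


def build_similarity_network(scores, threshold):
    """Build an undirected network from pairwise similarity scores.

    scores: dict mapping (protein_a, protein_b) -> similarity score
            (hypothetical values; higher is assumed to mean more similar).
    threshold: minimum score required to draw an edge.
    Returns a dict mapping each protein to the set of its neighbors.
    """
    adjacency = defaultdict(set)
    for (a, b), score in scores.items():
        # Touch both keys so proteins with no qualifying edge still appear as nodes.
        adjacency[a]
        adjacency[b]
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def connected_clusters(adjacency):
    """Return the connected components of the network as sorted lists."""
    seen = set()
    clusters = []
    for node in sorted(adjacency):
        if node in seen:
            continue
        component = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical pairwise scores for five made-up proteins.
example_scores = {
    ("protA", "protB"): 50,
    ("protB", "protC"): 45,
    ("protC", "protD"): 5,
    ("protD", "protE"): 60,
    ("protA", "protE"): 2,
}

strict = connected_clusters(build_similarity_network(example_scores, threshold=30))
loose = connected_clusters(build_similarity_network(example_scores, threshold=1))
print(strict)
print(loose)
```

Raising the cutoff splits the network into tighter groups, while lowering it merges groups through weaker links, which is why the choice of cutoff shapes how such a network is read.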
Proteases were chosen for this computational study because a number of these proteins have been validated as druggable targets and many have available structures. Protease inhibitors are currently under investigation to treat various parasite infections, cancer, HIV, hypertension, and diabetes. Here, we employ the nomenclature of the protease database MEROPS in labeling proteases by the evolutionary units of family and clan (see Methods). Proteases catalyze the hydrolytic breakdown or processing of proteins and account for about 2% of all expressed genes. The set of an organism's proteases expressed at a particular time or circumstance has been called its “degradome”. Here, we use the term to refer more generally to all the active proteases coded by an organism's genome.
The parasite degradome targeted in this study is that of the protist T. brucei, which causes human African trypanosomiasis (“HAT”) or sleeping sickness, a disease that affects an estimated 50,000 to 70,000 people, mostly in sub-Saharan Africa. HAT is one of a number of ‘neglected’ tropical diseases that primarily afflict the poor. The few existing treatments for such diseases often have severe side effects. Without drug treatment, HAT is often fatal; yet the standard drug used to treat infection of the central nervous system is itself often lethal. T. brucei is related to two other human pathogens, Trypanosoma cruzi and Leishmania major, that share many physical characteristics. These three species are referred to as the “Tritryps”. As will be illustrated here for T. brucei, knowledge about well-characterized proteins in the other Tritryp species is valuable for inferring characteristics about related but less well-characterized proteins in the target species, for example, by enabling the creation of homology models.
This study had two objectives. The first was to compare the array of proteases in the human host with that of the T. brucei pathogen and to determine how much of the sequence space in each organism is covered by 3D structure. The second was to use the similarities and differences within and between protease sequence and structure similarity groups to obtain insights into possible new targets for drug design. The global views produced here will also be useful for guiding phylogenetic and other more detailed studies comparing proteases of the parasite and its human host. We found that structure coverage of sequence space in human and parasite is broad, making global structural comparisons both feasible and informative. To illustrate how these results may be used to better understand structurally related human and parasite proteases, we include a detailed structural evaluation of two groups of parasite proteases that may have potential as new drug targets. For one of these protease targets, TbM32, we predicted and experimentally confirmed its inhibition by a known human drug.