Persistent hurdles impede the successful determination of high-resolution crystal structures of eukaryotic integral membrane proteins (IMP). We designed a high-throughput structural genomics oriented pipeline that seeks to minimize effort in uncovering high-quality, responsive non-redundant targets for crystallization. This “discovery-oriented” pipeline sidesteps two significant bottlenecks in the IMP structure determination pipeline: expression and membrane extraction with detergent. In addition, proteins that enter the pipeline are then rapidly vetted by their presence in the included volume on a size-exclusion column – a hallmark of well-behaved IMP targets. A screen of 384 rationally selected eukaryotic IMPs in baker’s yeast Saccharomyces cerevisiae is outlined to demonstrate the results expected when applying this discovery-oriented pipeline to whole-organism membrane proteomes.
Can a high-output structural genomics style pipeline be deployed successfully to obtain structures of eukaryotic integral membrane proteins? If so, how would such a pipeline be developed and implemented? What leaks and bottlenecks should one expect?
These questions, and many others, provided some of the framework for discussions at an NIH sponsored workshop in April of 2008. A stated objective of the workshop was to address “the challenges and technical barriers to the high-throughput determination of protein structures” (http://www.team-psa.com/NIGMS-BottlenecksWorkshop/). Indeed, integral membrane proteins, because of their added complexity, have often fallen outside the high-throughput pipelines within the Protein Structure Initiative (PSI). Efforts within PSI-2 specialized centers are starting to yield positive dividends for membrane proteins (1) yet generic protocols for the reliable expression, membrane extraction with detergent, purification and even crystallization of this important class of targets are still being developed. In general, discussions relating to IMP high-throughput structure determination focus both on efficient implementation of established methods and the development of novel tools and approaches. The current mini-review will focus on common impasses and hurdles along the road to eukaryotic IMP structure determination and the implementation of existing methodologies within a discovery-oriented pipeline. This pipeline was the topic of an oral presentation at the NIH Bottlenecks Workshop mentioned above.
The Pareto Principle, whereby highest value derives from small numbers of most favorable cases, is used to describe the unequal relationship between inputs and outputs within systems (2). This principle is applicable to efforts in pursuing the structures of novel eukaryotic IMPs since the majority of the effort leading to a structure is often spent troubleshooting. These diagnostic ventures typically include exploration of various expression systems, solubilization detergents, orthologous proteins, purification and buffer conditions, ligands and stabilizing mutations, and crystallization optimization. The current body of IMP structures bears this out with the broad range of methods employed to obtain, purify and crystallize the protein of interest. These broad and inclusive applications also require heroic effort, time and expense. All of the approximately 94 unique α-helical IMP structures (Supplementary Table) currently available have resulted from protracted efforts focused on a specific functional class or family of protein. Examples of this approach are the β2-adrenergic receptor (3–5), or the Kv1.2 potassium channel (6). What characterizes this intensive approach is the progression and modification of methods to a specific target protein of known function. These efforts, though often time consuming and laborious, have been increasingly successful in producing novel IMP structures (Figure 1). Unfortunately, such an intensive approach is inherently untenable in high- or medium-throughput pipelines as streamlining the process is not readily possible. Thus, if one is strictly interested in structural genomics-directed efforts for the elucidation of novel eukaryotic IMP structures then two apparent directions can be pursued: 1) develop novel methods and reagents to streamline the process or 2) use standard methods and reagents in a novel, streamlined way. 
In addressing the latter, we have asked what methods and systems have worked to date in generating novel structures, and how these can be used in the discovery-oriented pipeline of a structural genomics approach to IMPs.
A discovery-oriented pipeline can be constructed by harvesting successful methods from the IMP literature to develop pragmatic criteria that govern the flow of targets. The key distinction is this: a discovery-oriented pipeline constitutes a progression of targets (proteins) through standardized methods, while a systems-oriented approach requires the development of methods and protocols to facilitate the progression of a specific target. In general, there are four typical hurdles to overcome in pursuing eukaryotic IMP structures: (1) producing sufficient quantities of functional protein, (2) finding an appropriate detergent for membrane extraction of the target protein, (3) purification of the target protein, and (4) optimizing initial crystal hits to obtain sufficient resolution data for structure determination. Finding a single formula that suits multiple IMPs is difficult. If one were to implement a structural genomics style pipeline for eukaryotic IMP targets, could compromises be identified that still deliver high output while retaining information on the probability for increased success through additional tuning? We have shown that such compromises exist, and published results serve as an instructive guide.
Yeast is a viable system for homologous or heterologous overexpression of eukaryotic integral membrane proteins. To date, seven of the thirteen heterologously expressed integral membrane protein structures were produced in some form of yeast (4, 6–11). This may not be entirely surprising as “yeast”, especially Pichia pastoris or Saccharomyces cerevisiae, have a number of advantages for the production of eukaryotic IMP targets including: proper membrane targeting and insertion machinery, capabilities for post-translational modifications and a lower activation barrier and cost relative to other systems (e.g. sHEK293S or baculovirus). Furthermore, S. cerevisiae is attractive because of its applicability to high-throughput cloning and expression trials via episomal plasmids (12, 13). An extensive library of non-lethal knockouts in the ‘Yeast Knockout’ deletion collection (14) makes for a well-characterized platform for downstream functional studies (15). The observation that yeast is a suitable system for the overexpression of eukaryotic IMPs is in itself not new (12, 15, 16), yet worth reiterating for the purposes of this discussion. Thus, baker’s yeast is an appropriate system to utilize in a general eukaryotic IMP structure determination pipeline. The S. cerevisiae system is not expected to be optimal for every eukaryotic protein of interest but for generalized and streamlined screens it can produce reasonable indications of IMPs that seem more tractable. In addition, these results can be obtained relatively quickly – moving from initial cloning to expression/solubilization trials in just two weeks.
To date, n-dodecyl-β-D-maltoside (DDM) has been heavily favored in detergent-mediated extraction and purification of heterologously expressed eukaryotic IMP structures, accounting for nine out of the thirteen known (4, 6, 8–10, 17–20). Thus, if only one detergent is used, DDM is an effective choice for broad screens of eukaryotic IMP targets, both because it solubilizes targets and because it does not impede initial crystallization. For pursuing prokaryotic targets an appropriate choice may be n-octyl-β-D-glucoside (OG). DDM is a compromise between shorter chain, and often less effective, detergents like OG and longer chain, and often more effective, zwitterionic detergents such as n-dodecylphosphocholine (FC-12). DDM is sufficiently strong to extract targets from the membrane while not being harsh enough to denature or inactivate them. In addition, DDM has been shown to encompass the larger area of Dumont’s Venn diagram (16), that is, the full range of targets solubilized by other detergents that are more amenable to crystallization, namely DM, NG, and OG. Since detergent exchange can be considered later in the purification stage of the pipeline, DDM can comfortably be used as the sole detergent to initially solubilize targets while refining alternative detergent/lipid mixtures for crystallization. Using DDM further reduces the cost of these broad screens since it has a relatively low critical micelle concentration (CMC ≈ 0.12 mM in 200 mM NaCl), thereby requiring substantially less detergent in solubilization and purification buffers. Thus, using a single detergent for solubilization is both economical in its use of source protein and amenable to the high-throughput extensive phase of our discovery-oriented pipeline.
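The cost argument above can be illustrated with a rough calculation. The sketch below compares the mass of detergent needed per liter of buffer when two detergents are used at the same multiple of their CMC. The DDM CMC is the value quoted in the text; the molar masses and the OG CMC of roughly 20 mM are approximate literature values assumed here for illustration, and the function names are hypothetical.

```python
# Approximate values; the OG CMC and both molar masses are assumptions
# for illustration, not data taken from this review.
DETERGENTS = {
    # name: (molar mass in g/mol, CMC in mM)
    "DDM": (510.6, 0.12),  # CMC as quoted in the text (200 mM NaCl)
    "OG": (292.4, 20.0),   # assumed CMC of roughly 20 mM
}


def fold_over_cmc(name, conc_mm):
    """Return how many times above its CMC a detergent is at conc_mm (mM)."""
    _, cmc_mm = DETERGENTS[name]
    return conc_mm / cmc_mm


def grams_per_liter(name, fold):
    """Return grams of detergent per liter of buffer at `fold` times the CMC."""
    molar_mass, cmc_mm = DETERGENTS[name]
    return molar_mass * cmc_mm * fold / 1000.0


# The screening buffer in the text uses 1 mM DDM.
fold = fold_over_cmc("DDM", 1.0)
for name in DETERGENTS:
    print(f"{name}: {grams_per_liter(name, fold):.2f} g/L at {fold:.1f}x CMC")
```

Under these assumptions, matching the same multiple of the CMC with OG takes nearly a hundred times more detergent by mass than with DDM.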
Implementation of streamlined purification protocols for eukaryotic IMPs is very challenging. Individual targets often require personalized attention to ensure stability and monodispersity within a given buffer condition and detergent. Fortunately, purification methods for IMP targets are largely the same as those for soluble proteins, with addition of detergent and lipids to protein containing micelles. Attendant complexities occur as a result of excess protein-free micelles, multiple aggregation states, instability, or precipitation of target protein.
Cleavable poly-histidine affinity tags greatly facilitate initial purification steps using immobilized metal affinity chromatography. Because of signal peptide processing considerations, carboxyl-terminal tags need to be chosen over amino-terminal ones in a moderately broad screen of IMPs. Enzyme inhibition in the presence of detergents may occur when using TEV protease (21). Two proteases that work well in detergents for cleaving tags are human rhinovirus 3C and thrombin. Evaluating targets based on their yields post-IMAC (immobilized metal affinity chromatography) is a rigorous quantitative method with potentially more utility than expression levels based on GFP fluorescence, or western blot analysis. Size-exclusion chromatography (SEC) is a robust approach to evaluate protein homogeneity. Presence of the IMP target within the included volume of an SEC column following solubilization with a detergent signifies a non-aggregated state. Fully included targets can generally be purified to homogeneity and move on to crystallization trials. Ion exchange chromatography may be leveraged not only for purification but also for detergent exchange or to pre-concentrate protein without excess detergent prior to crystallization. When concentrating purified membrane proteins one must be very conscious of, and strive to eliminate, protein-free detergent micelles within the solution which may concentrate with the protein leading to heavy phase separation during crystallization. This is a major source of negative results at the crystallization stage.
Crystallization is the last hurdle en route to structure by x-ray crystallography. Structural genomics efforts have produced significant insights into this bottleneck through development of novel methods and instrumentation (22). Crystallization and screening in lipid mesophases or various microfluidic techniques are two such examples (23, 24). Developments in instrumentation around precision liquid handling robots have driven down the time, cost and sample volume required to obtain initial crystal hits for purified targets. Within the Center for Structures of Membrane Proteins (CSMP.ucsf.edu), one of the specialized PSI-2 Centers focused on IMPs, hanging drop vapor diffusion screens have produced very positive results in generating initial crystal hits. The vast majority of these hits are derived from readily available commercial screens – some of which are tailored specifically for IMPs and have also resulted from structural genomics efforts (25).
Obtaining initial crystallization hits may often not present a hurdle. Rather, it is the optimization to obtain crystals of sufficient quality for structure determination that is almost always a matter of fine-tuning, especially with the added complexity of micelle size and homogeneity. This phenomenon is a reinforcement of Pareto’s Principle in that the final crystallization conditions which produced a crystal for structure determination often resulted from very expansive efforts. In addition, these efforts are rarely linear in nature. Success is often very target specific (e.g. transporter vs. channel) and inherently not very conducive to high-throughput workflows, though some established pipelines would likely result in measured success when applied to IMP targets (such as the RIKEN SPring-8 (26) or JCSG pipelines (22, 27)). Additionally, pre- or post-crystallization screens may be implemented to increase success rates from final purification to determined structure. These include: 1) ¹H NMR to characterize the folded state of the protein, 2) fluorescence-based thermal melting assays (28) to find conditions that increase protein stability, 3) screening of orthologous proteins, 4) detergent screens, 5) deletion and/or truncation constructs for specific targets, and 6) light scattering (dynamic or static) to assess dispersity in solution.
The Principle of Least Effort postulates that the path of least resistance will always be chosen when a choice is available (29). Naturally, one must ask if such an approach can be applied to vetting eukaryotic integral membrane proteins for downstream structure determination. With this mindset, a discovery-oriented pipeline was constructed to address the hurdles described above. This pipeline is constructed from strict empirical criteria that define the progression of targets, with each step being dependent on the previous one and ultimately necessary for structure determination. These steps include using one expression system (S. cerevisiae), a single detergent for membrane extraction (DDM) and a single buffer for SEC void checks (20 mM TRIS-HCl pH 7.4 at room temperature, 200 mM NaCl, 1 mM DDM and 10% v/v glycerol). The targets that express well, are soluble in DDM and are included in SEC move along at high velocity through the pipeline, while targets that do not meet these criteria are retained at that step (i.e. do not progress). The benefit of the discovery-oriented pipeline is evident when moving into the intensive phase, where sets of workable and responsive targets that have a posteriori cleared most of the major hurdles are all that remain. How much effort does such an extensive phase take in identifying these targets, and can it significantly increase the output to input ratio for the pipeline?
We were able to establish a discovery-oriented pipeline to survey 384, or four sets of 96, rationally selected integral membrane proteins representing every IMP protein family within the S. cerevisiae genome (30). To quickly identify proteins amenable to large-scale purification, functional characterization, and crystallization, targets were cloned into a protease deficient S. cerevisiae strain, W303-Δpep4, and grown in 500 ml cultures. Of the 351 targets cloned, 234 (67%) of these expressed and solubilized in DDM (Table 1). The 61 that expressed in the first cohort of 96 targets (64%) were followed by growths in three-liter cultures and evaluated by post IMAC expression levels and quality of size exclusion characteristics in one buffer condition. This resulted in the identification of 23 (24% of starting number) targets with high expression level (> 0.5 mg protein / L of culture), soluble in DDM, and fully included by SEC. Furthermore, the extensive phase of the pipeline was divided into three general categories – target selection, expression plasmid construction and target prioritization.
For target selection, all predicted gene product sequences of S. cerevisiae (~6600) were fed into the program TMHMM (31), which predicted 621 proteins to have three or more transmembrane helices (TMH). A minimum of three transmembrane helices was specifically chosen in order to omit secreted, monotopic or membrane associated proteins that are merely membrane-anchored or contain signal peptides. Although this minimum eschews important classes of IMPs, such as oligomeric channel-forming transmembrane proteins with fewer than three helices, the signal peptide prediction algorithms are simply not robust enough at the present time to identify two-TMH eukaryotic IMPs (32) with certainty. Parsing through the 621 targets revealed 162 unique Pfam membrane protein families represented in yeast. One target was chosen from each of the identified Pfams and, for those Pfams represented by more than one protein (83), we chose two targets. In addition, 131 proteins could not be annotated with a known Pfam. All of these targets were placed in the pipeline, generating a cohort of 384 protein targets (4 × 96 to facilitate cloning in 96-well format) representing all annotated and unannotated IMP families within S. cerevisiae.
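The selection rule described above can be sketched in a few lines. This is a minimal illustration only, assuming that helix predictions and Pfam annotations are already available as simple records. The record keys, family labels and the `select_targets` function are hypothetical. Picking the first members alphabetically is a placeholder for the alignment-based choice of representative members used in the actual screen.

```python
from collections import defaultdict

MIN_TMH = 3  # minimum number of predicted transmembrane helices (from the text)


def select_targets(proteins, min_tmh=MIN_TMH):
    """Pick one target per family, two if the family has several members.

    `proteins` is a list of dicts with hypothetical keys:
    'id' (str), 'tmh' (int), 'pfam' (str, or None if unannotated).
    Unannotated proteins that pass the helix filter are all kept.
    """
    by_family = defaultdict(list)
    unannotated = []
    for protein in proteins:
        if protein["tmh"] < min_tmh:
            continue  # drop secreted, monotopic or merely anchored proteins
        if protein["pfam"] is None:
            unannotated.append(protein["id"])
        else:
            by_family[protein["pfam"]].append(protein["id"])

    selected = []
    for family in sorted(by_family):
        members = sorted(by_family[family])
        selected.extend(members[:2])  # placeholder choice of representatives
    return selected + sorted(unannotated)


# Made-up example records, not real yeast proteins.
example = [
    {"id": "P1", "tmh": 12, "pfam": "FAM_A"},
    {"id": "P2", "tmh": 12, "pfam": "FAM_A"},
    {"id": "P3", "tmh": 11, "pfam": "FAM_A"},
    {"id": "P4", "tmh": 7, "pfam": "FAM_B"},
    {"id": "P5", "tmh": 1, "pfam": "FAM_C"},
    {"id": "P6", "tmh": 4, "pfam": None},
]
print(select_targets(example))
```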
Ligase independent cloning (LIC) was used to construct expression vectors in a high-throughput format. Genes were inserted into a 2 μ based S. cerevisiae LIC expression plasmid with an N-terminal FLAG tag followed by a 3C protease cleavage site and a C-terminal 10XHis tag preceded by a thrombin protease cleavage site. Meticulous control of protein induction was accomplished by selecting the GAL1 promoter. 351 out of 384 targets were cloned in the initial pass (91% success rate), with a throughput of up to 192 clones per week.
Target prioritization began with the growth of 500 ml culture volumes to test expression and detergent solubility for each of the 351 cloned constructs. Of these, 272 constructs expressed in the system and 234 were soluble in DDM, so 61% of the original 384 targets (67% of those cloned) passed both the expression and solubilization bottlenecks (Table 1). Targets were then prioritized by expression level and detergent solubilization based on qualitative analysis of western blots. Protein stability and integrity were ascertained for the 61 constructs that showed both expression and solubility in the first cohort of 96. These yeast IMPs contained the most diversity of the protein families within S. cerevisiae, being single representatives of the selected Pfam families. Every target was grown in three liters of yeast culture and screened in a single buffer condition: 20 mM TRIS-HCl pH 7.4 (at room temperature), 200 mM NaCl, 1 mM DDM and 10% glycerol. Of the 61 targets, 31 (51%) were found to reside mostly (> 50%) in the SEC included volume, and 23 (38%) were fully included and of high quality for downstream studies (Table 1). This corresponds to 24% retention of the first cohort of 96 targets through the extensive phase of our pipeline, even while applying relatively strict and limited guidelines for target progression (single detergent, single SEC buffer, etc.). This 24% retention rate is on par with the 25% obtained for a set of globular prokaryotic targets in a recent systems-oriented screen (33). For large Pfams with multiple members, such as the Major Facilitator Superfamily, we selected the most representative member of that family based upon multiple sequence alignment profiles (30).
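Because the percentages in this section use different denominators (all selected targets, cloned targets, or the first cohort of 96), it can help to recompute them from the reported counts. The short sketch below does only that arithmetic. The variable names are ours, and the expressed-per-cloned rate is derived here rather than quoted in the text.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100.0 * part / whole)


# Counts reported in the text.
SELECTED = 384         # targets chosen for the screen
CLONED = 351           # constructs obtained in the first cloning pass
EXPRESSED = 272        # constructs that expressed
SOLUBLE = 234          # constructs that expressed and were soluble in DDM
COHORT = 96            # size of the first cohort
COHORT_SOLUBLE = 61    # first-cohort targets expressed and soluble
MOSTLY_INCLUDED = 31   # more than 50% in the SEC included volume
FULLY_INCLUDED = 23    # fully included by SEC

rates = {
    "cloned / selected": pct(CLONED, SELECTED),
    "expressed / cloned": pct(EXPRESSED, CLONED),  # derived, not quoted
    "soluble / cloned": pct(SOLUBLE, CLONED),
    "soluble / selected": pct(SOLUBLE, SELECTED),
    "cohort soluble / cohort": pct(COHORT_SOLUBLE, COHORT),
    "mostly included / cohort soluble": pct(MOSTLY_INCLUDED, COHORT_SOLUBLE),
    "fully included / cohort soluble": pct(FULLY_INCLUDED, COHORT_SOLUBLE),
    "fully included / cohort": pct(FULLY_INCLUDED, COHORT),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```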
Pursuing the structure of integral membrane proteins has, rather aptly, been referred to as “siege warfare” (34). This depiction highlights the arduous nature of membrane protein structure determination that, by many indications, is entering a Renaissance period of accelerated growth. The number of research groups pursuing IMP structures, deployment of novel methodologies and equipment, and ramping up of structural genomics efforts will all have a significant effect on the trajectory of this growth. The resulting biological insights from such efforts will likely be significant considering the current paucity of structures and the importance of membrane proteins within biological systems. Laborious target-specific efforts will continue to pay dividends in producing structures. The outstanding question is how, and to what extent, structural genomics efforts will contribute to the increased rate at which IMP structures are determined. The current mini-review focuses on the implementation of a streamlined pipeline designed to identify well-behaved targets early on to facilitate increased returns during crystallization and structure determination.
Four hurdles (expression, solubilization, purification and crystallization) have been addressed, each in a single ‘most probable’ fashion. The resulting discovery-oriented pipeline is a stripped-down way of trying to best circumvent the expression and solubilization hurdles leading up to crystallization. In selecting a single expression system and screening a large number of targets one can quickly select eukaryotic IMPs that express above a predefined threshold (e.g. 0.5 mg protein / L of culture). Solubility screens using DDM for each of the expressed IMPs allow for rapid identification of targets for subsequent scale-up. If required, downstream screens with shorter chain detergents can then be performed on targets that are solubilized by DDM because of its relatively high CMC, or longer chain detergents can be tested for those that are not extracted from the membrane with DDM. Determining if DDM soluble targets are within the included volume on SEC using standardized protocols allows for rapid and efficient initial characterization of each target. Comparing the expression, solubilization and initial SEC profiles for each target provides information for developing a target priority list for a subsequent production phase of the pipeline.
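The sequential criteria summarized above lend themselves to a simple triage function. The following is a sketch under stated assumptions, not the procedure used in the screen: the `Target` record, the function names and the 0.95 cutoff for "fully included" are hypothetical, while the 0.5 mg/L threshold and the "mostly included" (> 50%) category come from the text.

```python
from dataclasses import dataclass

MIN_YIELD_MG_PER_L = 0.5  # expression threshold quoted in the text


@dataclass
class Target:
    name: str
    yield_mg_per_l: float         # post-IMAC yield per liter of culture
    soluble_in_ddm: bool          # extracted from the membrane by DDM
    sec_included_fraction: float  # fraction (0 to 1) in the SEC included volume


def triage(target, fully_included=0.95):
    """Apply the criteria in order; 0.95 is an assumed cutoff for 'fully'."""
    if target.yield_mg_per_l <= MIN_YIELD_MG_PER_L:
        return "retain: low expression"
    if not target.soluble_in_ddm:
        return "retain: not extracted by DDM"
    if target.sec_included_fraction >= fully_included:
        return "advance"
    if target.sec_included_fraction > 0.5:
        return "retain: mostly included"
    return "retain: aggregated"


def priority_list(targets):
    """Return names of advancing targets, highest yield first."""
    advancing = [t for t in targets if triage(t) == "advance"]
    return [t.name for t in sorted(advancing, key=lambda t: -t.yield_mg_per_l)]


# Made-up example values for illustration only.
targets = [
    Target("A", 1.2, True, 0.98),
    Target("B", 0.3, True, 0.99),
    Target("C", 2.0, False, 0.0),
    Target("D", 0.9, True, 0.70),
    Target("E", 3.1, True, 1.00),
]
for t in targets:
    print(t.name, triage(t))
print(priority_list(targets))
```

Targets labeled "retain" are the natural input for a later pass with a different detergent or buffer.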
For the yeast screen this resulted in a list of 23 proteins of 61 attempted that expressed to appreciable levels, were extracted from the membrane in DDM and fully included on a SEC column (30). Each of these proteins has now been pushed into a more intensive production phase oriented towards detailed characterization. Four of these targets have now entered crystallization trials; two of them have been shown to crystallize and one has diffracted below 5 Å resolution. Although this discovery-oriented approach has yet to be validated through the emergence of novel eukaryotic IMP structures, we feel the initial results are very promising. This approach comes at a cost of not being attached to specific protein identities, or functional class, until later in the pipeline. Also, attrition rates will be high initially as targets are vetted with the selected expression system, solubilization detergent and SEC buffer. Targets that are retained (i.e. fail to progress) within the first pass can then be pushed through a salvage pipeline using a new detergent or buffer condition for screening.
Is target selection needed for such a relatively simplistic screen? If using high-throughput techniques, including ligase independent cloning, combined with small scale expression and solubilization tests the answer may very well be no. Upwards of 200 targets can be pushed from initial cloning from genomic DNA stock to analysis of expression/solubilization data within three weeks using S. cerevisiae. At this rate entire integral membrane proteomes can be screened to rapidly identify the more tractable IMPs. This does not imply that the ones that fall ‘below the bar’ could not be expressed with other methods or expression systems that would be standard for the more traditional approach to a particular membrane protein class.
For large membrane proteomes the amount of work during the void check and initial purification stage may be significantly reduced by reducing the number of targets initially screened. In this case target selection may provide some significant benefits. With biomedically relevant starting sets (e.g. Table 2) even a modest yield of 10% for each organism would be incredibly insightful. For homologous overexpression within yeast the return is 24% of targets being soluble in DDM and fully included in SEC (30). Significant attrition could be expected at each stage yet those results would feed back into the pipeline allowing for changes in detergent selection (e.g. FC-12 vs. DDM) and screening buffers more broadly post-IMAC. For larger membrane proteomes, such as human, a ratiocinative selection that minimizes the number of targets while maximizing coverage of represented protein families would be beneficial.
Novel IMP structures derived from such extensive screens are likely to yield significant biological insights considering the current paucity of available IMP structures. Indeed, the Protein Data Bank, as of November 2007, contains only 94 unique α-helical IMP structures containing three or more transmembrane helices. Uniqueness in this case is defined using a threshold of greater than 95% sequence identity, which removes point mutants and other slight sequence modifications. These 94 structures represent only 37 protein families, of the 598 identified with three or more transmembrane helices, with high resolution structural data available (or 6%) (35). Using the Homo sapiens, Saccharomyces cerevisiae, and E. coli genomes as representative organisms, we analyzed the number of Pfam IMP families by organism to assess the structural coverage in each (Figure 1). Of the 37 families, 21 are represented in the human genome, 25 in E. coli and 19 within yeast. Of these, 17 Pfams are in human and yeast, 16 in human and E. coli, 14 in yeast and E. coli and 14 are represented in all three organisms. Five Pfams with high resolution structures are not represented in any of the three organisms. Eight families are found only in E. coli, including the drug resistance-associated Acr transporter family (PF00873) and the disulfide bond formation protein family DsbB (PF02600). The two families only represented in human are the sodium:neurotransmitter symporter family (PF00209) and the inward rectifier potassium channels (PF01007). Finally, one family was only represented in yeast: the bacteriorhodopsin family that includes the fungal rhodopsin homologs (PF01036). Only six of the 184 eukaryote-only Pfams within this analysis have associated structures. The majority of structures within the 85 Pfams present within both E. coli and either eukaryote (human or yeast) are derived from prokaryotic homologs. This highlights the observation that eukaryotic IMPs are more experimentally difficult to pursue.
Thus, considering that 94% of the IMP protein families (three TMH or more) do not have a representative structure and that, when one is available, it is often derived from a prokaryotic source, any avenue to obtain eukaryotic structures more efficiently could be deemed beneficial.
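Two of the coverage figures quoted above can be re-derived from the reported family counts. The sketch below computes the fraction of families with a structure, then uses inclusion-exclusion over the three organisms to count the structurally characterized families found in none of them. The variable names are ours; only the counts come from the text.

```python
# Family counts reported in the text.
FAMILIES_WITH_STRUCTURE = 37
FAMILIES_TOTAL = 598

human, ecoli, yeast = 21, 25, 19
human_yeast, human_ecoli, yeast_ecoli = 17, 16, 14
all_three = 14

# Share of families with at least one high resolution structure.
coverage_pct = round(100.0 * FAMILIES_WITH_STRUCTURE / FAMILIES_TOTAL)
uncovered_pct = 100 - coverage_pct

# Inclusion-exclusion: families present in at least one of the three genomes.
in_any = (human + ecoli + yeast
          - (human_yeast + human_ecoli + yeast_ecoli)
          + all_three)
in_none = FAMILIES_WITH_STRUCTURE - in_any

print(f"coverage: {coverage_pct}%, without structure: {uncovered_pct}%")
print(f"families in at least one organism: {in_any}, in none: {in_none}")
```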
This review describes, in part, a broad screen which was published previously in The Journal of Molecular Biology (30) – we are grateful to all co-authors involved in bringing that project to fruition. This work was supported by the N.I.H. Roadmap Center grant P50 GM073210 (to R.M.S.), Specialized Center for the Protein Structure Initiative grant U54 GM074929 (to R.M.S. and A.S), National Research Service Award F32 GM078754 (to F.A.H.) and a Sandler Biomedical Research postdoctoral fellowship (to F.A.H.).