Protein production and purification
1Karolinska Institutet, Schéeles väg 2, 171 77 Stockholm, Sweden.
2University of Oxford, Old Road Campus, Roosevelt Drive, Headington, Oxford OX3 7DQ, UK.
3University of Toronto, 100 College St., Toronto, Ontario M5G 1L6, Canada.
4Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique, Case 932, 163 Avenue de Luminy, 13288 Marseille Cedex 09, France.
5Lawrence Berkeley National Laboratory and Department of Chemistry, University of California, 351A Donner Laboratory, Berkeley, California 94720, USA.
6Tsinghua University, Beijing 100084, China.
7University of Science and Technology of China, Hefei 230027, China.
8Los Alamos National Laboratory, Mailstop M888, Los Alamos, New Mexico 87507, USA.
9Weizmann Institute of Science, 2 Herzel Street, Rehovot 76100, Israel.
10The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, California 92037, USA.
11Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, California, 92121, USA.
12Biosciences Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439, USA.
13SGX Pharmaceuticals, Inc., 10505 Roselle Street, San Diego, California 92121, USA.
14Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, USA.
15Biology Department, 463, Brookhaven National Laboratory, Upton, New York 11973, USA.
16Case Western Reserve University, 10900 Euclid Ave., Cleveland, Ohio 44016, USA.
17Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, University of California, San Francisco, 600 16th Street, San Francisco, California 94143, USA.
18Center for Advanced Biotechnology and Medicine, Rutgers University, 679 Hoes Lane, Piscataway, New Jersey 08854, USA.
19Department of Biological Sciences, Columbia University, 701 Fairchild Building, MC 2451, New York, New York 10027, USA.
20Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX37BN, UK.
21Max Delbrück Center for Molecular Medicine (MDC), Robert-Rössle-Str. 10, 13092 Berlin, Germany.
22Protein Research Group, Genomic Sciences Center, Yokohama Institute, RIKEN, 1-7-22 Suehiro-cho, Tsurumi, Yokohama 230-0045, Japan.
23Division of Structural Biology, Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX37BN, UK.
25A complete list of authors appears at the end of this paper.
Recombinant proteins are used throughout biological and biomedical science. Their production was once the domain of experts, but the development of simple, commercially available systems has made the technology more widespread. As a result, appreciation of the difficult strategic choices inherent to the process has also become more widespread. Commonly confronted questions include: should the protein(s) be expressed in bacteria, in yeast, in insect cells or in human cells? Which expression vector should be used? If bacterial expression is used, which strain(s) should be chosen? Should one express the full-length protein or a fragment thereof? Should the protein be tagged, and which affinity tag is the best? What is a good purification strategy, and what are the common pitfalls? Unfortunately, because every protein is different, there can be no ‘right’ answer to any of these questions a priori, and purification protocols and strategies must be worked out for each individual protein and with an eye to its intended use. That said, each project must begin somewhere, and purification strategies can now be guided by evidence-based trends, probabilities and cautionary notes that have emerged from large-scale structural genomics studies. In this review, which is targeted to the researcher with limited experience in protein expression and purification, we draw on our collective experiences to suggest a ‘consensus’ starting point for soluble protein expression and purification.
Over the past decade, our laboratories have collectively targeted and purified tens of thousands of different proteins from the Eubacteria and Archaea, and thousands from the Eukarya, including fungal, nematode, parasite, plant and human proteins. These proteins belong to many different classes, including proteins with no predictable structure, human proteins of therapeutic relevance, proteins from parasites and viruses, integral membrane proteins and multiprotein complexes. A near-complete list of these proteins is available in a database (TargetDB) maintained by the Protein Data Bank (PDB; http://targetdb.pdb.org/) under the auspices of the US National Institute of General Medical Sciences (NIGMS)-funded Protein Structure Initiative (http://www.nigms.nih.gov/Initiatives/PSI/). The European research network Structural Proteomics in Europe (SPINE) also provides detailed target lists online (http://www.spineurope.org/).
Overview of targeted proteins
In efforts to identify optimal approaches for the initial production and purification of a ‘typical’ protein, our groups have explored many different technologies and strategies. Our common objective has been to balance success rates with ease and breadth of use, speed, cost and versatility (refs. 1–16). A comparison of our independently optimized approaches shows that our preferred methods have, in many instances, evolved to be quite similar, but by no means identical. Accordingly, in an effort to provide guidance to scientists interested in generating purified recombinant proteins, representatives from our research groups collaborated to articulate our ‘consensus’ advice (Box 1), along with a brief rationale for each choice. In essence, we tried to answer the question “What would you try first?”, understanding that several choices are often possible or even desirable. We also provide guidance for those cases in which the initial attempt fails or problems are encountered; in other words, “What next?”. In Supplementary Methods online, we provide links to online protocols offered by several structural genomics groups as well as detailed experimental protocols for the methods described here.
Summary of approaches used by SG centers
BOX 1 SUMMARY OF CONSENSUS PROTOCOL
- Obtain the cDNA by amplifying either genomic DNA (prokaryotic genes, or eukaryotic genes with no introns) or full-length, sequence-verified cDNAs (eukaryotes) or by total gene synthesis.
- Use ligation-independent cloning (LIC) to clone the full-length cDNA (or the fragment of interest) into an E. coli expression vector.
- Use T7 RNA polymerase–driven expression and an N-terminal oligohistidine tag (include a cleavage site for a sequence-specific protease to enable removal of the tag).
- Express the protein in a derivative of the E. coli BL21(DE3) strain, with induction at low temperature (15–25 °C) in rich medium and with good aeration. If expressing proteins from organisms that have codon biases differing from those used by E. coli, use a strain supplemented with the appropriate tRNA genes.
- Solubilize and purify the protein in a well-buffered solution containing an ionic strength equivalent to 300–500 mM of a monovalent salt, such as NaCl.
- Use immobilized metal affinity chromatography (IMAC) as the initial purification step.
- If additional purification is required, use size-exclusion chromatography (gel filtration). If necessary, use ion exchange chromatography as a final ‘polishing’ step.
- The affinity tag may be removed to minimize non-native sequences in the recombinant protein and to achieve further purification. Use a recombinant, hexahistidine-tagged protease and reapply the sample to the IMAC column to remove the protease and any cellular proteins that bound to the metal affinity resin.
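As a rough illustration of the ionic-strength recommendation in Box 1, the following Python sketch computes ionic strength from the standard definition I = ½ Σ c_i z_i² and the mass of NaCl needed for a given concentration. It is a minimal sketch, not part of the consensus protocol: the function names are hypothetical, the buffer species themselves are ignored, and ideal behavior (complete dissociation) is assumed.

```python
# Illustrative sketch only; function names are hypothetical and the
# contribution of the buffering species to ionic strength is ignored.

# Molar mass of NaCl in g/mol (Na 22.99 + Cl 35.45).
NACL_MOLAR_MASS = 58.44


def ionic_strength_mM(ions):
    """Return ionic strength I = 1/2 * sum(c_i * z_i**2), in mM.

    `ions` is an iterable of (concentration_mM, charge) pairs,
    assuming complete dissociation of each salt.
    """
    return 0.5 * sum(conc * charge ** 2 for conc, charge in ions)


def nacl_grams(target_mM, volume_L):
    """Return the mass of NaCl (g) giving `target_mM` in `volume_L` litres."""
    return target_mM / 1000.0 * volume_L * NACL_MOLAR_MASS


if __name__ == "__main__":
    # For a 1:1 salt such as NaCl, ionic strength equals the molar concentration.
    for conc in (300, 500):
        ions = [(conc, +1), (conc, -1)]
        print(conc, "mM NaCl -> I =", ionic_strength_mM(ions), "mM;",
              round(nacl_grams(conc, 1.0), 2), "g per litre")

    # A 2:1 salt contributes more per mole: 100 mM MgCl2 gives I = 300 mM.
    print("100 mM MgCl2 -> I =", ionic_strength_mM([(100, +2), (200, -1)]), "mM")
```

Under these assumptions, the 300–500 mM range corresponds to roughly 17.5–29.2 g of NaCl per litre of buffer. The MgCl2 line shows why the recommendation is phrased in terms of ionic strength rather than salt concentration: multivalent ions reach the same ionic strength at lower molarity.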
It is important to emphasize three aspects of this review. First, it is meant to serve as a guide to those members of the research community who are interested in expressing recombinant proteins, but who feel that they may not have the breadth of experience to decide among the various possible approaches. Second, we selected this consensus strategy because it is simple and has the widest use. There are other methods that are perhaps equivalent, but space limitations preclude an in-depth discussion of all possible cloning, expression and purification strategies. Third, the methods described here were developed with the intention to produce purified, soluble protein in close-to-milligram quantities; there are many applications for purified protein (biochemical assays, antibody production) that may not have such requirements.
There are two important provisos to the methods and strategies described in this review. First, our experience is dominated by studies with nonmembrane cytosolic proteins and/or fragments of proteins that comprise soluble domains. Second, although the protocols for the ‘first attempt’ described here have proven to be optimal for the broadest range of proteins, in any individual case, the methods will fail more often than they succeed.