The Association of Biomolecular Resource Facilities 2003 Edman Sequencing Research Group (ABRF-ESRG’03) sample is the 15th in a series of studies designed to allow participating members to evaluate their ability to analyze the N-terminus of a protein or peptide using automated Edman degradation chemistry. It is a follow-up to the ESRG’02 sample, which was a single protein with a heterogeneous N-terminus. Both the 2002 and 2003 samples were obtained from the same protein complex and were resolved by SDS-PAGE followed by electrophoretic transfer to PVDF membrane. The ABRF-ESRG’03 sample had an apparent molecular weight of 49 kDa and a single N-terminus, with initial yields of approximately 2 pmol. Participants were asked to sequence 25 residues and return their results to the ESRG for analysis, along with two completed surveys and an area/pmol table for repetitive and initial yield calculations. Data from 46 responses are presented, including initial yields, repetitive yields, sequencer performance, and the ability to identify the protein.
The Association of Biomolecular Resource Facilities 2003 Edman Sequencing Research Group (ABRF-ESRG’03) sample was the 15th distributed by an ABRF research group to assess participants’ ability to analyze the N-terminus of a protein using Edman degradation chemistry. Previous samples evaluated proteins1–6 or peptides7–13 or were a data-interpretation exercise.14 These samples were designed specifically to evaluate homogeneity,3–5,7–8 heterogeneity,1–2,6,9,12–14 post-translational modifications,10–11 and sensitivity.12–13
This study is a follow-up to the ESRG’02 sample, which evaluated members’ ability to characterize a heterogeneous, frayed N-terminus.6 This year’s sample was isolated from the same protein complex as the 2002 sample; both samples were purified together by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and electrophoretically transferred to polyvinylidene fluoride (PVDF) membrane. ESRG’03 was designed with several objectives in mind: (1) to determine initial and repetitive yields from a sequence that contained several repeated amino acids; (2) to compare results from a homogeneous N-terminus with those obtained from the heterogeneous N-terminus used in the ESRG’02 study; (3) to determine the length of read for a sample with an average initial yield of 2 pmol; and (4) to determine the ability to identify a known protein through BLAST or other search algorithms. This protein offered participants the opportunity to analyze a less challenging sample than that used in the ESRG’02 study and to compare their facility’s repetitive and initial yields with those of other members.
The sample was prepared at the same time as the ESRG’02 sample. The protein complex was supplied as several milliliters of solution donated by Dr. Robert Macnab (Yale University, New Haven, CT) and Dr. Tohru Minamino (Protonic Nanomachine ERATO, Kyoto, Japan). The sample was mixed with an equal volume of 2X Laemmli sample buffer plus β-mercaptoethanol and boiled for 3 min. The protein complex was then run on ten 15-lane precast 10–20% Tris-glycine 1.0-mm gels (Novex, Toronto, ON, Canada). After electrophoresis, the proteins were electrophoretically transferred to PVDF (Novex, 0.22 μm) at 250 mA for 2 h in a full-immersion tank system. The PVDF membranes were then stained with 0.05% Coomassie blue G-250 in 50% methanol/10% acetic acid for 5 min, destained with 40% methanol/1% acetic acid, and washed with water overnight. Selected bands were excised for the ESRG’02 study, and the remaining PVDF membranes were stored dry between sheets of Whatman filter paper at room temperature for 1 year. After that time, bands were excised for the ESRG’03 study and placed into 0.5-mL microcentrifuge tubes. The selected protein was originally His-tagged. Before donating the sample to the ESRG, the investigators had removed the tag to facilitate an experiment, but the removal was incomplete, leaving an additional His residue at the amino terminus of the protein.
Three bands were submitted to one member’s laboratory for amino acid analysis, which detected 9.4, 13.4, and 18.2 pmol of protein, for an average of 13.8 pmol. This is about one third of the amount of sample distributed in the ESRG’02 study.
The ESRG analyzed a total of 14 bands (an average of 2 per lab) before distributing the sample to the membership. All runs were performed on Applied Biosystems Procise HT or Procise cLC sequencers. Although results varied from lab to lab, the runs within each lab were very consistent. The ESRG runs had an average initial yield of 2.23 pmol for Met in position 2 (MET2), a 93.9% repetitive yield (Leu residues), and 17 assigned amino acids. Only two ESRG runs, both done in the same lab, identified Trp. Based on the ESRG members’ average initial yield and the amino acid analysis data, the distributed samples were expected to have an average sequencing yield of 16% (1.5–2.9 pmol).
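The expected-yield range quoted above can be reproduced with simple arithmetic. The sketch below assumes (the report does not spell this out) that sequencing efficiency is the ESRG average initial yield divided by the average amino acid analysis (AAA) result, rounded to 16%, and that this efficiency is then applied to the lowest and highest AAA values. The function name is illustrative only.

```python
def expected_yield_pmol(aaa_pmol: float, efficiency: float) -> float:
    """Expected Edman yield (pmol) for a band, given its AAA amount and sequencing efficiency."""
    return aaa_pmol * efficiency

esrg_iy_pmol = 2.23                 # ESRG average initial yield for MET2
aaa_bands_pmol = [9.4, 13.4, 18.2]  # AAA results for the three bands
aaa_mean_pmol = 13.8                # average AAA value as reported

# Assumed definition: efficiency = initial yield / amount loaded (from AAA)
efficiency = esrg_iy_pmol / aaa_mean_pmol
rounded = round(efficiency, 2)      # 0.16, i.e. 16%

low = expected_yield_pmol(min(aaa_bands_pmol), rounded)
high = expected_yield_pmol(max(aaa_bands_pmol), rounded)
print(f"efficiency ~ {efficiency:.0%}; expected range {low:.1f}-{high:.1f} pmol")
# prints: efficiency ~ 16%; expected range 1.5-2.9 pmol
```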
Postcard and electronic notifications were sent to all ABRF members to solicit participants for the 2003 study. Protein bands were distributed to 65 members along with a sample survey, a general lab survey, a data reporting sheet with instructions, and a cover letter. Participants were asked to return the data reporting sheet with their findings, the completed surveys, a table of the raw pmol/cycle yields of all amino acids (available on most commercial Edman sequencers), and an example of how they would report the data to a customer. All reports and surveys were sent to a third party, who removed identifying marks and forwarded the results to the ESRG for analysis.
The ESRG distributed samples to 65 members who wished to participate in the study, and 46 returned results. This represents a 50% increase over the previous year’s 31 responses from 72 samples distributed. A summary of the sequence assignments for each facility that reported data is shown in Table 1. Data are reported for the first 25 amino acids. Note that 8 facilities assigned more than 25 amino acids; these data are reported in the footnote to Table 1.
Figure 1 displays the data in Table 1 as the accuracy of assignments per cycle. MET2, THR3, THR4, ARG5, LEU6, THR7, and LEU10 all had 39 or more positive correct (PC) assignments. This shows that, barring difficult amino acids, most participants (85%) can expect to definitively and accurately assign the first 10 amino acids of Edman sequence from a PVDF-bound protein with an approximate initial yield of 2 pmol. The PC assignments for HIS1 and ARG8 were approximately 25% lower, possibly due to poor extraction from the PVDF, background amino acid interference in cycle 1, the absence of HIS1 in the database sequences, and/or the low abundance of TRP9 causing more uncertainty with ARG8. TRP9, as expected, was very difficult to assign, with only 8 PC assignments and a positive accuracy of only 57%, which is consistent with previous studies. Overall, with the exception of TRP9, there were very few positive wrong (PW) assignments in the first 10 residues, for a positive accuracy of 97%. Residues 11–16 each had PC values of at least 32 with very few PW assignments. Subsequent cycles showed a steady decline in the number of positive assignments, but no general increase in the number of PW assignments. MET20 is the exception, with 3 PW (versus 26 PC) assignments, making it the third most incorrectly assigned amino acid after TRP9 and HIS1. The reason for this difficulty with Met, which is normally not a problem to identify, is unclear. Interestingly, there was no increase in tentative assignments after LEU10 until positions 22–25, which showed a slight, consistent increase in tentative calls.
The accuracy for amino acids called in this study was exceptional, with a positive accuracy of 96.9% and a total accuracy (combination of positive and tentative calls) of 93.9%. Possible reasons for the high accuracy include sample homogeneity, the presence of the protein’s sequence in available databases, or simply the nature of the sequence itself.
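Positive and total accuracy are used throughout this report. The sketch below assumes the usual definitions: positive accuracy as positive correct calls over all positive calls, PC/(PC+PW), and total accuracy as all correct calls (positive plus tentative) over all calls made. The TRP9 positive-wrong count of 6 is back-calculated from the reported 57% and is not a reported figure.

```python
def positive_accuracy(pc: int, pw: int) -> float:
    """Positive correct calls as a fraction of all positive calls: PC / (PC + PW)."""
    return pc / (pc + pw)

def total_accuracy(pc: int, pw: int, tc: int, tw: int) -> float:
    """Correct calls (positive + tentative) as a fraction of all calls made."""
    return (pc + tc) / (pc + pw + tc + tw)

met20 = positive_accuracy(pc=26, pw=3)  # reported counts for MET20
trp9 = positive_accuracy(pc=8, pw=6)    # PW=6 inferred from the reported 57%
print(f"MET20 {met20:.1%}, TRP9 {trp9:.0%}")
# prints: MET20 89.7%, TRP9 57%
```

Under these definitions, total accuracy always lies between the positive and the tentative accuracy, so a total accuracy (93.9%) below the positive accuracy (96.9%) implies that tentative calls were less reliable than positive ones.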
A summary of results from the ESRG’03 sample compared with previous ABRF Edman studies performed on proteins is shown in Table 2. With 46 responses, this study is similar to the 1999 study, which had 45, indicating that the number of laboratories participating in Edman sequencing studies has remained stable over the last 4 years. (The lower response in the 2002 study is likely an aberration due to the complexity of the sample.) Over the longer term, however, the number of laboratories performing Edman sequencing appears to be decreasing: the number responding to this year’s study is about 50% of the number in 1994 and 1995. This is most likely due to the increased use of mass spectrometry (MS) for routine protein identification. Still, there appears to be a consistent need for Edman sequencing.
As mentioned above, the positive accuracy for ESRG’03 was 96.9%, which is similar to the 98.7% seen in ABRF-SEQ’992 and the 95.5% in ABRF-SEQ’94.5 All three of these samples were proteins that could be identified using a standard BLAST or FASTA search. The 2003 and 1994 samples each had a homogeneous N-terminus. The 1999 sample contained two components: the major component (10 pmol) was sequenced with a positive accuracy of 98.7%, and the minor component was a synthetic peptide present at 5 pmol. The positive accuracies for the 2002 study (76.3%) and the 1995 study (78%) were much lower. The 1995 sample was a mutated protein with microheterogeneity at four residues that was not present in any database at the time of the study,1 while the 2002 sample possessed a heterogeneous N-terminus that made identification difficult.6 Overall, these results suggest a positive correlation between the ability to identify a protein through a BLAST/FASTA search and high positive accuracy in sequencing.
Two other points are worth mentioning regarding these five protein studies. First, excluding the difficult 2002 sample, respondents reported an average of 19 correct assignments across the other four studies. Second, tentative accuracy has improved in the two most recent studies (77.1% and 72.6%) compared with the three earlier studies (58.1%, 45%, and 55%). This improvement may reflect a more conservative approach to data assignment.
ESRG’02 and ESRG’03 were bands of different molecular weight from the same protein complex, prepared at the same time and excised from the same blot. This provided an excellent opportunity to compare the membership’s ability to make assignments from a homogeneous and a heterogeneous sample. ESRG’02 participants reported an average of 27 assignments per response. Because that sample had three distinct, overlapping N-terminal sequences, this translated into an average of 9 assignments per N-terminus. In contrast, participants in the 2003 study reported an average of 20 assignments per response, roughly twice as much contiguous sequence per respondent. There were also more incorrect assignments for the 2002 sample (6.7/response) than for the 2003 sample (1.2/response), which is reflected in the higher positive accuracy of the 2003 study. In addition, only one third of the 2002 respondents identified the protein after a database search, compared with two thirds of the 2003 respondents (see the section below). Although these data suggest that it is much easier to interpret results from a homogeneous sample than from a heterogeneous one, it should be noted that the 2003 sample contained fewer difficult-to-sequence amino acids: the 2002 sample contained 2 Trp and 5 Pro in the first 20 residues, compared with only 1 Trp and 1 Pro in the first 25 residues of the 2003 sample. Overall, a comparison of the two studies confirms the conclusion of the 2002 study: it is much more difficult to assign and interpret amino acid sequences when there are multiple amino acids per cycle than when there is a single dominant amino acid per cycle.
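The per-terminus comparison above is simple division. The sketch assumes, as the figure of 9 implies, that the 2002 assignments were split evenly among the three N-terminal sequences; the function name is illustrative.

```python
def per_terminus(assignments_per_response: float, n_termini: int) -> float:
    """Average assignments per N-terminal sequence, assuming an even split."""
    return assignments_per_response / n_termini

esrg02 = per_terminus(27, 3)  # ESRG'02: three overlapping N-terminal sequences
esrg03 = per_terminus(20, 1)  # ESRG'03: single N-terminus
print(esrg02, esrg03, round(esrg03 / esrg02, 1))
# prints: 9.0 20.0 2.2
```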
A summary of the repetitive (RY) and initial (IY) yields for every participant in the ESRG’03 study is shown in Table 3. Respondents were asked to submit a table of the pmol of each amino acid obtained per cycle, which most commercial Edman sequencing software generates. Participation in this part of the study was very high: 39 of the 46 participants submitted the requested table, although 2 of them reported only raw data and were therefore excluded from the RY calculations. MET2 was used for IY calculations because there is typically very little background Met in cycle 1 and because there were 40 PC assignments of MET2. Initial yields were determined for 36 facilities; one had an unusually high IY of 29.9 pmol. Because the HIS1 yield for that submitter was 3.1 pmol, the ESRG concluded that the 29.9-pmol IY reflected a calibration error and excluded it from the IY average. The initial yields ranged from 0.24 to 3.91 pmol, for an average of 1.96 pmol, which agrees well with the 2.23-pmol average of the ESRG committee members’ test runs. Notably, facility #26 had the lowest IY for MET2 (0.24 pmol), yet identified 23 of the first 25 amino acids with no incorrect assignments, and did so on a Procise-HT rather than the more sensitive Procise cLC. The LEU6 yield from this facility was 0.6 pmol, indicating that its low Met yield was probably not the result of a calibration error.
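The ESRG’s reasoning about the 29.9-pmol outlier and facility #26 amounts to cross-checking one residue’s yield against another residue from the same run. The sketch below automates that idea; the 3-fold threshold and the function name are hypothetical, not an ESRG criterion, and because the compared residues come from different cycles the check is deliberately coarse.

```python
def inconsistent_yields(test_pmol: float, reference_pmol: float, max_fold: float = 3.0) -> bool:
    """Flag a run when two residue yields differ by more than max_fold in either direction.

    max_fold is a hypothetical threshold chosen for illustration only.
    """
    if test_pmol <= 0 or reference_pmol <= 0:
        raise ValueError("yields must be positive")
    ratio = test_pmol / reference_pmol
    return max(ratio, 1 / ratio) > max_fold

print(inconsistent_yields(29.9, 3.1))  # outlier submitter, MET2 vs HIS1: True
print(inconsistent_yields(0.24, 0.6))  # facility #26, MET2 vs LEU6: False
```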
RY calculations were made from Met (residues 2 and 20), Leu (residues 6, 10, 13, and 22), and Ala (residues 12, 18, 21, and 25). The overall average RY for the three amino acids combined was 90.3%. Although Thr was abundant (residues 3, 4, 7, and 11), the ESRG considered it unreliable for RY calculations because it is partially destroyed by the Edman chemistry. RYs were calculated using only the residues each respondent had assigned. Met provided 22 data points for an average of 91.43%, Leu provided 34 data points for an average of 89.79%, and Ala provided 26 data points for an average of 91.30%. Met probably provided the fewest data points because MET20 lies so late in the sequence. Leu provided the most data points because it appears earlier in the sequence; its lower average RY probably reflects the inclusion of data from poorer runs that could not get as far into the sequence. Interestingly, the average RY for Ala (91.30%) was nearly as high as that for Met (91.43%), and all of the Ala residues occur in the latter half of the sequence.
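The report does not state how RY was computed. A common approach, sketched below under that assumption, models the yield of a residue at cycle n as A·RY^n, so that ln(yield) is linear in n and RY is the exponential of the least-squares slope; with only two residues (for example, MET2 and MET20) this reduces to RY = (Y20/Y2)^(1/18). The yields in the example are hypothetical values generated from a chosen RY, not participant data.

```python
import math

def repetitive_yield(cycles, yields_pmol):
    """Estimate RY as exp(slope) of a least-squares line through ln(yield) vs cycle.

    Assumes yield(n) = A * RY**n, so ln(yield) is linear in the cycle number n.
    """
    if len(cycles) != len(yields_pmol) or len(cycles) < 2:
        raise ValueError("need at least two (cycle, yield) pairs")
    logs = [math.log(y) for y in yields_pmol]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    if sxx == 0:
        raise ValueError("cycles must not all be identical")
    sxy = sum((x - mean_x) * (ly - mean_y) for x, ly in zip(cycles, logs))
    return math.exp(sxy / sxx)

# Hypothetical Leu yields at residues 6, 10, 13, 22, generated with RY = 0.90
leu_cycles = [6, 10, 13, 22]
leu_yields = [1.8 * 0.90 ** (c - 6) for c in leu_cycles]
print(f"{repetitive_yield(leu_cycles, leu_yields):.3f}")  # 0.900

# Two-point case, e.g. MET2 and MET20 (hypothetical yields, RY = 0.91)
print(f"{repetitive_yield([2, 20], [2.0, 2.0 * 0.91 ** 18]):.3f}")  # 0.910
```

Fitting all assigned residues of one amino acid, rather than only the first and last, lets a single poorly quantified cycle pull the estimate less.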
Table 4 summarizes the performance of the various instruments used in the ESRG’03 study. The most common instrument was the ABI Procise-HT (30 laboratories), followed by the ABI Procise-cLC (8), the HP G-1005A (4), the ABI 477 (3), and one Shimadzu PPSQ-23A. Laboratories using the G-1005A and the Procise-HT achieved the highest positive accuracies, 99.1% and 98.6%, respectively. Respondents using the Procise-cLC reported a positive accuracy of 94.7%, and those using the older 477 and the Shimadzu instrument reported the lowest, at 80% and 50%, respectively. The G-1005A produced the best RY at 93.1%, compared with 90.8% for the Procise HT and 88.1% for the Procise cLC. The higher accuracy and RY obtained with the Procise-HT compared with the more sensitive Procise-cLC suggest that higher sensitivity in an Edman sequencer does not necessarily translate into better sequencing. On average, the laboratories that used the G-1005A correctly assigned 27 residues, compared with the 19.5-residue average of the two Procise instruments.
The best responses for ESRG’03 and the facilities that went significantly beyond 25 residues are presented in Table 5. The criteria for a best response were no PW calls in the first 25 cycles and a minimum of 24 correct (positive or tentative) assignments for those residues. Twelve facilities met these standards. Responses were sorted by number of PC, number of total correct, RY, and IY. Although TRP9 was a difficult amino acid for many facilities (Table 5), it was not considered in sorting these responses. As can be seen, 6 of the 8 PC Trp assignments came from the top 12 facilities. In addition, 3 of the 4 ESRG’03 responses that used a G-1005A sequencer were among the top 12 responses. The average IY for the top 12 responses was 2.2 pmol, slightly higher than the overall study average of 1.96 pmol. The top 12 average RY was 91.8%, higher than the overall average of 90.3%. Because all Met, Leu, and Ala residues in the first 25 amino acids were correctly identified, and thus used in the RY calculation, for the top 12 responses, these results suggest that a higher RY helps in obtaining more contiguous sequence and in avoiding incorrect assignments.
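Selecting the best responses is a filter followed by a multi-key sort. The sketch below assumes that sorting “by number of PC, number of total correct, RY, and IY” means lexicographic ordering, with each later key breaking ties in the earlier ones and all keys descending. The `Response` record and the example entries are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Response:
    facility: str   # hypothetical identifier
    pc: int         # positive correct calls, cycles 1-25
    tc: int         # tentative correct calls, cycles 1-25
    pw: int         # positive wrong calls, cycles 1-25
    ry: float       # repetitive yield, percent
    iy: float       # initial yield, pmol

def best_responses(responses, min_correct=24):
    """Keep responses with no PW call and at least min_correct correct calls,
    then rank by PC, total correct, RY and IY (all descending)."""
    qualified = [r for r in responses if r.pw == 0 and r.pc + r.tc >= min_correct]
    return sorted(qualified, key=lambda r: (r.pc, r.pc + r.tc, r.ry, r.iy), reverse=True)

sample = [
    Response("A", pc=22, tc=3, pw=0, ry=91.0, iy=2.1),
    Response("B", pc=24, tc=0, pw=1, ry=93.0, iy=2.5),  # excluded: one PW call
    Response("C", pc=22, tc=3, pw=0, ry=92.5, iy=1.8),  # ties A on PC and total; higher RY
    Response("D", pc=20, tc=3, pw=0, ry=94.0, iy=3.0),  # excluded: only 23 correct
]
print([r.facility for r in best_responses(sample)])
# prints: ['C', 'A']
```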
Five facilities assigned data significantly past the 25 residues requested on the data report sheet. Four facilities formally reported their data beyond cycle 26, but one facility (#41) did not formally report its data, instead making apparent assignments on the pmol/cycle table submitted for IY and RY calculations. Because facility #41 did not formally report its data, it was not included in any summaries presented in this report. Four of the five facilities were also among the best 12 responses; the fifth, facility #4, was excluded because it incorrectly assigned TRP9 as Asn. Three of these facilities used either a Procise-HT or a Procise-cLC and made 31, 30, and 26 correct assignments. Facilities #41 and #48 used a G-1005A and made 46 and 48 correct (positive or tentative) assignments, respectively. Facility #41 made no incorrect assignments, whereas facility #17 incorrectly assigned residue 49 as Val rather than Leu.
As in the ESRG’02 study, participants were asked to perform a database search with the data obtained in order to identify the protein. Results are presented in Table 6 for the 37 facilities that attempted a database search. As can be seen, 26 of 46 total responses (57%) were positive about their identification of the protein as flagellum-specific ATP synthase, but 28 of the 46 responses (61%) correctly identified the sample, even though some of these respondents were not sure of the identification. We distinguished positive from correct identifications because we wanted to evaluate the information that facilities give to their customers. Some facilities correctly identified the protein but were not positive about it, and one facility reported that it had positively identified the protein but did not report what it was. Seven facilities made no identification, correct or incorrect. The search engines used were BLAST (16 laboratories), FASTA (14 laboratories), MS Edman (3 laboratories), and miscellaneous others (4 laboratories). In ESRG’02, only 9 of the 31 responses (29%) identified the protein correctly, so the success rate of protein identification roughly doubled in the ESRG’03 study. Given that the 2002 sample had the higher initial yield, these data suggest that a homogeneous N-terminus (ESRG’03) is better suited to correct protein identification than a frayed N-terminus (ESRG’02).
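The identification rates and the doubling noted above follow directly from the reported counts; the short calculation below reproduces them.

```python
responses_03, positive_03, correct_03 = 46, 26, 28  # ESRG'03
responses_02, correct_02 = 31, 9                    # ESRG'02

positive_rate_03 = positive_03 / responses_03
correct_rate_03 = correct_03 / responses_03
correct_rate_02 = correct_02 / responses_02
print(f"positive {positive_rate_03:.0%}, correct {correct_rate_03:.0%} vs {correct_rate_02:.0%}; "
      f"ratio {correct_rate_03 / correct_rate_02:.1f}")
# prints: positive 57%, correct 61% vs 29%; ratio 2.1
```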
Two surveys were distributed with ESRG’03, one asking questions about the instrument used to analyze the sample and another asking more general lab questions. The results of these surveys are available online at the ESRG page on the ABRF website (www.abrf.org). Selected results are summarized below.
One question asked respondents about the solutions they used: whether they were purchased from vendors or prepared in-house, and whether they contained particular additives. For the ABD instruments, only 3 respondents reported using dithiothreitol in the SB2 buffer, 7 added dimethylphenylthiourea, and 20 added acetone to SA3. A variety of other additives were reported, including potassium or sodium phosphate, trifluoroacetic acid, formic acid, acetate, tris(2-carboxyethyl)phosphine, tryptophan, norleucine, tetraethylammonium, and hexanesulfonate. Because only a few laboratories made these additions, we did not have sufficient data to determine their effectiveness.
Laboratories were also asked about sample pretreatment. The only pretreatment that seemed to correlate with increased sequencing success was treating the PVDF membrane with methanol. Addition of polybrene or washing the sample with water appeared to have no effect. The type of cartridge used also did not appear to correlate with success.
Participating laboratories were also asked to send examples of their user reports. These ranged from very formal to informal; some were typed and others handwritten. Examples of these reports can be found at the ABRF website listed above.
Of the 46 laboratories that participated in the study, 26 were academic core laboratories, 4 were academic laboratories, 2 were commercial facilities, 9 were sequencing facilities for commercial organizations, and 5 did not report their affiliation. Both of the commercial facilities were in the top 12 laboratories. Seventy percent of the respondents had some form of MS available to them: 29 had matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) capabilities, 22 had nanospray or liquid chromatography (LC) MS, and 19 had both MALDI-TOF and LC/MS capabilities.
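The 70% figure follows from inclusion-exclusion, assuming that the 19 laboratories with both capabilities are counted in both the MALDI-TOF and the LC/MS totals.

```python
def union_size(a: int, b: int, both: int) -> int:
    """Size of A ∪ B by inclusion-exclusion: |A| + |B| - |A ∩ B|."""
    return a + b - both

maldi_tof, lc_ms, both, respondents = 29, 22, 19, 46
with_ms = union_size(maldi_tof, lc_ms, both)
print(with_ms, f"{with_ms / respondents:.0%}")
# prints: 32 70%
```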
Overall, the ESRG’03 study was extremely successful with regard to both the number of participants and the quality of the data returned. Forty-six facilities returned data, close to the 45 respondents in the 1999 study, indicating that there is a consistent core of ABRF protein laboratories that perform Edman sequence analysis. The number of responses, however, is lower than in the 1995 and 1996 studies, indicating a significant drop since that time in the number of laboratories performing Edman sequence analysis. Twenty-two of 46 respondents (48%) sequenced at least 25 residues into the protein. Thirty-three facilities (72%) made no positive wrong assignments, which indicates that most facilities are not over-interpreting their results. The positive accuracy for this year’s study is excellent at 96.9% and compares favorably with the best of the previous protein samples. The results of the ESRG’03 study are superior to those of the ESRG’02 study, demonstrating that a homogeneous N-terminus is easier to analyze than a heterogeneous, frayed N-terminus. Although initial yields ranged from 0.24 pmol to 3.91 pmol, this spread may reflect variability in electroblotting rather than sequencer performance, given the variability in the amino acid analysis results obtained during quality control (9.4 to 18.2 pmol per band). Repetitive yields averaged 90.3%; because the best responses had a higher average RY (91.8%), RY probably plays a significant role in the quality and quantity of data that can be obtained.
The ESRG committee would like to thank Drs. Robert Macnab (Yale University, New Haven, CT) and Tohru Minamino (Protonic Nanomachine ERATO, Kyoto, Japan) for the use of their protein complex in this study.