Manual analysis of pyrosequencing data can be a complex process, and human error can occur. Errors in manual analysis arise mainly in three ways. The first is due to the complexity of some mutations. Among the tests performed in our lab, EGFR exon 19 mutations are the most complex. These mutations are usually deletions, and many different deletions exist, ten of which are more common than the others. Each deletion generates a unique pyrogram pattern whose peaks reflect the nucleotide sequence. Both wild-type and mutant nucleotide sequences may contribute to a given peak, so each peak may reflect the wild-type sequence, the mutant sequence, or both. Because of the deletion, the nucleotides contributing to a given peak may come from different codons of the wild-type and mutant sequences. Each pyrogram pattern also varies with the tumor load of the individual case. All of these variations make manual analysis difficult. Such complexity, however, does not pose a problem for software analysis: once all of the possible combinations have been programmed into the software, the computer can sort through them rapidly. Figure A is the pyrogram of an EGFR exon 19 deletion between codons 747 and 752. Figure B is the software analysis result. It shows that both wild-type and mutant nucleotides contribute to different pyrogram peaks. The second type of error in manual analysis is overlooking subtle mutation changes. An example is the BRAF V600K mutation. The targeted sequence of BRAF in reverse sequencing is CACTGTAG, and the dispensing order is TCGTATCTGTAG (Figure A and B). In the case of the V600K mutation, apart from the mutant peak T (the second peak, at dispensing position 4), the fourth peak C and the fifth peak T (at dispensing positions 7 and 8) are lower than normal. One V600K case was missed because the second peak T distracted the data reviewer, and the subtle changes in the fourth peak C and fifth peak T were consequently overlooked. The third type of error in manual analysis is overlooking less common mutant peaks. For example, in KRAS data analysis, codon 12 mutations are more common than codon 13 mutations, so the data reviewer may focus on codon 12 changes and overlook changes in codon 13. In this project, a KRAS G13D mutation was missed for this reason.
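The way wild-type and mutant sequences jointly shape each peak can be illustrated with a minimal sketch of an idealized pyrogram, using the BRAF target sequence and dispensing order given above. This is not the software described in this paper. The function names are hypothetical, peak heights are idealized (one unit per incorporated nucleotide, no noise), and the V600K reverse-strand sequence CTTTGTAG is an assumption derived from the known codon change GTG to AAG rather than a value stated in the text.

```python
from typing import List


def simulate_pyrogram(template: str, dispensation: str) -> List[int]:
    """Idealized peak heights: each dispensed nucleotide is incorporated
    once for every consecutive matching base at the current position."""
    peaks = []
    pos = 0
    for nucleotide in dispensation:
        run = 0
        while pos < len(template) and template[pos] == nucleotide:
            run += 1
            pos += 1
        peaks.append(run)
    return peaks


def mix_pyrograms(wild_type: List[int], mutant: List[int],
                  mutant_fraction: float) -> List[float]:
    """Weighted sum of wild-type and mutant patterns for a given mutant fraction."""
    return [(1 - mutant_fraction) * w + mutant_fraction * m
            for w, m in zip(wild_type, mutant)]


DISPENSATION = "TCGTATCTGTAG"   # dispensing order from the text
WILD_TYPE_SEQ = "CACTGTAG"      # target sequence from the text
V600K_SEQ = "CTTTGTAG"          # assumption: GTG -> AAG read on the reverse strand

wild_type = simulate_pyrogram(WILD_TYPE_SEQ, DISPENSATION)
v600k = simulate_pyrogram(V600K_SEQ, DISPENSATION)
half_mutant = mix_pyrograms(wild_type, v600k, 0.5)
```

In this idealized model the V600K pattern gains a peak at dispensing position 4 and loses the peaks at positions 7 and 8, which agrees with the description above. At a 50% mutant fraction the peaks at positions 7 and 8 fall to half height instead of disappearing, which is why they are easy to overlook.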
The main source of error in computerized analysis is the use of suboptimal parameters when building the software for certain mutations. For example, in the case of the V600K mutation, the parameter for the lower fourth peak C was initially set up as “the height of fourth peak C is lower than 95% of the average peak height of equivalent normal peaks”. In this case, the dispensing order is TCGTATCTGTAG. The sixth, seventh and ninth peaks (labeled G, T and G at dispensing positions 9, 10 and 12) are used to calculate the average normal peak height. During testing, we realized that although this setting recognizes some V600K mutations, it occasionally misinterprets V600K cases as V600E. The parameter was therefore modified so that, instead of using only the fourth peak C, both the fourth peak C and the fifth peak T are used in the calculation. Moreover, instead of “95% of the average”, “less than two standard deviations of the average” is used. The modified software was tested and interpreted the data correctly. It appears that the standard deviation reflects normal variation better than an arbitrary 95% cutoff. Such modifications are part of the fine-tuning process in the development of this software.
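The two parameter settings can be compared with a short sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names and the peak heights are hypothetical, “less than two standard deviations of the average” is read here as “below the mean of the reference peaks minus two sample standard deviations”, and the revised rule is assumed to require both peak C and peak T to fall below that threshold.

```python
from statistics import mean, stdev
from typing import Sequence

# Dispensing positions (1-based) taken from the text.
REFERENCE_POSITIONS = (9, 10, 12)   # sixth, seventh and ninth peaks: G, T, G
PEAK_C_POSITION = 7                 # fourth peak C
PEAK_T_POSITION = 8                 # fifth peak T


def reduced_by_fixed_cutoff(heights: Sequence[float], cutoff: float = 0.95) -> bool:
    """Initial rule: peak C is below 95% of the average reference peak height."""
    reference = [heights[p - 1] for p in REFERENCE_POSITIONS]
    return heights[PEAK_C_POSITION - 1] < cutoff * mean(reference)


def reduced_by_two_sd(heights: Sequence[float], n_sd: float = 2.0) -> bool:
    """Revised rule (assumed reading): peaks C and T are both below
    the reference mean minus n_sd sample standard deviations."""
    reference = [heights[p - 1] for p in REFERENCE_POSITIONS]
    threshold = mean(reference) - n_sd * stdev(reference)
    return all(heights[p - 1] < threshold
               for p in (PEAK_C_POSITION, PEAK_T_POSITION))


# Hypothetical peak heights for a low-level V600K case (positions 1 to 12).
heights = [0.0, 100.0, 0.0, 20.0, 90.0, 0.0, 96.0, 97.0, 100.0, 101.0, 95.0, 99.0]

fixed_rule_flags = reduced_by_fixed_cutoff(heights)   # 96 is not below 95
sd_rule_flags = reduced_by_two_sd(heights)            # threshold is 100 - 2 * 1 = 98
```

With these made-up heights the reference peaks are tightly clustered, so the standard deviation rule detects the reduction in peaks C and T while the fixed 95% cutoff does not. When the reference peaks are noisier, the same rule becomes more tolerant, which is one way to interpret the remark that standard deviation reflects normal variation better than a fixed percentage.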
Normally, two individuals check sequencing results to minimize human error. In this project, we used our software to check a total of 1375 test results (355 EGFR, 613 BRAF and 407 KRAS). The software picked up 4 errors from the first round of manual analysis, all of which were also picked up by the second reviewer. These results indicate that the pyrosequencing data analysis software can be used as another layer of quality control.
The pattern recognition concept has been used before to generate software for pyrosequencing data analysis. For example, Joakim Lundeberg et al. used it for SNPs in chromosome 9 [18]. Pyrosequencing software from Qiagen can provide pyrogram patterns for pure homozygous and heterozygous results of the most common mutations in EGFR, KRAS and BRAF [15-17]. A more recent program, Pyromaker, can provide simulated pyrogram patterns for different percentages of tumor cells [19]. The software developed in our lab is able to analyze real case data: the data are entered into the software, and the output indicates the mutation type and the percentage of mutant gene in the specimen. Our software also provides more extensive coverage of mutations in EGFR, KRAS and BRAF. For example, it has been tailored to distinguish the BRAF V600E, V600K and V600R mutations, and it can distinguish the common variants of EGFR exon 19 deletions. Our software is also fine-tuned to accommodate normal variations in clinical mutation tests. These features make it a practical tool for pyrosequencing data analysis of real cases.
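One way to turn a measured pyrogram into a mutation type and a mutant percentage is to fit it against idealized patterns. The sketch below is an assumption for illustration, not the method used by our software: it uses a least-squares fit of a two-component mixture, the function names are hypothetical, and the V600E and V600K patterns are idealized patterns for the dispensing order TCGTATCTGTAG derived from the known codon changes rather than taken from the text. The observed heights are assumed to be already normalized so that a single incorporated nucleotide has height 1.

```python
from typing import Dict, List, Tuple

# Idealized patterns for dispensing order TCGTATCTGTAG (assumed, noise-free).
WILD_TYPE = [0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
MUTANT_PATTERNS: Dict[str, List[int]] = {
    "V600E": [0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1],
    "V600K": [0, 1, 0, 3, 0, 0, 0, 0, 1, 1, 1, 1],
}


def fit_fraction(observed: List[float], wild_type: List[int],
                 mutant: List[int]) -> Tuple[float, float]:
    """Least-squares mutant fraction f for observed ~ (1 - f) * wt + f * mutant.
    Returns (fraction clipped to [0, 1], sum of squared residuals)."""
    diff = [m - w for m, w in zip(mutant, wild_type)]
    shifted = [o - w for o, w in zip(observed, wild_type)]
    fraction = sum(s * d for s, d in zip(shifted, diff)) / sum(d * d for d in diff)
    fraction = min(1.0, max(0.0, fraction))
    residual = sum((s - fraction * d) ** 2 for s, d in zip(shifted, diff))
    return fraction, residual


def call_mutation(observed: List[float]) -> Tuple[str, float]:
    """Pick the candidate pattern with the smallest residual."""
    fits = {name: fit_fraction(observed, WILD_TYPE, pattern)
            for name, pattern in MUTANT_PATTERNS.items()}
    best = min(fits, key=lambda name: fits[name][1])
    return best, fits[best][0]


# Hypothetical normalized pyrogram: 40% V600K mixed with 60% wild type.
observed = [0, 1, 0, 1.2, 0.6, 0, 0.6, 0.6, 1, 1, 1, 1]
mutation, mutant_fraction = call_mutation(observed)
```

A practical tool would also need a rule for calling a sample wild type when the fitted fraction is within normal variation, which this sketch leaves out.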
Based on our literature search using keywords such as pyrosequencing, software, EGFR, KRAS and BRAF, our software is, to our knowledge, the only program developed specifically for EGFR, KRAS and BRAF pyrosequencing data analysis.
The software is designed and fine-tuned by our lab staff members, so it can only be as good as they are. However, the staff's knowledge and experience can be built into the software during the fine-tuning process. With such collective wisdom, the software may perform better than a single staff member performing manual analysis. Moreover, the software works more consistently and objectively than a human does, which makes it a valuable quality control tool.
Fine-tuning is also an ongoing training process for the software, especially for rare mutations. Our first-stage fine-tuning used data from 490 mutation test results. This process will continue in our lab as we analyze more cases, with the molecular lab staff serving as trainers. Whenever the software misreads a new mutation, our lab will update the software to cover it, adjusting the analysis parameters so that the new mutation is recognized correctly without loss of specificity. Our software is an open system: coverage of additional mutations can be added when needed.