Table 3. Feature templates used in the CRF tagger
| Feature | Template | Tag |
|---|---|---|
| Word unigram | $w_{i-5}$, $w_{i-4}$, $w_{i-3}$, $w_{i-2}$, $w_{i-1}$, $w_i$, $w_{i+1}$, $w_{i+2}$, $w_{i+3}$, $w_{i+4}$, $w_{i+5}$ | & $y_i$ |
| Word bigram | $w_{i-1}w_i$, $w_iw_{i+1}$ | & $y_i$ |
| Word trigram | $w_{i-1}w_iw_{i+1}$ | & $y_i$ |
| Substrings | substrings of $w_i$ (up to length 10) | & $y_i$ |
| Word shape | $S(w_i)$ | & $y_i$ |
| Tag bigram | True | & $y_{i-1}y_i$ |
$w_i$ is the current word and $y_i$ is the current tag. The word shape $S(w_i)$ is produced by converting capital letters into ‘A’, lowercase letters into ‘a’, and numerals into ‘#’.
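As an illustration only, the observation templates in Table 3 could be instantiated roughly as in the following Python sketch. It is not the authors' implementation. The function names (`word_shape`, `substrings`, `token_features`), the `<PAD>` token for out-of-sentence positions, and the feature-string format are all hypothetical. The sketch also assumes that characters other than letters and digits pass through the shape function unchanged, which the table does not specify. The tag bigram template is not generated here, since it corresponds to the transition features that a linear-chain CRF handles itself.

```python
from typing import List

PAD = "<PAD>"  # hypothetical padding token for positions outside the sentence


def word_shape(word: str) -> str:
    """S(w): capitals -> 'A', lowercase -> 'a', digits -> '#'.

    Assumption: any other character (hyphen, apostrophe, ...) is kept as-is.
    """
    out = []
    for ch in word:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("#")
        else:
            out.append(ch)
    return "".join(out)


def substrings(word: str, max_len: int = 10) -> List[str]:
    """All distinct contiguous substrings of `word` with length 1..max_len."""
    found = {
        word[start:start + n]
        for n in range(1, max_len + 1)
        for start in range(len(word) - n + 1)
    }
    return sorted(found)


def token_features(words: List[str], i: int, window: int = 5) -> List[str]:
    """Observation features for position i; each is later paired with tag y_i."""

    def w(j: int) -> str:
        return words[j] if 0 <= j < len(words) else PAD

    feats = []
    # Word unigrams w_{i-5} ... w_{i+5}
    for d in range(-window, window + 1):
        feats.append(f"w[{d:+d}]={w(i + d)}")
    # Word bigrams w_{i-1} w_i and w_i w_{i+1}
    feats.append(f"w[-1,0]={w(i - 1)}|{w(i)}")
    feats.append(f"w[0,+1]={w(i)}|{w(i + 1)}")
    # Word trigram w_{i-1} w_i w_{i+1}
    feats.append(f"w[-1,0,+1]={w(i - 1)}|{w(i)}|{w(i + 1)}")
    # Substrings of w_i up to length 10
    feats.extend(f"sub={s}" for s in substrings(w(i)))
    # Word shape S(w_i)
    feats.append(f"shape={word_shape(w(i))}")
    return feats


if __name__ == "__main__":
    print(word_shape("IL-2"))  # -> AA-#
    print(token_features(["IL-2", "gene", "expression"], 0)[:3])
```

Each returned string would be conjoined with the current tag $y_i$ by the CRF toolkit in use. How that pairing is expressed depends on the toolkit, so it is left out of the sketch.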


