This paper studies sampling strategies for the Expert Network (ExpNet), a statistical learning system used for patient record classification at the Mayo Clinic. The goal is to achieve high-accuracy classification at an affordable computational cost in very large applications. The learning curves of ExpNet were observed with respect to the choice of training resources; the size, vocabulary coverage, and category coverage of the training set; and the category distribution over training instances. A method that combines the advantages of different sampling strategies is proposed and evaluated on a large training corpus. As a result, ExpNet achieved near-optimal classification accuracy (measured by average precision) using a relatively small training set, with a real-time response fast enough to meet the needs of human-machine interaction.
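For readers unfamiliar with the accuracy measure named above, the following is a minimal sketch of average precision for a single ranked list of candidate categories. It assumes the common non-interpolated definition; the abstract does not say which variant the paper uses, and the function name `average_precision` and the example ranking are hypothetical illustrations rather than part of ExpNet.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Non-interpolated average precision for one ranked list.

    ranked_relevance[i] is True when the category at rank i + 1 is correct.
    This definition is an assumption; the paper may use another variant.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each correct category.
            precision_sum += hits / rank
    # A list with no correct category scores 0 by convention here.
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking: correct categories appear at ranks 1 and 3.
example = [True, False, True, False]
score = average_precision(example)
```

Averaging this score over all test documents gives one number per training-set configuration, which is one way a learning curve over training-set size could be plotted.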