Text Analytics
Seminar, Technische Universität Darmstadt, Department of Computer Science, 2018
Text analytics course on active learning for undergraduate and graduate students
Course Contents
Active learning (not to be confused with active learning in the educational sense) addresses the question of how machine learning algorithms can achieve equal or even better performance with less training data. Supervised learning methods require labeled data, which is often costly to annotate and frequently requires expert annotators. However, not all data contributes equally to the performance of a machine learning algorithm. For example, when training a classifier to distinguish pictures of cats, dogs, and alligators, adding more cat pictures helps little if the classifier is already very good at recognizing cats. The main research question in active learning is therefore which unlabeled data would yield the biggest gain if it were labeled. To answer it, we look for strategies that reduce the amount of training data needed, independent of the underlying machine learning algorithm.
This seminar covers sampling strategies for active learning, evaluation metrics for active learning, common problems that arise when applying it, and practical use cases of active learning in the field of text analytics.
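To make the notion of a sampling strategy concrete, here is a minimal sketch of uncertainty sampling, one commonly used strategy. It is an illustration only, not the seminar's prescribed method. It assumes a classifier has already produced class probabilities for each item in an unlabeled pool. The names `least_confidence`, `entropy`, `select_queries`, and the example probabilities are hypothetical. Because the sketch only looks at predicted probabilities, it does not depend on which learning algorithm produced them.

```python
import math
from typing import Callable, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def entropy(probs: Sequence[float]) -> float:
    """Uncertainty score: Shannon entropy of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(
    pool_probs: Sequence[Sequence[float]],
    k: int,
    score: Callable[[Sequence[float]], float] = least_confidence,
) -> List[int]:
    """Return the indices of the k pool items with the highest uncertainty."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted probabilities for the classes (cat, dog, alligator).
pool = [
    [0.98, 0.01, 0.01],  # confidently a cat: labeling it adds little
    [0.40, 0.35, 0.25],  # uncertain
    [0.10, 0.85, 0.05],  # fairly confidently a dog
    [0.34, 0.33, 0.33],  # almost uniform: the model is least sure here
]

# Indices of the two items that would be sent to an annotator first.
print(select_queries(pool, 2))
```

In the cat, dog, and alligator example, the confidently classified cat picture is ranked last, while the near-uniform prediction is requested first. Swapping in `score=entropy` changes only how uncertainty is measured, not the selection loop.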