Train as you label
The idea of Active Learning is simple yet powerful. While you label data by hand, one or more ML models are trained under the hood. Training can be triggered periodically (for instance, after every 100 labeled records, a new training iteration runs for all registered models) or manually whenever needed. As you label, you train models.
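The periodic trigger described above can be sketched in a few lines. This is only an illustration, not how kern implements it: the class name `PeriodicTrainer`, its methods, and the idea of registering models as plain training callbacks are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

LabeledRecord = Tuple[str, str]  # (record text, label); a hypothetical record format
TrainFn = Callable[[List[LabeledRecord]], None]


class PeriodicTrainer:
    """Retrains all registered models after every `interval` labeled records."""

    def __init__(self, interval: int = 100) -> None:
        self.interval = interval
        self.train_fns: List[TrainFn] = []
        self.labeled: List[LabeledRecord] = []
        self.iterations = 0

    def register(self, train_fn: TrainFn) -> None:
        """Register a model by its training callback."""
        self.train_fns.append(train_fn)

    def add_label(self, record: str, label: str) -> None:
        """Store a manual label and trigger training on every `interval`-th record."""
        self.labeled.append((record, label))
        if len(self.labeled) % self.interval == 0:
            self.train()

    def train(self) -> None:
        """Run one training iteration; can also be called manually at any time."""
        for train_fn in self.train_fns:
            train_fn(self.labeled)
        self.iterations += 1
```

With `interval=100`, the 100th, 200th, ... call to `add_label` retrains every registered model on all labels collected so far, and calling `train()` directly covers the manual trigger.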
With this concept, you can accomplish two very powerful tasks:
As you label data manually, the model(s) make suggestions for you, so you can see how the current model would perform and speed up the process: for many records, you only have to press enter if the prediction is indeed correct. For those that are wrongly classified, you can either do a "this was wrong, put it back in the line" operation or correct the classification. Of course, this only helps if the model has sufficient classification performance, as otherwise you'd waste your time correcting wrong predictions.
You also gain knowledge about the unlabeled records, which is a significant improvement. You can do this in two ways:
- When a model makes a prediction, you can calculate the confidence of this prediction. To do so, you must obtain the probability of each class for every prediction. Once you have those probabilities, you can use a slightly adapted version of the entropy function to calculate the confidence. If the model can't decide between two classes (as in a 50/50 split), the confidence will be 0. It grows as the probabilities skew more towards a specific class, with the maximal value at 1 (the model is absolutely sure of its prediction).
This is called uncertainty sampling; in the figure below, you can see how the confidence rises as the prediction probability x moves towards one of the extremes, 0 or 1.
- Alternatively, you can calculate distances on the embeddings you use (such as Word2Vec) and apply sampling approaches that give you the records carrying the most information about a yet unknown area of the feature space. For instance, you first want to "explore" the feature space by labeling data, so that you get a good understanding of what your data looks like. This can be done by serving the most informative records based on their location in the feature space, as long as your embedding is somewhat representative.
This is called diversity sampling.
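To make the confidence score from the first approach concrete, here is a minimal sketch. The text only says "a slightly adapted version of the entropy function", so the exact formula is an assumption: this version divides the Shannon entropy by its maximum (the log of the number of classes) and subtracts the result from 1, which gives 0 for a 50/50 split and 1 for a fully certain prediction. The function names `confidence` and `least_confident` are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def confidence(probs: Sequence[float]) -> float:
    """Return 1 - normalized entropy: 0 for a uniform split, 1 for a certain prediction."""
    k = len(probs)
    if k < 2:
        raise ValueError("need probabilities for at least two classes")
    # Shannon entropy; zero probabilities contribute nothing to the sum.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Assumed adaptation: divide by the maximum entropy log(k) and flip the scale.
    return 1.0 - entropy / math.log(k)


def least_confident(predictions: Dict[str, Sequence[float]], n: int = 1) -> List[str]:
    """Uncertainty sampling: return the ids of the n records the model is least sure about."""
    return sorted(predictions, key=lambda rid: confidence(predictions[rid]))[:n]
```

Serving the records returned by `least_confident` to the labeler first is one simple way to turn the score into a sampling strategy.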
These approaches can in fact be combined; either way, the system picks the data it thinks you should label next! This is not a random pick, it is an educated pick. This way, models can often learn better than with traditional methods while using substantially less training data.
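The distance-based pick can be sketched as well. The text does not name a specific algorithm, so this example assumes farthest-first traversal, one common way to do it: each pick is the unlabeled record whose embedding lies farthest from everything already labeled or picked. The function name and the toy 2-D embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def farthest_first(
    embeddings: Sequence[Sequence[float]], labeled: List[int], n: int
) -> List[int]:
    """Pick up to n unlabeled indices that cover unexplored regions of the feature space."""
    labeled_set = set(labeled)
    candidates = [i for i in range(len(embeddings)) if i not in labeled_set]
    # Distance from each candidate to its nearest already-covered record.
    min_dist: Dict[int, float] = {
        i: min(
            (math.dist(embeddings[i], embeddings[j]) for j in labeled),
            default=math.inf,
        )
        for i in candidates
    }
    picks: List[int] = []
    while candidates and len(picks) < n:
        best = max(candidates, key=lambda i: min_dist[i])
        picks.append(best)
        candidates.remove(best)
        # The new pick now counts as covered, so refresh the nearest distances.
        for i in candidates:
            min_dist[i] = min(min_dist[i], math.dist(embeddings[i], embeddings[best]))
    return picks
```

A combined strategy could, for example, rank records by a weighted mix of this distance and the model's uncertainty, though how to weight the two is a design choice the text leaves open.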
Perfect addition to Weak Supervision - Weak Classifiers
In kern, we use Active Learning to build what we call "weak classifiers". Weak classifiers are simple ML models that tend to learn fast from few labeled records, such as Naive Bayes. As you label data manually, the classifiers learn to make predictions that are far better than random guessing. Now if you've already read our blog post on Weak Supervision, you know that weak classifiers therefore make great heuristics! You can easily integrate multiple weak classifiers together with Python labeling functions to create a committee of heuristics, labeling your data at scale.
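As a rough illustration of such a committee, the sketch below pairs a tiny Naive Bayes weak classifier with one Python labeling function and combines them by majority vote. Everything here is an assumption made for the example: `TinyNaiveBayes`, `lf_contains_refund`, and `majority_vote` are hypothetical names, the labels are invented, and real Weak Supervision pipelines typically aggregate votes in more sophisticated ways than a plain majority.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional

Heuristic = Callable[[str], Optional[str]]  # returns a label, or None to abstain


class TinyNaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts: List[str], labels: List[str]) -> "TinyNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        scores: Dict[str, float] = {}
        for label, count in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(count / total)  # class prior
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    smoothed = self.word_counts[label][word] + 1
                    score += math.log(smoothed / (n_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def lf_contains_refund(text: str) -> Optional[str]:
    """Hand-written labeling function: vote 'complaint' if a refund is mentioned."""
    return "complaint" if "refund" in text.lower() else None


def majority_vote(text: str, heuristics: List[Heuristic]) -> Optional[str]:
    """Combine the committee's votes; abstentions (None) are ignored."""
    votes = Counter()
    for heuristic in heuristics:
        vote = heuristic(text)
        if vote is not None:
            votes[vote] += 1
    return votes.most_common(1)[0][0] if votes else None
```

After fitting the classifier on a handful of manually labeled records, `majority_vote(text, [clf.predict, lf_contains_refund])` labels a new record using both heuristics.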
Weak classifiers are the perfect heuristic for cases in which it is simply difficult to describe a heuristic. You don't know how best to express what characterizes a label? Don't worry - just label some data manually, create a weak classifier, and integrate it into the Weak Supervision pipeline instantly.
Active Learning is a methodology designed to help you improve your manual labeling tasks. At kern, we go one step further, and apply Active Learning to improve both your manual and programmatic labeling tasks.