Active Learning with Generalized Queries
Middlesex College, Room 320
Dr. Charles Ling
Dr. Sylvia Osborn
Dr. Nazim Madhavji
Dr. Xiaoming Liu (Statistics)
Dr. Huajie Zhang (University of New Brunswick)
In contrast to supervised learning, active learning can usually achieve the same predictive accuracy with far fewer labeled examples, thus significantly reducing the labeling cost. However, previous studies of active learning mostly assume that the learner can only ask specific queries (i.e., request labels for specific examples with all feature values given). For instance, to predict osteoarthritis from a patient data set with 30 features, specific queries always contain values for all 30 features, many of which may be irrelevant. A more natural way is to ask generalized queries, such as "are people over age 50 with knee pain likely to have osteoarthritis?" (with only two relevant features, age and type of pain). Because one such generalized query can often represent a set of specific ones, the corresponding answer also applies to this whole set of specific queries. Therefore, the active learner can obtain more information from each generalized query, and consequently learn more effectively and efficiently.
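The idea that one generalized query stands for a whole set of specific ones can be illustrated with a minimal Python sketch. It is not the algorithm developed in the thesis: the class `GeneralizedQuery`, the helper `apply_answer`, the `bmi` feature, and the four-patient pool are all hypothetical, and the oracle's answer is simplified to a single hard label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Example = Dict[str, object]


@dataclass
class GeneralizedQuery:
    """Hypothetical representation: constraints on a few relevant features.

    Features that are not listed are treated as "don't care".
    """
    constraints: Dict[str, Callable[[object], bool]]

    def covers(self, example: Example) -> bool:
        # An example is covered when every constrained feature satisfies its predicate.
        return all(
            name in example and predicate(example[name])
            for name, predicate in self.constraints.items()
        )


def apply_answer(query: GeneralizedQuery, label: bool,
                 pool: List[Example]) -> List[Tuple[Example, bool]]:
    """Propagate one oracle answer to every pool example the query covers."""
    return [(example, label) for example in pool if query.covers(example)]


# Illustrative unlabeled pool (made-up values, three features instead of 30).
pool = [
    {"age": 62, "pain": "knee", "bmi": 31.0},
    {"age": 55, "pain": "knee", "bmi": 24.5},
    {"age": 43, "pain": "knee", "bmi": 27.2},
    {"age": 70, "pain": "back", "bmi": 29.8},
]

# "Are people over age 50 with knee pain likely to have osteoarthritis?"
query = GeneralizedQuery({
    "age": lambda age: age > 50,
    "pain": lambda pain: pain == "knee",
})

# Assume the oracle answers "yes"; the single answer labels every covered example.
labeled = apply_answer(query, True, pool)
print(len(labeled), "examples labeled by one generalized query")
```

With specific queries, each of the covered patients would need a separate question to the oracle; here a single answer labels all of them at once.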
In this thesis, we assume that the oracle is capable of answering such generalized queries, and develop different algorithms that implement active learning with generalized queries for different real-world scenarios. The theoretical study proves that the query complexity of active learning with generalized queries is significantly lower than that of active learning with specific ones. The empirical study across a variety of scenarios also demonstrates that, to achieve a given predictive accuracy, active learning with generalized queries needs significantly fewer queries (or incurs significantly lower labeling cost) than active learning with specific ones.