Abdulwahab Kabani, Winter 2017, "Improving Deep Learning Image Recognition Performance Using Region of Interest Localization Networks", Computer Science Department, University of Western Ontario, Canada.
Ph.D. Thesis Abstract
Deep learning has been gaining momentum and achieving state-of-the-art results on many visual recognition problems. The roots of the field can be traced back to the 1940s, but only recently has it started delivering interesting results on many image understanding problems. This is mainly due to the availability of powerful hardware that can accelerate the training process. In addition, the growth of the Internet and of imaging devices such as mobile phones and cameras has increased the amount of data that can be used to train neural networks. Together, these factors have contributed to the success of deep learning on large-scale image understanding tasks.
Many image understanding problems, however, do not have large training sets. This is especially true of special-purpose datasets such as medical, astronomical, and environmental images. These applications lack large training datasets because, unlike natural images, such images are not typically taken by users and uploaded to the web. In addition, some of these applications, such as medical imaging, place many restrictions on sharing data in order to protect patient privacy. Finally, natural images can be labeled by any person, whereas special-purpose datasets cannot. For example, medical images must be labeled by medical or clinical experts in the field. Because these experts have limited time to invest in creating training sets, the resulting datasets are normally much smaller than natural image datasets. Fortunately, in many of these applications, the most discriminative features may be present in a small region of interest.
In this work, we present a method for training deep learning models on problems with a small number of training images. We do so by localizing a region of interest in these images, which helps reduce overfitting. Two localization architectures are introduced: the naive localization network and the wide localization network (wide net). The latter has several advantages, which we explain thoroughly. The first problem we study is right whale recognition, which involves recognizing whales from aerial images by analyzing the callosity patterns on their heads. We study how localizing the region of interest can be used to make deep learning work on such a small dataset. The second problem we study is the estimation of the ejection fraction and left ventricle volume from cardiac MRI images. Automatically estimating the ejection fraction and volume of the heart can help in identifying and diagnosing several cardiac health issues. Like the whale dataset, this dataset contains only a small number of training subjects.
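To make the idea of region-of-interest localization concrete, the following is a minimal sketch of the cropping step only: given a bounding box that a localization network might predict, the image is cropped to that box before being passed to a recognition model. It is not the naive or wide localization network described in the thesis. The function names `clamp_box` and `crop_roi`, the `(top, left, height, width)` box convention, and the nested-list image representation are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box convention: (top, left, height, width) in pixels.
Box = Tuple[int, int, int, int]


def clamp_box(box: Box, img_h: int, img_w: int) -> Box:
    """Clip a predicted box so that it lies inside the image.

    The result always covers at least one pixel, even if the
    predicted box falls entirely outside the image.
    """
    top, left, height, width = box
    bottom, right = top + height, left + width
    top = max(0, min(top, img_h - 1))
    left = max(0, min(left, img_w - 1))
    bottom = max(top + 1, min(bottom, img_h))
    right = max(left + 1, min(right, img_w))
    return top, left, bottom - top, right - left


def crop_roi(image: Sequence[Sequence[float]], box: Box) -> List[List[float]]:
    """Return the sub-image covered by the (clipped) box."""
    img_h, img_w = len(image), len(image[0])
    top, left, height, width = clamp_box(box, img_h, img_w)
    return [list(row[left:left + width]) for row in image[top:top + height]]


# Toy 6x8 "image" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(8)] for r in range(6)]

# Assumed output of a localization model; here it is simply hard-coded.
predicted_box = (2, 3, 3, 4)
roi = crop_roi(image, predicted_box)
```

Under these assumptions, the recognition model sees only the cropped region, so it has fewer irrelevant pixels to fit, which is the intuition behind using localization to reduce overfitting on small datasets.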