Elevator Pitch
Deep metric learning has emerged as a powerful approach to representation learning and has promising applications in medicine. For extreme classification problems, where some classes have very few or no training examples, traditional classification methods fail. A triplet network learns a feature embedding that can address this machine learning challenge and help bring AI into clinical diagnostics.
Description
Recent advances in artificial neural networks and machine learning have the potential to revolutionize medical diagnostics. These algorithms can be used to classify images such as mammograms or to perform volumetric analysis for surgical resection. However, machine learning algorithms still have limitations that must be addressed before clinical deployment. Models trained only on the most common diagnostic categories can fail when they must diagnose very rare conditions that are seldom encountered. Such failures can cause mistrust and raise questions about the effectiveness and sensitivity of these systems.
For high-stakes applications like this, the usual classification-based machine learning algorithms are not enough. Instead, we need a method that learns a high-quality, low-dimensional representation of the data in which different classes form well-separated clusters, including classes for which we have no training data. In such a space, a rare type of breast cancer would form its own cluster, and we could automatically distinguish it from the more common types of cancer.
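To make the clustering idea concrete, here is a minimal sketch, assuming embeddings have already been produced by a trained model. It assigns a new point to the nearest known class centroid, or flags it as a possible new class when it lies farther than a distance threshold from every centroid. The centroid rule, the `threshold` value, the function names, and the toy labels `type_a`/`type_b` are all hypothetical illustrations, not the procedure used in this project.

```python
import math

def class_centroids(embeddings, labels):
    """Compute the mean embedding of each known class."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def assign_or_flag(query, centroids, threshold):
    """Return the nearest known class, or None if the query is farther than
    `threshold` (a hypothetical cutoff) from every centroid, i.e. it may
    belong to a class never seen in training, such as a rare subtype."""
    best_label, best_dist = None, math.inf
    for lab, centroid in centroids.items():
        dist = math.dist(query, centroid)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = lab, dist
    return best_label if best_dist <= threshold else None

# Toy 2-D embeddings for two known classes (made-up data).
train_emb = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
train_lab = ["type_a", "type_a", "type_b", "type_b"]
cents = class_centroids(train_emb, train_lab)

print(assign_or_flag([0.1, 0.0], cents, threshold=1.0))   # type_a
print(assign_or_flag([10.0, -3.0], cents, threshold=1.0)) # None (possible new class)
```

The rule only works if the learned space really places same-class points close together, which is exactly what the metric-learning objective is meant to encourage.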
To achieve this, we develop a generative model that learns a latent representation space in which points from the same class lie near each other and points from different classes lie far apart. We introduce a novel loss function for training generative models based on the Variational Autoencoder (VAE). The loss function borrows ideas from the metric learning literature: instead of maximizing classification accuracy, the neural network is trained to map images from the same class to the same region of the learned latent space. With this VAE model, we can learn low-dimensional latent representations of complex data that capture both intra-class variation and inter-class relationships. Learning such a high-quality, low-dimensional representation turns a complex classification problem into a much simpler clustering problem.
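The exact form of the novel loss is not given here, so the following is only a sketch of one common way to combine a VAE objective with a metric-learning term. It assumes a Gaussian encoder with diagonal covariance (latent mean `mu`, log-variance `logvar`), mean-squared-error reconstruction, and a standard triplet margin loss applied to the latent means of an anchor, a same-class positive, and a different-class negative. The weights `beta` and `lam`, the margin, and all function names are hypothetical. It uses plain Python lists for readability; in PyTorch the same terms would be computed on tensors, for example with `torch.nn.TripletMarginLoss`.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)

def gaussian_kl(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)) used in VAEs."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def reconstruction_mse(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def metric_vae_loss(x, x_hat, mu, logvar, mu_pos, mu_neg,
                    beta=1.0, lam=1.0, margin=1.0):
    """Hypothetical combined objective: VAE terms (reconstruction + KL)
    plus a weighted triplet term on latent means."""
    recon = reconstruction_mse(x, x_hat)
    kl = gaussian_kl(mu, logvar)
    metric = triplet_loss(mu, mu_pos, mu_neg, margin)
    return recon + beta * kl + lam * metric

# Toy example: anchor latent mean at the origin, positive close, negative far.
loss = metric_vae_loss(x=[1.0, 0.0], x_hat=[0.5, 0.0],
                       mu=[0.0, 0.0], logvar=[0.0, 0.0],
                       mu_pos=[0.0, 1.0], mu_neg=[3.0, 4.0])
print(loss)  # 0.125: only the reconstruction term is nonzero here
```

Setting `lam=0` recovers a plain (beta-)VAE objective; increasing it pushes same-class latent means together and different-class latent means apart.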
All experiments in this project were carried out in Python using several of its libraries; in particular, we make extensive use of PyTorch, a Python-based deep learning framework. We believe our approach can benefit the diverse communities attending PyCon who are looking for ways to apply machine learning to tasks similar to those our approach is designed to tackle. In our talk, we will showcase the Python tools needed to reproduce our experiments and to tackle similar tasks in other domains.
Attendees will learn how machine learning can be used to learn features for rare events in a semi-supervised manner, where standard supervised learning approaches fail. We will walk step by step through how we use the PyTorch deep learning framework to develop our models. Attendees will also learn how our technique could be used in clinical and medical settings to identify rare diseases and events.
Notes
The speakers have a long history of giving talks on related academic topics at their respective universities. They recently presented this work at PyCon 2019, where it received a lot of interest and high praise, which motivates our strong interest in presenting this research at PyCon JP 2019. The talk will help raise awareness of how Python can be used in healthcare and other emerging areas.
This is one of two talks the authors are proposing for PyCon JP. It will be given by a multidisciplinary team of five members, each of whom will cover certain aspects of the talk and take part in the discussion.