Knowledge distillation is a technique that transfers knowledge from a large, computationally expensive model to a smaller one without a significant loss of accuracy. This allows deployment on less powerful hardware, making inference faster and more efficient.
In this tutorial, we will run a number of experiments focused on improving the accuracy of a lightweight neural network by using a more powerful network as a teacher. The computational cost and speed of the lightweight network remain unaffected; our intervention targets only its weights, not its forward pass. Applications of this technique can be found in devices such as drones or mobile phones. We do not use any external packages, as everything we need is available in ``torch`` and ``torchvision``. In this tutorial, you will learn:
- How to modify model classes to extract hidden representations and use them for further calculations.
- How to modify regular train loops in PyTorch to include additional losses on top of, for example, cross-entropy for classification (a sketch of such a loop follows this list).
- How to improve the performance of lightweight models by using more complex models as teachers.
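To give a concrete sense of the second point before we start, here is a minimal sketch of a distillation training step that adds a soft-target loss on top of cross-entropy. The function name, the temperature ``T``, and the weighting ``alpha`` are illustrative assumptions, not the tutorial's exact code; the full experiments later define their own models and hyperparameters.

```python
# Illustrative sketch only; the tutorial's exact models, hyperparameters,
# and loss weighting may differ.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, images, labels, optimizer,
                      T=2.0, alpha=0.75):
    """One training step: cross-entropy on the labels plus a soft-target
    loss that pushes the student's logits toward the teacher's."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)  # teacher is frozen

    student_logits = student(images)

    # Standard classification loss against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Soft-target loss: KL divergence between temperature-softened
    # distributions; the T*T factor rescales gradients to a comparable size.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    kd_loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)

    loss = alpha * kd_loss + (1 - alpha) * ce_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The only change relative to an ordinary train loop is the extra term added to the loss; the student's architecture, and therefore its inference cost, stays exactly the same.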