The Lottery Ticket Hypothesis and Transfer Learning
It’s always interesting to read papers that shed light on the inner workings of neural nets. Right now there are probably teams of people far smarter than I am working to exploit this new knowledge, and soon enough they’ll release it back to the community as an open-source library.
I also really liked the information-theory paper on knowledge compression and the more recent paper expressing ResNets as ordinary differential equations.
It would also seem that the lottery ticket hypothesis helps explain why transfer learning works so well with relatively few examples compared to training a full network from scratch. ImageNet training naturally produces a lot of well-initialized weights, so it follows that these networks would train faster on a new task. I’d imagine the technique in the paper could be exploited to prune an ImageNet-trained ResNet or VGG model to find the “winning tickets” for a specific subset of images, further reducing the number of training examples needed to reach state-of-the-art performance.
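For the sake of discussion, here’s a minimal sketch of what I mean in PyTorch (assuming a recent torchvision). The helper names are my own invention, and this does a single prune-and-rewind pass, whereas the paper iterates train/prune/rewind rounds; the twist here is rewinding the surviving weights to the ImageNet-pretrained values rather than to a random init:

```python
import torch
import torchvision.models as models

# Load an ImageNet-pretrained ResNet and snapshot its weights; these
# pretrained values play the role the random init plays in the paper.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
pretrained_state = {k: v.clone() for k, v in model.state_dict().items()}

def magnitude_prune_masks(model, sparsity=0.2):
    """Binary masks dropping the globally smallest-magnitude conv/linear weights."""
    prunable = [m for m in model.modules()
                if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    all_weights = torch.cat([m.weight.detach().abs().flatten() for m in prunable])
    # Global threshold: the k-th smallest absolute weight across all layers.
    k = max(1, int(sparsity * all_weights.numel()))
    threshold = all_weights.kthvalue(k).values
    return {name: (m.weight.detach().abs() > threshold).float()
            for name, m in model.named_modules()
            if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))}

def apply_masks_and_rewind(model, masks, pretrained_state):
    """Zero out pruned weights and rewind survivors to their pretrained values."""
    with torch.no_grad():
        for name, m in model.named_modules():
            if name in masks:
                m.weight.copy_(pretrained_state[name + ".weight"] * masks[name])

masks = magnitude_prune_masks(model, sparsity=0.2)
apply_masks_and_rewind(model, masks, pretrained_state)
# Fine-tune `model` on the target image subset from here, re-applying the
# masks after each optimizer step so pruned weights stay at zero.
```

If the hypothesis carries over, the sparse rewound subnetwork should fine-tune faster and with fewer examples than the dense pretrained model, but that’s speculation on my part.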
However, the above observation seems pretty obvious, so I’d expect the authors to have discussed it if it held up. Let me know what you think.