New research can reduce the size of your neural net in a super easy way
The future of deep learning points towards running algorithms on more compact devices, and any improvement in this space makes for a big leap in the usability of AI.
If a Raspberry Pi could run large neural networks, then artificial intelligence could be deployed in a lot more places.
Recent research in the field of economising AI has led to a surprisingly easy solution to reduce the size of large neural networks. It’s so simple, it could fit in a tweet:
1. Train the neural network to completion.
2. Globally prune the 20% of weights with the lowest magnitudes.
3. Retrain with learning-rate rewinding for the original training time.
4. Iteratively repeat steps 2 and 3 until the desired sparsity is reached.
Keep repeating this procedure and you can get the model as tiny as you want, though it’s pretty certain you’ll lose some accuracy along the way. Since each round removes 20% of the weights that remain, the fraction left after k rounds is 0.8^k — ten rounds, for example, leaves roughly 11% of the original weights.
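To make the recipe concrete, here’s a minimal sketch of the loop in PyTorch, using the built-in `torch.nn.utils.prune` utilities. The `train` function (one full training run that restarts the original learning-rate schedule from the beginning) is an assumed placeholder, not part of the published code:

```python
import torch
import torch.nn.utils.prune as prune

def prunable_parameters(model):
    # Gather the weight tensors of every conv/linear layer so that
    # pruning is applied globally, across the whole network at once.
    return [
        (module, "weight")
        for module in model.modules()
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear))
    ]

def iterative_magnitude_pruning(model, train, rounds=5, amount=0.2):
    train(model)  # Step 1: train the network to completion.
    for _ in range(rounds):
        # Step 2: globally prune the 20% of remaining weights
        # with the lowest magnitudes.
        prune.global_unstructured(
            prunable_parameters(model),
            pruning_method=prune.L1Unstructured,
            amount=amount,
        )
        # Step 3: retrain with the learning rate rewound to the start
        # of its original schedule, for the original training time.
        train(model)
    # Steps 2 and 3 have now been repeated `rounds` times (step 4).
    return model
```

Note that `prune.global_unstructured` keeps pruned weights as a mask on top of the original tensors; once you’re happy with the sparsity, `prune.remove(module, "weight")` makes it permanent. Learning-rate rewinding here simply means `train` restarts its schedule from the beginning each round instead of fine-tuning at the final, low rate.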
This line of research grew out of an ICLR paper from last year (Frankle and Carbin’s Lottery Ticket Hypothesis), which showed that a deep neural network could perform just as well with only a tenth of its connections, if the right subnetwork could be found during training.
The timing of this finding is apt, as computational requirements are hitting new limits. Yes, you can send a model to train in the cloud, but for seriously big networks the training time, infrastructure, and energy usage all add up, so more efficient methods are desired because they’re simply easier to handle and manage.
Bigger AI models are more difficult to train and to use, so smaller models are preferred.
Following this desire for compression, pruning algorithms came back into the picture after the success of the ImageNet competition. Higher-performing models were getting bigger and bigger, and many researchers proposed techniques to keep them small.
Song Han of MIT developed a pruning algorithm for neural networks called AMC (AutoML for Model Compression), which removes redundant neurons and connections, after which the model is retrained to recover its initial accuracy. Frankle took this method and developed it further by rewinding the pruned model to its initial weights and retraining it at its faster, initial rate. Finally, in the recent ICLR study behind the recipe above, the researchers found that the model could simply be rewound to its early training rate, without playing with any parameters or weights at all.
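The difference between these three approaches is easiest to see side by side. Here’s a hypothetical sketch in plain Python, where `train` (the original run with its full learning-rate schedule), `fine_tune` (a short run at the final, low rate), and `early_weights` (a checkpoint saved early in training) are all assumed placeholders rather than anything from the papers:

```python
def retrain_fine_tuning(model, fine_tune):
    # AMC-style: keep the surviving trained weights and fine-tune
    # briefly at the final, low learning rate.
    fine_tune(model)

def retrain_weight_rewinding(model, early_weights, train):
    # Frankle's rewinding: reset the surviving weights to their
    # early-training values, then rerun the original schedule.
    model.load_state_dict(early_weights)
    train(model)

def retrain_lr_rewinding(model, train):
    # Learning-rate rewinding: keep the trained weights untouched and
    # only rewind the learning-rate schedule back to the start.
    train(model)
```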
Generally, as a model gets smaller, its accuracy gets worse; this proposed method, however, performs better than both Han’s AMC and Frankle’s rewinding method.
Now, it’s unclear why this method works as well as it does, but its simplicity makes it easy to implement, and it doesn’t require any time-consuming tuning. Frankle says: “It’s clear, generic, and drop-dead simple.”
Model compression, and the broader effort of economising machine learning algorithms, is an important field where we can make further gains. Leaving models too large reduces their applicability and usability: sure, you can keep your algorithm sitting behind an API in the cloud, but there are so many constraints on keeping it local.
For most industries, models are often limited in their usability because they may be too big or too opaque. The ability to discern why a model works so well will help us make not only better models, but also more efficient ones.
Neural nets are so big because we want the model to naturally develop its own connections, driven by the data. It’s hard for a human to understand these connections, but even without that understanding, pruning lets the model chop out the useless ones.
The golden nugget would be a model that can reason: a neural network which trains its connections based on logic, thereby reducing the training time and final model size. However, we’re some time away from having an AI that controls the training of AI.
Thanks for reading, and please let me know if you have any questions!