Value of "engineering" in research
In my area of study, the intersection of character animation and deep learning, there is a notion that some papers are “novel”, “science”, or “research”, while others are just “engineering” papers. The boundary between science and engineering varies from person to person. For our purposes, let us say research is about proposing new methods or components, for example, using Transformers to model character motion, and engineering is about “doing for a dollar what any fool can do for two”, for instance, implementation or code-level optimizations.
Many academic people view engineering as secondary and sometimes outright unnecessary. I have had people tell me that an 8x improvement in training time is not worth mentioning in a paper. Unfortunately, this could not be further from the truth, and the antagonistic sentiment is hurting the research community. Engineering is not just about shaving a few seconds here and there or saving a few dollars on AWS or Azure. It is arguably at the centre of deep learning.
While there are many metrics for measuring cost, time is perhaps the only universal quantity. From graduate students to research scientists, academia to industry, reducing experimentation time translates to earlier graduation or faster time to market. To that end, I will focus on engineering efforts that reduce the training time of deep learning models and highlight why such work is critical to machine learning.
“Engineering” in Machine Learning
The core of deep learning is a simple algorithm that can efficiently process large amounts of data (unless you subscribe to the other perspective). The deep learning revolution of the last decade, since AlexNet, is epitomized by the use of hardware accelerators to train wider and deeper models. While AlexNet is certainly more than just an “engineering” paper, it is arguable that the engineering aspect of AlexNet – the implementation of neural networks in CUDA – has had a broader impact on machine learning research.
Engineering efforts toward reducing the training time, e.g., code-level optimizations, should be celebrated. A 10% improvement in the training time of deep learning algorithms is more than just saving two hours per day of an individual’s time. In some cases, it can increase the research output of the entire machine learning community by 10%. And in the best scenario, such as in the case of AlexNet, it can attract more talent to an area or an idea and further accelerate progress.
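To make "code-level optimizations" concrete, here is a minimal, hypothetical example of the kind of change I mean: the same pairwise-distance computation written as a Python loop and as a vectorized NumPy expression. The function names and the task are my own illustration, not taken from any paper; the point is only that an implementation choice, with no change to the method, can shift running time by orders of magnitude.

```python
import numpy as np

def pairwise_sq_dist_loop(x):
    """Squared Euclidean distances between rows, computed one pair at a time."""
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = x[i] - x[j]
            d[i, j] = diff @ diff
    return d

def pairwise_sq_dist_vec(x):
    """The same distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b,
    expressed with broadcasting and one matrix multiplication."""
    sq = (x ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)

x = np.random.default_rng(0).normal(size=(100, 16))
assert np.allclose(pairwise_sq_dist_loop(x), pairwise_sq_dist_vec(x))
```

Both functions compute the same matrix, yet the vectorized version dispatches the work to optimized BLAS routines instead of the Python interpreter. Multiplied across every experiment a lab runs, that difference is exactly the kind of engineering contribution worth reporting.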
Another example of engineering in deep learning is the Transformer architecture. The Transformer was designed as a replacement for recurrent neural networks in sequence modelling tasks. From the paper, the architecture design is very clearly motivated by engineering considerations, e.g., “this inherently sequential nature [of RNN] precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples.” Another example concerns the implementation of the attention function, e.g., “while the two [attention functions] are similar in theoretical complexity, dot-product attention is much faster and more space-efficient in practice, since it can be implemented using highly optimized matrix multiplication code.”
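The quote about matrix multiplication is easy to see in code. Below is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, for a single head without masking or batching; everything outside the softmax is a plain matrix multiplication, which is exactly what lets it run on highly optimized GEMM kernels.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(q @ k.T / sqrt(d_k)) @ v.
    q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # (n_q, n_k), one matmul
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                            # (n_q, d_v), one more matmul

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
out = scaled_dot_product_attention(q, k, v)
assert out.shape == (4, 8)
```

Contrast this with an RNN, where step t cannot start until step t-1 has finished: here, all query positions are computed in a single pair of matrix products, which is the parallelism the Transformer authors were engineering for.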
The popularity of Transformers today is at least partially thanks to this engineering. If recurrent neural networks were ten times faster than Transformers, we would probably see RNN-based models dominating deep learning research today.
What can we do?
We should normalize engineering in research papers. Whether an approach can be implemented efficiently can be as important as its current effectiveness, and it can indicate how well the method will scale in the future. Every paper should document the training time of the proposed approach or compare its implementation efficiency with existing methods. A paper should be evaluated, especially during submission, on the combination of its novelty, effectiveness, and engineering.