Publication: Comparatively Studying Modern Optimizers Capability for Fitting Vision Transformers
Date
2024
Publisher
Springer Science and Business Media Deutschland GmbH
Abstract
Transformer architectures have made great strides in both research and industry, seeing wide adoption thanks to their versatility and generality. These qualities, combined with the availability of internet-scale datasets, open the path to deep learning systems that can target many modalities and several tasks within each modality. Over the years, many optimization algorithms have been proposed and used to fit deep learning models. Although numerous comparative assessments have analyzed and selected the best optimizers for architectures that predate Transformers, the literature lacks an equally extensive assessment for optimizing Transformer-based deep learning models. In this paper, we investigated modern, recently introduced deep learning optimizers and applied a comparative assessment to multiple Transformer architectures implemented for the task of image classification. Our comparative study discovered experimentally that the novel LION optimizer provided the best performance on the target task and datasets, showing that algorithmically discovered optimizers can compete with and surpass the handcrafted optimization schemes normally used to fit Transformer architectures.
