
ActNAS: Generating Efficient YOLO Models with Mixed Activation Functions

Written by Deeplite team | Nov 21, 2024 5:28:51 PM

This blog is a summary of the research paper Sah, Sudakar, et al., "ActNAS: Generating Efficient YOLO Models using Activation NAS," accepted at the NeurIPS 2024 FITML Workshop.

 

In the ever-evolving field of computer vision, the YOLO family of models has set benchmarks for real-time object detection. However, the quest for higher accuracy often leads to increased model complexity and latency, posing challenges for deployment on edge devices. To address these challenges, Deeplite has introduced a novel approach called Activation NAS (ActNAS), which optimizes YOLO models (and other CNN models) by leveraging mixed activation functions tailored to specific hardware.

Activation Functions

Activation functions are crucial in neural networks: they introduce non-linearity, enabling the network to learn complex patterns. Commonly used functions such as ReLU (Rectified Linear Unit) and SiLU (Sigmoid Linear Unit) each have strengths and weaknesses. ReLU is computationally cheap but can cost accuracy, while SiLU tends to deliver higher accuracy at the price of speed. Traditionally, a single activation function is used across all layers of a model, but this uniformity can limit performance, especially on resource-constrained edge devices.
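For intuition, here is a minimal PyTorch sketch (ours, not from the paper) contrasting the two: ReLU is a cheap elementwise clamp, while SiLU multiplies each input by its sigmoid, costing an extra transcendental per element.

```python
# Minimal sketch (assumes PyTorch): the two activations compared directly.
import torch
import torch.nn as nn

x = torch.randn(1, 64, 80, 80)  # a typical YOLO feature-map shape

relu, silu = nn.ReLU(), nn.SiLU()

# ReLU(x) = max(0, x): piecewise linear, very cheap on most hardware.
assert torch.allclose(relu(x), x.clamp(min=0))

# SiLU(x) = x * sigmoid(x): smooth, but needs a sigmoid per element.
assert torch.allclose(silu(x), x * torch.sigmoid(x))
```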

Introducing ActNAS

ActNAS uses Hardware-Aware Neural Architecture Search (HA-NAS) to design YOLO models with mixed activation functions. The method selects each layer's activation function under the constraints of the target hardware, balancing accuracy, latency, and memory usage. The result is a model that runs significantly faster, uses less memory, or both, with only a marginal drop in accuracy.

 

Figure: Layer-wise activation replacement. (a) Replace the first SiLU with ReLU; (b) replace the second activation; (c) replace the third activation; and so on.
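
A hedged PyTorch sketch of this per-layer walk (the helper names here are ours, not the paper's API): each candidate keeps the baseline activations everywhere except one layer, which is swapped out.

```python
# Hedged sketch (assumes PyTorch): generate one candidate model per
# activation layer by swapping that layer's nn.SiLU for nn.ReLU.
import copy
import torch.nn as nn

def silu_layer_names(model: nn.Module) -> list[str]:
    """Names of all SiLU activation modules in the model."""
    return [name for name, m in model.named_modules() if isinstance(m, nn.SiLU)]

def swap_activation(model: nn.Module, layer_name: str, new_act: nn.Module) -> nn.Module:
    """Return a deep copy of `model` with the module at `layer_name` replaced."""
    candidate = copy.deepcopy(model)
    parent_name, _, child_name = layer_name.rpartition(".")
    parent = candidate.get_submodule(parent_name) if parent_name else candidate
    setattr(parent, child_name, new_act)
    return candidate

# One candidate per panel of the figure: swap the 1st, then the 2nd, ...
# candidates = [swap_activation(model, n, nn.ReLU()) for n in silu_layer_names(model)]
```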

 

Methodology & Key Findings

We conducted extensive experiments with YOLOv5n and YOLOv8m models, integrating various activation functions such as ReLU, SiLU, and HardSwish. We built a search space of candidate models by systematically replacing the activation in each layer and evaluated their performance on different edge devices, including CPUs, GPUs, and NPUs.
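
To make the evaluation step concrete, here is a rough, illustrative latency probe (not the paper's benchmarking harness, which measures on the actual CPU, GPU, and NPU targets):

```python
# Illustrative only: wall-clock latency of one candidate on the current host.
import time
import torch

@torch.no_grad()
def measure_latency_ms(model, input_shape=(1, 3, 640, 640), warmup=10, runs=50):
    model.eval()
    x = torch.randn(*input_shape)
    for _ in range(warmup):        # warm up allocator, caches, lazy init
        model(x)
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs * 1e3  # ms per forward pass

# The search then pairs each candidate's measured latency (and memory) with
# its validation mAP and keeps the per-layer activation mix that best fits
# the target device's constraints.
```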

Key findings include:

  • Mixed activations can reduce latency by 30–70%
  • Mixed activations have minimal impact on accuracy
  • ActNAS-generated models showed up to 1.67× faster inference
  • ActNAS-generated models showed 64.15% less memory usage

Practical Implications

The practical implications of ActNAS are profound. By optimizing YOLO models for specific hardware, ActNAS enables deployment of the best-performing object detection model for each target edge device. This opens new possibilities for real-time applications in areas such as autonomous vehicles, robotics, and mobile devices, where efficient processing is critical.

Conclusion

ActNAS represents a significant advancement in the optimization of YOLO models for edge devices. By leveraging mixed activation functions and hardware-aware design, it addresses the critical challenge of balancing accuracy and efficiency. This innovative approach not only enhances the performance of YOLO models but also paves the way for more practical and scalable applications of computer vision technologies in resource-constrained environments.

As the field of computer vision continues to evolve, innovations like ActNAS will play a crucial role in making advanced technologies more accessible and effective across various domains.

 

Interested in finding out more about leveraging mixed activation functions for your edge AI application?

Contact us at info@deeplite.com!
