Easily train or fine-tune SOTA computer vision models with one open-source training library


SuperGradients

Introduction

Welcome to SuperGradients, a free, open-source training library for PyTorch-based deep learning models. SuperGradients allows you to train or fine-tune SOTA pre-trained models for the most common computer vision tasks with just one training library. We currently support object detection, image classification, and semantic segmentation on images and videos.

Docs and full user guide

Why use SuperGradients?

Built-in SOTA Models

Easily load and fine-tune production-ready, pre-trained SOTA models that incorporate best practices and validated hyper-parameters for achieving best-in-class accuracy.

Easily Reproduce our Results

Why do all the grunt work if we have already done it for you? Leverage tested and proven recipes and code examples for a wide range of computer vision models, created by our team of deep learning experts. Easily configure your own hyper-parameters, or use our plug-and-play settings for training, dataset, and architecture.

Production Readiness and Ease of Integration

All SuperGradients models are production-ready: they are compatible with deployment tools such as TensorRT (NVIDIA) and OpenVINO (Intel) and can easily be taken into production. With a few lines of code you can integrate the models into your codebase.
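
As a rough illustration of that deployment path, the sketch below exports a trained model to ONNX, a common intermediate format consumed by TensorRT and OpenVINO. It uses only the standard torch.onnx.export call rather than any SuperGradients-specific helper, and the input resolution and file name are placeholder assumptions.

```python
import torch

# Any trained PyTorch model (including one loaded from SuperGradients) is a regular
# nn.Module, so it can be exported with the standard torch.onnx pipeline.
def export_to_onnx(model: torch.nn.Module, onnx_path: str = "model.onnx") -> None:
    model.eval()
    dummy_input = torch.randn(1, 3, 224, 224)  # placeholder: adjust to your input resolution
    torch.onnx.export(
        model,
        dummy_input,
        onnx_path,
        input_names=["input"],
        output_names=["output"],
        opset_version=11,
    )
```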

Documentation

Check SuperGradients Docs for full documentation, user guide, and examples.


Table of Contents

See Table

Getting Started

Quick Start Notebook - Classification

Get started with our image classification quick start notebook on Google Colab, using free GPU hardware.

Classification Quick Start in Google Colab · Download notebook · View source on GitHub
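
For orientation before opening the notebook, here is a minimal sketch of loading a pretrained classification model and running a single prediction. It assumes the models.get() interface from the SuperGradients documentation (model-name strings and argument names may differ between versions), and the image path is a placeholder.

```python
import torch
from PIL import Image
from torchvision import transforms

# Assumption: super_gradients.training.models.get() loads a pretrained model by name,
# as described in the SuperGradients docs; names/arguments may vary by version.
from super_gradients.training import models

model = models.get("resnet18", pretrained_weights="imagenet")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg")       # placeholder image path
batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)
print("Predicted ImageNet class index:", int(logits.argmax(dim=1)))
```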


Quick Start Notebook - Semantic Segmentation

Get started with our semantic segmentation quick start notebook on Google Colab, using free GPU hardware.

Segmentation Quick Start in Google Colab · Download notebook · View source on GitHub


Transfer Learning

Transfer Learning with SG Notebook - Semantic Segmentation

Learn more about SuperGradients' transfer learning and fine-tuning capabilities with our example notebook, which fine-tunes a Cityscapes pre-trained RegSeg48 model on a subset of the Supervisely dataset, on Google Colab using free GPU hardware.

Segmentation Transfer Learning in Google Colab · Download notebook · View source on GitHub
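
The notebook covers the full workflow; as a compact sketch, the snippet below loads a Cityscapes-pretrained RegSeg48 with a re-initialized head and runs a plain-PyTorch fine-tuning pass. The model-name string, argument names, class count, and dataloader are assumptions to adapt to your own dataset and SuperGradients version.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

# Assumption: models.get() can load Cityscapes-pretrained weights while re-initializing the
# prediction head for a new number of classes; check the SuperGradients docs for your version.
from super_gradients.training import models

NUM_CLASSES = 4  # placeholder: number of classes in your target segmentation dataset
model = models.get("regseg48", num_classes=NUM_CLASSES, pretrained_weights="cityscapes")

criterion = nn.CrossEntropyLoss()  # expects logits (N, C, H, W) and masks (N, H, W)
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def finetune_one_epoch(model: nn.Module, loader: DataLoader) -> None:
    """One plain-PyTorch fine-tuning pass; `loader` yields (image, mask) batches."""
    model.train()
    for images, masks in loader:
        optimizer.zero_grad()
        outputs = model(images)
        if isinstance(outputs, (tuple, list)):  # some segmentation heads return aux outputs
            outputs = outputs[0]
        loss = criterion(outputs, masks)
        loss.backward()
        optimizer.step()
```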


Knowledge Distillation Training

Knowledge Distillation Training Quick Start with SG Notebook - ResNet18 example

Knowledge distillation is a training technique that uses a large model (the teacher) to improve the performance of a smaller model (the student). Learn more about knowledge distillation training in SuperGradients with our example notebook, which distills a pre-trained BEiT-Base teacher into a ResNet18 student on CIFAR10, on Google Colab using free GPU hardware.

KD Training in Google Colab · Download notebook · View source on GitHub
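
Independent of the SuperGradients KD trainer used in the notebook, the underlying loss can be illustrated in a few lines of plain PyTorch: the student is trained on a weighted blend of the hard-label cross-entropy and the KL divergence to the teacher's temperature-softened predictions. The function below is a generic sketch of that idea, not the SuperGradients API.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            labels: torch.Tensor,
            temperature: float = 4.0,
            alpha: float = 0.5) -> torch.Tensor:
    """Classic knowledge-distillation loss: weighted sum of hard-label cross-entropy and
    KL divergence between temperature-softened teacher and student distributions."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradients stay comparable across temperatures
    return alpha * hard + (1.0 - alpha) * soft
```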


Installation Methods

Prerequisites

General requirements
  • Python 3.7, 3.8 or 3.9 installed.

  • torch>=1.9.0

    • https://pytorch.org/get-started/locally/

  • The Python packages specified in requirements.txt.

  • To train on NVIDIA GPUs: an NVIDIA driver and CUDA toolkit compatible with your PyTorch build.

Quick Installation

Install the stable version from PyPI

See on PyPI

pip install super-gradients

That’s it!

Install using GitHub
pip install git+https://github.com/Deci-AI/super-gradients.git@stable
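
After either installation method, a quick sanity check such as the one below should run without errors. The model name and class count are arbitrary examples, assuming the models.get() interface from the SuperGradients documentation.

```python
# Quick post-install sanity check: import the package and instantiate a small model.
import super_gradients
from super_gradients.training import models

model = models.get("resnet18", num_classes=10)  # example model with a randomly initialized head
print(type(model))
```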

Computer Vision Models - Pretrained Checkpoints

Pretrained Classification PyTorch Checkpoints

| Model | Dataset | Resolution | Top-1 | Top-5 | Latency (HW)* T4 | Latency (Production)** T4 | Latency (HW)* Jetson Xavier NX | Latency (Production)** Jetson Xavier NX | Latency Cascade Lake |
|---|---|---|---|---|---|---|---|---|---|
| ViT base | ImageNet21K | 224x224 | 84.15 | - | 4.46ms | 4.60ms | - * | - | 57.22ms |
| ViT large | ImageNet21K | 224x224 | 85.64 | - | 12.81ms | 13.19ms | - * | - | 187.22ms |
| BEiT | ImageNet21K | 224x224 | - | - | - | - | - * | - | - |
| EfficientNet B0 | ImageNet | 224x224 | 77.62 | 93.49 | 0.93ms | 1.38ms | - * | - | 3.44ms |
| RegNet Y200 | ImageNet | 224x224 | 70.88 | 89.35 | 0.63ms | 1.08ms | 2.16ms | 2.47ms | 2.06ms |
| RegNet Y400 | ImageNet | 224x224 | 74.74 | 91.46 | 0.80ms | 1.25ms | 2.62ms | 2.91ms | 2.87ms |
| RegNet Y600 | ImageNet | 224x224 | 76.18 | 92.34 | 0.77ms | 1.22ms | 2.64ms | 2.93ms | 2.39ms |
| RegNet Y800 | ImageNet | 224x224 | 77.07 | 93.26 | 0.74ms | 1.19ms | 2.77ms | 3.04ms | 2.81ms |
| ResNet 18 | ImageNet | 224x224 | 70.6 | 89.64 | 0.52ms | 0.95ms | 2.01ms | 2.30ms | 4.56ms |
| ResNet 34 | ImageNet | 224x224 | 74.13 | 91.7 | 0.92ms | 1.34ms | 3.57ms | 3.87ms | 7.64ms |
| ResNet 50 | ImageNet | 224x224 | 81.91 | 93.0 | 1.03ms | 1.44ms | 4.78ms | 5.10ms | 9.25ms |
| MobileNet V3_large (150 epochs) | ImageNet | 224x224 | 73.79 | 91.54 | 0.67ms | 1.11ms | 2.42ms | 2.71ms | 1.76ms |
| MobileNet V3_large (300 epochs) | ImageNet | 224x224 | 74.52 | 91.92 | 0.67ms | 1.11ms | 2.42ms | 2.71ms | 1.76ms |
| MobileNet V3_small | ImageNet | 224x224 | 67.45 | 87.47 | 0.55ms | 0.96ms | 2.01ms * | 2.35ms | 1.06ms |
| MobileNet V2_w1 | ImageNet | 224x224 | 73.08 | 91.1 | 0.46ms | 0.89ms | 1.65ms * | 1.90ms | 1.56ms |

NOTE:

  • Latency (HW)* - Hardware performance (not including IO)

  • Latency (Production)** - Production Performance (including IO)

  • Performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1

  • Performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1
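
The figures above were produced with TensorRT and OpenVINO as described in the notes. If you only need a ballpark number on your own hardware, a plain-PyTorch timing loop like the hedged sketch below (warm-up plus synchronized timing, FP16, batch size 1) gives a rough approximation, though it will not match optimized TensorRT latencies.

```python
import time
import torch

def measure_latency_ms(model: torch.nn.Module,
                       input_shape=(1, 3, 224, 224),
                       warmup: int = 20,
                       iters: int = 100) -> float:
    """Rough latency estimate in milliseconds (PyTorch eager mode, FP16 on GPU, batch size 1).
    This only approximates the methodology in the notes above; it does not use TensorRT/OpenVINO."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    dummy = torch.randn(*input_shape, device=device)
    if device == "cuda":
        model = model.half()
        dummy = dummy.half()
    with torch.no_grad():
        for _ in range(warmup):  # warm-up iterations are excluded from timing
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1000.0
```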

Pretrained Object Detection PyTorch Checkpoints

| Model | Dataset | Resolution | mAP val 0.5:0.95 | Latency (HW)* T4 | Latency (Production)** T4 | Latency (HW)* Jetson Xavier NX | Latency (Production)** Jetson Xavier NX | Latency Cascade Lake |
|---|---|---|---|---|---|---|---|---|
| SSD lite MobileNet v2 | COCO | 320x320 | 21.5 | 0.77ms | 1.40ms | 5.28ms | 6.44ms | 4.13ms |
| SSD lite MobileNet v1 | COCO | 320x320 | 24.3 | 1.55ms | 2.84ms | 8.07ms | 9.14ms | 22.76ms |
| YOLOX nano | COCO | 640x640 | 26.77 | 2.47ms | 4.09ms | 11.49ms | 12.97ms | - |
| YOLOX tiny | COCO | 640x640 | 37.18 | 3.16ms | 4.61ms | 15.23ms | 19.24ms | - |
| YOLOX small | COCO | 640x640 | 40.47 | 3.58ms | 4.94ms | 18.88ms | 22.48ms | - |
| YOLOX medium | COCO | 640x640 | 46.4 | 6.40ms | 7.65ms | 39.22ms | 44.5ms | - |
| YOLOX large | COCO | 640x640 | 49.25 | 10.07ms | 11.12ms | 68.73ms | 77.01ms | - |

NOTE:

  • Latency (HW)* - Hardware performance (not including IO)

  • Latency (Production)** - Production Performance (including IO)

  • Latency performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1

  • Latency performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1

Pretrained Semantic Segmentation PyTorch Checkpoints

| Model | Dataset | Resolution | mIoU | Latency (b1) T4 | Latency (b1) T4, including IO |
|---|---|---|---|---|---|
| DDRNet 23 | Cityscapes | 1024x2048 | 80.26 | 7.62ms | 25.94ms |
| DDRNet 23 slim | Cityscapes | 1024x2048 | 78.01 | 3.56ms | 22.80ms |
| STDC 1-Seg50 | Cityscapes | 512x1024 | 75.07 | 2.83ms | 12.57ms |
| STDC 1-Seg75 | Cityscapes | 768x1536 | 77.8 | 5.71ms | 26.70ms |
| STDC 2-Seg50 | Cityscapes | 512x1024 | 75.79 | 3.74ms | 13.89ms |
| STDC 2-Seg75 | Cityscapes | 768x1536 | 78.93 | 7.35ms | 28.18ms |
| RegSeg (exp48) | Cityscapes | 1024x2048 | 78.15 | 13.09ms | 41.88ms |
| Larger RegSeg (exp53) | Cityscapes | 1024x2048 | 79.2 | 24.82ms | 51.87ms |
| ShelfNet LW 34 | COCO Segmentation (21 classes from PASCAL, including background) | 512x512 | 65.1 | - | - |

NOTE: Latency measured on a T4 GPU with TensorRT, using FP16 precision and batch size 1; the first latency column excludes IO, the second includes it.

Implemented Model Architectures

Image Classification

Object Detection

Semantic Segmentation

Contributing

To learn about making a contribution to SuperGradients, please see our Contribution page.

Our awesome contributors:


Made with contrib.rocks.

Citation

If you are using the SuperGradients library or its benchmarks in your research, please cite the SuperGradients deep learning training library.

Community

If you want to be part of the growing SuperGradients community, hear about all the exciting news and updates, need help, want to request advanced features, or want to file a bug or issue report, we would love to welcome you aboard!

  • Slack is the place to ask questions about SuperGradients and get support. Click here to join our Slack.

  • To report a bug, file an issue on GitHub.

  • You can also join the community mailing list to ask questions about the project and receive announcements.

  • For a short meeting with the SuperGradients PM, use this link and choose your preferred time.

License

This project is released under the Apache 2.0 license.


Deci Lab

Deci Lab supports all common frameworks and hardware, from Intel CPUs to NVIDIA GPUs and Jetson devices.

You can enjoy immediate improvement in throughput, latency, and memory with the Deci Lab. It optimizes deep learning models using best-of-breed technologies, such as quantization and graph compilers.

Get a complete benchmark of your models’ performance on different hardware and batch sizes in a single interface. Invite co-workers to collaborate on models and communicate your progress.

Sign up for Deci Lab for free here