Training package
super_gradients.training module
- class super_gradients.training.Trainer(experiment_name: str, device: Optional[str] = None, multi_gpu: Union[MultiGPUMode, str] = MultiGPUMode.OFF, ckpt_root_dir: Optional[str] = None)[source]
Bases:
object
SuperGradients Trainer - Base Class for SG Models
- train(max_epochs: int, initial_epoch: int, save_model: bool)[source]
The main function used for training, hyperparameter updating, logging, etc.
- predict(idx: int)
Returns the predictions and label of the current inputs.
- test(epoch: int, idx: int, save: bool):
Returns the test loss, accuracy and runtime.
- classmethod train_from_config(cfg: Union[DictConfig, dict]) Tuple[Module, Tuple] [source]
Trains according to cfg recipe configuration.
@param cfg: The parsed DictConfig from yaml recipe files, or a dictionary
@return: the model and the output of trainer.train(…) (i.e. results tuple)
- classmethod resume_experiment(experiment_name: str, ckpt_root_dir: Optional[str] = None) None [source]
Resume a training that was run using our recipes.
- Parameters
experiment_name – Name of the experiment to resume
ckpt_root_dir – Directory including the checkpoints
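A minimal usage sketch (assuming an experiment named "my_experiment_name" already has checkpoints under the default checkpoints directory):

    from super_gradients.training import Trainer

    # Picks up the latest checkpoint under CKPT_ROOT_DIR/my_experiment_name/ and continues training.
    Trainer.resume_experiment(experiment_name="my_experiment_name")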
- classmethod evaluate_from_recipe(cfg: DictConfig) None [source]
Evaluate according to a cfg recipe configuration.
- Note: This script does NOT run training, only validation.
Please make sure that the config refers to a PRETRAINED MODEL, either from one of your checkpoints or from pretrained weights from the model zoo.
- Parameters
cfg – The parsed DictConfig from yaml recipe files or a dictionary
- classmethod evaluate_checkpoint(experiment_name: str, ckpt_name: str = 'ckpt_latest.pth', ckpt_root_dir: Optional[str] = None) None [source]
Evaluate a checkpoint resulting from one of your previous experiments, using the same parameters (dataset, valid_metrics, …) as used during the training of the experiment.
Note
The parameters will be unchanged even if the recipe used for that experiment was changed since then. This is to ensure that validation of the experiment will remain exactly the same as during training.
- Example, evaluate the checkpoint "average_model.pth" from experiment "my_experiment_name":
>> evaluate_checkpoint(experiment_name="my_experiment_name", ckpt_name="average_model.pth")
- Parameters
experiment_name – Name of the experiment to validate
ckpt_name – Name of the checkpoint to test (“ckpt_latest.pth”, “average_model.pth” or “ckpt_best.pth” for instance)
ckpt_root_dir – Directory including the checkpoints
- train(model: Module, training_params: Optional[dict] = None, train_loader: Optional[DataLoader] = None, valid_loader: Optional[DataLoader] = None, additional_configs_to_log: Optional[Dict] = None)[source]
train - Trains the Model
- IMPORTANT NOTE: Additional batch items can be added as an optional third item in the tuple returned by
the data loaders, as a dictionary. The phase context will hold these additional items under attributes named after the keys of this dictionary, so they can be accessed through phase callbacks.
- param additional_configs_to_log
Dict, dictionary containing configs that will be added to the training’s sg_logger. Format should be {“Config_title_1”: {…}, “Config_title_2”:{..}}.
- param model
torch.nn.Module, model to train.
- param train_loader
Dataloader for train set.
- param valid_loader
Dataloader for validation.
- param training_params
resume : bool (default=False)
- Whether to continue training from ckpt with the same experiment name
(i.e resume from CKPT_ROOT_DIR/EXPERIMENT_NAME/CKPT_NAME)
ckpt_name : str (default=ckpt_latest.pth)
- The checkpoint (.pth file) filename in CKPT_ROOT_DIR/EXPERIMENT_NAME/ to use when resume=True and
resume_path=None
resume_path: str (default=None)
Explicit checkpoint path (.pth file) to use to resume training.
max_epochs : int
Number of epochs to run training.
lr_updates : list(int)
List of fixed epoch numbers to perform learning rate updates when lr_mode=’step’.
lr_decay_factor : float
Decay factor to apply to the learning rate at each update when lr_mode=’step’.
lr_mode : str
Learning rate scheduling policy, one of [‘step’,’poly’,’cosine’,’function’]. ‘step’ refers to constant updates at epoch numbers passed through lr_updates. ‘cosine’ refers to the Cosine Annealing policy as described in https://arxiv.org/abs/1608.03983. ‘poly’ refers to a polynomial decrease, i.e. at each iteration self.lr = self.initial_lr * pow((1.0 - (current_iter / max_iter)), 0.9). ‘function’ refers to a user-defined learning rate scheduling function, passed through lr_schedule_function.
lr_schedule_function : Union[callable,None]
Learning rate scheduling function to be used when lr_mode is ‘function’.
lr_warmup_epochs : int (default=0)
Number of epochs for learning rate warm up - see https://arxiv.org/pdf/1706.02677.pdf (Section 2.2).
cosine_final_lr_ratio : float (default=0.01)
Final learning rate ratio (only relevant when `lr_mode`=’cosine’). The cosine schedule starts from initial_lr and reaches
initial_lr * cosine_final_lr_ratio in the last epoch.
initial_lr : float
Initial learning rate.
loss : Union[nn.module, str]
Loss function for training. One of SuperGradient’s built in options:
“cross_entropy”: LabelSmoothingCrossEntropyLoss, “mse”: MSELoss, “r_squared_loss”: RSquaredLoss, “detection_loss”: YoLoV3DetectionLoss, “shelfnet_ohem_loss”: ShelfNetOHEMLoss, “shelfnet_se_loss”: ShelfNetSemanticEncodingLoss, “ssd_loss”: SSDLoss,
or user defined nn.module loss function.
IMPORTANT: forward(…) should return a (loss, loss_items) tuple, where loss is the tensor used for backprop (i.e. what your original loss function returns) and loss_items is a tensor of shape (n_items) holding values computed during the forward pass that we wish to log over the entire epoch. For example, the loss itself should always be logged. Another example is a scenario where the computed loss is the sum of a few components we would like to log as separate entries in loss_items.
IMPORTANT: When dealing with external loss classes, to log/monitor the loss_items as described above under specific string names:
- Set a “component_names” property in the loss class (whose instance is passed through train_params)
to be a list of strings of length n_items, whose ith element is the name of the ith entry in loss_items. Then each item will be logged, rendered on tensorboard and “watched” (i.e. used for saving model checkpoints) under <LOSS_CLASS.__name__>”/”<COMPONENT_NAME>. If a single item is returned rather than a tuple, it will be logged under <LOSS_CLASS.__name__>. When there is no such attribute, the items will be named <LOSS_CLASS.__name__>”/”Loss_”<IDX> according to the length of loss_items.
- For example:

    class MyLoss(_Loss):
        def forward(self, inputs, targets):
            total_loss = comp1 + comp2
            loss_items = torch.cat((total_loss.unsqueeze(0), comp1.unsqueeze(0), comp2.unsqueeze(0))).detach()
            return total_loss, loss_items

        @property
        def component_names(self):
            return ["total_loss", "my_1st_component", "my_2nd_component"]

    Trainer.train(...,
                  train_params={"loss": MyLoss(),
                                "metric_to_watch": "MyLoss/my_1st_component"})

- This will write to log and monitor MyLoss/total_loss, MyLoss/my_1st_component, MyLoss/my_2nd_component.
- For example:

    class MyLoss2(_Loss):
        def forward(self, inputs, targets):
            total_loss = comp1 + comp2
            loss_items = torch.cat((total_loss.unsqueeze(0), comp1.unsqueeze(0), comp2.unsqueeze(0))).detach()
            return total_loss, loss_items

    Trainer.train(...,
                  train_params={"loss": MyLoss2(),
                                "metric_to_watch": "MyLoss2/loss_0"})

This will write to log and monitor MyLoss2/loss_0, MyLoss2/loss_1, MyLoss2/loss_2, as they have been named by their positional index in loss_items.
Since running logs will save the loss_items in some internal state, it is recommended that loss_items are detached from their computational graph for memory efficiency.
optimizer : Union[str, torch.optim.Optimizer]
Optimization algorithm. One of [‘Adam’,’SGD’,’RMSProp’] corresponding to the torch.optim optimizer implementations, or any object that implements torch.optim.Optimizer.
criterion_params : dict
Loss function parameters.
optimizer_params : dict
When optimizer is one of [‘Adam’,’SGD’,’RMSProp’], it will be initialized with optimizer_params
(see https://pytorch.org/docs/stable/optim.html for the full list of parameters for each optimizer).
train_metrics_list : list(torchmetrics.Metric)
Metrics to log during training. For more information on torchmetrics see https://torchmetrics.rtfd.io/en/latest/.
valid_metrics_list : list(torchmetrics.Metric)
Metrics to log during validation/testing. For more information on torchmetrics see https://torchmetrics.rtfd.io/en/latest/.
loss_logging_items_names : list(str)
The list of names/titles for the outputs returned from the loss function’s forward pass (reminder: the loss function should return the tuple (loss, loss_items)). These names will be used for logging their values.
metric_to_watch : str (default=”Accuracy”)
The metric according to which the model checkpoint will be saved; it can be set to any of the following:
a metric name (str) of one of the metric objects from the valid_metrics_list
a “metric_name” if some metric in valid_metrics_list has an attribute component_names which is a list referring to the names of each entry in the output metric (torch tensor of size n)
one of “loss_logging_items_names”, i.e. one which corresponds to an item returned during the loss function’s forward pass (see loss docs above).
At the end of each epoch, if a new best metric_to_watch value is achieved, the model’s checkpoint is saved in YOUR_PYTHON_PATH/checkpoints/ckpt_best.pth
greater_metric_to_watch_is_better : bool
- When choosing a model’s checkpoint to be saved, the best achieved model is the one that maximizes
metric_to_watch when this parameter is set to True, and the one that minimizes it otherwise.
ema : bool (default=False)
Whether to use Model Exponential Moving Average (see https://github.com/rwightman/pytorch-image-models ema implementation)
batch_accumulate : int (default=1)
Number of batches to accumulate before every backward pass.
ema_params : dict
Parameters for the ema model.
zero_weight_decay_on_bias_and_bn : bool (default=False)
Whether to apply weight decay on batch normalization parameters or not (ignored when the passed optimizer has already been initialized).
load_opt_params : bool (default=True)
Whether to load the optimizers parameters as well when loading a model’s checkpoint.
run_validation_freq : int (default=1)
- The frequency at which validation is performed during training (i.e. validation is run every
run_validation_freq epochs).
save_model : bool (default=True)
Whether to save the model checkpoints.
silent_mode : bool
Silences the printouts.
mixed_precision : bool
Whether to use mixed precision or not.
save_ckpt_epoch_list : list(int) (default=[])
List of fixed epoch indices the user wishes to save checkpoints in.
average_best_models : bool (default=False)
If set, a snapshot dictionary file and the average model will be saved / updated at every epoch and evaluated only when training is completed. The snapshot file will only be deleted upon completing the training. The snapshot dict will be managed on cpu.
precise_bn : bool (default=False)
Whether to use precise_bn calculation during the training.
precise_bn_batch_size : int (default=None)
The effective batch size we want to calculate the batchnorm on. For example, if we are training a model on 8 gpus, with a batch of 128 on each gpu, a good rule of thumb would be to give it 8192 (ie: effective_batch_size * num_gpus = batch_per_gpu * num_gpus * num_gpus). If precise_bn_batch_size is not provided in the training_params, the latter heuristic will be taken.
seed : int (default=42)
Random seed to be set for torch, numpy, and random. When using DDP each process will have its seed set to seed + rank.
log_installed_packages : bool (default=False)
- When set, the list of all installed packages (and their versions) will be written to the tensorboard
and logfile (useful when trying to reproduce results).
dataset_statistics : bool (default=False)
Enable a statistic analysis of the dataset. If set to True the dataset will be analyzed and a report will be added to the tensorboard along with some sample images from the dataset. Currently only detection datasets are supported for analysis.
sg_logger : Union[AbstractSGLogger, str] (default=base_sg_logger)
Define the SGLogger object for this training process. The SGLogger handles all disk writes, logs, TensorBoard, remote logging and remote storage. By overriding the default base_sg_logger, you can change the storage location, support external monitoring and logging or support remote storage.
sg_logger_params : dict
SGLogger parameters
clip_grad_norm : float
Defines a maximal L2 norm of the gradients. Values which exceed the given value will be clipped
lr_cooldown_epochs : int (default=0)
Number of epochs to cooldown LR (i.e. the last epoch from the scheduling point of view is max_epochs - lr_cooldown_epochs).
pre_prediction_callback : Callable (default=None)
- When not None, this callback will be applied to images and targets, and the returned values will be used
for the forward pass and further computations. Args for this callable should be in the order (inputs, targets, batch_idx), returning modified_inputs, modified_targets.
ckpt_best_name : str (default=’ckpt_best.pth’)
The best checkpoint (according to metric_to_watch) will be saved under this filename in the checkpoints directory.
enable_qat: bool (default=False)
- Adds a QATCallback to the phase callbacks, that triggers quantization aware training starting from
qat_params[“start_epoch”]
qat_params: dict-like object with the following key/values:
start_epoch: int, first epoch to start QAT.
- quant_modules_calib_method: str, One of [percentile, mse, entropy, max]. Statistics method for amax
computation of the quantized modules (default=percentile).
per_channel_quant_modules: bool, whether quant modules should be per channel (default=False).
calibrate: bool, whether to perform calibration (default=False).
calibrated_model_path: str, path to a calibrated checkpoint (default=None).
- calib_data_loader: torch.utils.data.DataLoader, data loader of the calibration dataset. When None,
context.train_loader will be used (default=None).
num_calib_batches: int, number of batches to collect the statistics from.
- percentile: float, percentile value to use when quant_modules_calib_method=’percentile’.
Discarded when other methods are used (default=99.99).
- Returns
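A hedged end-to-end sketch of the flow described above; `my_model` stands for any user-built torch.nn.Module, the metric and dataloader factories are the ones documented elsewhere in this module, and only a subset of the training_params keys is shown:

    from super_gradients.training import Trainer, dataloaders
    from super_gradients.training.metrics import Accuracy

    trainer = Trainer(experiment_name="my_cifar_run")

    trainer.train(
        model=my_model,  # assumed: any torch.nn.Module suitable for CIFAR-10
        training_params={
            "max_epochs": 20,
            "lr_mode": "cosine",
            "initial_lr": 0.1,
            "cosine_final_lr_ratio": 0.01,
            "optimizer": "SGD",
            "optimizer_params": {"momentum": 0.9, "weight_decay": 1e-4},
            "loss": "cross_entropy",
            "train_metrics_list": [Accuracy()],
            "valid_metrics_list": [Accuracy()],
            "metric_to_watch": "Accuracy",
            "greater_metric_to_watch_is_better": True,
        },
        train_loader=dataloaders.cifar10_train(),
        valid_loader=dataloaders.cifar10_val(),
    )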
- property get_arch_params
- property get_structure
- property get_architecture
- property get_module
- test(model: Optional[Module] = None, test_loader: Optional[DataLoader] = None, loss: Optional[_Loss] = None, silent_mode: bool = False, test_metrics_list=None, loss_logging_items_names=None, metrics_progress_verbose=False, test_phase_callbacks=None, use_ema_net=True) tuple [source]
Evaluates the model on given dataloader and metrics. :param model: model to perform test on. When none is given, will try to use self.net (default=None). :param test_loader: dataloader to perform test on. :param test_metrics_list: (list(torchmetrics.Metric)) metrics list for evaluation. :param silent_mode: (bool) controls verbosity :param metrics_progress_verbose: (bool) controls the verbosity of metrics progress (default=False). Slows down the program. :param use_ema_net: (bool) whether to perform test on self.ema_model.ema (when self.ema_model.ema exists,
otherwise self.net will be tested) (default=True)
- Returns
results tuple (tuple) containing the loss items and metric values.
- All of the above args will override Trainer’s corresponding attribute when not equal to None. Then evaluation
is run on self.test_loader with self.test_metrics.
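A short hedged sketch of test(), reusing the trainer, model and metric from the training sketch above and assuming `test_loader` is any matching DataLoader:

    # Returns a results tuple containing the loss items and metric values.
    test_results = trainer.test(
        model=my_model,
        test_loader=test_loader,
        test_metrics_list=[Accuracy()],
    )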
- evaluate(data_loader: DataLoader, metrics: MetricCollection, evaluation_type: EvaluationType, epoch: Optional[int] = None, silent_mode: bool = False, metrics_progress_verbose: bool = False)[source]
Evaluates the model on given dataloader and metrics.
- Parameters
data_loader – dataloader to perform evaluation on
metrics – (MetricCollection) metrics for evaluation
evaluation_type – (EvaluationType) controls which phase callbacks will be used (for example, on batch end, when evaluation_type=EvaluationType.VALIDATION the Phase.VALIDATION_BATCH_END callbacks will be triggered)
epoch – (int) epoch idx
silent_mode – (bool) controls verbosity
metrics_progress_verbose – (bool) controls the verbosity of metrics progress (default=False). Slows down the program significantly.
- Returns
results tuple (tuple) containing the loss items and metric values.
- property get_net
Getter for network. :return: torch.nn.Module, self.net
- set_net(net: Module)[source]
Setter for network.
- Parameters
net – torch.nn.Module, value to set net
- Returns
- class super_gradients.training.KDTrainer(experiment_name: str, device: Optional[str] = None, multi_gpu: Union[MultiGPUMode, str] = MultiGPUMode.OFF, ckpt_root_dir: Optional[str] = None)[source]
Bases:
Trainer
- classmethod train_from_config(cfg: Union[DictConfig, dict]) None [source]
Trains according to cfg recipe configuration.
@param cfg: The parsed DictConfig from yaml recipe files
@return: output of kd_trainer.train(…) (i.e. results tuple)
- train(model: Optional[KDModule] = None, training_params: dict = {}, student: Optional[SgModule] = None, teacher: Optional[Module] = None, kd_architecture: Union[type, str] = 'kd_module', kd_arch_params: dict = {}, run_teacher_on_eval=False, train_loader: Optional[DataLoader] = None, valid_loader: Optional[DataLoader] = None, *args, **kwargs)[source]
Trains the student network (wrapped in KDModule network).
- Parameters
model – KDModule, network to train. When none is given, will initialize KDModule according to kd_architecture, student and teacher (default=None)
training_params – dict, Same as in Trainer.train()
student – SgModule - the student network
teacher – torch.nn.Module - the teacher network
kd_architecture – KDModule architecture to use, currently only ‘kd_module’ is supported (default=’kd_module’).
kd_arch_params – architecture params to pass to the kd_architecture constructor.
run_teacher_on_eval – bool - whether to run self.teacher in eval mode regardless of self.train(mode)
train_loader – Dataloader for train set.
valid_loader – Dataloader for validation.
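A rough sketch of the knowledge-distillation flow, assuming `student_model` (SgModule) and `teacher_model` (torch.nn.Module) are already built, the dataloaders exist, and `kd_training_params` follows the same structure as Trainer.train() training_params (typically with a KD-aware loss such as KDLogitsLoss, documented below):

    from super_gradients.training import KDTrainer

    kd_trainer = KDTrainer(experiment_name="my_kd_run")
    kd_trainer.train(
        student=student_model,              # assumed: the student network
        teacher=teacher_model,              # assumed: the pretrained teacher network
        training_params=kd_training_params,
        train_loader=train_loader,
        valid_loader=valid_loader,
    )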
- class super_gradients.training.MultiGPUMode(value)[source]
Bases:
str
,Enum
- OFF - Single GPU Mode / CPU Mode
- DATA_PARALLEL - Multiple GPUs, Synchronous
- DISTRIBUTED_DATA_PARALLEL - Multiple GPUs, Asynchronous
- OFF = 'Off'
- DATA_PARALLEL = 'DP'
- DISTRIBUTED_DATA_PARALLEL = 'DDP'
- AUTO = 'AUTO'
- class super_gradients.training.StrictLoad(value)[source]
Bases:
Enum
Wrapper for adding more functionality to torch’s strict_load parameter in load_state_dict().
- OFF - Native torch “strict_load = off” behaviour. See nn.Module.load_state_dict() documentation for more details.
- ON - Native torch "strict_load = on" behaviour. See nn.Module.load_state_dict() documentation for more details.
- NO_KEY_MATCHING - Allows the usage of SuperGradient's adapt_checkpoint function, which loads a checkpoint by matching each
layer’s shapes (and bypasses the strict matching of the names of each layer (ie: disregards the state_dict key matching)).
- OFF = False
- ON = True
- NO_KEY_MATCHING = 'no_key_matching'
super_gradients.training.datasets module
- class super_gradients.training.datasets.ListDataset(root, file, sample_loader: ~typing.Callable = <function default_loader>, target_loader: ~typing.Optional[~typing.Callable] = None, collate_fn: ~typing.Optional[~typing.Callable] = None, sample_extensions: tuple = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp'), sample_transform: ~typing.Optional[~typing.Callable] = None, target_transform: ~typing.Optional[~typing.Callable] = None, target_extension='.npy')[source]
Bases:
BaseSgVisionDataset
- ListDataset - A PyTorch Vision Dataset extension that receives a file with the FULL PATH to each of the samples.
The assumption is then that for every sample there is a matching target in the same path but with a different extension, i.e.:
- for the sample paths (that appear in the list file):
/root/dataset/class_x/sample1.png /root/dataset/class_y/sample123.png
- the matching label paths (that DO NOT appear in the list file):
/root/dataset/class_x/sample1.ext /root/dataset/class_y/sample123.ext
- class super_gradients.training.datasets.DirectoryDataSet(root: str, samples_sub_directory: str, targets_sub_directory: str, target_extension: str, sample_loader: ~typing.Callable = <function default_loader>, target_loader: ~typing.Optional[~typing.Callable] = None, collate_fn: ~typing.Optional[~typing.Callable] = None, sample_extensions: tuple = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp'), sample_transform: ~typing.Optional[~typing.Callable] = None, target_transform: ~typing.Optional[~typing.Callable] = None)[source]
Bases:
BaseSgVisionDataset
- DirectoryDataSet - A PyTorch Vision Data Set extension that receives a root Dir and two separate sub directories:
Sub-Directory for Samples
Sub-Directory for Targets
- class super_gradients.training.datasets.SegmentationDataSet(root: str, list_file: str = None, samples_sub_directory: str = None, targets_sub_directory: str = None, cache_labels: bool = False, cache_images: bool = False, collate_fn: Callable = None, target_extension: str = '.png', transforms: Iterable = None)[source]
Bases:
DirectoryDataSet
,ListDataset
- static sample_loader(sample_path: str) PIL.Image.Image [source]
- sample_loader - Loads a dataset image from path using PIL
- param sample_path
The path to the sample image
- return
The loaded Image
- static sample_transform(image)[source]
sample_transform - Transforms the sample image
- param image
The input image to transform
- return
The transformed image
- class super_gradients.training.datasets.PascalVOC2012SegmentationDataSet(sample_suffix=None, target_suffix=None, *args, **kwargs)[source]
Bases:
SegmentationDataSet
PascalVOC2012SegmentationDataSet - Segmentation Data Set Class for Pascal VOC 2012 Data Set
- IGNORE_LABEL = 21
- static target_transform(target)[source]
target_transform - Transforms the label mask This function overrides the original function from SegmentationDataSet and changes target pixels with value 255 to value = IGNORE_LABEL. This was done since current IoU metric from torchmetrics does not support such a high ignore label value (crashed on OOM)
- param target
The target mask to transform
- return
The transformed target mask
- class super_gradients.training.datasets.PascalAUG2012SegmentationDataSet(*args, **kwargs)[source]
Bases:
PascalVOC2012SegmentationDataSet
PascalAUG2012SegmentationDataSet - Segmentation Data Set Class for Pascal AUG 2012 Data Set
- class super_gradients.training.datasets.PascalVOCAndAUGUnifiedDataset(**kwargs)[source]
Bases:
ConcatDataset
Pascal VOC + AUG train dataset, aka the SBD dataset contributed in “Semantic contours from inverse detectors”. This class implements the common usage of the SBD and PascalVOC datasets as a unified augmented trainset. The unified dataset includes a total of 10,582 samples and does not contain duplicate samples from the PascalVOC validation set.
- datasets: List[Dataset[T_co]]
- cumulative_sizes: List[int]
- class super_gradients.training.datasets.CoCoSegmentationDataSet(root_dir: str, dataset_classes_inclusion_tuples_list: Optional[list] = None, *args, **kwargs)[source]
Bases:
SegmentationDataSet
CoCoSegmentationDataSet - Segmentation Data Set Class for COCO 2017 Segmentation Data Set
- class super_gradients.training.datasets.DetectionDataset(data_dir: str, input_dim: tuple, original_target_format: DetectionTargetsFormat, max_num_samples: int = None, cache: bool = False, cache_dir: str = None, transforms: List[DetectionTransform] = [], all_classes_list: Optional[List[str]] = None, class_inclusion_list: Optional[List[str]] = None, ignore_empty_annotations: bool = True, target_fields: List[str] = None, output_fields: List[str] = None)[source]
Bases:
Dataset
Detection dataset.
This is a boilerplate class to facilitate the implementation of datasets.
- HOW TO CREATE A DATASET THAT INHERITS FROM DetectionDataSet ?
Inherit from DetectionDataSet
implement the method self._load_annotation to return at least the fields “target” and “img_path”
- Call super().__init__ with the required params.
- //! super().__init__ will call self._load_annotation, so make sure that all required
attributes are set up before calling super().__init__ (ideally just call it last); see the sketch below.
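A minimal, untested sketch of such a subclass. Only the contract stated above (return at least "target" and "img_path" from _load_annotation, call super().__init__ last) is taken from the docs; the constructor arguments, the _load_annotation signature and the annotation-parsing logic are placeholders:

    from super_gradients.training.datasets import DetectionDataset

    class MyDetectionDataset(DetectionDataset):
        def __init__(self, annotations, *args, **kwargs):
            # Set up everything _load_annotation needs BEFORE calling super().__init__,
            # since super().__init__ triggers annotation caching.
            self.annotations = annotations
            super().__init__(*args, **kwargs)

        def _load_annotation(self, sample_id: int) -> dict:
            ann = self.annotations[sample_id]
            # "target" and "img_path" are mandatory; extra fields are allowed.
            return {"target": ann["boxes_and_labels"], "img_path": ann["img_path"]}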
- WORKFLOW:
- On instantiation:
All annotations are cached. If class_inclusion_list was specified, there is also subclassing at this step.
If cache is True, the images are also cached
- On call (__getitem__) for a specific image index:
The image and annotations are grouped together in a dict called SAMPLE
the sample is processed according to the transforms
Only the specified fields are returned by __getitem__
- TERMINOLOGY
TARGET: Groundtruth, made of bboxes. The format can vary from one dataset to another
- ANNOTATION: Combination of targets (groundtruth) and metadata of the image, but without the image itself.
> Has to include the fields “target” and “img_path” > Can include other fields like “crowd_target”, “image_info”, “segmentation”, …
- SAMPLE: Output of the dataset:
> Has to include the fields “target” and “image” > Can include other fields like “crowd_target”, “image_info”, “segmentation”, …
INDEX: Refers to the index in the dataset.
- SAMPLE ID: Refers to the id of the sample before dropping any annotation.
Let’s imagine a situation where the downloaded data is made of 120 images but 20 were dropped because they had no annotation. In that case:
> We have 120 samples, so sample_id will be between 0 and 119 > But only 100 will be indexed, so index will be between 0 and 99 > Therefore, we also have len(self) = 100
- get_sample(index: int) Dict[str, Union[ndarray, Any]] [source]
Get raw sample, before any transform (beside subclassing). :param index: Image index :return: Sample, i.e. a dictionary including at least “image” and “target”
- get_resized_image(index: int) ndarray [source]
Get the resized image (i.e. either width or height reaches its input_dim) at a specific sample_id, either from cache or by loading from disk, based on self.cached_imgs_padded :param index: Image index :return: Resized image
- apply_transforms(sample: Dict[str, Union[ndarray, Any]]) Dict[str, Union[ndarray, Any]] [source]
Applies self.transforms sequentially to sample
- If a transform has the attribute ‘additional_samples_count’, additional samples will be loaded and stored in
sample[“additional_samples”] prior to applying it. Combining this with the attribute “non_empty_annotations” will load only additional samples that contain objects.
- Parameters
sample – Sample to apply the transforms on to (loaded with self.get_sample)
- Returns
Transformed sample
- get_random_samples(count: int, non_empty_annotations_only: bool = False) List[Dict[str, Union[ndarray, Any]]] [source]
Load random samples.
- Parameters
count – The number of samples wanted
non_empty_annotations_only – If true, only return samples with at least 1 annotation
- Returns
A list of samples satisfying input params
- property output_target_format
- plot(max_samples_per_plot: int = 16, n_plots: int = 1, plot_transformed_data: bool = True)[source]
Combine samples of images with bbox into plots and display the result.
- Parameters
max_samples_per_plot – Maximum number of images to be displayed per plot
n_plots – Number of plots to display (each plot being a combination of img with bbox)
plot_transformed_data – If True, the plot will be over samples after applying transforms (i.e. on __getitem__). If False, the plot will be over the raw samples (i.e. on get_sample)
- Returns
- class super_gradients.training.datasets.COCODetectionDataset(json_file: str = 'instances_train2017.json', subdir: str = 'images/train2017', tight_box_rotation: bool = False, with_crowd: bool = True, *args, **kwargs)[source]
Bases:
DetectionDataset
Dataset for COCO object detection.
- class super_gradients.training.datasets.PascalVOCDetectionDataset(images_sub_directory: str, download: bool = False, *args, **kwargs)[source]
Bases:
DetectionDataset
Dataset for Pascal VOC object detection
- static download(data_dir: str)[source]
Download Pascal dataset in XYXY_LABEL format.
Data extracted from http://host.robots.ox.ac.uk/pascal/VOC/
- class super_gradients.training.datasets.ImageNetDataset(root: str, transforms: Union[list, dict] = [], *args, **kwargs)[source]
Bases:
ImageFolder
ImageNetDataset dataset
- class super_gradients.training.datasets.Cifar10(root: str, train: bool = True, transforms: Union[list, dict] = None, target_transform: Optional[Callable] = None, download: bool = False)[source]
Bases:
CIFAR10
CIFAR10 Dataset
- Parameters
root – Path for the data to be extracted
train – Bool to load training (True) or validation (False) part of the dataset
transforms – List of transforms to apply sequentially on sample. Wrapped internally with torchvision.Compose
target_transform – Transform to apply to target output
download – Download (True) the dataset from source
- class super_gradients.training.datasets.Cifar100(root: str, train: bool = True, transforms: Union[list, dict] = None, target_transform: Optional[Callable] = None, download: bool = False)[source]
Bases:
CIFAR100
- class super_gradients.training.datasets.SuperviselyPersonsDataset(root_dir: str, list_file: str, **kwargs)[source]
Bases:
SegmentationDataSet
SuperviselyPersonsDataset - Segmentation Data Set Class for Supervisely Persons Segmentation Data Set, main resolution of dataset: (600 x 800). This dataset is a subset of the original dataset (see below) and contains filtered samples. For more details about the ORIGINAL dataset see: https://app.supervise.ly/ecosystem/projects/persons For more details about the FILTERED dataset see: https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/PP-HumanSeg
- CLASS_LABELS = {0: 'background', 1: 'person'}
super_gradients.training.dataloaders module
- super_gradients.training.dataloaders.coco2017_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.coco2017_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.coco2017_train_yolox(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.coco2017_val_yolox(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.coco2017_train_ssd_lite_mobilenet_v2(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.coco2017_val_ssd_lite_mobilenet_v2(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.imagenet_train(dataset_params=None, dataloader_params=None, config_name='imagenet_dataset_params')[source]
- super_gradients.training.dataloaders.imagenet_val(dataset_params=None, dataloader_params=None, config_name='imagenet_dataset_params')[source]
- super_gradients.training.dataloaders.imagenet_efficientnet_train(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_efficientnet_val(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_mobilenetv2_train(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_mobilenetv2_val(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_mobilenetv3_train(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_mobilenetv3_val(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_regnetY_train(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_regnetY_val(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_resnet50_train(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_resnet50_val(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_resnet50_kd_train(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_resnet50_kd_val(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_vit_base_train(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.imagenet_vit_base_val(dataset_params=None, dataloader_params=None)[source]
- super_gradients.training.dataloaders.tiny_imagenet_train(dataset_params=None, dataloader_params=None, config_name='tiny_imagenet_dataset_params')[source]
- super_gradients.training.dataloaders.tiny_imagenet_val(dataset_params=None, dataloader_params=None, config_name='tiny_imagenet_dataset_params')[source]
- super_gradients.training.dataloaders.cifar10_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cifar10_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cifar100_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cifar100_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_stdc_seg50_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_stdc_seg50_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_stdc_seg75_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_stdc_seg75_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_regseg48_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_regseg48_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_ddrnet_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.cityscapes_ddrnet_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.coco_segmentation_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.coco_segmentation_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.pascal_aug_segmentation_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.pascal_aug_segmentation_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.pascal_voc_segmentation_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.pascal_voc_segmentation_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.supervisely_persons_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.supervisely_persons_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.pascal_voc_detection_train(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.pascal_voc_detection_val(dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None)[source]
- super_gradients.training.dataloaders.get_data_loader(config_name, dataset_cls, train, dataset_params=None, dataloader_params=None)[source]
Creates dataloaders, taking defaults from the yaml files in src/super_gradients/recipes.
- Parameters
config_name – yaml config filename in recipes (for example coco2017_yolox).
dataset_cls – torch dataset uninitialized class.
train –
- controls whether to take
cfg.dataset_params.train_dataset_params or cfg.dataset_params.valid_dataset_params as defaults for the dataset constructor
- and
cfg.dataset_params.train_dataloader_params or cfg.dataset_params.valid_dataloader_params as defaults for the DataLoader constructor.
dataset_params – dataset params that override the yaml configured defaults, then passed to the dataset_cls.__init__.
dataloader_params – DataLoader params that override the yaml configured defaults, then passed to the DataLoader.__init__
- Returns
DataLoader
- super_gradients.training.dataloaders.get(name: Optional[str] = None, dataset_params: Optional[Dict] = None, dataloader_params: Optional[Dict] = None, dataset: Optional[Dataset] = None) DataLoader [source]
Get DataLoader of the recipe-configured dataset defined by name in ALL_DATALOADERS.
- Parameters
name – dataset name in ALL_DATALOADERS.
dataset_params – dataset params that override the yaml configured defaults, then passed to the dataset_cls.__init__.
dataloader_params – DataLoader params that override the yaml configured defaults, then passed to the DataLoader.__init__
dataset – torch.utils.data.Dataset to be used instead of passing “name” (i.e for external dataset objects).
- Returns
initialized DataLoader.
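A hedged example, assuming "cifar10_train" is the name under which the cifar10_train factory above is registered in ALL_DATALOADERS:

    from super_gradients.training import dataloaders

    # Recipe defaults are taken from the yaml config; only the batch size is overridden here.
    train_loader = dataloaders.get(name="cifar10_train", dataloader_params={"batch_size": 64})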
super_gradients.training.exceptions module
super_gradients.training.kd_trainer module
- class super_gradients.training.kd_trainer.KDTrainer(experiment_name: str, device: Optional[str] = None, multi_gpu: Union[MultiGPUMode, str] = MultiGPUMode.OFF, ckpt_root_dir: Optional[str] = None)[source]
Bases:
Trainer
- classmethod train_from_config(cfg: Union[DictConfig, dict]) None [source]
Trains according to cfg recipe configuration.
@param cfg: The parsed DictConfig from yaml recipe files
@return: output of kd_trainer.train(…) (i.e. results tuple)
- train(model: Optional[KDModule] = None, training_params: dict = {}, student: Optional[SgModule] = None, teacher: Optional[Module] = None, kd_architecture: Union[type, str] = 'kd_module', kd_arch_params: dict = {}, run_teacher_on_eval=False, train_loader: Optional[DataLoader] = None, valid_loader: Optional[DataLoader] = None, *args, **kwargs)[source]
Trains the student network (wrapped in KDModule network).
- Parameters
model – KDModule, network to train. When none is given, will initialize KDModule according to kd_architecture, student and teacher (default=None)
training_params – dict, Same as in Trainer.train()
student – SgModule - the student network
teacher – torch.nn.Module - the teacher network
kd_architecture – KDModule architecture to use, currently only ‘kd_module’ is supported (default=’kd_module’).
kd_arch_params – architecture params to pass to the kd_architecture constructor.
run_teacher_on_eval – bool - whether to run self.teacher in eval mode regardless of self.train(mode)
train_loader – Dataloader for train set.
valid_loader – Dataloader for validation.
super_gradients.training.legacy module
super_gradients.training.losses module
- class super_gradients.training.losses.Losses[source]
Bases:
object
Static class holding all the supported loss names
- CROSS_ENTROPY = 'cross_entropy'
- MSE = 'mse'
- R_SQUARED_LOSS = 'r_squared_loss'
- SHELFNET_OHEM_LOSS = 'shelfnet_ohem_loss'
- SHELFNET_SE_LOSS = 'shelfnet_se_loss'
- YOLOX_LOSS = 'yolox_loss'
- YOLOX_FAST_LOSS = 'yolox_fast_loss'
- SSD_LOSS = 'ssd_loss'
- STDC_LOSS = 'stdc_loss'
- BCE_DICE_LOSS = 'bce_dice_loss'
- KD_LOSS = 'kd_loss'
- DICE_CE_EDGE_LOSS = 'dice_ce_edge_loss'
- class super_gradients.training.losses.FocalLoss(loss_fcn: BCEWithLogitsLoss, gamma=1.5, alpha=0.25)[source]
Bases:
_Loss
Wraps focal loss around existing loss_fcn(), i.e. criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)
- reduction: str
- forward(pred, true)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class super_gradients.training.losses.LabelSmoothingCrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean', smooth_eps=None, smooth_dist=None, from_logits=True)[source]
Bases:
CrossEntropyLoss
CrossEntropyLoss - with the ability to receive a distribution as targets, and optional label smoothing
- forward(input, target, smooth_dist=None)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- ignore_index: int
- label_smoothing: float
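A sketch of selecting this loss by name through Trainer.train(), passing a constructor argument via criterion_params (the key name is taken from the constructor signature above; the rest of training_params is omitted):

    training_params = {
        "loss": "cross_entropy",                  # resolves to LabelSmoothingCrossEntropyLoss
        "criterion_params": {"smooth_eps": 0.1},  # enable label smoothing
        # ... remaining training_params as usual
    }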
- class super_gradients.training.losses.ShelfNetOHEMLoss(threshold: float = 0.7, mining_percent: float = 0.0001, ignore_lb: int = 255)[source]
Bases:
OhemCELoss
- forward(predictions_list: list, targets)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- property component_names
Component names for logging during training. These correspond to 2nd item in the tuple returned in self.forward(…). See super_gradients.Trainer.train() docs for more info.
- reduction: str
- class super_gradients.training.losses.ShelfNetSemanticEncodingLoss(se_weight=0.2, nclass=21, aux_weight=0.4, weight=None, ignore_index=-1)[source]
Bases:
CrossEntropyLoss
2D Cross Entropy Loss with Auxiliary Loss
- forward(logits, labels)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- property component_names
Component names for logging during training. These correspond to 2nd item in the tuple returned in self.forward(…). See super_gradients.Trainer.train() docs for more info.
- ignore_index: int
- label_smoothing: float
- class super_gradients.training.losses.YoloXDetectionLoss(strides: list, num_classes: int, use_l1: bool = False, center_sampling_radius: float = 2.5, iou_type='iou')[source]
Bases:
_Loss
Calculate YOLOX loss: L = L_objectness + L_iou + L_classification + 1[use_l1]*L_l1
- where:
L_iou, L_classification and L_l1 are calculated only between cells and targets that suit them;
L_objectivness is calculated for all cells.
- L_classification:
for cells that have suitable ground truths in their grid locations add BCEs to force a prediction of IoU with a GT in a multi-label way Coef: 1.
- L_iou:
for cells that have suitable ground truths in their grid locations add (1 - IoU^2), IoU between a predicted box and each GT box, force maximum IoU Coef: 5.
- L_l1:
for cells that have suitable ground truths in their grid locations l1 distance between the logits and GTs in “logits” format (the inverse of “logits to predictions” ops) Coef: 1[use_l1]
- L_objectness:
for each cell add BCE with a label of 1 if there is GT assigned to the cell Coef: 1
- strides
list: List of Yolo levels output grid sizes (i.e [8, 16, 32]).
- num_classes
int: Number of classes.
- use_l1
bool: Controls the L_l1 Coef as discussed above (default=False).
- center_sampling_radius
float: Sampling radius used for center sampling when creating the fg mask (default=2.5).
- iou_type
str: IoU loss type, one of [“iou”,”giou”] (default=”iou”).
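A hedged construction example based on the signature above; the strides and class count are typical YOLOX/COCO values, not mandated by the API:

    from super_gradients.training.losses import YoloXDetectionLoss

    # Three output levels with strides 8/16/32 and 80 classes.
    yolox_loss = YoloXDetectionLoss(strides=[8, 16, 32], num_classes=80)
    # The instance can then be passed as training_params["loss"] in Trainer.train().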
- property component_names
Component names for logging during training. These correspond to 2nd item in the tuple returned in self.forward(…). See super_gradients.Trainer.train() docs for more info.
- forward(model_output: Union[list, Tuple[Tensor, List]], targets: Tensor)[source]
- Parameters
model_output –
Union[list, Tuple[torch.Tensor, List]]: When list-
output from all Yolo levels, each of shape [Batch x 1 x GridSizeY x GridSizeX x (4 + 1 + Num_classes)]
And when tuple- the second item is the described list (first item is discarded)
targets – torch.Tensor: [Num_targets x (4 + 2)], values on dim 1 are: image id in a batch, class, box x y w h
- Returns
loss, all losses separately in a detached tensor
- prepare_predictions(predictions: List[Tensor]) Tuple[Tensor, Tensor, Tensor, Tensor, Tensor] [source]
Convert raw outputs of the network into a format that merges outputs from all levels :param predictions: output from all Yolo levels, each of shape
[Batch x 1 x GridSizeY x GridSizeX x (4 + 1 + Num_classes)]
- Returns
5 tensors representing predictions:
x_shifts: shape [1 x num_cells x 1], where num_cells = grid1X * grid1Y + grid2X * grid2Y + grid3X * grid3Y, x coordinate on the grid cell the prediction is coming from
y_shifts: shape [1 x num_cells x 1], y coordinate on the grid cell the prediction is coming from
expanded_strides: shape [1 x num_cells x 1], stride of the output grid the prediction is coming from
transformed_outputs: shape [batch_size x num_cells x (num_classes + 5)], predictions with boxes in real coordinates and logprobabilities
raw_outputs: shape [batch_size x num_cells x (num_classes + 5)], raw predictions with boxes and confidences as logits
- get_l1_target(l1_target, gt, stride, x_shifts, y_shifts, eps=1e-08)[source]
- Parameters
l1_target – tensor of zeros of shape [Num_cell_gt_pairs x 4]
gt – targets in coordinates [Num_cell_gt_pairs x (4 + 1 + num_classes)]
- Returns
targets in the format corresponding to logits
- get_assignments(image_idx, num_gt, total_num_anchors, gt_bboxes_per_image, gt_classes, bboxes_preds_per_image, expanded_strides, x_shifts, y_shifts, cls_preds, obj_preds, mode='gpu', ious_loss_cost_coeff=3.0, outside_boxes_and_center_cost_coeff=100000.0)[source]
- Match cells to ground truth:
at most 1 GT per cell
dynamic number of cells per GT
- Parameters
outside_boxes_and_center_cost_coeff – float: Cost coefficient of cells outside the radius and bbox of gts in dynamic matching (default=100000).
ious_loss_cost_coeff – float: Cost coefficient for iou loss in dynamic matching (default=3).
image_idx – int: Image index in batch.
num_gt – int: Number of ground truth targets in the image.
total_num_anchors – int: Total number of possible bboxes = sum of all grid cells.
gt_bboxes_per_image – torch.Tensor: Tensor of gt bboxes for the image, shape: (num_gt, 4).
gt_classes – torch.Tensor: Tensor of the classes in the image, shape: (num_preds,4).
bboxes_preds_per_image – Tensor of the predicted bboxes in the image, shape: (num_preds).
expanded_strides – torch.Tensor: Stride of the output grid the prediction is coming from, shape (1 x num_cells x 1).
x_shifts – torch.Tensor: X’s in cell coordinates, shape (1,num_cells,1).
y_shifts – torch.Tensor: Y’s in cell coordinates, shape (1,num_cells,1).
cls_preds – torch.Tensor: Class predictions in all cells, shape (batch_size, num_cells).
obj_preds – torch.Tensor: Objectness predictions in all cells, shape (batch_size, num_cells).
mode – str: One of [“gpu”,”cpu”], controls the device on which the assignment operation should take place (default=”gpu”)
- get_in_boxes_info(gt_bboxes_per_image, expanded_strides, x_shifts, y_shifts, total_num_anchors, num_gt)[source]
- Create a mask for all cells, masking in only the foreground: cells whose center is located:
within a GT box;
OR within a fixed radius around a GT box (center sampling);
- Parameters
num_gt – int: Number of ground truth targets in the image.
total_num_anchors – int: Sum of all grid cells.
gt_bboxes_per_image – torch.Tensor: Tensor of gt bboxes for the image, shape: (num_gt, 4).
expanded_strides – torch.Tensor: Stride of the output grid the prediction is coming from, shape (1 x num_cells x 1).
x_shifts – torch.Tensor: X’s in cell coordinates, shape (1,num_cells,1).
y_shifts – torch.Tensor: Y’s in cell coordinates, shape (1,num_cells,1).
- :return is_in_boxes_anchor, is_in_boxes_and_center
- where:
- is_in_boxes_anchor masks the cells whose center is inside a gt bbox and within
self.center_sampling_radius cells away, without reduction (i.e. shape=(num_gts, num_fgs))
- is_in_boxes_and_center masks the cells whose center is either inside a gt bbox or within
self.center_sampling_radius cells away, shape (num_fgs)
- dynamic_k_matching(cost, pair_wise_ious, gt_classes, num_gt, fg_mask)[source]
- Parameters
cost – pairwise cost, [num_FGs x num_GTs]
pair_wise_ious – pairwise IoUs, [num_FGs x num_GTs]
gt_classes – class of each GT
num_gt – number of GTs
- :return num_fg (number of foregrounds),
gt_matched_classes (the classes that have been matched with fgs), pred_ious_this_matching, matched_gt_inds
- reduction: str
- class super_gradients.training.losses.YoloXFastDetectionLoss(strides, num_classes, use_l1=False, center_sampling_radius=2.5, iou_type='iou', dynamic_ks_bias=1.1, sync_num_fgs=False, obj_loss_fix=False)[source]
Bases:
YoloXDetectionLoss
A completely new implementation of YOLOX loss. This is NOT an equivalent implementation to the regular yolox loss.
- Completely avoids using loops compared to the nested loops in the original implementation.
As a result runs much faster (speedup depends on the type of GPUs, their count, the batch size, etc.).
- The tensor format is very different from the original implementation.
Tensors contain image ids, ground truth ids and anchor ids as values to support variable length data.
There are differences in terms of the algorithm itself:
- When computing a dynamic k for a ground truth,
in the original implementation they consider the sum of the top 10 predictions sorted by ious among the initial foregrounds of any ground truth in the image, while in our implementation we consider only the initial foreground of that particular ground truth. To compensate for that difference we introduce the dynamic_ks_bias hyperparameter which makes the dynamic ks larger.
- When computing the k matched detections for a ground truth,
in the original implementation they consider the initial foregrounds of any ground truth in the image as candidates, while in our implementation we consider only the initial foreground of that particular ground truth as candidates. We believe that this difference is minor.
- Parameters
dynamic_ks_bias – hyperparameter to compensate for the discrepancies between the regular loss and this loss.
sync_num_fgs – sync num of fgs. Can be used for DDP training.
obj_loss_fix – divide by the total number of anchors instead of the number of matching fgs. Can be used for objectness loss.
- reduction: str
- training: bool
- class super_gradients.training.losses.RSquaredLoss(size_average=None, reduce=None, reduction: str = 'mean')[source]
Bases:
_Loss
- forward(output, target)[source]
Computes the R-squared for the output and target values :param output: Tensor / Numpy / List
The prediction
- Parameters
target – Tensor / Numpy / List The corresponding labels
- reduction: str
- class super_gradients.training.losses.SSDLoss(dboxes: DefaultBoxes, alpha: float = 1.0, iou_thresh: float = 0.5, neg_pos_ratio: float = 3.0)[source]
Bases:
_Loss
Implements the loss as the sum of the following: 1. Confidence Loss: All labels, with hard negative mining 2. Localization Loss: Only on positive labels
- L = (2 - alpha) * L_l1 + alpha * L_cls, where
L_cls is HardMiningCrossEntropyLoss
L_l1 = [SmoothL1Loss for all positives]
- property component_names
Component names for logging during training. These correspond to 2nd item in the tuple returned in self.forward(…). See super_gradients.Trainer.train() docs for more info.
- match_dboxes(targets)[source]
creates tensors with target boxes and labels for each dbox, so with the same length as dboxes.
Each GT is assigned with a grid cell with the highest IoU, this creates a pair for each GT and some cells;
The rest of grid cells are assigned to a GT with the highest IoU, assuming it’s > self.iou_thresh; If this condition is not met the grid cell is marked as background
GT-wise: one to many Grid-cell-wise: one to one
- Parameters
targets – a tensor containing the boxes for a single image; shape [num_boxes, 6] (image_id, label, x, y, w, h)
- Returns
two tensors boxes - shape of dboxes [4, num_dboxes] (x,y,w,h) labels - shape [num_dboxes]
- forward(predictions: Tuple, targets)[source]
- Compute the loss
:param predictions - predictions tensor coming from the network, tuple with shapes ([Batch Size, 4, num_dboxes], [Batch Size, num_classes + 1, num_dboxes]) where predictions have logprobs for background and other classes :param targets - targets for the batch. [num targets, 6] (index in batch, label, x,y,w,h)
- reduction: str
- class super_gradients.training.losses.BCEDiceLoss(loss_weights=[0.5, 0.5], logits=True)[source]
Bases:
Module
Binary Cross Entropy + Dice Loss
Weighted average of BCE and Dice loss
- loss_weights
list of size 2 s.t. loss_weights[0], loss_weights[1] are the weights for BCE and Dice respectively.
- forward(input: Tensor, target: Tensor) Tensor [source]
@param input: Network’s raw output shaped (N,1,H,W) @param target: Ground truth shaped (N,H,W)
- training: bool
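A small self-contained sketch exercising the forward() shapes documented above:

    import torch
    from super_gradients.training.losses import BCEDiceLoss

    criterion = BCEDiceLoss(loss_weights=[0.5, 0.5], logits=True)
    raw_output = torch.randn(2, 1, 64, 64)           # network raw output, shape (N, 1, H, W)
    mask = torch.randint(0, 2, (2, 64, 64)).float()  # ground truth, shape (N, H, W)
    loss = criterion(raw_output, mask)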
- class super_gradients.training.losses.KDLogitsLoss(task_loss_fn: _Loss, distillation_loss_fn: _Loss = KDklDivLoss(), distillation_loss_coeff: float = 0.5)[source]
Bases:
_Loss
Knowledge distillation loss, wraps the task loss and distillation loss
- property component_names
Component names for logging during training. These correspond to 2nd item in the tuple returned in self.forward(…). See super_gradients.Trainer.train() docs for more info.
- forward(kd_module_output, target)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- reduction: str
- class super_gradients.training.losses.DiceCEEdgeLoss(num_classes: int, num_aux_heads: int = 2, num_detail_heads: int = 1, weights: Union[tuple, list] = (1, 1, 1, 1), dice_ce_weights: Union[tuple, list] = (1, 1), ignore_index: int = -100, edge_kernel: int = 3, ce_edge_weights: Union[tuple, list] = (0.5, 0.5))[source]
Bases:
_Loss
- property component_names
Component names for logging during training. These correspond to 2nd item in the tuple returned in self.forward(…). See super_gradients.Trainer.train() docs for more info.
- forward(preds: Tuple[Tensor], target: Tensor)[source]
- Parameters
preds – Model output predictions, must be in the following format: [Main-feats, Aux-feats[0], …, Aux-feats[num_auxs-1], Detail-feats[0], …, Detail-feats[num_details-1]]
- reduction: str
super_gradients.training.metrics module
- class super_gradients.training.metrics.Metrics[source]
Bases:
object
Static class holding all the supported metric names
- ACCURACY = 'Accuracy'
- TOP5 = 'Top5'
- DETECTION_METRICS = 'DetectionMetrics'
- DETECTION_METRICS_050_095 = 'DetectionMetrics_050_095'
- DETECTION_METRICS_050 = 'DetectionMetrics_050'
- DETECTION_METRICS_075 = 'DetectionMetrics_075'
- IOU = 'IoU'
- BINARY_IOU = 'BinaryIOU'
- DICE = 'Dice'
- BINARY_DICE = 'BinaryDice'
- PIXEL_ACCURACY = 'PixelAccuracy'
- super_gradients.training.metrics.accuracy(output, target, topk=(1,))[source]
Computes the precision@k for the specified values of k.
- Parameters
output – Tensor / Numpy / List The prediction
target – Tensor / Numpy / List The corresponding labels
topk – tuple The type of accuracy to calculate, e.g. topk=(1,5) returns accuracy for top-1 and top-5
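For example, a sketch with random logits (values are illustrative only):
import torch
from super_gradients.training.metrics import accuracy

logits = torch.randn(8, 10)                # batch of 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))        # corresponding ground-truth labels
top1, top5 = accuracy(logits, labels, topk=(1, 5))   # precision@1 and precision@5 for this batch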
- class super_gradients.training.metrics.Accuracy(dist_sync_on_step=False)[source]
Bases:
Accuracy
- update(preds: Tensor, target: Tensor)[source]
Update state with predictions and targets. See pages/classification:input types for more information on input types.
- Parameters
preds – Predictions from model (logits, probabilities, or labels)
target – Ground truth labels
- correct: Tensor
- total: Tensor
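A minimal update/compute sketch in the usual torchmetrics style, assuming logits are passed as predictions:
import torch
from super_gradients.training.metrics import Accuracy

metric = Accuracy(dist_sync_on_step=False)
preds = torch.randn(8, 10)                 # logits for 8 samples, 10 classes
target = torch.randint(0, 10, (8,))        # ground-truth labels
metric.update(preds, target)               # typically called once per batch
accuracy_value = metric.compute()          # accumulated accuracy over all updates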
- class super_gradients.training.metrics.Top5(dist_sync_on_step=False)[source]
Bases:
Metric
- class super_gradients.training.metrics.ToyTestClassificationMetric(dist_sync_on_step=False)[source]
Bases:
Metric
Dummy classification metric object that always returns 0 (for testing).
- class super_gradients.training.metrics.DetectionMetrics(num_cls: int, post_prediction_callback: Optional[DetectionPostPredictionCallback] = None, normalize_targets: bool = False, iou_thres: Union[IouThreshold, float] = IouThreshold.MAP_05_TO_095, recall_thres: Optional[Tensor] = None, score_thres: float = 0.1, top_k_predictions: int = 100, dist_sync_on_step: bool = False, accumulate_on_cpu: bool = True)[source]
Bases:
Metric
Metric class for computing F1, Precision, Recall and Mean Average Precision.
- num_cls
Number of classes.
- post_prediction_callback
DetectionPostPredictionCallback to be applied on net’s output prior to the metric computation (NMS).
- normalize_targets
Whether to normalize bbox coordinates by image size (default=False).
- iou_thresholds
IoU threshold to compute the mAP (default=torch.linspace(0.5, 0.95, 10)).
- recall_thresholds
Recall threshold to compute the mAP (default=torch.linspace(0, 1, 101)).
- score_threshold
Score threshold to compute Recall, Precision and F1 (default=0.1)
- top_k_predictions
Number of predictions per class used to compute metrics, ordered by confidence score (default=100)
- dist_sync_on_step
Synchronize metric state across processes at each forward() before returning the value at the step (default=False).
- accumulate_on_cpu
Run on CPU regardless of the device used in other parts. This is to avoid “CUDA out of memory” errors that might happen on GPU (default=False).
- update(preds, target: Tensor, device: str, inputs: tensor, crowd_targets: Optional[Tensor] = None)[source]
Apply NMS and match all the predictions and targets of a given batch, and update the metric state accordingly.
- Parameters
preds – Raw output of the model; the format might change from one model to another, but it has to fit the input format of the post_prediction_callback
target – Targets for all images of shape (total_num_targets, 6) format: (index, x, y, w, h, label) where x,y,w,h are in range [0,1]
device – Device to run on
inputs – Input image tensor of shape (batch_size, n_img, height, width)
crowd_targets – Crowd targets for all images of shape (total_num_targets, 6) format: (index, x, y, w, h, label) where x,y,w,h are in range [0,1]
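A hedged construction sketch; in practice post_prediction_callback should be the DetectionPostPredictionCallback (NMS) matching your detection model, which is model-specific and therefore left as None here:
from super_gradients.training.metrics import DetectionMetrics

metric = DetectionMetrics(
    num_cls=80,                      # e.g. 80 classes for COCO-style detection
    post_prediction_callback=None,   # replace with your model's DetectionPostPredictionCallback (NMS)
    normalize_targets=True,          # normalize bbox coordinates by image size
    score_thres=0.1,
    top_k_predictions=100,
)
# The metric is usually placed in valid_metrics_list of the training_params; during validation its
# update(preds, target, device, inputs, crowd_targets) is fed the raw model output.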
- class super_gradients.training.metrics.PreprocessSegmentationMetricsArgs(apply_arg_max: bool = False, apply_sigmoid: bool = False)[source]
Bases:
AbstractMetricsArgsPrepFn
Default segmentation inputs preprocessing function applied before updating segmentation metrics; handles multiple inputs and applies normalizations.
- class super_gradients.training.metrics.PixelAccuracy(ignore_label=-100, dist_sync_on_step=False, metrics_args_prep_fn: Optional[AbstractMetricsArgsPrepFn] = None)[source]
Bases:
Metric
- class super_gradients.training.metrics.IoU(num_classes: int, dist_sync_on_step: bool = False, ignore_index: Optional[int] = None, reduction: str = 'elementwise_mean', threshold: float = 0.5, metrics_args_prep_fn: Optional[AbstractMetricsArgsPrepFn] = None)[source]
Bases:
JaccardIndex
- update(preds, target: Tensor)[source]
Update state with predictions and targets.
- Parameters
preds – Predictions from model
target – Ground truth values
- confmat: Tensor
- class super_gradients.training.metrics.Dice(num_classes: int, dist_sync_on_step: bool = False, ignore_index: Optional[int] = None, reduction: str = 'elementwise_mean', threshold: float = 0.5, metrics_args_prep_fn: Optional[AbstractMetricsArgsPrepFn] = None)[source]
Bases:
JaccardIndex
- update(preds, target: Tensor)[source]
Update state with predictions and targets.
- Parameters
preds – Predictions from model
target – Ground truth values
- confmat: Tensor
- class super_gradients.training.metrics.BinaryIOU(dist_sync_on_step=True, ignore_index: Optional[int] = None, threshold: float = 0.5, metrics_args_prep_fn: Optional[AbstractMetricsArgsPrepFn] = None)[source]
Bases:
IoU
- confmat: Tensor
- training: bool
- class super_gradients.training.metrics.BinaryDice(dist_sync_on_step=True, ignore_index: Optional[int] = None, threshold: float = 0.5, metrics_args_prep_fn: Optional[AbstractMetricsArgsPrepFn] = None)[source]
Bases:
Dice
- confmat: Tensor
- training: bool
- class super_gradients.training.metrics.DetectionMetrics_050(num_cls: int, post_prediction_callback: Optional[DetectionPostPredictionCallback] = None, normalize_targets: bool = False, recall_thres: Optional[Tensor] = None, score_thres: float = 0.1, top_k_predictions: int = 100, dist_sync_on_step: bool = False, accumulate_on_cpu: bool = True)[source]
Bases:
DetectionMetrics
- class super_gradients.training.metrics.DetectionMetrics_075(num_cls: int, post_prediction_callback: Optional[DetectionPostPredictionCallback] = None, normalize_targets: bool = False, recall_thres: Optional[Tensor] = None, score_thres: float = 0.1, top_k_predictions: int = 100, dist_sync_on_step: bool = False, accumulate_on_cpu: bool = True)[source]
Bases:
DetectionMetrics
- class super_gradients.training.metrics.DetectionMetrics_050_095(num_cls: int, post_prediction_callback: Optional[DetectionPostPredictionCallback] = None, normalize_targets: bool = False, recall_thres: Optional[Tensor] = None, score_thres: float = 0.1, top_k_predictions: int = 100, dist_sync_on_step: bool = False, accumulate_on_cpu: bool = True)[source]
Bases:
DetectionMetrics
super_gradients.training.models module
super_gradients.training.sg_model module
- class super_gradients.training.sg_trainer.Trainer(experiment_name: str, device: Optional[str] = None, multi_gpu: Union[MultiGPUMode, str] = MultiGPUMode.OFF, ckpt_root_dir: Optional[str] = None)[source]
Bases:
object
SuperGradient Model - Base Class for Sg Models
- train(max_epochs: int, initial_epoch: int, save_model: bool)[source]
the main function used for the training, h.p. updating, logging etc.
- predict(idx: int)
returns the predictions and label of the current inputs
- test(epoch : int, idx : int, save : bool):
returns the test loss, accuracy and runtime
- classmethod train_from_config(cfg: Union[DictConfig, dict]) Tuple[Module, Tuple] [source]
Trains according to cfg recipe configuration.
@param cfg: The parsed DictConfig from yaml recipe files or a dictionary @return: the model and the output of trainer.train(…) (i.e results tuple)
- classmethod resume_experiment(experiment_name: str, ckpt_root_dir: Optional[str] = None) None [source]
Resume a training that was run using our recipes.
- Parameters
experiment_name – Name of the experiment to resume
ckpt_root_dir – Directory including the checkpoints
- classmethod evaluate_from_recipe(cfg: DictConfig) None [source]
Evaluate according to a cfg recipe configuration.
- Note: This script does NOT run training, only validation.
Please make sure that the config refers to a PRETRAINED MODEL, either from one of your checkpoints or from pretrained weights from the model zoo.
- Parameters
cfg – The parsed DictConfig from yaml recipe files or a dictionary
- classmethod evaluate_checkpoint(experiment_name: str, ckpt_name: str = 'ckpt_latest.pth', ckpt_root_dir: Optional[str] = None) None [source]
Evaluate a checkpoint resulting from one of your previous experiments, using the same parameters (dataset, valid_metrics, …) as used during the training of the experiment.
Note
The parameters will be unchanged even if the recipe used for that experiment was changed since then. This is to ensure that validation of the experiment will remain exactly the same as during training.
- Example, evaluate the checkpoint “average_model.pth” from experiment “my_experiment_name”:
>> evaluate_checkpoint(experiment_name="my_experiment_name", ckpt_name="average_model.pth")
- Parameters
experiment_name – Name of the experiment to validate
ckpt_name – Name of the checkpoint to test (“ckpt_latest.pth”, “average_model.pth” or “ckpt_best.pth” for instance)
ckpt_root_dir – Directory including the checkpoints
- train(model: Module, training_params: Optional[dict] = None, train_loader: Optional[DataLoader] = None, valid_loader: Optional[DataLoader] = None, additional_configs_to_log: Optional[Dict] = None)[source]
train - Trains the Model
- IMPORTANT NOTE: Additional batch parameters can be added as a third item (optional) if a tuple is returned by
the data loaders, as dictionary. The phase context will hold the additional items, under an attribute with the same name as the key in this dictionary. Then such items can be accessed through phase callbacks.
- param additional_configs_to_log
Dict, dictionary containing configs that will be added to the training’s sg_logger. Format should be {“Config_title_1”: {…}, “Config_title_2”:{..}}.
- param model
torch.nn.Module, model to train.
- param train_loader
Dataloader for train set.
- param valid_loader
Dataloader for validation.
- param training_params
resume : bool (default=False)
- Whether to continue training from ckpt with the same experiment name
(i.e resume from CKPT_ROOT_DIR/EXPERIMENT_NAME/CKPT_NAME)
ckpt_name : str (default=ckpt_latest.pth)
- The checkpoint (.pth file) filename in CKPT_ROOT_DIR/EXPERIMENT_NAME/ to use when resume=True and
resume_path=None
resume_path: str (default=None)
Explicit checkpoint path (.pth file) to use to resume training.
max_epochs : int
Number of epochs to run training.
lr_updates : list(int)
List of fixed epoch numbers to perform learning rate updates when lr_mode=’step’.
lr_decay_factor : float
Decay factor to apply to the learning rate at each update when lr_mode=’step’.
lr_mode : str
Learning rate scheduling policy, one of [‘step’,’poly’,’cosine’,’function’]. ‘step’ refers to constant updates at epoch numbers passed through lr_updates. ‘cosine’ refers to Cosine Anealing policy as mentioned in https://arxiv.org/abs/1608.03983. ‘poly’ refers to polynomial decrease i.e in each epoch iteration self.lr = self.initial_lr * pow((1.0 - (current_iter / max_iter)), 0.9) ‘function’ refers to user defined learning rate scheduling function, that is passed through lr_schedule_function.
lr_schedule_function : Union[callable,None]
Learning rate scheduling function to be used when lr_mode is ‘function’.
lr_warmup_epochs : int (default=0)
Number of epochs for learning rate warm up - see https://arxiv.org/pdf/1706.02677.pdf (Section 2.2).
cosine_final_lr_ratio : float (default=0.01)
Final learning rate ratio (only relevant when lr_mode='cosine'). The cosine schedule starts from initial_lr and reaches initial_lr * cosine_final_lr_ratio in the last epoch.
initial_lr : float
Initial learning rate.
loss : Union[nn.module, str]
Loss function for training. One of SuperGradient’s built in options:
“cross_entropy”: LabelSmoothingCrossEntropyLoss, “mse”: MSELoss, “r_squared_loss”: RSquaredLoss, “detection_loss”: YoLoV3DetectionLoss, “shelfnet_ohem_loss”: ShelfNetOHEMLoss, “shelfnet_se_loss”: ShelfNetSemanticEncodingLoss, “ssd_loss”: SSDLoss,
or user defined nn.module loss function.
IMPORTANT: forward(…) should return a (loss, loss_items) tuple where loss is the tensor used for backprop (i.e. what your original loss function returns), and loss_items should be a tensor of shape (n_items) holding values computed during the forward pass which we want to log over the entire epoch. For example, the loss itself should always be logged. Another example is a scenario where the computed loss is the sum of a few components we would like to log; those components become the entries of loss_items.
IMPORTANT: When dealing with external loss classes, to log/monitor the loss_items described above under specific string names:
- Set a "component_names" property in the loss class (whose instance is passed through train_params)
to be a list of strings of length n_items, whose ith element is the name of the ith entry in loss_items. Each item will then be logged, rendered on TensorBoard and "watched" (i.e. model checkpoints will be saved according to it) under <LOSS_CLASS.__name__>/<COMPONENT_NAME>. If a single item is returned rather than a tuple, it will be logged under <LOSS_CLASS.__name__>. When there is no such attribute, the items will be named <LOSS_CLASS.__name__>/Loss_<IDX> according to the length of loss_items.
- For example:
class MyLoss(_Loss):
    def forward(self, inputs, targets):
        # comp1 / comp2 stand for loss components computed from inputs and targets
        total_loss = comp1 + comp2
        loss_items = torch.cat((total_loss.unsqueeze(0), comp1.unsqueeze(0), comp2.unsqueeze(0))).detach()
        return total_loss, loss_items

    @property
    def component_names(self):
        return ["total_loss", "my_1st_component", "my_2nd_component"]

Trainer.train(...,
              train_params={"loss": MyLoss(),
                            "metric_to_watch": "MyLoss/my_1st_component"})
- This will write to log and monitor MyLoss/total_loss, MyLoss/my_1st_component,
MyLoss/my_2nd_component.
- For example:
class MyLoss2(_Loss):
    def forward(self, inputs, targets):
        # same as above, but without a component_names property
        total_loss = comp1 + comp2
        loss_items = torch.cat((total_loss.unsqueeze(0), comp1.unsqueeze(0), comp2.unsqueeze(0))).detach()
        return total_loss, loss_items

Trainer.train(...,
              train_params={"loss": MyLoss2(),
                            "metric_to_watch": "MyLoss2/loss_0"})
This will write to log and monitor MyLoss2/loss_0, MyLoss2/loss_1, MyLoss2/loss_2, as they are named by their positional index in loss_items.
Since running logs will save the loss_items in some internal state, it is recommended that loss_items are detached from their computational graph for memory efficiency.
optimizer : Union[str, torch.optim.Optimizer]
Optimization algorithm. One of ['Adam', 'SGD', 'RMSProp'], corresponding to the torch.optim optimizer implementations, or any object that implements torch.optim.Optimizer.
criterion_params : dict
Loss function parameters.
optimizer_params : dict
When optimizer is one of [‘Adam’,’SGD’,’RMSProp’], it will be initialized with optimizer_params.
(see https://pytorch.org/docs/stable/optim.html for the full list of parameters for each optimizer).
train_metrics_list : list(torchmetrics.Metric)
Metrics to log during training. For more information on torchmetrics see https://torchmetrics.rtfd.io/en/latest/.
valid_metrics_list : list(torchmetrics.Metric)
Metrics to log during validation/testing. For more information on torchmetrics see https://torchmetrics.rtfd.io/en/latest/.
loss_logging_items_names : list(str)
The list of names/titles for the outputs returned from the loss function's forward pass (reminder: the loss function should return the tuple (loss, loss_items)). These names will be used for logging their values.
metric_to_watch : str (default=”Accuracy”)
will be the metric which the model checkpoint will be saved according to, and can be set to any of the following:
a metric name (str) of one of the metric objects from the valid_metrics_list
a “metric_name” if some metric in valid_metrics_list has an attribute component_names which is a list referring to the names of each entry in the output metric (torch tensor of size n)
one of the "loss_logging_items_names", i.e. a name that corresponds to an item returned during the loss function's forward pass (see loss docs above).
At the end of each epoch, if a new best metric_to_watch value is achieved, the models checkpoint is saved in YOUR_PYTHON_PATH/checkpoints/ckpt_best.pth
greater_metric_to_watch_is_better : bool
- When choosing a model’s checkpoint to be saved, the best achieved model is the one that maximizes the
metric_to_watch when this parameter is set to True, and the one that minimizes it otherwise.
ema : bool (default=False)
Whether to use Model Exponential Moving Average (see https://github.com/rwightman/pytorch-image-models ema implementation)
batch_accumulate : int (default=1)
Number of batches to accumulate before every backward pass.
ema_params : dict
Parameters for the ema model.
zero_weight_decay_on_bias_and_bn : bool (default=False)
Whether to apply weight decay on batch normalization parameters or not (ignored when the passed optimizer has already been initialized).
load_opt_params : bool (default=True)
Whether to load the optimizers parameters as well when loading a model’s checkpoint.
run_validation_freq : int (default=1)
- The frequency at which validation is performed during training (i.e. validation is run every
run_validation_freq epochs).
save_model : bool (default=True)
Whether to save the model checkpoints.
silent_mode : bool
Silences the printouts.
mixed_precision : bool
Whether to use mixed precision or not.
save_ckpt_epoch_list : list(int) (default=[])
List of fixed epoch indices the user wishes to save checkpoints in.
average_best_models : bool (default=False)
If set, a snapshot dictionary file and the average model will be saved / updated at every epoch and evaluated only when training is completed. The snapshot file will only be deleted upon completing the training. The snapshot dict will be managed on cpu.
precise_bn : bool (default=False)
Whether to use precise_bn calculation during the training.
precise_bn_batch_size : int (default=None)
The effective batch size we want to calculate the batchnorm on. For example, if we are training a model on 8 gpus, with a batch of 128 on each gpu, a good rule of thumb would be to give it 8192 (i.e. effective_batch_size * num_gpus = batch_per_gpu * num_gpus * num_gpus). If precise_bn_batch_size is not provided in the training_params, the latter heuristic will be taken.
seed : int (default=42)
Random seed to be set for torch, numpy, and random. When using DDP, each process will have its seed set to seed + rank.
log_installed_packages : bool (default=False)
- When set, the list of all installed packages (and their versions) will be written to the tensorboard
and logfile (useful when trying to reproduce results).
dataset_statistics : bool (default=False)
Enable a statistic analysis of the dataset. If set to True the dataset will be analyzed and a report will be added to the tensorboard along with some sample images from the dataset. Currently only detection datasets are supported for analysis.
sg_logger : Union[AbstractSGLogger, str] (default=base_sg_logger)
Define the SGLogger object for this training process. The SGLogger handles all disk writes, logs, TensorBoard, remote logging and remote storage. By overriding the default base_sg_logger, you can change the storage location, support external monitoring and logging or support remote storage.
sg_logger_params : dict
SGLogger parameters
clip_grad_norm : float
Defines a maximal L2 norm of the gradients. Values which exceed the given value will be clipped
lr_cooldown_epochs : int (default=0)
Number of epochs to cooldown the LR (i.e. the last epoch, from the scheduling viewpoint, is max_epochs - cooldown).
pre_prediction_callback : Callable (default=None)
- When not None, this callback will be applied to the images and targets, and its outputs will be used
for the forward pass and further computations. Args for this callable should be in the order (inputs, targets, batch_idx), returning modified_inputs, modified_targets.
ckpt_best_name : str (default=’ckpt_best.pth’)
The best checkpoint (according to metric_to_watch) will be saved under this filename in the checkpoints directory.
enable_qat: bool (default=False)
- Adds a QATCallback to the phase callbacks, which triggers quantization-aware training starting from
qat_params["start_epoch"].
qat_params: dict-like object with the following key/values:
start_epoch: int, first epoch to start QAT.
- quant_modules_calib_method: str, One of [percentile, mse, entropy, max]. Statistics method for amax
computation of the quantized modules (default=percentile).
per_channel_quant_modules: bool, whether quant modules should be per channel (default=False).
calibrate: bool, whether to perform calibration (default=False).
calibrated_model_path: str, path to a calibrated checkpoint (default=None).
- calib_data_loader: torch.utils.data.DataLoader, data loader of the calibration dataset. When None,
context.train_loader will be used (default=None).
num_calib_batches: int, number of batches to collect the statistics from.
- percentile: float, percentile value to use when quant_modules_calib_method='percentile'.
Discarded when other methods are used (default=99.99).
- Returns
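A minimal, hedged sketch tying several of the training_params keys above together; the model and dataloaders are placeholders prepared elsewhere, and most keys keep their defaults:
from super_gradients.training.sg_trainer import Trainer
from super_gradients.training.metrics import Accuracy, Top5

trainer = Trainer(experiment_name="my_experiment")
training_params = {
    "max_epochs": 30,
    "initial_lr": 0.1,
    "lr_mode": "step",
    "lr_updates": [10, 20, 25],                  # epochs at which the LR is decayed
    "lr_decay_factor": 0.1,
    "loss": "cross_entropy",                     # one of the built-in losses listed above
    "optimizer": "SGD",
    "optimizer_params": {"momentum": 0.9, "weight_decay": 1e-4},
    "train_metrics_list": [Accuracy(), Top5()],
    "valid_metrics_list": [Accuracy(), Top5()],
    "metric_to_watch": "Accuracy",
    "greater_metric_to_watch_is_better": True,
}
# model (a torch.nn.Module) and the train/valid DataLoaders are assumed to exist:
# trainer.train(model=model, training_params=training_params,
#               train_loader=train_loader, valid_loader=valid_loader)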
- property get_arch_params
- property get_structure
- property get_architecture
- property get_module
- test(model: Optional[Module] = None, test_loader: Optional[DataLoader] = None, loss: Optional[_Loss] = None, silent_mode: bool = False, test_metrics_list=None, loss_logging_items_names=None, metrics_progress_verbose=False, test_phase_callbacks=None, use_ema_net=True) tuple [source]
Evaluates the model on a given dataloader and metrics.
- Parameters
model – model to perform the test on. When None is given, will try to use self.net (default=None)
test_loader – dataloader to perform the test on
test_metrics_list – (list(torchmetrics.Metric)) metrics list for evaluation
silent_mode – (bool) controls verbosity
metrics_progress_verbose – (bool) controls the verbosity of metrics progress (default=False). Slows down the program.
use_ema_net – (bool) whether to perform the test on self.ema_model.ema (when self.ema_model.ema exists; otherwise self.net will be tested) (default=True)
- Returns
results tuple (tuple) containing the loss items and metric values.
- All of the above args will override Trainer's corresponding attributes when not equal to None. Evaluation is then
run on self.test_loader with self.test_metrics.
- evaluate(data_loader: DataLoader, metrics: MetricCollection, evaluation_type: EvaluationType, epoch: Optional[int] = None, silent_mode: bool = False, metrics_progress_verbose: bool = False)[source]
Evaluates the model on given dataloader and metrics.
- Parameters
data_loader – dataloader to perform evaluation on
metrics – (MetricCollection) metrics for evaluation
evaluation_type – (EvaluationType) controls which phase callbacks will be used (for example, on batch end, when evaluation_type=EvaluationType.VALIDATION the Phase.VALIDATION_BATCH_END callbacks will be triggered)
epoch – (int) epoch idx
silent_mode – (bool) controls verbosity
metrics_progress_verbose – (bool) controls the verbosity of metrics progress (default=False). Slows down the program significantly.
- Returns
results tuple (tuple) containing the loss items and metric values.
- property get_net
Getter for network. :return: torch.nn.Module, self.net
- set_net(net: Module)[source]
Setter for network.
- Parameters
net – torch.nn.Module, value to set net
- Returns
- class super_gradients.training.sg_trainer.MultiGPUMode(value)[source]
Bases:
str
,Enum
- OFF - Single GPU Mode / CPU Mode
- DATA_PARALLEL - Multiple GPUs, Synchronous
- DISTRIBUTED_DATA_PARALLEL - Multiple GPUs, Asynchronous
- OFF = 'Off'
- DATA_PARALLEL = 'DP'
- DISTRIBUTED_DATA_PARALLEL = 'DDP'
- AUTO = 'AUTO'
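For instance, the mode can be passed when constructing a Trainer (a sketch; per the Union[MultiGPUMode, str] annotation above, the string value 'DP' would work as well):
from super_gradients.training.sg_trainer import MultiGPUMode, Trainer

# Single-node multi-GPU training with synchronous DataParallel.
trainer = Trainer(experiment_name="my_experiment", multi_gpu=MultiGPUMode.DATA_PARALLEL)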
- class super_gradients.training.sg_trainer.StrictLoad(value)[source]
Bases:
Enum
Wrapper for adding more functionality to torch's strict_load parameter in load_state_dict().
- OFF - Native torch "strict_load = off" behaviour. See nn.Module.load_state_dict() documentation for more details.
- ON - Native torch "strict_load = on" behaviour. See nn.Module.load_state_dict() documentation for more details.
- NO_KEY_MATCHING - Allows the usage of SuperGradient's adapt_checkpoint function, which loads a checkpoint by matching each
layer's shapes (and bypasses the strict matching of the names of each layer, i.e. disregards the state_dict key matching).
- OFF = False
- ON = True
- NO_KEY_MATCHING = 'no_key_matching'
super_gradients.training.training_hyperparams module
- super_gradients.training.training_hyperparams.cifar10_resnet_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.cityscapes_ddrnet_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.cityscapes_regseg48_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.cityscapes_stdc_base_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.cityscapes_stdc_seg50_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.cityscapes_stdc_seg75_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.coco2017_ssd_lite_mobilenet_v2_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.coco2017_yolox_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.coco_segmentation_shelfnet_lw_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_efficientnet_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_mobilenetv2_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_mobilenetv3_base_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_mobilenetv3_large_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_mobilenetv3_small_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_regnetY_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_repvgg_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_resnet50_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_resnet50_kd_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_vit_base_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.imagenet_vit_large_train_params(overriding_params: Optional[Dict] = None)[source]
- super_gradients.training.training_hyperparams.get(config_name, overriding_params: Optional[Dict] = None) Dict [source]
- Creates a training hyper-parameters dictionary, taking defaults from the yaml
files in src/super_gradients/recipes.
- Parameters
overriding_params – Dict, dictionary like object containing entries to override in the recipe’s training hyper parameters dictionary.
config_name – yaml config filename in recipes (for example coco2017_yolox).
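For example (a sketch; the override keys follow the training_params documented for Trainer.train above):
from super_gradients.training import training_hyperparams

# Load the default training hyper-parameters of the coco2017_yolox recipe,
# overriding only the number of epochs.
train_params = training_hyperparams.get("coco2017_yolox", overriding_params={"max_epochs": 10})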
super_gradients.training.transforms module
- class super_gradients.training.transforms.Transforms[source]
Bases:
object
Static class holding all the supported transform names
- SegRandomFlip = 'SegRandomFlip'
- SegResize = 'SegResize'
- SegRescale = 'SegRescale'
- SegRandomRescale = 'SegRandomRescale'
- SegRandomRotate = 'SegRandomRotate'
- SegCropImageAndMask = 'SegCropImageAndMask'
- SegRandomGaussianBlur = 'SegRandomGaussianBlur'
- SegPadShortToCropSize = 'SegPadShortToCropSize'
- SegColorJitter = 'SegColorJitter'
- DetectionMosaic = 'DetectionMosaic'
- DetectionRandomAffine = 'DetectionRandomAffine'
- DetectionMixup = 'DetectionMixup'
- DetectionHSV = 'DetectionHSV'
- DetectionHorizontalFlip = 'DetectionHorizontalFlip'
- DetectionPaddedRescale = 'DetectionPaddedRescale'
- DetectionTargetsFormat = 'DetectionTargetsFormat'
- DetectionTargetsFormatTransform = 'DetectionTargetsFormatTransform'
- RandomResizedCropAndInterpolation = 'RandomResizedCropAndInterpolation'
- RandAugmentTransform = 'RandAugmentTransform'
- Lighting = 'Lighting'
- RandomErase = 'RandomErase'
- Compose = 'Compose'
- ToTensor = 'ToTensor'
- PILToTensor = 'PILToTensor'
- ConvertImageDtype = 'ConvertImageDtype'
- ToPILImage = 'ToPILImage'
- Normalize = 'Normalize'
- Resize = 'Resize'
- CenterCrop = 'CenterCrop'
- Pad = 'Pad'
- Lambda = 'Lambda'
- RandomApply = 'RandomApply'
- RandomChoice = 'RandomChoice'
- RandomOrder = 'RandomOrder'
- RandomCrop = 'RandomCrop'
- RandomHorizontalFlip = 'RandomHorizontalFlip'
- RandomVerticalFlip = 'RandomVerticalFlip'
- RandomResizedCrop = 'RandomResizedCrop'
- FiveCrop = 'FiveCrop'
- TenCrop = 'TenCrop'
- LinearTransformation = 'LinearTransformation'
- ColorJitter = 'ColorJitter'
- RandomRotation = 'RandomRotation'
- RandomAffine = 'RandomAffine'
- Grayscale = 'Grayscale'
- RandomGrayscale = 'RandomGrayscale'
- RandomPerspective = 'RandomPerspective'
- RandomErasing = 'RandomErasing'
- GaussianBlur = 'GaussianBlur'
- InterpolationMode = 'InterpolationMode'
- RandomInvert = 'RandomInvert'
- RandomPosterize = 'RandomPosterize'
- RandomSolarize = 'RandomSolarize'
- RandomAdjustSharpness = 'RandomAdjustSharpness'
- RandomAutocontrast = 'RandomAutocontrast'
- RandomEqualize = 'RandomEqualize'
- class super_gradients.training.transforms.DetectionMosaic(input_dim: tuple, prob: float = 1.0, enable_mosaic: bool = True)[source]
Bases:
DetectionTransform
DetectionMosaic detection transform
- input_dim
(tuple) input dimension.
- prob
(float) probability of applying mosaic.
- enable_mosaic
(bool) whether to apply mosaic at all (regardless of prob) (default=True).
- class super_gradients.training.transforms.DetectionRandomAffine(degrees=10, translate=0.1, scales=0.1, shear=10, target_size=(640, 640), filter_box_candidates: bool = False, wh_thr=2, ar_thr=20, area_thr=0.1)[source]
Bases:
DetectionTransform
DetectionRandomAffine detection transform
- target_size
(tuple) desired output shape.
- degrees
(Union[tuple, float]) degrees for random rotation, when float the random values are drawn uniformly from (-degrees, degrees)
- translate
(Union[tuple, float]) translate size (in pixels) for random translation, when float the random values are drawn uniformly from (-translate, translate)
- scales
(Union[tuple, float]) values for random rescale, when float the random values are drawn uniformly from (0.1-scales, 0.1+scales)
- shear
(Union[tuple, float]) degrees for random shear, when float the random values are drawn uniformly from (-shear, shear)
- enable
(bool) whether to apply the below transform at all.
- filter_box_candidates
(bool) whether to filter out transformed bboxes by edge size, area ratio, and aspect ratio (default=False).
- wh_thr
(float) edge size threshold when filter_box_candidates = True. Bounding boxes with edges smaller than this value will be filtered out. (default=2)
- ar_thr
(float) aspect ratio threshold when filter_box_candidates = True. Bounding boxes with an aspect ratio larger than this value will be filtered out. (default=20)
- area_thr
(float) threshold for the area ratio between the original image and the transformed one, when filter_box_candidates = True. Bounding boxes with such a ratio smaller than this value will be filtered out. (default=0.1)
- class super_gradients.training.transforms.DetectionHSV(prob: float, hgain: float = 0.5, sgain: float = 0.5, vgain: float = 0.5, bgr_channels=(0, 1, 2))[source]
Bases:
DetectionTransform
Detection HSV transform.
- class super_gradients.training.transforms.DetectionPaddedRescale(input_dim, swap=(2, 0, 1), max_targets=50, pad_value=114)[source]
Bases:
DetectionTransform
Preprocessing transform to be applied last of all transforms for validation.
Image: rescales and pads to self.input_dim. Targets: pads targets to max_targets, moves the class label to the first index, converts boxes format xyxy -> cxcywh.
- input_dim
(tuple) final input dimension (default=(640,640))
- swap
image axes to be rearranged.
- class super_gradients.training.transforms.DetectionTargetsFormatTransform(input_format: DetectionTargetsFormat = DetectionTargetsFormat.XYXY_LABEL, output_format: DetectionTargetsFormat = DetectionTargetsFormat.LABEL_CXCYWH, min_bbox_edge_size: float = 1, max_targets: int = 120)[source]
Bases:
DetectionTransform
Detection targets format transform
Converts targets in input_format to output_format.
- input_format
DetectionTargetsFormat: input target format
- output_format
DetectionTargetsFormat: output target format
- min_bbox_edge_size
int: bboxes with an edge size lower than this value will be removed.
- max_targets
int: max objects in a single image; targets are padded to this size.
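A hedged sketch instantiating a few of the transforms above with their documented arguments; the exact pipeline, and the detection dataset that consumes it, are recipe-specific and not shown here:
from super_gradients.training.transforms import (
    DetectionHSV,
    DetectionMosaic,
    DetectionPaddedRescale,
    DetectionRandomAffine,
    DetectionTargetsFormatTransform,
)

train_transforms = [
    DetectionMosaic(input_dim=(640, 640), prob=1.0),
    DetectionRandomAffine(degrees=10, translate=0.1, scales=0.1, shear=10, target_size=(640, 640)),
    DetectionHSV(prob=1.0, hgain=0.5, sgain=0.5, vgain=0.5),
    DetectionPaddedRescale(input_dim=(640, 640)),        # applied last: rescale, pad and convert box format
    DetectionTargetsFormatTransform(max_targets=120),
]
# These transform objects are typically passed to a detection dataset / dataloader configuration
# rather than being applied manually.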
super_gradients.training.utils module
- class super_gradients.training.utils.Timer(device: str)[source]
Bases:
object
A class to measure time, handling both GPU & CPU processes. Returns time in milliseconds.
- class super_gradients.training.utils.HpmStruct(**entries)[source]
Bases:
object
- class super_gradients.training.utils.WrappedModel(module)[source]
Bases:
Module
- forward(x)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- super_gradients.training.utils.convert_to_tensor(array)[source]
Converts numpy arrays and lists to Torch tensors before calculating losses.
- Parameters
array – torch.tensor / Numpy array / List
- super_gradients.training.utils.get_param(params, name, default_val=None)[source]
Retrieves a param from a parameter object/dict. If the parameter does not exist, will return default_val. In case the default_val is of type dictionary, and a value is found in the params - the function will return the default value dictionary with internal values overridden by the found value
i.e. with default_opt_params = {'lr': 0.1, 'momentum': 0.99, 'alpha': 0.001} and training_params = {'optimizer_params': {'lr': 0.0001}, 'batch': 32, …}, get_param(training_params, name='optimizer_params', default_val=default_opt_params) will return {'lr': 0.0001, 'momentum': 0.99, 'alpha': 0.001}
- Parameters
params – an object (typically HpmStruct) or a dict holding the params
name – name of the searched parameter
default_val – assumed to be the same type as the value searched in the params
- Returns
the found value, or default if not found
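The example above as a runnable sketch:
from super_gradients.training.utils import get_param

default_opt_params = {"lr": 0.1, "momentum": 0.99, "alpha": 0.001}
training_params = {"optimizer_params": {"lr": 0.0001}, "batch": 32}

# The dict found under "optimizer_params" is merged into the default dict, overriding matching keys.
opt_params = get_param(training_params, name="optimizer_params", default_val=default_opt_params)
# opt_params == {"lr": 0.0001, "momentum": 0.99, "alpha": 0.001}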
- super_gradients.training.utils.tensor_container_to_device(obj: Union[Tensor, tuple, list, dict], device: str, non_blocking=True)[source]
- Recursively sends compound objects to device (sending all tensors to the device and maintaining the structure).
- Parameters
obj – the object to send to device (list / tuple / tensor / dict)
device – device to send the tensors to
non_blocking – used for DistributedDataParallel
- Returns
an object with the same structure (tensors, lists, tuples) with the device pointers (like the return value of Tensor.to(device))
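For instance (a sketch; structure is preserved while all contained tensors are moved):
import torch
from super_gradients.training.utils import tensor_container_to_device

batch = {"image": torch.zeros(2, 3, 32, 32), "targets": [torch.ones(5, 6), torch.ones(3, 6)]}
batch = tensor_container_to_device(batch, device="cpu")   # use "cuda" on a GPU machine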
- super_gradients.training.utils.adapt_state_dict_to_fit_model_layer_names(model_state_dict: dict, source_ckpt: dict, exclude: list = [], solver: Optional[callable] = None)[source]
Given a model state dict and source checkpoints, the method tries to correct the keys in the model_state_dict to fit the ckpt in order to properly load the weights into the model. If unsuccessful - returns None
- param model_state_dict
the model state_dict
- param source_ckpt
checkpoint dict
- param exclude
optional list of layers to exclude
- param solver
callable with signature (ckpt_key, ckpt_val, model_key, model_val) that returns a desired weight for ckpt_val
- return
renamed checkpoint dict (if possible)
- super_gradients.training.utils.raise_informative_runtime_error(state_dict, checkpoint, exception_msg)[source]
Given a model state dict and source checkpoints, the method calls “adapt_state_dict_to_fit_model_layer_names” and enhances the exception_msg if loading the checkpoint_dict via the conversion method is possible
- super_gradients.training.utils.random_seed(is_ddp, device, seed)[source]
Sets random seed of numpy, torch and random.
When using DDP, a seed will be set for each process according to its local rank, derived from the device number.
- Parameters
is_ddp – bool, will set a different random seed for each process when using DDP
device – 'cuda', 'cpu', or 'cuda:<device_number>'
seed – int, random seed to be set