Trainer

type

The type of the trainer. The following types are available:

basic a basic trainer with a standard training loop.
timm_basic a basic trainer with the timm learning rate scheduler.
diff_privacy a trainer that supports local differential privacy in its training loop by adding noise to the gradients during each step of training.

max_physical_batch_size

The limit on the physical batch size when using the diff_privacy trainer.

Default value: 128. The GPU memory usage of one process training the ResNet-18 model is around 2817 MB.

dp_epsilon

Total privacy budget of epsilon with the diff_privacy trainer.

Default value: 10.0

dp_delta

Total privacy budget of delta with the diff_privacy trainer.

Default value: 1e-5

dp_max_grad_norm

The maximum norm of the per-sample gradients with the diff_privacy trainer. Any gradient with norm higher than this will be clipped to this value.

Default value: 1.0
split_learning a trainer that supports the split learning framework.
self_supervised_learning a trainer that supports personalized federated learning based on self supervised learning.
gan a trainer for Generative Adversarial Networks (GANs).

rounds

The maximum number of training rounds.

round could be any positive integer.

max_concurrency

The maximum number of clients (of each edge server in cross-silo training) running concurrently on each available GPU. If this is not defined, no new processes are spawned for training.

Note

Plato will automatically use all available GPUs to maximize the concurrency of training, launching the same number of clients on every GPU. If max_concurrency is 7 and 3 GPUs are available, 21 client processes will be launched for concurrent training.

target_accuracy

The target accuracy of the global model.

target_perplexity

The target perplexity of the global Natural Language Processing (NLP) model.

epochs

The total number of epochs in local training in each communication round.

batch_size

The size of the mini-batch of data in each step (iteration) of the training loop.

optimizer

The type of the optimizer. The following options are supported:

Adam
Adadelta
Adagrad
AdaHessian (from the torch_optimizer package)
AdamW
SparseAdam
Adamax
ASGD
LBFGS
NAdam
RAdam
RMSprop
Rprop
SGD

lr_scheduler

The learning rate scheduler. The following learning rate schedulers are supported:

CosineAnnealingLR
LambdaLR
MultiStepLR
StepLR
ReduceLROnPlateau
ConstantLR
LinearLR
ExponentialLR
CyclicLR
CosineAnnealingWarmRestarts

Alternatively, all four schedulers from timm are supported if lr_scheduler is specified as timm and trainer -> type is specified as timm_basic. For example, to use the SGDR scheduler, we specify cosine as sched in its arguments (parameters -> learning_rate):


[trainer]
type = "timm_basic"

[parameters]

[parameters.learning_rate]
sched = cosine
min_lr = 1.e-6
warmup_lr = 0.0001
warmup_epochs = 3
cooldown_epochs = 10

loss_criterion

The loss criterion. The following options are supported:

L1Loss
MSELoss
BCELoss
BCEWithLogitsLoss
NLLLoss
PoissonNLLLoss
CrossEntropyLoss
HingeEmbeddingLoss
MarginRankingLoss
TripletMarginLoss
KLDivLoss
NegativeCosineSimilarity
NTXentLoss
SwaVLoss

global_lr_scheduler

Whether the learning rate should be scheduled globally (true) or not (false). If true, the learning rate of the first epoch in the next communication round is scheduled based on that of the last epoch in the previous communication round.

model_type

The repository where the machine learning model should be retrieved from. The following options are available:

cnn_encoder (for generating various encoders by extracting from CNN models such as ResNet models)
general_multilayer (for generating a multi-layer perceptron using a provided configuration)
huggingface (for HuggingFace causal language models)
torch_hub (for models from PyTorch Hub)
vit (for Vision Transformer models from HuggingFace, Tokens-to-Token ViT, and Deep Vision Transformer)

The name of the model should be specified below, in model_name.

Note

For vit, please replace the / in model name from https://huggingface.co/models with @. For example, use google@vit-base-patch16-224-in21k instead of google/vit-base-patch16-224-in21k. If you do not want to use the pretrained weights, set parameters -> model -> pretrained to false, as in the following example:


[parameters]
[parameters.model]
pretrained = false

model_name

The name of the machine learning model. The following options are available:

lenet5
resnet_x
vgg_x
dcgan
multilayer

Note

If the model_type above specified a model repository, supply the name of the model, such as gpt2, here.

For resnet_x, x = 18, 34, 50, 101, or 152; For vgg_x, x = 11, 13, 16, or 19.