SCAFFOLD
SCAFFOLD is a synchronous federated learning algorithm that performs server aggregation with control variates to better handle statistical heterogeneity. It is widely cited and used as a baseline in the federated learning literature. In this example, two processors, ExtractControlVariatesProcessor and SendControlVariateProcessor, are attached to the client through a callback class, ScaffoldCallback; they exchange control variates between the clients and the server. Each client also persists its own control variate to a file so that it carries over across rounds of local optimization.
Reference: Karimireddy, S.P., Kale, S., Mohri, M., Reddi, S., Stich, S. and Suresh, A.T. "SCAFFOLD: Stochastic controlled averaging for federated learning," Proceedings of the International Conference on Machine Learning (ICML), 2020.
The callbacks wire Δci through the payload exactly as Algorithm 1 prescribes: clients attach their control-variate deltas in examples/customized_client_training/scaffold/scaffold_callback.py:33-82, the server strips them off and averages c=c+(1/m)∗∑Δci in examples/customized_client_training/scaffold/scaffold_server.py:34-53, and the updated server control variate is sent back in the next payload.
On each client round, plato/trainers/strategies/algorithms/scaffold_strategy.py:190-345 applies the correction w=w−η∗(g+c−ci) at every optimizer step, and recomputes ci,new=ci−c+(xglobal−xlocal)/(η∗τ) before emitting Δci, mirroring the Option II formula that Karimireddy et al. (2020) derive for SCAFFOLD control variates.
Because the paper was released without official source code, the Plato example implements the state transitions defined in Algorithm 1 via examples/customized_client_training/scaffold/scaffold_client.py:23-64, yielding the message flow (server c, client Δci) that the theoretical convergence guarantees require.
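The client-side corrected step, the Option II control-variate refresh, and the server-side averaging can be sketched in a few framework-agnostic lines. This is a minimal illustration on per-parameter scalars (it extends elementwise to tensors); the function names are illustrative and are not Plato's API.

```python
def scaffold_step(w, g, c, c_i, lr):
    # Corrected local SGD step: w <- w - lr * (g + c - c_i).
    return w - lr * (g + c - c_i)

def scaffold_refresh(c, c_i, w_global, w_local, lr, num_steps):
    # Option II refresh from Karimireddy et al. (2020):
    # c_i_new = c_i - c + (x_global - x_local) / (lr * num_local_steps).
    # The client transmits delta = c_i_new - c_i alongside its weight update.
    c_i_new = c_i - c + (w_global - w_local) / (lr * num_steps)
    return c_i_new, c_i_new - c_i

def scaffold_aggregate(c, deltas, num_clients):
    # Server update: c <- c + (1/m) * sum(delta_c_i) over the m sampled clients.
    return c + sum(deltas) / num_clients
```

Chaining these three calls reproduces one full round of the SCAFFOLD message flow: the client steps with the correction, refreshes c_i, and the server folds the received Δci values into c.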
FedProx
To better handle system heterogeneity, the FedProx algorithm introduces a proximal term into the objective used by local training on the clients. It is widely cited and used as a baseline in the federated learning literature.
Reference: Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A. and Smith, V. "Federated optimization in heterogeneous networks," Proceedings of Machine Learning and Systems (MLSys), 2020.
plato/trainers/strategies/algorithms/fedprox_strategy.py:111-193 snapshots the global iterate wt at round start and augments the loss with (μ/2)∗∣∣w−wt∣∣2, which is the FedProx objective hk(w;wt)=Fk(w)+(μ/2)∗∣∣w−wt∣∣2 defined in Section 3 of Li et al. (2020). Autograd therefore produces the perturbed-gradient step without requiring a bespoke optimizer.
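The augmented objective is just the task loss plus a quadratic penalty anchored at the round-start weights. A minimal sketch over flat parameter dicts of scalars (illustrative names, not Plato's API):

```python
def fedprox_loss(task_loss, weights, global_weights, mu):
    # h_k(w; w_t) = F_k(w) + (mu/2) * ||w - w_t||^2, with w_t held fixed
    # at the round-start snapshot so only w receives gradients.
    prox = sum((weights[k] - global_weights[k]) ** 2 for k in weights)
    return task_loss + 0.5 * mu * prox
```

Because the penalty is differentiable in w, an autograd framework yields the perturbed gradient g + μ∗(w−wt) for free when this scalar is backpropagated.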
The config-aware wrapper FedProxLossStrategyFromConfig (plato/trainers/strategies/algorithms/fedprox_strategy.py:208-247) reads μ from the same knobs (clients.proximal_term_penalty_constant / algorithm.fedprox_mu) that the paper exposes in Algorithms 1 and 2, so experiments reproduce the authors' hyperparameter schedules.
The reference TensorFlow release (litian96/FedProx/flearn/optimizer/pgd.py#L27-L92) applies an identical perturbation, computing g+μ∗(w−wt) before the gradient step; Plato mirrors that logic in PyTorch by letting the proximal penalty backpropagate through the loss term, yielding a line-for-line correspondence with Perturbed Gradient Descent.
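For comparison, the explicit perturbed-gradient-descent update that the reference optimizer performs can be written directly. A hedged sketch on flat dicts (the helper name is illustrative, not the reference code's API):

```python
def pgd_step(weights, grads, global_weights, lr, mu):
    # Perturbed gradient descent: w <- w - lr * (g + mu * (w - w_t)),
    # the update the reference TensorFlow optimizer applies explicitly.
    return {
        k: weights[k] - lr * (grads[k] + mu * (weights[k] - global_weights[k]))
        for k in weights
    }
```

Differentiating the proximal loss above and taking one SGD step produces exactly this update, which is why the two implementations correspond line for line.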
FedDyn
FedDyn addresses data heterogeneity in federated learning and provides communication savings by dynamically updating each participating device's regularizer in every round of training.
Reference: Acar, D.A.E., Zhao, Y., Navarro, R.M., Mattina, M., Whatmough, P.N. and Saligrama, V. "Federated learning based on dynamic regularization," Proceedings of International Conference on Learning Representations (ICLR), 2021.
Alignment with the paper
The loss strategy plato/trainers/strategies/algorithms/feddyn_strategy.py:148-205 evaluates Lk(w)+α∗<w,−wglobal+hk>+(α/2)∗∣∣w−wglobal∣∣2, exactly the dynamic-regularization objective introduced in Section 3 of Acar et al. (2021), with _get_alpha_coefficient reproducing the client-weighted scaling discussed beneath that formulation.
FedDynUpdateStrategy.on_train_end (plato/trainers/strategies/algorithms/feddyn_strategy.py:284-317) updates the cumulative gradient state via hk=hk+(wk−wglobal) before persisting it, which is the recursion that Algorithm 1 relies on to couple successive local solutions.
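The dynamic-regularized objective and the cumulative-gradient recursion together fit in a few lines. A minimal sketch over flat dicts of scalars, with illustrative names that are not Plato's API:

```python
def feddyn_loss(task_loss, w, w_global, h, alpha):
    # L_k(w) + alpha * <w, h_k - w_global> + (alpha/2) * ||w - w_global||^2.
    linear = sum(w[k] * (h[k] - w_global[k]) for k in w)
    quad = sum((w[k] - w_global[k]) ** 2 for k in w)
    return task_loss + alpha * linear + 0.5 * alpha * quad

def feddyn_update_h(h, w_local, w_global):
    # Cumulative-gradient recursion after local training:
    # h_k <- h_k + (w_k - w_global), persisted between rounds.
    return {k: h[k] + (w_local[k] - w_global[k]) for k in h}
```

The recursion couples successive local solutions: the h_k carried into the next round shifts the linear term so that client objectives stay consistent with the global stationary point.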
The authors' PyTorch implementation (alpemreacar/FedDyn/utils_methods.py#L286-L399) performs the same bookkeeping: after every client run it accumulates curr_model_par - cld_mdl_param into local_param_list and averages the corrected weights, demonstrating that Plato's composable trainer follows the released reference code step for step.
FedMoS
FedMoS is a communication-efficient federated learning framework that couples double momentum-based updates with adaptive client selection to jointly mitigate the intrinsic variance of client drift.
Reference: X. Wang, Y. Chen, Y. Li, X. Liao, H. Jin and B. Li, "FedMoS: Taming Client Drift in Federated Learning with Double Momentum and Adaptive Selection," IEEE INFOCOM 2023.
Alignment with the paper
plato/trainers/strategies/algorithms/fedmos_strategy.py:104-205 implements the FedMoS double-momentum update by first computing dt=gt+(1−a)∗(dt−1−gt−1) and then stepping w=(1−μ)∗w−η∗dt+μ∗wglobal; these are the recursions described in Algorithm 1 of Wang et al. (2023).
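Both recursions can be sketched as a single step function over flat dicts of scalars. The names are illustrative, not Plato's API; this is a minimal sketch of the update, not the full training loop:

```python
def fedmos_step(w, w_global, g, g_prev, d_prev, a, lr, mu):
    # Double-momentum estimate: d_t = g_t + (1 - a) * (d_{t-1} - g_{t-1}),
    # followed by the proximal pull toward the broadcast global model:
    # w <- (1 - mu) * w - lr * d_t + mu * w_global.
    d = {k: g[k] + (1 - a) * (d_prev[k] - g_prev[k]) for k in g}
    w_new = {k: (1 - mu) * w[k] - lr * d[k] + mu * w_global[k] for k in w}
    return w_new, d
```

The returned d becomes d_{t-1} (and g becomes g_{t-1}) for the next call, so the momentum buffer and previous gradient must be carried across steps, mirroring the state the strategy caches.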
The training loop enforces the paper's sequencing: FedMosStepStrategy.training_step (plato/trainers/strategies/algorithms/fedmos_strategy.py:487-538) calls update_momentum() immediately after backward() and passes the cached global model from FedMosUpdateStrategy.on_train_start (plato/trainers/strategies/algorithms/fedmos_strategy.py:329-347) into the optimizer step so the proximal pull uses the broadcast parameters from the server.