FedRep learns a shared data representation (the global layers) across clients and a unique, personalized local "head" (the local layers) for each client. In this implementation, after each round of local training, only the representation on each client is retrieved and uploaded to the server for aggregation.
plato/trainers/strategies/algorithms/personalized_fl_strategy.py:185-314 implements FedRep's alternating schedule: the global layers are frozen while the local head's epochs run, training then swaps to the shared representation with the head locked, and the body stays frozen during personalization rounds, just as Collins et al. prescribe.
The personalized FedAvg payload strategy (plato/clients/strategies/fedavg_personalized.py:18-51) saves each client's head before uploading, so the server only aggregates the shared representation, matching the paper's requirement that local heads remain private.
plato/servers/fedavg_personalized.py:34-56 mirrors the FedRep personalization phase by launching a final round where every client fine-tunes its head with the frozen representation supplied from the aggregated body.
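The alternating schedule is compact enough to sketch in a few lines. The following is a minimal illustration rather than Plato's actual strategy class; `set_requires_grad`, `fedrep_local_round`, and the epoch counts are hypothetical names chosen for the sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    """Freeze (False) or unfreeze (True) every parameter of a module."""
    for param in module.parameters():
        param.requires_grad = flag

def fedrep_local_round(body, head, loader, head_epochs=5, body_epochs=1, lr=0.01):
    """One FedRep local round: train the personalized head with the shared
    representation frozen, then train the representation with the head locked."""
    for trainable, frozen, epochs in ((head, body, head_epochs),
                                      (body, head, body_epochs)):
        set_requires_grad(frozen, False)
        set_requires_grad(trainable, True)
        optimizer = torch.optim.SGD(trainable.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                optimizer.zero_grad()
                F.cross_entropy(head(body(x)), y).backward()
                optimizer.step()

    # Only the representation (the global layers) is uploaded for aggregation.
    return {k: v.detach().clone() for k, v in body.state_dict().items()}
```

During personalization rounds, only the first phase runs, with the body permanently frozen.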
FedBABU
FedBABU only updates the global layers (the body) of the model during FL training; the local layers (the head) are frozen at the beginning of each local training epoch.
The FedBABU callback (examples/personalized_fl/fedbabu/fedbabu_trainer.py:15-63) freezes the classifier head while clients collaborate and then swaps to freezing the representation during personalization, matching Algorithm 1 in Oh et al., where a global body is updated first and only heads are tuned afterwards.
Both stages call trainer_utils.freeze_model / activate_model, so the tensors listed under algorithm.global_layer_names are the only ones pushed to the server while each client's head parameters stay private—exactly the split used in the authors' PyTorch release.
Because the trainer inherits the personalized FedAvg server (plato/servers/fedavg_personalized.py:34-56), the final round runs purely local epochs on the unfrozen heads, recreating the fine-tuning pass the paper uses to personalize FedBABU.
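A condensed sketch of that two-stage freezing logic, with hypothetical names (`fedbabu_epoch` and the `personalizing` flag) standing in for the trainer callback:

```python
import torch
import torch.nn.functional as F

def fedbabu_epoch(body, head, loader, personalizing: bool, lr=0.01):
    """During federated training only the body learns while the randomly
    initialized head stays fixed; during personalization the roles swap."""
    trainable, frozen = (head, body) if personalizing else (body, head)
    for p in frozen.parameters():
        p.requires_grad = False
    for p in trainable.parameters():
        p.requires_grad = True

    optimizer = torch.optim.SGD(trainable.parameters(), lr=lr)
    for x, y in loader:
        optimizer.zero_grad()
        F.cross_entropy(head(body(x)), y).backward()
        optimizer.step()
```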
APFL
APFL jointly optimizes the global model and a per-client personalized model by interpolating between the client's local copy of the global model and its personalized model. Once the global model is received, each client carries out a regular local update and then a personalized optimization step to train its personalized model. The local and personalized models are subsequently combined using the mixing parameter "alpha," which can be dynamically updated.
plato/trainers/strategies/algorithms/apfl_strategy.py:48-232 keeps a second personalized model and persists the per-client mixing coefficient alpha, aligning with APFL's dual-model formulation and adaptive interpolation.
The training loop (plato/trainers/strategies/algorithms/apfl_strategy.py:267-393) follows Algorithm 1 exactly: update the global weights, blend personalized and global logits via alpha, backprop on the personalized copy, and apply the gradient-based alpha update from Eq. 10.
The FedTorch reference from the authors (MLOPTPSU/FedTorch/fedtorch/comms/trainings/federated/apfl.py:33-178) orchestrates the same sequence of broadcast, dual-model training, alpha adaptation, and FedAvg synchronization, giving a direct mapping between the repository code and the Plato strategies.
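To make that mapping concrete, here is a minimal single-batch sketch of the dual-model update and the alpha adaptation. It follows the FedTorch heuristic of approximating the blended gradient with `alpha * v.grad + (1 - alpha) * w.grad`; `apfl_step` and its learning rates are illustrative assumptions, and FedTorch's extra `0.02 * alpha` regularization term is omitted:

```python
import torch
import torch.nn.functional as F

def apfl_step(global_model, personal_model, alpha, batch, lr=0.01, alpha_lr=0.01):
    """One APFL iteration: update the local copy of the global model, update
    the personal model on the alpha-blended output, then adapt alpha."""
    x, y = batch

    # 1) Regular SGD step on the local copy of the global model (w).
    g_opt = torch.optim.SGD(global_model.parameters(), lr=lr)
    g_opt.zero_grad()
    F.cross_entropy(global_model(x), y).backward()
    g_opt.step()

    # 2) Personal step (v) on the interpolated prediction
    #    y_hat = alpha * v(x) + (1 - alpha) * w(x).
    p_opt = torch.optim.SGD(personal_model.parameters(), lr=lr)
    p_opt.zero_grad()
    with torch.no_grad():
        global_out = global_model(x)
    mixed = alpha * personal_model(x) + (1 - alpha) * global_out
    F.cross_entropy(mixed, y).backward()
    p_opt.step()

    # 3) Gradient step on alpha: <v - w, blended gradient>, clipped to [0, 1].
    grad_alpha = 0.0
    for v, w in zip(personal_model.parameters(), global_model.parameters()):
        diff = (v.detach() - w.detach()).flatten()
        blended_grad = (alpha * v.grad + (1 - alpha) * w.grad).flatten()
        grad_alpha += torch.dot(diff, blended_grad).item()
    return min(max(alpha - alpha_lr * grad_alpha, 0.0), 1.0)
```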
FedPer
FedPer learns a global representation and personalized heads, but it updates both sets of parameters simultaneously during local training, so the head and the representation receive the same number of local updates in each local round.
plato/trainers/strategies/algorithms/personalized_fl_strategy.py:1-176 mirrors FedPer's workflow by keeping the entire network trainable during collaborative rounds and then freezing the representation once personalization rounds begin, so only the top layers fine-tune as in Arivazhagan et al.
The layer-selection logic respects algorithm.global_layer_names, so the exact modules the paper treats as shared are the ones Plato locks in personalization mode while each client adjusts its local classification head.
Because the clients rely on the personalized FedAvg payload strategy (plato/clients/strategies/fedavg_personalized.py:18-51), the saved local head is restored after every download, matching the paper's requirement that personalization layers never leave the device.
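The payload split is easy to see in isolation. Below is a sketch under the assumption that `global_layer_names` holds parameter-name prefixes; the helper names themselves are hypothetical:

```python
import torch

def fedper_payload(model, global_layer_names):
    """Keep only the shared representation in the outbound payload;
    the personalization (head) layers never leave the device."""
    return {
        name: tensor.detach().clone()
        for name, tensor in model.state_dict().items()
        if any(name.startswith(prefix) for prefix in global_layer_names)
    }

def fedper_load(model, aggregated_weights):
    """Merge the aggregated representation into the local model while the
    previously trained local head is left untouched."""
    state = model.state_dict()
    state.update(aggregated_weights)  # overwrites only the shared layers
    model.load_state_dict(state)
```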
FedALA
FedALA performs adaptive local aggregation (ALA) by blending the incoming global model with a cached local model on each client before local training. The adaptive weights are computed from a small pre-training window, so clients start each round from a personalized initialization.
The adaptive local aggregation logic lives in plato/trainers/strategies/algorithms/fedala_strategy.py:42-528, where the client blends its cached local model with the broadcast model before training, matching the ALA initialization described in Section 3.
The configuration wrapper (plato/trainers/strategies/algorithms/fedala_strategy.py:532-579) reads the ALA hyperparameters (eta, rand_percent, layer_idx, threshold, num_pre_loss) from algorithm, so the example config mirrors Appendix C.2 of the paper.
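The core of ALA fits in a short sketch. The version below learns element-wise weights over all parameters and assumes `local_state` and `global_state` are detached state-dict copies; the actual strategy restricts blending to the top layer_idx layers and a rand_percent sample of the data, and the names (`adaptive_local_aggregation`, `max_steps`) are assumptions of the sketch:

```python
import torch
import torch.nn.functional as F

def adaptive_local_aggregation(model, local_state, global_state, loader,
                               eta=1.0, threshold=0.1, num_pre_loss=10,
                               max_steps=50):
    """Learn element-wise ALA weights W in [0, 1] so that local training
    starts from w_local + W * (w_global - w_local) instead of w_global."""
    names = [n for n, _ in model.named_parameters()]
    diffs = {n: global_state[n] - local_state[n] for n in names}
    ala_w = {n: torch.ones_like(diffs[n]) for n in names}
    losses = []
    for _ in range(max_steps):
        # Load the blended initialization built from the current weights W.
        blended = {n: local_state[n] + ala_w[n] * diffs[n] for n in names}
        model.load_state_dict(blended, strict=False)
        x, y = next(iter(loader))
        model.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # W <- clip(W - eta * grad_param * (w_global - w_local), 0, 1).
        for n, p in model.named_parameters():
            ala_w[n] = torch.clamp(ala_w[n] - eta * p.grad * diffs[n], 0.0, 1.0)
        losses.append(loss.item())
        # Stop once the losses over the pre-training window have flattened.
        if len(losses) >= num_pre_loss and \
                torch.tensor(losses[-num_pre_loss:]).std() < threshold:
            break
    model.load_state_dict(
        {n: local_state[n] + ala_w[n] * diffs[n] for n in names}, strict=False)
```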
pFedGraph
pFedGraph infers a collaboration graph on the server to compute personalized aggregation weights and then regularizes local training toward the aggregated model for each client.
The pFedGraph aggregation strategy (plato/servers/strategies/aggregation/pfedgraph.py:13-324) builds the collaboration graph by computing cosine similarities over weight updates, then projects each row onto the simplex to match the graph inference step described in Section 3.2.
The server (plato/servers/pfedgraph.py:17-71) stores per-client aggregated weights and returns them via customize_server_payload, mirroring the client-specific aggregation in Algorithm 1.
The trainer wiring (plato/trainers/pfedgraph.py:8-25) combines PFedGraphUpdateStrategy and PFedGraphLossStrategyFromConfig (plato/trainers/strategies/algorithms/pfedgraph_strategy.py:20-123) to capture the reference vector at round start and add the cosine similarity regularizer from Equation 3.
This mirrors the official pFedGraph implementation's split between server-side graph inference and client-side regularized optimization, while Plato's defaults (cosine similarity + simplex-projected weights) provide a concrete instantiation of the paper's model-similarity + dataset-size formulation.
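The two server-side ingredients, the cosine-similarity graph and the simplex projection, can be sketched independently of Plato's strategy classes. The function names below are hypothetical, and the projection follows the standard Duchi et al. (2008) algorithm:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto the probability simplex: the closest vector
    to v with nonnegative entries summing to one (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def collaboration_graph(client_updates):
    """Pairwise cosine similarity between flattened client updates, with each
    row projected onto the simplex to yield per-client aggregation weights."""
    flat = [np.concatenate([p.ravel() for p in update]) for update in client_updates]
    n = len(flat)
    graph = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            graph[i, j] = flat[i] @ flat[j] / (
                np.linalg.norm(flat[i]) * np.linalg.norm(flat[j]) + 1e-12)
    # Row i becomes the aggregation weights for client i's personalized model.
    return np.stack([project_to_simplex(row) for row in graph])
```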
LG-FedAvg
With LG-FedAvg, only the global layers of a model are sent to the server for aggregation, while each client keeps its local layers to itself.
The LG-FedAvg step strategy (plato/trainers/strategies/algorithms/lgfedavg_strategy.py:34-122) performs two optimizer passes per batch — first training local layers with the global ones frozen, then swapping roles — which is the alternating scheme detailed in Liang et al.
Configured layer name lists let the strategy decide which parameters stay on device versus which are shared, mirroring the local/global split from the paper.
The authors' implementation (pliang279/LG-FedAvg/main_lg.py:87-137) likewise accumulates only the selected global keys when averaging, so Plato's keyed updates land on the exact parameter subsets that the reference code synchronizes.
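The two-pass batch step is compact enough to sketch directly; `lgfedavg_batch` and the prefix-matching convention for layer names are assumptions of this illustration:

```python
import torch
import torch.nn.functional as F

def lgfedavg_batch(model, batch, local_names, global_names, lr=0.01):
    """Two optimizer passes on the same batch: first only the local layers
    learn (global layers frozen), then the roles are swapped."""
    x, y = batch
    for trainable in (local_names, global_names):
        for name, param in model.named_parameters():
            param.requires_grad = any(name.startswith(t) for t in trainable)
        optimizer = torch.optim.SGD(
            [p for p in model.parameters() if p.requires_grad], lr=lr)
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        optimizer.step()
```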
Ditto
Ditto jointly optimizes the global model and per-client personalized models by learning local models that are encouraged, through global regularization, to stay close to the global model. In this example, once the global model is received, each client carries out a regular local update and then optimizes its personalized model.
plato/trainers/strategies/algorithms/ditto_strategy.py:46-259 snapshots the broadcast model at round start, then after FedAvg training runs local epochs that minimize F_k(v) + (lambda/2)*||v - w||^2, reproducing Ditto's Algorithm 1.
The personalized model stays on device (only the global weights are returned), so the implementation matches Ditto's requirement that v remains private while being regularized toward w.
The TensorFlow code shared by the authors (litian96/ditto/flearn/trainers_MTL/ditto.py:82-146) applies the same proximal update by adding lam * (v - w) before each personalized step, matching the regularization term implemented inside DittoUpdateStrategy.
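A minimal sketch of the personalized solver, assuming `global_state` is a detached copy of the freshly broadcast global weights (the names here are illustrative):

```python
import torch
import torch.nn.functional as F

def ditto_personal_epoch(personal_model, global_state, loader, lam=0.1, lr=0.01):
    """One epoch of Ditto's personalized solver: plain SGD on the local loss
    plus the proximal pull lam * (v - w) toward the broadcast global weights."""
    for x, y in loader:
        personal_model.zero_grad()
        F.cross_entropy(personal_model(x), y).backward()
        with torch.no_grad():
            for name, v in personal_model.named_parameters():
                # Gradient of (lam / 2) * ||v - w||^2 is lam * (v - w).
                v -= lr * (v.grad + lam * (v - global_state[name]))
```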
Per-FedAvg
Per-FedAvg uses the Model-Agnostic Meta-Learning (MAML) framework to perform local training during the regular training rounds. It performs two forward and backward passes with fixed learning rates in each iteration.
The Per-FedAvg training step (examples/personalized_fl/perfedavg/perfedavg_trainer.py:18-135) follows Fallah et al.'s inner loop exactly: Step 1 copies the weights, runs an alpha-sized SGD step, Step 2 rewinds to the snapshot, applies a beta-weighted meta-gradient on a fresh batch, and finally writes the meta-update back.
The callback resets the iterator each epoch so Step 2 always sees a different batch, matching the bilevel formulation in the paper where meta-gradients use fresh data rather than the inner-loop samples.
During personalization rounds (current_round > Config().trainer.rounds), the strategy skips the meta-update and performs plain SGD, duplicating the paper's deployment phase where each client adapts locally without the meta step.
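The first-order variant of that meta-iteration can be sketched in a few lines; `perfedavg_step` is a hypothetical name, and the Hessian-vector correction of the full second-order method is deliberately left out:

```python
import copy
import torch
import torch.nn.functional as F

def perfedavg_step(model, batch1, batch2, alpha=0.01, beta=0.001):
    """One first-order Per-FedAvg meta-iteration: an alpha-sized inner SGD
    step on batch1, a meta-gradient on batch2 at the adapted weights, then
    the beta-weighted update applied to the original (pre-step) weights."""
    snapshot = copy.deepcopy(model.state_dict())

    # Step 1: inner update w' = w - alpha * grad f(w; batch1).
    x, y = batch1
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= alpha * p.grad

    # Step 2: meta-gradient grad f(w'; batch2), evaluated at the adapted w'.
    x, y = batch2
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()
    meta_grads = [p.grad.detach().clone() for p in model.parameters()]

    # Rewind to the snapshot and apply the beta-weighted meta-update.
    model.load_state_dict(snapshot)
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= beta * g
```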
Hermes
Hermes utilizes structured pruning to improve both the communication efficiency and the inference efficiency of federated learning. It prunes the channels with the lowest magnitudes in each local model and adjusts the pruning amount based on each local model's test accuracy and its previous pruning amount. When the server aggregates pruned updates, it only averages the parameters that survived pruning on every client.
The Hermes server (examples/personalized_fl/hermes/hermes_server.py:29-126) only averages weights where every mask indicates the parameter survived pruning, replicating the overlap-aware aggregation described in Li et al. while keeping pruned entries client-specific.
The trainer-side pruning callback and mask pipeline (examples/personalized_fl/hermes/hermes_trainer.py:22-142, examples/personalized_fl/hermes/hermes_processor.py:1-51, examples/personalized_fl/hermes/hermes_callback.py:1-33) evaluate accuracy, adjust pruning rates, save masks, and attach them to outbound payloads exactly like the structured pruning workflow in the paper.
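The overlap-aware averaging rule reduces to a few tensor operations per layer. A sketch with hypothetical names, assuming binary masks of the same shape as the weights:

```python
import torch

def hermes_aggregate(client_weights, client_masks):
    """Average a position only where every client's pruning mask kept it;
    positions pruned by any client retain that client's own value."""
    overlap = torch.stack(client_masks).prod(dim=0).bool()
    mean = torch.stack(client_weights).mean(dim=0)
    return [torch.where(overlap, mean, w) for w in client_weights]
```

Applied layer by layer, this keeps pruned entries client-specific while still synchronizing the surviving overlap, which is the behavior the server above implements.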