Personalized Federated Learning

FedRep

FedRep learns a shared data representation (the global layers) across clients and a unique, personalized local "head" (the local layers) for each client. In this implementation, after each round of local training, only the representation on each client is retrieved and uploaded to the server for aggregation.

cd examples/personalized_fl
uv run fedrep/fedrep.py -c configs/fedrep_CIFAR10_resnet18.toml

Reference: Collins et al., "Exploiting Shared Representations for Personalized Federated Learning," in Proc. International Conference on Machine Learning (ICML), 2021.

Alignment with the paper

plato/trainers/strategies/algorithms/personalized_fl_strategy.py:185-314 implements FedRep's alternating schedule: the global layers are frozen while the local head trains for its epochs, the roles then swap so the shared representation is trained, and the body is finally locked during personalization rounds, just as Collins et al. prescribe.

The personalized FedAvg payload strategy (plato/clients/strategies/fedavg_personalized.py:18-51) saves each client's head before uploading, so the server only aggregates the shared representation, matching the paper's requirement that local heads remain private.

plato/servers/fedavg_personalized.py:34-56 mirrors the FedRep personalization phase by launching a final round where every client fine-tunes its head on top of the frozen representation supplied by the aggregated body.
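
The alternating update can be sketched in plain PyTorch, independent of Plato's strategy classes; model.head and model.body are illustrative attribute names for the local and global layer groups, not Plato identifiers:

import torch

def fedrep_local_round(model, loader, loss_fn, lr=0.01, head_epochs=5, body_epochs=1):
    # One FedRep client round: train the local head with the shared body frozen,
    # then train the body with the head frozen, and upload only the body.
    def run_phase(trainable, frozen, epochs):
        for param in frozen.parameters():
            param.requires_grad = False
        for param in trainable.parameters():
            param.requires_grad = True
        optimizer = torch.optim.SGD(trainable.parameters(), lr=lr)
        for _ in range(epochs):
            for examples, labels in loader:
                optimizer.zero_grad()
                loss_fn(model(examples), labels).backward()
                optimizer.step()

    run_phase(model.head, model.body, head_epochs)  # personalize the local head first
    run_phase(model.body, model.head, body_epochs)  # then update the shared representation
    return model.body.state_dict()                  # only these weights go to the server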


FedBABU

FedBABU updates only the global layers of the model during federated training, while the local layers are kept frozen throughout each local training epoch.

cd examples/personalized_fl
uv run fedbabu/fedbabu.py -c configs/fedbabu_CIFAR10_resnet18.toml

Reference: Oh et al., "FedBABU: Towards Enhanced Representation for Federated Image Classification," in Proc. International Conference on Learning Representations (ICLR), 2022.

Alignment with the paper

The FedBABU callback (examples/personalized_fl/fedbabu/fedbabu_trainer.py:15-63) freezes the classifier head while clients collaborate and then swaps to freezing the representation during personalization, matching Algorithm 1 in Oh et al., where a global body is updated first and only heads are tuned afterwards.

Both stages call trainer_utils.freeze_model / activate_model, so the tensors listed under algorithm.global_layer_names are the only ones pushed to the server while each client's head parameters stay private—exactly the split used in the authors' PyTorch release.

Because the example pairs with the personalized FedAvg server (plato/servers/fedavg_personalized.py:34-56), the final round runs purely local epochs on the unfrozen heads, recreating the fine-tuning pass the paper uses to personalize FedBABU.
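
The same freeze/unfreeze split can be sketched outside of Plato; model.body and model.head are again illustrative names, and the personalize flag distinguishes the two phases described above:

import torch

def fedbabu_train(model, loader, loss_fn, lr=0.01, epochs=1, personalize=False):
    # Federated rounds: the head stays frozen and only the body learns.
    # Personalization: the roles are reversed and only the head is fine-tuned.
    frozen, trainable = (model.body, model.head) if personalize else (model.head, model.body)
    for param in frozen.parameters():
        param.requires_grad = False
    for param in trainable.parameters():
        param.requires_grad = True
    optimizer = torch.optim.SGD(trainable.parameters(), lr=lr)
    for _ in range(epochs):
        for examples, labels in loader:
            optimizer.zero_grad()
            loss_fn(model(examples), labels).backward()
            optimizer.step()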


APFL

APFL jointly optimizes the global model and a personalized model on each client by interpolating between the two. Once the global model is received, each client carries out a regular local update and then a personalized optimization to train its personalized model. The trained global model and the personalized model are subsequently combined using the mixing parameter "alpha," which can be updated dynamically.

cd examples/personalized_fl
uv run apfl/apfl.py -c configs/apfl_CIFAR10_resnet18.toml

Reference: Deng et al., "Adaptive Personalized Federated Learning," arXiv preprint, 2020.

Alignment with the paper

plato/trainers/strategies/algorithms/apfl_strategy.py:48-232 keeps a second personalized model and persists the per-client mixing coefficient alpha, aligning with APFL's dual-model formulation and adaptive interpolation.

The training loop (plato/trainers/strategies/algorithms/apfl_strategy.py:267-393) follows Algorithm 1 exactly: update the global weights, blend personalized and global logits via alpha, backprop on the personalized copy, and apply the gradient-based alpha update from Eq. 10.

The FedTorch reference from the authors (MLOPTPSU/FedTorch/fedtorch/comms/trainings/federated/apfl.py:33-178) orchestrates the same sequence of broadcast, dual-model training, alpha adaptation, and FedAvg synchronization, giving a direct mapping between the repository code and the Plato strategies.
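
The dual-model step can be sketched as follows; w_model is the client's copy of the global model, v_model the personalized model, and the alpha update uses the commonly implemented first-order form of Eq. 10 (all names are illustrative, not Plato's API):

import torch

def apfl_step(w_model, v_model, alpha, batch, loss_fn, lr=0.01, alpha_lr=0.01):
    examples, labels = batch
    # 1. Regular SGD step on the client's copy of the global model w.
    w_optimizer = torch.optim.SGD(w_model.parameters(), lr=lr)
    w_optimizer.zero_grad()
    loss_fn(w_model(examples), labels).backward()
    w_optimizer.step()
    # 2. SGD step on the personalized model v, using the interpolated output
    #    alpha * v(x) + (1 - alpha) * w(x).
    v_optimizer = torch.optim.SGD(v_model.parameters(), lr=lr)
    v_optimizer.zero_grad()
    mixed_output = alpha * v_model(examples) + (1 - alpha) * w_model(examples).detach()
    loss_fn(mixed_output, labels).backward()
    v_optimizer.step()
    # 3. Gradient-based alpha update: the derivative of the mixed loss with respect
    #    to alpha is approximated by the inner product of (v - w) and v's gradients.
    grad_alpha = 0.0
    for v_param, w_param in zip(v_model.parameters(), w_model.parameters()):
        if v_param.grad is not None:
            grad_alpha += torch.sum((v_param.data - w_param.data) * v_param.grad).item()
    alpha = min(max(alpha - alpha_lr * grad_alpha, 0.0), 1.0)
    return alpha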


FedPer

FedPer learns a global representation and personalized heads, but makes simultaneous local updates for both sets of parameters, and therefore performs the same number of local updates for the head and the representation in each local round.

cd examples/personalized_fl
uv run fedper/fedper.py -c configs/fedper_CIFAR10_resnet18.toml

Reference: Arivazhagan et al., "Federated Learning with Personalization Layers," arXiv preprint, 2019.

Alignment with the paper

plato/trainers/strategies/algorithms/personalized_fl_strategy.py:1-176 mirrors FedPer's workflow by keeping the entire network trainable during collaborative rounds and then freezing the representation once personalization rounds begin, so only the top layers fine-tune as in Arivazhagan et al.

The layer-selection logic respects algorithm.global_layer_names, so the exact modules the paper treats as shared are the ones Plato locks in personalization mode while each client adjusts its local classification head.

Because the clients rely on the personalized FedAvg payload strategy (plato/clients/strategies/fedavg_personalized.py:18-51), the saved local head is restored after every download, matching the paper's requirement that personalization layers never leave the device.
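
A minimal sketch of a FedPer local round, assuming the shared modules are selected by the prefixes listed under algorithm.global_layer_names (the helper below is illustrative, not part of Plato):

import torch

def fedper_local_round(model, loader, loss_fn, global_layer_names, lr=0.01, epochs=1):
    # FedPer trains the body and the head together during every local round...
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for examples, labels in loader:
            optimizer.zero_grad()
            loss_fn(model(examples), labels).backward()
            optimizer.step()
    # ...but only the parameters whose names match the shared layer prefixes are uploaded;
    # the personalization layers never leave the device.
    return {
        name: weight
        for name, weight in model.state_dict().items()
        if any(name.startswith(prefix) for prefix in global_layer_names)
    }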


LG-FedAvg

With LG-FedAvg, only the global layers of a model are sent to the server for aggregation, while each client keeps its local layers to itself.

cd examples/personalized_fl
uv run lgfedavg/lgfedavg.py -c configs/lgfedavg_CIFAR10_resnet18.toml

Reference: Liang et al., "Think Locally, Act Globally: Federated Learning with Local and Global Representations," in Proc. NeurIPS, 2019.

Alignment with the paper

The LG-FedAvg step strategy (plato/trainers/strategies/algorithms/lgfedavg_strategy.py:34-122) performs two optimizer passes per batch — first training local layers with the global ones frozen, then swapping roles — which is the alternating scheme detailed in Liang et al.

Configured layer name lists let the strategy decide which parameters stay on device versus which are shared, mirroring the local/global split from the paper.

The authors' implementation (pliang279/LG-FedAvg/main_lg.py:87-137) likewise accumulates only the selected global keys when averaging, so Plato's keyed updates land on the exact parameter subsets that the reference code synchronizes.
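
The two optimizer passes per batch can be sketched as follows, with local_params and global_params holding the parameters of the locally kept and shared layers respectively (illustrative names):

import torch

def lgfedavg_batch_step(model, batch, loss_fn, local_params, global_params, lr=0.01):
    examples, labels = batch

    def one_pass(trainable, frozen):
        for param in frozen:
            param.requires_grad = False
        for param in trainable:
            param.requires_grad = True
        optimizer = torch.optim.SGD(trainable, lr=lr)
        optimizer.zero_grad()
        loss_fn(model(examples), labels).backward()
        optimizer.step()

    one_pass(local_params, global_params)  # first pass: update the local layers only
    one_pass(global_params, local_params)  # second pass: update the global layers only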


Ditto

Ditto jointly optimizes the global model and personalized models by learning a personalized model on each client that is regularized to stay close to the global model. In this example, once the global model is received, each client carries out a regular local update and then optimizes its personalized model.

cd examples/personalized_fl
uv run ditto/ditto.py -c configs/ditto_CIFAR10_resnet18.toml

Reference: Li et al., "Ditto: Fair and Robust Federated Learning Through Personalization," in Proc. International Conference on Machine Learning (ICML), 2021.

Alignment with the paper

plato/trainers/strategies/algorithms/ditto_strategy.py:46-259 snapshots the broadcast model at round start, then after FedAvg training runs local epochs that minimize F_k(v) + (lambda/2)*||v - w||^2, reproducing Ditto's Algorithm 1.

The personalized model stays on the device (only the global weights are returned), so the implementation matches Ditto's requirement that v remains private while being regularized toward w.

The TensorFlow code shared by the authors (litian96/ditto/flearn/trainers_MTL/ditto.py:82-146) applies the same proximal update by adding lam * (v - w) before each personalized step, matching the regularization term implemented inside DittoUpdateStrategy.
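
The proximal personalized update can be sketched directly from the objective above; global_weights is a snapshot (state_dict) of the model broadcast at the start of the round, and lam plays the role of lambda (names are illustrative):

import torch

def ditto_personalized_epochs(personal_model, global_weights, loader, loss_fn,
                              lam=0.1, lr=0.01, epochs=1):
    # Minimize F_k(v) + (lam / 2) * ||v - w||^2 by adding lam * (v - w)
    # to each parameter gradient before every optimizer step.
    optimizer = torch.optim.SGD(personal_model.parameters(), lr=lr)
    for _ in range(epochs):
        for examples, labels in loader:
            optimizer.zero_grad()
            loss_fn(personal_model(examples), labels).backward()
            for name, v_param in personal_model.named_parameters():
                if v_param.grad is not None:
                    v_param.grad.add_(lam * (v_param.data - global_weights[name]))
            optimizer.step()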


Per-FedAvg

Per-FedAvg uses the Model-Agnostic Meta-Learning (MAML) framework to perform local training during the regular training rounds. It performs two forward and backward passes with fixed learning rates in each iteration.

cd examples/personalized_fl
uv run perfedavg/perfedavg.py -c configs/perfedavg_CIFAR10_resnet18.toml

Reference: Fallah et al., "Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach," in Proc. NeurIPS, 2020.

Alignment with the paper

The Per-FedAvg training step (examples/personalized_fl/perfedavg/perfedavg_trainer.py:18-135) follows Fallah et al.'s inner loop exactly: Step 1 copies the weights and runs an alpha-sized SGD step; Step 2 rewinds to the snapshot and applies a beta-weighted meta-gradient on a fresh batch; finally, the meta-update is written back.

The callback resets the iterator each epoch so Step 2 always sees a different batch, matching the bilevel formulation in the paper where meta-gradients use fresh data rather than the inner-loop samples.

During personalization rounds (current_round > Config().trainer.rounds), the strategy skips the meta-update and performs plain SGD, duplicating the paper's deployment phase where each client adapts locally without the meta step.
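
A sketch of the first-order variant of this step; batch1 feeds the alpha-sized inner step and batch2 supplies the fresh data for the beta-weighted meta-update (names are illustrative, and the paper's Hessian-based variants are omitted):

import copy
import torch

def perfedavg_step(model, batch1, batch2, loss_fn, alpha=0.01, beta=0.001):
    # Step 1: snapshot the weights and take a temporary alpha-sized SGD step.
    snapshot = copy.deepcopy(model.state_dict())
    inner_optimizer = torch.optim.SGD(model.parameters(), lr=alpha)
    examples, labels = batch1
    inner_optimizer.zero_grad()
    loss_fn(model(examples), labels).backward()
    inner_optimizer.step()
    # Step 2: compute the meta-gradient on a fresh batch at the adapted weights.
    examples, labels = batch2
    model.zero_grad()
    loss_fn(model(examples), labels).backward()
    meta_grads = [
        param.grad.detach().clone() if param.grad is not None else torch.zeros_like(param)
        for param in model.parameters()
    ]
    # Rewind to the snapshot and apply the beta-weighted meta-update.
    model.load_state_dict(snapshot)
    with torch.no_grad():
        for param, grad in zip(model.parameters(), meta_grads):
            param.sub_(beta * grad)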


Hermes

Hermes utilizes structured pruning to improve both the communication efficiency and the inference efficiency of federated learning. It prunes the channels with the lowest magnitudes in each local model and adjusts the pruning amount based on each local model's test accuracy and its previous pruning amount. When the server aggregates pruned updates, it averages only the parameters that survived pruning on every client.

cd examples/personalized_fl
uv run hermes/hermes.py -c configs/hermes_CIFAR10_resnet18.toml

Reference: Li et al., "Hermes: An Efficient Federated Learning Framework for Heterogeneous Mobile Clients," in Proc. 27th Annual International Conference on Mobile Computing and Networking (MobiCom), 2021.

Alignment with the paper

The Hermes server (examples/personalized_fl/hermes/hermes_server.py:29-126) only averages weights where every mask indicates the parameter survived pruning, replicating the overlap-aware aggregation described in Li et al. while keeping pruned entries client-specific.

The trainer-side pruning callback and mask pipeline (examples/personalized_fl/hermes/hermes_trainer.py:22-142, examples/personalized_fl/hermes/hermes_processor.py:1-51, examples/personalized_fl/hermes/hermes_callback.py:1-33) evaluate accuracy, adjust pruning rates, save masks, and attach them to outbound payloads exactly like the structured pruning workflow in the paper.