Asynchronous Federated Learning
FedAsync
FedAsync is one of the first algorithms proposed in the literature for operating federated learning training sessions in asynchronous mode, which Plato supports natively. It advocates aggressive aggregation: the server updates the global model as soon as any single client reports its local updates.
In its implementation, FedAsync's server subclasses from the FedAvg server and overrides its configure() and aggregate_weights() functions. In configure(), it calls super().configure() first, just as its __init__() function calls super().__init__(), and then adds a custom step: obtaining a mixing hyperparameter for later use in the aggregation process. Its aggregate_weights(), by contrast, supplies a completely custom implementation.
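The override pattern described above can be sketched schematically as follows. This is an illustrative skeleton, not Plato's actual code: the stand-in FedAvgServer class and the method bodies are assumptions made for the example.

```python
# Schematic sketch of the override pattern: configure() extends the parent,
# while aggregate_weights() is replaced wholesale. Class internals are
# illustrative, not Plato's actual implementation.

class FedAvgServer:
    """Stand-in for the FedAvg server that FedAsync subclasses."""

    def __init__(self):
        self.mixing = None

    def configure(self):
        pass  # parent setup would happen here

    def aggregate_weights(self, baseline, update):
        raise NotImplementedError


class FedAsyncServer(FedAvgServer):
    def __init__(self, mixing=0.9):
        super().__init__()  # reuse the parent's initialization first
        self._initial_mixing = mixing

    def configure(self):
        super().configure()  # keep the FedAvg setup, then add custom steps
        # Obtain the mixing hyperparameter for later use in aggregation.
        self.mixing = self._initial_mixing

    def aggregate_weights(self, baseline, update):
        # Completely custom: mix a single client's update into the global model.
        return {
            name: (1 - self.mixing) * baseline[name] + self.mixing * update[name]
            for name in baseline
        }
```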
cd examples/async/fedasync
uv run fedasync.py -c fedasync_MNIST_lenet5.toml
Reference: C. Xie, S. Koyejo, I. Gupta. "Asynchronous Federated Optimization," in Proc. Annual Workshop on Optimization for Machine Learning (OPT), 2020.
Alignment with the paper
examples/async/fedasync/fedasync_algorithm.py:9-19 updates the global weights as w_new = (1 - mixing) * w_prev + mixing * w_client, matching Eq. (2) in Xie et al. (2020) with the mixing hyperparameter alpha_t supplied by the server.
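The mixing rule above can be written as a minimal sketch, treating each model as a dict of per-layer weights (scalars here for brevity; the real code operates on tensors):

```python
# Minimal sketch of the FedAsync mixing rule (Eq. (2) of Xie et al., 2020):
# the new global weights interpolate between the previous global weights and
# a single client's weights, controlled by the mixing hyperparameter.

def fedasync_update(w_prev, w_client, mixing):
    """w_new = (1 - mixing) * w_prev + mixing * w_client, per layer."""
    return {
        name: (1 - mixing) * w_prev[name] + mixing * w_client[name]
        for name in w_prev
    }

# With mixing = 0.5, the result is the midpoint of the two models.
w_new = fedasync_update({"fc.weight": 2.0}, {"fc.weight": 4.0}, mixing=0.5)
# w_new["fc.weight"] == 3.0
```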
The strategy layer plato/servers/strategies/aggregation/fedasync.py:34-107 loads the constant, polynomial, and hinge staleness functions described in Section 5.2 of the paper and scales the mixing rate when adaptive_mixing is enabled, reproducing the decay schedule for stale updates.
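The three staleness functions can be sketched as follows; the functional forms follow Section 5.2 of the paper, while the default constants a and b are illustrative choices, not values from the strategy layer:

```python
# Hedged sketch of the constant, polynomial, and hinge staleness functions
# from Section 5.2 of Xie et al. (2020). With adaptive mixing enabled, the
# effective mixing rate is alpha * s(staleness), so stale updates are damped.

def constant(staleness):
    return 1.0  # no decay: every update mixes at the full rate

def polynomial(staleness, a=0.5):
    return (staleness + 1) ** -a  # smooth decay with staleness

def hinge(staleness, a=10.0, b=4):
    # Full weight up to a staleness of b, then a sharp drop-off.
    return 1.0 if staleness <= b else 1.0 / (a * (staleness - b) + 1.0)

def adaptive_mixing(alpha, staleness, s=polynomial):
    """Scale the base mixing rate down for stale updates."""
    return alpha * s(staleness)
```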
As soon as minimum_clients_aggregated reports arrive (set to 1 for FedAsync), the asynchronous core plato/servers/base.py:208-789 triggers aggregation while enforcing the staleness bound check in the same loop, which is the event-driven update policy outlined in Algorithm 1.
FedBuff
With over 400 citations on Google Scholar, FedBuff is one of the most widely cited asynchronous federated learning algorithms, known for its simplicity. To run it:
cd examples/async/fedbuff
uv run fedbuff.py -c fedbuff_cifar10.toml
Reference: J. Nguyen et al. "Federated Learning with Buffered Asynchronous Aggregation," in Proc. AISTATS 2022.
Alignment with the paper
The buffer flush performs a uniform average of the stored deltas in plato/servers/strategies/aggregation/fedbuff.py:19-42, just like Algorithm 1 in Nguyen et al. (2022) where each buffered update contributes equally once the buffer is released.
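The buffer flush can be sketched as follows, with each delta a dict of per-layer values (scalars here for brevity). The server_lr parameter is an assumption added for the example; the source above describes a plain uniform average:

```python
# Minimal sketch of a FedBuff-style buffer flush (Algorithm 1 of Nguyen et
# al., 2022): once the buffer is released, the stored client deltas are
# averaged uniformly, so each buffered update contributes equally.

def fedbuff_flush(server_weights, buffered_deltas, server_lr=1.0):
    k = len(buffered_deltas)
    return {
        name: server_weights[name]
        + server_lr * sum(delta[name] for delta in buffered_deltas) / k
        for name in server_weights
    }
```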
plato/servers/base.py:229-789 maintains the asynchronous buffer using minimum_clients_aggregated and the staleness guard, mirroring the FedBuff rule that waits for B arrivals and skips overly stale updates before applying the aggregated step.
When request_update is enabled, the same base server invokes the urgent update path in plato/servers/base.py:980-1038, which implements the RequestUpdate routine described in the appendix even though the authors did not publish official source code.
Port
Port is one of the newer asynchronous federated learning algorithms. The server aggregates once it receives a minimum number of client updates, which can be tuned with 'minimum_clients_aggregated'. 'staleness_bound', a parameter common in asynchronous FL, limits the staleness of all clients' updates. 'request_update' is a design specific to Port: it forces clients to report their updates and shut down their training processes if they are too slow. 'similarity_weight' and 'staleness_weight' are two Port hyperparameters that tune the respective weights the server uses during aggregation. 'max_sleep_time', 'sleep_simulation', 'avg_training_time' and 'simulation_distribution' are also important for defining client arrivals in Port.
cd examples/async/port
uv run port.py -c port_cifar10.toml
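The Port-specific parameters above could be laid out in a configuration file along these lines. This is a hypothetical fragment: the key names follow the text, but the section layout and all values are illustrative, not taken from port_cifar10.toml:

```toml
# Hypothetical fragment illustrating the Port parameters described above;
# section layout and values are illustrative only.
[server]
minimum_clients_aggregated = 2   # aggregate once this many updates arrive
staleness_bound = 10             # cap on the staleness of accepted updates
request_update = true            # force slow clients to report immediately
similarity_weight = 1.0          # weight of the similarity term in aggregation
staleness_weight = 0.5           # weight of the staleness term in aggregation

[clients]
sleep_simulation = true          # simulate heterogeneous client speeds
max_sleep_time = 30              # upper bound on simulated delay (seconds)
avg_training_time = 10           # mean simulated training time (seconds)
simulation_distribution = "zipf" # how simulated client delays are drawn
```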
Reference: N. Su, B. Li. "How Asynchronous can Federated Learning Be?," in Proc. IEEE/ACM International Symposium on Quality of Service (IWQoS), 2022.
Alignment with the paper
examples/async/port/port_server.py:20-112 recreates Port's weighted aggregation by combining sample counts, cosine similarity, and staleness factors exactly as in Eq. (5) of Su and Li (2022): the code normalizes the weights after multiplying num_samples / total_samples with the similarity and staleness terms.
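The weighting scheme above can be sketched as follows. The combination of the sample fraction with similarity and staleness terms follows the description above, but the exact functional forms and default constants here are illustrative assumptions, not the authors' code:

```python
# Hedged sketch of Port-style aggregation weights: each client's weight
# multiplies its sample fraction by similarity and staleness terms, and the
# weights are then normalized. Functional forms are illustrative.

import math

def cosine_similarity(u, v):
    """Cosine similarity between two flattened weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def port_weights(reports, total_samples, similarity_weight=1.0, staleness_weight=0.5):
    """reports: list of (num_samples, similarity, staleness) tuples."""
    raw = [
        (num / total_samples)
        * (similarity_weight * sim + staleness_weight / (staleness + 1))
        for num, sim, staleness in reports
    ]
    total = sum(raw)
    # Normalize so the per-client weights sum to one.
    return [w / total for w in raw]
```

A fresher, more similar client with more samples ends up with a larger share of the aggregated model.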
The cosine-similarity helper loads the checkpoint from weights_aggregated() (examples/async/port/port_server.py:114-134) so that stale clients are scored against the correct historical round, matching the history-dependent similarity weight defined by the authors.
The asynchronous infrastructure in plato/servers/base.py:229-789 and the urgent update hook in plato/servers/base.py:980-1038 use minimum_clients_aggregated, staleness_bound, and request_update exactly like Port's queue management, forcing laggards to report or be dropped, which the paper highlights as a key systems contribution.