Skip to content

Results

types

The set of columns that will be written into the runtime .csv file.

Common built-in values include:

  • round
  • accuracy
  • accuracy_std
  • elapsed_time
  • comm_time
  • processing_time
  • round_time
  • comm_overhead
  • train_loss
  • core_metric
  • local_epoch_num
  • edge_agg_num
  • evaluation_primary_value

Note

Use commas to separate them. The default is round, accuracy, elapsed_time.

Structured evaluators

When [evaluation] is configured, Plato automatically appends any new evaluation_* columns that appear at runtime. You do not need to predeclare every Lighteval task metric in results.types, although predeclaring the summary columns can keep the CSV order stable.

result_path

The path to the result .csv files.

Default value: <base_path>/results/, where <base_path> is specified in the general section.

record_clients_accuracy

Whether a separate *_accuracy.csv file should be written for client-side test metrics.

Default value: false

This setting only has an effect when clients.do_test = true.

Structured evaluation columns

Structured evaluators write their metrics directly into the runtime CSV using the evaluation_ prefix.

Examples:

  • evaluation_ifeval_avg
  • evaluation_hellaswag
  • evaluation_arc_easy
  • evaluation_arc_challenge
  • evaluation_arc_avg
  • evaluation_piqa
  • evaluation_ifeval_prompt_level_strict_acc
  • evaluation_arc_easy_acc
  • evaluation_arc_challenge_acc_stderr

Summary metrics come from EvaluationResult.metrics. Plato also exports evaluation_primary_value, which mirrors the evaluator's configured primary metric value. Some evaluators, such as Lighteval, also expose detailed task-level metrics that Plato flattens into additional CSV columns automatically.

Example

[results]
types = "round, elapsed_time, accuracy, train_loss, evaluation_ifeval_avg, evaluation_hellaswag, evaluation_arc_avg, evaluation_piqa"

This predeclares the summary columns, and Plato adds any extra detailed evaluation_* columns later if the evaluator emits them.

Per-client accuracy logging

If you want a separate *_accuracy.csv file containing one row per selected client, enable both:

[clients]
do_test = true

[results]
record_clients_accuracy = true

Client-side testing alone is not enough; the extra CSV is only written when results.record_clients_accuracy = true is also set.

Logging format

The runtime CSV is the authoritative structured-evaluation log. Plato no longer writes a separate JSONL sidecar for evaluator outputs.