Data
dataset
The training and test datasets. The following options are available:
HuggingFace: including all datasets from Hugging Face (requiresdataset_name)Torchvision: including torchvision datasets such as MNIST, FashionMNIST, EMNIST, CIFAR10, CIFAR100, CelebA, or STL10 (requiresdataset_name)CINIC10FEMNIST: Federated EMNISTTinyImageNetPurchaseTexas
Torchvision configuration
When using the Torchvision datasource, specify dataset_name to choose the
dataset class exposed by torchvision.datasets. Optional fields include:
split_parameter: name of the constructor argument controlling the split (defaults totrainorsplitwhen available).train_split,test_split,unlabeled_split: values passed to the split parameter for each subset. For boolean splits, strings such as"train"and"test"map toTrueandFalse.dataset_args/dataset_kwargs: positional or keyword arguments shared across all splits.train_args/train_kwargs(and the equivalents fortestorunlabeled): per-split overrides.download: whether to trigger dataset downloads (defaults totruewhen supported by the selected dataset).- For EMNIST, the balanced split is assumed by default; override
dataset_kwargs = { split = "<variant>" }to select a different subset. - For CelebA, attributes and identities are enabled by default; adjust
dataset_kwargs.target_typewhen a different combination is required.
Sample Torchvision block
[data]
datasource = "Torchvision"
dataset_name = "MNIST"
download = true
dataset_kwargs = { root = "datasets" }
train_kwargs = { train = true }
test_kwargs = { train = false }
data_path
Where the dataset is located.
Default value: <base_path>/data
Note
For the CINIC10 dataset, the default is <base_path>/data/CINIC-10
For the TinyImageNet dataset, the default is <base_path>/data/tiny-imagenet-200
train_path
Where the training dataset is located.
test_path
Where the test dataset is located.
sampler
How to divide the entire dataset to the clients. The following options are available:
iid-
noniidCould have concentration attribute to specify the concentration parameter in the Dirichlet distributionconcentration
If the sampler is
noniid, the concentration parameter for the Dirichlet distribution can be specified.Default value:
1 -
orthogonalEach institution's clients have data of different classes. Could have institution_class_ids and label_distribution attributesinstitution_class_ids
If the sampler is
orthogonal, the indices of classes of local data of each institution's clients can be specified. e.g.,0, 1; 2, 3(the first institution's clients only have data of class #0 and #1; the second institution's clients only have data of class #2 and #3).label_distribution
If the sampler is
orthogonal, the class distribution of every client's local data can be specified. The value should beiidornoniid.Default value:
iid -
mixedSome data are iid, while others are non-iid. Must have non_iid_clients attributesnon_iid_clients
If the sampler is
mixed, the indices of clients whose datasets are non-i.i.d. need to be specified. Other clients' datasets are i.i.d.
testset_sampler
How the test dataset is sampled when clients test locally. Any sampler type is valid.
Note
Without this parameter, the test dataset on either the client or the server is the entire test dataset of the datasource.
random_seed
The random seed used to sample each client's dataset so that experiments are reproducible.
partition_size
The number of samples in each client's dataset.
testset_size
The number of samples in the server's test dataset when server-side evaluation is conducted.