
distributed-computing

Here are 188 public repositories matching this topic...

ogvalt
ogvalt commented Apr 25, 2020

Describe the bug
I found that some argument names in the framework aren't consistent. For example:

class SupervisedRunner(Runner):
    """Runner for experiments with supervised model."""

    _experiment_fn: Callable = SupervisedExperiment

    def __init__(
        self,
        model: Model = None,
        device: Device = None,
        input_key: Any = "features", 
      
bcyphers
bcyphers commented Jan 31, 2018

If enter_data() is called with the same train_path twice in a row and the data itself hasn't changed, a new Dataset does not need to be created.

We should add a column that stores a hash of the actual data. When a Dataset is about to be created and both its metadata and its data hash exactly match an existing Dataset, nothing should be added to the ModelHub database and the existing

wizard1203
wizard1203 commented Nov 7, 2020

It seems that the number of joining clients (not the number of computing clients) is fixed in fedml_api/data_preprocessing/**/data_loader and cannot be changed, except for the CIFAR10 dataset.

In other words, the total number of clients appears to be determined by the dataset rather than by the input from run_fedavg_distributed_pytorch.sh.

https://github.com/FedML-AI/FedML/blob/3d9fda8d149c95f25ec4898e31df76f035a33b5d/fed

egede
egede commented Nov 23, 2020

In several places in the code there are debug calls to the logger that sit inside loops and/or trigger expensive evaluations. Since the arguments are fully evaluated whether or not the log message is actually printed, this is poor practice. The following needs to be done:

  • Identify debug statements that are either inside loops or involve expensive evaluation (so just about anything beyond a simple string)
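The usual fix with Python's standard logging module is to pass %-style arguments so formatting is deferred until the record is actually emitted, and to guard genuinely expensive computations with isEnabledFor. A sketch under those assumptions, not the project's actual code:

```python
import logging

logger = logging.getLogger(__name__)

def expensive_summary(item):
    # Stand-in for a computation too costly to run on every iteration.
    return sorted(repr(item))

def process(items):
    for item in items:
        # Bad: an f-string is evaluated even when DEBUG is disabled.
        # logger.debug(f"processing {expensive_summary(item)}")

        # Good: %-style args are only formatted if the record is emitted.
        logger.debug("processing item %s", item)

        # For genuinely expensive evaluations, guard explicitly so the
        # computation itself is skipped when DEBUG is off.
        if logger.isEnabledFor(logging.DEBUG):
            logger.debug("summary: %s", expensive_summary(item))
```

Note that deferred formatting alone does not skip argument *evaluation*; only the isEnabledFor guard prevents expensive_summary from running at higher log levels.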
learningOrchestra
riibeirogabriel
riibeirogabriel commented Oct 4, 2020

Is your feature request related to a problem? Please describe.

Currently, deploying learningOrchestra requires knowledge of architecture and infrastructure.

Describe the solution you'd like

Is there a way to simplify or abstract the infrastructure requirements for deploying learningOrchestra?

Describe alternatives you've considered

Additional context
