
distributed-computing

Here are 188 public repositories matching this topic...

ogvalt
ogvalt commented Apr 25, 2020

Describe the bug
I found that some argument names in the framework aren't consistent. For example:

class SupervisedRunner(Runner):
    """Runner for experiments with supervised model."""

    _experiment_fn: Callable = SupervisedExperiment

    def __init__(
        self,
        model: Model = None,
        device: Device = None,
        input_key: Any = "features", 
      
bcyphers
bcyphers commented Jan 31, 2018

If enter_data() is called with the same train_path twice in a row and the data itself hasn't changed, a new Dataset does not need to be created.

We should add a column that stores a hash of the actual data. When a Dataset is about to be created and both its metadata and its data hash exactly match an existing Dataset, nothing should be added to the ModelHub database and the existing

wizard1203
wizard1203 commented Nov 7, 2020

It seems that the number of joining clients (not the number of computing clients) is fixed in fedml_api/data_preprocessing/**/data_loader and cannot be changed, except for the CIFAR10 dataset.

In other words, the total number of clients appears to be determined by the dataset rather than by the input from run_fedavg_distributed_pytorch.sh.

https://github.com/FedML-AI/FedML/blob/3d9fda8d149c95f25ec4898e31df76f035a33b5d/fed

egede
egede commented Nov 23, 2020

In several places in the code there are debug calls to the logger that sit inside loops and/or trigger expensive evaluations. Since the arguments are fully evaluated whether or not the log message is actually printed, this is poor practice. The following needs to be done:

  • Identify debug statements that are either inside loops or involve expensive evaluation (so just about anything beyond a simple string)
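The usual fix with Python's standard logging module is to pass %-style arguments so formatting is deferred until the record is actually emitted, and to guard genuinely expensive computations with isEnabledFor. A sketch under those assumptions, not the project's actual code:

```python
import logging

logger = logging.getLogger(__name__)

def expensive_summary(item):
    # Stand-in for a computation too costly to run on every iteration.
    return sorted(repr(item))

def process(items):
    for item in items:
        # Bad: an f-string is evaluated even when DEBUG is disabled.
        # logger.debug(f"processing {expensive_summary(item)}")

        # Good: %-style args are only formatted if the record is emitted.
        logger.debug("processing item %s", item)

        # For genuinely expensive evaluations, guard explicitly so the
        # computation itself is skipped when DEBUG is off.
        if logger.isEnabledFor(logging.DEBUG):
            logger.debug("summary: %s", expensive_summary(item))
```

Note that deferred formatting alone does not skip argument *evaluation*; only the isEnabledFor guard prevents expensive_summary from running at higher log levels.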
learningOrchestra
riibeirogabriel
riibeirogabriel commented Oct 4, 2020

Is your feature request related to a problem? Please describe.

Currently, deploying learningOrchestra requires knowledge of architecture and infrastructure.

Describe the solution you'd like

Is there a way to simplify or abstract the infrastructure requirements for deploying learningOrchestra?

Describe alternatives you've considered

Additional context
