Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

Here are 23,383 public repositories matching this topic...

superset

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Updated Nov 4, 2021
  • Python
matthewdeng commented Dec 6, 2021

Problem

When checkpointing a Torch model's state_dict, there may be some inconsistencies when saving/loading depending on whether the model is wrapped in DDP.

Proposal

Provide a utility method that always fetches the non-DDP version of the state_dict.

Without DDP:

model.state_dict()

With DDP:

model.module.state_dict()
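
The proposed utility might look roughly like the following. This is a minimal sketch, assuming PyTorch is installed; the function name `unwrapped_state_dict` is hypothetical and is not an existing API of the project.

```python
from typing import Any, Dict

from torch import nn
from torch.nn.parallel import DistributedDataParallel


def unwrapped_state_dict(model: nn.Module) -> Dict[str, Any]:
    """Return the state_dict of the underlying module (hypothetical helper).

    A DDP-wrapped model keeps the original module under `.module`, so reading
    the state_dict from there yields the same keys as the unwrapped model.
    """
    if isinstance(model, DistributedDataParallel):
        model = model.module
    return model.state_dict()


# Usage on a plain (non-DDP) model; the DDP branch needs an initialized
# process group, so it is not exercised here.
plain = nn.Linear(4, 2)
keys = set(unwrapped_state_dict(plain).keys())
```

Checking the wrapper type explicitly, rather than testing for any `module` attribute, avoids unwrapping a plain model that happens to have a submodule named `module`.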

Also see

[torch.nn.modu

pytorch-lightning
awaelchli commented Dec 8, 2021

Proposed refactor

Deprecate tpu_global_core_rank in favor of global_rank.

Pitch

It "looks" like tpu_global_core_rank in the TPUPlugin is just another name for what is known as the global rank. After #10896, we should investigate whether the two are equivalent and, if they are, go back to using the unified global_rank property exclusively.
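
A deprecation of this kind is commonly done by keeping the old name as a thin alias that warns and delegates. The sketch below assumes the two ranks really are equivalent, which the issue says still needs to be confirmed; `TPUPluginSketch` is a hypothetical stand-in, not the actual TPUPlugin class.

```python
import warnings


class TPUPluginSketch:
    """Hypothetical stand-in for a plugin exposing both rank properties."""

    def __init__(self, global_rank: int) -> None:
        self._global_rank = global_rank

    @property
    def global_rank(self) -> int:
        return self._global_rank

    @property
    def tpu_global_core_rank(self) -> int:
        # Deprecated alias: emit a warning, then delegate to the unified property.
        warnings.warn(
            "`tpu_global_core_rank` is deprecated; use `global_rank` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.global_rank
```

Existing callers keep working during the deprecation window, and the warning points at the caller's line because of `stacklevel=2`.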


dash
story645 commented Dec 9, 2021

As discussed in #21874, the locators and formatters have no reprs. Reprs that round-trip through eval, so that eval(repr(obj)) reproduces the constructor call, something like

eval('AutoDateLocator(maxticks=8)') == AutoDateLocator(maxticks=8)

would mean reprs could be used in the documentation examples, which would help keep the labels in sync. This is useful for the new example #21874 &
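
The round-trip property described in this comment can be illustrated with a small class. This is only a sketch: `LocatorSketch` is a hypothetical stand-in, not Matplotlib's AutoDateLocator, and the equality method is added purely so the round trip can be checked.

```python
class LocatorSketch:
    """Hypothetical stand-in for a locator with an eval-able repr."""

    def __init__(self, maxticks: int = 5) -> None:
        self.maxticks = maxticks

    def __repr__(self) -> str:
        # Mirror the constructor call so eval(repr(obj)) rebuilds the object.
        return f"{type(self).__name__}(maxticks={self.maxticks!r})"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, LocatorSketch):
            return NotImplemented
        return self.maxticks == other.maxticks


locator = LocatorSketch(maxticks=8)
rebuilt = eval(repr(locator))  # reconstructs an equal object from the repr
```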

gensim
danieldeutsch commented Jun 2, 2021

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file itself and reads lines for the Predictor, which fails when it tries to load data from my compressed files.
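
One way a line reader could handle both cases is to detect gzip input before choosing how to open the file. The sketch below uses only the Python standard library; `read_lines` is a hypothetical helper and does not reflect how the predict command is actually implemented.

```python
import gzip
from typing import Iterator


def read_lines(path: str) -> Iterator[str]:
    """Yield lines from a plain-text or gzip-compressed file (hypothetical helper)."""
    # Detect gzip by its two-byte magic number rather than trusting the extension.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Calling `list(read_lines("data.jsonl.gz"))` and `list(read_lines("data.jsonl"))` would then return the same lines for the same content; both file names are placeholders.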

nni