Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

Here are 23,956 public repositories matching this topic...

adrinjalali commented Nov 8, 2021

These examples take quite a long time to run, and they frequently make our documentation CI fail due to timeouts. It'd be nice to speed them up a little bit.

To contributors: if you want to work on an example, first have a look at it, and if you think you're comfortable working on it and have found a potential way to speed up execution time while preserving the educational message

superset

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Updated Nov 4, 2021
  • Python
matthewdeng commented Jan 6, 2022

Problem: Currently JsonLoggerCallback.handle_result will load in the entirety of the existing results, append the new result, and then rewrite the entire file. This may not scale when running long-running jobs or jobs with large results.

https://github.com/ray-project/ray/blob/4e8f90aca20aa7bb87a4e84039889444824382ca/python/ray/train/callbacks/logging.py#L138-L142
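To make the scaling concern concrete, here is a minimal sketch in plain Python. It is not Ray's code and not the fix proposed in the issue: `rewrite_whole_file`, `append_json_line`, and `read_json_lines` are hypothetical names, and the append-based variant assumes a one-JSON-document-per-line file format is acceptable to whatever reads the results.

```python
import json
import os
import tempfile


def rewrite_whole_file(path, result):
    """Pattern described in the issue: load everything, append, rewrite everything.

    The work per call grows with the number of results already stored.
    """
    results = []
    if os.path.exists(path):
        with open(path) as f:
            results = json.load(f)
    results.append(result)
    with open(path, "w") as f:
        json.dump(results, f)


def append_json_line(path, result):
    """Assumed alternative: append one JSON document per line.

    The work per call depends only on the size of the new result.
    """
    with open(path, "a") as f:
        f.write(json.dumps(result) + "\n")


def read_json_lines(path):
    """Read back a file written by append_json_line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "results.jsonl")
        for step in range(3):
            append_json_line(log_path, {"step": step})
        print(read_json_lines(log_path))
```

The trade-off is that the appended file is no longer a single JSON array, so any consumer that calls `json.load` on the whole file would need to read it line by line instead.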

Potential Fix:

pytorch-lightning
dash
yurzo commented Nov 15, 2021

For regular lists:

In [11]: list(range(50))
Out[11]: 
[0,
 1,
 2,
 3,
 4,
...
 46,
 47,
 48,
 49]

However:

In [13]: collections.UserList(range(50))
Out[13]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
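The one-line output above is character-for-character what a plain `repr` of the list would give. The short check below shows that with the standard library only. Reading this as "the pretty printer has no special handling for `UserList` and falls back to `repr`" is an assumption, since the excerpt does not say so.

```python
import collections

plain = list(range(50))
wrapped = collections.UserList(range(50))

# UserList.__repr__ delegates to the repr of the list it wraps,
# so both objects produce the same single-line string.
same_repr = repr(wrapped) == repr(plain)

# The wrapped list is available as the .data attribute and is a real list,
# so it can be handed to anything that formats built-in lists.
underlying = wrapped.data
is_builtin_list = type(underlying) is list

print(same_repr, is_builtin_list)
```

Displaying `wrapped.data` instead of `wrapped` is one possible workaround until the wrapper type itself is formatted like a list.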
jklymak commented Jan 4, 2022

Bug summary

imshow extents cannot be expressed with units.

Code for reproduction

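import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
dates = np.arange("2020-01-01","2020-01-10 23:00", dtype='datetime64[h]')
ys = np.random.random(dates.size)
arr = np.random.random((10, 10))

# Passing datetime64 values in extent is what triggers the error
ax.imshow(arr, extent=[dates[0], dates[1], 0, 10])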

Actual outcome

Traceback (most recent call last):
  File "
gensim
nni
danieldeutsch commented Jun 2, 2021

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file itself and reads lines for the Predictor, which fails when the input is one of my compressed files.
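One way a line-based reader can accept both plain and gzipped input is sketched below. This is an illustration using only the standard library, not AllenNLP's API: `open_text` and `read_lines` are hypothetical helpers, and choosing the opener from the `.gz` suffix is an assumption about how compressed files are named.

```python
import gzip
import os
import tempfile


def open_text(path, encoding="utf-8"):
    """Hypothetical helper: open a plain or gzip-compressed file in text mode."""
    if path.endswith(".gz"):
        # "rt" makes gzip.open decode bytes to str, like the built-in open().
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


def read_lines(path):
    """Return the file's lines without trailing newlines."""
    with open_text(path) as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        packed = os.path.join(tmp, "inputs.jsonl.gz")
        with gzip.open(packed, "wt", encoding="utf-8") as f:
            f.write('{"sentence": "first"}\n{"sentence": "second"}\n')
        print(read_lines(packed))
```

Checking the file's leading magic bytes instead of its suffix would also handle compressed files with unusual names, at the cost of an extra read.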