Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists prepare and analyze data, and their findings inform high-level decisions in many organizations.

These examples take quite a long time to run, and they frequently make our documentation CI fail due to timeouts. It'd be nice to speed them up a little bit.

To contributors: if you want to work on an example, first have a look at it, and if you think you're comfortable working on it and have found a potential way to speed up execution while preserving the educational message

Screenshot

I've added a red vertical ruler so that you see the issue

Description

As already explained in numerous issues, the use of the 'Inter' font is problematic: it does not allow dates to be aligned, for instance,
and it does not play nicely with numbers either.

In my supe

Problem: Currently JsonLoggerCallback.handle_result will load in the entirety of the existing results, append the new result, and then rewrite the entire file. This may not scale when running long-running jobs or jobs with large results.

https://github.com/ray-project/ray/blob/4e8f90aca20aa7bb87a4e84039889444824382ca/python/ray/train/callbacks/logging.py#L138-L142
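To illustrate the scaling concern, the sketch below contrasts the read-append-rewrite pattern described in the problem statement with an append-only JSON Lines file. It is not Ray's implementation and not necessarily the fix the issue goes on to propose; the function names (`rewrite_whole_file`, `append_json_line`, `read_json_lines`) and the one-result-per-line file layout are assumptions made purely for illustration.

```python
import json
import os
import tempfile


def rewrite_whole_file(path, result):
    """Read-append-rewrite: the cost of each call grows with the file size."""
    results = []
    if os.path.exists(path):
        with open(path) as f:
            results = json.load(f)
    results.append(result)
    with open(path, "w") as f:
        json.dump(results, f)


def append_json_line(path, result):
    """Append-only (hypothetical alternative): one line written per call."""
    with open(path, "a") as f:
        f.write(json.dumps(result) + "\n")


def read_json_lines(path):
    """Load every result back from a JSON Lines file."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "results.jsonl")
        for step in range(3):
            append_json_line(path, {"step": step})
        print(read_json_lines(path))
```

The trade-off in this sketch is that the append-only file is no longer a single JSON document, so any reader has to parse it line by line.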

Potential Fix:

Summary

Aesthetically trivial, yet I've spotted a discrepancy with font sizes in our tooltip (front-end + back-end screenshots below).
I believe sections #1 and #2 should have the same font size?

@Borda

Proposed refactor

Rename pytorch_lightning/loggers/csv_logs.py to pytorch_lightning/loggers/csv.py

Motivation

None of the other Logger files have "logs" as a suffix in the filename, e.g. comet.py, neptune.py, tensorboard.py

cc @Borda @justusschock @awaelchli @akihironitta @rohitgr7

In recent versions (can't say from exactly when), there seems to be an off-by-one error in dcc.DatePickerRange. I set max_date_allowed = datetime.today().date(), but in the calendar, yesterday is the maximum date allowed. I see it in my apps, and it is also present in the first example on the DatePickerRange documentation page.

E

For regular lists:

In [11]: list(range(50))
Out[11]: 
[0,
 1,
 2,
 3,
 4,
...
 46,
 47,
 48,
 49]

However:

In [13]: collections.UserList(range(50))
Out[13]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
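As a stopgap, note that a `UserList` keeps its items in a plain list on its `data` attribute, so displaying that attribute goes through the same printer as a regular list. The snippet below is only a sketch of that workaround, not a fix for the pretty-printer itself; the variable names are illustrative.

```python
import collections

ul = collections.UserList(range(50))

# UserList stores its contents in a real list on the .data attribute.
underlying = ul.data

print(type(underlying).__name__)
print(underlying == list(range(50)))
```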

Bug summary

imshow extents cannot be expressed with units.

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
dates = np.arange("2020-01-01","2020-01-10 23:00", dtype='datetime64[h]')
ys = np.random.random(dates.size)
arr = np.random.random((10, 10))

ax.imshow(arr, extent=[dates[0], dates[1], 0, 10])

Actual outcome

Traceback (most recent call last):
  File "

In gensim/models/fasttext.py:

    model = FastText(
        vector_size=m.dim,
        window=m.ws,
        epochs=m.epoch,
        negative=m.neg,
        # FIXME: these next 2 lines read in unsupported FB FT modes (loss=3 softmax or loss=4 onevsall,
        # or model=3 supervi

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file itself and reads lines for the Predictor, which fails when it tries to load data from my compressed files.
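A minimal sketch of the kind of helper that would cover this case, using only the Python standard library. The names `open_maybe_compressed` and `read_lines`, and the choice to detect compression from a `.gz` suffix, are assumptions for illustration; this is not AllenNLP's API.

```python
import gzip
import os
import tempfile


def open_maybe_compressed(path, encoding="utf-8"):
    """Open a text file, decompressing on the fly when the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode="rt", encoding=encoding)
    return open(path, mode="r", encoding=encoding)


def read_lines(path):
    """Yield non-empty lines from a plain or gzipped text file."""
    with open_maybe_compressed(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                yield line


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "inputs.jsonl.gz")
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.write('{"sentence": "hello"}\n{"sentence": "world"}\n')
        print(list(read_lines(path)))
```

Detecting compression by file suffix keeps the sketch short; checking the gzip magic bytes would be more robust for files without a conventional extension.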

Here are 23,956 public repositories matching this topic...

keras-team / keras

scikit-learn / scikit-learn

apache / superset

GokuMohandas / MadeWithML

microsoft / ML-For-Beginners

CamDavidsonPilon / Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

donnemartin / data-science-ipython-notebooks

explosion / spaCy

eriklindernoren / ML-From-Scratch

ray-project / ray

eugeneyan / applied-ml

academic / awesome-datascience

streamlit / streamlit

PyTorchLightning / pytorch-lightning

plotly / dash

AMAI-GmbH / AI-Expert-Roadmap

ipython / ipython

matplotlib / matplotlib

fastai / fastbook

virgili0 / Virgilio

afshinea / stanford-cs-229-machine-learning

RaRe-Technologies / gensim

bharathgs / Awesome-pytorch-list

d2l-ai / d2l-en

microsoft / recommenders

rasbt / python-machine-learning-book

hangtwenty / dive-into-machine-learning

microsoft / nni

allenai / allennlp

qax-os / excelize
