datasets

Here are 1,433 public repositories matching this topic...

datasets
dlwh commented Mar 16, 2022

Describe the bug

Streaming Datasets can't be pickled, so any interaction between them and multiprocessing results in a crash.

Steps to reproduce the bug

import transformers
from transformers import Trainer, AutoModelForCausalLM, TrainingArguments
import datasets

ds = datasets.load_dataset('oscar', "unshuffled_deduplicated_en", split='train', streaming=True).with_format("
Labels: bug, good first issue
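Python's multiprocessing pickles the objects it sends to worker processes (for example, arguments passed to Pool.map, and always under the spawn start method), so any object holding unpicklable state fails there. The sketch below uses plain Python stand-ins, not the real datasets classes, to show this failure mode. can_pickle, GeneratorStream and ReplayableStream are hypothetical names. The second class shows one common workaround: store only picklable configuration and rebuild the iterator each time.

```python
import pickle


def can_pickle(obj) -> bool:
    """Hypothetical helper: return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


class GeneratorStream:
    """Hypothetical stand-in for a streaming dataset that holds a live generator."""

    def __init__(self, n: int):
        # A generator object cannot be pickled, so neither can this instance.
        self._gen = (i * i for i in range(n))

    def __iter__(self):
        return self._gen


class ReplayableStream:
    """Stores only the recipe (n) and builds a fresh generator on each iteration."""

    def __init__(self, n: int):
        self.n = n

    def __iter__(self):
        return (i * i for i in range(self.n))


print(can_pickle(GeneratorStream(3)))   # False
print(can_pickle(ReplayableStream(3)))  # True
print(list(ReplayableStream(3)))        # [0, 1, 4]
```

Whether the datasets library's streaming classes can be made picklable this way is an assumption. The sketch only illustrates the general constraint.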
label-studio
omishali commented Jan 3, 2022

Describe the bug
I am trying to label Hebrew text (a right-to-left language). When labels are attached to the text, the words are jumbled and no longer shown in their original order.

To Reproduce
Steps to reproduce the behavior:

  1. Create a project with the attached dataset.json and dataset.txt
  2. Choose NER template
  3. Start
Labels: bug, good first issue, text editor
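A common first step when handling right-to-left text such as Hebrew is to detect its base direction, so that the display can be set to right-to-left. The sketch below roughly follows the first-strong-character rule (P2) of the Unicode Bidirectional Algorithm, using the bidi classes from Python's unicodedata module. The is_rtl helper is hypothetical and is not part of Label Studio. It does not claim to be the cause of, or the fix for, the ordering bug above.

```python
import unicodedata


def is_rtl(text: str) -> bool:
    """Hypothetical helper: True if the first strong directional character is RTL.

    Bidi class "R" covers Hebrew letters, "AL" covers Arabic letters,
    and "L" is the strong left-to-right class (Latin and others).
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return True
        if bidi == "L":
            return False
    # Only weak or neutral characters (digits, punctuation, spaces) were found.
    return False


print(is_rtl("שלום עולם"))    # True
print(is_rtl("hello world"))  # False
print(is_rtl("123 שלום"))     # True: digits are weak, so Hebrew is the first strong char
```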
Daremitsu1 commented Apr 1, 2022

Hello,

Doccano is not importing any text data. When I try to import text data, the browser just keeps showing a loading indicator.

The terminal shows the following:

Starting server with port 8000.
WARNING:waitress.queue:Task queue depth is 1
WARNING:waitress.queue:
Labels: bug, good first issue
AbhinavTuli commented Mar 22, 2022

🚨🚨 Feature Request

  • A new implementation (Improvement, Extension)

Is your feature request related to a problem?

Currently, if a user tries to access an index beyond the length of the dataset or tensor, an internal error is raised that is hard to understand.

Description of the possible solution

We can catch the error and throw a more descriptive e

Labels: enhancement, good first issue
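The requested behaviour can be sketched as a bounds check that raises a descriptive IndexError before the lookup reaches internal code. This variant checks the index up front rather than catching the internal error after the fact. DatasetView is a hypothetical class backed by a plain list, not the real Hub dataset or tensor class. It also accepts negative indices the way Python lists do, and whether the real library should do so is an assumption.

```python
from typing import Any, Sequence


class DatasetView:
    """Hypothetical wrapper that replaces opaque index failures with a clear IndexError."""

    def __init__(self, samples: Sequence[Any], name: str = "dataset"):
        self._samples = samples
        self._name = name

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, index: int) -> Any:
        length = len(self._samples)
        # Valid range mirrors Python sequences: -length <= index < length.
        if not -length <= index < length:
            if length == 0:
                raise IndexError(f"Cannot index {self._name}: it is empty (length 0).")
            raise IndexError(
                f"Index {index} is out of range for {self._name} of length {length}; "
                f"valid indices are {-length} to {length - 1}."
            )
        return self._samples[index]


ds = DatasetView([10, 20, 30], name="images")
print(ds[2])  # 30
try:
    ds[3]
except IndexError as err:
    print(err)  # Index 3 is out of range for images of length 3; valid indices are -3 to 2.
```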
tiphaineruy commented Oct 11, 2021

Not sure if this would be useful, but:

When registering a table:

addr: 0.0.0.0:8084
tables:
  - name: "example"
    uri: "/data/"
    option:
      format: "parquet"
      use_memory_table: false

Add a pattern to the options, either as a glob:

pattern: "file_typev1*.parquet"

or as a regexp:

pattern: "\wfile_type\wv1\w*.parquet"

It would allow selecting files within URIs with different exte

Labels: enhancement, good first issue, help wanted
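Python's fnmatch module (shell-style globbing) and its re module can illustrate the difference between the two proposed pattern styles. This sketch assumes the pattern is matched against bare file names found under the table's uri. The select_files helper and the file names are hypothetical, and this is not how roapi implements table options.

```python
import fnmatch
import re
from typing import Iterable, List, Optional


def select_files(
    names: Iterable[str],
    glob: Optional[str] = None,
    regexp: Optional[str] = None,
) -> List[str]:
    """Hypothetical filter: keep names matching a glob or a full-string regexp."""
    if (glob is None) == (regexp is None):
        raise ValueError("pass exactly one of glob or regexp")
    if glob is not None:
        # fnmatchcase: '*' matches any run of characters, with no case folding.
        return [n for n in names if fnmatch.fnmatchcase(n, glob)]
    compiled = re.compile(regexp)
    # fullmatch: the pattern must describe the whole file name.
    return [n for n in names if compiled.fullmatch(n)]


files = [
    "file_typev1_part0.parquet",
    "file_typev2_part0.parquet",
    "xfile_type_v1_a.parquet",
    "file_typev1_part0.csv",
]

print(select_files(files, glob="file_typev1*.parquet"))
# ['file_typev1_part0.parquet']
print(select_files(files, regexp=r"\wfile_type\wv1\w*.parquet"))
# ['xfile_type_v1_a.parquet']
```

The two example patterns from the request select different files. The regexp requires exactly one word character before file_type and one between file_type and v1. Its unescaped "." also matches any character, so \. would be the stricter spelling.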
