
datasets

Here are 1,431 public repositories matching this topic...

datasets
trentonstrong commented Mar 28, 2022

Describe the bug

When downloading this subset as of 3-28-2022, you will encounter a split size error after the dataset is extracted. The extracted dataset has roughly 6M rows, while the split expects fewer than 1M.

Upon digging a little deeper, I downloaded the raw files from https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_PC_v1_00.tsv.gz and extracted them. A line count via `wc -
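
The row count described above can also be checked without extracting the archive. The following is a minimal sketch, assuming a local copy of the gzip-compressed TSV with a single header line; the function name `count_tsv_rows` and its `has_header` parameter are hypothetical and not part of any library.

```python
import gzip


def count_tsv_rows(path, has_header=True):
    """Count data rows in a gzip-compressed TSV file (hypothetical helper).

    The file is read in binary mode, so lines are split on b"\n" only,
    which is close to what a newline count on the extracted file reports.
    """
    with gzip.open(path, "rb") as handle:
        total = sum(1 for _ in handle)
    # Assumption: the first line is a header and is not a data row.
    if has_header:
        return max(total - 1, 0)
    return total
```

Comparing the returned number with the size the split expects should show whether the mismatch comes from the raw file or from the loading step.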

bug good first issue
label-studio
omishali commented Jan 3, 2022

Describe the bug
I am trying to label Hebrew text (RTL language). When labels are attached to the text, the words of the text are mixed and not shown in their original order.

To Reproduce
Steps to reproduce the behavior:

  1. Create a project with attached dataset.json dataset.txt
  2. Choose NER template
  3. Start
bug good first issue text editor
Daremitsu1 commented Apr 1, 2022

Hello,

Doccano is not importing any text data. During the import, the browser stays stuck on the following loading screen:
image

The terminal shows the following:

<Starting server with port 8000.
WARNING:waitress.queue:Task queue depth is 1
WARNING:waitress.queue:
bug good first issue
AbhinavTuli commented Mar 22, 2022

🚨🚨 Feature Request

  • A new implementation (Improvement, Extension)

Is your feature request related to a problem?

Currently, if a user tries to access an index that is larger than the dataset length or tensor length, an internal error is thrown which is not easy to understand.

Description of the possible solution

We can catch the error and throw a more descriptive e
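
The idea of catching the internal error and re-raising a clearer one could look roughly like this. It is a minimal sketch, not the project's actual code: the `Dataset` class below is a hypothetical stand-in backed by a plain list, and the message wording is an assumption.

```python
class Dataset:
    """Hypothetical stand-in for a dataset or tensor wrapper."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, index):
        try:
            return self._samples[index]
        except IndexError as exc:
            # Re-raise with the offending index and the actual length,
            # keeping the original error as the cause for debugging.
            raise IndexError(
                f"Index {index} is out of range for a dataset of length {len(self)}."
            ) from exc
```

Raising the same exception type keeps existing `except IndexError` handlers working while giving the user the index and length in the message.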

enhancement good first issue
tiphaineruy commented Oct 11, 2021

Not sure if this would be of interest, but:

When registering a table:

addr: 0.0.0.0:8084
tables:
  - name: "example"
    uri: "/data/"
    option:
      format: "parquet"
      use_memory_table: false

add in options:
glob

pattern: "file_typev1*.parquet"

or regexp

pattern: "\wfile_type\wv1\w*.parquet"

It would allow selecting in uri's with different exte
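
To illustrate how the two proposed pattern styles differ, here is a minimal sketch in Python using the standard `fnmatch` and `re` modules. The `select_files` helper, its `mode` parameter, and the sample file names are hypothetical; this is not how the server itself resolves a `uri`.

```python
import fnmatch
import re


def select_files(names, pattern, mode="glob"):
    """Return the names matching `pattern`, read as a glob or as a regex."""
    if mode == "glob":
        # Case-sensitive shell-style matching against the whole name.
        return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    if mode == "regexp":
        compiled = re.compile(pattern)
        # fullmatch requires the regex to cover the entire file name.
        return [n for n in names if compiled.fullmatch(n)]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical directory listing.
names = [
    "file_typev1_a.parquet",
    "file_typev2_a.parquet",
    "xfile_type_v1abc.parquet",
    "notes.csv",
]

glob_hits = select_files(names, "file_typev1*.parquet")
regex_hits = select_files(names, r"\wfile_type\wv1\w*.parquet", mode="regexp")
```

Note that in the regex form each `\w` must consume exactly one word character, and the unescaped `.` matches any character rather than a literal dot, so the two patterns above do not select the same files.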

enhancement good first issue help wanted
