datasets

Describe the bug

As of 3-28-2022, downloading this subset fails with a split size error after the dataset is extracted. The extracted dataset has roughly 6M rows, while the split expects fewer than 1M.

Upon digging a little deeper, I downloaded the raw files from https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_PC_v1_00.tsv.gz and extracted them. A line count via `wc -
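A row count like the one described above can also be taken without extracting the archive first. The following is a minimal sketch, assuming the file is a gzip-compressed TSV with a single header line; the function name `count_data_rows` and the `has_header` parameter are hypothetical and not part of the datasets library.

```python
import gzip


def count_data_rows(path, has_header=True):
    """Count the data rows in a gzip-compressed, line-oriented text file.

    Streams the file line by line, so the archive is never fully
    loaded into memory or extracted to disk.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        total = sum(1 for _ in handle)
    # Assumption: the first line is a column header, not a data row.
    if has_header:
        return max(total - 1, 0)
    return total
```

The result can then be compared with the number of examples the split metadata expects, which is the mismatch the error reports.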

Describe the bug
I am trying to label Hebrew text (RTL language). When labels are attached to the text, the words of the text are mixed and not shown in their original order.

To Reproduce
Steps to reproduce the behavior:

Create a project with attached dataset.json dataset.txt
Choose NER template
Start

Hello,

Doccano is not importing any text data. When importing the text data the following browser loading is going on:

The command-line terminal shows the following:

<Starting server with port 8000.
WARNING:waitress.queue:Task queue depth is 1
WARNING:waitress.queue:

🚨🚨 Feature Request

A new implementation (Improvement, Extension)

Is your feature request related to a problem?

Currently, if a user tries to access an index that is larger than the dataset length or tensor length, an internal error is thrown which is not easy to understand.

Description of the possible solution

We can catch the error and throw a more descriptive e
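The idea of catching the internal error and re-raising a clearer one can be sketched as follows. This is only an illustration under stated assumptions: `DescriptiveDataset` is a hypothetical wrapper around a plain Python list, not the Hub API, and the wording of the message is invented for the example.

```python
class DescriptiveDataset:
    """Hypothetical list-backed dataset that reports out-of-range indices clearly."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, index):
        try:
            return self._samples[index]
        except IndexError as err:
            # Re-raise with the requested index and the dataset length,
            # keeping the original error attached as the cause.
            raise IndexError(
                f"Index {index} is out of range for a dataset of length {len(self)}."
            ) from err
```

Chaining with `from err` keeps the original traceback available for debugging while the user sees the descriptive message first.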

References

https://colab.research.google.com/drive/1Oyn913zkXYB8Uf8k1hiM8_5gifuMVqta?usp=sharing

Issue to track tutorial requests:

Deep Learning with PyTorch: A 60 Minute Blitz - #69
Sentence Classification - #79

Not sure if it could be interesting, but:

When registering a table:

addr: 0.0.0.0:8084
tables:
  - name: "example"
    uri: "/data/"
    option:
      format: "parquet"
      use_memory_table: false

add in options:
glob

pattern: "file_typev1*.parquet"

or regexp

pattern: "\wfile_type\wv1\w*.parquet"

It would allow selecting in uri's with different exte
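To illustrate how the two proposed pattern styles would select files, here is a minimal Python sketch. It is not roapi code: the helper names `select_by_glob` and `select_by_regex`, the sample file names, and the regular expression used in the usage note are assumptions made for the example.

```python
import fnmatch
import re


def select_by_glob(names, pattern):
    """Keep the names matching a shell-style glob (case-sensitive)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def select_by_regex(names, pattern):
    """Keep the names that match the regular expression in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Hypothetical directory listing with mixed versions and extensions.
files = [
    "file_typev1_a.parquet",
    "file_typev1.parquet",
    "file_typev2.parquet",
    "file_typev1_a.csv",
]
```

With this listing, `select_by_glob(files, "file_typev1*.parquet")` and `select_by_regex(files, r"file_typev1\w*\.parquet")` both keep only the two `v1` Parquet files. The glob form is shorter, while the regular expression allows stricter control over what may appear between the prefix and the extension.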

What?

Currently, the API manually throws its own messages and errors. We should move them to werkzeug exceptions.

Here are 1,431 public repositories matching this topic...

awesomedata / awesome-public-datasets

huggingface / datasets

tonybeltramelli / pix2code

heartexlabs / label-studio

liuruoze / EasyPR

doccano / doccano

simonw / datasette

akfamily / akshare

activeloopai / Hub

tensorflow / datasets

jdorfman / awesome-json-datasets

robmarkcole / satellite-image-deep-learning

CLUEbenchmark / CLUEDatasetSearch

justinzm / gopup

github / CodeSearchNet

ChineseGLUE / ChineseGLUE

jsbroks / coco-annotator

colour-science / colour

JuliaData / DataFrames.jl

prabhuomkar / pytorch-cpp

snap-stanford / ogb

roapi / roapi

juand-r / entity-recognition-datasets

PolyAI-LDN / conversational-datasets

iamaziz / PyDataset

midas-research / audino

jim-schwoebel / voice_datasets

CUTR-at-USF / awesome-transit

logpai / loghub

explosion / projects
