Google Research Datasets

cvss Public
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

108 CC-BY-4.0 7 0 0 Updated Aug 26, 2022
dstc8-schema-guided-dialogue Public
The Schema-Guided Dialogue Dataset

Python 412 CC-BY-SA-4.0 98 3 0 Updated Aug 23, 2022
hiertext Public
The HierText dataset contains ~12k images from the Open Images dataset v6 with a large number of text entities. We provide word-, line-, and paragraph-level annotations.

Jupyter Notebook 115 CC-BY-SA-4.0 8 0 0 Updated Aug 17, 2022
Objectron Public
Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box descri…

Jupyter Notebook 1,979 246 21 0 Updated Jul 20, 2022
clang8 Public
cLang-8 is a dataset for grammatical error correction.

Python 51 3 6 0 Updated Jul 19, 2022
maverics Public
MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering (VQA).

3 0 0 0 Updated Jul 7, 2022
wit Public
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

709 29 3 1 Updated Jun 9, 2022
informal Public
InFormal is a formality style transfer dataset for four Indic languages. The dataset is made up of sentence pairs with corresponding human-annotated labels identifying the more formal sentence as well as the pair’s semantic similarity. This dataset can be used as an evaluation set for style transfer tasks in Indic languages. InFormal contains s…

0 Apache-2.0 0 0 0 Updated May 24, 2022
RxR Public
Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi, and Telugu, and 126k navigation-following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the visual per…

Python 81 CC-BY-4.0 10 1 0 Updated Apr 8, 2022
TF-IDF-IIF-top100-wordlists Public
These are lists for a variety of languages containing words that are distinctive to each language.

22 3 1 0 Updated Apr 6, 2022
