Skip to content
@google-research-datasets

Google Research Datasets

Datasets released by Google Research

Pinned repositories

  1. Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question ans…

    Python 544 103

  2. Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.

    Shell 216 12

  3. ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, p…

    105 4

  4. The Dakshina dataset is a collection of text in both Latin and native scripts for 12 South Asian languages. For each language, the dataset includes a large collection of native script Wikipedia tex…

    99 10

  5. TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the trai…

    Python 116 24

  6. GAP is a gender-balanced dataset containing 8,908 coreference-labeled pairs of (ambiguous pronoun, antecedent name), sampled from Wikipedia for the evaluation of coreference resolution in practica…

    Python 171 68

Repositories

Top languages

Loading…

Most used topics

Loading…

You can’t perform that action at this time.