@redoctopus redoctopus commented Apr 30, 2020

Adds support for loading ASR datasets with tarred audio, using the WebDataset library. Includes a script that converts existing datasets to a format accepted by TarredAudioToTextDataLayer.
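
For readers unfamiliar with the idea, here is a minimal sketch of what a "tarred audio" dataset might look like: audio files packed into a single tar archive, with a JSON-lines manifest mapping each archive member to its transcript. This is not the conversion script or the data layer from this PR. It uses only Python's standard `tarfile` module instead of WebDataset, and the function names and the manifest keys (`audio_filepath`, `text`) are assumptions made for illustration.

```python
import io
import json
import tarfile


def write_tarred_dataset(samples, tar_path, manifest_path):
    """Pack (name, audio_bytes, transcript) samples into one tar file
    and write a JSON-lines manifest alongside it (hypothetical layout)."""
    with tarfile.open(tar_path, "w") as tar, \
            open(manifest_path, "w", encoding="utf-8") as manifest:
        for name, audio_bytes, text in samples:
            # Add the raw audio bytes as an archive member.
            info = tarfile.TarInfo(name=name)
            info.size = len(audio_bytes)
            tar.addfile(info, io.BytesIO(audio_bytes))
            # One manifest line per sample, keyed by the member name.
            manifest.write(
                json.dumps({"audio_filepath": name, "text": text}) + "\n"
            )


def iter_tarred_dataset(tar_path, manifest_path):
    """Yield (audio_bytes, transcript) pairs by reading the tar sequentially.
    Members that have no manifest entry are skipped."""
    transcripts = {}
    with open(manifest_path, encoding="utf-8") as manifest:
        for line in manifest:
            entry = json.loads(line)
            transcripts[entry["audio_filepath"]] = entry["text"]
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if member.isfile() and member.name in transcripts:
                yield tar.extractfile(member).read(), transcripts[member.name]
```

Reading the archive front to back replaces many small random file reads with one sequential stream, which is the usual motivation for tarred datasets. The trade-off is that samples arrive in archive order unless some shuffling is added on top.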

Signed-off-by: Jocelyn Huang <[email protected]>
…ollate_fn for non-distributed

Signed-off-by: Jocelyn Huang <[email protected]>
…DataLayer (prevent duplicate samples)

Signed-off-by: Jocelyn Huang <[email protected]>
… an already-filtered-out sample.

Signed-off-by: Jocelyn Huang <[email protected]>
…le with TarredAudioToTextDataLayer.

Signed-off-by: Jocelyn Huang <[email protected]>
@redoctopus redoctopus changed the title from "Webdataset" to "[WIP] Tarred audio support in ASR data layer" Apr 30, 2020

lgtm-com bot commented Apr 30, 2020

This pull request introduces 2 alerts when merging 7f44b46 into ee8e578 - view on LGTM.com

new alerts:

  • 2 for Unused import

@redoctopus redoctopus marked this pull request as ready for review May 5, 2020 21:20
@redoctopus redoctopus changed the title from "[WIP] Tarred audio support in ASR data layer" to "Tarred audio support in ASR data layer" May 5, 2020
@redoctopus redoctopus merged commit d219483 into NVIDIA-NeMo:master May 6, 2020
@redoctopus redoctopus deleted the webdataset branch May 6, 2020 20:44