#
multimodal
Here are 184 public repositories matching this topic...
A curated list of multimodal-related research.
Updated Jul 29, 2021 - Python
The data structure for unstructured data
graphql
elasticsearch
deep-learning
protobuf
sqlite
data-structures
nearest-neighbor-search
cross-modal
unstructured-data
multimodal
nested-data
weaviate
dataclass
neural-search
qdrant
docarray
Updated Jul 2, 2022 - Python
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Updated Jun 27, 2022 - Python
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
Updated Jun 9, 2022
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
image-captioning
visual-question-answering
multimodal
text-to-image-synthesis
vision-language
pretraining
referring-expression-comprehension
vision-and-language-pre-training
Updated Jun 23, 2022 - Python
CVPR 2019: "Pluralistic Image Completion"
Updated Aug 29, 2021 - Python
Platform for Situated Intelligence
streaming
framework
pipelines
artificial-intelligence
stream-processing
perception
component-library
human-robot-interaction
multimodal-interactions
multimodal
Updated Jul 1, 2022 - C#
OpenAI's DALL-E for large-scale training in mesh-tensorflow.
transformers
artificial-intelligence
autoregressive
text-to-image
variational-autoencoder
multimodal
Updated Feb 12, 2022 - Python
Easily compute CLIP embeddings and build a CLIP retrieval system with them.
Updated Jun 24, 2022 - Jupyter Notebook
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
search
retrieval
ranking
clip
multimodality
multimodal-learning
multimodal
activitynet
retrieval-model
msvd
msrvtt
video-text-retrieval
lsmdc
didemo
video-clip-retrieval
Updated Jun 1, 2022 - Python
A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.
deep-learning
artificial-intelligence
openai
image-generation
multimodality
text-to-image
diffusion
multimodal
text-to-image-synthesis
openai-clip
Updated Feb 8, 2022 - Python
Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
python
pytorch
classification
paddlepaddle
imagecaptioning
multimodal-learning
multimodal
crossmodal-retrieval
Updated May 8, 2022 - Python
(CVPR2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain.
Updated Jun 29, 2022 - Python
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
deep-learning
transformers
artificial-intelligence
image-to-text
attention-mechanism
multimodal
contrastive-learning
Updated Jun 8, 2022 - Python
Create Disco Diffusion artworks in one line
Updated Jul 2, 2022 - Python
Official implementation for "Blended Diffusion for Text-driven Editing of Natural Images" [CVPR 2022]
deep-learning
openai
text-to-image
diffusion
multimodal
openai-clip
text-guided-manipulation
blended-diffusion
Updated Jun 14, 2022 - Jupyter Notebook
First-place solution for KDD Cup 2020 Challenges for Modern E-Commerce Platform: Multimodalities Recall
Updated Jul 22, 2020 - Jupyter Notebook
FairyTailor: Multimodal Generative Framework for Storytelling
Updated Apr 25, 2021 - Python
Flexible time series feature extraction & processing
python
processing
data-science
time-series
pandas
feature-extraction
multivariate
feature-engineering
multimodal
window-stride
Updated Jun 24, 2022 - Python
EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection (ECCV 2020)
Updated Aug 25, 2020 - Python
Fusing Histology and Genomics via Deep Learning - IEEE TMI
genomics
fusion
transcriptomics
pathology
multimodal
histopathology
computational-pathogenomics
pathomic
multimodal-network
mahmoodlab
Updated May 18, 2022 - Jupyter Notebook
CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"
tensorflow
attention
generative-adversarial-networks
inpainting
multimodal
vq-vae
autoregressive-neural-networks
Updated Jul 11, 2021 - Python
Language Models Can See: Plugging Visual Controls in Text Generation
text-generation
image-captioning
unsupervised-learning
clip
zero-shot
story-generation
multimodal
gpt-2
plug-and-play-language-models
Updated Jun 1, 2022 - Python
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
deep-learning
cnn
pytorch
multi-modal
image-registration
affine-transformation
stn
image-to-image-translation
multimodal
deformable-transformation
multi-modal-learning
cvpr2020
registartion
multimodal-image-registration
Updated Aug 2, 2020 - Python
Updated Feb 9, 2022 - Python
Baseline for the 5th Baidu and Xi'an Jiaotong University Big Data Competition: Urban Region Function Classification
Updated Dec 16, 2021 - Jupyter Notebook
Updated Oct 6, 2020
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021)
Updated Jun 22, 2022 - Python
File "/home/ubuntu/vqa/GMN/mmf/mmf/datasets/builders/visual_genome/dataset.py", line 44, in __init__
scene_graph_file = self._get_absolute_path(scene_graph_file)
AttributeError: 'VisualGenomeDataset' object has no attribute '_get_absolute_path'
Command that I ran in the shell:
CUDA_VISIBLE_DEVICES="0" mmf_run config=projects/gmn/configs/visual_genome/defaults.yaml model=gm