An open source implementation of CLIP.
A Chinese version of CLIP that performs Chinese cross-modal retrieval and representation generation.
The implementation of "Prismer: A Vision-Language Model with An Ensemble of Experts".
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
A curated list of Visual Question Answering (VQA, including image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
A concise but complete implementation of CLIP with various experimental improvements from recent papers
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Pytorch version of the HyperDenseNet deep neural network for multi-modal image segmentation
A python tool to perform deep learning experiments on multimodal remote sensing data.
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering
Japanese CLIP by rinna Co., Ltd.
A curated list of vision-and-language pre-training (VLP). :-)
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)
NeRCo: Implicit Neural Representation for Cooperative Low-light Image Enhancement. The official code is coming soon!
MMEA: Entity Alignment for Multi-Modal Knowledge Graphs
Multi-modal analysis of sentiment and emotion in multi-speaker conversations.
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
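Several of the repositories above are CLIP-style implementations built around contrastive image–text pretraining. As a rough illustration only (not taken from any specific repository listed here), the core symmetric InfoNCE objective can be sketched in NumPy; the function name and `temperature` default are illustrative assumptions:

```python
import numpy as np

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss of the kind used by CLIP-style models.

    image_embs, text_embs: (N, D) arrays of unnormalized embeddings
    for N matching image/text pairs (pair i is row i of each array).
    """
    # L2-normalize so the dot product becomes cosine similarity
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)

    # Pairwise similarity logits, scaled by temperature
    logits = img @ txt.T / temperature  # shape (N, N)

    n = logits.shape[0]
    labels = np.arange(n)  # matching pairs lie on the diagonal

    def cross_entropy(l):
        # numerically stable log-softmax over rows,
        # then pick the diagonal (correct-pair) log-probabilities
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Correctly matched pairs should yield a lower loss than mismatched ones, which is the property the pretraining optimizes for.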