An open source implementation of CLIP.
A Chinese version of CLIP that performs Chinese cross-modal retrieval and representation generation.
The implementation of "Prismer: A Vision-Language Model with An Ensemble of Experts".
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
A curated list of Visual Question Answering (VQA, including image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
A concise but complete implementation of CLIP with various experimental improvements from recent papers
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Pytorch version of the HyperDenseNet deep neural network for multi-modal image segmentation
A python tool to perform deep learning experiments on multimodal remote sensing data.
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering
Japanese CLIP by rinna Co., Ltd.
A curated list of vision-and-language pre-training (VLP). :-)
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)
NeRCo: Implicit Neural Representation for Cooperative Low-light Image Enhancement. The official code is coming soon!
MMEA: Entity Alignment for Multi-Modal Knowledge Graphs
Multi-modal analysis of sentiment and emotion in multi-speaker conversations.
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
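Several of the repositories above are CLIP-style implementations built around contrastive image–text pretraining. As a rough illustration only (not taken from any specific repository listed here), the core symmetric InfoNCE objective can be sketched in NumPy; the function name and `temperature` default are illustrative assumptions:

```python
import numpy as np

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss of the kind used by CLIP-style models.

    image_embs, text_embs: (N, D) arrays of unnormalized embeddings
    for N matching image/text pairs (pair i is row i of each array).
    """
    # L2-normalize so the dot product becomes cosine similarity
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)

    # Pairwise similarity logits, scaled by temperature
    logits = img @ txt.T / temperature  # shape (N, N)

    n = logits.shape[0]
    labels = np.arange(n)  # matching pairs lie on the diagonal

    def cross_entropy(l):
        # numerically stable log-softmax over rows,
        # then pick the diagonal (correct-pair) log-probabilities
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Correctly matched pairs should yield a lower loss than mismatched ones, which is the property the pretraining optimizes for.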