Emu: An Open Multimodal Generalist

Emu is a Large Multimodal Model (LMM) trained with a unified autoregressive objective: predict the next element in a multimodal sequence, whether that element is a visual embedding or a textual token. Trained under this objective, Emu can serve as a generalist interface for both image-to-text and text-to-image tasks.
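
To make the objective concrete, here is a minimal, purely illustrative sketch of a predict-the-next-element loss over an interleaved sequence. This is not the released training code (which is still on the open-sourcing schedule below); the heads lm_head and regress_head, the target layout, and all shapes are assumptions:

import torch.nn.functional as F

def unified_next_element_loss(hidden, token_targets, visual_targets,
                              is_visual, lm_head, regress_head):
    # hidden: (seq_len, dim) transformer outputs over an interleaved sequence.
    # is_visual: (seq_len,) bool mask for positions whose next element is a
    # visual embedding rather than a text token.
    # Text positions: standard next-token cross-entropy over the vocabulary.
    text_loss = F.cross_entropy(lm_head(hidden[~is_visual]), token_targets)
    # Visual positions: regress the next continuous visual embedding.
    visual_loss = F.mse_loss(regress_head(hidden[is_visual]), visual_targets)
    return text_loss + visual_loss

The point is simply that both modalities are supervised by the same next-element prediction: classification for discrete text tokens, regression for continuous visual embeddings.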

Generalist Interface

Emu serves as a generalist interface capable of diverse multimodal tasks, such as image captioning, image/video question answering, and text-to-image generation, together with new abilities like in-context text and image generation, and image blending.

Setup

Clone this repository and install required packages:

git clone https://github.com/baaivision/Emu
cd Emu

pip install -r requirements.txt

Model Weights

We release the pretrained and instruction-tuned weights of Emu. Our weights are subject to LLaMA's license.

Model name  Weight
Emu         🤗 HF link (27GB)
Emu-I       🤗 HF link (27GB)
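
To fetch a checkpoint programmatically instead of through the browser, the standard huggingface_hub client can be used. Note that "BAAI/Emu" below is a placeholder repo id; substitute the actual id behind the 🤗 links above:

from huggingface_hub import snapshot_download

# "BAAI/Emu" is a placeholder; use the repo id from the HF link above.
ckpt_dir = snapshot_download(repo_id="BAAI/Emu", local_dir="./Emu-ckpt")
print("weights downloaded to", ckpt_dir)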

Inference

At present, we provide inference code that takes interleaved image-text as input and outputs text, covering image-to-text tasks such as image captioning and visual question answering:

python inference.py --instruct --ckpt-path $Instruct_CKPT_PATH
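
The command above targets the instruction-tuned Emu-I weights. For the pretrained Emu weights, we would expect the same script to run without the --instruct flag; this is an assumption based on the flag's name, and $Pretrained_CKPT_PATH is an illustrative placeholder mirroring $Instruct_CKPT_PATH, so check the argument parser in inference.py:

# assumption: omitting --instruct selects the pretrained, non-instruction-tuned model
python inference.py --ckpt-path $Pretrained_CKPT_PATH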

Schedule

We are committed to open-sourcing all Emu-related materials, including:

  • The weights of Emu and Emu-I
  • Inference example for interleaved image-text as input, text as output
  • Video inference example
  • The weights of image decoder & image generation/blending example
  • The YT-Storyboard-1B pretraining data
  • The pretraining code
  • The instruction tuning code
  • The evaluation code

We hope to foster the growth of our community through open-sourcing and promoting collaboration👬. Let's step towards multimodal intelligence together🍻.

Acknowledgement

We thank LLaMA, BLIP-2, Stable Diffusion, and FastChat for their great work.

Citation

If you find Emu useful for your research and applications, please consider starring this repository and citing:

@article{Emu,
  title={Generative Pretraining in Multimodality},
  author={Sun, Quan and Yu, Qiying and Cui, Yufeng and Zhang, Fan and Zhang, Xiaosong and Wang, Yueze and Gao, Hongcheng and Liu, Jingjing and Huang, Tiejun and Wang, Xinlong},
  journal={arXiv preprint arXiv:2307.05222},
  year={2023},
}
