Even_Adder@lemmy.dbzer0.com to Stable Diffusion@lemmy.dbzer0.com · English · 1 year ago

KOSMOS-G: Generating Images in Context with Multimodal Large Language Models


Abstract:

Recent advancements in text-to-image (T2I) and vision-language-to-image (VL2I) generation have made significant strides. However, the generation from generalized vision-language inputs, especially involving multiple images, remains under-explored. This paper presents KOSMOS-G, a model that leverages the advanced perception capabilities of Multimodal Large Language Models (MLLMs) to tackle the aforementioned challenge. Our approach aligns the output space of MLLM with CLIP using the textual modality as an anchor and performs compositional instruction tuning on curated data. KOSMOS-G demonstrates a unique capability of zero-shot multi-entity subject-driven generation. Notably, the score distillation instruction tuning requires no modifications to the image decoder. This allows for a seamless substitution of CLIP and effortless integration with a myriad of U-Net techniques ranging from fine-grained controls to personalized image decoder variants. We posit KOSMOS-G as an initial attempt towards the goal of “image as a foreign language in image generation.” The code can be found at https://aka.ms/Kosmos-G.


Even_Adder@lemmy.dbzer0.com (OP) · 1 year ago
PBS did a short video on this too. You might have seen it already.
- PipedLinkBot@feddit.rocks (bot) · 1 year ago
  Here is an alternative Piped link(s):
  
  PBS
  
  Piped is a privacy-respecting open-source alternative frontend to YouTube.
  
  I’m open-source; check me out at GitHub.
