Tag: multimodal understanding
-
Hacker News: Janus: Decoupling Visual Encoding for Multimodal Understanding and Generation
Source URL: https://github.com/deepseek-ai/Janus Source: Hacker News Title: Janus: Decoupling Visual Encoding for Multimodal Understanding and Generation Feedly Summary: Comments AI Summary and Description: Yes Summary: The text introduces Janus, a novel autoregressive framework designed for multimodal understanding and generation, addressing previous shortcomings in visual encoding. This model’s ability to manage different visual encoding pathways while…
-
Hacker News: MM1.5: Methods, Analysis and Insights from Multimodal LLM Fine-Tuning
Source URL: https://arxiv.org/abs/2409.20566 Source: Hacker News Title: MM1.5: Methods, Analysis and Insights from Multimodal LLM Fine-Tuning Feedly Summary: Comments AI Summary and Description: Yes Summary: The paper introduces MM1.5, a novel set of multimodal large language models (MLLMs) aimed at improving multimodal understanding and reasoning through enhanced training methodologies. It highlights innovative techniques in data…