Multimodal Foundation Models

Multimodal Foundation Models: From Specialists to General-Purpose Assistants - Foundations and Trends® in Computer Graphics and Vision

Paperback (06 May 2024)

Save $11.99

  • RRP $109.14
  • $97.15

Publisher's Synopsis

This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants. The survey covers five core topics, grouped into two classes: (i) well-established research areas, namely multimodal foundation models pre-trained for specific purposes, which comprise two topics (methods of learning vision backbones for visual understanding, and text-to-image generation); and (ii) recent advances in exploratory, open research areas, namely multimodal foundation models that aim to play the role of general-purpose assistants, which comprise three topics (unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs). The target audience is researchers, graduate students, and professionals in the computer vision and vision-language multimodal communities who are eager to learn the basics and recent advances in multimodal foundation models.

Book information

ISBN: 9781638283362
Publisher: Now Publishers
Imprint: Now Publishers
Pub date: 06 May 2024
Language: English
Number of pages: 230
Weight: 330g
Height: 234mm
Width: 156mm