Visual Genome: A new interface for image search and retrieval

Visual Genome is a dataset, a knowledge base, and an ongoing effort to connect structured image concepts to language. It supports a multi-perspective study of an image: from pixel-level information such as objects, to relationships that require further inference, to even deeper cognitive tasks such as question answering. It is a comprehensive dataset for training and benchmarking the next generation of computer vision models. With Visual Genome, we expect these models to develop a broader understanding of our visual world, complementing computers’ ability to detect objects with the ability to describe those objects and explain their interactions and relationships. In short, Visual Genome provides both a large, formalized knowledge representation for visual understanding and a more complete set of descriptions and question-answer pairs that ground visual concepts in language.
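To make these layers concrete, here is a minimal Python sketch of how one image's annotations might be held in memory: objects grounded by bounding boxes, relationships stored as subject-predicate-object edges between those objects, and question-answer pairs attached to the image. The class and method names (ImageObject, Relationship, QAPair, ImageAnnotation, triples, related_to) are hypothetical. They do not reproduce the official Visual Genome JSON schema or any released API, and the annotation values are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ImageObject:
    """A grounded object: a name plus a bounding box (x, y, width, height) in pixels."""
    object_id: int
    name: str
    bbox: tuple[int, int, int, int]


@dataclass(frozen=True)
class Relationship:
    """A directed subject-predicate-object edge between two grounded objects."""
    subject: ImageObject
    predicate: str
    obj: ImageObject


@dataclass(frozen=True)
class QAPair:
    """A free-form question about the image and its answer."""
    question: str
    answer: str


@dataclass
class ImageAnnotation:
    """All annotation layers for one image (hypothetical structure, not the official schema)."""
    image_id: int
    objects: list[ImageObject] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)
    qas: list[QAPair] = field(default_factory=list)

    def triples(self) -> list[tuple[str, str, str]]:
        """Flatten the relationship graph into (subject, predicate, object) name triples."""
        return [(r.subject.name, r.predicate, r.obj.name) for r in self.relationships]

    def related_to(self, name: str) -> list[tuple[str, str]]:
        """Return (predicate, object name) pairs for every edge whose subject is `name`."""
        return [(r.predicate, r.obj.name) for r in self.relationships if r.subject.name == name]


# Toy, made-up annotation for a single image (not real Visual Genome data).
man = ImageObject(1, "man", (40, 20, 120, 260))
horse = ImageObject(2, "horse", (10, 120, 300, 200))
hat = ImageObject(3, "hat", (70, 15, 50, 30))

ann = ImageAnnotation(
    image_id=0,
    objects=[man, horse, hat],
    relationships=[Relationship(man, "riding", horse), Relationship(man, "wearing", hat)],
    qas=[QAPair("What is the man riding?", "A horse.")],
)

print(ann.triples())
print(ann.related_to("man"))
```

Keeping relationships as references to grounded objects, rather than as bare strings, is one way to preserve the link between language (the predicate) and image regions (the bounding boxes), which is the kind of grounding the paragraph above describes.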

Explore the Visual Genome webpage

Read the paper

Other papers that have used Visual Genome so far