Google DeepMind has introduced Gemini Embedding 2, a new model that converts text, images, videos, audio and documents into a shared numerical format, making it easier for AI systems to compare and analyse information across media types.
Developers can access the model through the Gemini API and Vertex AI, according to the company. It also integrates with popular frameworks and vector databases, including LangChain, LlamaIndex, Haystack, Weaviate, Qdrant and ChromaDB, allowing teams to plug it into existing workflows.
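For teams working in Python, a basic request through the Gemini API's google-genai SDK looks roughly like the sketch below. The model identifier is an assumption for illustration, as the exact string for Gemini Embedding 2 is not quoted here.

```python
# Minimal sketch: embedding a piece of text via the Gemini API Python SDK.
# The model name "gemini-embedding-2" is a placeholder assumption.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

result = client.models.embed_content(
    model="gemini-embedding-2",  # hypothetical identifier
    contents="Quarterly revenue grew 12% year over year.",
)

vector = result.embeddings[0].values  # a plain list of floats
print(len(vector))
```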
Embedding models play an important role in modern AI systems. They transform raw data into numerical vectors that help algorithms identify patterns and relationships between pieces of information. With Gemini Embedding 2, different media types can be placed in the same vector space, which allows systems to search and analyse them together.
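Because every input lands in the same space, any two vectors can be compared with a single distance measure, regardless of the media they came from. A minimal, self-contained sketch of the standard cosine-similarity comparison, with made-up example vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score how closely two embedding vectors point in the shared space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; real embeddings have thousands of dimensions.
caption_vec = [0.12, -0.40, 0.88]  # e.g. from a text caption
image_vec = [0.10, -0.35, 0.90]    # e.g. from the matching image
print(cosine_similarity(caption_vec, image_vec))  # near 1.0 means similar meaning
```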
Google says the model can capture meaning across more than 100 languages. Because all content shares one representation, developers can build applications that perform tasks such as semantic search, classification and retrieval across multiple forms of content.
The model also supports interleaved inputs, meaning developers can send combinations like an image with accompanying text in a single request. This helps simplify workflows when applications need to process several types of information at once.
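The exact call shape for interleaved inputs is not described here, so the sketch below is an assumption modelled on how the google-genai SDK passes mixed parts to its generation endpoints:

```python
# Sketch of an interleaved image-plus-text embedding request.
# Assumption: embed_content accepts mixed Part objects the way the SDK's
# generate_content does; the model name is also a placeholder.
from google import genai
from google.genai import types

client = genai.Client()

with open("product_photo.png", "rb") as f:
    image_bytes = f.read()

result = client.models.embed_content(
    model="gemini-embedding-2",  # hypothetical identifier
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        types.Part.from_text(text="Product photo with its catalogue description."),
    ],
)
```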
In terms of capacity, Gemini Embedding 2 can handle up to 8,192 tokens of text, process six images per request, analyse videos of up to 120 seconds, and work with native audio inputs. It can also process PDF files of up to six pages.
Another feature included in the model is Matryoshka Representation Learning, which lets developers adjust the size of embeddings to fit their needs. While the default dimension is 3,072, embeddings can be scaled down to 1,536 or 768, helping balance performance against storage efficiency.
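Assuming the model carries over the output_dimensionality option that Google's current embedding models expose in the SDK, requesting a smaller vector might look like this:

```python
# Sketch of requesting a reduced-size (Matryoshka-truncated) embedding.
# Assumption: the output_dimensionality config option carries over from
# Google's existing embedding models; the model name is a placeholder.
from google import genai
from google.genai import types

client = genai.Client()

result = client.models.embed_content(
    model="gemini-embedding-2",  # hypothetical identifier
    contents="A short passage to embed.",
    config=types.EmbedContentConfig(output_dimensionality=768),
)

print(len(result.embeddings[0].values))  # 768 rather than the default 3,072
```

Because Matryoshka-trained embeddings concentrate the most important information in the leading dimensions, truncated vectors retain most of their semantic signal while cutting storage and search costs.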
Google said the new model performs better than earlier embedding systems across several benchmarks involving text, images and video. It also introduces support for speech data.
Legal technology company Everlaw is among the early users testing the system. The firm is using the model to analyse large volumes of litigation-related data during the legal discovery process.
Max Christoff, chief technology officer at Everlaw, said the model has improved the accuracy of searches across large datasets. He added that the technology also makes it easier to locate relevant images and videos within legal records.