With the introduction of EmbeddingGemma, Google is offering a multilingual text embedding model designed to run directly on phones, laptops, and other edge devices for mobile-first generative AI.
Unveiled September 4, EmbeddingGemma features a 308-million-parameter design that lets developers build applications using techniques such as RAG (retrieval-augmented generation) and semantic search that run directly on the targeted hardware, Google explained. Based on the Gemma 3 lightweight model architecture, EmbeddingGemma is trained on more than 100 languages and is small enough to run in less than 200MB of RAM with quantization. It offers customizable output dimensions, ranging from 768 down to 128 dimensions via Matryoshka representation, along with a 2K token context window.
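As a minimal sketch of what that looks like in practice, the snippet below loads the model with the sentence-transformers library and truncates each embedding from 768 to 128 dimensions, taking advantage of the Matryoshka representation. The Hugging Face model ID "google/embeddinggemma-300m" and the example sentences are assumptions for illustration, not details from the announcement.

```python
from sentence_transformers import SentenceTransformer

# truncate_dim keeps only the first 128 of the 768 output dimensions;
# Matryoshka-trained embeddings remain useful when truncated this way.
# Model ID is an assumption for illustration.
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=128)

sentences = [
    "EmbeddingGemma runs on edge devices.",
    "The model supports more than 100 languages.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 128): two sentences, truncated dimension
```

Smaller output dimensions trade a little retrieval quality for lower memory and faster similarity search, which matters most on the phones and laptops the model targets.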
EmbeddingGemma empowers developers to build on-device, flexible, privacy-centric applications, according to Google. Model weights for EmbeddingGemma can be downloaded from Hugging Face, Kaggle, and Vertex AI. Working alongside the Gemma 3n model, EmbeddingGemma can unlock new use cases for mobile RAG pipelines, semantic search, and more, Google said. EmbeddingGemma works with tools such as sentence-transformers, llama.cpp, MLX, Ollama, LiteRT, transformers.js, LMStudio, Weaviate, Cloudflare, LlamaIndex, and LangChain. Documentation for EmbeddingGemma can be found at ai.google.dev.
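For a sense of the semantic-search use case Google describes, here is a small sketch using sentence-transformers, one of the tools listed above. The model ID, documents, and query are illustrative assumptions, not examples from the article.

```python
from sentence_transformers import SentenceTransformer

# Model ID is an assumption for illustration.
model = SentenceTransformer("google/embeddinggemma-300m")

# A tiny in-memory corpus standing in for a real document store.
documents = [
    "Model weights are available on Hugging Face, Kaggle, and Vertex AI.",
    "Quantized, the model fits in under 200MB of RAM.",
    "The context window spans 2K tokens.",
]
doc_embeddings = model.encode(documents)

query = "Where can I download the model weights?"
query_embedding = model.encode(query)

# Score every document against the query by cosine similarity and
# return the best match; similarity() yields a (1, n_docs) score matrix.
scores = model.similarity(query_embedding, doc_embeddings)
best = scores.argmax().item()
print(documents[best])
```

The same retrieval step, run locally, is what feeds context into an on-device RAG pipeline of the kind the article mentions.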