Loading article…
Google rolls out Gemini 3 Flash API at $0.50/1M input tokens and unveils Gemma 4 12B, an 11.95B‑parameter model that runs on a 16 GB laptop.
Google made Gemini 3 Flash the default model in its Gemini app and opened the model to developers via a preview API, pricing it at $0.50 per 1 million input tokens and $3.00 per 1 million output tokens [1]. The move follows the launch of Gemma 4 12B, an 11.95‑billion‑parameter open‑weights model that can run locally on a typical 16 GB enterprise laptop [4].
| At a glance | |
|---|---|
| Model | Gemini 3 Flash (API) |
| Price | $0.50 / 1 M input tokens; $3.00 / 1 M output tokens |
| Benchmark | 33.7% on Humanity’s Last Exam (no tools) |
| Open model | Gemma 4 12B (11.95 B params) |
Gemini 3 Flash improves on its predecessor Gemini 2.5 Flash, scoring 33.7% on the Humanity’s Last Exam benchmark—up from 11% for Gemini 2.5 Flash and just 0.8 points shy of GPT‑5.2’s 34.5% [1]. On the multimodal MMMU‑Pro benchmark it posted an 81.2% score, outpacing all listed rivals [1]. Google positions the model as a “workhorse” for bulk tasks, noting it uses 30% fewer tokens than Gemini 2.5 Pro on thinking‑type workloads [1]. The model is already integrated into products like the Gemini app, Vertex AI, and the Antigravity coding tool, and is being used by companies such as JetBrains and Figma [1].
Gemma 4 12B is released under an Apache 2.0 license, enabling free download and local deployment on consumer hardware via Hugging Face or Kaggle [4]. Its “Unified” architecture removes separate audio and vision encoders, replacing them with lightweight linear layers that cut VRAM needs to 16 GB and lower inference latency [4]. The model supports a 256 K token context window and native tool‑use, bridging the gap between edge devices and data‑center‑scale models [4]. Google also introduced Multi‑Token Prediction (MTP) drafter models for Gemma, using speculative decoding to accelerate token generation up to three‑fold compared with standard autoregressive sampling [2].
Gemini 3 Flash’s benchmark scores place it within striking distance of frontier models like Gemini 3 Pro and GPT‑5.2, while its pricing remains higher than Gemini 2.5 Flash but justified by speed gains of three‑times faster inference [1]. Gemma 4 12B’s ability to run on a laptop contrasts with larger, cloud‑only offerings from OpenAI and Anthropic, offering enterprises a privacy‑preserving alternative for multimodal tasks. The introduction of MTP drafter models further narrows the performance gap between edge and data‑center hardware [2].
Google’s dual push—commercializing a faster, cheaper Gemini 3 Flash for bulk cloud workloads while opening a highly efficient Gemma 4 model for edge deployment—highlights a strategy to dominate both enterprise AI services and the emerging local‑model market. The next test will be whether developers and enterprises adopt the new pricing and hardware‑light architecture at scale.
Coverage is mostly measured — 107 of 119 reports stay neutral.
Every Monday — the token unlocks, Fed dates & catalysts set to move crypto and markets this week. So you’re never blindsided.
Free · 3-min read · one-click unsubscribe
AI-assisted synthesis by the TrendWatcher Editorial Desk · sourced from 4 outlets · Jun 29, 2026 · How we report
Google AI's work is guided by bold innovation, responsible development and deployment, and collaborative progress, as outlined in its AI Principles.
Google AI was announced at Google I/O 2017, and in 2023 it merged Google Brain with DeepMind, elevating Jeff Dean to chief scientist.
Google AI works on large language models, generative AI, quantum processors, creative AI tools, and provides products such as Google Assistant, TensorFlow, and TPU cloud services.
Yes, Google AI provides AI skills programs, courses, and tutorials covering fundamentals, generative AI, and model development.
Google AI employs a multi‑layered governance approach with risk assessment, testing, monitoring, and safeguards to address safety, bias, and privacy concerns.