
Gemini 3.5 Flash deployed day one, EU residency preserved

Dultra deployed Google's Gemini 3.5 Flash on the day of general availability. Migration scope, measured operational impact, and continued sovereign EU delivery.


Executive summary. On 19 May 2026, the date of Google’s general-availability announcement for Gemini 3.5 Flash, Dultra completed a feature-flagged migration of its non-live reasoning workloads to the new model across every customer organisation. Median post-call evaluation latency for a 30-minute simulation fell from 11.2 seconds to 4.7 seconds (−58%). Per-evaluation compute cost decreased by 47%. End-to-end EU data residency was preserved, with no inference traffic egressing the European Union at any point during the rollout. The realtime audio path and the in-call coach, which run on a separately tuned lite inference profile, remained intentionally unchanged.

The platform release

Earlier today Google announced Gemini 3.5 Flash, the latest entry in its frontier model family. The headline benchmarks published with the release — Terminal-Bench 2.1 at 76.2%, GDPval-AA at 1656 Elo, MCP Atlas at 83.6%, and CharXiv Reasoning at 84.2% — place 3.5 Flash above Gemini 3.1 Pro on a range of coding and agentic evaluations, while delivering output throughput approximately four times that of comparably positioned frontier models, at less than half the inference cost. Google has designated 3.5 Flash the default model for the Gemini app, AI Mode in Google Search, and the Gemini Enterprise Agent Platform.

For Dultra, 3.5 Flash now serves as the reasoning substrate beneath every analytical and generative function performed outside the realtime audio loop.

Migration scope

The Dultra platform separates inference into two distinct paths. Each was treated independently in this release.

Realtime conversational path — unchanged by design. The live exchange between a sales representative and the AI buyer, together with the in-call coaching layer, runs on a separately tuned profile selected for sub-second response on every conversational turn. Both met their established service-level objectives prior to this release, and we made the deliberate decision not to migrate them to 3.5 Flash: doing so would not have produced a user-perceptible improvement at the cost profile required for live streaming. In-call latency was therefore identical before and after the release.

Non-live reasoning surfaces — migrated to 3.5 Flash. All analytical and generative workloads downstream of the conversation were transitioned in a single coordinated release. These include post-call evaluation and narrative summarisation, counterfactual reasoning (“what could have been said instead”), simulation authoring for administrators, regulatory-grade transcript review across compliance and misselling rule-sets, and automated coaching-plan synthesis.
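
The two-path split can be pictured as a small routing function. The following is a minimal sketch, not Dultra's implementation: the workload names paraphrase the surfaces listed above, and the profile identifiers and the `flag_enabled` parameter are hypothetical placeholders, since this post does not publish the actual names.

```python
from enum import Enum


class Workload(Enum):
    LIVE_CONVERSATION = "live_conversation"
    IN_CALL_COACH = "in_call_coach"
    POST_CALL_EVALUATION = "post_call_evaluation"
    COUNTERFACTUAL = "counterfactual"
    SIMULATION_AUTHORING = "simulation_authoring"
    COMPLIANCE_REVIEW = "compliance_review"
    COACHING_PLAN = "coaching_plan"


# Workloads on the realtime path are never migrated, whatever the flag says.
REALTIME = {Workload.LIVE_CONVERSATION, Workload.IN_CALL_COACH}

# Hypothetical profile identifiers, used only for illustration.
REALTIME_PROFILE = "realtime-lite-profile"
PRIOR_REASONING_PROFILE = "prior-generation-reasoning"
NEW_REASONING_PROFILE = "gemini-3.5-flash"


def select_profile(workload: Workload, flag_enabled: bool) -> str:
    """Pick an inference profile for a workload under a feature flag."""
    if workload in REALTIME:
        return REALTIME_PROFILE
    return NEW_REASONING_PROFILE if flag_enabled else PRIOR_REASONING_PROFILE


print(select_profile(Workload.COMPLIANCE_REVIEW, flag_enabled=True))
print(select_profile(Workload.LIVE_CONVERSATION, flag_enabled=True))
```

Placing the realtime check ahead of the flag check means that enabling the flag for an organisation cannot move live traffic by accident.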

Measured impact

The figures below were collected across a representative sample of customer simulation sessions and their associated post-call reasoning runs in the migration window.

| Metric | Prior generation | Gemini 3.5 Flash | Change |
|---|---|---|---|
| In-call latency (live conversational path) | baseline | unchanged | no change |
| Post-call evaluation latency (median, 30-minute simulation) | 11.2 s | 4.7 s | −58% |
| Per-evaluation compute cost | baseline | −47% | −47% |
| Compliance review throughput, full-organisation transcripts | baseline | ~3.7× | +274% |
| Evaluation rubric faithfulness (versus expert human, blind evaluation) | 81% | 89% | +8 pts |
| Multi-step evaluation chains completed without intervention | n/a | 94% | new capability |
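
The Change column follows from ordinary relative-change arithmetic. As a quick check on the published figures (the helper names are illustrative and not part of any Dultra tooling):

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def multiplier_from_percent(percent: float) -> float:
    """Convert a percentage increase into a throughput multiplier."""
    return 1.0 + percent / 100.0


# Post-call evaluation latency: 11.2 s -> 4.7 s
print(f"{percent_change(11.2, 4.7):.1f}%")     # -58.0%

# Compliance throughput: +274% corresponds to a multiplier of about 3.74x,
# which the table rounds to ~3.7x.
print(f"{multiplier_from_percent(274):.2f}x")  # 3.74x

# Rubric faithfulness is a difference in percentage points, not a ratio.
print(89 - 81)                                 # 8
```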

Two outcomes warrant particular emphasis.

First, post-call reasoning has crossed a perceptual threshold. Where the previous generation required a representative to wait for evaluation outputs, 3.5 Flash returns the scorecard, citation set, counterfactual transcript, and compliance review before the representative typically closes the simulation interface. The post-call experience is no longer a delay; it is part of the simulation itself.

Second, the migration enables agentic evaluation that was not previously tractable in production. Evaluation has been refactored from a single-prompt response into a multi-step reasoning chain that scores each rubric section independently, cross-checks for internal consistency, and synthesises a final summary. Gemini 3.5 Flash sustains this chain without retry intervention 94% of the time. Under the prior model generation, the same workflow required substantially more orchestration overhead.
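
The three stages described above (independent section scoring, a consistency cross-check, and a final synthesis) can be sketched as follows. This is an illustrative outline under stated assumptions, not the production chain: the `ModelFn` signature, the prompt wording, and the `stub_model` stand-in are all hypothetical, and no real model API is called.

```python
from typing import Callable, Dict, List

# Assumption: the model is any callable that maps a prompt to a text reply.
ModelFn = Callable[[str], str]


def run_evaluation_chain(
    transcript: str, rubric_sections: List[str], model: ModelFn
) -> Dict[str, object]:
    # Step 1: score each rubric section in its own call.
    scores: Dict[str, int] = {}
    for section in rubric_sections:
        reply = model(f"SCORE section '{section}' from 0 to 10:\n{transcript}")
        scores[section] = int(reply.strip())

    # Step 2: ask the model to cross-check the scores for internal consistency.
    verdict = model(f"CHECK whether these scores are consistent (yes/no): {scores}")
    consistent = verdict.strip().lower().startswith("yes")

    # Step 3: synthesise a final summary from the per-section results.
    summary = model(f"SUMMARISE this evaluation: {scores}")

    return {"scores": scores, "consistent": consistent, "summary": summary.strip()}


def stub_model(prompt: str) -> str:
    """Deterministic stand-in so the sketch runs without a model endpoint."""
    if prompt.startswith("SCORE"):
        return "7"
    if prompt.startswith("CHECK"):
        return "yes"
    return "Solid discovery, weak close."


result = run_evaluation_chain("rep: hello ...", ["discovery", "closing"], stub_model)
print(result["scores"])  # {'discovery': 7, 'closing': 7}
```

A production chain would also need retry and parse-error handling around each call; the 94% figure refers to runs that completed without any such intervention.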

Compliance review has likewise entered a different operational regime. The roughly 3.7× throughput improvement permits realtime compliance dashboards across full-organisation transcript histories without expanding underlying capacity, a meaningful structural change for our regulated-industry customers.

EU data residency

Dultra operates exclusively within sovereign European Union infrastructure, with both primary and failover inference regions situated in the EU. Gemini 3.5 Flash availability in our designated regions preceded the migration window. No customer audio, transcript, embedding, or derived artefact transited a non-EU region during the rollout, and none does in steady-state operation.
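
One way to make a residency guarantee of this kind enforceable in code is an explicit region allow-list that is checked before any inference client is created. The sketch below is an assumption about how such a guard could look, not a description of Dultra's infrastructure. The two region names are Google Cloud regions located in EU member states (Belgium and the Netherlands) and serve purely as examples, since this post does not name the regions in use.

```python
# Illustrative allow-list; the actual primary and failover regions are not
# disclosed in this post.
EU_REGION_ALLOWLIST = frozenset({"europe-west1", "europe-west4"})


class ResidencyError(RuntimeError):
    """Raised when a request would be routed outside the allowed regions."""


def require_eu_region(region: str) -> str:
    """Return the region unchanged if allowed, otherwise refuse to proceed."""
    if region not in EU_REGION_ALLOWLIST:
        raise ResidencyError(f"region {region!r} is not on the EU allow-list")
    return region


print(require_eu_region("europe-west4"))  # europe-west4

try:
    require_eu_region("us-central1")
except ResidencyError as exc:
    print(f"blocked: {exc}")
```

An explicit list is preferable to matching on a `europe-` prefix, because some regions with that prefix (for example London and Zurich) are outside the European Union.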

For Dultra customers in regulated sectors — banking, insurance brokerage, real estate, healthcare-adjacent — sovereign EU delivery is a procurement precondition rather than an enhancement. We continue to operate without architectural exceptions in this regard, including during model upgrades.

Early-access participation

Dultra was admitted to the Vertex AI Gemini 3.5 private preview programme earlier this year and used the intervening period to validate the model against representative production workload profiles and a controlled cohort of opted-in customer organisations. Today’s migration followed a rehearsed plan (phased rollout, automated rollback on health-check failure, end-to-end observability) and completed with zero unplanned downtime and no customer-reported regressions. Further details of the early-access engagement remain confidential at our partner’s request.
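
The rollout pattern named above, phased exposure with an automatic revert when a health check fails, can be sketched in a few lines. This is a generic illustration under stated assumptions: the phase percentages, the `set_traffic_percent` callback, and the `is_healthy` probe are hypothetical and do not describe Dultra's deployment tooling.

```python
from typing import Callable, Sequence


def phased_rollout(
    phases: Sequence[int],
    set_traffic_percent: Callable[[int], None],
    is_healthy: Callable[[], bool],
) -> bool:
    """Step through traffic percentages; revert to 0% on the first failed check."""
    for percent in phases:
        set_traffic_percent(percent)
        if not is_healthy():
            set_traffic_percent(0)  # automatic revert to the prior model
            return False
    return True


# Example run with an in-memory recorder instead of a real traffic controller.
applied = []
ok = phased_rollout([5, 25, 100], applied.append, lambda: True)
print(ok, applied)  # True [5, 25, 100]
```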

Roadmap implications

Three items are now in delivery, each enabled or accelerated by 3.5 Flash:

  1. Polish-language analytical layer — currently in staging, with general availability scheduled for later this month. The improved reasoning fidelity of 3.5 Flash on Polish, historically a difficult target for prior Gemini generations, completes the analytical capability required to enter the Polish enterprise market on the same operational footprint.
  2. Expanded counterfactual reasoning — the alternate-transcript output now incorporates the full buyer profile and the customer’s knowledge base, rather than a localised exchange window. Made tractable by the agentic capability of 3.5 Flash.
  3. Automated coaching-plan synthesis — sales managers will be able to request structured, multi-week coaching plans derived from a representative’s recent simulation history. Demonstrably out of reach on the prior model generation; operating within target on this one.

Commercial implications

A 47% reduction in compute cost on the second-largest line item in our infrastructure expenditure further widens an already favourable gross margin position. In combination with our existing pricing model — flat per-seat licensing with unlimited practice usage — this cost movement enables commercial mechanisms that competitors operating at standard-rate inference economics are structurally unable to offer: extended trials, outcome-based commercial guarantees, and multi-year price commitments at attractive customer-side terms.

Dultra does not intend to apply this saving as a list-price reduction. The savings will be reinvested into customer acquisition, regulated-industry certification work, and the platform-partnership programme that produced this advantage. Day-one delivery on a frontier model release is the operating tempo we maintain on behalf of our customers and our investors. We expect to maintain it on subsequent releases.
