[Talk::Overflow #16] DevOpsDays - Tel Aviv 2025

From Shiny Tools to Sharp Edges: The State of Modern DevOps

Jan 07, 2026

DevOpsDays Tel Aviv 2025 wasn’t just another conference: it surfaced some of the hard edges teams hit as cloud-native adoption matures into real-world complexity. Across GitOps, cloud cost puzzles, AI/ML infrastructure, and control-plane design limits, what stood out wasn’t tooling hype but the engineering trade-offs we have to wrestle with today: how to build reliable platforms without drowning in overhead, how automation and agents really affect ops flows, and how to confront brittleness in distributed systems at scale.

The agenda blended classical “DevOps best practices” with emerging pressures like agentic AI, cost debugging, and control-plane bottlenecks — a reflection of where the ecosystem actually is: pragmatic, messy, and tool-agnostic at the core.

Featured Talks

1. Building Docker: Behind the Scenes of the Container Revolution — Solomon Hykes

Solomon revisits where containerization actually came from and why those original design decisions still matter to platform engineers. This talk stood out because it grounded today’s orchestration tooling in historical design constraints, giving practical intuition for why containers behave the way they do — critical when debugging at scale.

2. Orchestrating Autonomous Agents in DevOps: Comparing Strands Agents, Deep Agents & AutoGen — Engin Diri

A rare practitioner discussion on how autonomous agents fit into DevOps workflows. Rather than buzz, Engin focused on trade-offs in autonomy vs control, observability implications, and failure modes — essential if you’re experimenting with AI helpers in production pipelines.

3. Would You Drive a Car Without a Dashboard? — Gabriella Nir

A crisp metaphor with real engineering consequences: if you can’t see the state of your system clearly, you can’t run it safely. Gabriella unpacks how teams often overlook basic observability hygiene and what it costs — fire-fighting time, outages, and trust.

4. Building a Production-Grade AI/ML Inference Platform on Kubernetes — Liad Drori

Moving beyond notebooks: this talk dug into the nitty-gritty of deploying and scaling ML inference on Kubernetes. The engineering insights here (resource isolation, rollout strategies, observability for models) are directly transferable to real AI workloads, not just demo pipelines.

5. Launch Party’s Over. Now What? A Guide to Real-World Ops with Crossplane & ACK — Guy Menahem

A grounded look at Crossplane + AWS ACK in production — not the sales pitch. Guy walked through how to wrestle with API drift, reconciliation loops, and environment fragmentation. If you’re contemplating GitOps beyond Kubernetes manifests, this is one of the few talks that shows you what actually breaks and how to fix it.

All Other Talks:

Behind the Cloud @Twitter 1.0 — Bobby Dorlus

Operational lessons from scaling and running massive distributed systems in Twitter’s early days.

Newsflash: There is no Quality as a Service — Niv Yungelson

Why quality can’t be outsourced — it must be designed into pipelines and checks.

We need to talk about limits — Avishai Ish-Shalom

A conceptual talk on system limits and when abstractions leak — great framing for platform edge cases.

From Duct Tape to Declarative: Playtika’s Platform Overhaul at Unicorn Scale — S. Rosenberg & S. Mashiach

How a large platform team moves from brittle scripting to declarative definitions at scale.

Chasing Ghost Traffic: Catching the Cross-AZ Culprit Killing Your Cloud Bill — Aviv Zohari

Concrete techniques to detect and fix cross-zone network cost anomalies in cloud bills.

How to Build Quality-Driven Agentic AI in Noisy Big Data Environments — Itiel Shwartz

Data hygiene and feedback loops for AI tooling — what actually moves the needle in “agentic” workflows.

Beyond Argo Events: Leveraging NATS for Scalable Webhook Management — Or Navon

Using NATS to build scalable event delivery where webhook semantics break down.

5 Serverless Patterns You Should Stop Using (And What to Do Instead) — Ran Isenberg

Opinionated yet practical look at common anti-patterns in serverless applications.

Orchestrating Autonomous Agents in DevOps: Comparing Strands Agents, Deep Agents & AutoGen — Engin Diri

How different autonomous agent models compare and where they fit in DevOps flows.

Build a Self-Service Hub in Slack — Shaked Braimok-Yosef

Step-by-step on turning Slack into a lightweight self-service automation interface for teams.

Thinking Outside the Compositions: When Control Plane’s Logic Becomes the Bottleneck — Elhay Efrat

When control plane complexity itself throttles workflows — useful patterns for platform engineering.

Disaster Recovery in the Serverless Realm — Orel Bello

Challenges and strategies around recovering serverless applications under failure.

Building AI Agents with Serverless, Strands, and MCP — Hila Fish

Integrating serverless and agent frameworks — trade-offs in latency, cost, and observability.

Dancing with Failure — The Art of Timeouts & Retries — Alon Nativ

Concrete timeout and retry patterns that improve resiliency without triggering retry storms or cascading failures.

Oops-Driven Development — Shahar Shporer

Learning fast via controlled failure feedback — tactical tips for building healthier debug loops.

Truly Cloud Native AI Agents with Kagent and Khook — Anton Weiss

Exploring cloud native agent tooling — architectural considerations and integration patterns.

Would You Drive a Car Without a Dashboard? — Gabriela Nir

Metaphor-rich discussion on why clear observability surfaces are critical to team effectiveness.

The Hidden Complexity of Time in Serverless: A 5-Minute Reality Check

How time semantics and cold starts introduce hidden complexity in serverless environments.

Adjusting Your Mirrors: Finding and Fixing Blind Spots in Your Configurations — Dor Meiri

Patterns to detect config anti-patterns and blind spots before they bite you in production.

The Anatomy of a Patch: Backporting CVEs Without Breaking Things — Benji Kalman

How to backport security fixes safely, with minimal risk to running systems.

DevOpsDays Tel Aviv made it clear that the industry is done with hype and deeply focused on failure modes, trade-offs, and operational pain — from ghost traffic and control-plane bottlenecks to real disaster recovery in serverless systems. The common thread was survival: making existing platforms reliable under real load, not chasing the next shiny tool. Observability came through as core engineering work, not an add-on, reinforcing that you can’t design resilient systems without visibility baked into architecture and workflows. AI agents were everywhere, but the message was sober: autonomy without guardrails, reliability, and human oversight just creates new failure domains. Finally, platform engineering is evolving into complexity management, where success depends less on tool choice and more on building observable, debuggable abstractions teams can actually operate.

💬 Like what you read?

You’re reading Talk::Overflow #16 DevOpsDays — the weekly digest for developers who want to stay sharp and skip the noise.
→ Browse past issues
→ Suggest a talk
→ Share it with your friends

Stay curious. Stay kind.

— Talk::Overflow
