ArgoCD, EKS, 20+ Services… and a Dev Environment That Couldn’t Keep Up.
🎅🏼🎄 MERRY CHRISTMAS to all my readers 🎄🎅🏼
In this conversation, I spoke with the head of platform and a platform engineer at a mid-size B2B SaaS company running on AWS. They’re preparing for significant R&D growth next year and are worried their current developer environments won’t scale – both technically and in terms of developer experience.
They already have an internal solution for on-demand environments on EKS, but it’s fragile, slow, and hard to maintain. They’re now evaluating dedicated platforms for development environments to improve developer velocity.
This call was a good snapshot of where many platform teams are today: stuck between Kubernetes, GitOps, internal tools, and the messy reality of developer workflows.
Their reality: a small platform team, a growing R&D org
Team shape
Head of Platform + 3 platform engineers
Sitting between infra and product teams (72 engineers)
Mandate: prepare for a big R&D headcount increase while improving developer velocity, not slowing it
Environment & scale
AWS with EKS at the core
Dev environments are per developer and on demand
Each environment runs 20–30 microservices
Tech highlights:
Kubernetes on EKS
Traefik as API gateway
ArgoCD for deployments
Auth0 for authentication
Docker Compose already in place for local / initial setups
ArgoCD in a “double role”
One interesting detail: ArgoCD is used not only for internal environments, but also as a core infra component:
Manages product environments (dev/prod)
Also manages customer integrations and customer applications outside the internal microservice ecosystem
This dual usage creates additional complexity when thinking about how to isolate environments for developers while still being able to run Argo itself inside those environments if needed.
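To make that dual role concrete, here is a minimal sketch of what it looks like in Argo terms – two Applications on the same instance, one managing the internal product stack and one managing a customer integration. All names and repo URLs below are hypothetical, not their actual setup:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: product-dev              # internal product environment
  namespace: argocd
spec:
  project: product
  source:
    repoURL: https://github.com/example-org/product-manifests.git
    targetRevision: main
    path: envs/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: product-dev
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: customer-acme            # customer integration, outside the internal stack
  namespace: argocd
spec:
  project: customers
  source:
    repoURL: https://github.com/example-org/customer-acme-config.git
    targetRevision: main
    path: deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: customer-acme
```

The same Argo instance is the control plane for two very different concerns – which is exactly why cloning or isolating it per developer environment is not trivial.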
What they’re trying to achieve
The platform team has two main goals:
Short-term POC goal (by January 2026, one month away): have a “base” dev environment setup working:
Frontend + 3 core backend services (hard dependencies)
Traefik as gateway
Auth0 integrated
Running as an on-demand environment in Kubernetes
Q1 KPI: At least one full product team working entirely on the new flow:
Spinning up environments easily
Using per-branch / per-PR environments
Having a smooth workflow from feature branch → running environment → validation
They’re less concerned about production right now and more focused on developer and testing environments – but they do want confidence that whatever they adopt can be extended later.
Their current setup – and where it hurts
They already built an internal solution for dev environments on EKS:
Each developer can deploy their own isolated environment
Environments are made up of ~20–30 microservices
Everything is technically “on demand”
But in practice, several pain points came up:
1. Environment deployment is slow and manual
“The main issue is the deployment of the environment. It takes time and has several manual steps developers need to do.”
Issues:
Too many manual actions to get an environment up
Success rate of environment creation is low
Platform team spends energy “keeping it alive” instead of evolving it
2. Development experience is not “live” enough
They explicitly called out the lack of a live syncing / tight feedback experience:
“The developing process is not really a live syncing process. That’s also a big issue.”
For them, a big part of developer velocity is:
How fast a developer can see code changes reflected in a running environment
How painless it is to debug and iterate
Any platform that doesn’t address this ends up feeling like “just another slow CI/CD step”.
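For context, the usual way teams close this gap on Kubernetes is file syncing into running containers instead of rebuilding images on every change – Skaffold, Tilt, Okteto, and DevSpace all offer a flavor of it. As a hedged illustration (the image name and paths are assumptions, not their config), a Skaffold sync rule looks like this:

```yaml
# skaffold.yaml – a minimal live-sync sketch, not their actual setup
apiVersion: skaffold/v2beta29
kind: Config
build:
  artifacts:
    - image: example-backend          # hypothetical service image
      sync:
        manual:
          - src: "src/**/*.py"        # local files to watch
            dest: /app                # destination inside the running container
deploy:
  kubectl:
    manifests:
      - k8s/*.yaml
```

With skaffold dev, a saved file lands in the remote container in seconds – that’s the “live syncing process” they’re missing today.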
3. Docker Compose exists - and they want to reuse it
They’ve already invested in Docker Compose definitions and want to leverage that:
Use current Docker Compose as the starting point for defining environments
Avoid manually re-defining every service in a new UI
Ideally bootstrap environment configuration directly from Compose
If a platform ignores this and forces a full redefinition, it raises the migration cost significantly.
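To illustrate what “bootstrap from Compose” means in practice: a file like the hypothetical one below is the artifact they already maintain, and tools like Kompose can mechanically translate it into Kubernetes manifests (kompose convert -f docker-compose.yml). That’s the baseline any platform in this space gets measured against:

```yaml
# docker-compose.yml – a hypothetical subset of their 20-30 services
services:
  frontend:
    image: registry.example.com/frontend:latest
    ports:
      - "3000:3000"
    depends_on:
      - api
  api:
    image: registry.example.com/api:latest
    environment:
      AUTH0_DOMAIN: ${AUTH0_DOMAIN}       # injected per environment
    depends_on:
      - postgres
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
```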
4. The local dev vs. remote env tension
When we dug into developer experience, the head of platform steered the conversation away from UI and environment management and straight into developer workflows:
“How does the IDE experience look?”
“How does local development work with this?”
“How quickly can a developer debug their service?”
They were very clear: A polished UI for environment management is nice, but if the path from “code on my laptop” to “running in my dev environment” is slow or clunky, the platform fails.
When they realized our focus is mostly on remote Kubernetes environments (not running everything locally), they noted it as a small red flag – not a deal breaker, but something to weigh against tools that lean more into local dev / live sync.
5. Time-to-environment is a critical metric
They explicitly asked:
“How much time should I expect from the moment I have new code on my local machine to the point where this code is already live on my review environment?”
This is the core metric for them:
Time from commit → new image → rollout → ready environment
Time from “I changed something” → “I can see it working (or broken) in my env”
In their current world this can be very slow – at other companies I talk to, it’s sometimes measured in hours. They’re aiming for minutes, end-to-end.
What they want from a platform
From this call, their wishlist for a platform looked roughly like this.
1. Remote environments on their own cloud, managed for them
Everything runs on their own AWS accounts
They don’t want to operate EKS day-to-day:
No cluster upgrades
No messing with TLS, ingress controllers, load balancers, etc.
But they do want escape hatches:
Custom VPCs, subnets
Specific instance types
Spot / Karpenter / GPU node pools if needed
Managed control plane is fine – as long as workloads and data live in their accounts.
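As an example of the kind of escape hatch they mean, here is a rough sketch of a Karpenter NodePool (v1 API) that keeps ephemeral dev environments on spot capacity. The name, limits, and policies are assumptions for illustration, not their actual config:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: dev-environments            # hypothetical pool for ephemeral envs
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]          # cheap, interruptible capacity for dev
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: "200"                      # cap total CPU across the pool
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```

For teams like this, a managed offering that hides this level of control is a hard sell.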
2. Strong CD + environment orchestration
They’re open to:
Using their existing CI (GitHub Actions, etc.)
Letting a platform handle CD and environment orchestration, including:
Per-environment secrets and config
Deploying both Kubernetes workloads and external services (e.g. RDS)
Handling PR/preview environments automatically
The “holy grail” for them is:
Push code → PR opened → environment created automatically
QA / product can validate on that environment
Merge PR → environment is automatically destroyed
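Since ArgoCD is already core infra for them, it’s worth noting this flow maps almost one-to-one onto ArgoCD’s ApplicationSet Pull Request generator: an Application is created per open PR and removed when the PR is merged or closed. A minimal sketch, with hypothetical org, repo, and paths:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: pr-environments
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: example-org
          repo: example-app
        requeueAfterSeconds: 300      # how often to poll for opened/closed PRs
  template:
    metadata:
      name: "pr-{{number}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/example-app.git
        targetRevision: "{{head_sha}}"
        path: deploy/overlays/preview
      destination:
        server: https://kubernetes.default.svc
        namespace: "pr-{{number}}"    # one namespace per PR
      syncPolicy:
        automated:
          prune: true                 # remove resources that disappear from git
        syncOptions:
          - CreateNamespace=true
```

When the PR closes, the generator drops the Application and Argo cascades the deletion. What this alone doesn’t solve – and where they’d want a platform to help – is per-environment secrets, non-Kubernetes dependencies like RDS, and cleanup of anything living outside the cluster.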
3. Environment templates and blueprints
They want environment templates:
A “blueprint environment” for a given product area
Then:
Clone this blueprint per developer, per PR, or per load test
Optionally shut down base envs and spin new ones from templates
They specifically asked if it’s possible to:
Create a “load environment” template that is normally shut down
Spin it up on demand
Use it as a base for cloning
This is becoming a common pattern: a small curated set of golden environments that everything else is derived from.
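Staying in Argo vocabulary, the nearest built-in analogue of a blueprint is an ApplicationSet with a list generator: the template is the golden environment, and each list element is a clone. A hedged sketch (names and repo are hypothetical, and it assumes the whole environment deploys from one chart):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: env-blueprint
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - name: alice             # a per-developer clone
          - name: load-test         # spun up only when needed
  template:
    metadata:
      name: "env-{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/env-blueprint.git
        targetRevision: main
        path: chart
      destination:
        server: https://kubernetes.default.svc
        namespace: "env-{{name}}"   # one namespace per clone
```

Adding or removing a list element creates or tears down a clone, which is roughly the “normally shut down, spin up on demand” behavior they asked about – minus the data and external resources a real platform also has to handle.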
4. Flexibility with ArgoCD
Because ArgoCD is part of their infra, they need to know:
Can they run separate Argo instances inside developer namespaces?
Can applications talk to Argo instances in a more “app-level” way, not just as infra?
Can they extend their current Argo usage into this new developer environment model?
Even if this is not part of the initial POC, they want to be sure it’s possible down the road.
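For reference, ArgoCD does offer building blocks here: it ships a namespace-scoped install manifest, and since v2.5 the “applications in any namespace” feature lets Application resources live in developer namespaces while a shared control plane reconciles them. The enabling setting is a one-liner (the dev-* pattern below is an assumption):

```yaml
# argocd-cmd-params-cm – enables "applications in any namespace" (ArgoCD v2.5+)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  application.namespaces: "dev-*"   # namespaces allowed to hold Application CRs
```

The matching AppProject also needs sourceNamespaces set. Whether a given platform can coexist with this pattern is exactly the kind of question to pin down before the POC ends.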
Patterns I see across platform teams from conversations like this
Talking to this team reinforced several patterns I see again and again.
1. Everyone already has “something” for dev environments
By the time they talk to vendors, most teams:
Already run some form of per-dev or per-branch environment
Already glued together EKS + CI + Argo + some scripts
Already suffer from:
Drift between envs
Manual steps and secret handling
Low success rates when environments are recreated
You’re never replacing “nothing”. You’re competing with a fragile but familiar internal system.
2. Developer experience matters more than a beautiful UI
Platform engineers like nice dashboards, but the real success metric is:
“How quickly can a developer go from git commit to debugging their change in a realistic environment?”
If you nail:
Fast environment creation
Short feedback loops
Simple debugging
…developers will tolerate almost anything else. If you don’t, it doesn’t matter how powerful or flexible the platform is.
3. Local vs. remote is still an unresolved tension
Even teams leaning into remote dev environments still care a lot about:
Local IDE experience
Live reload / live sync
Minimal friction between local tools and remote clusters
If your platform only thinks in terms of “Kubernetes and clusters” and ignores local development, you’ll always feel slightly misaligned with what product teams actually do all day.
4. Docker Compose is the de facto spec for many teams
Until Kubernetes usage is fully standardized internally, Docker Compose is often:
The canonical definition of how to run the system
The artifact that new tools are expected to understand or import
If you can’t ingest or map from Compose, you’re basically asking the platform team to redo work they’ve already done. That’s a hard sell.
5. Time-to-first-PR environment is a killer POC metric
For platform tools in this space, I’m increasingly convinced the key POC metric should be:
“How long until we can spin up an environment for a real PR from a real team?”
If it takes weeks of setup to reach that point, most teams run out of patience or lose trust. If you can help them:
Connect to their cloud
Import a few services
Create a template
Spin an environment for a real feature branch
…all in a couple of days, the conversation becomes much more concrete.
6. Managed, but not opaque
Platform teams want:
Managed infrastructure (no one wants to run their own control plane if they can avoid it)
But still:
Access to logs
Control over networking
Ability to tweak node pools and infra details
“Managed but inspectable” seems to be the sweet spot.
Closing thoughts
This conversation captured a situation I see a lot:
A small platform team
A growing engineering org
A homegrown dev environment solution that “kind of works” but is fragile
A desire to industrialize dev environments without making life worse for developers
If you’re in a similar situation – juggling EKS, ArgoCD, Docker Compose, and per-developer environments – you’re not alone. Many teams are trying to find the balance between:
Giving developers powerful, isolated environments
Keeping feedback loops fast enough for real productivity
Not drowning the platform team in custom scripts and maintenance
If you want to learn from what other DevOps and platform engineers are doing, feel free to connect with me and follow along with these conversations on LinkedIn or here :)
About the author
I’m Romaric, CEO and Co-founder of Qovery – our DevOps automation platform that makes DevOps and Platform teams’ lives easier by simplifying how you provision and manage environments on your own cloud.

Fantastic breakdown of the Docker Compose reusability challenge. The point about teams wanting to bootstrap from existing Compose files really nails something most platform vendors miss when they expect full service redefinition. We ran into this exact issue at a previous company where we tried to migrate off Compose, and that friction basically killed adoption even though the new platform had better features. The "time from commit to environment ready" metric being measured in minutes (not hours) is probably the most actionable KPI here.