We run production Kubernetes on both EKS and GKE for paying customers. Not as a lab. Not as a side project. As the thing that pages us at three in the morning.
The benchmark posts you’ll find on the rest of the internet measure pod startup time and call it a comparison. Pod startup time matters for about a week of your life. We want to talk about the rest of it: control-plane behavior under stress, IAM ergonomics, the upgrade story, and the operational tax that doesn’t show up on a quote.
Control plane behavior
GKE’s control plane is opinionated. You get less to tune, but the things Google tunes for you are the things most teams get wrong anyway. We’ve seen one GKE control plane outage in five years, and it was during a documented maintenance window we should have moved.
EKS gives you more knobs and more responsibility. The control plane has gotten dramatically more reliable since 2022, but you’ll still spend afternoons reasoning about EKS Anywhere edge cases or CoreDNS scaling that GKE handles silently.
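To make that concrete: on GKE, DNS scales with the cluster and you never think about it. On EKS you end up owning logic like the sketch below, which is roughly the job cluster-proportional-autoscaler does; the ratio is our assumption, not an AWS default.

```python
# A sketch of CoreDNS scaling logic on EKS, roughly what
# cluster-proportional-autoscaler does for you. The ratio below
# (one replica per 8 nodes, floor of 2) is our assumption, not an AWS default.
import math

from kubernetes import client, config

NODES_PER_REPLICA = 8  # assumed ratio; tune against your real DNS QPS
MIN_REPLICAS = 2

def scale_coredns() -> None:
    config.load_kube_config()  # use load_incluster_config() when running as a pod
    node_count = len(client.CoreV1Api().list_node().items)
    desired = max(MIN_REPLICAS, math.ceil(node_count / NODES_PER_REPLICA))

    # Strategic-merge patch on the stock CoreDNS deployment in kube-system.
    client.AppsV1Api().patch_namespaced_deployment(
        name="coredns",
        namespace="kube-system",
        body={"spec": {"replicas": desired}},
    )
    print(f"{node_count} nodes -> {desired} CoreDNS replicas")

if __name__ == "__main__":
    scale_coredns()
```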
Verdict: GKE wins on control plane unless you genuinely need the configuration surface EKS exposes. Most teams don’t.
IAM ergonomics
This is the comparison nobody makes and the one that costs you the most over time. GCP IAM is one model, evaluated the same way everywhere, with grants that inherit down the resource hierarchy. AWS IAM is several models stacked on top of each other, with IRSA (IAM Roles for Service Accounts) papering over the seam between IAM roles and Kubernetes service accounts.
IRSA works. We use it. But the failure modes are subtle, and “why can’t this pod assume that role” is a question your platform team will answer many times in an EKS shop.
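When that question comes in, the triage is always the same three checks. Here is a minimal sketch of the first pass, with hypothetical namespace and service-account names; it assumes kubeconfig access to the cluster and AWS credentials that can read IAM.

```python
# First-pass triage for "why can't this pod assume that role," as a sketch.
# Namespace and service-account names are hypothetical.
import json

import boto3
from kubernetes import client, config

NAMESPACE = "payments"            # hypothetical
SERVICE_ACCOUNT = "payments-api"  # hypothetical

config.load_kube_config()
sa = client.CoreV1Api().read_namespaced_service_account(SERVICE_ACCOUNT, NAMESPACE)

# Check 1: the service account must carry the IRSA annotation.
role_arn = (sa.metadata.annotations or {}).get("eks.amazonaws.com/role-arn")
if not role_arn:
    raise SystemExit("No eks.amazonaws.com/role-arn annotation: IRSA isn't wired up.")
print(f"Annotation points at {role_arn}")

# Check 2: the role's trust policy must reference the cluster's OIDC provider.
role_name = role_arn.split("/")[-1]  # breaks on roles with paths; fine for a sketch
trust = boto3.client("iam").get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
print("Trust policy mentions OIDC:", "oidc" in json.dumps(trust).lower())

# Check 3 (by eye): the trust policy's StringEquals condition must match
# system:serviceaccount:<namespace>:<service account>. A one-character typo
# here produces exactly the symptom that pages your platform team.
print(json.dumps(trust, indent=2))
```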
Workload Identity on GKE has fewer footguns. The annotation sets it up, the binding makes it work, and the audit trail is one query.
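Here is the whole thing, as a sketch with placeholder project and service-account names. The annotation is the Kubernetes half; the gcloud command in the comment is the IAM half.

```python
# The whole Workload Identity setup, as a sketch with placeholder names.
from kubernetes import client, config

PROJECT_ID = "acme-prod"   # hypothetical
NAMESPACE = "payments"     # hypothetical
KSA = "payments-api"       # hypothetical Kubernetes service account
GSA = f"payments-api@{PROJECT_ID}.iam.gserviceaccount.com"  # hypothetical Google SA

config.load_kube_config()
client.CoreV1Api().patch_namespaced_service_account(
    KSA,
    NAMESPACE,
    body={"metadata": {"annotations": {"iam.gke.io/gcp-service-account": GSA}}},
)

# The matching binding, one grant, one place to audit:
#   gcloud iam service-accounts add-iam-policy-binding payments-api@acme-prod.iam.gserviceaccount.com \
#     --role roles/iam.workloadIdentityUser \
#     --member "serviceAccount:acme-prod.svc.id.goog[payments/payments-api]"
```

That is the entire surface area.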
Upgrade pain
GKE Autopilot upgrades happen and you don’t notice. Standard GKE upgrades happen on a schedule you set, with surge controls, and they work.
EKS upgrades require coordinated steps across the control plane, the data plane, and your CNI. We’ve scripted it. It’s fine. But it’s not the same as letting Google do it for you.
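For the curious, the shape of that script is roughly this, heavily simplified and with placeholder names. The ordering is the whole game, and step 3 is the one people forget.

```python
# The shape of a coordinated EKS upgrade, heavily simplified; cluster name
# and target version are placeholders. Order: control plane, then node
# groups, then the addons.
import time

import boto3

eks = boto3.client("eks")
CLUSTER = "prod-us-east-1"  # hypothetical
TARGET = "1.29"             # hypothetical

def wait_for_cluster_update(update_id: str) -> None:
    while True:
        status = eks.describe_update(name=CLUSTER, updateId=update_id)["update"]["status"]
        if status != "InProgress":
            assert status == "Successful", f"update ended as {status}"
            return
        time.sleep(30)

# 1. Control plane.
update = eks.update_cluster_version(name=CLUSTER, version=TARGET)
wait_for_cluster_update(update["update"]["id"])

# 2. Data plane: every managed node group, one at a time.
for ng in eks.list_nodegroups(clusterName=CLUSTER)["nodegroups"]:
    eks.update_nodegroup_version(clusterName=CLUSTER, nodegroupName=ng, version=TARGET)
    eks.get_waiter("nodegroup_active").wait(clusterName=CLUSTER, nodegroupName=ng)

# 3. The CNI and friends. Skipping this is the classic post-upgrade outage.
for addon in ("vpc-cni", "coredns", "kube-proxy"):
    eks.update_addon(clusterName=CLUSTER, addonName=addon)
```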
Verdict: GKE, decisively. This is where most teams underestimate the operational tax.
The hidden costs
Both cloud providers will quote you roughly the same per-node price. The hidden costs are people-hours, not infrastructure dollars.
On EKS, count the time your team spends reasoning about VPC CNI behavior, IRSA, managed node groups vs. self-managed nodes, Karpenter configuration, and the slow drift between AWS’s recommended pattern and what actually scales for you.
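To make one of those concrete, here is the arithmetic that “reasoning about VPC CNI behavior” starts with: in the default mode, pod density is capped by per-instance-type ENI and IP limits. A sketch:

```python
# Pod-density math the VPC CNI imposes per instance type, as a sketch.
# In the default (non-prefix-delegation) mode:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
import boto3

ec2 = boto3.client("ec2")

def max_pods(instance_type: str) -> int:
    net = ec2.describe_instance_types(InstanceTypes=[instance_type])[
        "InstanceTypes"
    ][0]["NetworkInfo"]
    return net["MaximumNetworkInterfaces"] * (net["Ipv4AddressesPerInterface"] - 1) + 2

for itype in ("m5.large", "m5.2xlarge", "c5.4xlarge"):
    print(itype, max_pods(itype))  # m5.large lands at 29, a perennial surprise
```

Prefix delegation changes the formula again, which is exactly the kind of drift between the recommended pattern and what actually scales that we mean.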
On GKE, count the time you don’t spend on those things, and instead spend on the GCP-specific work — IAM hierarchies, organization policies, project structure.
In our experience: GKE costs less in operator-hours per cluster, by a noticeable margin, after the first quarter.
When to pick EKS anyway
You’re already deeply in AWS. Your data lives in S3, your queues are SQS, your CDN is CloudFront. The cross-cloud egress alone makes GKE the wrong answer.
Or: you need something AWS-specific: instance types only AWS offers, compliance regimes only AWS satisfies, marketplace integrations only AWS has.
Most other times, the right answer is GKE.
When to pick GKE
You’re net-new, or you have the option to choose. You want fewer operator-hours per cluster. You’re going to use Workload Identity heavily. You like Google’s default opinions on most things.
You also want the control plane upgrade story to stop being a project.
How to make the call without burning a quarter
Pick the workload that scares you most — the one that pages, or the one that’s about to. Stand it up on both for two weeks. Run real traffic at it. Count the operator-hours.
You will know within ten days. We’ve done this with twelve customers in the last three years, and the numbers are consistent enough that we’ll run the spike for you, with our own engineers, against your real traffic, and tell you the answer.
If the answer is to switch, we’ll do the migration. If the answer is to stay, we’ll tell you that too.