Satyajit Roy

Engineering Executive | Platform, SRE & AI Infrastructure

I design, scale, and lead hyperscale platforms delivering reliability, efficiency, and clarity at internet scale.

View Case Studies View Resume

20+ Years Experience

30B Daily Requests Scale

99.95%+ Reliability

55+ Engineers Leadership

Strategic Expertise

What I bring to engineering organizations

Leadership of global engineering organizations
Architecting hyperscale systems
Turning reactive engineering into proactive, high‑velocity excellence

Delivering 20-65% cloud cost reductions via FinOps
Scaling AI/ML and high-performance infrastructure
Bridging technical risk with strategic business value

Featured Case Studies

View All Work →

Zero to Production-Grade: Rebuilding Mandolin's Entire Cloud Platform from Scratch

Engineering Leader — Mandolin

Mandolin is a healthcare AI startup ($40M Series A, Greylock), operating in hyper-growth with HIPAA/GDPR compliance requirements and enterprise healthcare tenants.

Four deployment systems with no source of truth. Every service sat on a public endpoint with no segmentation, no mTLS, and static credentials scattered across codebases. No DR and no defined RTO. High operating costs, builds taking 45 minutes to 1 hour, and zero developer self-service.

Impact & Metrics

53% infra cost reduction, 67% MTTR reduction, 53 days, solo execution (while building a team from scratch)
53% reduction in infrastructure costs while simultaneously growing the resource footprint.
67% reduction in MTTR via Resolve.ai-automated incident triage.

GitOps GKE Healthcare

Read Full Case Study →

Hyperscale ML/Search Platform at Adobe

Architect & Technical Leader

Adobe's Core Search and Sensei platform serves as the intelligence layer behind flagship products, processing 30B+ daily requests.

AI/ML workloads were outgrowing the existing infrastructure, creating scaling, latency, and cost challenges.

Impact & Metrics

Multi-billion requests, GPU utilization +38%
Supported 30B+ daily API requests with >99.98% availability.
Increased GPU utilization by 38% through smarter scheduling.

AI/ML HPC Kubernetes

Read Full Case Study →

Enterprise Elasticsearch Consolidation at Adobe

Architect & Technical Leader

Adobe’s search infrastructure was fragmented across 18+ managed clusters with varying versions, driving high licensing costs and operational complexity.

Managed-service lock-in and version fragmentation were creating a multi-million-dollar licensing burden without the necessary operational control.

Impact & Metrics

Millions in annual savings, 30% cost reduction
Reduced annual Elasticsearch licensing costs by millions of dollars (30% net savings).
Achieved full operational control over search performance and security posture.

Elasticsearch Open Source Cost Optimization

Read Full Case Study →

Global SRE Operating Model at F5

Sr. Director of Product Engineering & Head of SRE

F5’s Distributed Cloud platform powers global multi‑cloud networking and security for enterprise customers.

Silos, inconsistent incident response, and burnout were slowing down a platform facing explosive traffic growth.

Impact & Metrics

55+ engineers, MTTR −73%
Reduced MTTR by 73% and improved incident consistency.
Lowered attrition by 10% by eliminating hero culture.

SRE Transformation Compliance

Read Full Case Study →

FedRAMP High & Zero-Trust Architecture at F5

Sr. Director of Product Engineering & Head of SRE

F5's Distributed Cloud platform required the highest levels of security to serve federal and highly regulated enterprise customers.

FedRAMP High compliance had to be achieved and sustained while maintaining rapid feature velocity in a multi-cloud environment.

Impact & Metrics

FedRAMP High, PCI-DSS, SOC 2
Achieved FedRAMP High, PCI-DSS, and SOC 2 certifications.
Accelerated feature velocity by 40% by shifting security and compliance left.

Security FedRAMP Zero-Trust

Read Full Case Study →

Platform Modernization & FinOps at Arkose Labs

Director of Engineering & SRE

Arkose Labs fights fraud at internet scale, requiring real‑time decisioning under unpredictable attack traffic.

Cloud spend was rising faster than revenue, and technical debt was slowing delivery.

Impact & Metrics

22% cloud spend reduction
Reduced cloud spend by 22% while supporting 7x transaction growth.
Maintained a 99.9% SLA even during attack spikes.

FinOps Modernization eBPF

Read Full Case Study →

How I Work

Leadership & Philosophy

Systems Thinking

I approach engineering organizations as distributed systems—optimizing for flow, feedback loops, and resilience at scale.

Empowered Teams

I build high-trust cultures where engineers own their outcomes, with clear paths for growth and autonomy.

Operational Excellence

Reliability is a feature. I champion SRE principles to shift from reactive firefighting to proactive stability.

Work With Me

Open Source

View All →

Setup DevBox

Onboarding slows down when every engineer’s laptop behaves differently.

View Repo

Git Selective Ignore

Local configs and secrets often sneak into commits.

View Repo

Blogs Publisher

Cross‑posting content manually wastes time and breaks consistency.

View Repo

Writing

View All →

The Matryoshka Dolls of Modern Networking: A Technical Evolution

A layered exploration of modern networking — from packets to policy — and how it shapes cloud‑native systems.

Architecture Read Article →

And I thought I knew about DNS

A deep dive into DNS resolution, propagation, and common pitfalls in distributed environments.

Infrastructure Read Article →

Git Selective Ignore - Because Sometimes You Need to Keep Secrets from Git (But Not From Yourself)

How to use local ignores to keep secrets out of git without losing your sanity.

Tooling Read Article →

Interested in platform leadership, architecture reviews, or collaboration?

Contact Me LinkedIn