SATYAJIT ROY CHOUDHURY
| **Engineering Executive | 20+ years in US & India | Leader in Cloud, AI/ML Infra, and SRE Engineering** |
| San Mateo, 94404 | (561) 866-3499 | talk2sroy.ch@gmail.com | Portfolio | Personal Site |
EXECUTIVE SUMMARY
I build and scale engineering organizations that turn infrastructure investment into business outcomes. Over 20 years, I have run global platform and SRE teams through rapid growth, built reliability into products that couldn’t afford to fail, and stayed close enough to the architecture to make the hard calls myself. I lead by developing the people around me, staying hands-on where it matters, and keeping complex systems honest with clear metrics and accountability.
KEY EXECUTIVE ACHIEVEMENTS
- Hyperscale Platform Leadership: Architected and operated search and AI platforms handling 30B+ daily requests at 99.98%+ availability across hybrid multi-cloud infrastructure, while increasing GPU cluster utilization by 38–40% through workload-aware scheduling.
- Growth & Reliability Enablement: Led platform and SRE strategy that absorbed 400% attack traffic growth and supported 200%+ SaaS customer expansion at F5 without proportional infrastructure cost increases, while improving release velocity and incident response.
- Financial & Cost Optimization: Delivered 22–65% cloud cost reductions across three organizations through hybrid architectures, FinOps governance, and storage tiering, including a 30% annual TCO reduction at F5 while accelerating feature delivery by 40%.
- Organizational & Talent Leadership: Built and led global engineering organizations up to 55+ engineers; maintained sub-10% attrition and advanced 30%+ of team members into senior or leadership roles through structured development and promotion planning.
AREAS OF EXPERTISE
Leadership & Strategy
- P&L & Budget Accountability
- Org Design & Headcount Planning
- Manager-of-Managers
- Build vs Buy Strategy
- Technical Roadmapping & OKRs
- FinOps / Cloud TCO Optimization
- Executive & Board Stakeholder Mgmt
Cloud & Platform Engineering
- Multi-Cloud (AWS, Azure, GCP)
- Kubernetes (EKS, AKS, GKE)
- GitOps (ArgoCD, Flux CD)
- IaC (Terraform, Crossplane)
- eBPF / Service Mesh (Cilium)
- Microservices & API Platforms
- CI/CD & Release Engineering
Reliability & Operations
- SLO/SLI Design & Incident Command
- MTTR Reduction (60–73%)
- Observability (OTel, Prometheus, ELK, Grafana, Jaeger)
- Chaos Engineering & Global On-call
- WAAP / WAF & Zero-Trust
- FedRAMP High · SOC 2 · PCI-DSS
- Mutual TLS & Policy-as-Code
- Runtime eBPF Monitoring
AI/ML & HPC Infrastructure
- NVIDIA GPU (V100, A100, T4)
- HPC Networking (RDMA, InfiniBand, RoCE)
- MLOps (Kubeflow, MLflow)
- Job Scheduling (Volcano)
- Inference Optimization (MIG & GPU-aware Scheduling)
Data & Storage Engineering
- Elasticsearch
- Object Storage Lifecycle
- High Ingestion Pipelines
- Multi-tenant Search Architecture
- Data Sovereignty & Compliance
- Sub-5ms P95 Latency Optimization
TECHNICAL EXPERIENCE
Director of Engineering and SRE at Arkose Labs – San Mateo, CA (Aug 2024 — June 2025)
High-growth fraud detection SaaS. Owned platform engineering and SRE across reliability, cloud cost, and security posture while the platform absorbed sharp enterprise-driven traffic growth.
- Re-architected the platform to EKS-based microservices with an eBPF service mesh, supporting 7x transaction growth over 10 months while holding 99.9% SLA under sustained attack traffic.
- Cut P1 incident resolution time by 58% by rebuilding observability and SOC signal quality, then leading high-severity incident command directly until the on-call team had the tools and process to own it.
- Drove migration from Cloudflare to CloudFront with Lambda@Edge, cutting edge latency by 35% for enterprise customers.
- Built and executed a FinOps governance program, reducing cloud spend by 22% without limiting infrastructure capacity and lowering effective cost-per-million transactions quarter over quarter.
- Established SLO-based release gates with Product, giving engineering and product a shared framework for trading feature velocity against reliability risk, and measurably reducing customer-impacting incidents.
- Ran build-vs-buy evaluations and vendor POCs across cloud, edge, and observability platforms, selecting best-fit solutions that cut evaluation cycles and avoided long-term lock-in.
- Scaled the engineering organization from 4 to 16 engineers across two teams; managed 2 engineering managers and owned the hiring plan and team structure.
Sr. Director of Product Engineering & Head of SRE at F5 Inc – San Jose, CA (Oct 2022 — April 2024)
Full P&L accountability for Platform Engineering and Global SRE for F5’s Distributed Cloud SaaS, a globally distributed, security-critical platform spanning 25+ PoPs across multiple cloud providers.
- Oversaw architecture strategy for a 25+ global PoP platform delivering multi-cloud networking, WAAP/WAF, and edge services, enabling the platform to absorb 400% growth in attack traffic while supporting 200%+ customer expansion.
- Reduced annual TCO by 30% and accelerated feature delivery by 40% through a disciplined build-vs-buy governance process that balanced internal IP development with the right vendor solutions.
- Partnered with the CISO to achieve FedRAMP High, PCI-DSS, and SOC 2 compliance — implementing policy-as-code, zero-trust architecture, mutual TLS, and runtime eBPF monitoring — unlocking regulated telecom and financial services markets.
- Cut MTTR by 73% and sustained 99.92%+ platform availability by re-architecting observability with ELK, Jaeger, and ML-based anomaly detection.
- Rebuilt the global SRE model into a follow-the-sun structure, reducing on-call burnout, cutting attrition by 10 percentage points, and improving incident response consistency across time zones.
- Translated Board and C-suite priorities into multi-quarter engineering roadmaps and OKRs, aligning Cloud, SRE, and product teams across the full F5 Distributed Cloud portfolio.
- Led an organization of 55+ engineers as manager of managers: 3 Directors and 1 Senior Manager, owning hiring plans, headcount allocation, and operating budget.
Architect/Technical Leader at Adobe Inc – San Jose, CA (Aug 2018 — Oct 2022)
Technical leader for Adobe’s Core Search and Sensei Machine Learning platform: hyperscale infrastructure serving billions of daily requests across the full Search and Sensei product ecosystem.
- Built and operated a hybrid GPU/CPU infrastructure across AWS, Azure, and on-prem that handled ~30B daily API requests at 99.98%+ availability.
- Designed and optimized NVIDIA V100/A100 HPC clusters with RDMA InfiniBand, MIG, and GPU-aware scheduling (Kubeflow, Volcano), increasing cluster utilization by 38% and measurably reducing compute cost for equivalent workload volume.
- Migrated from managed AWS Elasticsearch to a self-managed hybrid architecture across 18 clusters handling 10B+ documents at 6,000 writes/sec, delivering 30% licensing savings while improving operational control.
- Consolidated fragmented Kubernetes environments into 15 multi-tenant clusters using Cilium (eBPF), eliminating 90% of cluster sprawl; the reference architecture was later adopted across Adobe’s broader engineering organization.
- Cut cloud storage costs by 65% through data lifecycle and tiering policies across search and ML pipelines while maintaining sub-5ms P95 latency.
- Led 12 senior platform and infrastructure engineers, setting the architectural direction for search and ML infrastructure used across Adobe’s product portfolio.
Tech Leader at Macys.com – San Francisco, CA (Nov 2016 — Aug 2018)
Technical leader for enterprise-wide CI/CD and platform modernization supporting Macy’s and Bloomingdale’s e-commerce engineering organizations.
- Designed and delivered a company-wide CI/CD platform using Jenkins pipelines, Spinnaker, and Kubernetes, reducing deployment time from 2–3 days to under 1 hour and enabling 100+ tested deployments per week.
- Implemented blue-green and canary deployment strategies, enabling near-zero-downtime releases for revenue-critical e-commerce workloads during peak retail periods.
- Designed a hybrid cloud architecture across GCP (GKE), AWS (ECS/EKS), and on-prem VMware Tanzu, enabling consistent workload execution across environments.
- Introduced early GitOps workflows using Flux CD, improving deployment consistency and reducing infrastructure provisioning errors during peak retail traffic.
- Led a team of 15 engineers across platform modernization and delivery tooling for both the Macy’s and Bloomingdale’s engineering organizations.
EARLIER EXPERIENCE
- Prior Engineering Roles (2009 – 2016): Workday, Chegg, RocketFuel, Adobe, Saba Software, CMC
- International Engineering Roles (2004 – 2009): Autonomy, SDG Group, HCL Tech, vCustomer (India)
EDUCATION
- Bachelor of Science, Mahatma Gandhi Kashi Vidyapith, Varanasi, UP, India (June 2002)
- Leading Effective Decision-Making (certification), Yale School of Management Online (October 2021)