PRASHANT BHARADWAJ

Lead DevOps Engineer @ Games24x7 | Observability, Kubernetes, AWS

SUMMARY

DevOps Engineer with 9+ years of experience spanning DevOps, SRE, and Cloud Infrastructure, specializing in Observability, Kubernetes, and AWS. Proven track record in incident management, migration of legacy systems to modern architectures, and scaling observability systems beyond 20M cardinality.

EXPERIENCE

SDE II -> SDE III, DevOps, Games24x7

01/2022 - Present

Bangalore, India

Observability:

  • Built and scaled a federated observability stack with 20M+ metric cardinality. Defined the observability charter and led its execution with the team.
  • Platformized alert and recording-rule creation via Helm values, enabling dev teams to own and manage them independently for 200+ services across 5 verticals.
  • Reduced log ingestion volume from ~50TB to ~25TB per day by optimizing agent config and collaborating with development teams.
  • Migrated 1000+ deprecated PagerDuty Event Rules to Event Orchestration via Infrastructure-as-Code.
  • Templatised Grafana dashboards to reduce operational toil and create a consistent experience across environments.
  • Centralized observability for legacy applications by creating a dynamic logging config.

EKS/Kubernetes:

  • Created an EKS migration SOP for the team to follow while segregating apps from one shared cluster into vertical-specific dedicated clusters.
  • Performed network capacity planning and subnet sizing for each vertical's EKS clusters.
  • Designed and maintained EKS clusters of 500+ nodes via IaC for different verticals of Games24x7.
  • Enabled canary releases via Argo Rollouts for ~50 services.

Cost Efficiency:

  • Designed an alternate logging solution for AWS Lambda that reduced cost by 90% by shipping logs to Last9 via a mounted EFS volume.
  • Optimized AWS costs by leading the migration to Graviton instances: migrated My11Circle apps to Graviton and helped steer towards 30% Graviton adoption.

SRE, Ethos Life

09/2023-03/2024

Bangalore, India

  • Worked on setting up the foundation for multi-region active-passive DR capabilities.
  • Built a pipeline to detect any changes made via the AWS console (UI) and alert the security team, in order to enforce IaC.
  • Handled operational work and on-call using Terraform, GitHub Actions, and Atlantis.

Product Solution Engineer II, Flipkart

02/2020-12/2021

Bangalore, India

  • End-to-end migration of Data Reporting services to Docker and Kubernetes; used the Helm package manager for easier k8s YAML management and deployments.
  • Maintained Reporting Services written in Java - added small features and fixed some bugs.
  • Migrated our CI/CD pipelines from Jenkins to an in-house SOX-compliant CI/CD tool.
  • Enabled CPU-based autoscaling for reporting services and tuned the apps to utilise the CPU effectively.
  • As part of the SOX compliance track, moved credentials out of the repositories into a central service and consumed them via mutated k8s secrets.
  • Worked on adding authentication and authorization support in reporting apps.
  • Worked with the team to migrate one of the data pipelines from Hive to Spark using an in-house framework written in Java.
  • Performance-tested the data apps and helped the team optimise their performance.

Site Reliability Engineer, Soroco

12/2017-01/2020

Bangalore, India

  • Writing gitlab-ci.yml to achieve continuous integration.
  • Containerization of different applications, including Flask and Django, using Docker.
  • Setting up monitoring infrastructure using Prometheus and Grafana.
  • Setting up a PyPI server, an email server, and a forum using Discourse.
  • Automating various activities using scripting languages like Bash and Python.

Test Engineer, Infosys

05/2015-12/2017

Bangalore, India

  • Worked for a telecom client to understand requirements and validate functionality using a templatized automation framework.
  • Completed training in OOP using Java; HTML, CSS, and JS; and data warehousing using Informatica.

EDUCATION

B.Tech in Electronics and Telecommunications Engineering, KIIT University

07/2011-05/2015

SKILLS

Observability: Prometheus, Grafana, OpenTelemetry, Sumo Logic, SigNoz, PagerDuty, Alertmanager, ClickHouse, CloudWatch
Container Orchestration (Proficient): Docker, Kubernetes, Karpenter, Helm
Infrastructure as Code: Terraform, Ansible
Cloud Provider: AWS (EC2, EKS, S3, VPC, Route 53, IAM, CloudWatch)
CI/CD: Jenkins, ArgoCD, Argo Rollouts, GitHub Actions, GitLab CI, Atlantis
Tools/Programming/SRE practices: Python, Kafka, Linux, Shell scripting, Golden Signals, SLI/SLO/SLA

LANGUAGES