Site Reliability Engineering Manager (IN)

Bengaluru, Karnataka, India

Fortanix

In this role, you will manage and grow the engineering team responsible for production reliability. You will design operations as code, work with product engineering to improve reliability, implement an actionable monitoring framework, and design a 24/7 process for responding to critical incidents.


Key Responsibilities

  • Manage and grow the global SRE engineering team.
  • Manage and prioritize SRE work across multiple Fortanix products.
  • Own production upgrades, migrations, disaster recovery drills, backup/restore, cloud environment security, logging, and log analytics.
  • Work with DevOps, Networking, Customer Success, and Development to continuously improve the production environment.
  • Design metrics to measure quality improvements in production and work with Customer Success to define SLAs, SLOs, and SLIs.
  • Own the service status and incident reporting portal.
  • Improve on-call incident response for critical issues.
  • Respond to and communicate with impacted customers, providing root cause analysis and an action plan.
  • Design tests that simulate scenarios and events before they occur.
  • Manage IAM for production systems and implement access auditability.

Requirements

Technical Experience

Experience with modern enterprise site reliability engineering, along with experience in the following areas:

  • Advanced experience managing software deployments to Cloud and Datacentre environments via pipelines (for example, Bitbucket or GitLab).
  • Understanding of DevOps practices for deploying, upgrading, and monitoring modern software.
  • Experience with both managed (AKS, EKS, GKE) and unmanaged (on-prem) Kubernetes, especially production experience with Kubernetes and Docker.
  • Advanced experience with Linux administration and automation.
  • Experience with high-level network infrastructure for Datacentre and Cloud.
  • Understanding of the security aspects of an internet-exposed SaaS service.


Key Requirements

  • Bachelor's/Master's degree in Computer Science, Engineering, or a related field.
  • 12+ years of engineering experience, with 3+ years of management experience focused on site reliability engineering.
  • Solid understanding of Cloud technologies.
  • Demonstrated ability to coordinate cross-functional teams and drive work to completion.
  • Demonstrated multitasking, effective leadership and analytical skills.
  • Advanced written and verbal communication skills are a must.
  • Must be a team player.

Benefits

  • Medical insurance
  • Friendly culture that brings out the best in everybody

Job region(s): Asia/Pacific