Manager - Site Reliability Engineering (DevOps)

Bangalore

Everbridge
Critical events happen every day that threaten safety, interrupt supply chains, and disrupt operations. Rapidly pinpoint threats and automate response.

About the position:

Everbridge is looking for a full-time Manager for its Site Reliability Team with functional knowledge in all areas of SaaS Operations and software delivery enablement with experience in the management of large-scale global applications infrastructure and software delivery. The successful candidate should have experience working in a Software-as-a-Service offering. Candidates should also have experience designing, planning, implementing, tuning, and operating software application technologies including automation code, cloud environments, micro-service architectures, and clustering technology. The right candidate shall know and follow all applicable industry best practices for the management of a global application platform.

About the team:

As a member of our SaaS Operations team, you will join a highly motivated group of bright, fast-paced engineers. You'll work in a cutting-edge cloud environment that will power our company's impressive growth. We are smart, innovative, and ambitious, and are looking for great people to join us.

What you'll do:

  • Lead and manage Everbridge’s high-performing site reliability team while being hands-on.  
  • Mentor, grow, and empower your team by giving them the skills, confidence, space, and motivation to make decisions independently that lead to their personal and professional success, and enable them to become technical leaders. In other words, align the growth of the people around you with business impact.
  • Participate in deep technical design discussions within your team, and across engineering teams, and ensure that we’re building the right systems and keeping the quality high.
  • Drive the design, architecture, operability, security, and scaling of the Everbridge platforms.
  • Help develop and maintain processes, tools, and documentation in a multi-region cloud deployment.
  • Facilitate the evaluation of automation and new software solutions.
  • Collaborate with Architects, Developers, Data Reliability, and platform teams on designing scalable and highly available systems.
  • Ensure proper security, monitoring, alerting, and reporting for application platforms.
  • Troubleshoot and resolve production issues.
  • Help drive the capacity planning process.

What you'll bring:

  • You have 12+ years of software support, reliability, or operations engineering experience in a highly customer-focused SaaS environment.
  • Experience in migrating from a monolith N-tier architecture to a distributed microservices architecture (event-driven vs message-driven)
  • Experience in designing for the cloud and utilizing cloud-native solutions.
  • Experience with medium-scale to large-scale UNIX/Linux production environments, preferably as part of an online service provider.
  • Strong sense of ownership of large projects and complex tasks.
  • You have production experience with multiple cloud vendors.
  • You endorse infrastructure as code.
  • You have a proven track record of managing diverse and distributed teams, ensuring all members can bring their best.
  • You possess strong leadership skills and the ability to motivate teams.
  • You will bring a collaborative partnership mindset, focused on business impact.
  • Ability to solve problems quickly while taking an automation-first approach.
  • Hands-on experience with release, deployment, and environment lifecycle management.
  • Experience with Open Source technologies.
  • Experience with virtualization and container technologies.
  • Hands-on experience with infrastructure-as-code tools and CI/CD concepts (preferably HashiCorp tools like Terraform/Consul/Packer/Nomad and management tools like Kubernetes/Salt).
  • Experience with advanced automated monitoring and log aggregation systems (NewRelic, DataDog, SumoLogic, Splunk, Logstash, etc.).
  • Experience with multi-geography and distributed systems.
  • Working knowledge of web, application, database, and OS server systems (Nginx, Tomcat, MongoDB, ElasticSearch, ZooKeeper, RabbitMQ, Redis).
  • Ability to manage competing priorities in a complex environment.
Bridger Culture: 
At Everbridge, we have a mission that matters: to keep people safe and businesses running during critical events. Our “Bridgers” join Everbridge to make a positive impact on the world through their work. The core of our company culture is built around making a difference. Our people are dedicated to solving problems during difficult times and challenging situations, as our software was built to save lives. We are a rapidly growing organization transforming the field of critical event management and need passionate, committed, and determined individuals to help us carry out our mission. Our environment is dynamic, and our culture is constantly evolving and expanding in order to provide the best employee experience.

Passionate about our mission? Want to #BeTheBridge? Apply to be a part of our team today!

Everbridge is an Equal Opportunity/Affirmative Action Employer. All qualified Applicants will receive consideration for employment without regard to race, creed, color, religion, or sex including sexual orientation and gender identity, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.
Job region(s): Asia/Pacific