Senior Site Reliability Engineer

United States

Full Time Senior-level / Expert
Everbridge

*This role can be based anywhere in the United States.
Are you motivated by an incredible sense of purpose in doing work that helps keep people safe and businesses running daily, with results that regularly make headlines? Are you passionate about innovating on the industry’s cutting edge to develop solid architecture principles, operability guidelines, progressive scaling methodologies, and other sophisticated techniques to reliably operate critical technology infrastructure at scale? Do you have an insatiable appetite for streamlining away inefficiency, automating away toil, and proactively eliminating problems before they occur? If so, this position is a perfect opportunity for you to join the Everbridge Site Reliability Engineering team in a hands-on role driving the design, implementation, and operation of our global platforms.
As the Everbridge Site Reliability Engineering team, we are responsible for ensuring overall service quality and availability of Everbridge's solutions. The technology platforms that we support automate the international delivery of critical information to help keep people safe and businesses running.
We are a 24x7x365 distributed team that can do our job anytime, anywhere on the planet with an Internet connection. Our holistic understanding of OSI layers 0 through 8 allows us to effectively maintain a heterogeneous blend of worldwide public and private cloud services where lives and livelihoods are at stake in the event of failures. We are dedicated, passionate people who are committed to internal/external customer service and doing the right thing.

What you'll do:

  • Keep people safe and businesses running.
  • Own operational availability, security, scalability, efficiency, monitoring, instrumentation, and overall service reliability of Everbridge's solutions.
  • Collaborate across Agile teams with Architects, Developers, Quality, Data, Security, and other Operations engineers on designing and implementing highly reliable solutions.
  • Embrace Site Reliability Engineering principles of proactivity, automation, cross-functional collaboration, data-driven decision making, and fast+safe failing to continually improve our technology and culture.
  • Enhance our infrastructure, tooling, and processes to extend operability as a self-service function for other groups in the engineering value stream.
  • Participate in a rotating on-call schedule to troubleshoot and resolve production escalations from our 24x7x365 NOC.
  • Have fun while we work hard to make a difference.

What you'll bring:

  • Previous experience contributing in a production Site Reliability, DevOps, SaaS/Technical Operations, or NOC environment
  • Commitment to technical excellence and quality customer service
  • Ability to write code in at least one programming language (e.g. Python, Perl, Java, Ruby, Go)
  • Comfort using Git for practical configuration data and code management
  • Expertise with cloud compute IaaS/abstracted PaaS solutions (AWS Solutions Architect or equivalent) and hybrid/on-premises private compute environments (VMware Certified Professional or equivalent)
  • Deep knowledge in one of these disciplines forms the central pillar of your T-shaped skill set:
      • Network architecture and operation with an emphasis on: application load balancing at local and global scales (ALB/ELB/Route 53), IPv4 routing and dynamic routing protocols (OSPF, BGP), VPN, and network security best practices
      • Automation framework orchestration, configuration management, and software-defined infrastructure management techniques (SaltStack preferred; others such as Puppet, Chef, or Ansible also acceptable)
      • Large scale production UNIX/Linux operating system, application, and security maintenance in an online service provider environment (Ubuntu and Debian GNU/Linux preferred)
Bridger Culture:
At Everbridge, we have a mission that matters – to keep people safe and businesses running during critical events. Our “Bridgers” join Everbridge to make a positive impact on the world through their work. The core of our company culture is built around making a difference. Our people are dedicated to solving problems during difficult times and challenging situations because our software was built to save lives. We are a rapidly growing organization transforming the field of critical event management and need passionate, committed, and determined individuals to help us carry out our mission. Our environment is dynamic, and our culture is constantly evolving and expanding in order to provide the best employee experience.
Passionate about our mission? Want to #BeTheBridge? Apply to be a part of our team today!
Everbridge is an Equal Opportunity/Affirmative Action Employer. All qualified Applicants will receive consideration for employment without regard to race, creed, color, religion, sex (including sexual orientation and gender identity), national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.
Job region(s): North America