Senior SRE, Infrastructure group

London, Tel Aviv, or Remote

Full Time Senior-level / Expert
Snyk

Posted 1 week ago

We are looking for an experienced SRE to join us on our mission to create products and services that make our customers happy and secure. Reliability is right at the heart of enabling that.

As a Senior SRE, you will be responsible for building a culture of high standards around reliability and resilience practices, and paving the road for our observability capabilities at Snyk.

You’ll spend your time:

  • Paving the road for internal adoption of our observability, monitoring and logging capabilities, which span multiple clouds and deployment options.
  • Educating our Engineering and Product Management functions about SRE and how to achieve their reliability goals.
  • Identifying and driving the build-out of new observability capabilities where gaps exist for our R&D teams.
  • Consulting with R&D teams to determine a sensible set of SLIs, overarching SLOs, reasonable error budgets, and the dashboards to monitor against them.
  • Working with teams to ensure that monitoring and alerting are instrumented to focus on customer impact. The goal is that no one should get out of bed at 3am for issues that are not customer-facing.
  • Working with vendors to integrate hosted tools into our overall observability stack. Currently, we’re working with both hosted and self-hosted open source tools. We use Prometheus, Grafana, Kibana, and Jaeger alongside other Kubernetes utilities and cloud facilities.
  • Working very closely with Infra Group’s Cloud Platforms team to ensure that our practices and capabilities are adopted as far and wide throughout the organisation as possible.
  • Developing a framework for assessing Production Readiness, and working with R&D teams to meet this bar.

You should apply if you:

  • Believe that DevOps and SRE are part of the company culture, not a role.
  • Have hands-on experience as a Site Reliability Engineer working in company cultures that encouraged a strong team ownership model (“you build it, you run it”).
  • Are willing to inspect and challenge the status quo. You aim to discover fresh ideas on how appropriate reliability is achieved and adopted at scale, and how it drives organisation-wide culture change.
  • Have experience with reducing toil required by internal teams through building and maintaining automation and observability tooling.
  • Communicate proactively and enjoy consulting directly with individuals and teams across multiple disciplines to achieve their observability and reliability goals.
  • Have seen good (and bad) infrastructure, and have opinions about what works and what doesn’t.
  • Are hands-on, curious and love to explore new domains, technologies, and approaches.
  • Are agile and enjoy the speed of a fast-paced, highly engaged environment.

We’d especially love to hear from you if you:

  • Have designed and instrumented distributed tracing telemetry and visualizations across big, complex codebases to drive insights and outcomes.
  • Have experience running and operating software on Kubernetes.
  • Have experience with running Prometheus and Grafana for multiple clusters and environments.

About Snyk

We’re on a mission to make the world a safer place with more secure software.

We’re living in a world of digital transformation that is turning ever more industries into software-development industries. Cyber security is taking centre stage for many companies, and demand for Snyk’s product is skyrocketing!

Snyk has already been adopted by over 2.2M developers, including multiple leading enterprise customers such as Google, Salesforce and Intuit, who are using Snyk to find and fix vulnerabilities in their open source libraries and container images, empowering them to develop secure software, faster.

In March of 2021, we raised $300M in Series E funding at a $4.7 billion company valuation, just after securing an additional $200M in Series D funding in September 2020, and successfully closing two strategic acquisitions. On top of that, we doubled the size of our global team, and we’re not stopping there!  

We believe open source software is a force for good, and we’re building Snyk to make it easier for developers who aren’t security experts to stay secure.  Join us!

Job tags: Grafana Kubernetes Open source Prometheus Vulnerabilities
Job region(s): Middle East Remote/Anywhere