Senior Site Reliability Engineer

Prague

Outreach.io

Posted 2 days ago

The Role
The Core Infrastructure team at Outreach is responsible for the foundation on which all other software built by Outreach engineering teams runs. That means we need to be empathetic to the needs of our co-workers as they do their jobs. It also means we must stay closely focused on how our systems perform against our SLOs and SLIs. We have spent the last year transitioning much of our production infrastructure to run on Kubernetes. We are looking for someone to help us mature that new platform and finish migrating the long tail of legacy systems to it. We also need someone to help us reshape other parts of our underlying production infrastructure as we continue to grow and scale rapidly. Outreach has grown enormously in each of the last several years, and we don't see any signs of that stopping soon. That means we need someone analytically minded, with strong attention to detail, to help us identify the constraints in our system and prioritize which ones we address first.
About the Team
Outreach continues to grow rapidly, and the SRE COR (Compute, Observability and Reliability) team is composed of folks with disparate skills and backgrounds. Our unifying attribute is our desire to work together to find creative, scalable solutions to the problems we run into. In an ideal world, the software we build creates Kubernetes clusters, validates their performance, and automatically and safely migrates workloads onto the new clusters. We are heavily integrated into AWS, and the team is responsible for the underlying infrastructure on which Outreach runs, for providing operational insight into it, and for maximizing its reliability. This is the infrastructure that allows us to continue to grow.
Your Daily Adventures Will Include
Our Site Reliability Engineers usually iterate on our planned projects on a day-to-day basis. Our main focus is to keep spending more of our time improving our platform: making it more performant and scalable, and making it easier for other software engineers to do their jobs. We are also called on to assist other teams. Occasionally, we are disrupted by urgent circumstances (read: alerts). When confronted with disruptive events, we strive to codify what we've learned and feed that information back into how we plan and prioritize our work.

Basic requirements

  • Have you ever written a CI/CD pipeline for your application and then deployed it to the Kubernetes cluster? Did you enjoy it?
  • If you answered yes to both questions, it's very likely that you have skills relevant to this position. We would also very much like someone who believes heavily in automating away problems (read: GitOps), is invested in continuing to learn and grow both as a human and in their career, and has strong verbal and written communication skills.

  • Our tech stack: In addition to Kubernetes, Docker and AWS, we use Terraform/Atlantis, Helm, CircleCI, Concourse, Prometheus/Grafana, Vault and Git. Our services and apps are written in Go and Ruby. It's awesome if you have experience with any of those things, but we are happy to help you learn.
  • Other things that you may have experience in that are potentially relevant, but not specifically required:
  • Experience building highly available services
  • Understanding of distributed systems and their commonly associated problems
  • Cloud computing fundamentals and cloud-based networking, preferably in AWS
  • Understanding of how to properly build monitoring, alerting and logging with emphasis on both the business and technical aspects of the application
  • Good Unix fundamentals
  • Elasticsearch, RabbitMQ and Kafka
  • Performance profiling, especially in Ruby
  • We encourage you to apply, even if you think the position sounds a bit outside your wheelhouse.
  • We want to find folks who are interested in learning and developing on the job, even if you can walk in the door and do amazing things.

  • We are ideally looking for SREs with 4+ years of dedicated experience.
Why You Will Love it Here
  • Highly competitive salary
  • Amazing working space with a running track on its roof
  • Flexible time off and 5 weeks of vacation
  • 4% employer supplemental pension monthly contribution
  • 5.000 CZK monthly allowance for meal vouchers, flexipasses and other personal expenses
  • 16 weeks of annual top up maternity leave pay or 12 weeks of fully paid paternity leave
  • Opportunity to be part of company success via stock options program
  • Company-organized and personal paid volunteer days to support the community that supports us
  • Diversity and inclusion programs that promote employee resource groups like OWN (Outreach Women's Network)
  • Employee referral bonuses to encourage the addition of great new people to the team
  • Fun company and team outings because we play just as hard as we work
Job tags: AWS CD CI Docker Elasticsearch Git Go Grafana Kubernetes Prometheus RabbitMQ Ruby Terraform Unix Vault