Team Lead Site Reliability Engineering (SRE) (m/f/x)
At HelloFresh, our mission is to change the way people eat - forever. From our 2011 founding in Europe’s vibrant tech hub Berlin, we’ve become the global market leader in the meal kit sector and inspire millions of energised home cooks across the globe every week.
We offer our meal kit boxes, full of exciting recipes and thoughtfully sourced fresh ingredients, in more than 12 countries, operate from offices in Berlin, New York City, Sydney, Toronto, London, Amsterdam and Copenhagen, and shipped more than 250 million meals in 2019.
Our more than 5,000 employees are the heart and soul of our highly international, fast-paced, and dynamic environment, where innovation and smart, fast action are encouraged.
We want you to join us and help take HelloFresh to the next level. We are a company in its growth phase, which makes this a great time to join. Career and development opportunities are endless.
We will encourage you to make an immediate impact in your area of work as well as empower you to grow your career with us.
Our Engineering, Data, Product and Security teams are located in Berlin and New York and are critical to what we do. From procurement tools to conversion rate optimization, live pricing tools, payment services and add-on upselling features, we work on challenging problems and build and release a high volume of features and engines that help our business thrive and deliver real financial impact.
You can get a taste of what we've been working on by checking out our tech blog.
About the job
You will be joining the Platform Tribe at HelloTech. Platform provides a stable, fresh environment in which our talented development teams can thrive. As well as building a great foundation, Platform is responsible for spreading its knowledge throughout the other tribes: the team makes sure everyone takes advantage of the easy-to-use infrastructure and applies best practices in Reliability, Observability, Monitoring, Containerisation, Performance, Security and more.
- You will be responsible for the on-time delivery of your team’s projects by creating a framework of accountability and continuously optimizing your team’s performance, promoting a culture of excellence where best practices and industry standards are an everyday habit
- You will coach and mentor the engineers on your team and help them grow and develop
- You will build a team culture that aims for high service availability, scalability and observability
You will lead the team that is responsible for:
- Spreading SRE knowledge and best practices across the wider organisation
- Owning end-to-end availability (SLOs/SLAs), reliability and performance
- Owning the solution-wide alerting strategy across the tech organisation
- Optimising incident management systems, policies and procedures
- Ensuring the engineering organisation has self-service observability tools, and advocating for observability best practices
- Reducing mean time to detect (MTTD) and mean time to recovery (MTTR), and increasing mean time between failures (MTBF)
- Guiding and educating the engineering organisation about operations and reliability
- Removing toil in infrastructure through automation
- Undertaking measured, methodical troubleshooting of complicated systems
Who we are looking for
- Solid experience as a Site Reliability or Infrastructure Engineer operating a public-cloud-based solution
- You have experience leading a team in an agile environment
- You have a passion for SRE/DevOps and for running highly resilient, automated systems
- You have experience designing highly scalable production architectures (configuration management, monitoring, infrastructure as code, load balancing, CDNs, distributed systems)
- You have solid experience with cloud infrastructure (e.g. AWS, Azure), Kubernetes and the following technologies: Helm, Docker, Terraform, Graylog, Prometheus, Jaeger, Kafka, Concourse CI
- Experience negotiating and setting SLIs, SLOs and SLAs with product owners
- Experience using data to diagnose and troubleshoot complex distributed systems
- Experience as a software developer (Python or Go)
- You are comfortable working anywhere in the stack, from the OS level upwards
- Solid Linux background
- Solid operations experience: metrics and statistics, incident management, postmortems, etc.
- You know what to monitor and how to alert when things go awry
- You are passionate about mentoring and sharing knowledge
What we offer
- Relocation assistance to move to Berlin and visa application support
- Competitive compensation
- Significant discount on our meal kits
- Annual learning and development budget to attend conferences or purchase educational resources
- Sabbatical policy
- Work in our office located in the heart of Berlin
- A diverse and vibrant international environment
- A range of perks (a free in-house crash course in German, compensation for advanced German classes, an in-house lecture series and knowledge-sharing programme, discounts at our neighbouring gym and Urban Sports Club, free weekly yoga classes, summer and winter parties, and discounts at our HelloFresh GO vending machines)
- The chance to have a significant impact on one of the fastest-growing technology companies in Europe during an exciting growth phase
Are you up for a challenge?
Please submit your complete application below including your salary expectations and earliest starting date.