Site Reliability Engineer (12 month Fixed Term Contract)

Hatfield, Hertfordshire, UK

Ocado Technology

Posted 3 weeks ago

“We are on a mission to transform the future of grocery retail through sustained technology innovation.”

Ocado Technology is putting the world’s retailers online using the cloud, robotics, AI, and IoT. We develop the innovative software and systems that power the world’s largest online-only grocery retailer as well as the global ‘Ocado Smart Platform’. With everything from websites to fully autonomous warehouses designed in-house, our employees need to be specialists in a wide range of technologies to help drive our business.

We champion a value-led culture to get our teams working at their very best and to help create a collaborative working environment that our people love. Our core values of Trust, Autonomy, Craftsmanship, Collaboration and Learn Fast help drive our innovative culture. But don’t just take our word for it: have a look at what our people are saying about us on Glassdoor.

 What does the Cloud Platform team do?

The Cloud Platform teams, within the Private & Edge Cloud department, provision and maintain more than 15 Kubernetes environments, have a very large portfolio, and are responsible for maintaining multiple UK and international CFCs as well as supporting the commissioning of new ones. Following the recent reorganisation, and to meet the company’s objectives, the team needs to reduce its portfolio (simplify the current solution, use fewer tools, etc.) and grow in size. The team has recently split into two teams of 5 engineers each so that engineers can focus on different, equally important priorities.

The mission of the team is to deliver an ever more reliable and scalable on-premises ecosystem for low-latency services, automation and edge devices that is fully adapted to the complete gamut of CFC sizes and geographies (UK included), while ensuring a smooth transition to the OCEngO cloud platform and providing uninterrupted service for existing sites.

 What will you be doing...

  • Developing and maintaining our Kubernetes platform and associated tooling, with a focus on reliability.
  • Managing the deployment pipelines and repositories to deploy into multiple remote sites and public cloud environments.
  • Taking a data-driven approach to tackling toil and to product planning.
  • Continuous improvement of our observability solutions to improve MTBF and MTTR. 
  • Conducting post-mortem meetings to ensure we identify root causes and capture remedial actions.
  • Be an automator - We’re using cutting-edge technology to facilitate repeatability. We practise continuous integration and are working towards continuous deployment.
  • Be a collaborator - You’ll be expected to forge deep bonds with our stakeholders and supporting teams in order to truly understand their needs. We work in an Agile environment.
  • Be a teacher - Be generous with your time and expertise to continue the development of our world-class engineering team.
  • Be an agilist - Actively contributing to the process of continual improvement, with regard to self, team and systems.
  • Supporting production systems as required outside of standard working hours and participating in a 24x7 on-call rota.

Please note that we are looking for a team of people to work on a Fixed Term Contract.

We’d like to talk to you if you have:

  • Demonstrable hands-on experience of operating (and troubleshooting) Kubernetes clusters in production environments
  • Demonstrable hands-on experience working with Docker containers
  • Strong, demonstrable programming and scripting skills (e.g. Bash, Python, Go)
  • Demonstrable hands-on experience operating (and troubleshooting) Linux clusters, core utilities, the Linux kernel, storage, networking, etc.
  • The ability to troubleshoot with ease in complex environments using monitoring and logging tools

We would also strongly like to see the following:

  • Demonstrable hands-on experience using Git or similar revision control systems
  • Demonstrable hands-on experience operating (and troubleshooting) Istio
  • The inclination and ambition to “Automate Everything”
  • Strong knowledge of Linux networking - iptables, interfaces, routing, etc.
  • A strong sense of collaboration both within the team and across other infrastructure and development teams
  • Being comfortable adapting to change and learning quickly in a fast-paced environment
  • A passion for developer experience and helping people find their way with infrastructure

Kudos for:

  • Public cloud exposure: Google / AWS
  • Some experience with TDD, design patterns and SOLID principles
  • Hands-on experience of build pipelines and software lifecycles
  • Use and knowledge of common build tools, repositories and CI/CD tooling
  • Experience using monitoring and alerting tools (e.g. New Relic, Prometheus, Grafana)
  • Proven commercial development experience
  • Knowledge of Scrum or other Agile methodologies

What we offer you...

Our employee benefits are designed for you: we care about people, and we’ve ensured we have a wealth of benefits that focus on your well-being. Within our flexible environment we can offer technically stretching work, a competitive salary and share schemes. Benefits include a pension scheme, an interest-free train season ticket loan, a free shuttle bus from Hatfield train station and, of course, healthy Ocado retail staff discounts.

We also have regular divisional socials and sports clubs, not to mention the Ocado Technology Academy, which offers a packed schedule of courses, conferences and events such as discussion sessions, conference briefs and external guest speakers. If you think you have what it takes to make a difference, please submit your application below.

Due to the energising nature of Ocado's business, vacancy close dates, when stated, are indicative and may be subject to change, so please apply as soon as possible to avoid disappointment.

Please note: If you have applied and been rejected for this role in the last 6 months, or applied and been rejected for a role with a similar skill set, we will not re-evaluate you for this position. After 6 months, we will treat your application as a new one. 

Be bold, be unique, be brilliant, be you. We are looking for individuality and we value diversity above gender, sexual orientation, race, nationality, ethnicity, religion, age, disability or union participation. We are an equal opportunities employer and we are committed to treating all applicants and employees fairly and equally.

Job tags: AWS Bash CD CI Docker Git Go Grafana Kubernetes Linux Prometheus Python TDD