Site Reliability Engineer (12-month Fixed Term Contract)

Hatfield, Hertfordshire, UK

Ocado Technology

Posted 2 weeks ago

“We are on a mission to transform the future of grocery retail through sustained technology innovation.”

Ocado Technology is putting the world’s retailers online using the cloud, robotics, AI, and IoT. We develop the innovative software and systems that power the world’s largest online-only grocery retailer as well as the global ‘Ocado Smart Platform’. With everything from websites to fully autonomous warehouses designed in-house, our employees need to be specialists in a wide range of technologies to help drive our business.

We champion a value-led culture to get our teams working at their very best and to help create a collaborative working environment that our people love. Core values of Trust, Autonomy, Craftsmanship, Collaboration and Learn Fast help drive our innovative culture. But don’t just take our word for it: have a look at what our people are saying about us on Glassdoor.

About our team...

The Cloud Platform teams within the Private & Edge Cloud department provision and maintain more than 15 Kubernetes environments, manage a very large portfolio, and are responsible for maintaining multiple UK and international CFCs as well as supporting the commissioning of new ones. To adapt to the recent reorganisation and meet all the company’s objectives, the team needs to reduce its portfolio (simplify the current solution, use fewer tools, etc.) and grow in size. It has recently split into two teams of 5 engineers each so that engineers can focus on different, equally important priorities.

The mission of the team is to deliver an ever more reliable and scalable on-premise ecosystem for low-latency services, automation and edge devices that is fully adapted to the complete gamut of CFC sizes and geographies (UK included), while ensuring a smooth transition to the OCEngO cloud platform and providing uninterrupted service for existing sites.

What will you do?

    • Developing and maintaining the Kubernetes platform and associated tooling in order to reduce unplanned work or aid incident resolution, e.g.
      • M&A | Code Changes | Self-healing automation | Upgrades
    • Fixing support escalation issues
    • Efficiency and Capacity Planning
    • Optimising on-call rotations (24x7) and processes, e.g.
      • Automation | Runbooks | Documenting “tribal” Knowledge | Tooling
    • Conducting post-incident reviews
    • Be an automator… We’re using cutting-edge technology to facilitate repeatability. We practise continuous integration and are working towards continuous deployment.
    • Be a collaborator… You’ll be expected to forge deep bonds with our stakeholders and supporting teams in order to truly understand their needs. We work in an Agile environment.
    • Be a teacher… Be generous with your time and expertise to continue the development of our world-class engineering team.
    • Actively contributing to the process of continual improvement, with regard to self, team and systems.

We'd like to hear from you if you have...

  • The ability to troubleshoot with ease in complex environments using monitoring and logging tools
  • Demonstrable hands-on experience of operating (and troubleshooting) Kubernetes clusters in production environments
  • Demonstrable hands-on experience working with Docker containers
  • Strong demonstrable programming skills with scripting languages (e.g. Python, Ruby, Go, bash, etc.)
  • Demonstrable hands-on experience operating (and troubleshooting) Linux clusters, core utils, Linux kernel, storage, networking, etc.
  • Demonstrable hands-on experience using git or similar revision control systems
  • Demonstrable hands-on experience operating (and troubleshooting) Istio
  • An ability to focus on the detail to rapidly identify and resolve issues
  • The inclination and ambition to “Automate Everything”
  • A passion for open source technologies
  • A strong sense of collaboration both within the team and across other infrastructure and development teams
  • The ability to adapt to change and learn quickly in a fast-paced environment
  • A passion for developer experience and helping people find their way with infrastructure

Bonus points if you have…

  • Public cloud exposure: Google / AWS /
  • Some experience with TDD, design patterns and SOLID principles
  • Hands-on experience of build pipelines and software lifecycles
  • Experience with and knowledge of common build tools, repositories and CI/CD tooling
  • Experience using monitoring and alerting tools (e.g. New Relic, Prometheus, Grafana)
  • Proven commercial development experience
  • Knowledge of Scrum or other Agile methodologies

What we offer you

Our employee benefits are designed for you. We care about people, and we’ve ensured we have a wealth of benefits that focus on your well-being. Within our flexible environment we can offer technically stretching work, a competitive salary and share schemes. Benefits include a pension scheme, an interest-free train season ticket loan, a free shuttle bus from Hatfield train station and, of course, healthy Ocado retail staff discounts.

We also have regular divisional socials and sports clubs, not to mention the Ocado Technology Academy with its packed schedule of courses, conferences and events such as discussion sessions, conference briefs and external guest speakers. If you think you have what it takes to make a difference, please submit your application below.

We are thrilled to welcome applicants from across the world. Please note that unfortunately we are unable to cover the cost of your visa at this time. We do cover the relevant company costs for visa sponsorship. For all employment offers made for UK roles, it is expected that you will be based in the UK within commutable distance, ready for your first day of work, so please keep this in mind. If you have any questions, please don't hesitate to ask.

Due to the energising nature of Ocado's business, vacancy close dates, when stated, are indicative and may be subject to change so please apply as soon as possible to avoid disappointment. 

Please note: If you have applied and been rejected for this role in the last 6 months, or applied and been rejected for a role with a similar skill set, we will not re-evaluate you for this position. After 6 months, we will treat your application as a new one. 

Be bold, be unique, be brilliant, be you. We are looking for individuality and we value diversity above gender, sexual orientation, race, nationality, ethnicity, religion, age, disability or union participation. We are an equal opportunities employer and we are committed to treating all applicants and employees fairly and equally. 

Job tags: AWS Bash CD CI Docker Git Go Grafana Kubernetes Linux Open source Prometheus Python Ruby TDD
Job region(s): Europe