Senior Site Reliability Engineer
Careem is the internet platform for the greater Middle East region. A pioneer of the region's ride-hailing economy, Careem is expanding services across its platform to become the region's everyday Super App. Careem's mission is to simplify and improve the lives of people and build an awesome organisation that inspires. Established in July 2012, Careem operates in over 100 cities across 13 countries and has created more than one million employment opportunities in the region. Careem became a wholly-owned subsidiary of Uber Technologies, Inc. in January 2020.
About the role
We are looking for engineers to join the SRE team and focus on enabling Kubernetes and taking cloud-native technology to the next level within Careem. We need expert, execution-focused engineers to help shape the future of the Careem platform and to scale our already sizable effort. As an SRE at Careem, you'll architect, build and maintain Kubernetes clusters and their surrounding ecosystem to ensure the resilience and reliability of our services and to speed up deployments, with the aim of improving products used by millions of customers every day. Key responsibilities include:
- Make an impact from the design phase through development and operation of Kubernetes clusters and their ecosystem on AWS
- Build core services and tooling, and create technical processes that simplify work for engineers across multiple services
- Identify, automate and scale systems without compromising on security and reliability
- Participate in on-call rotations and help improve incident response
What you’ll need
- 5+ years of software engineering and/or operations experience
- Expertise and experience in architecting, developing, operating and troubleshooting Kubernetes clusters and/or other highly available systems at scale
- Good knowledge of at least one of the following programming languages: Go, Python, Java, Rust, C++
- Experience with Cloud Infrastructure (AWS preferred)
- Experience with infrastructure automation (Infrastructure as Code)
- Strong Unix or Linux background, including the network stack and scripting
- Incident response and/or incident management experience is a plus
- Experience with DevOps topics such as monitoring, CI/CD and security is a plus
- Effective communication and collaboration skills, with the ability to drive and promote technical partnerships across teams
What will keep you busy?
- Work within a lean team, in quick iterations, on large, impactful projects
- Collaborate with teams to deliver an efficient cloud-native platform
- Ensure infrastructure is scalable, responsive, and reliable
- Automate everything with tools and workflows
- Evaluate and select tools and technologies needed for efficient support of the systems
- Keep our solutions cost-effective by improving performance and increasing utilization
What do we offer you?
You will work in an international environment with colleagues from 70+ nationalities, with an ownership culture, flexible working hours, unlimited (paid!) holidays and the latest technologies.
Careem is an equal opportunity employer. All aspects of employment, including the decision to hire, promote, discipline, or discharge, will be based on merit, competence, performance, and business needs. We celebrate diversity and are committed to creating an inclusive environment for everyone.