Site Reliability Engineer (Platform)


Applications have closed

Posted 3 months ago

We are currently looking for a Site Reliability Engineer (SRE) to join our Platform tribe. The SRE role at Preply combines software development, operations and business skills to run a large-scale, fault-tolerant, global language education platform. SREs ensure that Preply systems have the reliability and uptime appropriate to the business's needs, along with a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on the capacity and performance of our system. This person is expected to work on core parts of our platform and help us meet the challenges of growing the organization in terms of both traffic and the number of developers.

While we have a DevOps team that is responsible for infrastructure in general, the SRE team is responsible for system observability and alerting, managing and improving incident response processes, and managing on-call rotations across the company.

We work in small teams, so you will be able to influence system design and contribute a lot to the company's growth. We promote self-direction to work on meaningful projects, and we strive to create an environment that provides the support and mentorship needed to learn and grow.

We release our product 40-50 times per day by leveraging modern technologies like Kubernetes (Skaffold+Helm), Docker and top-notch CI/CD processes. We have diverse technical challenges (sometimes we write about them on our Engineering Blog) that will allow you to develop your skills across the stack.

Your expected outcomes:

  • Be responsible for Preply's uptime record.
  • Own the availability and performance of mission-critical services and build automation to prevent problem recurrence.
  • Improve system observability and alerting.
  • Manage on-call rotations across the company.
  • Improve incident response processes.
  • Establish credibility with the quality of the team's technical execution.
  • Practice sustainable incident response and blameless postmortems.
  • Collaborate with product teams to help them tackle technical issues and design new systems.

Your competency profile:

  • Expertise in problem solving and analyzing high-load systems.
  • Proficiency with production troubleshooting.
  • Strong knowledge of Python or another programming language, with at least 2 years of experience.
  • Hands-on experience with Django, Flask, PostgreSQL, Elasticsearch, Celery, RabbitMQ, Redis, GraphQL, AWS, Kafka or k8s is a plus.
  • Business-oriented & data-driven person.
  • Minimum B2 English level.

What we offer:

  • Work in the same office with easy-going and open-minded people from all over the world.
  • Easy-to-reach location.
  • Health insurance.
  • Monthly bonus deposit for self-development on
  • Strong financial package, paid vacation and sick leave.
  • Ability to work remotely 1 day per week.
  • Possibility to become a part of a truly big story.
  • Active office life with biweekly gatherings.
Job tags: AWS CD CI Django Docker Elasticsearch GraphQL Kafka Kubernetes Python RabbitMQ Redis