Want to build Web3 with us? The next few years in crypto, NFTs and Web3 belong to builders and believers, not short-term speculators. At Rarible, we believe Web3 will only proliferate when teams build excellent infrastructure, gaps and solutions that serve communities and create a better internet for everyone. If that sounds like music to your ears, we’re looking for you.
Here’s why: we are looking for a Site Reliability Engineer.
You have experience with fast-moving small teams and are culturally aligned with them. You have worked at globally distributed startups. You are self-driven, comfortable wearing many hats, and can deliver swiftly when needed. You can identify company priorities, own them, and iterate quickly to ship the best solution.
What you will do:
- Work closely with engineering teams to ensure Rarible runs well-operated, well-monitored systems that are designed and implemented with failure in mind.
- Provide incident response and support for our production systems.
- Continuously work with engineering teams to improve MTTR (Mean Time to Recovery).
- Automate our operational processes as needed, with accuracy and in compliance with our security requirements.
- Improve tools and advocate for operational excellence in continuous monitoring, self-healing systems and alert transparency.
- Work on the tooling, documentation, playbooks and education needed for engineering teams to deliver and maintain reliable, observable and scalable systems in a self-managed format.
- Make sure that reliability-related metrics are calculated, communicated and continuously improved.
Who you are:
- You have 5+ years of relevant experience ensuring the reliability and scalability of production systems.
- You are proactive and good at communication.
- Monitoring and observability are among your main skills, including tracing, RUM (real user monitoring) and advanced alerting.
- You are proficient in programming languages such as TypeScript/JavaScript and Java/Kotlin/Scala.
- You have worked closely with software engineers on a day-to-day basis to ensure the reliability of production systems together, handling incident response at both the infrastructure and software levels.
- You have experience with CI/CD, so you can improve the deployment process and reduce risk.
- You deeply understand and have worked with Kubernetes and LXC (Linux Containers).
- You have managed MongoDB, PostgreSQL, Elasticsearch, Kafka and the JVM.