Lisbon, Portugal

Want to build Web3 with us? The next few years in crypto, NFTs and Web3 belong to builders and believers, not short-term speculators. At Rarible, we believe Web3 will only proliferate when teams build excellent infrastructure, gaps and solutions that serve communities and create a better internet for everyone. If that sounds like music to your ears, we’re looking for you.

Here’s why: We are looking for a Site Reliability Engineer.

You have experience and are culturally aligned with fast-moving small teams. You have experience at globally distributed startups. You are self-driven, are comfortable wearing many hats, and can deliver swiftly when needed. You can identify company priorities, own them, and iterate quickly to ship the best solution.

What you will do:

Work closely with engineering teams to ensure Rarible's systems are well operated, well monitored, and designed and implemented with failure in mind.
Provide incident response and support for our production systems.
Continuously work with engineering teams to improve MTTR (Mean Time to Recovery).
Automate our operational processes as needed, with accuracy and in compliance with our security requirements.
Improve tools and advocate operational excellence for continuous monitoring, self-healing systems and alert transparency.
Work on the tooling, documentation, playbooks and education that engineering teams need to deliver and maintain reliable, observable and scalable systems in a self-managed way.
Make sure that reliability-related metrics are calculated, communicated and continuously improved.

Who you are:

You have 5+ years of relevant experience in ensuring reliability and scalability of production systems.
You are proactive and good at communication.
Monitoring and observability are among your main skills, including the use of tracing, RUM (real user monitoring) and advanced alerting.
You are proficient in programming languages such as TypeScript/JavaScript and Java/Kotlin/Scala.
You have worked closely with software engineers on a day-to-day basis to ensure the reliability of production systems together, handling incident response at both the infrastructure and software levels.
You have experience with CI/CD, so you can improve the deployment process and reduce risk.
You deeply understand and have worked with Kubernetes and LXC (Linux Containers).
You have managed MongoDB, PostgreSQL, Elasticsearch, Kafka and the JVM.
