Bitquery is hiring a

SRE / Site Reliability Engineer (Middle / Senior)

Full-Time
Remote

Bitquery is an API-first product company dedicated to solving blockchain data problems using ground-truth, on-chain data. Bitquery extracts valuable data and presents it via APIs. These APIs deliver solutions to multiple verticals such as Decentralized Finance (DeFi), DEX arbitrage analytics, and crypto surveillance and forensics across all major blockchains, including Bitcoin, Ethereum, EOS, and Tezos.

We are an international company that develops software for the analysis of decentralized data (40+ chains). Bitquery is a distributed team. We are currently looking for a full-time SRE to further develop, monitor, and support our infrastructure and to automate various processes. The role may also include on-call duty in shifts.


Roles & Responsibilities:

  • Ensuring the smooth operation of software, environments and company services
  • Analyzing and improving the performance and availability of products
  • Identifying bottlenecks in the architecture and the infrastructure
  • Improving system alerting and incident management
  • Improving SLI-based monitoring systems (Prometheus, Icinga, Grafana, etc.)
  • Formalizing SLIs in line with the main business requirements
  • Defining SLOs for services and for the infrastructure in general
  • Minimizing recovery point and recovery time objectives (RPO and RTO)
  • Analyzing incidents in the production environment
  • Capacity management

Requirements

  • 5+ years of work experience implementing, troubleshooting, and supporting infrastructure software and distributed systems
  • At least 2 years of development experience in one or more languages (Golang, Python, Ruby)
  • More than 2 years of experience with virtualization and containerization technologies (containerd, Docker, k8s)
  • Experience setting up CI of varying complexity (Jenkins) with CD to different environments
  • Experience in creating and maintaining fault-tolerant systems with log coverage, monitoring, and alerting
  • Understanding of the "infrastructure as code" principle and the ability to test it (Ansible, Terraform)
  • Knowledge of the principles of organizing network security (IPsec, WAF, IPS)


Our Tech Stack:

  • Infrastructure: Bare-metal / AWS
  • Databases: ClickHouse / MySQL
  • SCM: Git / GitHub
  • Message broker: Kafka
  • Repository: Nexus
  • CI/CD: Jenkins
  • Monitoring: Icinga 2, Grafana, Prometheus, VictoriaMetrics, ELK
  • Orchestration: k8s, Ansible, Terraform
  • Containers: LXC, Docker
  • Scripting: Python, Golang, Ruby, Groovy
  • OS: Debian/Ubuntu
  • Others: Docker Compose, IPsec


Benefits

  • Opportunity to work & collaborate with a truly global team spread across 5 countries
  • Work from anywhere in the world
  • Choose your own work hours
  • Yearly trip with the Bitquery team to a remote destination
  • A promise to finish the interview process within 1-2 weeks

As a startup, we make decisions and move fairly fast, while giving candidates a great experience during the interview process. We have a flat hierarchy in which we empower individuals and give them the opportunity to deliver results in their own working style. Come and join a great culture and build Bitquery with us.
