Site Reliability Engineer

San Francisco

Applications have closed

Posted 4 months ago

The BitMEX Infrastructure team sits at the core of the business and is responsible for the reliability and scalability of all the services that power the platform and its developers. In only a few years, BitMEX has become the leading crypto-products trading platform worldwide, and it now handles tens of thousands of low-latency transactions per second, representing several billion dollars traded every day. We specialize in systems, whether that means networking, the Linux kernel, or a more specific interest in scaling, algorithms, or distributed systems.


  • Be on a Pager rotation to respond to BitMEX availability incidents and provide support for service engineers with customer incidents.
  • Run our infrastructure with Chef, Terraform and Kubernetes.
  • Make monitoring and alerting fire on symptoms and not on outages.
  • Document every action so findings turn into repeatable actions, and then automation.
  • Improve the deployment process to make it as boring as possible.
  • Design, build and maintain core infrastructure pieces that allow BitMEX to scale to support hundreds of thousands of concurrent users.
  • Debug production issues across services and levels of the stack.
  • Plan the growth of BitMEX’s infrastructure.

About You: 

  • Think about systems - edge cases, failure modes, behaviors, specific implementations.
  • Have experience with Nginx, HAProxy, Docker, Kubernetes, Terraform, or similar technologies
  • 6+ years of professional experience, with a proven track record of designing, implementing, managing, and testing infrastructure at scale on AWS for high-value environments
  • Strong engineering skill set with a firm grasp of fundamental Computer Science principles and a modular, maintainable, agile & test-driven approach to software development
  • Capacity to multitask and give equal attention to a variety of functions while under pressure
  • Strong technical troubleshooting, diagnostic and problem-solving skills
  • Ability to adapt to changing priorities within a fast-moving industry and startup culture
  • A Bachelor’s degree or equivalent work experience preferred 
Job tags: AWS, Chef, Docker, Kubernetes, Linux, Nginx, Terraform