Site Reliability Engineer: Database & Distributed Systems

San Francisco

Applications have closed
BitMEX

Posted 4 months ago

The Site Reliability Engineer: Database & Distributed Systems is responsible for full life cycle support of all databases. The database engineer works within an agile/DevOps culture in an environment of rapid growth, addresses and resolves database-related issues and requirements as they emerge, and collaborates with the Architecture team on design and development activities, leveraging infrastructure-as-code best practices. The role monitors the availability, stability, performance, and overall health of the database environment(s) and works closely with development and customer support. It primarily supports database & distributed systems initiatives in the West Coast office and may participate in strategic initiatives across the organization.

Key Responsibilities

  • Improves resiliency & reliability of the production databases
  • Participates in application development projects and is responsible for the associated database & distributed systems architecture and design
  • Participates in SQL code review (to ensure queries are optimized and tuned to perform efficiently prior to production release)
  • Monitoring and uptime of production databases
  • Proactive remediation of database operational problems
  • Proactive development and improvement of procedures for automated monitoring, early intervention, and remediation of problems related to database availability, stability, and data integrity
  • Database deployments and modifications in support of application development activities
  • Database capacity planning (storage, load, etc.)
  • Database backup and recovery
  • Performs query tuning and preventative maintenance
  • Implements process automation for improved efficiencies
  • Supports complex web-based financial applications

Qualifications

  • Bachelor's degree in Computer Science or equivalent required; Master's degree in Computer Science or equivalent preferred
  • 5-7 years of relevant experience, with at least 4 years of experience supporting production-critical workloads on PostgreSQL
  • 3 years of Docker experience
  • PostgreSQL experience with version 11.x
  • Proven experience with other database and/or distributed platforms such as Cassandra, Kafka, etc.
  • Strong AWS knowledge and experience with RDS / EC2 / Terraform
  • Strong Linux or UNIX knowledge
  • Experience working with offshore support teams
  • Experience with database architecture, logical and physical design, installations, catalog navigation, monitoring and tuning (system, db, resource contention), backup and recovery, replication, HA/DR
  • Experience with automation, documentation, shell scripting, PL/SQL programming, query tuning, system tuning, resource contention analysis, backup and recovery, standby, replication, etc.
  • Experience with change management in DevOps environments
  • Familiarity with scripting (Bash, Python, etc.)
  • Familiarity with or knowledge of Terraform (or similar product)
  • Strong collaboration, analytical, verbal and written communication skills
  • S. in Computer Science, Engineering, Mathematics or equivalent work experience
  • Technical certifications for DBMS platforms, AWS, or Linux/Unix a plus
  • Utilizes sound decision-making skills and communicates well with other team members and business users. Identifies problems and recommends solutions. Performance will be measured by their ability to deliver quality applications on time.
  • Works in a team environment, including cross-functional teams and teams with business users throughout the company. Interacts with all levels of management and staff across the organization.


Job tags: AWS Bash Docker EC2 Kafka Linux PostgreSQL Python SQL Terraform Unix