Site Reliability Engineer (US - Remote)

Sysdig is the secure DevOps company, and we’re at the forefront of the container, Kubernetes, and cloud revolution. We are passionate, technical problem-solvers, continually innovating and delivering powerful solutions to confidently run cloud-native applications. Our consistent contributions to open source software projects reflect our commitment to the open cloud movement.

We value diversity and open dialog to spur ideas, working closely together to achieve goals. And we're a great place to work, too: we were awarded the 2021 Bay Area Best Places to Work Award from the San Francisco Business Times and the Silicon Valley Business Journal. We are looking for team members who share our commitment to customers and are willing to dig deeper, understand problems, and deliver innovative solutions. Does this sound like the right place for you?

As a Site Reliability Engineer, you’ll be responsible for the availability, performance, and resilience of the Sysdig platform in our largest on-premises customer environments. You will collaborate with high-performing infrastructure and engineering teams, both within Sysdig and in customer organizations, to help drive the scalability and stability of our platform.

Your Responsibilities

  • Participate in a globally distributed team of Site Reliability Engineers, supporting multiple Sysdig applications across our most critical on-premises customers.
  • Produce best-practice recommendations for on-premises customers to improve customer experiences.
  • Implement disaster recovery and reliability improvement initiatives, including performance tuning and infrastructure optimization.
  • Maintain and support the production environments and communicate directly with customer stakeholders.
  • Participate in an on-call rotation.

Your Background

  • Required experience includes:
    • Deploying Kubernetes workloads in a production environment
    • Diagnosing and troubleshooting customer-facing production service outages
    • Writing applications or automation in Python, Golang, or Bash
    • Using version control tools such as Git/GitHub
  • Hands-on experience managing at least one of the following cluster types is a must:
    • Cassandra, Elasticsearch, Kafka/ZooKeeper, PostgreSQL
  • Knowledge of Helm, Terraform, Prometheus, and Grafana is preferred.
  • Knowledge of Kubernetes Operators is a big plus.
  • Strong sense of ownership and a focus on customer delight.
  • Strong analytical and written communication skills.
  • Ability to work independently and as part of a team.

Key Technologies
Kubernetes, Golang, Python, Cassandra, Kafka, Elasticsearch, PostgreSQL, Terraform, Helm

Why work at Sysdig?

  • We’re a well-funded startup that already has a large enterprise customer base
  • We have a pragmatic, approachable culture, from the CEO down
  • We have an organizational focus on delivering value to customers
  • Our open source tools are widely used and loved by technologists and developers

When you join Sysdig, you can expect:

  • Competitive compensation package
  • Top-notch health insurance coverage

Additionally, we offer a variety of benefits and perks, such as:

  • 401(k) with company matching up to 3%
  • Flexible vacation policy
Job region(s): Remote/Anywhere North America