Manager, Site Reliability Engineering

San Francisco, CA

Virta Health

Posted 3 weeks ago

Virta is the first company with a clinically proven treatment to safely and sustainably reverse type 2 diabetes and other chronic metabolic diseases without the use of medications or surgery. Our innovations in nutritional biochemistry, data science, and digital tools, combined with our clinical expertise, are shifting the diabetes treatment paradigm from management to reversal. Our mission: to reverse type 2 diabetes in 100 million people by 2025.

Virta is in a phase of rapid growth, and we are investing heavily in our GCP-based Kubernetes infrastructure to ensure that we have a solid foundation on which to grow. This role is a key opportunity to develop and instill the site reliability practices that will help scale our business to the next level and ensure our patients have continuous access to our life-changing treatment.


As the Manager of Site Reliability Engineering at Virta, you will support Virta’s patients and clinical staff by ensuring Virta’s systems are always available and performant. Your responsibilities will include:

    • Build and maintain monitoring systems and processes to ensure product engineers get actionable data for the components they maintain.
    • Coordinate with the product teams to enhance the scalability and reliability of our systems through analysis and observability improvements.
    • Engage in capacity planning with load testing and auto-scaling strategies.
    • Own the incident response process, including developing sustainable practices, capturing learnings, and ensuring blameless postmortems.
    • Work across the engineering team to encourage excellence in incident response and build a culture of site reliability engineering.
    • Efficiently troubleshoot issues across our systems and software to determine root causes and impact.

90-Day Plan

Within your first 90 days at Virta, we expect you will do the following:

  • Build and manage the site reliability engineering team required to tackle these challenges.
  • Learn Virta’s system and network architecture to take part in incident response and troubleshooting activities.
  • Begin to understand the current site reliability challenges and build a roadmap to drive maturity.


  • 6+ years of experience in site reliability or comparable roles working in a modern containerized cloud environment.
  • Experience leading a team of site reliability engineers and driving a culture of site reliability across an organization.
  • Proficiency in at least one language (Python, Go, Ruby).
  • Experience implementing monitoring tools and alerting systems.
  • Excellent troubleshooting skills during incident response events.

Values-driven culture

Virta’s company values drive our culture, so you’ll do well if:

  • You put people first and take care of yourself, your peers, and our patients equally
  • You have a strong sense of ownership and take initiative while empowering others to do the same
  • You prioritize positive impact over busy work
  • You have no ego and understand that everyone has something to bring to the table regardless of experience
  • You appreciate transparency and promote trust and empowerment through open access to information
  • You are evidence-based and prioritize data and science over seniority or dogma
  • You take risks and rapidly iterate

As part of your duties at Virta, you may come in contact with sensitive patient information that is governed by HIPAA. Throughout your career at Virta, you will be expected to follow Virta's security and privacy procedures to ensure our patients' information remains strictly confidential. Security and privacy training will be provided.

Job tags: GCP Go Kubernetes Python Reliability engineering Ruby
Job region(s): North America