Manager, Site Reliability Engineering
San Francisco, CA
Virta is the first company with a clinically proven treatment to safely and sustainably reverse type 2 diabetes and other chronic metabolic diseases without the use of medications or surgery. Our innovations in nutritional biochemistry, data science, and digital tools, combined with our clinical expertise, are shifting the diabetes treatment paradigm from management to reversal. Our mission: to reverse type 2 diabetes in 100 million people by 2025.
Virta is in a phase of rapid growth and we are investing heavily in our GCP-based Kubernetes infrastructure to ensure that we have a solid foundation on which to grow. This role provides a key opportunity to help develop and instill the site reliability practices that will help scale our business to the next level, as well as ensure our patients have continuous access to our life-changing treatment.
As the Manager of Site Reliability Engineering at Virta, you will support Virta’s patients and clinical staff by ensuring Virta’s systems are always available and performant. Your responsibilities will include:
- Build and maintain monitoring systems and processes to ensure product engineers get actionable data for the components they maintain.
- Coordinate with the product teams to enhance the scalability and reliability of our systems through analysis and observability improvements.
- Engage in capacity planning with load testing and auto-scaling strategies.
- Own the incident response process, including developing sustainable practices, capturing learnings, and ensuring blameless postmortems.
- Work across the engineering team to encourage excellence in incident response and build a culture of site reliability engineering.
- Efficiently troubleshoot issues across our systems and software to determine root causes and impact.
90 Day Plan
Within your first 90 days at Virta, we expect you will do the following:
- Build and manage the site reliability engineering team required to tackle these challenges.
- Learn Virta’s system and network architecture to take part in incident response and troubleshooting activities.
- Begin to understand the current site reliability challenges and build a roadmap to drive maturity.
Qualifications
- 6+ years of experience in site reliability or comparable roles working in a modern containerized cloud environment.
- Experience leading a team of site reliability engineers and driving a culture of site reliability across an organization.
- Proficiency in at least one programming language (Python, Go, or Ruby).
- Experience implementing monitoring tools and alerting systems.
- Excellent troubleshooting skills during incident response events.
Virta’s company values drive our culture, so you’ll do well if:
- You put people first and take care of yourself, your peers, and our patients equally
- You have a strong sense of ownership and take initiative while empowering others to do the same
- You prioritize positive impact over busy work
- You have no ego and understand that everyone has something to bring to the table regardless of experience
- You appreciate transparency and promote trust and empowerment through open access to information
- You are evidence-based and prioritize data and science over seniority or dogma
- You take risks and rapidly iterate
As part of your duties at Virta, you may come in contact with sensitive patient information that is governed by HIPAA. Throughout your career at Virta, you will be expected to follow Virta's security and privacy procedures to ensure our patients' information remains strictly confidential. Security and privacy training will be provided.