Site Reliability Engineer Lead

San Francisco

Tempo

Posted 4 weeks ago

Tempo is the next-generation home fitness system, and the first and only strength training solution that can track your motion and use that data to give you a richer, more effective, and safer workout in live and on-demand classes. Using 3D sensors and A.I., Tempo enables expert coaches to correct your form and provide personalized feedback in real time. It captures motion by emitting pulses of infrared light 30 times a second, generating a 3D model of your body made up of 80,000 individual points.
Tempo is thriving, with sales growth of over 300%. Tempo is headquartered in San Francisco, and its all-star team includes alumni from Google, YouTube, Netflix, Airbnb, Pixar, and Orangetheory. The company is backed by $80 million in funding from General Catalyst, Norwest Venture Partners, Founders Fund, Khosla Ventures, DCM, and Signal Fire.
Tempo is seeking a Sr. Site Reliability Engineer to join our Production Engineering team. You will develop and maintain infrastructure tools and services that directly support the Tempo platform every day. This includes tools and libraries used for system automation, and apps that serve our customers. You will ensure that our systems are healthy, monitored, automated, and designed to scale. You will use your engineering background to work closely with our development teams from the early stages of design all the way through identifying and resolving production issues.

What You'll Do

  • Gain deep knowledge of our complex applications.
  • Serve as a primary point of contact responsible for the overall health, performance, and capacity of the Tempo platform and applications.
  • Design, develop, and support tools and libraries as part of Infrastructure Tooling & Automation.
  • Develop automation tools to support growing infrastructure and provide reporting and APIs for various applications.
  • Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale UNIX environment.
  • Troubleshoot and resolve issues with core infrastructure services.
  • Incubate new ideas that can bring operational efficiency and support scaling of services.
  • Lead internal working groups to evaluate, adopt, and deploy new technology.
  • Audit software for potential security and performance problems.
  • Architect and develop configuration management policies.
  • Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth.
  • Work closely with development teams to ensure that platforms are designed with "operability" in mind.
  • Function well in a fast-paced, rapidly-changing environment.
  • Participate in a 24x7 rotation for escalations.

About You:

  • 7+ years of professional software experience in Operations and Reliability Engineering 
  • Preferred: an educational background in Management Information Systems (MIS), Computer Information Systems (CIS), Computer Science (CS), or Mathematics
  • Experience with public cloud platforms such as AWS or GCP, at the application setup level and beyond
  • Experience working with Python, Flask, SQLAlchemy, and other frameworks
  • Experience working at scale with thousands of systems in a DevOps/SRE role
  • Experience with configuration management tools (Terraform, CloudFormation, etc.)
  • Python experience, specifically for systems automation.
  • Familiar with system hardening and server security best practices.
  • Knowledge of most of these: data structures, relational and non-relational databases, networking, Linux internals, filesystems, web architecture, APIs and related topics
  • Expertise in automating system administration tasks with scripting tools (Python or shell preferred).
  • Experience with monitoring and automation tools such as Datadog, Sentry, Splunk, Ansible, and Terraform.
  • Aptitude for analyzing and troubleshooting operating system, networking, configuration and performance problems.
  • Fundamental understanding of Internet networking protocols: TCP/IP, TLS, DNS, HTTP etc.
  • Ability to install, configure, and maintain Linux hosts and popular open source applications such as Nginx and Apache httpd.
  • Strong interpersonal communication skills and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
  • Strong desire to work in a fast-paced, start-up environment with short release cycles

Bonus:

  • Experience with cloud platforms for test execution, such as Sauce Labs
  • Passion for fitness
  • Experience working with remote engineering teams

What We Offer:

  • Competitive compensation package with meaningful equity
  • Collaborative start-up culture with a close-knit, all-star team
  • Comprehensive health benefits and unlimited PTO
  • Health-related perks: team workouts and company fitness lab
  • 401(k) for eligible employees
  • Wellness benefit

Job tags: Ansible Apache AWS CloudFormation GCP Linux Nginx Open source Python Reliability engineering Terraform Unix
Job region(s): North America