Senior Site Reliability Engineer
You will be part of the Engineering team at ServiceTitan to help improve our products and build new ones. We provide exciting opportunities for engineers to come in and have a huge impact on a rapidly growing startup. We build for perfection, use the most modern tools on the Microsoft .NET platform, have an amazing culture, and love to solve complex problems.
At ServiceTitan, the SRE team engages with the entire lifecycle of software development, from ideation to operating predictably at scale. As a Senior SRE at ServiceTitan, you will identify and build software to improve uptime, performance, and the overall customer experience. You will collaborate with architects and software engineers to deliver a highly available and highly automated infrastructure.
As our Senior Site Reliability Engineer, you will:
- Design, develop, and deliver the software engineering solutions needed to manage Azure cloud environments and minimize failed customer interactions.
- Own reliability, availability, and performance of ServiceTitan’s SaaS.
- Proactively monitor, measure, and improve all areas of infrastructure and operations.
- Increase efficiencies through automation, service delivery, and process improvements.
To be successful in this role, you'll need:
- 4 years of programming experience in Python; additional experience with PowerShell, Bash, and Golang is a plus.
- Experience leveraging cloud architecture and applying site reliability principles.
- Experience directly managing cloud infrastructure in AWS, Azure, or GCP.
- Experience designing and maintaining production services in Kubernetes environments.
- Experience with SQL and NoSQL databases.
- Experience with TeamCity CI/CD.
- BA/BS in Computer Science, Computer Engineering, or a related technical discipline, or equivalent industry experience.
- Ability to craft beautiful infrastructure-as-code solutions.
- Demonstrated sensitivity to operational concerns.
- Demonstrated ability to debug code and troubleshoot outages.
- Full-stack troubleshooting skills across all software and hardware layers.
- Superb communication skills, both written and verbal.
- Passionate about solving complex infrastructure challenges.
- Excited about delivering a reliable, high-quality product.
- Highly motivated, smart, independent person who thrives in a fast-paced, innovative environment.
- Intensely eager to meet the needs of our customers and deliver best-of-breed SaaS solutions.
- Experience using telemetry to understand throughput, limitations, and constraints in a service.
- Understanding of architectural patterns to improve uptime.
- Ability to monitor and improve site stability.
- Passion for system, application, and business metrics.