Site Reliability Engineer - Platform and Operations
At Tripadvisor, we believe there’s good out there. We understand that travel brings out the best in us, while lifting deserving businesses and strengthening the communities we meet along the way. We know that as the world’s largest travel site, we’re here to inspire and empower people to explore our world with confidence — and share their trip experiences with others. And every day, we see our ideas come to life in work that enables travelers to discover all the good the world has to offer. As part of the Tripadvisor team, you’ll have the opportunity to do the same.
The Service Platform team builds foundational services, tools and patterns used by the rest of the engineering organization to deploy services reliably and predictably. We manage the underlying infrastructure in collaboration with our TechOps team to provide an abstraction layer over on-premises and cloud resources. We train other engineering teams in techniques used to debug their distributed systems, optimize service performance and tolerate failures.
As a Site Reliability Engineer on the Service Platform team, you will act as a force multiplier for our engineering teams to optimize the development and operational experience of our microservices architecture.
What You’ll Do:
- Build and maintain the services that are central to our microservices architecture and to the scalability and observability of the ecosystem
- Build and maintain a CI/CD pipeline to manage the deployment of Kubernetes-based microservices
- Use and extend tooling and monitoring to surface bottlenecks and pre-empt failures
- Leverage technologies like Java, Go, Kubernetes, Ansible, Terraform, Docker, PostgreSQL, Cassandra and Redis to provide a self-service platform for other engineering teams to use
- Participate in periodic on-call duties and ensure that incident root causes are identified, debugged and resolved to prevent recurrence
- Evangelize SRE best practices
Who You Are:
- BA or BSc degree in Computer Science or a related field, or equivalent experience
- Experience building and operating services in a distributed environment
- Experience working on an engineering team building software, preferably microservices
- Experience managing systems both in AWS and on-premises is preferred
- Experience administering a database in a production environment, preferably PostgreSQL and/or Cassandra
- Strong knowledge of UNIX and TCP/IP network fundamentals
- Experience with monitoring, metrics, and visualization tools (Icinga, Graphite, Prometheus, ELK, etc.)