Senior Manager, Site Reliability Engineering

Remote, United States

Applications have closed
Wayfair Inc.

Posted 3 months ago

Wayfair is a leader in the e-commerce space for all things home. We live and breathe modern technologies. We are a “move fast, break things, rethink old standards” team with a startup feel, working on platforms at massive scale.

We’re looking for smart, driven and passionate engineering leaders who build and cultivate high-caliber engineering teams responsible for building and supporting performant, scalable platform architectures in the Logging and Metrics DevOps disciplines.

The Logging and Metrics platforms at Wayfair are complex distributed systems and data pipelines built mainly on the Elastic Stack (formerly ELK), InfluxDB and Apache Kafka. We collect 50+ billion log and metrics events per day, generated by 20,000+ systems and 500+ homegrown applications across multiple geographic locales and GCP regions. We also support searches against these datasets for engineering functions such as monitoring, alerting, observability, high-velocity software development and security incident and event management.

What You’ll Do:

  • Manage mission-critical platforms as a service, built for rapid growth and scale, that enable a global developer community of 3,000 to write and deploy code multiple times per day to our complex, distributed e-commerce platform
  • Develop monitoring and define SLAs, SLOs and error budgets for these platforms, while helping coordinate product launches and reliability exercises
  • Direct the platform team's participation in software development activities and API development supporting application use of the platform throughout the company
  • Help determine the future roadmap of platforms and services in the logging, metrics, observability and security incident and event management disciplines
  • Identify resource gaps and lead architecture discussions within the organization
  • Create and maintain detailed documentation for both self-service and onboarding
  • Help build and grow our team by mentoring junior engineers, nurturing and developing their skills, and assisting them on a variety of projects

What You’ll Need:

  • BA/BS degree from a 4-year college or university
  • 5+ years managing, growing and mentoring high-performing engineering teams
  • A good handle on Agile methodologies such as Scrum and Kanban
  • 2+ years of hands-on experience with technologies like Elastic Stack (ELK Stack) and Kafka
  • 2+ years working with configuration management tools such as Puppet, Chef, Ansible
  • Experience in one or more programming languages used in the infrastructure space (Python, Go, Bash, etc.), as well as familiarity with version control systems such as Git
  • Knowledge of modern DevOps practices (CI/CD, microservices, containers)
  • 2+ years of hands-on experience with IoT/sensor data technologies like InfluxDB
  • Experience working with public cloud platforms such as GCP or AWS and infrastructure-as-code frameworks like Terraform


About Wayfair:

Wayfair is one of the world’s largest online destinations for the home. Whether you work in our global headquarters in Boston or Berlin, or in our warehouses or offices throughout the world, we’re reinventing the way people shop for their homes. Through our commitment to industry-leading technology and creative problem-solving, we are confident that Wayfair will be home to the most rewarding work of your career. If you’re looking for rapid growth, constant learning, and dynamic challenges, then you’ll find that amazing career opportunities are knocking.

No matter who you are, Wayfair is a place you can call home. We’re a community of innovators, risk-takers, and trailblazers who celebrate our differences, and know that our unique perspectives make us stronger, smarter, and well-positioned for success. We value and rely on the collective voices of our employees, customers, community, and suppliers to help guide us as we build a better Wayfair – and world – for all. Every voice, every perspective matters. That’s why we’re proud to be an equal opportunity employer. We do not discriminate on the basis of race, color, ethnicity, ancestry, religion, sex, national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, gender expression, veteran status, or genetic information.

Job tags: Ansible Apache AWS Bash CD Chef CI ELK GCP Git Golang Kafka Puppet Python Reliability engineering Terraform