Data Infrastructure Engineer, Reliability Tooling

Seattle or Remote, North America

Full Time

Posted 4 weeks ago

Build a reliability platform powering economic growth 

Stripe powers businesses all over the world. We process payments, run marketplaces, detect fraud, help entrepreneurs start an internet business from anywhere in the world, build world-class developer-friendly APIs, and more. If you’re a software engineer here, you’ll get to build the systems that power our products and enrich our customers’ experiences.

Stripe doesn’t process quite as many requests as Twitter or Facebook, but we care a great deal about reliability. Every request we process matters to everyone involved, and we handle large volumes of financial transactions worldwide. We can’t go down, because our users’ businesses depend on us.

You’ll be on a team that builds products and tools for other teams at Stripe, such as reliability metrics and incident communication tools. You’ll make decisions with a significant impact on Stripe. There is a lot of work to do to make Stripe engineers’ work easier and our platform even more reliable than it is today, and we’d love for you to be part of it. This team plays an important role in increasing users’ confidence in Stripe. We’re close to the people using our systems, so we constantly get feedback that we can use to make them better. Reliability is a fast-growing organization, where you’ll work with all engineering teams at Stripe to help them build confidence that their offerings operate reliably and will continue to do so.

We’re looking for people with a strong background (or interest!) in developing new data solutions and in driving new processes. We’d love to hear from you whether you’re a seasoned data engineer, a software developer, or someone who has just learned you might like to solve big data problems. Many of our software engineers work remotely, and we’d be happy to talk to you about the possibility of working remotely.

You will:

  • Design, build, and maintain the core reliability products used by all Stripes
  • Design, build, and maintain the reliability-related metrics data and reporting platform used both internally and, when appropriate, externally
  • Own tools and processes for Stripe’s incident program, including facilitating remediation, tracking resolution, streamlining communications, and reporting on an incident’s final impact
  • Provide guidance and recommendations for other Stripes on how to improve their teams’ reliability
  • Work closely with program managers and business partners to turn feedback into customer-centric features

We’re looking for someone who:

  • Has experience with big data architecture and solutions
  • Is passionate about data-driven decision making and analytics
  • Thinks about systems, including their edge cases, failure modes, and lifecycles
  • Knows their way around a Unix shell
  • Can debug complex problems across the whole stack
  • Focuses on the needs of our users, both internal and external
  • Holds themselves and others to a high bar when working with production
Job tags: Go Unix
Job region(s): North America Remote/Anywhere