Infrastructure Engineer, Observability
Build a more reliable Stripe.
Stripe’s infrastructure powers businesses all over the world. We process payments, run marketplaces, detect fraud, help entrepreneurs start an internet business from anywhere in the world, build world-class developer-friendly APIs, and more. If you’re an infrastructure engineer here, you’ll get to build the systems that power our products.
The success of every single API request we process is critical to everyone involved! We can’t go down because our users’ businesses depend on us.
You’ll be on a team that maintains a product we provide to the rest of engineering, like storage or message queueing. You’ll make decisions with a significant impact on Stripe. There is a lot of work to do to make Stripe engineers’ work easier and our platform even more reliable than it is today, and we’d love for you to be part of it. We’re close to the people using our systems, so we constantly get feedback that we can use to make them better. The team will help all of engineering, from the CTO to our interns, by identifying, building and automating the practices, processes and software that the whole organization relies on to improve reliability.
You’ll work with other infrastructure engineers as well as product engineers who use the systems you’re building.
We’re looking for people with a strong background (or interest!) in systems. We’d love to hear from you whether you’re a seasoned systems developer or have just learned you might like working with systems.
Many of our infrastructure engineers work remotely, and we’d be happy to talk to you about the possibility of working remotely.
You will:
- Design, build and maintain the core interfaces and infrastructure used by all of Stripe’s engineering teams
- Debug issues across services and levels of the stack
- Scale the observability infrastructure to support hundreds of terabytes of logs and hundreds of billions of metric data points daily
- Build a great customer experience for people using your infrastructure
We’re looking for someone who:
- Thinks about systems: their edge cases, failure modes and life cycles
- Is comfortable operating infrastructure systems at scale
- Can debug complex problems across the whole stack
- Focuses on the needs of their users
- Can write high-quality code in at least one programming language (e.g. Ruby, Scala, Go)
- Has worked with data pipelines that move large data sets quickly
- Has managed an on-premises logging installation (e.g. Splunk, ELK) or time series metric database (e.g. Prometheus, InfluxDB, M3DB)
- Is familiar with Splunk Apps and with building efficient Splunk dashboards and saved searches
Projects you could work on:
We have a ton of important work to do, which is why we’re hiring! Our projects are of course changing all the time, but here are a few, some current and some from the past, to give you an idea of the kind of work we do. Technologies we use include HAProxy, nginx, Consul, Jenkins, SignalFx, StatsD, Kafka, RabbitMQ, Storm and many others.
- Plan and implement multi-region availability for our distributed job queuing infrastructure! All of our systems can sustain losing machines, and making our systems even more resistant to failure is a big theme for us. If you like thinking about distributed systems, you might find a good home here!
- Write easy-to-use, reliable client libraries for our Kafka or database systems. You’ll design abstractions over complex systems and provide sensible defaults for timeouts and error handling.
- Migrate us to a new region with no downtime.
- Build fantastic code review tools! If you love helping developers be more effective at their jobs, we have a ton of interesting projects in this area. Related projects: you could help us make our builds more reproducible with Bazel and build great developer environments.
- We have a bunch of projects around deploying and running code: help us instantly roll back bad deploys so that we can recover quickly, and build infrastructure that lets us scale up our API workers in seconds in response to high API load.
- We need to scale our databases to handle 10x the load they can today. You could help us shard them more effectively, upgrade our database engines, and build great tools for developers so they can understand their slow queries more easily. A lot of our database projects are open source.
What’s it like to work at Stripe?
Stripe is helping the internet fulfill its potential as a platform for economic progress by building software tools that accelerate global economic access and technological development. Stripe makes it easy to start, run and scale an internet business from anywhere in the world.
Stripe is, at its heart, an engineering company. To provide a missing pillar of core internet infrastructure, we hire people with a broad set of technical skills (and from a wide variety of backgrounds) who are ready to take on some of the most challenging problems in the industry: from reliably handling 100M API requests per day, to building adaptive machine learning on top of years of data science and infrastructure work, to enabling entrepreneurs worldwide to start global internet businesses.
We look at Stripe as a constant work in progress and the same is true of our people; for all of us, we believe the best is yet to come. We’re here to support each other in our curiosity and creativity – which we pursue through thoughtful discussion and knowledge-sharing among a diverse set of peers and colleagues.
We encourage all engineers to switch teams roughly every year and a half and to take on short-term projects with other teams across Stripe. This helps engineers learn how different parts of Stripe work while building stronger ties and cross-pollination between groups.
We contribute to existing open source projects and support the people working on them, and we release several of our own tools as open source.
We want to work in a company of warm, inclusive people who treat their colleagues exceptionally well: the kind of people who go out of their way to help other Stripes in the short term and push them to grow over the long term by helping them get better at what they do.
We’re a highly cross-functional organization and view that as part of the fun: we design our space to encourage as much collaboration as possible. We have long tables in the kitchen so that everyone can meet new people and learn from them. We also have a culture of transparency that we carry through to email communication, ensuring that Stripes all around the world have the information they need to make good local decisions.
In both our products and our people, we aim to reflect, represent and advocate for all of our users, globally. Our users transcend geography, culture and language; what we share, collectively, is a drive to create a fairer, more economically interconnected world.