Software Engineer, Performance Infrastructure

Remote, United States

Applications have closed
Slack
Where work happens

Posted 1 month ago

Build the infrastructure powering work

Slack enables people all over the world to communicate and collaborate. Teams of all scales, from the world’s largest public companies to the smallest of startups, use Slack to get work done, so we take performance and reliability very seriously. A taste of our scale:

  • The average user spends over 10 hours connected and 2.5 hours active in Slack every single day
  • 1.5 billion messages are sent per month, half of those outside the United States
  • Every day we see over 6 million simultaneously connected users, over a billion web requests, and tens of billions of database queries

For millions of people, Slack is the primary communication tool they use at work all day long. They expect it to be exceptionally reliable and fast, all the time.

Infrastructure at Slack

We operate at tremendous scale with systems that process millions of events per second. Our team maintains and builds the lower levels of our stack, including:

  • Edge services
  • Data stores and caches
  • Real-time messaging
  • Asynchronous background job processing

We know we’ve done our job correctly when none of our users think about us. We don’t typically ship new user-facing features, but rather ensure our systems are incredibly performant, highly available, reliable, and scalable. In other words, Slack just works seamlessly.

Slack's API and web backend are built on PHP/Hack, and our backend services are written in Java and Go. Our data infrastructure is built on Kafka, Hadoop, Hive, Presto, Spark, and MySQL/Vitess. Our former Chief Architect, Keith Adams, spoke about our architecture at QCon in 2016. Bing Wei and Michael Demmer, Backend Engineers on our team, spoke about Flannel (our homegrown application-aware cache) at QCon in 2017 and about Vitess/MySQL at Percona Live in 2017, respectively.

Our Team

The performance infrastructure team is small but mighty. We strive to help engineers identify critical performance problems early and build an inclusive performance-minded culture. We build tooling that enables anyone at Slack to spot bottlenecks quickly, including safe and reliable load testing systems and pre-merge performance regression monitoring. (One of our engineers gave a talk about one of these tools at Strange Loop in 2019.)

We iterate rapidly and work closely with other teams in engineering to ensure Slack stays snappy for teams of all sizes. We are ambitious, pragmatic team players. If you want to work on a collaborative team solving big scaling and performance problems, then look no further!

What you will be doing

  • Design, build, ship and maintain tooling that enables load testing of Slack’s core systems.
  • Collaborate with peers across Engineering to triage bugs and troubleshoot complex production issues across the stack, especially with respect to performance.
  • Whiteboard a fix to a scaling problem — and then make it happen!
  • Write, review, or provide feedback on a technical design proposal.
  • Work with engineers on projects such as Flannel, scaling the job queue, reducing Slack’s memory footprint, and scaling the MySQL/Vitess data tier.

What you should have

  • You’ve been building large scale systems professionally for 2+ years and can point to things you’ve worked on.
  • You possess strong Computer Science fundamentals: data structures, algorithms, programming languages, operating systems, distributed systems, and information retrieval.
  • You are a very strong communicator. You’re excited to explain complex technical concepts and share your knowledge with different audiences.
  • You have experience building reliable and safe distributed systems and understand the trade-offs made when engineering a feature.
  • You know how the web works, are thoughtful about data architecture and MySQL/datastore performance tuning, and know what a good API looks like.
  • You can jump into situations with few guardrails and make things better.
  • You write code that can be easily understood by others with an eye towards clarity and maintainability.
  • You are curious how things work; when they break you are eager and able to help fix them.
  • You have a Bachelor's degree in Computer Science, Engineering or related field, or equivalent training, fellowship, or work experience.

Slack has a positive, diverse, and supportive culture—we look for people who are curious, inventive, and work to be a little better every single day. In our work together we aim to be smart, humble, hardworking and, above all, collaborative. If this sounds like a good fit for you, why not say hello?

Slack is registered as an employer in many, but not all, States. If you are not located in or able to work from a State where Slack is registered, you will not be eligible for employment. Visa sponsorship may not be available in certain remote locations.

Visa sponsorship is not available for candidates living outside the country of this position.

 

Job tags: Go Hadoop HTML Java Kafka MySQL PHP Spark
Job region(s): North America Remote/Anywhere