Staff Software Engineer, Site Reliability

Dublin, Ireland

Full Time Senior-level / Expert
Slack
Where work happens

Build the infrastructure powering work.

Our Team
Slack's Datastores team builds and operates the database platform powering Slack. We write software to manage thousands of stateful hosts, providing several petabytes of online database capacity. We are building one of the fastest-growing database platforms in the world. Our MySQL databases run in Vitess; you can read more about our migration in our blog post "Scaling Datastores at Slack with Vitess".

Background
Slack enables people all over the world to communicate and collaborate. Teams of all sizes, from the world’s largest public companies to the smallest startups, use Slack to get work done, so we take performance and reliability very seriously. A taste of our scale:
  • The average user spends over 10 hours connected and 2.5 hours active in Slack every single day
  • 1.5 billion+ messages are sent per month, half of those outside the United States
  • Every day, we see more than 10M daily active users, over a billion web requests, and tens of billions of database queries.
For millions of people, Slack is the primary communication tool they use at work all day long. They expect it to be exceptionally reliable and fast, all the time.

Infrastructure at Slack
We operate at tremendous scale with systems that process millions of events per second. Our team maintains and builds the lower levels of our stack, including:
  • Edge services
  • Datastores and caches
  • Real-time messaging
  • Asynchronous background job processing
We know we’ve done our job correctly when none of our users think about us. We don’t typically ship new user-facing features, but rather ensure our systems are incredibly performant, highly available, reliable, and scalable. In other words, Slack just works seamlessly.
Slack's API and web backend are built on PHP/Hack, our backend services are written in Java and Go, and our MySQL databases run on Vitess. Our architecture is constantly evolving to handle millions more users. You can read about how we scaled our datastores with Vitess, how we respond to incidents, and much more on our blog.

If you were to join Slack, here are the types of things you would do over the course of a typical week:
  • Operate and enhance our large, highly available database infrastructure, using technologies such as MySQL and Vitess.
  • Develop tools to enable self-service and self-managing capabilities of our database infrastructure so that other teams can operate full-stack while rapidly building new features for our customers.
  • Collaborate with engineering teams on their database storage needs, and advise them throughout the development lifecycle.
  • Write code to capture database performance metrics, and build tools and dashboards that turn that data into actionable insight.
  • Participate in our on-call rotation and collaborate with our operations team to triage and resolve production issues.
  • Support FedRAMP and DoD SRG activities.
You may be a fit for this role if you:
  • Have 7-10+ years of experience on database, Site Reliability Engineering, or infrastructure-owning teams, with increasing responsibility.
  • Have professional experience using Python, Ruby, Go, or Java.
  • Have operated at least one distributed data storage system at scale and in a team environment, for example a relational database like MySQL, a search engine like Solr, or a streaming message bus like Kafka.
  • Have deployed server software on Linux and operated it at scale: you’ve debugged its problems and analyzed and optimized its performance.
  • Are familiar with deployment automation/configuration management tools like Chef, Ansible, Puppet, or Terraform.
  • Have experience operating cloud infrastructure, especially AWS.
  • Are a very strong communicator. You’re excited to explain complex technical concepts and share your knowledge with different audiences.
  • Write code that can be easily understood by others with an eye towards clarity and maintainability.
  • Are curious how things work; when they break you are eager and able to help fix them.
 
Infrastructure is a diverse and inclusive team that treats its colleagues exceptionally well. We are happy to help you learn what you need to know; we encourage and support each other’s growth, so we don’t expect you to have expertise across all of these areas.
 
Come join us!

Job region(s): Europe