Software Engineer, Site Reliability
The Interoperability team at Slack makes using productivity tools in Slack a much more pleasant and valuable experience for all our users. The team evolves to meet the changing demands of Slack’s ever-growing business, and we’re on the lookout for new engineers to join the team.
The Interop SRE group is a small team embedded in the Interoperability (Interop) team that supports the day-to-day operation of services and also advises engineers on making those services more reliable. We are a team based entirely in the India time zone, with some team members working from the Pune office and some working remotely. We work with the underlying infrastructure teams as well as various product engineers to deploy services in Slack’s ecosystem of Kubernetes, Nebula, Envoy and Vitess. We drive the performance and operability of these services, and often act as first consumers of the latest infrastructure technologies at Slack.
Slack has transformed business communication. It’s the leading channel-based messaging platform, used by millions to align their teams, unify their systems, and drive their businesses forward. Only Slack offers a secure, enterprise-grade environment that can scale with the largest companies in the world. It is a new layer of the business technology stack where people can work together more effectively, connect all their other software tools and services, and find the information they need to do their best work. Slack is where work happens.
Ensuring a diverse and inclusive workplace where we learn from each other is core to Slack’s values. We welcome people of different backgrounds, experiences, abilities and perspectives. We are an equal opportunity employer and a pleasant and supportive place to work.
Come do the best work of your life here at Slack.
About the Role
What you will be doing
- Working on impactful projects within a small, agile team to advance Interop’s Calendar, Email, Calls, Files and other Productivity services.
- Developing on Slack’s underlying Cloud (AWS, Kubernetes, Terraform, Chef), Logging (ELK), Monitoring (Prometheus+Grafana) and other associated services.
- Participating in an on-call rotation for the services the team owns, triaging and addressing both production and development issues.
- Engaging with Slack’s engineering community to identify potential areas of improvement or pain points and making Slack’s systems safer and more pleasant to operate.
What you should have
- Curiosity about how things work and a love of sharing that knowledge with others
- Familiarity with software engineering practices including unit testing, code reviews and design documentation
- Ability to build tooling, automation and/or services in one or multiple languages (e.g. Go, Bash, Java)
- An interest in building internal products with other engineers as your customers
- Experience deploying and operating services in a Cloud environment
- An understanding of fundamental Linux concepts
You probably tick some of these boxes:
- At least 3 years of professional experience in software engineering, working in a team environment
- Comfort working in one or more programming languages -- e.g., Python, Go, Ruby, or others
- About 2 years of experience deploying, operating and debugging server software on Linux
- Experience building and releasing software using Docker containers
- Experience with Kubernetes
- Experience with a large cloud provider (AWS preferred)
- Understanding of deployment automation/configuration management tools -- e.g., Chef, Terraform, Ansible, CloudFormation or others