Sr. Site Reliability Engineer, Business Technology
Remote, United States
Slack is looking for Site Reliability Engineers to design and build solutions focused on delivering a first-in-class experience for our employees. The Business Technology Engineering team builds, runs, and supports the core infrastructure that enables Slack employees to have a simple, pleasant, and productive experience at work.
This role will primarily focus on Slack’s internal cloud infrastructure systems, covering everything from building scalable user access processes to providing on-call support for critical production environments.
The ideal candidate for this role will blend their knowledge of systems administration and cloud computing to build, maintain, and support internal applications and services for Slack employees, such as identity infrastructure, monitoring and alerting systems, and data ETL services.
You are a hardworking self-starter with an ability to navigate a constantly evolving landscape. You enjoy and excel at crafting innovative and scalable infrastructure as well as collaborating across functions. Technical expertise and a real passion for innovation are crucial, but you’ll balance that with a desire to improve the user experience for the products and services we deliver to the Slack community.
Slack has a positive, diverse, and encouraging culture. We look for people who are curious and creative, and who work to be a little better every single day. In our work together we seek to be smart, humble, hardworking, and, above all, collaborative!
What you will be doing:
- Work with the Infrastructure Engineering team to help plan, build, and maintain systems and services running within Slack’s Business Technology cloud environments, including Heroku, Amazon Web Services, and Azure.
- Provide on-call support, critical issue remediation, and incident response for core infrastructure services.
- Act as a senior systems administrator for a variety of corporate applications running on Ubuntu and CentOS virtual machines.
- Collaborate with cross-functional teams to deploy services to cloud environments. Formulate requirements and select the appropriate technology and platform (e.g., low-code integration, Platform-as-a-Service, or Infrastructure-as-a-Service). Provide support to ensure the solution is deployed securely and efficiently.
- Build automation to reduce workload for routine tasks and assist with security and audit requirements.
- Spec, deploy, and maintain internal tooling such as centralized logging, monitoring/alerting, and configuration management.
What you should have:
- Extensive experience running and maintaining infrastructure on a variety of cloud providers, particularly Amazon Web Services.
- Strong familiarity with Chef, Terraform, Linux system administration, and integrating with RESTful API web services.
- Familiarity with running and maintaining Tomcat, Python/Django, and PHP web applications.
- Experience with CI/CD tooling such as Jenkins.
- An appreciation for the importance of documentation and a track record of writing clear and concise specifications and runbooks.
- Experience transforming environments to “Infrastructure as Code” and templating and automating routine work.
- Experience setting up and deploying redundant logging, monitoring, and alerting systems, such as Prometheus and Grafana, for critical infrastructure.
Slack is registered as an employer in many, but not all, states. If you are not located in or able to work from a state where Slack is registered, you will not be eligible for employment. Visa sponsorship may not be available in certain remote locations.
Visa sponsorship is not available for candidates living outside the country of this position.