Senior Site Reliability Engineer
Remote - US
Posted 4 months ago
As a Senior Site Reliability Engineer, you will be part of the Tanium Cloud Engineering team. We focus on solving cloud operations problems and keeping our services online. We are looking for individuals who are as passionate about troubleshooting issues with distributed systems as they are about automating, coding and collaborating to solve problems. Here you’ll be responsible for identifying, troubleshooting and reporting platform problems to product engineers (or fixing the code yourself) to ensure that we provide a stable and reliable service.
What you’ll do:
- You will report and solve problems within the Tanium infrastructure services and collaborate on issues with product engineers.
- You will participate in SRE software engineering, writing code to continually reduce human intervention in operational tasks and to automate processes.
- You will monitor the Tanium Cloud platform and cloud infrastructure, responding to incidents, correcting and improving systems to prevent incidents and planning capacity.
- You will manage cloud provider infrastructure, system deployments and product releases.
- You will be involved in resolving Tanium Cloud customer support issues.
- You will demonstrate and promote best practices for teams using cloud platforms.
- You will participate in 24x365 on-call schedules.
What we’re looking for:
- Bachelor's degree or equivalent experience.
- CS Degree preferred.
- You have at least three years of experience creating public cloud-based services with AWS, GCP or Azure.
- You have at least five years of experience in a software development role.
- You have either a) helped lead the initial deployment of a new SaaS to a public cloud (AWS, GCP or Azure) OR b) been an integral member of an established and high-functioning SRE team for a reputable cloud-hosted SaaS.
- Proven track record of designing and building commercial software products in an Agile environment.
- You have used Ansible, Puppet, Chef or another config management suite, know where it's broken, and are open to trying new alternatives.
- Experience with modern software engineering development and automation tools like Git, Jenkins, Grunt, JIRA, etc.
- Experience managing cloud-based infrastructure using infrastructure-as-code methodologies. Preferred tooling experience: CloudFormation or Terraform.
- Believes in the power of test-driven development and the need for writing automated tests as part of development.
- Deliberate, with sound judgment in balancing rapid development against long-term code maintainability and supportability.
- Skilled debugger who can put out fires under pressure when things go wrong in production environments.
- Relentless desire to automate and build software tools.
- Customer-centric approach to work that drives positive experiences for Tanium customers.
- Proven ability to work effectively in cross-functional teams.
- Ability to work efficiently and effectively in a remote work setting.
- Motivated self-starter.
Tanium offers a proven platform for endpoint visibility and control that transforms how the world's largest and most sophisticated organizations manage and secure their computing devices with unparalleled speed and agility. There’s a reason why more than half of the Fortune 100, top retailers and financial institutions, and four branches of the US Armed Forces rely on Tanium.
At Tanium, we are stewards of a culture that emphasizes the importance of collaboration, respect, and diversity. In our pursuit of revolutionizing the way some of the largest enterprises and governments in the world solve their most difficult IT challenges, we are strengthened by our unique perspectives and by our collective actions.
Our unstoppable spirit, drive to do the right thing and win-as-a-team attitude have earned us the rank of 7th on the Forbes list of “Top 100 Private Companies in Cloud Computing” for 2019 and 10th on FORTUNE’s list of the “100 Best Medium Workplaces.”