Sr Site Reliability Engineer
Redwood City, CA
WHAT IS BOX?
Box is the market leader for Cloud Content Management. Our mission is to power how the world works together. Box is partnering with enterprise organizations to accelerate their digital transformation by creating a single platform for secure content management, collaboration, and workflow. We have an amazing opportunity to further establish ourselves as leaders in the space, and we need strong advocates to help us achieve that goal. By joining Box, you will have the unique opportunity to help capture a majority of this developing market and define what content management looks like for the digital enterprise. Today, Box powers over 99,000 businesses, including 70% of the Fortune 500, who trust Box to manage their content in the cloud.

WHY BOX NEEDS YOU
As an SRE for Box, you'll be part of a team responsible for driving the onboarding of Box's services onto platforms like OpenStack, Kubernetes, AWS, GCP, Azure, and many more public cloud vendors. In this role, you will innovate and develop the automation, systems, and processes that enable rapid prototyping by development teams, while ensuring tight alignment with our compliance and availability goals. As we evolve the infrastructure and move towards a service-oriented architecture, it is imperative that we redefine how we think about operationalizing services in a scalable distributed system. You’ll be collaborating across Engineering and Product to ensure this is a core part of our software development life cycle.

WHAT YOU'LL DO
- You will constantly develop automation and tooling for any repetitive tasks you see.
- You will manage the stability and operation of several of Box's most critical production applications through application reviews, capacity planning, and performance tuning.
- You will work with cutting-edge technologies including Spinnaker, SmartStack, Kubernetes, Zookeeper, NGINX, Envoy, and Calico.
- You will work with a variety of engineering teams to suggest and implement technical solutions for their unique challenges.
- You will improve our observability as both a developer of monitoring systems and a mentor to our product development teams.
- You will participate in the team's on-call rotation.
- You are a developer at heart with a passion for coming up with innovative solutions to hard problems.
- You enjoy hands-on troubleshooting in a distributed Linux systems environment and are comfortable tracing problems through applications, systems, and networks.
- You have experience working in virtualized distributed environments.
- You have written automation and self-healing scripts in Python, Bash, or Rust at scale to maintain critical applications.
- You have hands-on experience with modern cloud technologies like GCP, AWS, and Docker.
- You have experience with Spinnaker, SmartStack, Kubernetes, Zookeeper, NGINX, Envoy, and Calico.
- You have experience with deployment/configuration management tools (e.g., Git, Jenkins, Puppet, Terraform).
- You have strong experience with the latest in monitoring and alerting best practices and tools (e.g., Sensu, Wavefront, Splunk).
- You have 8+ years of large-scale production operations experience and enjoy talking reliability engineering.
- Visit this webpage to check out all of our exciting healthcare benefits: https://join.collectivehealth.com/box
- For all other benefits, please check out: Box Benefits + Perks
Job tags: AWS Azure Bash Docker GCP Git Jenkins Kubernetes Linux Nginx OpenStack Puppet Python Reliability engineering Terraform
Job region(s): North America