Senior Site Reliability Engineer (1096)
(Monitoring, Automation & Kubernetes)
Acquia is the open source digital experience company. We provide the world's most ambitious brands with technology that allows them to embrace innovation and create customer moments that matter. At Acquia we believe in the power of community and collaboration - giving our customers the freedom to build tomorrow on their terms.
Headquartered in Boston, we have been named one of North America’s fastest-growing software companies by Deloitte and Inc. Magazine, rated a leader by the analyst community, and named one of the Best Places to Work by the Boston Business Journal. We are Acquia. We are building for the future of the web, and we want you to be a part of it.
Site Reliability Engineering (SRE) is what you get when you treat operations as if it’s a software problem. Our mission is to improve, maintain, and provide for the software and systems behind all of Acquia’s services - with an ever-watchful eye on their availability, latency, performance, and capacity.
As a Senior SRE, you will monitor Kubernetes, code in Python or Go, and implement DevOps CI/CD. You will also have the opportunity to help refactor and integrate existing architecture for greater automation.
As the Senior Site Reliability Engineer, you will…
- Work in an Agile team designing, writing and delivering software to improve the availability, scalability, latency, and efficiency of Acquia’s services.
- Maintain an understanding of system functionality and architecture, with a strong focus on the operational aspects of the service (availability, performance, change management, emergency response, capacity planning, etc.).
- Collaborate with your team members to review their work and have your work reviewed in turn.
- Work in a collaborative environment where teams own and operate the services they build.
- Influence and create new designs, architectures, standards and methods for large-scale distributed systems.
You’ll enjoy this role if you…
- Know how to code.
- Are curious and like solving complex challenges for scalable, low-latency systems.
- Enjoy creating software solutions for a cloud-native environment.
- Enjoy collaborating with multiple stakeholders.
- Have a passion for SRE, DevOps and related automation.
What you’ll need to be successful…
- BS degree in Computer Science or related technical field, or equivalent practical experience.
- Experience writing automation using Python/Go, Terraform and Unix Shell.
- Experience designing, analyzing and troubleshooting large-scale distributed systems such as Kubernetes.
- 3+ years of SRE/DevOps and/or build, release & rollback experience including delivery to production.
- 2-3 years managing monitoring, logging and reporting systems, and building observability dashboards for application and server performance and scalability issues (examples: SignalFx, Sumo Logic, New Relic, or other observability tools).
- Availability to work in shifts during India or US daytime hours.
- Understanding of security best practices.
- Experience with automation/configuration management using Ansible, Chef or Puppet.
- Experience with large-scale administration of Linux servers.
- Knowledge of AWS/GCP products such as EC2 or EKS/ECS.
- Ability to provide after-hours support as needed for emergency or urgent situations.
Extra credit if you have…
- Expertise in designing, analyzing and troubleshooting large-scale distributed systems.
- Familiarity with running web services at scale; understanding of Unix systems internals and networking.
- Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way.
- Networking: knowledge and understanding of network theory, such as protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
- Systematic problem-solving approach, coupled with a strong sense of ownership and drive.
- Familiarity with languages other than Python or Go, such as Ruby or PHP.
Individuals seeking employment at Acquia are considered without regard to race, color, religion, caste, creed, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation. Whatever you answer will not be considered in the hiring process or thereafter.