Site Reliability Engineer

Pune, India

Acquia, Inc.

Site Reliability Engineer (1097)

(Monitoring, Automation & Kubernetes)

Acquia is the open source digital experience company. We provide the world's most ambitious brands with technology that allows them to embrace innovation and create customer moments that matter. At Acquia we believe in the power of community and collaboration - giving our customers the freedom to build tomorrow on their terms.
Headquartered in Boston, we have been named as one of North America’s fastest growing software companies as reported by Deloitte and Inc. Magazine, and have been rated a leader by the analyst community and named one of the Best Places to Work by the Boston Business Journal. We are Acquia. We are building for the future of the web, and we want you to be a part of it.

Site Reliability Engineering (SRE) is what you get when you treat operations as if it’s a software problem. Our mission is to improve, maintain, and provide for the software and systems behind all of Acquia’s services - with an ever-watchful eye on their availability, latency, performance, and capacity.

As an SRE, you will be working on monitoring Kubernetes, coding in Go, YAML, and Python, and implementing reliable continuous deployment. You will also be given the opportunity to help refactor and integrate existing architecture for greater automation.

As a Site Reliability Engineer, you will…

  • Work in an Agile team designing, writing and delivering software to improve the availability, scalability, latency, and efficiency of Acquia’s services.
  • Maintain an understanding of system functionality and architecture, with a strong focus on the operational aspects of the service (availability, performance, change management, emergency response, capacity planning, etc.).
  • Collaborate with your team members to review their work and have your work reviewed in turn.
  • Work in a collaborative environment where teams own and operate the services they build.
  • Influence and create new designs, architectures, standards and methods for large-scale distributed systems.

You’ll enjoy this role if you…

  • Know how to code.
  • Are curious and like solving complex challenges for scalable, low-latency systems.
  • Enjoy creating software solutions for a cloud-native environment.
  • Enjoy collaborating with multiple stakeholders.
  • Have a passion for SRE, DevOps and related automation.

What you’ll need to be successful…

  • BS degree in Computer Science or related technical field, or equivalent practical experience
  • Experience writing automation using Python/Go, Terraform and Unix Shell
  • Involvement in designing, analyzing and troubleshooting large-scale distributed systems like Kubernetes
  • 2+ years of SRE/DevOps experience, including delivering software into production
  • 1-2 years managing monitoring, logging and reporting systems, and building observability dashboards for application and server performance and scalability issues (examples: SignalFx, Sumo Logic, New Relic, or other observability tools)
  • Availability to work in shifts during either India or US daytime hours
  • Understanding of security best practices
  • Experience with automation/configuration management using Ansible, Chef or Puppet
  • Experience with large-scale administration of Linux servers
  • Knowledge of AWS or GCP products like EC2 or EKS/GKE/ECS
  • Ability to provide after-hours support as needed for emergency or urgent situations

 

Extra credit if you have…

  • Expertise in designing, analyzing and troubleshooting large-scale distributed systems
  • Familiarity with running web services at scale; understanding of Unix systems internals and networking
  • Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way
  • Networking: knowledge and understanding of network theory, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing
  • Systematic problem-solving approach, coupled with a strong sense of ownership and drive
  • Familiarity with languages other than Python or Go, like Ruby or PHP

Individuals seeking employment at Acquia are considered without regard to race, color, religion, caste, creed, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation. Whatever you answer will not be considered in the hiring process or thereafter.



Job region(s): Asia/Pacific