Site Reliability Engineer - Cloud

Bangalore, India

Applications have closed
Couchbase, Inc.

Couchbase is the modern NoSQL cloud database for enterprise applications. Develop with agility. Perform at any scale. Manage with ease.


This role has primary accountability for designing, implementing, and operating Couchbase’s Cloud platforms. Golang knowledge is a huge plus! The team operates with a “run what you write” philosophy, and each engineer is responsible for deploying and operating the code they write.
A successful candidate must have demonstrable experience in at least one programming language (preferably Go) and previous work in SaaS application development and operations. You will work closely with the Support and Development teams on the architecture and configuration of our AWS-hosted infrastructure. You will be responsible for ensuring that the environment is built, deployed, configured, managed, and monitored correctly to support the business. You will drive decisions on the correct usage of cloud resources, troubleshoot performance issues, and ensure the highest level of reliability for the platform by tuning the environment for maximum scalability, cost efficiency, and security. Candidates must have experience developing and maintaining applications running on large public cloud platforms, ideally AWS, Azure, and GCP.
This role is also open to remote work (USA, UK, India & Canada) as our teams are globally distributed. We are a remote-first team. Prior experience working remotely is not required; however, we are looking for team members who perform well given a high level of independence and autonomy and who will establish a cadence of on-time delivery with high-quality work.


  • Design, deploy, and maintain a large-scale cloud platform with a focus on the key pillars of the cloud: Reliability, Operational Excellence, Security, Performance, and Cost Optimization
  • Own and be responsible for best practice use of our cloud ecosystem from the cloud infrastructure through to the use of our application
  • Be passionate about automating everything and proficient in at least one of the following languages: Golang, Python, Ruby
  • Understand why using infrastructure as code to efficiently provision infrastructure and services is the only way to build and maintain a large-scale cloud platform
  • Develop comprehensive monitoring solutions that provide full visibility into the different platform components, using tools and services such as Kubernetes, Prometheus, Grafana, ELK, Datadog, and New Relic
  • Identify and troubleshoot availability and performance issues at multiple layers of deployment, from hardware and operating environment to network and application
  • Evaluate performance trends and expected changes in demand and capacity, and establish the appropriate scalability plans
  • Troubleshoot and solve customer issues on production deployments
  • Ensure that SLAs are met in executing operational tasks
  • Experience building and managing virtualized systems (KVM, OVM, Containers/Docker) and the ability to read and understand source code
  • Conduct periodic on-call duties
  • Working knowledge of information security issues
  • Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.)


  • 3-5 years of related professional experience
  • Experience with alerting and monitoring tools like Datadog, Prometheus, Grafana & PagerDuty
  • Experience setting up CI/CD pipelines from scratch, preferably CircleCI and Spinnaker
  • Experience defining SLA, SLO, and SLI for a service
  • Public cloud provider certifications are great to have
  • Strong experience with Infrastructure as Code and Configuration Management tools, preferably Terraform
  • Demonstrable experience of methods to promote the correct use of cloud platforms with multiple layers of abstraction and responsibility
  • Experience using Kubernetes
  • Experience with automation tools/platforms
  • Experience working with NoSQL databases is a plus
  • Experience working in a highly distributed company is a plus
  • Experience writing backend applications is not required but definitely a plus
  • Align a portion of your day with business hours in the Pacific Time Zone (UTC-8)
In three months, you will have become the cloud SRE responsible for site availability, security, latency, system health, customer accounts, and billing. You will have taken on independent code review responsibilities and will be collaborating on the design of new features.
In six months, you will have earned the trust of the team and will be delivering tasks through the entire SDLC, from design through development, with minimal guidance, while helping to mentor new engineers joining the team.
In twelve months, you will have established a cadence of predictable, on-time delivery without cutting corners.
About Couchbase
At Couchbase, we believe data is at the heart of the enterprise. We empower developers and architects to build, deploy, and run their most mission-critical applications. Couchbase delivers a high-performance, flexible and scalable modern database that runs across the data center and any cloud. Many of the world’s largest enterprises rely on Couchbase to power the core applications their businesses depend on.
As a 2021 Bay Area Best Places to Work winner, Couchbase recognizes the need for time off when you need it. Enjoy unlimited time off (DTO), matching 401(k) contributions, ESPP, and many other amazing benefits. See more of our recent awards to learn what makes Couchbase such a great company to work at.
Learn more about Couchbase and our technical capabilities:
  • Compare Couchbase vs. MongoDB
  • Compare Couchbase vs. Oracle
  • Browse the Developer Portal

Couchbase is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law.
By using this website and submitting your information, you acknowledge our Candidate Privacy Notice and understand your personal information may be processed in accordance with our Candidate Privacy Notice.
Job region(s): Asia/Pacific