Watson AI Site Reliability Engineer
Bangalore, Karnataka, IN
Software Developers at IBM are the backbone of our strategic initiatives to design, code, test, and provide industry-leading solutions that make the world run today - planes and trains take off on time, bank transactions complete in the blink of an eye and the world remains safe because of the work our software developers do. Whether you are working on projects internally or for a client, software development is critical to the success of IBM and our clients worldwide. At IBM, you will use the latest software development tools, techniques and approaches and work with leading minds in the industry to build solutions you can be proud of.
Your Role and Responsibilities
Ready to grow your career in the cloud? Do you like the feeling that you are making a difference?
This is your chance to be an integral part of a dynamic team of talented professionals deploying and maintaining innovative, industry-leading, cloud-based software.
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE is a key role in our growing and dynamic IBM Watson Cognitive AI business on Cloud. This technical role is focused on deploying, maintaining, and automating a wide range of operational tasks for the IBM Watson Cognitive AI services on IBM Cloud environments. You will work collaboratively with the entire cloud organization and IBM vendors to support, maintain, and improve the operational reliability of the services.
The Watson AI Site Reliability Engineer is responsible for:
- Providing production environment support and deployment for IBM Cloud public regions and dedicated environments.
- Developing SLAs/SLOs for the Watson AI services by monitoring availability and taking a holistic view of system health.
- Driving the incident management process and supporting a blameless post-mortem culture.
- Partnering with development teams to improve services via rigorous testing and release procedures.
- Developing automation for deployments, upgrades and self-remediation.
Required Technical and Professional Expertise
- 4+ years of experience in the IT industry
- Experience with software engineering, software development, or system operations
- Experience with troubleshooting issues in production systems
- Experience with cloud technologies such as Docker, Kubernetes and OpenShift
- Experience working with IBM Cloud (Bluemix) UI/CLI
- Knowledge of the IBM Cloud stack (IAM, Cloud Foundry, ALB, Ingress, Cerberus, etc.)
- Knowledge of COS and ICD database services (e.g. Postgres, etcd, RabbitMQ, Redis, Elasticsearch)
- Knowledge of networking (HTTP, DataPower, TLS, Akamai, DNS) sufficient to troubleshoot network issues
- Hands-on experience using source control (Git, GitHub) and CI/CD pipelines (Jenkins, Ghenkins, Tekton, etc.)
- Strong communication skills: the ability to communicate (often via Slack and Webex) observations and ideas for diagnosing and preventing issues, and for improving SRE processes to shorten diagnosis and resolution times.
Preferred Technical and Professional Expertise
- Experience with DevOps engineering or SRE
- Experience developing monitoring for production components and instrumenting code for observability using New Relic, LogDNA, Sysdig, or Prometheus
- Experience automating infrastructure, testing, and deployments using tools like Ansible, Chef, or Terraform
- Experience with PagerDuty
- Experience using Watson AI services
About Business Unit
IBM’s Cloud and Cognitive software business is committed to bringing the power of IBM’s Cloud and Watson/AI technologies to life for our clients and ecosystem partners around the world. IBM provides you with the most comprehensive and consistent approach to development, security and operations across hybrid environments—with complete software solutions for business and IT operations, development, data science, security, and management. Our experts and software capabilities help organizations develop applications once and deploy them anywhere, integrate security across the breadth of their IT estate, and automate operations with management visibility. With IBM, you also have access to new skills and methods, governance and management approaches, and a deep ecosystem of industry experts and partners.
Your Life @ IBM
What matters to you when you’re looking for your next career challenge?
Maybe you want to get involved in work that really changes the world? What about somewhere with incredible and diverse career and development opportunities – where you can truly discover your passion? Are you looking for a culture of openness, collaboration and trust – where everyone has a voice? What about all of these? If so, then IBM could be your next career challenge. Join us, not to do something better, but to attempt things you never thought possible.
Impact. Inclusion. Infinite Experiences. Do your best work ever.
IBM’s greatest invention is the IBMer. We believe that progress is made through progressive thinking, progressive leadership, progressive policy and progressive action. IBMers believe that the application of intelligence, reason and science can improve business, society and the human condition. Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 380,000 IBMers serving clients in 170 countries.
For additional information about location requirements, please discuss with the recruiter following submission of your application.
Being You @ IBM
IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
Job region(s): Asia/Pacific