Watson AI Site Reliability Engineer

Krakow, Malopolskie, PL

Posted 1 month ago

Ready to grow your career in the cloud? Do you like the feeling that you are making a difference?
This is your chance to be an integral part of a dynamic team of talented professionals deploying and maintaining innovative, industry-leading, cloud-based software.
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE is a key role in our growing and dynamic IBM Watson Cognitive AI business on Cloud. This technical role focuses on deploying, maintaining, and automating a wide range of operational tasks for the IBM Watson Cognitive AI services in IBM Cloud environments. You will work collaboratively with the entire cloud organization and IBM vendors to support and maintain the application and to improve its operational reliability.

Your Role and Responsibilities
As a Watson AI Site Reliability Engineer, you will be responsible for:
  • providing production environment support and deployment for IBM Cloud public regions and dedicated environments
  • developing SLAs/SLOs for the Watson AI services by monitoring availability and taking a holistic view of system health
  • driving the incident management process and supporting a culture of blameless post-mortems
  • partnering with development teams to improve services through rigorous testing and release procedures
  • developing automation for deployments, upgrades, and self-remediation

Required Technical and Professional Expertise
You should demonstrate a mix of experience and skills in the following areas:
  • 3+ years of experience in software engineering/development or system operations
  • Troubleshooting in production systems
  • Cloud technologies (Docker, Kubernetes, and OpenShift)
  • IBM Cloud (Bluemix) UI/CLI
  • IBM Cloud stack (IAM, Cloud Foundry, ALB, Ingress, Cerberus, etc.)
  • IBM Cloud Object Storage (COS) and IBM Cloud Databases (ICD) services (e.g. Postgres, etcd, RabbitMQ, Redis, Elastic)
  • Networking (HTTP, DataPower, TLS, Akamai, DNS), sufficient to troubleshoot network issues
  • Automation programming (Go, Python, Node.js, JavaScript, Ruby, etc.)
  • Source control (Git, GitHub) and CI/CD pipelines (Jenkins, Ghenkins, Tekton, etc.)
  • Strong communication skills
  • Ability to feed lessons learned back into troubleshooting documentation or software
  • Capability to work in a global, multicultural and diverse environment
  • Ability to work for EU shift hours (06:00-14:00 UTC March to October, 07:00-15:00 UTC November to February)
  • Ability to work weekends on a rotation basis

Preferred Technical and Professional Expertise
In addition, knowledge of or experience with any of the following would be an advantage:
  • Experience with DevOps engineering or SRE
  • Experience developing monitoring for production components and instrumenting code for observability using New Relic, LogDNA, Sysdig, or Prometheus
  • Experience automating infrastructure, testing, and deployments using tools like Ansible, Chef, or Terraform
  • Experience with PagerDuty
  • Experience using Watson AI services

About Business Unit
IBM’s Cloud and Cognitive software business is committed to bringing the power of IBM’s Cloud and Watson/AI technologies to life for our clients and ecosystem partners around the world. IBM provides you with the most comprehensive and consistent approach to development, security and operations across hybrid environments—with complete software solutions for business and IT operations, development, data science, security, and management. Our experts and software capabilities help organizations develop applications once and deploy them anywhere, integrate security across the breadth of their IT estate, and automate operations with management visibility. With IBM, you also have access to new skills and methods, governance and management approaches, and a deep ecosystem of industry experts and partners.

Your Life @ IBM
What matters to you when you’re looking for your next career challenge?

Maybe you want to get involved in work that really changes the world? What about somewhere with incredible and diverse career and development opportunities – where you can truly discover your passion? Are you looking for a culture of openness, collaboration and trust – where everyone has a voice? What about all of these? If so, then IBM could be your next career challenge. Join us, not to do something better, but to attempt things you never thought possible.

Impact. Inclusion. Infinite Experiences. Do your best work ever.

About IBM
IBM’s greatest invention is the IBMer. We believe that progress is made through progressive thinking, progressive leadership, progressive policy and progressive action. IBMers believe that the application of intelligence, reason and science can improve business, society and the human condition. Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 380,000 IBMers serving clients in 170 countries.

Location Statement
For additional information about location requirements, please discuss with the recruiter following submission of your application.

Being You @ IBM
IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.

Job tags: Ansible CD Chef CI CloudFoundry Docker Git Go JavaScript JS Kubernetes Node Node.js Postgres Python RabbitMQ Redis Reliability engineering Ruby Terraform
Job region(s): Europe