Site Reliability Engineer

Dallas, Texas, US

IBM

Posted 3 weeks ago


At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk.


The shift toward the consumption of IT as a service, i.e., the cloud, is one of the most important changes to happen to our industry in decades. At IBM, we are driven to shift our technology to an as-a-service model and to help our clients transform themselves to take full advantage of the cloud. With industry leadership in analytics, security, commerce, and cognitive computing and with unmatched hardware and software design and industrial research capabilities, no other company is as well positioned to address the full opportunity of cloud computing.

We are looking for a Senior Site Reliability Engineer in Dallas, Texas to join our IaaS SRE Team, which is responsible for the availability and reliability of our cloud platform and ensures that it meets the requirements of our internal and external users. The team is dedicated to keeping IBM Cloud at the forefront of cloud technology, using software and systems engineering to build and maintain large-scale, massively distributed, fault-tolerant systems. We are enhancing IBM's cloud platforms to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leading efficiency, resiliency, and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients.

In this Senior Site Reliability Engineer role, you will work closely with the entire Cloud organization and IBM vendors to support, maintain, and operationally improve the cloud infrastructure. You will focus on the following key responsibilities:

  • Be accountable for maintaining the reliability, stability, and efficiency of the platform

  • Balance feature development velocity and reliability with well-defined SLOs

  • Run the production environment by monitoring availability and taking a holistic view of system health

  • Partner with development teams to improve services via rigorous testing and release procedures

  • Participate in system design consulting, platform management, and capacity planning

  • Proactively influence the architecture of the platform so that stability and reliability are maintained

  • Create sustainable systems and services through automation

  • Ensure systems are secure and compliant




  • Minimum of 10 years' experience in hands-on production administration of large system environments

  • Experience in establishing, following, and improving upon procedures within a mission-critical environment

  • Experience with algorithms, data structures and software design

  • Experience with operating system internals and networking

  • Experience with distributed systems design, maintenance, and troubleshooting

  • Hands-on experience with debugging and optimizing code, as well as automation

  • Excellent written and verbal communication skills

  • Comfortable operating in a fast-paced environment

  • 5+ years' experience with software development and version control

  • 4+ years’ experience with cloud and virtualization environments

  • 4+ years' experience with configuration management systems (Ansible/Chef) and log analytics (Splunk/ELK)




  • Proven experience in driving the stability of cloud platforms




Digitization is accelerating the ongoing evolution of business, and clouds - public, private, and hybrid - enable companies to extend their existing infrastructure and integrate across systems. IBM Cloud provides the security, control, and visibility that our clients have come to expect. We are working to provide the right tools and environment to combine all of our clients' data, no matter where it resides, so they can respond to changing market dynamics.


What matters to you when you’re looking for your next career challenge?

Maybe you want to get involved in work that really changes the world? What about somewhere with incredible and diverse career and development opportunities – where you can truly discover your passion? Are you looking for a culture of openness, collaboration and trust – where everyone has a voice? What about all of these? If so, then IBM could be your next career challenge. Join us, not to do something better, but to attempt things you never thought possible.

Impact. Inclusion. Infinite Experiences. Do your best work ever.


IBM’s greatest invention is the IBMer. We believe that progress is made through progressive thinking, progressive leadership, progressive policy and progressive action. IBMers believe that the application of intelligence, reason and science can improve business, society and the human condition. Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 380,000 IBMers serving clients in 170 countries.


For additional information about location requirements, please discuss with the recruiter following submission of your application.


IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.








Job tags: Ansible Chef ELK Virtualization