Sr. Manager, Site Reliability Engineering

Austin, Texas, United States

Full Time Senior-level / Expert

As part of the team at WellSky - Clearcare, you will be joining a cross-disciplinary group of engineers and product managers working to change the way seniors age and how homecare agencies manage the business of aging.

You will cover SRE, DevOps management, and CI/CD duties. You will report to the VP of Engineering and drive reliability, availability, security, and operational initiatives to ensure that end-user experiences with our products continue to be delivered efficiently.

You will be responsible for 25-40 people and several teams that support our site reliability, infrastructure, deployments, on-call support, and CI/CD. Responsibilities include the design, development, and implementation of all of the company’s cloud-based production systems. Additionally, you will promote a strong culture of reliability, documentation, monitoring, and reporting by using many existing tools as well as incorporating the latest technologies.


  • Provide leadership and manage a group of SRE engineers in an innovative and fast-paced environment. 
  • Ensure 24/7 availability, performance, and scalability by leading the architecture, deployment, automation, maintenance, and management of mission-critical cloud-based production systems.
  • Automate cloud services using technologies like AWS CloudFormation, Serverless, Terraform, Jenkins, etc.
  • Create and manage monitoring and alerting processes and procedures to ensure actionable alerts are generated, responded to, and eliminated before impacting customers. 
  • Drive root cause analysis (RCA) and remediation during any disruption of services, using a validated problem-analysis methodology and tracking all elements of the RCA to closure, in order to improve the day-to-day operations of the organization.
  • Create and evolve runbooks to facilitate efficient management and troubleshooting. 
  • Optimize AWS billing, including the selection and management of Reserved Instances. 
  • Plan, implement, monitor, and test systems and procedures for best practice Business Continuity and Disaster Recovery (BCDR). 
  • Identify and follow key trends and emerging technologies that can enhance or impact the solution architecture to drive efficiency. 
  • Design and implement appropriate Agile platform architecture and DevOps processes, manage engineering tool integrations, and help define the performance engineering strategy for products. 
  • Seek to standardize the development environment and automate the integration and delivery processes to improve predictability, efficiency, and ease of maintenance. 
  • Serve as the company’s subject matter expert to support other teams for purposes of client, partner, and vendor development and relations. 
  • Guide the continued growth and success of the team through technical and professional development. 
  • Foster a culture that encourages open communication, accountability, creativity, and innovation. 


  • 7+ years of Cloud Operations related experience managing highly scalable, customer-centric web and mobile applications 
  • 4+ years of leading and managing Cloud Operations in AWS with proven technical expertise and leadership experience in driving Cloud Operations standards and procedures 
  • Experience architecting AWS infrastructure with AWS services like Aurora, DynamoDB, ElastiCache, Lambda, Step Functions, API Gateway, CloudFormation, CloudWatch, CloudTrail, S3, Redshift, etc.
  • Strong understanding of networks, firewall systems, and routing protocols. 
  • Proficient with DevOps tools and environments like Jenkins, Git, Terraform. 
  • Proficient with scripting languages like Python, Shell, Bash. 
  • Experience with centralized logging services like Loggly, Sumo Logic, Splunk, ELK.
  • Experience with monitoring tools like New Relic, Graphite, Nagios, Datadog, CloudWatch.
  • Experience in Docker/Kubernetes deployment, configuration, scaling and management of containerized applications. 
  • Passionate about quality, performance, reliability, and scalability. 
  • Experience facilitating incident management and system recovery efforts. 
  • Bachelor of Science in Computer Science, Information Technology, or related IT field 
  • Experience building messaging and event-driven architectures
  • Ability to multi-task in an ambiguous and very dynamic start-up environment 
  • Demonstrated success working in short-cycle, agile, iterative development culture 
  • Effective collaboration skills with a proven ability to work cross-functionally in order to establish and meet shared business goals
  • Experience managing individual performance through ongoing performance feedback and coaching


  • Experience scaling from startup into an enterprise company; scaling both the product and the team 
  • Experience within the Healthcare industry 

About WellSky

WellSky is a leading supplier of software and services solutions that help acute, post-acute, and human service providers improve efficiency, support business growth, and provide intelligent care to patients and people in need. WellSky is headquartered in Overland Park, KS, with 1,800 teammates across the U.S., Canada, and the U.K. WellSky serves more than 20,000 client sites around the world - including the largest hospital systems, blood banks and labs, in-home care agencies, post-acute care facilities, government agencies, and human services organizations. WellSky's software and services address the continuum of health and social care - helping businesses, organizations, and communities solve tough challenges, improve collaboration for growth, and achieve better outcomes through predictive insights that only WellSky solutions can provide. Informed by 40 years of providing software and expertise, WellSky anticipates providers' needs and innovates relentlessly to help people thrive. Our purpose is to empower care heroes with technology for good, so that together, we can realize care's potential and maintain a healthy, flourishing world.

We're looking for talented individuals who want to use their skills to build a strong, technology-driven company. We offer competitive salaries, great benefits, a new Health Savings Account with a generous employer contribution, and a casual and fun environment that encourages quality, creativity, and excellence. Enjoy all we have to offer. We invite you to join us. Apply today!

WellSky provides equal employment opportunities to all people without regard to race, color, national origin, ancestry, citizenship, age, religion, gender, sex, sexual orientation, gender identity, gender expression, marital status, pregnancy, physical or mental disability, protected medical condition, genetic information, military service, veteran status, or any other status or characteristic protected by law. WellSky is proud to be a drug-free workplace.

Applicants for U.S.-based positions with WellSky must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire. Visa sponsorship is not available for this position.

Job region(s): North America