Senior Site Reliability Engineer

New York City - NY, Remote

Full Time Senior-level / Expert
Hyperscience

Posted 2 weeks ago

Company Description

Hyperscience is a technology company blazing a new path in enterprise automation with a reimagined approach to building and powering processes. The Hyperscience Platform is the world's first Software-Defined, Input-to-Outcome Automation platform used by top public companies and government organizations around the world to build and run mission-critical processes with ease and speed.
Hyperscience helps enterprises quickly build and roll out new business processes with built-in automations, reduce manual errors, increase high- and low-skilled employee productivity, and eliminate the need for costly transformation. Hyperscience’s Intelligent Document Processing solution has been implemented at some of the world's leading financial services, insurance, healthcare and government organizations, including TD Ameritrade, QBE Insurance Group Limited and Voya Financial, helping them lower costs, reduce error rates by 67% and increase employee capacity by 10x.
Since its founding in 2014, Hyperscience has grown to more than 250 employees with offices in New York City, Sofia, Bulgaria, and London, UK, and has consistently been recognized as one of the best places to work, with a collaborative and innovative culture and best-in-class benefits.
As the Senior Site Reliability Engineer, you will lead capabilities that are critical to Hyperscience’s success and the success of our clients. Hyperscience serves our customers on-premise (including client-managed private clouds) and is currently building out our Cloud SaaS offering. Our SRE Team is responsible for defining the technology stack, tooling, automation, standards and practices for the SRE capability focused on our clients’ operations and infrastructure.
You will collaborate with our CloudOps team on SaaS and general Cloud incident management, including defining and adhering to effective SLAs. You will partner with our application engineering teams to ensure alignment with the product and technical roadmaps.
For our on-premise customers, you will execute on standards, practices, and processes for on-premise incident management and escalation, including Tier 2 and Tier 3 support, as well as an SRE roadmap for tooling and automation to make on-premise operations and serviceability simple for our customers. You will partner with the Customer Success (CX) team to ensure effective collaboration in support of our clients.
In line with Google’s definition of SRE, our focus is on automation. This means that strong software development and scripting practices and capabilities are central to the team.
This is an exciting time for Hyperscience’s product and business. You will have the opportunity to influence and deliver on a bold vision for transforming the way organizations model and execute their business processes, and there will be many opportunities for growth along the way.

Responsibilities

  • Partner with our product management and software engineering teams to implement services and utilities that help our Customer Experience teams drive better customer satisfaction across incidents, escalations, and operations.
  • Respond to technical support escalations by troubleshooting and diagnosing advanced technical issues, and provide solutions that restore Hyperscience service levels in accordance with defined SLAs.
  • Act as an incident response commander for high-severity issues by organizing and driving the cross-functional incident response team to recover from service degradation and deliver remediation in a fast-paced environment.
  • Lead postmortem analyses to keep incident handling transparent across the organization and to drive continuous improvement.
  • Partner with Customer Experience and Product Engineering teams to improve the reliability, serviceability and operations of Hyperscience products and services, for both on-premise deployments and our cloud-based SaaS offering.
  • Align with Product Engineering teams to establish SLIs and SLOs for their feature components, continuously evolving our thresholds and metrics to meet business requirements.
  • Influence designs, architectures, standards and methods for large-scale distributed systems to improve reliability, scalability and serviceability; collaborate with a world-class engineering team to propose features that address recurring patterns of customer complaints.
  • Build and maintain a site reliability engineering toolbox that provides automated utilities to proactively help operations and technical support teams sustain a high level of customer satisfaction.

Qualifications

  • Bachelor’s degree in software engineering, computer science, computer engineering, or related technical field OR equivalent practical experience
  • 5+ years of experience related to site reliability engineering
  • Strong troubleshooting and analytical skills
  • Experience in full stack development with Python/Django and JavaScript/TypeScript/React
  • Strong experience with an SRE tech stack, including tooling and automation strategy and implementation to improve incident response effectiveness across Engineering and CX
  • Experience in operations, including automation, monitoring, alerting, and incident management
  • Experience in database systems (SQL and NoSQL) troubleshooting, performance tuning, development and migration
  • Experience with cloud infrastructure, ideally AWS or Azure
  • Experience with both SaaS products and on-premise delivery
  • Excellent verbal and written communication, interpersonal, and teamwork skills

Nice to haves

  • Experience with agile development methodologies

Location

  • Ideally this position is in New York City, but working remotely elsewhere on the US East Coast is an option.

Benefits

  • Top-notch healthcare for you and your family
  • 30 days of paid leave annually to help nurture work-life symbiosis
  • A 100% 401(k) match for up to 6% of your annual salary
  • Stock options
  • Wellness stipend
  • Pre-tax transportation and commuter benefits
  • 6-month parental leave (or double salary to pay for your partner's unpaid leave)
  • Free travel for any person accompanying a breastfeeding mother and her baby on a business trip
  • A dependent care stipend of up to $3,000 per month for each child under the age of 21, up to a maximum of $6,000 per month in total
  • Daily catered lunch, snacks, and drinks
  • Budget to attend conferences, train, and further your education
  • $1,000 one-time WFH stipend and $75 monthly WFH stipend
  • Relocation assistance

We are an equal opportunity employer. We welcome people of different backgrounds, experiences, abilities and perspectives. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status.
Job tags: AWS Azure Django JavaScript Python React Reliability engineering SQL
Job region(s): North America Remote/Anywhere