Site Reliability Engineer - Client Management

Remote

Nava

Posted 1 month ago

Nava is at the forefront of reimagining how our government serves people, and we’re looking for an experienced Infrastructure Engineer to help drive this mission forward. You'll work with our internal Nava team as well as our partners to roadmap long-term, scalable solutions to issues that millions of Americans face every day. This role has two parts: an internal-facing role and an external-facing role.
The internal-facing role involves working across multiple cross-functional scrum teams, helping to implement and improve the infrastructure practices used for our work with our government partners.
The external-facing role involves helping stakeholders define and shape the infrastructure practices used across the government partner agency, and providing guidance to other teams within that agency on an as-needed basis.
You care deeply about working on technology that affects people’s lives, and are passionate about building and maintaining large-scale systems that are well-designed, scalable, secure, and reliable.
Role and Responsibilities
Internal-facing
  • Provide guidance to teams as they prepare new systems for production launch and operation
  • In collaboration with our partners, develop an incident response process used by scrum teams across multiple systems
  • Collaborate with other engineers and delivery managers to prioritize the infrastructure roadmap
  • Work with others to develop common infrastructure standards across engineering teams related to security, release management, monitoring, and incident response
  • Work with teams and stakeholders to define and run tabletop exercises
  • Work with others to document SRE best practices as an example for other teams throughout the government partner agency

External-facing
  • This role will involve extensive collaboration with various stakeholders within state government. You'll need to co-write recommendations and proposals on how to set up infrastructure and teams in a way that is secure, reliable, and maintainable long-term, with clear delineations of roles and responsibilities. You'll need to be able to advocate for these solutions, and in some cases persuade skeptics, before starting implementation, which will often involve kicking off process changes for existing teams and organizations.
  • Work with our partners to build technical support capacity for applications
  • Consult with stakeholders and other teams on an as-needed basis on how to properly configure and work with monitoring, logging, and alerting systems
  • Consult with stakeholders and other teams on how to set up processes, such as an incident response and escalation process, across multiple vendors and agencies
  • Speak to stakeholders as the main SRE representative on operational support and practices
  • Proactively socialize, iterate on, and generate buy-in for SRE initiatives with the team, government partners, and other contractors

Skills and Attributes

  • A deep understanding of production systems and the operational model around them, especially with regard to reliability, security, and incident response
  • Excellent written and verbal communication skills, technical and otherwise
  • Ability to communicate complex technical topics to a range of audiences, from highly technical to non-technical
  • Ability to develop clear, repeatable processes, and to produce documentation and runbooks that are accessible to a range of audiences
  • Experience cultivating strong relationships with external stakeholders and partners
  • Experience with multi-vendor environments, with a bonus for experience within government contracting or professional service firms
  • Adaptable and collaborative problem-solving skills for working with several organizations, processes, cultures, and technologies
  • Solid SRE instincts and an ability to create order from divergent needs
  • Ability to work with incomplete information and to form consensus on solutions
  • Ability to collaborate with other infrastructure engineers on an existing codebase
  • Experience with the following systems is a plus: Amazon Web Services, Terraform, New Relic, Splunk, Docker (ECS Fargate), Lambda, alerting systems (e.g. PagerDuty, VictorOps, OpsGenie)
We're a small team working to radically improve our government, so everyone who joins us has a direct impact on the direction and success of Nava. We are stewards: we hold a deep responsibility toward the systems we work with. We are a community: we value collaboration both within our teams and with the many hardworking people within government. We offer generous compensation, benefits, and equity, and we value a healthy work/life balance.
We care deeply about diversity and inclusion at Nava. We are an equal opportunity employer and do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Note: We participate in E-Verify. Upon hire, we will provide the federal government with your Form I-9 information to confirm that you are authorized to work in the U.S. This role requires working from within the contiguous United States.