Site Reliability Engineer (SRE)

Remote - U.S.

Full Time

We are looking for an SRE to join our growing team!

Why Splice?

We’re building a creative ecosystem for music producers. With this ecosystem, we’re cultivating a global community of creators that fosters inspiration, connection, focus, and growth.

Our work environment is no different. We champion collaboration, big ideas, helping where we can and asking for assistance when we need it. We aim for steady, measured expansion through experimentation and iteration. We encourage optimism, inclusion, and transparency in the workplace. We aren’t afraid to stumble, because every stumble can teach us something about our processes, strategies, and even ourselves.

We don’t just hire people who mirror our culture. We hire people who add to it.

Why Splice SRE?

We are a small team passionate about automation, efficient measurement, systems scaling, monitoring and alerting, and capacity planning. We're tackling interesting problems and making our infrastructure more modern and resilient. In 2020, we're specifically looking to adopt ECS and Fargate in our infrastructure, consolidate and update existing systems, and explore next-generation environments. If these areas interest you, please consider applying today!

What you'll do

  • Partnering with engineering teams to automate & optimize service availability, scalability, performance, monitoring & alerting
  • Educating & empowering service teams to think operationally when designing services
  • Developing and maintaining methods for iteratively deploying Splice's cloud-based architecture (SOA, microservices, CI/CD)
  • Building resilient and self-scaling systems 
  • Once onboarded with the team, taking part in a weekly 24/7 on-call rotation

What we're looking for

  • In-depth knowledge of AWS, with some experience in Google Cloud and/or Azure cloud platforms
  • Experience with containers and container-related technologies (Docker, Kubernetes, Fargate, App Mesh, etc.)
  • Programming experience using Ruby, Node, Go, or other modern programming languages
  • Deep understanding of infrastructure-as-code, configuration management, and automation tools (Terraform, Ansible, etc.)
  • Experience with observability and monitoring tools like Datadog, CloudWatch, or Prometheus
  • Experience with CI tools like Jenkins, CodeBuild, CodePipeline, or Fastlane
  • Experience working in an Agile/Scrum development environment
  • Experience performing root cause analysis on production software
  • 2+ years of prior work experience in an Ops, DevOps, or SRE role

And it would be amazing if you have...

  • In-depth knowledge of Google Cloud and/or Azure cloud platforms
  • Knowledge of security best practices and procedures (secrets management, threat modeling, etc.)
  • Experience building and maintaining stateful infrastructure and supporting downstream data sources, BI, and analytics
  • Experience in network engineering, including VPC peering, intrusion detection, and network analytics
  • Experience facilitating cross-functional change through an RFC process
  • Experience designing systems for disaster recovery, with redundancy and resiliency built into the architecture

Equal Opportunity Employer:
Splice is an equal opportunity employer, committed to diversity and inclusion. We will consider all qualified applicants without regard to race, color, nationality, gender, gender identity or expression, sexual orientation, religion, disability or age.

Job tags: Ansible AWS Azure CD CI Docker Go Kubernetes Node Prometheus Ruby Terraform
Job region(s): North America Remote/Anywhere