Site Reliability Engineer (SRE)

Remote - U.S.

Posted 3 weeks ago

We are looking for an SRE to join our growing team!

Why Splice?

Music starts at Splice. As a collection of artists, producers, creators and collaborators, Splice sweats every detail involved in the creative process. From our expansive library of content, to the tools we provide, to our company work culture, we’re constantly evolving towards being the best advocates for our artists and employees. If you work at Splice you’ll be asked, “What does music mean to you?” That’s because music is at the core of everything we do. It’s why we hire trailblazers to help us solve problems, navigate uncharted territory, and change the industry for the better. It’s why we seek out diversity in who we hire, represent and collaborate with, to ensure that we’re growing towards a more inclusive and open-minded reality. And it’s why we hold ourselves accountable for our part in shaping music creation.

Why Splice SRE?

We are a small team passionate about automation, efficient measurement, scaling of systems, monitoring and alerting, and capacity planning. We’re tackling interesting problems and making our infrastructure more modern and resilient. In 2020, we’re specifically looking to adopt ECS and Fargate in our infrastructure, consolidate and update systems, and explore various next-generation environments. If these are areas you might be interested in, please consider applying today!


  • Partnering with engineering teams to automate & optimize service availability, scalability, performance, monitoring & alerting
  • Educating & empowering service teams to think operationally when designing services
  • Developing and maintaining methodologies of iteratively deploying Splice’s cloud-based architecture (SOA, microservice, CI/CD knowledge)
  • Building resilient and self-scaling systems 
  • Once onboarded with the team, taking part in a weekly 24/7 on-call rotation


  • In-depth knowledge of AWS, with some experience in Google Cloud and/or Azure Cloud Platforms
  • Experience with containers and container-related technologies (Docker, Kubernetes, Fargate, App Mesh, etc.)
  • Programming experience using Ruby, Node, Go, or other modern programming languages
  • Deep understanding of configuration management and automation tools (Terraform, Ansible, etc.)
  • Experience with Observability and Monitoring tools like Datadog, CloudWatch, or Prometheus
  • Experience with CI tools like Jenkins, CodeBuild, CodePipeline, Fastlane
  • Experience working in an Agile/Scrum development environment
  • Experience performing Root Cause Analysis in Production Software
  • 2+ years prior working experience in an Ops, DevOps, or SRE role

And it would be amazing if you have...

  • In-depth knowledge of Google Cloud and/or Azure Cloud Platforms
  • Knowledge of Security best practices and procedures (Secrets Management, Threat Modeling, etc.)
  • Experience building and maintaining Stateful Infrastructure, and providing for downstream Data Sources, BI, Analytics 
  • Experience in Network Engineering, including VPC Peering, Intrusion Detection, and Networking Analytics 
  • Experience facilitating cross-functional change through an RFC process
  • Experience in enabling systems for Disaster Recovery, providing Redundancy and Resiliency in Architecture

Equal Opportunity Employer:
Splice is an equal opportunity employer, committed to diversity and inclusion. We will consider all qualified applicants without regard to race, color, nationality, gender, gender identity or expression, sexual orientation, religion, disability or age.

Job tags: Ansible AWS Azure CD CI Docker Go Jenkins Kubernetes Node Prometheus Ruby Terraform
Job region(s): North America Remote/Anywhere