Site Reliability Engineer (SRE)

United States (Remote)

Applications have closed
Pantheon

Posted 1 month ago

The Role

Pantheon is looking for an experienced Site Reliability Engineer to join our team, either remotely or onsite at our SF or Minneapolis office (on US hours). We’re expanding an impressive and growing platform that powers hundreds of thousands of websites, millions of containerized resources, billions of monthly page views, and development tools that professional website developers use on a daily basis.

Along the way, we’ve written tools to manage containers at scale, built a massive multi-tenant distributed file system, contributed to open source communities (WordPress, Drupal, Fedora, Chef, Terraform, Vault, systemd, cURL, Sensu…) and a whole lot more.

We have a lot more growing to do, though! Your expertise will enable us to innovate and manage complexity as we expand our datacenter footprint and add platform features.

Cool Stuff You'll Do

  • Improve visibility into how distributed services interact and scale in production
  • Implement shared infrastructure used by all engineering teams
  • Collaborate closely with other engineering teams to deliver platform improvements and provide subject-matter expertise for other technical initiatives
  • Continuously improve our standard of engineering excellence by implementing best practices for coding, testing, deploying, and communication
  • Execute disaster recovery drills
  • Implement proofs of concept, prototypes, and systems at scale

What You Bring To The Table

  • Knowledge of large-scale, high-traffic platforms and the design of scalable, robust services in the real world
  • Experience with infrastructure-as-code tooling (e.g., Terraform, Chef, Puppet, Ansible, Pulumi, Vault)
  • Experience programming in one or more languages such as Go, Python, Ruby, C, C++, or Java
  • Experience with Unix/Linux operating system internals (e.g., filesystems, system calls, namespaces, containers)
  • Experience analyzing and troubleshooting systems
  • Understanding of standard networking protocols and components such as TCP/IP, HTTP, DNS, IP subnetting, and load balancing
  • Passion, integrity, and humor that make our team better as a whole

Bonus points for:

  • Experience developing software services spanning multiple data centers
  • Experience with Google Cloud Platform (GCP)
  • Experience with Linux containerization technology at scale (cgroups, namespaces, podman, runc, Docker)
  • Experience using (or choosing not to use!) Kubernetes, Cassandra, systemd, Elasticsearch, Twisted Python, Redis, and MariaDB
  • Polyglot chops: we code in Go, Python, Ruby, and C
  • Diversity of thought and experience

Cool Stuff You'll Get in Return

We have all the usual benefits and perks you'd expect from a high-tech company, but what we can really offer you is a fantastic work environment powered by an amazing team.

In addition, Pantheon's Engineering team contributes to many Open Source projects, including Terraform, Vault, systemd, Sensu, and Drupal. Our core mission is based on Open Source Content Management Systems, and our stack is built upon Open Source. See more about our contributions at https://pantheon.io/open-source

And oh yeah, some other things to note:

  • fun at WordPress and Drupal community events
  • discounts on custom bicycles - the founders of Pantheon also founded Mission Bicycle
  • dog friendly office
  • fully loaded kitchen and daily catered lunches
  • monthly gym and book allowance
  • kombucha on tap and omg...did you say Its-Its are in the freezer?

Pantheon complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

 

Job tags: Ansible C Cgroups Chef Docker Elasticsearch GCP Go Google Cloud Platform High traffic Java Kubernetes Linux Load Balancing MariaDB Open source Puppet Python Redis Ruby Terraform Unix Vault
Job region(s): North America Remote/Anywhere