Senior Site Reliability Engineer

Remote or San Francisco

Full Time Senior level / Expert
Netlify

Posted 1 month ago

Company Overview

At Netlify, we’re building a platform to empower digital designers and developers to build better, more elaborate web projects than ever before. We’re aiming to change the landscape of modern web development. Netlify currently serves more than 1,000,000 developers worldwide.

Netlify is a diverse group of incredible talent from all over the world. We’re ~44% women or non-binary, and we represent more than a fourth as many nationalities as we have team members.

We recently raised $63M in Series C funding to bring forward the next generation of tooling for a more accessible web. Among our investors are Andreessen Horowitz, Kleiner Perkins, and EQT Ventures, as well as the founders of GitHub, Slack, Figma, and Yelp. This latest round brings Netlify’s total funding to $107M to date.

About the Opportunity:

The mission of our SRE team is to scale Netlify’s infrastructure for the next million users. Our team is dedicated to ensuring application resiliency and delivering the compute and network platform at scale. 

As a member of the SRE team, you will design, develop, and deliver solutions that enhance the scalability, availability, and efficiency of our products. Our tech stack includes (but is not limited to) Kubernetes, AWS, GCP, Kafka, CDN, and Golang-based microservices. Whether you're a seasoned systems developer or a software developer who wants to focus on systems, we want to hear from you!

Our team consists primarily of senior-level engineers whom you’ll continue to learn from. We are a remote-first, globally distributed team and are biased towards asynchronous planning and communication, meaning fewer meetings and more execution. We take documentation seriously and place our values of transparency, empowerment, and commitment at the forefront of everything we do. We’re driven by passion, and we make sure that everyone on the team knows their value, feels ownership over their work, and can quickly see the impact of their efforts. Beyond just hiring smart, empathetic team members, we foster a culture where there are no dumb questions and our team can get access to the resources they need to continue to learn. As a remote-first company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Netlify is the type of company where you can balance great work with great life.

What you’ll bring: 

  • You are a software engineer at heart, with a compulsion to automate everything! 
  • Production-level experience operating Linux systems and ability to methodically diagnose system, network, and application issues
  • Experience participating in on-call rotations and effectively coordinating incident response across globally distributed teams
  • Passionate about creating performant and reliable systems
  • Extensive experience with HTTP, DNS, CDN, and TLS, and a lot of hands-on exposure to at least one of the major cloud providers (Amazon, Google, Microsoft)
  • Advanced programming experience (Golang, Python, etc.)
  • You love partnering with other engineers to solve problems and are passionate about leading projects 
  • Excellent communication skills and ability to collaborate in a multi-disciplinary team 
  • Systematic problem-solving approach, coupled with a strong sense of ownership and drive
  • Understanding of MongoDB scaling and high availability strategies. Bonus if you have had experience upgrading and migrating Mongo databases
  • Experience applying SRE principles to release engineering 

A great match for our team won’t be content with filling in service support tickets, but will instead be driven and excited about building systems and self-service portals to automate our solutions! We believe that just writing scripts to solve problems isn’t the answer; we want team members who are passionate about using their engineering chops to write robust applications and build tools that empower our engineers! With our team, you’ll bring your experience working on a CDN or highly available system and apply it to help us level up our reliability and observability as a company. While you’re helping take us to the next level, we’ll provide you with the stepping stones to launch your career and help you grow.

Within 1 month, you’ll:

  • Begin the journey of understanding the complexities around our business, customer, and engineering needs. We believe strongly that it’s essential for you to take the time to become familiar with our space & how we operate!
  • Have one-on-ones and pairing sessions with some of the people that you’ll be working closely with, including members of the Platform, Data, and Site Reliability teams. 
  • Identify opportunities for improvement and define a roadmap for closing any gaps
  • Learn from the team during weekly syncs 

Within 3 months, you’ll: 

  • Troubleshoot issues that originate from possible bugs in the source code of our many applications
  • Develop relationships with product teams, helping define their SLAs and improve their reliability
  • Write documentation (best practice guides, RCAs, test plans, etc.)
  • Review pull requests constructively & identify performance bottlenecks
  • Be ramped up in our tech stack, make regular PRs, and contribute to tooling recommendations
  • Partner with your manager to align your goals and passions with projects on the team roadmap
  • Make improvements through designing, building, and maintaining the core infrastructure

Within 6 months, you’ll: 

  • Work across teams to manage SLOs and SLAs
  • Create self-healing infrastructures, such as automating DNS and BGP routing changes
  • Develop applications for circuit breaking, performance testing, and workflow automation
  • Participate in helping us grow the team by conducting interviews and partnering with leadership to strategize future hiring needs 
  • Manage the release pipeline to ensure a highly resilient deployment strategy
  • Introduce new frameworks and tools to help optimize and elevate the work of the team

Within 12 months, you’ll:

  • Build capacity planning and testing frameworks
  • Migrate services between cloud infrastructure
  • Have shaped how we view reliability here at Netlify and contributed to us becoming the leader in reliability 
  • Have attended a conference with our training budget to help expand your knowledge base

At Netlify, we are a growing company that is constantly evolving so this timeline is intended to show you an example of what you can expect from the role. Keep in mind we're always iterating, learning, and growing, thus expect these guidelines to continue to evolve as we expand. We're excited for you to join us on the journey!

About Netlify: 

Of everything we've ever built at Netlify, we are most proud of our team.

We believe that empowered, engaged colleagues do their best work. We’ll be giving you the tools you need to succeed and looking to you for suggestions to improve not just your daily job, but every aspect of building a company. Whether you work from our main office in San Francisco or you are a remote employee, we’ll be working together a lot: pairing, collaborating, debating, and learning. We want you to succeed! About 60% of the company is remote across the globe; the rest are in our HQ in San Francisco.

To learn a bit more about our team and who we are, make sure to visit our about page.

Applying

Not sure you meet 100% of our qualifications? Please apply anyway!

When applying, please include a resume or short listing of your job history & skills (a link to a LinkedIn profile would be fine). A cover letter explaining why you would enjoy working in this role and why you’d like to work at Netlify would be great, though it is not required & will not impact your application. When we receive your application, we’ll get back to you about the next steps.

Netlify is an Equal Opportunity Employer. We are devoted to building a team of people with diverse backgrounds and lifestyles. We believe that the unique contributions of all Netlifolks are the driver of our success. We are all responsible for bringing on people from all walks of life. Driving equality empowers our team, enables us to innovate, and helps us maintain a more inclusive environment. We don’t discriminate against employees or applicants based on gender identity or expression, sexual orientation, religion, age, race, military/veteran status, citizenship, pregnancy status, or any other differences. If we can do anything to provide a better interview experience, e.g. accommodating a disability, please let us know.

Job tags: AWS C GCP Golang High availability Kafka Kubernetes Linux MongoDB Python REST
Job region(s): North America Remote/Anywhere