Principal Site Reliability Engineer

Remote, US

Applications have closed
Segment.io, Inc.

Posted 1 month ago

At Segment, we believe companies should be able to send their data wherever they want, whenever they want, with no fuss. Unfortunately, most product managers, analysts, and marketers spend too much time searching for the data they need, while engineers are stuck integrating the tools they want to use. Segment standardizes and streamlines data infrastructure with a single platform that collects, unifies, and sends data to hundreds of business tools with the flip of a switch. That way, our customers can focus on building amazing products and personalized messages for their customers, letting us take care of the complexities of processing their customer data reliably at scale. We’re in the running to power the entire customer data ecosystem, and we need the best people to take the market.

The Infrastructure Engineering group is central to Segment’s Platform strategy. The ecosystem of tools that your team creates and supports is the foundation for the services built by Product teams. To maintain our leadership position in the customer engagement space, we must continue to build innovative services that support our developers in seamlessly delivering value to customers. You will partner with some of the brightest minds in the industry to push the boundaries of web-scale service delivery.

As a member of the Site Reliability Engineering (SRE) team, you’ll help to empower our entire R&D organization. Alongside a diverse, distributed Infrastructure group, you’ll participate in building the next iteration of our service platform, focusing on the reliability, operability, observability, flexibility, and cost-effectiveness of our production infrastructure.

What you’ll do

  • Write software to build, maintain, automate, and introspect our production systems
  • Mentor teams to operate and maintain their services reliably and cost-effectively
  • Build the next version of Segment’s Service Platform (focused on deployment and observability) to support teams in deploying hundreds of services across a multi-region cloud environment
  • Take proactive steps to improve our availability, reliability, and efficiency
  • Help drive Segment’s position as a market leader in open source software such as kafka-go, chamber, and kubeapply
  • Participate in an on-call rotation to support our business-critical infrastructure

What you’ll bring

  • Minimum of 5 years of experience as a Software Engineer, Systems Administrator, Operations Engineer, Site Reliability Engineer, or in a similar role
  • A systematic problem-solving approach, coupled with good communication skills, sense of ownership, and drive
  • Experience operating large-scale, distributed systems on top of cloud infrastructure such as Amazon Web Services (AWS) or Google Cloud Platform (GCP)
  • Experience programming in one or more of the following: Go, Python, Node.js, Bash, or similar languages
  • A proven grasp of Linux systems administration and programming concepts

We’re especially excited about candidates who:

  • Have hands-on experience with container orchestration frameworks (e.g. Kubernetes, EKS, ECS)
  • Have hands-on experience in operating event-based systems (e.g. Kafka) capable of processing millions of events per second and petabytes of data each month
  • Possess a broad understanding of Linux kernel internals and networking protocols
  • Are proficient in metrics tooling such as Datadog and Prometheus
  • Have led teams or large projects, or have owned an important system

We encourage you to apply if this role excites you, even if you think you may not meet all of the qualifications. At Segment we live by four values: karma, drive, tribe, and focus. We are always looking for outstanding individuals with diverse backgrounds and perspectives who embody these values. To learn more about life at Segment and our commitment to diversity, equity, and inclusion, visit our LinkedIn page. We’re excited to meet you!

Segment is an equal opportunity employer. We believe that everyone should receive equal consideration and treatment in all terms and conditions of employment regardless of sex, gender (including pregnancy, childbirth, breastfeeding or related medical conditions), sexual orientation, gender identity, gender expression, race, color, religion, creed, national origin, ancestry, age (over 40), physical disability, mental disability, medical condition, genetic information, marital status, domestic partner status, military or veteran status, height, weight, AIDS/HIV status, and any other protected category under federal, state or local law. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Job tags: AWS Bash GCP Go JS Kafka Kubernetes Linux Node Node.js Open source Prometheus Python Reliability engineering
Job region(s): North America Remote/Anywhere