Site Reliability Engineer (Service Mesh)

Redmond, WA, United States

Full Time
SpaceX

Posted 1 week ago

SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars.

SITE RELIABILITY ENGINEER (SERVICE MESH)  

SpaceX is looking for an experienced Site Reliability Engineer to operate and scale custom-built mission-critical products for engineering, test, and satellite fleet operations. Examples of these products include vehicle command interfaces, automated data analysis systems, continuous integration systems for satellite and simulation software, test infrastructure, vehicle communications proxies, and vehicle configuration sign-off tools, among others.

Specifically, the Starlink SRE team is responsible for implementing and scaling the Kubernetes platform, including the Istio service mesh, to support the ground control plane for Starlink. The team also works with the software engineers to ensure that the software being deployed is reliable and has a stable platform on which to run. This SRE hire will bring Istio experience to the table, with the ultimate goal of commoditizing Istio across the Starlink and SpaceX enterprise while disseminating knowledge and determining best practices.

We have no shortage of hard problems and challenges. The ideal candidate will be flexible, possess broad skills across product operations and software development, and flourish in a fast-paced and challenging environment. He or she should be a self-motivated self-starter with the ingenuity to excel in this position.

RESPONSIBILITIES:

  • Mature and scale the Istio service mesh solution
  • Deploy, upgrade, operate/maintain, and scale our suite of mission-critical products and services
  • Closely collaborate with Software Engineers to create highly operable and maintainable products
  • Manage the underlying infrastructure in collaboration with IT
  • Engage in and improve the whole lifecycle of services -- from inception and design, through deployment, operation and refinement
  • Practice sustainable incident response and blameless postmortems
  • Provide end-user support to engineering for products

BASIC QUALIFICATIONS:

  • 3+ years of Site Reliability or DevOps-type experience
  • 3+ years of experience with Linux operating systems
  • Experience with Terraform, Ansible, or other automation frameworks
  • Automation skills in bash, Python, and/or other languages
  • Experience with Istio or another service mesh platform

PREFERRED SKILLS AND EXPERIENCE:

  • Bachelor's degree in computer science, information systems/IT or engineering
  • 5+ years of Systems Administration, Site Reliability Engineering, or DevOps experience
  • 3+ years of experience with Python and Python-based development frameworks
  • Strong understanding of Docker, Vagrant, and Kubernetes, or similar technologies
  • Strong understanding of virtualization and hypervisor technologies
  • Understanding of databases and data modeling
  • Experience with automatically managing dozens or hundreds of servers
  • Focus on performance bottlenecks and performance improvement techniques
  • Experience with workflow and issue management tools such as JIRA
  • Strong networking knowledge of TCP/IP
  • Must be comfortable working with mission-critical and sensitive systems, with a sense of urgency appropriate to the responsibilities
  • Excellent communication skills, with the ability to communicate with customers, peers, management, etc. in both formal and informal situations

ITAR REQUIREMENTS:

  • To conform to U.S. Government space technology export regulations, applicant must be a U.S. citizen, lawful permanent resident of the U.S., protected individual as defined by 8 U.S.C. 1324b(a)(3), or eligible to obtain the required authorizations from the U.S. Department of State.

SpaceX is an Equal Opportunity Employer; employment with SpaceX is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability or any other legally protected status.

Applicants wishing to view a copy of SpaceX’s Affirmative Action Plan for veterans and individuals with disabilities, or applicants requiring reasonable accommodation to the application/interview process should notify the Human Resources Department at (310) 363-6000.

Job tags: Ansible Bash C Docker HTML Jira Kubernetes Linux Python Reliability engineering Terraform Virtualization