GNC Site Reliability Engineer
Hawthorne, CA, United States
SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars.
SpaceX is looking for a site reliability engineer to operate and scale custom-built, mission-critical products for Guidance, Navigation, and Control (GNC). The GNC team performs trajectory design and vehicle simulation and participates in recurring mission-critical launch operations. This position will work with the GNC team to maintain and improve a set of GNC-focused tools. Examples of these products include Monte Carlo simulations on a high-performance computing cluster, automated data analysis systems, continuous integration systems for rocket and simulation software, GNC analysis infrastructure, and vehicle configuration verification tools. The ideal candidate will be flexible, possess broad skills across product operations and software development, and flourish in a fast-paced and challenging environment.
RESPONSIBILITIES:
- Deploy, upgrade, operate, maintain, and scale a suite of mission-critical GNC products and services
- Provision and maintain virtual and physical servers
- Work with the SpaceX HPC team to monitor and maintain a 4000+ thread HPC cluster
- Closely collaborate with GNC software engineers to create highly operable and maintainable products
- Add monitoring for webapps and respond to outages
- Manage the underlying computational infrastructure of GNC in collaboration with IT
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
- Make recommendations for future hardware purchases
- Practice sustainable incident response and postmortems
- Provide end-user support to GNC engineers by becoming an expert on the analysis applications, helping users troubleshoot issues and pointing them to relevant features
- Configure automated deployment pipelines for webapps
- Develop or improve GNC webapps and tools for better usability, maintainability, and robustness
- Demo and document new software changes such as operating system upgrades, shared filesystem changes, or major tool rollouts
BASIC QUALIFICATIONS:
- Bachelor’s degree in computer science, information systems/IT, computer engineering, electrical engineering, math, or another scientific discipline
- 2+ years of site reliability or DevOps experience
- 2+ years of experience with Linux operating systems
- 2+ years of professional experience with Python and Python-based development frameworks
- Experience performing automation tasks in shell scripts (e.g., Bash) or Python
- Experience with version control in Git and with continuous integration and continuous delivery concepts
PREFERRED SKILLS AND EXPERIENCE:
- 5+ years of systems administration, site reliability engineering, or DevOps experience
- 5+ years of experience with Python and Python-based development frameworks
- 5+ years of Linux experience
- Expertise with Docker, Vagrant, and Kubernetes or similar technologies
- Extensive experience with configuration management tools such as Ansible, Puppet, and Terraform
- Experience with build systems (Make, Bazel / Pants / Buck, Gradle) and package management tools (pip, npm)
- Strong understanding of virtualization and hypervisor technologies
- Understanding of databases and data modeling
- Experience with Ansible or other configuration management frameworks
- Experience with automatically managing dozens or hundreds of servers
- Strong knowledge of TCP/IP networking
- Experience scaling web applications and optimizing applications for performance
- Solid understanding of UI/UX design to provide intuitive applications
- Experience with high performance computing systems or large-scale data analysis systems
- Deep understanding of one or more modern data persistence systems
- Excellent communication skills, with the ability to communicate with customers, peers, management, and others in both formal and informal situations
- Focus on identifying performance bottlenecks and applying performance improvement techniques
- Creative and innovative problem-solving skills
- Initiative and the ability to work independently and in a team
- Must be comfortable working with mission-critical and sensitive systems, with a sense of urgency appropriate to the responsibilities
- To conform to U.S. Government space technology export regulations, including the International Traffic in Arms Regulations (ITAR), you must be a U.S. citizen, lawful permanent resident of the U.S., protected individual as defined by 8 U.S.C. 1324b(a)(3), or eligible to obtain the required authorizations from the U.S. Department of State.
SpaceX is an Equal Opportunity Employer; employment with SpaceX is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability or any other legally protected status.
Applicants wishing to view a copy of SpaceX’s Affirmative Action Plan for veterans and individuals with disabilities, or applicants requiring reasonable accommodation to the application/interview process should notify the Human Resources Department at (310) 363-6000.