Senior Site Reliability Engineer, Gradient Platform (Remote, US)

New York City

Full Time Senior-level / Expert

About Paperspace
Paperspace builds tools and infrastructure to make accelerated computing simple and accessible.
Paperspace is backed by leading investors including Y Combinator, Initialized Capital, Battery Ventures, and Intel Capital.
The Role
The Gradient Platform team is responsible for the underlying infrastructure that powers Gradient, our MLOps platform. The team manages products like Notebooks, Workflows, and Deployments as well as platform entities such as compute scheduling, storage providers, datasets, secrets, models, and metrics. The team’s goal is to provide a strong foundation so that application teams can efficiently develop MLOps products.
What we're looking for
• Strong interest in development platforms, MLOps, CI/CD, infrastructure, or making products for technical teams
• 6+ years of relevant industry experience in a fast-paced, high-growth tech environment managing and scaling internal platforms using JavaScript, TypeScript, or Go
• Experience with systems, Linux, networking, storage, monitoring, and alerting
What you'll be doing
• Work with Python and Go
• Proactively address reliability, scalability, and security concerns by adding alerts, monitoring, and new processes with high autonomy
• Manage our Kubernetes clusters that provide services for all of Gradient
• Manage CI/CD automation for Gradient services
• Work with the Cloud Platform team to triage issues related to Gradient and cloud services involving storage, networking, and cache services
• Participate in an on-call rotation
• Collaborate with other engineers to find elegant architectures and solutions
Technical problems the team has worked on
• Implemented a Prometheus-compatible metrics store for monitoring and alerting
• Created a Docker registry pull-through cache to lower outbound traffic
• Implemented monorepo CI/CD integration to deploy services and Helm charts
• Scaled the Kubernetes API to support multi-tenancy and high traffic growth
Our Team
Paperspace values technical excellence in an open and inclusive environment. The team is primarily based in NYC, but we have a strong remote/hybrid team. Communication is paramount and mutual respect is at the core of our collaborative work environment. We are also committed to building a team that represents a variety of backgrounds, perspectives, and skills. We believe creating a more diverse team directly impacts our ability to collaborate effectively, build a better community, and produce better products.
• Multiple health care insurance options with premium plans in addition to vision and dental insurance plans
• 401(k) plan with employer matching
• Commuter benefits with a contribution from the company
• Responsible time off policy
• Generous and flexible parental leave
• Fitness & wellness benefit
• Remote-friendly and hybrid office environment for New York team members
We are an equal opportunity employer that values and welcomes diversity. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
Job region(s): Remote/Anywhere North America