Staff Site Reliability Engineer
SecurityScorecard is an industry-leading cybersecurity company backed by Google, Sequoia, and Riverwood. Our mission is to make the world a safer place. We measure your and your vendors' cyber-health by assigning a security rating of A through F based on outside-in, non-intrusive data. Our comprehensive security ratings, advanced data analytics, and actionable insights discover third-party vulnerabilities and security gaps in real time. Headquartered in NYC, we have more than 200 employees globally, have raised over $110M USD, serve 1,000+ enterprise customers, and rate 1.5 million companies. We have created a new category of enterprise software, and our culture has helped us be recognized as one of the 10 hottest SaaS startups in NY for two years in a row. Our vision is to create a new language for companies and their partners to communicate, understand, and improve each other’s security posture.
About the Role
We are seeking a Site Reliability Engineer (SRE) with a knack for solving complex problems. You will combine your acumen for product operations and engineering to help build high-quality solutions that elevate our platform.
More specifically, you will be responsible for the availability, latency, performance, efficiency, monitoring, emergency response, and infrastructure planning of SecurityScorecard’s platform. On a daily basis, you will resolve problems as they arise and then design infrastructure and automation to eliminate or iteratively reduce such incidents going forward. Every reactive fix you make should motivate you to create key infrastructure improvements.
What you will do
- Work with the rest of the team to improve the reliability of the product and its individual services
- Identify and automate solutions to improve the performance, monitoring, scalability, and overall stability of our platform
- Troubleshoot and identify service level issues
- Collaborate with engineering teams to design, maintain, and support backend applications
- Service operational tickets that deal with existing issues, and identify places where automation could be used to limit/eliminate future incidents
- Participate in capacity/infrastructure planning and implementation of small and large-scale distributed systems
- Research new technology and methodologies for improvement projects
- Be on-call periodically
What we are looking for
- 6+ years of overall experience in software engineering, systems administration, DevOps, SRE, or related disciplines
- 5 years of experience managing enterprise applications in AWS
- Experience with Continuous Integration/Deployment pipelines (e.g., Jenkins)
- Experience with logging and monitoring solutions and techniques (e.g., Datadog, Splunk, Logstash)
- Experience with Containers (Docker, ECS)
- Experience with Configuration Management (Terraform, Ansible) preferred
- Experience using Golang or Python
- Strong troubleshooting and analytical skills
- Working knowledge of industry standard network protocols and services
We offer a competitive salary, stock options, and a comprehensive benefits package that includes health and dental insurance, unlimited PTO, parental leave, tuition reimbursements, and much more!
SecurityScorecard embraces diversity. We believe that our team is strengthened through hiring and retaining employees with diverse backgrounds, skillsets, ideas, and perspectives. We make hiring decisions based upon merit and do not discriminate based on race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.