Senior Reliability Engineer

Park Offices Drive, Durham, NC 27709


Posted 3 weeks ago

We’re Onna: a passionate, hard-working team solving one of the biggest challenges facing today’s businesses: knowledge fragmentation. We’ve built the world’s first Knowledge Integration Platform to make enterprise knowledge more accessible, useful, and private. We help some of the world’s leading companies, such as Facebook, Slack, Electronic Arts, and Fitbit, bring together fragmented knowledge from today’s most popular workplace applications. With our platform, teams can unify, protect, search, automate, and build on top of their organization’s proprietary knowledge, allowing them to leverage it in new and intuitive ways.

Onna is in an active phase of growth, and we are hiring across our offices in New York City, Barcelona, Raleigh-Durham, San Francisco, London, and Toulouse. We are thrilled to welcome new team members from across the world to a work environment that is lighthearted, fast-paced, exciting, and flexible. We provide our people with the tools, resources, and outstanding leadership to take their careers to the next level. If this sounds like an exciting opportunity, we want to hear from you!

We’re currently hiring a Senior Reliability Engineer to join our Ops team in our Raleigh-Durham office. The ideal candidate is an enthusiastic multi-tasker who feels comfortable in a challenging, fast-growing environment.

You will be responsible for keeping our production infrastructure and SaaS platforms finely tuned for high availability, reliability, and resilience, while ensuring that the platform scales seamlessly and conforms to the highest security standards.

You’ll also focus on continuously improving our monitoring, alerting, and logging platforms, helping to define the metrics and criteria used to measure our Service Level Objectives (SLOs) and overall service quality.

Our product runs on GCP, AWS, and on-premises infrastructure, and it’s deployed on Kubernetes. We use best-in-class products such as Elasticsearch, Kafka, Scylla, and Postgres. We love open source products and contributing to them.

If you’re looking for an invigorating workplace where you can help us find the most efficient ways to process terabytes of data every day, and you aren’t afraid of a work-hard, play-hard mindset, we want to meet you!

If you have a penchant for operational excellence and love paying attention to the small details that matter, join us and have a direct impact on a product that major clients use to solve global knowledge-management problems. Come join us to make a difference!

What you’ll do:   

  • Maintain the day-to-day requirements of our Kubernetes-based infrastructure and persistence layers.
  • Maintain and improve our monitoring, alerting and logging systems.
  • Analyze performance metrics and implement solutions to optimize resource usage.
  • Design and implement new large-scale systems and automation.
  • Work closely with the development and DevOps teams in an Agile environment
  • Provide technical support and guidance to our internal teams and clients
  • Work independently with minimal supervision.
  • Take ownership of projects and propose innovative solutions to maximize Security, Availability, Reliability and Scalability of the platform.
  • Be open to being on call to resolve off-hours incidents in production systems.

Who you are:

  • An SRE or SysAdmin with 2 years of experience maintaining large-scale systems in the cloud.
  • Experience with cloud and on-premises solutions, Kubernetes, relational and/or NoSQL databases, and monitoring and automation tools
    • (our tech stack: GCP, AWS, GKE, EKS, Kubernetes, Istio/Linkerd, Elasticsearch, Kafka, Cassandra/Scylla, Postgres, Prometheus, Grafana, Kibana, Terraform, Puppet/Ansible, Docker, Linux)
  • Solid understanding of IP networking, load balancing, firewalls, and network security.
  • Understanding of microservice architectures.
  • Strong skills in debugging and troubleshooting problems in production environments.
  • Knowledge of high-level programming languages (Python preferred)
  • A high level of responsibility when maintaining critical platforms
  • Strong written and verbal communication skills

Benefits we offer:

  • Comprehensive medical, vision, and dental coverage
  • 401(k) with matching contribution
  • Flexible vacation and PTO policies
  • Monthly gym membership stipend
  • Professional development stipend
  • Monthly group activities
  • Commuter perks (location-specific)
  • Dog-friendly office (location-specific)

About the product & Onna’s funding: 

Our growing list of integrations includes G Suite, Slack, Microsoft 365, Box, Dropbox, and more. Our open API allows us to integrate with any cloud-based or on-premises platform, for optimal control and visibility into your most critical knowledge. Once connected to Onna, the potential use cases are limitless: Information Governance, eDiscovery, Compliance, and Knowledge Management are just a few of the ways Onna can empower organizations and their employees.

In 2019, we closed an $11M Series A led by Dawn Capital with the participation of our integration partners Slack Fund and Dropbox, and in 2020 we closed a $27M Series B led by Atomico with participation from Glynn Capital, as well as follow-up investments from Dawn Capital, Nauta Capital, and Slack Fund. 


Onna is an equal opportunity employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. You must have authorization to work in the location where the position is posted.

All of your information will be kept confidential according to EEO guidelines. To view our privacy policy, please visit here.
