Site Reliability Engineer

Vancouver, British Columbia, Canada

Bananatag

Posted 1 week ago

What if everyone got on the same page?

This is something we ask ourselves every day at Bananatag: what if everyone got on the same page, within organizations of all sizes? We believe great communication is at the heart of every business, and we help organizations improve their communication to increase employee engagement!

Who we are

Bananatag is an internal communication platform that allows employees to create a beautifully designed message, segment and distribute it across multiple communication channels such as email, Slack, and enterprise social networks, and then measure the overall impact of the messaging. Our customers include some of the most recognizable brands in the world.

Requirements

We are looking for a Site Reliability Engineer to join our growing product team. This is a unique opportunity to shape the growth of Bananatag and an entire industry by transforming Bananatag into the one-stop shop for internal communications within large enterprise companies.

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. As we have scaled our product and the team over the past few years, we have realized that we need to hire software developers with a strong focus on SRE to fill this gap. Since we are not yet large enough to have dedicated SRE roles, you should expect, depending on our workload, to spend time working closely with the product teams and taking on tasks for them.

As a Site Reliability Engineer, you will help us monitor, build, iterate on and improve our infrastructure systems. In the first 3 to 6 months, you are expected to learn and contribute to our infrastructure as code, as that project needs the most immediate attention. You will also help us adopt best practices, own the incident response process and guide the rest of the engineering team. We look for self-motivated leaders who have the itch to learn, grow and challenge the status quo. You will help us plan our technology roadmap, launch reliable products and services and play a key role in improving our in-house systems. You’ll work closely with leadership and the engineering team as a whole to research and apply the latest and greatest technology to our pipeline and infrastructure.

Bananatag has a positive, diverse, and supportive culture—we look for people who are curious, inventive, and work to be a little better every single day. In our work together we aim to be smart, humble, hardworking and, above all, collaborative. If this sounds like a good fit for you, drop us a line!

What you will be doing

For the organization:

  • Contributing to our infrastructure as code (we use AWS CDK with TypeScript) to bring it to parity with the current systems, and educating the teams and distributing that knowledge so that we can federate role-based programmatic access to our infrastructure while preventing accidental mutations.
  • Helping engineering teams plan effective infrastructure for our projects to deploy onto, and ensuring that projects are scalable, resilient, and reliable in support of growing products.
  • Supporting engineering teams with their infrastructure resources through system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Iterating on internal processes to improve our ability to ship fast while maintaining high quality systems that we can depend on.
  • Enhancing tools and automation to fill the gaps in our current systems as well as building entirely new ones as we face bigger and more complex challenges.
  • Taking on-call shifts and providing additional support during incidents as needed.
  • Supporting the engineering team with its incident response strategy through policies and storybook.
  • Owning the post mortem process and being accountable for in-depth root cause analysis of all failures to ensure we are always improving.

Skills needed

  • At least 3 years of experience working in Site Reliability Engineering, Software Engineering (backend focused), Ops/DevOps engineering or a similar field.
  • Valid AWS certification (Associate or Professional level) or willingness to certify within the first 6 months of the job.
  • You know how software applications work at a high level and can leverage cloud resources to design scalable systems.
  • You are comfortable coding in at least one general-purpose programming language (bonus points for TypeScript experience).
  • You have experience with Linux and the Unix shell (bash).
  • You have cloud native application experience (AWS, Azure, ...) and are familiar with orchestration systems like Docker Swarm, Kubernetes or Fargate.
  • You have some experience using infrastructure as code to build and test automation for infrastructure provisioning, with tools such as Chef, Ansible, Puppet, Terraform or AWS CDK (CloudFormation).
  • You have experience building and working on deployment systems and CI/CD.
  • You have experience collecting and processing metrics from tools such as Prometheus, Datadog or New Relic, and can walk teams through setting up SLO and SLI targets.
  • You are comfortable responding to production incidents: you can fight fires with a calm and level head, escalate to and engage other on-call engineers as needed, and use post mortems to apply lessons learned.
  • You are comfortable diving into an unfamiliar system and finding your way around.
  • You have strong written, verbal and presentation skills and you possess a strong ability to collaborate with cross-functional teams and build solid working relationships with everyone in the organization.
  • You know how to prioritize the most impactful tasks on the fly and you believe in processes and the power of planning.

The interview process will consist of some coding (in the language of your choice) and bash scripting, scenario-based problem solving and behavioral questions.

Benefits


  • Competitive salaries with regular compensation reviews plus company stock options.
  • Comprehensive extended benefits package (health, dental, and vision).
  • Employee Assistance Program with Babylon by Telus (local doctors, therapists, and dietitians).
  • RSP program.
  • 4 weeks of vacation.
  • Flexible remote working policy.

Bonus Benefits:

  • A healthy and supportive environment for lifelong learning.
  • Budget for extracurricular learning like going to conferences or taking classes as well as weekly learning and knowledge sharing sessions.
  • Authority, accountability, and autonomy to succeed at your own pace. Not everyone is a rocket ship, and not everyone needs to be a rocket ship. Sometimes a rock-solid launchpad is just as important.
  • A team that embraces knowledge sharing and wearing different hats. There are no lordships and fiefdoms here.
  • A safe work environment where everyone can feel comfortable being the best version of themselves. Our teams are made of a diverse group of people, and we're always looking to increase that diversity!
  • Leaders who act as thought partners and work through problems with you, rather than micromanagers who hand you solutions.
  • A team that embraces change from large processes and methodologies to technologies and infrastructure.
  • A financially solid organization with a strong vision and clear opportunities for both the company's growth and your career growth.
Job tags: Ansible AWS Azure Bash CD Chef CI CloudFormation Docker Kubernetes Linux Prometheus Puppet Reliability engineering REST Terraform Unix