Site Reliability Engineering Manager

São Paulo

Applications have closed
Loadsmart, Inc.

Posted 1 month ago

Who we are: Loadsmart aims to move more with less. We combine great people and innovative technology to more efficiently move freight throughout North America. Our focus is on designing and building the best tools for our team and our customers, using machine learning models to connect freight with trucks. We automate with algorithms and scale with integrations to better match supply and demand. In doing this, we reduce wasted fuel and lost time, cutting out empty miles for motor carriers and providing cost savings and instant booking for shippers.
Where we are: Loadsmart was founded in New York and is currently headquartered in Chicago, IL. Our teams operate remotely from different parts of the United States as well as in several locations across Latin America.
Who you are: You believe in game-changing innovations and are excited about reimagining a 700-billion-dollar industry. You are an analytical person with a solid understanding of business and the role that analysis plays in a company's growth.
The role: We are looking for a Site Reliability Engineering Manager to work remotely, based in Brazil or elsewhere in LATAM. You should have experience and a proven ability to analyze, propose, and implement safer systems and processes. You will work closely with engineering squads across platform engineering to ensure our applications are reliable.

Key Responsibilities:

  • Be responsible for the design, deployment, and operation of Loadsmart's critical systems while balancing reliability, cost, and agility;
  • Partner with and support all members of our creative, tight-knit development team;
  • Apply your intuition for creative problem solving and your contagious, positive passion to solve challenging and exciting problems and inspire those around you;
  • Move seamlessly from high-altitude thinking to the tangible and practical in support of our lean software development team;
  • Collect metrics, understand how each metric correlates to the business, and inspire the team to do the same;
  • Troubleshoot and perform root-cause analysis of system operation issues;
  • Be accountable for the platform Service Level Agreements and Objectives;
  • Be available for application support during off-hours, as needed;
  • Take ownership of software infrastructure projects;
  • Seek, give, and receive constructive feedback with teammates through code and specification reviews;
  • Define and manage KPIs and other measurements to indicate the health of reliability-related programs;
  • Work directly with engineers and product managers to influence the product requirements.

Qualifications:

  • 10+ years of experience in Cloud Computing, SRE/DevOps and software release;
  • Excellent communication. You’re very comfortable communicating in English (both written and spoken), as you will work in an international team with native and non-native English speakers;
  • Detail-oriented, with demonstrated initiative and high self-motivation;
  • A good understanding of how things work under the hood from a software engineering and DevOps point of view;
  • Familiarity with microservices architecture and container orchestration, preferably with Kubernetes;
  • Deep knowledge of how modern networking and operating systems work;
  • Extremely comfortable managing Linux servers (either RedHat or Debian-based distros);
  • Experience working with AWS, cloud environments, containers, Kubernetes, and Docker in a DevOps engineering environment, including ownership of tests and CI/CD pipelines;
  • Experience with automation and provisioners like Terraform, Ansible or Chef;
  • Troubleshooting and system engineering exposure in UNIX/Linux production environments;
  • Experience with monitoring, alerting and incident management;
  • Good coding expertise in languages like Python, Go, Ruby or Java;
  • Experience automating tasks with scripting languages such as Python, Bash, and JavaScript;
  • Postgres and DBA experience/exposure is a plus;
  • Experience with chaos engineering, scale testing and/or disaster recovery is a plus;
  • Experience with Big Data and streaming technologies is a plus;
  • BS or MS degree in Computer Science, Engineering or related field or equivalent experience.
We are an international company, so we only accept resumes in English.
At Loadsmart, we believe our biggest asset is our people. We are proud to be an equal opportunity employer, hiring and developing individuals from diverse backgrounds and experiences to add to our collaborative culture. Loadsmart treats all candidates and employees with respect and does not discriminate in our recruiting, hiring, and promoting processes, including on the basis of race, color, religion, sex, age, sexual orientation, gender identity and/or expression, national origin, veteran status, or disability.
Job tags: Ansible AWS Bash CD Chef CI Debian Docker Go Java JavaScript Kubernetes Linux Postgres Python RedHat Reliability engineering Ruby Streaming Terraform Unix
Job region(s): South America