Cloud Infrastructure Engineer, Monitoring and Observability
Remote - St. Louis Park, Minnesota, United States
We are looking for a Cloud Infrastructure Engineer for our Monitoring and Observability efforts. This role is responsible for leading development, deployment, and operation of systems for collecting/storing/visualizing metrics, distributed logging, monitoring, alerting, and tracing. You will work closely with our engineering teams to design and build the next generation of systems monitoring infrastructure to deliver availability, performance, and efficiency at scale.
This position will lead technical efforts for implementing a platform-wide monitoring and observability program to be leveraged by the legacy platform and new microservices as they are stood up. This role is critical at this inflection point in our growth, both for quickly identifying and remediating problems and for establishing and reporting on key success metrics for platform health and growth. The work you do will empower tens of thousands of financial services leaders to help their customers make critical financial decisions that shape their lives.
Exciting work you’ll do:
- Perform deep dives into system and latent reliability issues, service performance, and capacity modeling; work across the organization to produce and roll out fixes.
- Identify opportunities to improve automation; scope and create automation for deployment, management, and visibility of our services.
- Analyze complex problems in the application space relating to resilience.
- Create operational tooling for monitoring and self-healing infrastructures.
- Write libraries and APIs that provide a simple, unified interface to other developers when they use our monitoring, logging and event processing systems.
- Help guide architectural decisions and direct solutions that enhance our product reliability.
- Partner with development to identify anti-patterns and create fallback experiences for critical scenarios.
What we look for:
- Experience building and deploying monitoring and observability systems.
- Interest in learning/vetting new technologies.
- 2+ years of experience with enterprise-level infrastructure design, implementation, and support.
- 2+ years of experience working in an AWS environment.
- 2+ years of experience with application monitoring tools.
- Experience as a developer, with comfort in PHP and Python.
- A degree in computer science, software engineering, or a related field, or equivalent work experience.
- Systematic problem-solving approach coupled with a strong sense of ownership and drive.
- A passion for creating performant, reliable, and scalable applications.
- Solid track record of building relationships and collaborating at all levels of the organization.
- Ability to work independently paired with a desire to learn and grow.
- Ability to thrive in a high-growth culture: forward-thinking, resilient, adaptable, and curious.
- Strong verbal and written communication skills.
- Strong analytical and problem-solving skills.
Prefer experience with tools such as: InfluxDB, Datadog, New Relic, Prometheus, Grafana
We believe that living a balanced life leads to more creativity and productivity. Here’s what you and your family get for helping us build what’s next.
- Medical, Dental & Vision Coverage
- Prescription Drug Coverage
- Health Advocate Program
- Flexible Time Off Program
- Health Savings Account
- Flexible Spending Accounts
- Disability Protection
- Life & Voluntary Life Coverage
- Voluntary Benefits
- Paid Parental Leave
- Pet Insurance
- 401(k) Retirement Savings Plan
- Employee Referral Bonus
Total Expert is a high-growth, venture-backed SaaS company that provides the Experience Platform for the financial services industry. Hundreds of banks, credit unions, and lenders throughout the U.S. use our Experience Platform to create customers for life. We enable our customers to build more human connections by creating relevant, engaging, and meaningful customer experiences.
At Total Expert, we strive for excellence, innovation, and customer success in everything we do. We are determined to reimagine the way people and technology work together so that we can allow our customers to build more meaningful, human connections with their customers.
Simply put, we believe that we are all a part of building something awesome and are committed to creating a world-class team and culture to do it.