Site Reliability Engineer
Remote - Mexico City, Mexico City, Mexico
Zipdev is looking to add a remote Site Reliability Engineer to its team of LatAm developers! As a Site Reliability Engineer, you will work as an integrated member of product teams to help build, deploy, and reliably monitor cloud services. You will work on complex software development projects to keep important, revenue-critical services up. You will actively develop code and build frameworks to monitor the services deployed in production, driving reliability and performance at massive scale.
We're looking for a talented Site Reliability Engineer who can work under minimal supervision, define test procedures, and collaborate closely with Developers, Designers, Customer Support, and Engineering Leadership.
What you will do
- Build systems and infrastructure to monitor complex, large-scale distributed systems
- Identify stability/performance issues and collaborate with developers to triage critical issues in production systems.
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
- Devise ways to actively monitor system throughput, capacity and reliability.
- Debug complex systems and evolve a running environment without downtime.
- Engage in service capacity planning and demand forecasting, software performance analysis and system tuning
- Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization
Requirements
- Bachelor’s degree in Computer Science or equivalent work experience as a System Administrator with programming skills.
- Fundamental knowledge of technologies across a broad range of disciplines: virtualization, storage, networking, servers, and security
- Understanding of systems and application design, including the operational trade-offs of various designs.
- Demonstrable knowledge of Unix, TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.
- Experience in analyzing logs and troubleshooting large-scale distributed systems.
- Excellent organization, time management, and communication skills
Nice to have
- Experience with instrumenting and monitoring production systems (ELK stack, Zabbix, Nagios, StatsD/Graphite, APM, etc.)
- Experience with AWS infrastructure (EC2, S3, VPC, Security Groups, RDS) and related services
- A working understanding of Docker, Vagrant, Ansible/Chef/Puppet.
- Experience with one or more general-purpose programming/scripting languages, including but not limited to Python, Bash, Perl, or Go.
Benefits
- Remote work, Monday to Friday, 40 hours a week (no weekends)
- Vacation: 10 business days a year
- Holidays: 5 National Holidays a year
- Company Holidays: 5 Company Holidays a year (Christmas Eve, Christmas Day, New Year's Eve, New Year's Day, Zipdev Day)
- Major Medical Insurance
- Active Lifestyle/Gym Reimbursement
- Quarterly Home Office Reimbursement
- Performance-based Bonus
- Continuous Education Bonus
- Access to Training and Professional Development Platforms
- Did we mention it's REMOTE?