Senior Site Reliability Engineer
About the team
Engineers on our team work with global-scale infrastructure that has a direct impact on millions of players. To guarantee the best possible experience, we run several Kubernetes clusters spread around the world and connected to each other. We are at the cutting edge of open-source infrastructure technology: we adopted Kubernetes in production shortly after the project was launched, and today we use technologies such as eBPF and Cilium in our network stack.
We handle billions of log entries daily and run hundreds of nodes and thousands of containers to serve more than 1 million requests per minute. We know this number will only grow, and we're looking for engineers who can help with the challenges of provisioning and operating infrastructure at large scale.
About the role
Wildlife Studios is searching for infrastructure/site reliability engineers to join our team. We seek an engineer with solid knowledge of programming, networking and operating systems. Since we are always looking for new tools and technologies that better solve our problems, we value professionals who enjoy learning new things and who are autonomous and proactive in proposing and implementing their ideas.
We'll need you to understand our system flows, diagnose problems in production, identify opportunities for improvement and automation, and ensure that we have the infrastructure needed to create the best games in the world.
More about you
- Player focused. We are player oriented, and infrastructure has a great impact on their experience. You have empathy for our players and focus on ensuring they have an amazing experience. You aim for top-level infrastructure that delivers the highest possible availability.
- Automation is key to scaling. We look for engineers with a track record of designing and executing automation projects that eliminate manual, repetitive tasks.
- Calm and pragmatic. When everything seems to be falling apart around you, you keep calm and have a plan.
- Bleeding edge. You are curious and like to study new technologies, test new solutions and measure the impact of changes. We want to ensure we are using the best possible stack.
What you'll do
- Develop, monitor and optimize infrastructure clusters (Kubernetes, Elasticsearch, MongoDB, Kafka...).
- Define monitoring and observability patterns.
- Troubleshoot and manage incidents in production.
- Automate and improve infrastructure provisioning (Infrastructure as Code).
What you'll need
- Bachelor's degree in Computer Science, Computer Engineering or equivalent experience.
- Linux knowledge. You should be able to discuss in detail what happens under the hood (OS, kernel, networking).
- Solid knowledge of at least one programming language. We work mostly with Go and Python.
- Experience with large-scale production systems and technologies.
- Experience with Kubernetes.
- Experience with monitoring systems (e.g., Datadog, StatsD, Grafana).
- Experience with infrastructure as code tools (e.g., Ansible, Terraform).
- Experience with messaging systems such as Kafka and Emqtt.
- Experience with database management (Postgres, MongoDB, Cassandra, Redis, Elasticsearch).
- Experience with CI/CD pipelines (e.g., Jenkins, Travis).
We welcome people from all backgrounds who seek the opportunity to help build the best gaming company, where everyone thrives.