Senior Software Engineer - Site Reliability
Bangsar South, Kuala Lumpur, Malaysia
Our vision is to become the Skyscanner for online shopping: the place where South East Asia’s 600M people start their e-commerce journey, and where consumers discover the latest trends, compare prices and get the best deal. With more than 500 million offers on our platform, 15 million monthly shoppers, and a reputation as an independent authority on the regional e-commerce sector, iPrice is at the centre of South East Asia’s e-commerce revolution.
Why join us?
Surround yourself with people who are ambitious, passionate, dynamic and constantly looking for ways to improve themselves. Join us to help drive iPrice’s continued success and monthly double-digit growth, and be part of a diverse environment that combines the talent and insights of the 7 countries we operate in!
- Organization backed by world-class investors like Line, Ventura and 500 Start-ups
- Management team comprised of international managers with experience from world-class organizations
- Performance-focused culture championing ownership, transparency and open communication
- Diverse colleagues (25 nationalities and counting)
- Coaching and world-class training with access to key experts in your field
We are looking for Site Reliability Engineers (SREs) to keep all user-facing services and other production systems running smoothly. SREs ensure that iPrice's services have reliability and uptime appropriate to users' needs, along with a fast rate of improvement. SREs also keep an ever-watchful eye on our systems' capacity and performance.
You’ll have the opportunity to manage the complex challenges of scale which are unique to iPrice, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.
If you are a blend of pragmatic operator and software craftsperson who applies sound engineering principles, operational discipline, and mature automation to our environments and codebase, then you are the right person!
- Engage in and improve the whole lifecycle of services - from inception and design, through to deployment, operation and refinement.
- Collaborate with engineering teams on their infrastructure needs, and advise them throughout the development lifecycle.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health, within our Service Level Objectives.
- Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless post-mortems.
- Debug production issues across services, databases and levels of the stack.
- Design, develop and manage monitoring tools that provide performance dashboards and alerts, and that collect the data required to proactively identify issues and recommend improvements.
- A Bachelor's Degree/Diploma in Computer Science, Information Technology or a related subject.
- Minimum of 7 years of experience in provisioning environments, deploying applications, and maintaining infrastructure.
- Professional experience using Python, Go, or Ruby.
- Strong familiarity with deployment automation/configuration management tools like Chef, Ansible, Puppet, or Terraform.
- Possess experience with cloud environments – AWS, GCP or Azure.
- Have extensive experience building scalable platforms leveraging containers in a production environment.
- Great to have: Operated distributed data storage systems at scale, especially Elasticsearch and SQL Azure.
- Have experience with logging and telemetry services.
- Solid knowledge of continuous integration, continuous delivery, automated testing and all phases of the software development lifecycle.
- Experience working in an agile, multi-cultural environment across several Scrum teams at the same time.
- A Kaizen mindset: a spirit of continuous improvement on a personal level, and staying up to date with the latest technology trends professionally.
- Ability to identify problems before they happen and implement solutions that detect and prevent outages.
- Expertise in designing, analysing and troubleshooting large-scale distributed systems.
- Ability to debug, optimize code and automate routine tasks.
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
- Understanding of CI/CD principles, Linux fundamentals, networking concepts and IP protocols.
- Comprehensive healthcare including outpatient/in-patient benefits for you and your family
- Work permit / visa coverage for expatriates to work in Malaysia
- Team building activities including strategy weekends, team getaways and departmental activities
- Extra perks including claimable Grab rides; dinner if you’re in the office past 7 pm; a stocked pantry; a subsidised vending machine; beers and snacks every Friday; and plenty of iPrice events