Site Reliability Engineer, Opsgenie

Bengaluru, India

Applications have closed
Atlassian

Posted 1 month ago

Atlassian is continuing to hire for all open roles with all interviewing and on-boarding done virtually due to COVID-19. Everyone new to the team, along with our current staff, will temporarily work from home until it is safe to return to our offices.
Atlassian’s mission, “Unleash the potential of every team,” is the guiding light behind what we do. We have developed well-known products (Jira, Confluence, Trello, etc.) that fit into the fabric of teamwork across different types of teams, along with the processes to help every team succeed.
One of these products is OpsGenie - a modern incident management platform for operating always-on services, empowering Dev & Ops teams to plan for service disruptions and stay in control during incidents. OpsGenie centralizes alerts, notifies the right people reliably, and enables them to collaborate and take rapid action. OpsGenie also has a track record of operating at five-nines availability. This means reliability is built into our processes, systems, tools, and mindset. We live and breathe reliability.
You will be required to deeply understand technology landscapes, and evaluate the use of new technologies. You will be influential within your team and work with peers and senior leaders to define and revise the standards for operational excellence across Atlassian. You will consistently tackle abstract issues that span multiple functional areas and drive your team to push for improvements that can scale across other teams, services, and platforms. 
We'd love it if you brought a deep understanding of modern cloud infrastructure, programming expertise, operational experience, and a desire to change the status quo. We'll support you with robust backend systems, mature processes, and a motivated team with a strong desire to not f*** the customer. We're looking for an engineer who can analyze and help improve our monitoring and processes to get us to an even higher level of availability, scalability, and reliability.

On your first day, we'll expect you to have:

  • Expertise in software development, ideally in Python, Java, Go, or a similar language
  • Understanding of Linux and networking systems
  • Hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, IAM, RDS, S3, DynamoDB, Kinesis - or equivalents, e.g. in GCP)
  • Experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, etc.
  • Strong organizational and interpersonal skills

It would be great, but not mandatory, if you had:

  • A deep understanding of Observability (monitoring, logging, and tracing) best practices
  • Experience with front-end development, including React

More about our team:
Atlassian Site Reliability Engineering is a recently formed and rapidly growing group within the organization. We are in the process of building our teams, tools, and systems as part of Atlassian's mission to build the best SaaS services in the world. This is a truly exciting team to join - we are currently planning to be involved with every technical team across Atlassian. We work side by side with the product family and platform developers to maintain and improve services and performance. We live our values with a strong customer focus and possess a healthy sense of urgency. We are a heavily data-driven team, using a variety of data collection, enrichment, analytics, and visualization techniques to learn about our complex systems.
Atlassian is growing fast. Our teams, products, and services are evolving at an astonishing rate, so the SRE challenge is to grow at the right speed in the right way. Our vision includes moving to ever more automated systems, using our love of analytics and focus on metrics both to show us what is happening in the production and delivery pipelines and to drive decisions about where our pain points are and how we fix them. We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams.

More about our benefits
Whether you work in an office or a distributed team, Atlassian is highly collaborative and yes, fun! To support you at work (and play) we offer some fantastic perks: ample time off to relax and recharge, flexible working options, five paid volunteer days a year for your favourite cause, an annual allowance to support your learning & growth, unique ShipIt days, a company-paid trip after five years, and lots more.

More about Atlassian
Creating software that empowers everyone from small startups to the who’s who of tech is why we’re here. We build tools like Jira, Confluence, Bitbucket, and Trello to help teams across the world become more nimble, creative, and aligned. Collaboration is at the heart of every product we dream of at Atlassian. From Amsterdam and Austin to Sydney and San Francisco, we’re looking for people who want to write the future and who believe that we can accomplish so much more together than apart. At Atlassian, we’re committed to an environment where everyone has the autonomy and freedom to thrive, as well as the support of like-minded colleagues who are motivated by a common goal: to unleash the potential of every team.

Additional Information
We believe that the unique contributions of all Atlassians are the driver of our success. To make sure that our products and culture continue to incorporate everyone's perspectives and experience, we never discriminate on the basis of race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.
All your information will be kept confidential according to EEO guidelines.
Job tags: AWS CloudFormation EC2 GCP Go Java Jira Linux Python React Reliability engineering S3