Senior Site Reliability Engineer (Kubernetes)


Full Time Senior-level / Expert
Vox Media

As the leading independent modern media company, Vox Media ignites conversations and influences culture. Across digital, podcasts, TV, streaming, live events, and print, we tell stories that affect our audience's daily lives and entertain as much as they inform.

Our portfolio features influential and respected editorial properties including Vox, New York Magazine, The Verge, The Cut, Eater, Vulture, The Strategist, Polygon, SB Nation, Intelligencer, Curbed, Grub Street, and Recode. Off-platform, the Vox Media Podcast Network offers one of the largest collections of popular podcasts, and Vox Media Studios produces and distributes award-winning nonfiction shows. Powered by innovative technology that scales quality, the Chorus publishing platform and Concert advertising marketplace answer the always-changing needs of modern audiences, creators and marketers.

Vox Media has been named one of Fast Company’s “Most Innovative Companies in Media,” an Inc. “Company of the Year,” Digiday’s “Best Company for Parents,” and one of the Best Places to Work for LGBTQ Equality by the Human Rights Campaign.

About Coral:

This is a dangerous time to be a journalist on the internet. Online comments are often filled with rumors, insults and threats, pushing away readers and reducing community engagement. It doesn't have to be this way.

The Coral team at Vox Media believes that healthy online conversation can exist, given the right systems and tools – and that a strong democracy depends on it. The Coral community platform now supports journalists on more than 180 news sites, helping them engage with their communities, share knowledge, empower discussions, and reduce the impact of trolls. And it doesn't share or sell anyone's data while doing it.

Coral users include The Washington Post, the Wall Street Journal, The Financial Times, New York Magazine, and the LA Times.

About the role:

Under general supervision of Coral’s SRE Engineering Manager, the Senior Site Reliability Engineer is responsible for the scaling, performance, availability and security of Coral’s hosted client platform, websites, applications and services. The Senior SRE is also responsible for managing the tools and infrastructure that support the above. They will have a primary role in the leadership and execution of infrastructure initiatives from conception to production.

What you’ll bring:

  • Familiarity with our stack:
    • Kubernetes, GKE, Google Cloud, Terraform, Docker
    • MongoDB Atlas, Redis
    • Node.js, Go, Python, GraphQL
  • Ability to right-size and plan capacity for high-availability, high-traffic SaaS infrastructure
  • Experience owning, managing, scaling, optimizing and monitoring high-volume Kubernetes and cloud SaaS infrastructure
  • Experience managing and utilizing GitOps and DevOps workflows
  • Experience managing MongoDB, including familiarity with data exports, imports and ETL

What you’ll do:

  • Monitor and improve service stability and performance of Coral’s hosted platform, websites, applications and services
  • Implement and automate tools and processes to improve reliability and efficiency of Coral’s hosted platform, websites, applications and services
  • Participate in the on-call rotation, and respond to service interruptions and to stability and performance alerts
  • Develop custom tools or replace existing tools when necessary to facilitate or improve monitoring, automation, performance and stability
  • Assist with the development and implementation of contingency and disaster recovery plans
  • Assist in the development of capacity and budget planning and forecasts
  • Build out customer-facing hosted infrastructure to ensure reliability, availability, efficiency and cost-effectiveness of technical requirements
  • Configure and operate Google Cloud, GKE, Kubernetes, and other cloud tools and services
  • Utilize and develop GitOps workflows to update and maintain Kubernetes deployments in GKE
  • Utilize Terraform to declare, provision, and maintain Google Cloud resources
  • Enhance, monitor and troubleshoot storage and backup systems to ensure reliability, performance and durability of data
  • Manage and assist customers through their integration process
  • Troubleshoot customer issues and update or create documentation where necessary to correctly address questions and concerns
  • Investigate and reproduce customer and internally reported bugs and issues
  • Reproduce and document steps that lead to unexpected behavior, and recommend fixes to dev team where appropriate
  • Evaluate existing software, applications and systems on a regular basis to ensure that critical security and stability patches or upgrades are applied

Are you passionate about this opportunity, but worried that you don't have 100% of the experience we're looking for? We still want to hear from you!

About working at Vox Media:

This is a permanent, full-time position with excellent benefits—including flexible hours and generous parental leave. Vox Media strives to provide comprehensive healthcare options for our employees and to ensure that our healthcare and other benefits are LGBTQ-inclusive. You'll be joining a group of focused, hard-working, creative people who are passionate about doing work that's challenging and fun—and who strive to maintain a healthy work/life balance.

Vox Media is committed to building an inclusive environment for people of all backgrounds, and everyone is encouraged to apply. Vox Media is an Equal Opportunity Employer and does not discriminate on the basis of race, color, gender, sexual orientation, gender identity or expression, religion, disability, national origin, protected veteran status, age, or any other status protected by applicable national, federal, state, or local law.

Job region(s): Remote/Anywhere