Senior Site Reliability Engineer, Coral

Remote

Vox Media

Posted 3 weeks ago

As the leading independent modern media company, Vox Media ignites conversations and influences culture. Across digital, podcasts, TV, streaming, live events, and print, we tell stories that affect our audience's daily lives and entertain as much as they inform.

Our portfolio features influential and respected editorial properties including Vox, New York Magazine, The Verge, The Cut, Eater, Vulture, The Strategist, Polygon, SB Nation, Intelligencer, Curbed, Grub Street and Recode. Off-platform, the Vox Media Podcast Network offers one of the largest collections of popular podcasts, and Vox Media Studios produces and distributes award-winning nonfiction shows. Powered by innovative technology that scales quality, the Chorus publishing platform and Concert advertising marketplace answer the always-changing needs of modern audiences, creators and marketers.

Vox Media has been named one of Fast Company’s “Most Innovative Companies in Media,” an Inc. “Company of the Year,” Digiday’s “Best Company for Parents,” and one of the Best Places to Work for LGBTQ Equality by the Human Rights Campaign.

About the team:

This is a dangerous time to be a journalist on the internet. Online comments are often filled with rumors, insults and threats, pushing away readers and reducing community engagement. It doesn't have to be this way.

The Coral team at Vox Media believes that healthy online conversation can exist, given the right systems and tools – and that a strong democracy depends on it. The Coral community platform now supports journalists on more than 180 news sites, helping them engage with their communities, share knowledge, empower discussions, and reduce the impact of trolls. And it doesn't share or sell anyone's data while doing it.

Coral users include The Washington Post, The Wall Street Journal, the Financial Times, New York Magazine, and the LA Times.

About the role:

Under the general supervision of the Director of Software Reliability Engineering, the Senior Site Reliability Engineer is responsible for the scaling, performance, availability and security of Coral’s hosted client platform, websites, applications and services. The Senior SRE is also responsible for managing the tools and infrastructure that support them. This person will have a primary role in the leadership and execution of infrastructure initiatives from conception to production.

What you'll do:

  • Monitor and improve service stability and performance of Coral’s hosted platform, websites, applications and services
  • Implement and automate tools and processes to improve reliability and efficiency of Coral’s hosted platform, websites, applications and services
  • Participate in the on-call rotation and respond to service interruptions and to stability and performance alerts
  • Integrate software and applications with existing services
  • Install and maintain off-the-shelf commercial and open source software and/or services
  • Develop custom tools or replace existing tools when necessary to facilitate or improve monitoring, automation, performance and stability
  • Collect, analyze, and report on metrics related to system and application performance
  • Assist with the development and implementation of contingency and disaster recovery plans
  • Assist in the development of capacity plans and budget plans, and of forecasts for both
  • Build out client hosted infrastructure to ensure reliability, availability, efficiency and cost-effectiveness of technical requirements
  • Configure and operate Google Cloud, GKE, Kubernetes, and other cloud tools and services
  • Utilize and develop GitOps workflows to update and maintain Kubernetes deployments in GKE
  • Utilize Terraform to declare, provision, and maintain Google Cloud resources
  • Evaluate customer data dumps to ensure compatibility with Coral's import tools and application
  • Perform ETL migrations of sensitive customer data from various sources into MongoDB Atlas
  • Manage improvements to existing import tools and data migration infrastructure
  • Optimize large data imports to MongoDB for speed and efficiency
  • Enhance, monitor and troubleshoot storage and backup systems to ensure reliability, performance and durability of data
  • Manage and assist clients through their integration process
  • Investigate and reproduce customer and internally reported bugs and issues
  • Reproduce and document steps that lead to unexpected behavior, and recommend fixes to dev team where appropriate
  • Evaluate existing software, applications and systems on a regular basis to ensure that critical security and stability patches or upgrades are applied
  • Create, implement and monitor high-availability and failover systems
  • Maintain vendor relationships, evaluate vendor solutions, participate in vendor selection
  • Participate in development of new features, products and services, and improvements to existing features, products and services

What you'll bring:

  • Familiarity with our stack:
    • Kubernetes, GKE, Google Cloud, Terraform, Docker
    • MongoDB Atlas
    • Python, Go, Node.js, GraphQL, Redis
  • Ability to right-size and capacity plan for high-availability, high-traffic SaaS infrastructure
  • Experience owning, managing, scaling and monitoring large-scale Kubernetes and cloud SaaS infrastructure
  • Experience managing and utilizing GitOps and DevOps workflows
  • Experience managing MongoDB, including familiarity with data exports, imports and ETL

About working at Vox Media:

This is a permanent, full-time position with excellent benefits—including flexible hours and generous parental leave. Vox Media strives to provide comprehensive healthcare options for our employees and to ensure that our healthcare and other benefits are LGBTQ-inclusive. You'll be joining a group of focused, hard-working, creative people who are passionate about doing work that's challenging and fun—and who strive to maintain a healthy work/life balance.

Vox Media is committed to building an inclusive environment for people of all backgrounds and everyone is encouraged to apply. Vox Media is an Equal Opportunity Employer and does not discriminate on the basis of race, color, gender, sexual orientation, gender identity or expression, religion, disability, national origin, protected veteran status, age, or any other status protected by applicable national, federal, state, or local law.

Job tags: Docker Go GraphQL High-traffic Kubernetes MongoDB Open source Python Redis Reliability engineering Streaming Terraform