Software Engineer - Data Infrastructure

Remote - Anywhere

Full Time
Quora

Posted 1 month ago

[As of June 2020, Quora has become a remote-first company.  This position can be performed remotely from anywhere in the world, regardless of any location that might be specified above.]

 

About Quora:

The vast majority of human knowledge is still not on the internet. Most of it is trapped in the form of experience in people's heads, or buried in books and papers that only experts can access. More than a billion people use the internet, yet only a tiny fraction contribute their knowledge to it. We want to democratize access to knowledge of all kinds — from politics to painting, cooking to coding, etymology to experiences — so if someone out there knows something, anyone else can learn it. Our mission is to share and grow the world's knowledge, and we're building a world-class team to help us achieve this mission.

About the Team:

Our small engineering team works on challenging problems every day. We have a culture that's rooted in constantly learning and improving, and our engineers are encouraged to think big and experiment with new ideas. Using continuous deployment, we quickly see our changes in the product and iterate fast. Our engineers focus on creating polished products and writing high-quality code by designing APIs and abstractions that are extensible and maintainable. Everyone on the engineering team has a huge impact on our product and our company.

About the Role:

Our data infrastructure team maintains, operates and expands the data ecosystem at Quora, which includes data warehousing, streaming infrastructure, a distributed cluster-computing framework, distributed query engines, messaging systems, data pipelines & automation tools. In this role you will be responsible for contributing to different aspects of data pipeline development and to the operational stability of our production big data systems. We leverage existing open source technologies like Spark, Flink, Kafka, HDFS, HBase, Hive, Presto and Airflow, and also build our own systems for experimentation & time-series analysis. As a member of our team you would spend time designing and scaling our distributed data systems, working closely with other teams to identify and execute on new use cases, and evangelizing the correct use of data at the company. We are looking for someone who will be excited by the prospect of optimizing, enhancing or even redesigning our company’s data architecture and pipelines to support our next-generation data initiatives.

Responsibilities:

  • Design, implement, maintain and optimize data pipelines, architectures and data sets.
  • Collaborate with data scientists, platform engineers and business partners to understand data needs and drive key data infrastructure decisions.
  • Bring your expertise to help model structured & unstructured data. Own these data models at a high level & be a data consultant for partner teams.
  • Own the data definitions & lineage across different data platforms, and maintain systems of record for operational and non-operational data stores.
  • Engineer reusable capabilities, abstractions & resilience in data pipelines for DML, DDL, ETL & Data flows which can be leveraged across teams.
  • Be a data mentor & a team player with strong communication, prioritization, and adaptability skills.

Minimum Qualifications:

  • Ability to be available for meetings and impromptu communication during Quora's coordination hours (Mon-Fri: 9am-3pm Pacific Time).
  • Proficiency in any or all of the programming languages Python, Java and Scala, plus strong query authoring skills in SQL.
  • Must have 2+ years of experience building data pipelines, including data ingestion, cleaning, processing, transforming, staging & loading.
  • Proficiency with big data processing frameworks: Spark, Flink, Hive, Hadoop, Kafka, EMR, Presto.
  • Operational mindset, with the ability to handle problem diagnosis, root cause analysis, SLA compliance, performance tuning and incident management in data infrastructure.
  • Experience building data-intensive applications (high velocity/high volume).
  • Experience with SQL/NoSQL data stores & data lake operations.

Preferred Qualifications:

  • Flexible and positive team player with outstanding interpersonal skills.
  • Passion for Quora's mission and goals.
  • Hands-on experience with AWS technologies like S3, Redshift, EMR/EC2 and Athena, as well as Snowflake.
  • Familiarity with designing and operating a streaming platform (e.g. Kafka, Flink, Spark).
  • Data wrangling & data tooling ability.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.


 
