ETL Pipeline for Job Market Data

Let's face it—landing a remote tech job in 2025 might feel like a daunting task, especially for someone like me who hasn't yet broken into the industry as a developer. Adding to the challenge, I just turned 40, and ageism is a reality I can't ignore. To make things even tougher, I don't have the extensive professional work history that often commands high salaries for senior developers. However, I've spent time building my own products, and I'm ready to pivot.

So, what's the plan? Start with data.

Fortunately, Remotive, a remote-first job board, offers a free API. I used it to fetch all software development jobs posted in the last 24 hours, exporting the data—job titles, companies, and associated technologies—into a JSON file. My goal was to identify the most common job titles, excluding seniority indicators like “Principal,” “Staff,” and “Lead,” to find opportunities where I could position myself effectively.
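For illustration, here is a minimal sketch of that extract step, not the exact code from my repo. It assumes Node 18+ (for the built-in `fetch`) and assumes the Remotive endpoint and the field names `title`, `company_name`, `tags`, and `publication_date` match the current API documentation. The helper names are hypothetical.

```javascript
// Sketch only: the URL and field names are assumptions based on Remotive's
// public API docs; verify them before relying on this.
const REMOTIVE_URL = "https://remotive.com/api/remote-jobs?category=software-dev";

const DAY_MS = 24 * 60 * 60 * 1000;

// Keep only jobs published within the 24 hours before `now`.
// Jobs with a missing or unparseable date are dropped (NaN fails both checks).
function postedInLast24Hours(jobs, now = new Date()) {
  return jobs.filter((job) => {
    const age = now.getTime() - new Date(job.publication_date).getTime();
    return age >= 0 && age <= DAY_MS;
  });
}

// Reduce a raw job object to the three fields exported to the JSON file.
function toRow(job) {
  return {
    title: job.title,
    company: job.company_name,
    tags: Array.isArray(job.tags) ? job.tags : [],
  };
}

// Fetch the software development category and return the trimmed rows.
async function fetchRecentJobs() {
  const response = await fetch(REMOTIVE_URL);
  if (!response.ok) {
    throw new Error(`Remotive request failed: ${response.status}`);
  }
  const body = await response.json();
  return postedInLast24Hours(body.jobs ?? []).map(toRow);
}
```

Calling `fetchRecentJobs()` and writing the result with `JSON.stringify` would produce the kind of JSON file described above.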

Rather than picking titles apart with regex, I decided to build an ETL (Extract, Transform, Load) pipeline so I could query the data with SQL and visualize the results. As someone new to ETL pipelines, I knew I needed a database and a visualization tool like Tableau or Power BI. I considered Supabase, but its free tier allows only two projects, and I didn't want to spend a slot on an experiment. Instead, I chose BigQuery for its generous free tier and my familiarity with SQL.

Fetching the data in Node.js and inserting it into BigQuery was straightforward, thanks to the BigQuery Node Client. The only hiccup was that the free tier doesn't support streaming inserts, so I adapted by saving the data to a temporary JSON file, converting it to NDJSON, and batch-loading it into BigQuery.
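The workaround can be sketched roughly like this. It is a simplified version under a few assumptions: the temporary file holds a JSON array of row objects, the `@google-cloud/bigquery` package is installed, and credentials are already configured in the environment. The function names and the dataset and table IDs passed in are placeholders.

```javascript
const fs = require("node:fs/promises");

// NDJSON is one JSON object per line, with no enclosing array.
function toNdjson(rows) {
  if (rows.length === 0) return "";
  return rows.map((row) => JSON.stringify(row)).join("\n") + "\n";
}

// Read a JSON array from `jsonPath` and write it back out as NDJSON.
async function convertToNdjson(jsonPath, ndjsonPath) {
  const rows = JSON.parse(await fs.readFile(jsonPath, "utf8"));
  await fs.writeFile(ndjsonPath, toNdjson(rows));
}

// Batch-load the NDJSON file with a load job instead of a streaming insert.
async function loadIntoBigQuery(ndjsonPath, datasetId, tableId) {
  // Required lazily so the conversion helpers work without the package.
  const { BigQuery } = require("@google-cloud/bigquery");
  const bigquery = new BigQuery();

  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(ndjsonPath, {
      sourceFormat: "NEWLINE_DELIMITED_JSON",
      autodetect: true, // assumption: let BigQuery infer the schema
      writeDisposition: "WRITE_APPEND",
    });

  const errors = job.status && job.status.errors;
  if (errors && errors.length > 0) {
    throw new Error(JSON.stringify(errors));
  }
  return job;
}
```

A run would then be `await convertToNdjson("jobs.json", "jobs.ndjson")` followed by `await loadIntoBigQuery("jobs.ndjson", datasetId, tableId)`, with file names chosen to taste.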

Once the pipeline was up and running, I shifted focus to the visualization. Tableau's free tier doesn't support direct BigQuery connections, so Power BI became the default choice. Connecting Power BI to BigQuery was simple, though I learned that the contents of the JSON key file have to be collapsed into a single line for the connection to work.
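One way to get the key onto a single line is to parse it and serialize it again without indentation. This is a small sketch, assuming the key is a standard Google service account JSON key file; the function names are made up for the example.

```javascript
// Parse and re-serialize: JSON.stringify without an indent argument emits
// a single line, and newlines inside string values (such as the private
// key) stay escaped as \n.
function toSingleLineJson(keyFileText) {
  return JSON.stringify(JSON.parse(keyFileText));
}

// Hypothetical helper: read a key file from disk and return it flattened.
function flattenKeyFile(path) {
  const fs = require("node:fs");
  return toSingleLineJson(fs.readFileSync(path, "utf8"));
}
```

Printing `flattenKeyFile("path/to/key.json")` gives a string that can be pasted into the connection dialog. Treat the output like the key itself and keep it out of version control.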

ETL Pipeline Visualization

The resulting treemap revealed the top 10 job titles: “Software Engineer” led the pack, followed by “Backend Engineer,” “Data Engineer,” “Data Scientist,” and “Machine Learning Engineer.” The takeaway was clear: positioning myself as a software engineer would let me apply to the largest number of exact-match job posts. However, targeting data-related titles like “Data Engineer” or “Data Scientist” might offer a less competitive path into tech.
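In my setup the ranking comes out of BigQuery and Power BI, but the same aggregation can be sketched in plain JavaScript to show the idea. This is an illustration, not the code behind the treemap: it drops seniority words by simple word matching (no regex) and counts what remains. The word list holds the three indicators mentioned earlier plus "senior", which is my own addition for the example.

```javascript
// "principal", "staff", and "lead" come from the post; "senior" is an
// assumption added for illustration.
const SENIORITY_WORDS = new Set(["principal", "staff", "lead", "senior"]);

// Remove seniority words from a title, e.g. "Staff Data Engineer"
// becomes "Data Engineer". Matching is case-insensitive per word.
function normalizeTitle(title) {
  return title
    .split(" ")
    .filter((word) => word !== "" && !SENIORITY_WORDS.has(word.toLowerCase()))
    .join(" ");
}

// Count normalized titles and return the `limit` most frequent ones.
// Ties are broken alphabetically so the order is deterministic.
function topTitles(titles, limit = 10) {
  const counts = new Map();
  for (const title of titles) {
    const key = normalizeTitle(title);
    if (key === "") continue;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([title, count]) => ({ title, count }));
}
```

Word matching like this is deliberately naive: it treats "Software Engineer" and "software engineer" as different titles and would also strip "Lead" from a title such as "Lead Generation Specialist".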

“Data Analyst” didn't make the top 10, but it's often a stepping stone to roles like Data Engineer, especially for self-taught people like me. This project was a refreshing break from web app development, and I thoroughly enjoyed working with data, building the pipeline, and creating visualizations. Moving forward, I'm considering presenting myself as a data analyst candidate to explore this potential pathway.

The code for this pipeline is open source and available on my GitHub. I'd love to hear your thoughts: do you think data is an easier way to break into tech?

https://github.com/koller-m/remote-jobs-data