Finding job ads that never reached the major platforms — by combining large-scale web crawling with machine learning.

OVERVIEW

The client's goal was to create a world-leading job platform for both employers and job seekers. A key challenge was that many Swedish companies published job ads only on their own websites, making them invisible on the major job platforms. That meant valuable opportunities were being missed.

To solve this, the client needed a way to discover, analyze, and classify job pages at scale across the Swedish web. The challenge was not just technical volume — it was how to separate actual job ads from all the other pages that exist on company websites.

We designed a solution that combined smart data collection with machine learning-based analysis. Using the Common Crawl project as a starting point, we narrowed the data down to Swedish .se domains and identified pages that could potentially contain job listings.

The platform was built with a modern cloud-native setup on AWS and supported by Python-based data and ML workflows — resulting in a robust pipeline for finding hidden job opportunities and turning unstructured web content into structured business value.

OUR APPROACH


We combined large-scale web discovery with a trained machine learning model to separate real job ads from everything else.

Web Crawling at Scale

Using the Common Crawl project as a starting point, we filtered Swedish .se domains and identified pages with the potential to contain job listings — processing data at web scale.
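As a rough illustration of this filtering step, the sketch below keeps only URLs on .se hosts whose path hints at job content. The keyword list and function names are hypothetical and chosen for illustration; the case does not describe the heuristics the actual pipeline used.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical path keywords (Swedish and English); not taken from the project.
JOB_HINTS = ("jobb", "jobs", "karriar", "karriär", "career", "lediga")


def is_swedish_domain(url: str) -> bool:
    """Return True when the URL's host ends in the .se top-level domain."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".se")


def looks_like_job_page(url: str) -> bool:
    """Return True when the decoded URL path contains any job-related hint."""
    path = unquote(urlsplit(url).path).lower()
    return any(hint in path for hint in JOB_HINTS)


def candidate_urls(urls):
    """Yield URLs that are on a .se host and look like job pages."""
    for url in urls:
        if is_swedish_domain(url) and looks_like_job_page(url):
            yield url


if __name__ == "__main__":
    sample = [
        "https://www.example.se/karriar/lediga-jobb",
        "https://www.example.se/om-oss",
        "https://example.com/jobs",
    ]
    for url in candidate_urls(sample):
        print(url)
```

A cheap URL-level filter like this would only narrow the candidate set; deciding whether a page really is a job ad is left to the classifier.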


ML Classification

We trained a LightGBM gradient boosting model on more than 13,000 job pages sourced from Arbetsförmedlingen and Blocket to recognize real job ad patterns at scale.
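A gradient boosting model needs numeric features for each page. The sketch below shows one simple way page text could be turned into such features before training. The feature names and the Swedish signal words are assumptions made for illustration; the case does not describe which features the real model used.

```python
import re

# Hypothetical signal words that often appear in Swedish job ads.
JOB_TERMS = ("ansök", "tjänst", "anställning", "kvalifikationer", "heltid")
# Hypothetical phrase ("application deadline") treated as its own feature.
DEADLINE_PHRASE = "sista ansökningsdag"


def extract_features(text: str) -> dict:
    """Turn raw page text into a small dictionary of numeric features."""
    lowered = text.lower()
    words = re.findall(r"\w+", lowered)
    # Count substring occurrences so inflected forms are matched as well.
    term_hits = sum(lowered.count(term) for term in JOB_TERMS)
    return {
        "word_count": len(words),
        "job_term_hits": term_hits,
        "job_term_density": term_hits / len(words) if words else 0.0,
        "has_deadline_phrase": int(DEADLINE_PHRASE in lowered),
    }


if __name__ == "__main__":
    page = "Vi söker en utvecklare. Tjänst på heltid. Sista ansökningsdag 1 maj."
    print(extract_features(page))
```

Rows of features like these, labeled as job ad or not, are the kind of tabular input a LightGBM classifier can be trained on.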


Cloud-Native Architecture

The platform was built on AWS with Python-based data and ML workflows, creating a robust and scalable pipeline that could grow with the client's needs.


Structured Output

Unstructured web content was turned into structured, classified job listings — making it possible to surface opportunities that had previously been invisible to the market.

OUR RESULTS


A scalable foundation for discovering job opportunities that traditional platforms miss.

Hidden Jobs Surfaced

The client gained a scalable way to uncover job ads that were not available through the major platforms, creating the foundation for a stronger and more differentiated job marketplace.

Broader Market Coverage

By combining web-scale discovery with machine learning, the solution made it possible to find relevant opportunities earlier and deliver more value to both employers and job seekers.

ML at Scale

A model trained on 13,000+ real job pages gave the classifier a strong foundation for distinguishing actual job ads from the noise of company websites — at web scale.

Data to Product

The case demonstrates how the right mix of data engineering, ML competence, and practical execution can turn a difficult discovery problem into a usable product capability.