
OVERVIEW
The client's goal was to create a world-leading job platform for both employers and job seekers. A key challenge was that many Swedish companies published job ads only on their own websites, making them invisible on the major job platforms. That meant valuable opportunities were being missed.
To solve this, the client needed a way to discover, analyze, and classify job pages at scale across the Swedish web. The difficulty was not only the sheer volume of pages but also how to separate actual job ads from all the other pages that exist on company websites.
We designed a solution that combined smart data collection with machine learning-based analysis. Using the Common Crawl project as a starting point, we narrowed the data down to Swedish .se domains and identified pages that could potentially contain job listings.
The platform was built as a cloud-native setup on AWS and supported by Python-based data and ML workflows. The result was a robust pipeline for finding hidden job opportunities and turning unstructured web content into structured business value.
OUR APPROACH
We combined large-scale web discovery with a trained machine learning model to separate real job ads from everything else.
Web Crawling at Scale
Starting from the Common Crawl project, we kept only pages on Swedish .se domains and then identified those with the potential to contain job listings, processing data at web scale.
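As a rough illustration of this filtering step, the sketch below keeps URLs whose host ends in .se and whose path contains a job-related hint. It is a minimal sketch under stated assumptions, not the pipeline that was actually built: the keyword list and the function names are hypothetical, and the input is assumed to be a plain list of URLs already extracted from the crawl data.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical hints for job-related paths; the real criteria are not
# described in this case study.
JOB_HINTS = ("jobb", "jobs", "karriar", "karriär", "career", "lediga-tjanster")


def is_swedish_host(url: str) -> bool:
    """Return True when the URL's host is under the .se top-level domain."""
    host = (urlsplit(url).hostname or "").lower()
    return host.endswith(".se")


def looks_like_job_page(url: str) -> bool:
    """Return True when the decoded URL path contains a job-related hint."""
    path = unquote(urlsplit(url).path).lower()
    return any(hint in path for hint in JOB_HINTS)


def candidate_urls(urls):
    """Yield only the URLs that pass both the domain and the path check."""
    for url in urls:
        if is_swedish_host(url) and looks_like_job_page(url):
            yield url
```

A cheap URL-level check like this would only narrow the candidate set. Deciding whether a page really is a job ad is left to the classification step.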
ML Classification
We trained a LightGBM gradient boosting model on more than 13,000 job pages sourced from Arbetsförmedlingen and Blocket to recognize real job ad patterns at scale.
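A gradient boosting model such as LightGBM works on numeric feature rows, so page text has to be converted into numbers before training. The sketch below shows one possible way to do that. The features, the cue words, and the function name are all assumptions made for illustration; the case study does not say which features the real model used.

```python
import re
from typing import List

# Hypothetical cue words that often appear in job ads; not from the source.
JOB_TERMS = {
    "ansök", "ansökan", "tjänsten", "anställning",
    "kvalifikationer", "heltid", "deltid", "apply",
}

TOKEN_RE = re.compile(r"\w+")


def extract_features(text: str) -> List[float]:
    """Turn page text into a small numeric feature row.

    Returns [token count, job-term count, job-term ratio, deadline flag].
    """
    lowered = text.lower()
    tokens = TOKEN_RE.findall(lowered)
    n_tokens = len(tokens)
    n_job_terms = sum(1 for token in tokens if token in JOB_TERMS)
    ratio = n_job_terms / n_tokens if n_tokens else 0.0
    # "sista ansökningsdag" means "last day to apply" in Swedish.
    has_deadline = 1.0 if "sista ansökningsdag" in lowered else 0.0
    return [float(n_tokens), float(n_job_terms), ratio, has_deadline]
```

Rows like these, paired with labels marking which pages are known job ads, are the kind of input a binary classifier could be trained on.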
Cloud-Native Architecture
The platform was built on AWS with Python-based data and ML workflows, creating a robust and scalable pipeline that could grow with the client's needs.
Structured Output
Unstructured web content was turned into structured, classified job listings, making it possible to surface opportunities that had previously been invisible to the market.
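To make "structured output" concrete, the sketch below shows what one classified listing might look like as a record serialized to a single JSON line. The field names and the `JobListing` type are hypothetical; the actual output schema is not described in this case study.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class JobListing:
    """Hypothetical record for one classified page."""
    url: str
    title: str
    employer: str
    is_job_ad: bool   # classifier decision
    score: float      # classifier confidence between 0 and 1


def to_json_line(listing: JobListing) -> str:
    """Serialize a listing as one JSON line, keeping Swedish characters readable."""
    return json.dumps(asdict(listing), ensure_ascii=False, sort_keys=True)
```

One record per line keeps the output easy to append to and to load into downstream systems.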
OUR RESULTS
A scalable foundation for discovering job opportunities that traditional platforms miss.
Hidden Jobs Surfaced
The client gained a scalable way to uncover job ads that were not available through the major platforms, creating the foundation for a stronger and more differentiated job marketplace.
Broader Market Coverage
By combining web-scale discovery with machine learning, the solution made it possible to find relevant opportunities earlier and deliver more value to both employers and job seekers.
ML at Scale
A model trained on 13,000+ real job pages gave the classifier a strong foundation for distinguishing actual job ads from the noise of company websites at web scale.
Data to Product
The case demonstrates how the right mix of data engineering, ML competence, and practical execution can turn a difficult discovery problem into a usable product capability.