SkillAI

Role: Solo Builder (dataset pipeline, two-layer model architecture, Captum explainability integration, IBM WatsonX report generation, full-stack deployment)

Python · XGBoost · PyTorch · Deep Neural Networks · Captum · IBM WatsonX · React · Node.js · Express.js · O*NET Dataset

View on GitHub

Overview

Most career recommendation tools match keywords to job titles. SkillAI goes deeper: it uses a two-layer AI architecture trained on the 2025 U.S. Department of Labor O*NET dataset to predict the top 10 most suitable careers for a user's specific skill set. An XGBoost model first identifies which occupation cluster the user's profile belongs to, then a deep neural network trained for that cluster ranks the closest career matches within it. Captum-based explainability shows which skills drove each recommendation, and IBM WatsonX translates that into a plain-language career report.

The Problem

Career indecision costs people years. Most tools that try to solve this are either keyword matchers that map "Python" to "Software Engineer" without nuance, or personality quizzes dressed up with tech branding. Neither approach models the relationship between a skill set and an occupation at the depth at which the labour market operates. The U.S. Department of Labor's O*NET database is among the most comprehensive occupational skills taxonomies in the world: 1,000+ occupations, granular skill requirements, and regular updates. SkillAI was built to use that data properly, with a hierarchical ML architecture that first works out where in the occupation space you belong, then finds your best-fit roles within it.

How It Was Built

The O*NET dataset covers over 1,000 distinct occupations. A single flat classifier trained on all of them faces a brutal problem — skill overlap between adjacent occupations creates enormous noise. The decision boundary between a Data Scientist and a Machine Learning Engineer is far more subtle than between either of them and a Civil Engineer. SkillAI solves this with a two-layer architecture that matches how the occupation space is actually structured.

The Two-Layer Architecture

Layer	Model	Task
1	XGBoost	Takes the user's skill set, identifies which occupation cluster their profile belongs to
2	Deep Neural Network	Activated for the identified cluster only — finds the top 10 most suitable careers within it

Layer 1 — Cluster Identification (XGBoost)

The full O*NET dataset is pre-clustered into occupation groups whose skill profiles are internally similar, and XGBoost assigns a user's skill profile to one of those groups. XGBoost was chosen for the same reasons as in AgriVerse: non-linear pattern recognition, robustness to noisy skill inputs, and compatibility with interpretability tooling.

Layer 2 — Career Identification Within Cluster (DNN)

Once the cluster is identified, a DNN specific to that cluster is activated. Each cluster has its own specialised model, trained only on occupations within that cluster, so its entire decision space is relevant to the user. It has never seen occupations from other clusters.

This hierarchical approach is designed to reduce prediction noise: XGBoost handles coarse separation, and the DNN handles fine-grained discrimination. Neither model is asked to do more than it's good at.
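The routing between the two layers can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `recommend` function, the type aliases, and the toy models are all hypothetical stand-ins, with plain callables in place of the trained XGBoost classifier and the per-cluster DNNs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical shapes: any callables with these signatures will do.
ClusterClassifier = Callable[[Sequence[float]], int]          # Layer 1: skills -> cluster id
CareerScorer = Callable[[Sequence[float]], Dict[str, float]]  # Layer 2: skills -> {career: score}


def recommend(
    skills: Sequence[float],
    classify_cluster: ClusterClassifier,
    scorers_by_cluster: Dict[int, CareerScorer],
    top_k: int = 10,
) -> Tuple[int, List[Tuple[str, float]]]:
    """Route a skill vector through Layer 1, then rank careers with that cluster's Layer 2 model."""
    cluster_id = classify_cluster(skills)
    if cluster_id not in scorers_by_cluster:
        raise KeyError(f"no career model registered for cluster {cluster_id}")
    # Only the model for the identified cluster is ever called.
    scores = scorers_by_cluster[cluster_id](skills)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return cluster_id, ranked[:top_k]


# Toy stand-ins for the trained models (illustrative only).
def toy_classifier(skills: Sequence[float]) -> int:
    return 0 if skills[0] >= skills[1] else 1


toy_scorers: Dict[int, CareerScorer] = {
    0: lambda s: {"Data Scientist": 0.6 * s[0], "Machine Learning Engineer": 0.8 * s[0]},
    1: lambda s: {"Civil Engineer": 0.9 * s[1]},
}

cluster, matches = recommend([1.0, 0.2], toy_classifier, toy_scorers)
```

Keeping the per-cluster models in a dictionary keyed by cluster id means a scorer for one cluster never sees or ranks occupations from another, which is the property the two-layer design relies on.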

The Explainability Layer

SkillAI uses Captum — PyTorch's native interpretability library — to generate feature attributions for every career recommendation. For each of the top 10 predicted careers, Captum maps which input skills contributed positively to the match and which worked against it, with magnitude.

This is the career equivalent of AgriVerse's SHAP engine — the same design philosophy applied to a different domain.
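Once an attribution method has produced one signed score per input skill, separating supporting skills from opposing ones is a small post-processing step. The sketch below assumes the scores arrive as a plain dictionary; the function name `split_attributions`, the `top_n` cutoff, and the example values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

SkillScore = Tuple[str, float]


def split_attributions(
    attributions: Dict[str, float], top_n: int = 3
) -> Tuple[List[SkillScore], List[SkillScore]]:
    """Return (supporting, opposing) skills, each ordered by attribution magnitude."""
    by_magnitude = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    supporting = [(skill, value) for skill, value in by_magnitude if value > 0][:top_n]
    opposing = [(skill, value) for skill, value in by_magnitude if value < 0][:top_n]
    return supporting, opposing


# Made-up attribution scores for one recommended career.
example_scores = {
    "Python": 0.42,
    "Systems Thinking": 0.31,
    "Problem Solving": 0.27,
    "Project Management": -0.08,
    "Negotiation": 0.0,  # zero attribution: neither supports nor opposes
}
supporting, opposing = split_attributions(example_scores)
```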

A user doesn't just see *"Software Developer, 87% match."* They see:

"Your Python proficiency, systems thinking, and problem-solving orientation strongly support this recommendation. Your limited project management experience slightly reduces the fit for senior roles."

IBM WatsonX then takes the Captum attribution data and generates a full plain-language career report for each recommendation — readable, actionable, and specific to the user's actual skill profile.
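One way to hand attribution data to a language model is to render it as a compact text prompt. The sketch below only builds that prompt string and does not call any IBM API; the function name `build_report_prompt`, the prompt wording, and the example numbers are assumptions made for illustration.

```python
from typing import List, Tuple

SkillScore = Tuple[str, float]


def _bullet_list(items: List[SkillScore]) -> List[str]:
    """Render (skill, attribution) pairs as bullet lines, or a placeholder if empty."""
    if not items:
        return ["- none"]
    return [f"- {skill} (attribution {value:+.2f})" for skill, value in items]


def build_report_prompt(
    career: str,
    match_score: float,
    supporting: List[SkillScore],
    opposing: List[SkillScore],
) -> str:
    """Assemble a plain-text prompt describing one recommendation and its attributions."""
    lines = [f"Career: {career}", f"Match score: {match_score:.0%}", ""]
    lines.append("Skills that supported this recommendation:")
    lines.extend(_bullet_list(supporting))
    lines.append("Skills that reduced the fit:")
    lines.extend(_bullet_list(opposing))
    lines.append("")
    lines.append(
        "Write a short, plain-language explanation of this match for the user, "
        "referring only to the skills listed above."
    )
    return "\n".join(lines)


prompt = build_report_prompt(
    "Software Developer",
    0.87,
    supporting=[("Python", 0.42), ("Systems Thinking", 0.31)],
    opposing=[("Project Management", -0.08)],
)
```

Restricting the prompt to the listed skills is a design choice intended to keep the generated report tied to the attribution data and away from generic career advice.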

The Data

The model is trained on the 2025 version of the U.S. Department of Labor's O*NET Skills and Occupation dataset — the most authoritative occupational skills taxonomy available.

Property	Detail
Source	U.S. Department of Labor — Occupational Information Network (O*NET)
Version	2025 (current labour market, not five-year-old data)
Coverage	1,000+ occupations, hundreds of standardised skill descriptors
Update cadence	Annual

Tech Stack

Layer	Technology
Frontend	React — structured skill taxonomy selector (clean input, not free text)
Backend	Node.js + Express.js — orchestrates two-phase prediction pipeline via Python shell execution
ML Pipeline	Python — XGBoost cluster ID → DNN career prediction → Captum explanation, sequential
Output	Structured JSON payload returned to frontend
Narrative	IBM WatsonX — plain-language career report per recommendation, final step before response
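The Python side of a shell-based handoff can be kept small by treating each call as one JSON request and one JSON response. The sketch below assumes the backend passes a JSON object on standard input and reads a JSON object from standard output; the field names (`skills`, `ok`, `recommendations`, `error`) and the `stub_predict` function are hypothetical, not taken from the project.

```python
import json
from typing import Callable, Dict, List

Predictor = Callable[[List[str]], List[Dict[str, object]]]


def handle_request(raw: str, predict: Predictor) -> str:
    """Parse one JSON request, run the prediction pipeline, and return one JSON response."""
    try:
        request = json.loads(raw)
        skills = request["skills"]
        if not isinstance(skills, list):
            raise ValueError("'skills' must be a list")
        response = {"ok": True, "recommendations": predict(skills)}
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        # Report failures as JSON so the caller never has to parse a traceback.
        response = {"ok": False, "error": str(exc)}
    return json.dumps(response)


def stub_predict(skills: List[str]) -> List[Dict[str, object]]:
    """Placeholder for the XGBoost -> DNN -> Captum sequence (illustrative only)."""
    return [{"career": "Software Developer", "score": 0.87, "skills_used": len(skills)}]


reply = handle_request('{"skills": ["Programming", "Mathematics"]}', stub_predict)
bad_reply = handle_request("not json", stub_predict)
```

In a script invoked from the backend, the same function would be called with `sys.stdin.read()` and its result written to `sys.stdout`.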

The structured skill taxonomy on the frontend ensures clean, consistent input to the ML pipeline — users pick from predefined skills rather than typing free text, eliminating input noise at the source.
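A fixed vocabulary makes it straightforward to turn a user's selections into a model input. The sketch below assumes a simple multi-hot encoding (1.0 for a selected skill, 0.0 otherwise); the source does not state the actual encoding, and the function name `encode_skills` and the four-item vocabulary are illustrative.

```python
from typing import Iterable, List, Sequence


def encode_skills(selected: Iterable[str], vocabulary: Sequence[str]) -> List[float]:
    """Map selected skill names to a fixed-length multi-hot vector in vocabulary order."""
    index = {name: position for position, name in enumerate(vocabulary)}
    vector = [0.0] * len(vocabulary)
    for name in selected:
        if name not in index:
            # A predefined selector should make this unreachable; fail loudly if it happens.
            raise ValueError(f"unknown skill: {name!r}")
        vector[index[name]] = 1.0
    return vector


# Illustrative vocabulary; a real one would hold every skill offered by the selector.
VOCABULARY = ["Programming", "Critical Thinking", "Negotiation", "Mathematics"]
features = encode_skills(["Mathematics", "Programming"], VOCABULARY)
```

Because the vector order comes from the vocabulary and not from the order of selection, the same set of skills always produces the same input.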

Results & Impact

Deployed career recommendation engine predicting top 10 job matches from 2025 O*NET data using a two-layer XGBoost + DNN pipeline with full Captum explainability.
