SkillAI

Role: Solo Builder (dataset pipeline, two-layer model architecture, Captum explainability integration, IBM WatsonX report generation, full-stack deployment)

Python, XGBoost, PyTorch, Deep Neural Networks, Captum, IBM WatsonX, React, Node.js, Express.js, O*NET Dataset

Overview

Most career recommendation tools match keywords to job titles. SkillAI goes deeper: it uses a two-layer AI architecture trained on the 2025 U.S. Department of Labor O*NET dataset to predict the top 10 most suitable careers for a user's specific skill set. An XGBoost model first assigns the user's profile to a cluster of similar occupations, then a deep neural network searches within that cluster for the most precise career matches. Captum-based explainability shows which skills drove each recommendation, and IBM WatsonX translates that into a plain-language career report.

The Problem

Career indecision costs people years. Most tools that try to solve this are either keyword matchers that map "Python" to "Software Engineer" without nuance, or personality quizzes dressed up with tech branding. Neither approach models the relationship between a skill set and an occupation at the depth at which the labour market operates. The U.S. Department of Labor's O*NET database is among the most comprehensive occupational skills taxonomies available: 1,000+ occupations, granular skill requirements, regularly updated. SkillAI was built to use that data properly, with a hierarchical ML architecture that first works out where in the occupation space you belong, then finds your best-fit roles within it.

How It Was Built

Why Two Layers?

The O*NET dataset covers over 1,000 distinct occupations. A single flat classifier trained on all of them simultaneously faces a brutal problem: the skill overlap between adjacent occupations creates enormous noise, and the decision boundary between a Data Scientist and a Machine Learning Engineer is far more subtle than the boundary between either of them and a Civil Engineer.

The two-layer architecture solves this by dividing the prediction problem into two distinct tasks that match how the occupation space is actually structured.

Layer 1: Cluster Identification (XGBoost). The full O*NET occupation dataset is pre-clustered into occupation groups whose skill profiles are internally similar. An XGBoost classifier is trained to take a user's input skill set and identify which cluster their profile belongs to. XGBoost was chosen here for the same reasons it was chosen in AgriVerse: non-linear pattern recognition, robustness to noisy skill inputs, and compatibility with interpretability tooling.

Layer 2: Career Identification Within Cluster (Deep Neural Network). Once the cluster is identified, the DNN specific to that cluster is activated. Each cluster has its own specialised DNN, trained only on the occupations within that cluster. This means the model doing the final prediction has never seen occupations from other clusters, so its entire decision space is relevant. It returns the top 10 most suitable careers within the identified cluster, ranked by fit.

This hierarchical approach reduces prediction noise by narrowing the candidate set before the final ranking. XGBoost handles the coarse separation; the DNNs handle the fine-grained discrimination. Neither model is asked to do more than it is good at.
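The routing between the two layers can be sketched in a few lines. This is a minimal illustration, not the project's actual code: plain Python callables stand in for the trained XGBoost classifier and the per-cluster DNNs, and the names `recommend`, `toy_cluster_model`, and `toy_career_models`, along with every score shown, are assumptions made up for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type aliases: the real system uses an XGBoost classifier and
# per-cluster PyTorch DNNs; plain callables stand in for them here.
ClusterModel = Callable[[Sequence[float]], int]
CareerModel = Callable[[Sequence[float]], Dict[str, float]]


def recommend(
    skills: Sequence[float],
    cluster_model: ClusterModel,
    career_models: Dict[int, CareerModel],
    top_k: int = 10,
) -> Tuple[int, List[Tuple[str, float]]]:
    """Route a skill vector through both layers and return the top_k careers."""
    # Layer 1: coarse separation, pick the occupation cluster.
    cluster_id = cluster_model(skills)
    # Layer 2: only the model trained on that cluster scores the careers.
    scores = career_models[cluster_id](skills)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return cluster_id, ranked[:top_k]


# Toy stand-ins (made-up clusters and scores, for illustration only).
def toy_cluster_model(skills: Sequence[float]) -> int:
    # Cluster 0 if the first skill dominates, otherwise cluster 1.
    return 0 if skills[0] >= skills[1] else 1


toy_career_models: Dict[int, CareerModel] = {
    0: lambda s: {"Data Scientist": 0.6 * s[0], "Machine Learning Engineer": 0.5 * s[0]},
    1: lambda s: {"Civil Engineer": 0.9 * s[1]},
}

cluster, careers = recommend([1.0, 0.2], toy_cluster_model, toy_career_models)
```

Because the second-layer model is looked up by cluster id, each DNN only ever ranks occupations from its own cluster, which is the property the two-layer design relies on.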

The Explainability Layer

Knowing which career you should pursue is only useful if you understand why. SkillAI uses Captum, PyTorch's native interpretability library, to generate feature attributions for every career recommendation. For each of the top 10 predicted careers, Captum maps which input skills contributed positively to the match and which worked against it, with magnitude.

This is the career equivalent of AgriVerse's SHAP engine: the same design philosophy applied to a different domain. A user doesn't just see "Software Developer, 87% match." They see "Your Python proficiency, systems thinking, and problem-solving orientation strongly support this recommendation. Your limited project management experience slightly reduces the fit for senior roles."
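Turning raw attributions into that kind of statement starts with separating the skills that helped from the skills that hurt. The sketch below assumes a Captum attribution method has already produced one signed score per input skill for a given career; the helper `split_attributions`, the skill names, and the scores are hypothetical and only illustrate the post-processing step.

```python
from typing import Dict, List, Sequence, Tuple


def split_attributions(
    skill_names: Sequence[str],
    attributions: Sequence[float],
    top_n: int = 3,
) -> Dict[str, List[Tuple[str, float]]]:
    """Separate signed attribution scores into supporting and opposing skills."""
    if len(skill_names) != len(attributions):
        raise ValueError("one attribution score is required per skill")
    pairs = list(zip(skill_names, attributions))
    # Positive scores pushed the prediction towards this career; largest first.
    supporting = sorted((p for p in pairs if p[1] > 0), key=lambda p: p[1], reverse=True)
    # Negative scores pushed against it; most negative first.
    opposing = sorted((p for p in pairs if p[1] < 0), key=lambda p: p[1])
    return {"supporting": supporting[:top_n], "opposing": opposing[:top_n]}


# Made-up attribution values for one recommended career.
names = [
    "Programming",
    "Systems Analysis",
    "Complex Problem Solving",
    "Management of Personnel Resources",
]
scores = [0.42, 0.18, 0.25, -0.07]
summary = split_attributions(names, scores)
```

A structure like `summary` keeps both the direction and the magnitude of each skill's contribution, which is what a downstream report generator needs in order to say "strongly support" versus "slightly reduces".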

IBM WatsonX then takes the Captum attribution data and generates a full plain-language career report for each recommendation: readable, actionable, and specific to the user's actual skill profile.

The Data

The model is trained on the 2025 version of the U.S. Department of Labor's Occupational Information Network (O*NET) Skills and Occupation dataset, one of the most authoritative occupational skills taxonomies available. O*NET is updated regularly and maps hundreds of standardised descriptors to over 1,000 occupations. Using the 2025 version means the recommendations reflect the current labour market, not data from five years ago.

Architecture

Full-stack MERN application. The React frontend handles dynamic skill selection: users pick from a structured skill taxonomy rather than typing free text, which ensures clean input to the ML pipeline. The Node.js and Express.js backend orchestrates the two-phase prediction pipeline, calling the Python ML layer via shell execution. The Python layer runs XGBoost cluster identification, DNN career prediction, and Captum explanation generation sequentially, returning a structured JSON payload. WatsonX report generation runs as the final step before the response is returned to the frontend.
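On the Python side, a shell-invoked layer of this kind typically reads one JSON request and writes one JSON response. The following is a rough sketch under stated assumptions: the request shape (`{"skills": [...]}`), the four-item `TAXONOMY`, and the function names are all hypothetical, and the model calls are replaced by a placeholder so that only the input encoding and the JSON contract are shown.

```python
import json
import sys
from typing import List, Sequence, TextIO

# Hypothetical taxonomy; the real one would be derived from O*NET descriptors.
TAXONOMY: List[str] = ["Programming", "Mathematics", "Critical Thinking", "Negotiation"]


def encode_skills(selected: Sequence[str], taxonomy: Sequence[str] = TAXONOMY) -> List[float]:
    """Turn selected skill names into a fixed-order 0/1 vector."""
    unknown = sorted(set(selected) - set(taxonomy))
    if unknown:
        raise ValueError(f"skills not in taxonomy: {unknown}")
    chosen = set(selected)
    return [1.0 if name in chosen else 0.0 for name in taxonomy]


def handle_request(raw: str) -> str:
    """Parse one JSON request and return one JSON response string."""
    try:
        request = json.loads(raw)
        vector = encode_skills(request["skills"])
    except (KeyError, TypeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError, so malformed input lands here too.
        return json.dumps({"ok": False, "error": str(exc)})
    # Placeholder: the real pipeline would run cluster identification,
    # career prediction, and attribution here and add their results.
    return json.dumps({"ok": True, "skill_vector": vector})


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Entry point a Node.js child process could invoke: JSON in, JSON out."""
    stdout.write(handle_request(stdin.read()))
```

Returning errors as JSON rather than letting the process crash keeps the contract with the calling backend simple: it always parses one JSON object and checks a single flag. Picking skills from a fixed taxonomy is what makes the fixed-order vector encoding possible in the first place.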

Results & Impact

Deployed career recommendation engine predicting top 10 job matches from 2025 O*NET data using a two-layer XGBoost + DNN pipeline with full Captum explainability.