
Inclusion by Relevance: AI-Driven Session Discovery for Canada's Indigenous Classrooms


Connected North (CN) is a TakingITGlobal programme that connects K–12 students in remote Indigenous communities across Canada to live virtual learning sessions — delivered by scientists, artists, Elders, career mentors, and more.


The Promise and the Gap

Connected North's catalog has grown to nearly 3,000 sessions from 447 providers — Indigenous role models, cultural knowledge keepers, scientists, artists, and career mentors — reaching over 40,000 students across 218 schools annually. It is one of the most diverse virtual learning catalogs built specifically for Indigenous communities in Canada.

But a catalog is only as inclusive as it is discoverable. With no personalised discovery layer, teachers navigating the web app tended to re-book familiar sessions. Newer sessions, Indigenous-led content, and community-sourced programming — often the most culturally relevant — were the least likely to be found. The catalog's breadth was real; its reach was not.


The Data Challenge

Before building anything, the team had to confront the data it actually had — and it was far from ideal.

Teacher session ratings existed, but 78.7% were 5-star ratings. With almost no variation, ratings were statistically useless as a training signal. Free-text feedback from teachers — the richest indicator of genuine satisfaction — covered only 9.4% of session records. And of the thousands of data fields analysed across CN's database, a large proportion had low coverage, inconsistent formatting, or missing history, making them unusable for modelling. Starting from 82 candidate features, only 42 survived rigorous testing as genuinely predictive.

These are the kinds of constraints that stop most ML projects before they reach production. The team's response was to work thoughtfully with what existed, rather than wait for better data.


Engineering a Training Signal

Since no single reliable signal existed to train on, the team built one.

A Likeability Score was constructed by combining all available satisfaction signals in a weighted cascade — blending ratings, AI-analysed feedback sentiment, and teacher favourite flags in proportions that reflected the quality of evidence available for each session record. Where all three signals were present, all three contributed. Where only a favourite flag existed, that carried the full weight. The result was a training target that covered 70% of all session records with far more discriminating power than any individual signal.
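To make the cascade concrete, here is a minimal sketch of how such a blended score might be computed. The field names, weights, and normalisation are illustrative assumptions, not CN's actual values — the key mechanic is that weights are re-normalised over whichever signals are actually present.

```python
# Hypothetical sketch of a weighted-cascade satisfaction score.
# Weights and normalisation are illustrative, not CN's actual values.

def likeability_score(rating=None, sentiment=None, favourite=None):
    """Blend whichever satisfaction signals exist for a session record.

    rating    -- 1-5 star teacher rating, or None if absent
    sentiment -- AI-analysed feedback sentiment in [0, 1], or None
    favourite -- True/False teacher favourite flag, or None
    """
    # (signal value normalised to [0, 1], base weight) pairs
    signals = []
    if rating is not None:
        signals.append(((rating - 1) / 4, 0.3))
    if sentiment is not None:
        signals.append((sentiment, 0.5))   # richest signal, highest weight
    if favourite is not None:
        signals.append((1.0 if favourite else 0.0, 0.2))

    if not signals:
        return None  # record contributes nothing to training

    # Re-normalise over the signals actually present, so a lone
    # favourite flag carries the full weight on its own.
    total = sum(w for _, w in signals)
    return sum(v * w for v, w in signals) / total
```

With all three signals present, the score is a full weighted blend; with only a favourite flag, it collapses to 1.0 or 0.0 — mirroring the cascade described above.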

This score became the ground truth the model trained on. Crucially, explicit thumbs-up and thumbs-down feedback collected from teachers after launch carries the highest weight in retraining, so the model sharpens over time on the clearest signal available: what teachers actually said they liked.


How the Model Works

The recommendation engine is a supervised ML ranking model — a tree ensemble — that learns what makes a session genuinely useful to a teacher, and then scores every available session for every teacher to generate a personalised ranked list.

The 42 features it draws on span the full context of a booking: the provider's track record, the session's quality history, how well the session's grade and subject coverage matches the teacher's profile, how similar teachers have responded to the same session in the past, and how a session's content aligns with the school's own history of sessions it has valued most. That last signal — measuring a session's semantic similarity against a prototype built from each school's most liked sessions — proved one of the stronger predictors, capturing cultural and contextual fit that structured metadata alone could not.
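The school-prototype signal described above can be sketched as follows. The toy vectors and function names here are illustrative; in practice the embeddings would come from a text-embedding model applied to session descriptions, but the mechanic — average a school's most-liked session embeddings into a prototype, then take cosine similarity against each candidate — is the same.

```python
# Illustrative sketch of the school-prototype similarity feature.
# Embeddings are toy vectors; real ones would come from a text-embedding
# model applied to session descriptions.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def school_prototype(liked_embeddings):
    """Average the embeddings of a school's most-liked sessions."""
    n = len(liked_embeddings)
    dims = len(liked_embeddings[0])
    return [sum(v[i] for v in liked_embeddings) / n for i in range(dims)]

def prototype_similarity(candidate_embedding, liked_embeddings):
    """Feature value: how close a candidate sits to the school's taste."""
    return cosine(candidate_embedding, school_prototype(liked_embeddings))
```

A candidate session whose description embeds close to the prototype scores near 1.0 on this feature, capturing the cultural and contextual fit that structured metadata misses.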

Two simple rules apply after scoring: sessions the teacher has already completed are removed, and sessions they have explicitly disliked are excluded from future recommendations.
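Those two rules amount to a small post-scoring filter. A minimal sketch, with illustrative names:

```python
# Minimal sketch of the two post-scoring rules; names are illustrative.

def apply_rules(scored_sessions, completed_ids, disliked_ids):
    """Drop completed and explicitly disliked sessions from a ranked list.

    scored_sessions -- list of (session_id, score), highest score first
    completed_ids   -- set of session ids the teacher has already done
    disliked_ids    -- set of session ids the teacher thumbed down
    """
    excluded = completed_ids | disliked_ids
    return [(sid, score) for sid, score in scored_sessions
            if sid not in excluded]
```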


The System

The full solution was delivered as a production-ready, end-to-end automated system — not a prototype. It retrains itself monthly on fresh data, refreshes recommendations every night without any API downtime, and serves results to teachers with very low latency. New teachers with no booking history are handled through an intelligent peer-based fallback, so every teacher gets relevant recommendations from day one. Everything runs on a GitHub-based CI/CD pipeline, so model updates are version-controlled and deployed without manual intervention. The system monitors itself continuously — polling its own health every 60 seconds, firing instant alerts on failures, and emailing the CN team a daily summary of recommendations and feedback — so the team has full visibility without needing to manage dashboards. The system runs on AWS in Canada Central for data residency compliance, at very low infrastructure cost.
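The 60-second health poll with instant alerting can be sketched very simply. The endpoint URL and alert hook below are hypothetical placeholders, not CN's actual infrastructure:

```python
# Hypothetical sketch of a 60-second health poll with alerting.
# The endpoint URL and alert callback are illustrative placeholders.
import time
import urllib.request

HEALTH_URL = "https://example.internal/recommendations/health"  # placeholder

def check_health(url=HEALTH_URL, timeout=5):
    """Return True if the API answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, connection and DNS failures, timeouts
        return False

def poll_forever(alert, interval=60):
    """Poll every `interval` seconds; fire `alert(msg)` on any failure."""
    while True:
        if not check_health():
            alert("recommendation API health check failed")
        time.sleep(interval)
```

In production this loop would typically run as a scheduled job or sidecar, with `alert` wired to email, Slack, or a paging service.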


Inclusion by Relevance

More than 80% of teachers polled during user acceptance testing said they liked the recommendations they received — a strong validation that the model's predictions align with what teachers find genuinely useful, despite the significant data constraints it was built on.

But the deeper significance goes beyond a satisfaction score. A catalog of 3,000 sessions only delivers on CN's digital inclusion mission if teachers can find what is actually relevant to their students. By surfacing sessions that match a school's community context, cultural affiliations, and the kinds of content its teachers have valued most, the engine quietly expands which sessions get booked — including Indigenous-led content, community-sourced programming, and culturally grounded sessions that unassisted search would leave buried. When relevance improves, the full richness of the catalog reaches the classrooms it was built for. Discovery, done well, is itself an act of inclusion.
