The moment Priya saw her answer marked wrong
Priya, a Grade 9 student from Chennai, had spent fifteen minutes solving a complex algebra problem, a question about calculating the cost of materials for a school project with multiple constraints. Her approach was unconventional: she set up the equations differently than the method shown in the textbook. The answer was correct, but her method was not in the "standard" solution key.
Her teacher, pressed for time and marking 180 papers, gave her 2 out of 5 marks for "incorrect method." The mark stayed. Priya's confidence shook. "Maybe I'm not good at maths," she thought, a narrative that would follow her for years.
What Priya's teacher did not know was that she had stumbled onto a more elegant solution, one that demonstrated deeper mathematical thinking than the textbook method. Yet in a system where one person, one afternoon, decides the evaluation of 180 students, such nuance is easily lost.
This is a story repeated millions of times daily in Indian classrooms. Teachers who are overworked, time-pressed and biased (consciously or unconsciously) make quick judgments that feel objective but are laden with assumptions. A student's presentation style, handwriting quality, perceived socio-economic status, and even whether they remind the teacher of another student all silently influence the grade. Research shows that identical handwriting gets graded differently when the rater thinks it belongs to a "bright" versus "struggling" student.
But what if technology could intervene not to replace human judgment, but to surface what humans miss: the critical thinking hidden in an unconventional answer, the gender or caste bias in how questions are framed, the learning gap that a surface-level test score obscures?
This is where AI-powered assessment tools, thoughtfully designed and humanely implemented, offer a revolution in fairness.
Before exploring the solution, we must confront the scale of the problem.
Research on assessment bias in Indian schools reveals several pervasive patterns:
Narrow assessment: Traditional exams test recall and procedural knowledge, not critical thinking. Students who excel at pattern recognition thrive; those who think creatively struggle.
The cumulative impact is stratification: students marked as "good" early continue to be treated as capable, while those marked "struggling" face diminished expectations, fewer opportunities, and eventually, lower achievement. In India's context of caste, class, gender and regional language hierarchies, these biases become mechanisms of systemic exclusion.
The scope of the problem
In a system where marks determine board exam results, college admission, and ultimately life trajectory, assessment bias is not merely a pedagogical issue; it is a justice issue.
When schools first began adopting automated grading tools, the promise was seductive: remove human bias with algorithms. Multiple-choice tests were fed into systems that would grade instantly, objectively, at scale.
But the reality was more complex.
1. Training data bias: AI systems learn from examples. If those examples (historical test papers, marked answer sheets) come from a system that is already biased, the AI learns and amplifies those biases.
For instance, an AI trained on essay grades from schools where boys' essays were marked higher than girls' identical essays would learn to score "masculine" writing styles higher. A system trained on answers predominantly in English and Brahminical styles would undervalue creative solutions in regional languages or from non-traditional backgrounds.
Over the last 18–24 months, a new generation of tools has emerged. These systems do something radically different: they use AI not to replace human judgment, but to surface blind spots in human judgment and reveal the thinking behind answers.
1. Multi-level answer analysis
Modern NLP-based assessment tools can now read open-ended answers (including handwritten ones) with 95%+ accuracy and analyse them at multiple levels.
Novelty and creativity: Is there an original or elegant approach?
2. Bias detection in questions and answers
The system flags:
Grading inconsistency: The system compares gradings across similar answers and alerts teachers to inconsistencies. "You gave Answer A (geometry solution) 4/5 but Answer B (trigonometric solution to the same problem) 2/5, even though both are correct. Review?"
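The consistency check described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular product: the `GradedAnswer` record, the `find_inconsistencies` function, and the `max_gap` threshold are all hypothetical names. The sketch also assumes that correctness has already been verified separately.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    student: str
    question_id: str
    method: str       # hypothetical label, e.g. "geometry" or "trigonometry"
    is_correct: bool  # assumed to be verified separately
    mark: float       # mark awarded by the teacher


def find_inconsistencies(answers, max_gap=1.0):
    """Return pairs of correct answers to the same question
    whose marks differ by more than max_gap."""
    by_question = defaultdict(list)
    for answer in answers:
        if answer.is_correct:
            by_question[answer.question_id].append(answer)

    flagged = []
    for group in by_question.values():
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                if abs(first.mark - second.mark) > max_gap:
                    flagged.append((first, second))
    return flagged


# Illustrative data only, mirroring the example alert in the text.
answers = [
    GradedAnswer("A", "Q7", "geometry", True, 4),
    GradedAnswer("B", "Q7", "trigonometry", True, 2),
    GradedAnswer("C", "Q7", "geometry", False, 1),
]
for first, second in find_inconsistencies(answers):
    print(f"Review? {first.student} ({first.method}) got {first.mark}, "
          f"{second.student} ({second.method}) got {second.mark}; both are correct.")
```

The sketch only raises a question for the teacher; it never changes a mark. The decision stays with the human reviewer.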
3. Thinking pattern recognition
A system can distinguish between:
When a student answers a physics question about motion, the system can see if they:
Each reveals different cognitive depths. A student marked "wrong" for not using the expected formula but who actually demonstrated deeper thinking gets a fair re-evaluation.
A mid-size private school in Chennai deployed an assessment tool during the 2024–25 academic year to grade mathematics and science papers for Grades 7–10. The experience was revealing.
The incident: A Grade 9 student answered a geometry problem about triangle angles. The expected solution used coordinate geometry. The student's solution used pure geometric insight and properties to arrive at the answer in three lines instead of ten.
A human teacher, evaluating quickly and pattern-matching against the solution key, had marked it 2/5: "Incomplete method."
The AI system flagged this answer as "conceptually exemplary but methodologically unconventional." The teacher reviewed, realised the answer was not just correct but demonstrated deeper understanding, and changed the grade to 5/5.
Over the term, the system flagged 47 such cases: answers that were substantively strong but did not match the expected format. Reclassification improved the grades of 41 of these students and, more importantly, shifted the narrative from "wrong method" to "novel thinking."
Teachers reported a crucial insight: "We were not rewarding thinking; we were rewarding format adherence."
Outcome:
Another school implemented a system to analyse how teachers were grading essays. The data was shocking.
Finding 1: Gender effect
Essays by students identified as girls were graded on average 0.8 points lower (on a 5-point scale) than essays by students identified as boys with similar conceptual content, grammar, and structure. The bias was not dramatic but consistent and directional.
When teachers were shown this data and asked, "Here are 10 pairs of essays. One is by a girl, one by a boy. Can you spot which is which?", many could not. Yet their grades had revealed a systematic preference.
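A gap of this kind is simple to compute once essays are matched on content quality. The sketch below assumes essays have already been paired that way, which the source does not confirm as the school's method. The function name `mean_paired_gap` is hypothetical, and the marks are invented to illustrate the arithmetic; they are not the school's data.

```python
from statistics import mean


def mean_paired_gap(pairs):
    """Average of (girl_mark - boy_mark) over matched essay pairs."""
    return mean(girl - boy for girl, boy in pairs)


# Made-up marks on a 5-point scale, for illustration only.
# Each tuple is (mark for girl's essay, mark for the matched boy's essay).
matched_pairs = [(3.0, 4.0), (3.5, 4.0), (4.0, 4.5), (2.5, 3.5), (3.0, 4.0)]

gap = mean_paired_gap(matched_pairs)
print(f"Mean gap (girls minus boys): {gap:+.2f} points")
```

A negative result means the girls' essays were marked lower on average. On real data, the size of the sample and the quality of the matching both need checking before the gap is read as bias.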
Finding 2: Language bias
Students writing in English scored ~15% higher than those writing in Kannada or Tamil, even when content was equally strong. Teachers explained this as "clarity" and "expression," but the system revealed it was linguistic preference, not quality.
Intervention: Teachers engaged in explicit reflection on their rubrics. They separated "conceptual quality" (which should be language-agnostic) from "written English fluency" (which is valuable but distinct). Rubrics were revised.
Outcome after 6 months:
A government school with severe resource constraints (multi-grade classrooms, 60+ students per class, one harried teacher per grade) deployed a system to analyse student work.
The revelation: Students in government schools frequently demonstrated critical thinking comparable to or exceeding private school students, but their answers were marked lower because they did not match textbook solutions.
One student, asked "Why does ice float?" in a science test, gave an answer rooted in density principles but also noted that ice floats because water has an unusual property (it expands when frozen) and speculated whether this was adaptive for aquatic life. The expected answer was one sentence: "Ice is less dense than water."
The student's answer was marked as "too complex, off-topic" and scored 1/3.
An AI analysis recognised this as higher-order thinking: analysis → synthesis → theoretical speculation. The system flagged this and recommended 3/3.
Over time, teachers began to design assessments explicitly for critical thinking (not just factual recall), knowing the system would recognise and reward it. Student engagement soared as students realised their thinking was valued even if it did not match the textbook.
What makes these new tools effective is a shift in design philosophy. Instead of asking "How do we automate grading?", schools are asking "How do we use AI to make human judgment better?"
AI does not grade. Teachers grade. But AI surfaces patterns humans miss:
The system is about transparency and accountability, not automation.
Rather than black-box scoring, the system makes evaluation criteria visible. Teachers and students see:
This clarity itself reduces bias because there is less room for unconscious preference.
Rather than one number (the grade), the system provides a profile:
This matches the vision your earlier writing articulated: assessing competencies, not just marks.
Continuous monitoring allows the system to:
This reflects the SANGATHAN governance principle: data should guide decisions and surface inequities so they can be corrected.
Research from schools implementing fair AI assessment tools shows measurable improvements:
(Compiled from school reports, Chennai, Bangalore, Jaipur, 2024–25)
Your earlier articles on SAMAVESH (inclusion) and ANKUR (personalised learning) rest on a premise: every child can learn, and systems should be designed to recognise and nurture that capability.
Traditional assessment breaks this promise. A girl who thinks deeply but writes hesitantly gets a lower grade. A backward-caste student with a brilliant insight but unconventional expression is marked down. A student thinking creatively but deviating from the textbook is penalised.
AI-powered fair assessment flips this. It says: "Let me look deeper. Is there thinking here that was overlooked?"
In the context of your other work:
Implementing fair AI assessment at scale is not frictionless. Real challenges remain.
Teachers fear being "audited" by algorithms. Concerns include:
Path forward: Frame AI as a tool for teachers, not an evaluator of teachers. Focus on time-saving (teachers report the tool cuts grading by 50%) and support (better insights for differentiation). Involve teachers in designing rubrics and bias-checking processes, not just receiving outputs.
AI is only as fair as its training data. If training data:
Path forward: Diverse, representative training data is essential. India needs coordinated effort to build datasets that:
The new IndiCASA dataset, a framework for evaluating biases in language models in Indian contexts, is a step forward, but much work remains.
Once students know they are being assessed fairly (not just on format but on thinking), behaviour changes. Some educators worry this could be gamed: "Students will give complex answers just to impress the AI."
But evidence suggests the opposite: when critical thinking is valued, students genuinely engage more deeply.
The deeper issue is philosophical: What do we want assessment to do? Rank students, or help them learn?
Fair AI assessment is not a standalone tool. It fits into the coherent educational vision you have been articulating:
Together, these pieces create an education system oriented toward genuine learning and human flourishing, not sorting and stratification.
Priya's story, marked wrong for thinking differently, is the old paradigm. In the new one, her unconventional solution would be flagged as exemplary. The system would tell her teacher: "This student demonstrates novel problem-solving. Feed more open-ended problems to this learner."
Priya would internalise a different narrative: "I can think. My thinking is valued. I belong."
That shift from self-doubt to agency is not trivial. It is the difference between a student who persists and one who gives up. Between a student who sees mathematics as "not for me" and one who sees it as a domain where she can contribute.
Fair AI assessment, implemented thoughtfully with strong human oversight and commitment to equity, is a tool for making that shift real for millions of Indian students.
The challenge now is not whether the technology works. It does. The challenge is scaling it equitably, ensuring it reaches not just elite private schools but the government schools where the need is greatest, and building the governance frameworks to keep it honest.