Michael de la Maza is a Professor of Machine Learning and Business Analytics at Hult International Business School. 

Grading is one of the most valuable activities that faculty perform in the classroom. By giving students useful and timely feedback, we nurture their growth.

As a professor, my three greatest grading concerns are consistency, timeliness, and feedback. Did the student I graded at 8am get the same high-quality feedback as the student I graded at 4pm? Did I catch all of the opportunities for improvement?

During the past year, I have been experimenting with a new way to increase grading consistency and timeliness: I built an AI grader using Meta’s open source Llama 3.3 model running on an Nvidia H100 GPU.

Here is what happened when I programmed an AI running on a high-end GPU to grade student answers.

Why a high-end machine is needed

First, a technical note. This grading cannot be done with chatbots like ChatGPT that cost $20 a month. The compute needed is far greater than what is provided by these “consumer” AI tools. Indeed, I first tried to use these chatbots for grading but quickly ran out of memory and tokens.

I chose Llama 3.3 because it sits in a “Goldilocks” zone: powerful enough to grade short answer questions but open source so I can run it on my own rented GPU.[1]

The end result is that the system could grade one student’s set of ten short-answer questions (200-300 words each) in about five minutes. My classes have about 60 students, so grading an entire class took about 300 minutes, or five hours.
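The arithmetic behind that estimate can be written out as a small Python sketch. It assumes submissions are graded one after another at a steady five minutes each. The function name and parameters are illustrative and are not taken from the actual grading system.

```python
def class_grading_minutes(num_students: int, minutes_per_student: float) -> float:
    """Total time, in minutes, to grade a class one submission at a time."""
    return num_students * minutes_per_student


# Figures quoted in the text: about 60 students, about five minutes per student.
total_minutes = class_grading_minutes(num_students=60, minutes_per_student=5)
total_hours = total_minutes / 60

print(f"{total_minutes} minutes, or {total_hours:.1f} hours")
```

Changing either input shows how the total scales with class size or with a faster or slower model.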

The AI grading workflow

My workflow keeps a “human in the loop” and uses the AI as a “blunder stopper.” In the end, I, as the professor, am responsible for the quality of the grading and for providing high-quality feedback to all students.

My new AI grading workflow is as follows:

  1. Human grading: My course assistant grades the assessment as usual.
  2. AI grading: While my course assistant is grading, I run the AI grading system.
  3. Blunder stopper: I then compare the grades given by the course assistant and the AI. When there is a discrepancy of more than five points, I dive in and resolve the difference.
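As a rough illustration, the comparison in step 3 could be sketched in Python as follows. This is a minimal sketch, not the actual system: the student IDs, the scores, the 100-point scale, and the function name are all hypothetical. Only the rule of flagging gaps of more than five points comes from the workflow above.

```python
THRESHOLD = 5.0  # points; "a discrepancy of more than five points"


def flag_discrepancies(human, ai, threshold=THRESHOLD):
    """Return (student_id, human_score, ai_score, gap) rows for every student
    whose two grades differ by more than `threshold`, largest gap first."""
    flagged = []
    for student_id, human_score in human.items():
        ai_score = ai.get(student_id)
        if ai_score is None:
            continue  # no AI grade for this student, so nothing to compare
        gap = abs(human_score - ai_score)
        if gap > threshold:
            flagged.append((student_id, human_score, ai_score, gap))
    return sorted(flagged, key=lambda row: row[3], reverse=True)


# Made-up grades, for illustration only.
human_grades = {"s01": 88.0, "s02": 72.0, "s03": 95.0, "s04": 60.0}
ai_grades = {"s01": 85.0, "s02": 80.5, "s03": 94.0, "s04": 48.0}

flagged = flag_discrepancies(human_grades, ai_grades)
for student_id, human_score, ai_score, gap in flagged:
    print(f"{student_id}: assistant={human_score}, AI={ai_score}, gap={gap}")
```

Sorting by the size of the gap puts the largest disagreements at the top of the review list. A gap of exactly five points is not flagged, which matches the "more than five points" wording.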

The performance

The data from the first run was fascinating. The correlation between the grades given by the course assistant and the grades given by the AI was 0.6. This is a “moderate to strong” positive relationship. It means that the course assistant and the AI generally agree on which submissions are the strongest and which are the weakest.
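For readers who want to run the same check on their own grades, here is a minimal Python sketch of a correlation calculation. The text does not say which coefficient was used, so this assumes the common Pearson correlation. The function name and the sample scores are hypothetical and are not the data from the first run.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores in a list are equal")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores, listed in the same student order in both lists.
assistant_scores = [88.0, 72.0, 95.0, 60.0, 81.0]
ai_scores = [85.0, 80.5, 94.0, 48.0, 90.0]

print(round(pearson_r(assistant_scores, ai_scores), 2))
```

A value near 1 means the two graders rank students almost identically, a value near 0 means their grades are unrelated, and a negative value means they tend to disagree.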

AI feedback

In addition to grading, I programmed the AI to provide feedback. Unlike humans, AI never gets tired and is able to provide detailed feedback for 60 students with ease.

Here is an example of the AI’s feedback: “Score 8.5: Strong definition of measurable outcomes. To improve: You explicitly mentioned establishing a baseline, but you failed to describe how you would track cost savings over Q1 and Q2 specifically.”

This is the holy grail of assessments: personalized, high-quality feedback at scale. I’m now able to use the AI’s comments to enrich the feedback I give to students.

The future of education: Augmentation, not automation

AI promises a revolution in education. Some have even suggested that it will do away with the need for educators like me and educational institutions.

My experience with the AI grading system suggests the opposite. By turning the AI into a “blunder stopper,” I am able to provide consistent and timely grades with high-quality feedback to my students.

By using the AI grader, I can focus on working with students who need added help or create extra challenges for top students. Using the AI grader is one of the best forms of support I can provide my students.

Reclaiming time for face-to-face mentorship

Students tell me that they enjoy attending class because of the emotional connection with the professor. They also benefit from hearing stories about my career in industry.

Time spent grading takes away from face-to-face time with students outside of class. By using the AI grader, I reclaim over half of my grading time. That time is immediately reinvested in my students. I can spend more time with them before and after class, hold more one-on-ones, and work on research projects with the most advanced students.

The value of attending university has never been just about consuming facts. After all, the facts are all available on the web or in the library. The value of a university education revolves around human connection, discussion, and mentorship.

When AI acts as a grading second opinion, it liberates the professor. Instead of having to go through the time-consuming process of double checking hundreds of grades, faculty can leave that to the AI. The educator evolves into a mentor who can spend quality time with students.

For example, in the past I have collaborated with students on research papers and case studies. In one situation, a student interviewed managers at Sephora about how they were using AI to recommend skincare products to customers. This work turned into a case study that benefited the student who co-authored it and will benefit future students, who will learn about industry AI applications by working through the case.

Moving beyond recall to synthesis

Because the AI grader can assess whether a student has accurately described a concept, I no longer need to check if students can recall what they learned in class. Instead, I can focus on higher levels of Bloom’s taxonomy – analyzing, evaluating, synthesizing, and creating.

This is how AI makes it easier to focus on critical thinking. Because the AI grader provides detailed feedback on the student’s understanding of the facts, we can then elevate the classroom discussion. I can challenge them to go beyond simply summarizing a lecture. I can ask them to defend their perspective, challenge assumptions, and apply their newfound understanding to novel situations.

In my fifteen years as a digital transformation consultant, I have found that this critical thinking is what distinguishes people as they move up the career ladder. Executives tend not to focus on the details – they assume their staff has done the work correctly. Instead, executives are looking for big picture insights that come from synthesizing large amounts of analysis.

The AI grader provides a safety net that ensures students understand the facts. I can then challenge them to interrogate those facts.

The value of the educator in the age of AI

Using the AI grader is one of the best ways to support my students.

While this implementation required specialized knowledge and a high-end GPU, in the future such grading “blunder stoppers” will be available to all educators.

If educators use AI to simply get things done faster, we are missing the point. AI should not be used to create distance between a professor and a student. Instead, it should bring them closer by eliminating the need to focus on lower levels of Bloom’s taxonomy in the classroom.

By automating the routine, the AI grader does more than combat grade inflation and ensure grading consistency. By freeing both professors and students to focus on critical thinking, it elevates the educational experience.

[1] This work was supported in part by a grant from Lambda.ai.