
The Bot Takes a Bow

Late last month I wrote about a sample NextGen question that GPT-4 discovered was based on an outdated, minority rule of law. NCBE has now removed the question from its website, although it is still accessible (for those who are curious) through the Wayback Machine. While the Bot takes a small bow for assisting NCBE on this question, I’ll offer some reflections.

We hear a lot about mistakes that GPT-4 makes, but this is an example of GPT-4 correcting a human mistake. Law is a vast, complex field, especially considering state-to-state variations in the United States. Both humans and AI will make mistakes when identifying and interpreting legal rules within this large universe. This story shows that AI can help humans correct their mistakes: We can partner with AI to increase our knowledge and better serve clients.

At the same time, the partnership requires us to acknowledge that AI is also fallible. That’s easier said than done because we rely every day on technologies that are much more accurate than humans. If I want to know the time, my phone will give a much more accurate answer than my internal clock. The speedometer in my car offers a more accurate measure of the car’s speed than my subjective sense. We regularly outsource many types of questions to highly reliable technologies.

AI is not the same as the clocks on our phones. It knows much more than any individual human, but it still makes mistakes. In that sense, AI is more “human” than digital clocks, odometers, or other technologies. Partnering with AI is a bit like working with another human: we have to learn this partner’s strengths and weaknesses, then structure our working relationship around those characteristics. We may also have to think about our own strengths and weaknesses to get the most out of the working relationship.

GPT-4’s review of the NextGen question suggests that it may be a useful partner in pretesting questions for exams. Professors read over their exam questions before administering them, looking for ambiguities and errors. But we rarely have the opportunity to pretest questions on other humans–apart from the occasional colleague or family member. Feeding questions to GPT-4 could allow us to double-check our work. For open-ended questions that require a constructed response, GPT-4 could help us identify issues raised by the question that we might not have intended to include. Wouldn’t it be nice to know about those before we started grading student answers?
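For readers who want to experiment, here is a rough sketch of what that double-check might look like in practice. It assumes access to the OpenAI Python client (version 1.0 or later) and an API key; the model name and prompt wording are only illustrative choices, not a prescribed workflow.

```python
# A minimal sketch of asking GPT-4 to pretest a draft exam question.
# Assumes the OpenAI Python client (openai >= 1.0) and an API key in the
# OPENAI_API_KEY environment variable; prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()

exam_question = """
[Paste the draft question, facts, and any answer choices here.]
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are reviewing a draft law exam question before it is "
                "administered. Identify ambiguities, outdated or minority "
                "rules of law, and issues the question raises that the "
                "drafter may not have intended to test."
            ),
        },
        {"role": "user", "content": exam_question},
    ],
)

# Print the model's critique for the drafter to evaluate.
print(response.choices[0].message.content)
```

The model’s critique is a starting point, not a verdict; as the rest of this post emphasizes, the human drafter still has to judge which flagged issues are real.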

I hope that NCBE and other test-makers will also use AI as an additional check on their questions. NCBE subjects questions to several rounds of scrutiny–and it pretests multiple-choice questions as unscored questions on the MBE–but AI can offer an additional check. Security concerns might be addressed by using proprietary AI.

Moving beyond the testing world, GPT-4 can offer a double-check for lawyers advising clients. In some earlier posts, I suggested that new lawyers could ask GPT-4 for pointers as they begin working on a client problem. But GPT-4 can assist later in the process as well. Once a lawyer has formulated a plan for addressing a problem, why not ask GPT-4 if it sees any issues with the plan or additional angles to consider? (Be sure, of course, to redact client-identifying information when using a publicly accessible tool like GPT-4.)

Our partnership with GPT-4 and other types of AI is just beginning. We have much to learn–and many potential benefits to reap.