Sydney and Friends – Legal Research, A.I., and ChatGPT

ChatGPT and OpenAI, Bing and Sydney (it sounds like a movie!), Google Bard . . . all references (sometimes to the same thing) to tools (or something more than tools . . . intelligent tools? . . . sentient tools? . . . or not tools at all but some sort of quasi-human intelligence?) that, at least in the minds of some real humans, “trace the outer bounds of human creativity” and usher in “a new dawn in how we build.” [Thompson]  Creepily deep conversations can be had with Sydney (the real name of the AI/chat function on Bing); Sydney can fall in love – somewhat obsessively; Sydney uses emojis to excess; and Sydney wants to be human. [Roose]

We’ve been inundated with articles about the dangers – and benefits – of artificial intelligence and resources like ChatGPT in law practice and in the academy.  ChatGPT can write essays about law; it can tell you what the law is (albeit without citations); and it can get a decent grade on a law school exam and pass the bar exam.  Reaction from lawyers, law scholars, and teachers to ChatGPT has focused on, among other things, its potential for cogently summarizing massive amounts of law-related information and providing a clear and at least sometimes accurate description of the law; its potential for data mining and extracting information from files and documents; its ability to draft contracts and add increasingly complex provisions upon request; and its facility in writing text, titles, captions, etc., for blog posts, briefs, memos, reports to clients, whatever.

I wanted to think about A.I. in the context of the legal research databases that law publishers (if we can still call them that) make available to us and, more specifically, in the context of traditional (at least in substance if not in form) legal research itself.  We all know that vendors have long incorporated A.I. tools into their resources to, among other things, accommodate natural language searching, analyze and compare briefs and contracts, explore and predict the decisions of particular judges, characterize the expertise of particular attorneys and experts, identify authorities whose value has been questioned, and predict whether proposed legislation will be enacted.  But ChatGPT – in its relatively early form – can already adequately answer, albeit again without any indication of sources, questions like: what are the elements of negligence in Nevada; what do I need to do to make a Regulation D offering; how do I set up a generation-skipping trust; or what’s the liability for dog bites in California.  What does that mean for legal research databases going forward?  And is the kind of technology on which vehicles like ChatGPT depend well adapted to answer the legal research questions that lawyers – and law students – pose?

None of our familiar – and not-so-familiar – legal research databases do anything quite like what Sydney does.  As a general rule, we enter a query and we’re referred to authorities that the database technology has determined apply to what we asked.  There’s no generation of text by the resource itself.   Sometimes – if we’re lucky – there’s a source that does answer our question pretty directly, but that source is really answering someone else’s question that happens to be – at least in the judgment of the database engine – remarkably similar to ours. 

I – who know nothing about AI systems – am the last person who should attempt to understand how something like ChatGPT works, but maybe there is something in the underlying approach of systems like ChatGPT that might make them either useful in the context of traditional legal research or a little bit dangerous. 

Thank goodness others can write about these things incredibly clearly.  Ted Chiang’s article, “ChatGPT Is a Blurry JPEG of the Web,” in the February 9th issue of The New Yorker describes the difference between lossless compression and lossy compression when it comes to digitally storing information.  Data, images, whatever, are compressed for storage and then, when their content is needed, they are somehow reinvigorated and, like a sponge taking on water, expanded so that they can be used.  Sometimes the compression is ‘lossless’ and the rejuvenation process is robust and complete, but, in the case of large language models like ChatGPT, the compression is, instead, ‘lossy’; that is to say, something of the original is lost each time the compression and expansion process takes place.  Chiang analogizes to an image that is repeatedly photocopied; each successive image is increasingly blurry.  And he points out that if ChatGPT output becomes part of the universe of information that ChatGPT uses to produce answers, the blurring of new output becomes all the more pronounced.
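For readers who like to see the distinction in action, here is a minimal Python sketch of the two ideas. It illustrates the analogy only and is not a description of how ChatGPT works. The standard-library zlib module stands in for lossless compression, and a hypothetical `photocopy` function (a simple neighbor-averaging blur, chosen purely for illustration and not taken from Chiang's article) stands in for one lossy generation. The sample text and the sample numbers are likewise made up.

```python
import zlib


def lossless_round_trip(text: str) -> str:
    """Compress and then expand text with zlib; nothing is lost."""
    packed = zlib.compress(text.encode("utf-8"))
    return zlib.decompress(packed).decode("utf-8")


def photocopy(signal: list[float]) -> list[float]:
    """One lossy 'generation' (an assumed stand-in for a photocopy):
    each value is replaced by the average of itself and its neighbors,
    so fine detail is smeared a little more on every pass."""
    n = len(signal)
    return [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3
        for i in range(n)
    ]


def total_error(original: list[float], copy: list[float]) -> float:
    """Sum of absolute differences between the original and a copy."""
    return sum(abs(a - b) for a, b in zip(original, copy))


# Lossless: the expanded text is identical to what went in.
question = "What are the elements of negligence in Nevada?"
print(lossless_round_trip(question) == question)

# Lossy: copy the copy three times and measure the drift from the original.
original = [0, 0, 9, 0, 0, 9, 9, 0, 0, 0]  # made-up sample "image"
copy = original
errors = []
for generation in range(3):
    copy = photocopy(copy)
    errors.append(total_error(original, copy))
print(errors)
```

With this sample data the error grows with every generation, which is the photocopy-of-a-photocopy effect Chiang describes; the lossless round trip, by contrast, returns exactly what was stored.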

Chiang suggests that if something like ChatGPT used lossless compression its output would be only immaterially different from what a search engine now retrieves.  If ChatGPT employed a lossless algorithm, Chiang predicts that “it would always answer questions by providing a verbatim quote from a relevant Web page. We would probably regard the software as only a slight improvement over a conventional search engine.”  Bloomberg, Lexis, Westlaw, and other legal research tools provide us with exactly this, in Chiang’s words, “verbatim quotes” from relevant and recognized sources. They also provide us with the full text of those sources. 

Sometime in the future, though, it’s conceivable that our legal research resource providers might employ something like ChatGPT to directly answer our questions.  Unlike Sydney, which has the entire web at his/her/their disposal, we might not have to worry too much about misinformation if a ChatGPT-like A.I. system were implemented in the big legal research databases – if, that is, the universe of content included only primary authorities and vetted secondary authorities (and not AI-generated responses to queries).  But it might trouble us that a machine – sentient as it may be – was formulating responses to our queries in language that no one actually wrote or reviewed . . . especially when secondary sources are supposed to serve exactly those information-providing purposes.  If we want someone – well, really, something – else to write what we pass off as our own work, we may have problems quite apart from the accuracy of the content of whatever is written.  We care a lot about attribution, or at least we’re supposed to . . . do we attribute text to the resource and its Sydney-like being?

As law librarians or lawyers, do we really want tools (or beings) that may use somewhat degraded content to produce grammatically well-written – albeit perhaps not too creative or inspiring – text that answers our questions?  Admittedly, maybe sometimes we do, but we might hope that those circumstances are few and far between.  It’s fun and entertaining – and creepy – to interact with Sydney; I’m just not sure I’d want his/her/their take on my legal research question.

Ted Chiang, “ChatGPT Is a Blurry JPEG of the Web,” The New Yorker, February 9, 2023.

Jenna Greene, “Will ChatGPT Make Lawyers Obsolete? (Hint: Be Afraid),” Reuters, December 9, 2022.

Charlotte Johnstone, “It Can Do ‘Real Freaking Work’: Could Lawyers Be Replaced by ChatGPT,” Law.com International, February 9, 2023.

Paul Riermaier, “ChatGPT and Other AI Technologies in the Study and Practice of Law,” Penn Carey Law, University of Pennsylvania, February 6, 2023.

Kevin Roose, “Bing’s A.I. Chat: ‘I Want to Be Alive,’” N.Y. Times, February 16, 2023.

Derek Thompson, “Breakthroughs of the Year,” The Atlantic, December 8, 2022.