Sydney and Friends – Legal Research, A.I., and ChatGPT

ChatGPT and OpenAI, Bing and Sydney (it sounds like a movie!), Google Bard . . . all references (sometimes to the same thing) to tools (or something more than tools . . . intelligent tools? . . . sentient tools? . . . or not tools at all but some sort of quasi-human intelligence?) that, at least in the minds of some real humans, “trace the outer bounds of human creativity” and usher in “a new dawn in how we build.” [Thompson]  Creepily deep conversations can be had with Sydney (the real name of the AI/chat function on Bing); Sydney can fall in love – somewhat obsessively; Sydney uses emojis to excess; and Sydney wants to be human. [Roose]

We’ve been inundated with articles about the dangers – and benefits – of artificial intelligence and resources like ChatGPT in law practice and in the academy.  ChatGPT can write essays about law; it can tell you what the law is (albeit without citations); and it can get a decent grade on a law school exam and pass the bar exam.  Reaction from lawyers, law scholars, and teachers to ChatGPT has focused on, among other things, its potential for cogently summarizing massive amounts of law-related information and providing a clear and at least sometimes accurate description of the law; its potential for data mining and extracting information from files and documents; its ability to draft contracts and add increasingly complex provisions upon request; and its facility in writing text, titles, captions, etc., for blog posts, briefs, memos, reports to clients, whatever.

I wanted to think about A.I. in the context of the legal research databases that law publishers (if we can still call them that) make available to us and, more specifically, in the context of traditional (at least in substance if not in form) legal research itself.  We all know that vendors have long incorporated A.I. tools into their resources to, among other things, accommodate natural language searching, analyze and compare briefs and contracts, explore and predict the decisions of particular judges, characterize the expertise of particular attorneys and experts, identify authorities whose value has been questioned, and predict whether proposed legislation will be enacted.  But in a time when ChatGPT – in its relatively early form – can adequately – albeit, again, without any indication of sources – answer questions like what are the elements of negligence in Nevada; what do I need to do to make a Regulation D offering; how do I set up a generation-skipping trust; or what’s the liability for dog bites in California, what does that mean for legal research databases going forward?  And is the kind of technology on which vehicles like ChatGPT depend well adapted to answer the legal research questions that lawyers – and law students – pose?

None of our familiar – and not-so-familiar – legal research databases do anything quite like what Sydney does.  As a general rule, we enter a query and we’re referred to authorities that the database technology has determined apply to what we asked.  There’s no generation of text by the resource itself.   Sometimes – if we’re lucky – there’s a source that does answer our question pretty directly, but that source is really answering someone else’s question that happens to be – at least in the judgment of the database engine – remarkably similar to ours. 

I – who know nothing about AI systems – am the last person who should attempt to understand how something like ChatGPT works, but maybe there is something in the underlying approach of systems like ChatGPT that might make them either useful in the context of traditional legal research or a little bit dangerous. 

Thank goodness others can write about these things incredibly clearly.  Ted Chiang’s article, “ChatGPT Is a Blurry JPEG of the Web,” in the February 9th issue of The New Yorker describes the difference between lossless compression and lossy compression when it comes to digitally storing information.  Data, images, whatever, are compressed for storage and, then, when their content is needed, they are somehow reinvigorated and, like a sponge taking on water, expanded so that they can be used.  Sometimes the compression is ‘lossless’ and the rejuvenation process is robust and complete, but, in the case of large language models like ChatGPT, the compression is, instead, ‘lossy’, that is to say, something of the original is lost each time the compression and expansion process takes place.  Chiang analogizes to an image that is repeatedly photocopied; each successive image is increasingly blurry.  And he points out that if ChatGPT output becomes part of the universe of information that ChatGPT uses to produce answers, the blurring of new output becomes all the more pronounced.
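For readers who like to see an idea in miniature, here is a rough Python sketch of the distinction Chiang draws.  The first function does a genuinely lossless round trip using Python’s standard zlib module.  The second is only a toy of my own devising (the names blurry_copy and peak_after, the sample question, and the seven-number “image” are all assumptions for illustration): it mimics one generation of photocopying by blending each value with its neighbors.  It is not how ChatGPT, or any real lossy format, actually works.

```python
import zlib


def lossless_round_trip(text: str) -> str:
    """Compress with zlib, then expand again; nothing is lost."""
    packed = zlib.compress(text.encode("utf-8"))
    return zlib.decompress(packed).decode("utf-8")


def blurry_copy(signal: list[float]) -> list[float]:
    """One 'photocopy' generation: blend each value with its neighbors.

    Hypothetical stand-in for a lossy round trip (an assumption made for
    illustration only); edge values reuse themselves as the missing neighbor.
    """
    last = len(signal) - 1
    return [
        0.5 * signal[i]
        + 0.25 * (signal[max(i - 1, 0)] + signal[min(i + 1, last)])
        for i in range(len(signal))
    ]


def peak_after(signal: list[float], generations: int) -> float:
    """Height of the tallest value after the given number of blurry copies."""
    for _ in range(generations):
        signal = blurry_copy(signal)
    return max(signal)


question = "What are the elements of negligence in Nevada?"
spike = [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0]  # one sharp detail in a flat field

if __name__ == "__main__":
    # The lossless round trip returns exactly what went in.
    print(lossless_round_trip(question) == question)  # True
    # Each blurry generation flattens the sharp detail a little more.
    for n in range(4):
        print(n, peak_after(spike, n))  # 8.0, then 4.0, 3.0, 2.5
```

The zlib round trip hands back the question character for character, while the one sharp detail in the toy “image” shrinks with every copy, which is the photocopy-of-a-photocopy effect in numbers.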

Chiang suggests that if something like ChatGPT used lossless compression its output would be only immaterially different from what a search engine now retrieves.  If ChatGPT employed a lossless algorithm, Chiang predicts that “it would always answer questions by providing a verbatim quote from a relevant Web page. We would probably regard the software as only a slight improvement over a conventional search engine.”  Bloomberg, Lexis, Westlaw, and other legal research tools provide us with exactly this, in Chiang’s words, “verbatim quotes” from relevant and recognized sources. They also provide us with the full text of those sources. 

Sometime in the future, though, it’s conceivable that our legal research resource providers might employ something like ChatGPT to directly answer our questions.  Unlike Sydney, which has the entire web at his/her/their disposal, a ChatGPT-like system implemented in the big legal research databases might not give us much cause to worry about misinformation – if, that is, the universe of content included only primary authorities and vetted secondary authorities (and not AI-generated responses to queries).  But it might trouble us that a machine – sentient as it may be – was formulating responses to our queries in language that no one actually wrote or reviewed . . . especially when secondary sources are supposed to serve exactly those information-providing purposes.  If we want someone – well, really, something – else to write what we pass off as our own work, we may have problems quite apart from the accuracy of the content of whatever is written.  We care a lot about attribution or at least we’re supposed to . . . do we attribute text to the resource and its Sydney-like being?

As law librarians or lawyers, do we really want tools (or beings) that may use somewhat degraded content to produce grammatically well-written – albeit perhaps not too creative or inspiring – text that answers our questions?  Admittedly, maybe sometimes we do, but we might hope that those circumstances are few and far between.  It’s fun and entertaining – and creepy – to interact with Sydney; I’m just not sure I’d want his/her/their take on my legal research question.

Ted Chiang, “ChatGPT Is a Blurry JPEG of the Web,” The New Yorker, February 9, 2023.

Jenna Greene, “Will ChatGPT Make Lawyers Obsolete? (Hint: Be Afraid),” Reuters, December 9, 2022.

Charlotte Johnstone, “It Can Do ‘Real Freaking Work’: Could Lawyers Be Replaced by ChatGPT,” Law.com International, February 9, 2023.

Paul Riermaier, “ChatGPT and Other AI Technologies in the Study and Practice of Law,” Penn Carey Law, University of Pennsylvania, February 6, 2023.

Kevin Roose, “Bing’s A.I. Chat: ‘I Want to Be Alive,’” The New York Times, February 16, 2023.

Derek Thompson, “Breakthroughs of the Year,” The Atlantic, December 8, 2022.

The Federal Depository Library Program . . . Going “All-Digital”?

With more than 1,000 library participants nationwide, the Federal Depository Library Program has long served both as a means for the public to access materials published or authored by the federal government and as a means of ensuring the preservation of current and historical government information.  Libraries that participate in the Program include academic, judicial, and government law libraries, college and university libraries, public libraries, and historical societies; member libraries have historically received print and/or microform copies of selected government documents at no charge and, in more recent times, digital access to some government publications as well.  In late 2021, the FDLP notified libraries that, beginning in 2022, distribution of documents in microform formats would be phased out.

Since 2014, libraries new to the FDLP have had the option of selecting only those documents that are available in digital formats (so, they’re all-digital FDLP participants!).  Having seen how the pandemic affected access and use of government information, and recognizing the needs of more and more libraries to move to nearly all-digital collections, the FDLP formed a Task Force on a Digital Federal Depository Library Program in early 2022.  This past September, the Task Force released its draft report for public comment. 

The Task Force was charged with determining “whether an all-digital FDLP is possible, and if so, [defining] the scope of an all-digital depository program and [making] recommendations as to how to implement and operate such a program” (Draft Report for Public Comment).  Data that lend context to the Report included the facts that (i) 25% of current FDLP participants already select only digital or nearly only digital documents; (ii) an additional 17% of federal depository libraries intend to transition to a truly all-digital collection; and (iii) 97% of new federal government documents published since 2009 are in digital formats.

The work of the Task Force was apportioned among six working groups charged with considering, respectively, the impact of an all-digital FDLP on access; the impact on depository libraries; the impact on federal agencies; the impact on GPO, library services, and content management; Title 44 and legislative and policy issues; and the implementation and strategic framework necessary to support a transition.  The first thing to note is that the Task Force’s use of the term “all-digital” does not imply that the FDLP would be exclusively digital.  Rather, the Task Force acknowledged that “alternative formats of both current and historical information would continue to be available” – for how long and which documents are included are yet to be determined (Draft Report for Public Comment).

After outlining both the benefits of an all-digital program (e.g., improved access and metadata; flexibility for participating libraries) and the risks of not going all-digital (e.g., lack of standardization among agencies and lack of a systematic approach to collection and curation, let alone missed opportunities), the Report noted some corresponding barriers to and disadvantages of the all-digital approach (e.g., digital disparity; accessibility issues; challenges associated with particular types of government publications; authentication and version control; user privacy).

In the end, the consensus of the six working groups was that FDLP members would benefit from an all-digital approach and that the FDLP should indeed go all-digital.  Some working groups commented on the underlying laws and regulations that would affect or in fact inhibit the transition to an all-digital program and others described the infrastructure that would be necessary to develop and maintain an all-digital approach.  The recommendations in the Report focus on (i) ensuring cost-free access to government information; (ii) protecting the privacy of users of an all-digital depository library system; (iii) determining which documents should continue to be distributed in print and for how long; (iv) developing both standards to ensure authenticity and version control and best practices for digital preservation; (v) allowing different levels of participation among libraries; (vi) creating training to enable participating libraries to locate digital materials and curate digital collections; (vii)  collaborating with agencies, libraries, and others to ensure access to technologies that support the use of an all-digital FDLP; and (viii) considering new bibliographic resources and support for FDLP libraries. 

As the Report states, “[t]he move to an all-digital FDLP is not revolutionary, but rather in many ways, evolutionary and would result in the formalization of a process long-underway as increasing amounts of U.S. Government information are born digital.”  That said, there is much work to be done in reliably standardizing the creation and collection of authoritative government information.  But the fact that the FDLP is moving in the right direction – with thoughtfulness and a pretty comprehensive approach – is nothing but good news both for libraries that participate in the program and those who might want to do so in the future (and, really, for all of our patrons as well!).

Sources:

Draft Report for Public Comment, Task Force on a Digital Federal Depository Library Program, September 14, 2022.

Association of Research Libraries Statement on Digital FDLP Task Force Draft Report, October 31, 2022.

$350 (More or Less) Could Be On Its Way to Your Law Library (From PACER) . . .

Litigation that began back in 2016 focusing on the amount and use of PACER fees collected by the federal judiciary settled last week (the settlement still needs to be approved by a federal judge).  The class action, filed in the federal district court for the District of Columbia, claimed that PACER fees were excessive and were used, in part, for unauthorized purposes (e.g., technology improvements for the federal courts).  Thanks to the settlement, PACER users will receive a refund of up to $350 of PACER fees paid from April 2010 to May 2018.  For those who incurred more than $350 in fees during that time, additional funds may be distributed after the initial refunds (of up to $350) have been made.  The settlement applies only to payments made in the past and does not affect current or future PACER fees. 

That said, apart from the litigation (but likely encouraged by it), the judiciary has eliminated some PACER fees since the lawsuit was filed, and proposed legislation winding its way through Congress would make PACER cost-free.  That would mean that the judiciary would lose about $150 million in annual fees; reports have suggested that it costs around $64 million to update and maintain PACER annually.

The following sources provide more information on the litigation and the settlement . . . and I am sure that the court filings are all available . . . . on PACER . . . .

Bloomberg

Law.com

Politico

Reuters

Washington Post