With artificial intelligence headlines appearing daily, making sense of AI is hard. I doubt it will disrupt Big Law but believe it can create significant practice efficiencies. To get beyond the hype, it’s helpful to look at specific AI applications. So I was pleased to talk twice in June with Blue J Legal CEO and co-founder Ben Alarie about his company’s machine learning software (a type of AI) that answers legal questions. I’ll start with some background on Ben and a bit about the company, then describe the technology, and close with some business model and general comments.

The Company and Co-Founders

Ben holds the Osler Chair in Business Law at the Faculty of Law at the University of Toronto. He and two of his co-founders are law professors. I asked bluntly how a bunch of law profs could run a successful business. Whoops, my legal academy views flashed neon. Ben said his law prof co-founders all have data science experience and business acumen, and that a fourth co-founder is a senior enterprise software architect with years of experience, including at IBM.

My skepticism was further allayed on learning that Blue J has about 20 employees. That includes five full-time developers, about a dozen legal researchers, and the firm’s co-founders. Customers include Oslers, PwC, KPMG, and Deloitte.

The company has not taken traditional VC money but does have funding from several investors, including one of the Big Four accounting firms, the Canadian Tax Foundation, and the Business Development Bank of Canada (part of the Canadian government). Ben reports that the company has deferred a number of requests to invest and that a Series A round could come soon.

The Technology and Offering

Blue J Legal uses ML and a rules engine to answer legal classification questions. Examples include whether a worker is an employee or a contractor, whether a person is a Canadian resident for tax purposes, and whether spending is a current expense or a capital expenditure for tax purposes.

The system ingests case law (or comparable agency rulings) as the basis for determining classifications. A combination of data scientists and legal researchers works with the material to formulate questions to ask users. Users answer those questions in natural language. The system returns a confidence level for the answer and explains in several nicely written paragraphs how and why it reached its proposed classification. It also displays the five most relevant cases. Users can click through to see the full text of those cases and/or view additional relevant cases.

The system does not yet highlight sections of the returned cases that contribute to the answer. That apparently is a pretty difficult ML challenge. Ben did say, however, that future releases would provide visual clustering of cases and/or more metadata about each case in the best answer set.

The user interface is simple and answers read as if a lawyer wrote them. The simple UI and fast answers belie the considerable data science and legal research that underlie the system. The path to answers is interesting. The legal researchers construct various answer elements, which include phrases, sentences, and paragraphs. The machine learning evaluates the user’s facts using the ingested case law to identify the substantively correct classification and the system’s confidence. Then, the rules engine combines the user’s inputs, the ML outputs, and the pre-written answer segments to assemble a nicely written, complete answer.
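To make that path concrete, here is a minimal sketch in Python of the two-stage idea: a model produces a classification and a confidence, and a rules step then picks pre-written segments to assemble the answer. This is my own illustration, not Blue J’s code. The fact names, the segment text, and the toy vote-counting "classifier" are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str         # e.g. "employee" or "contractor"
    confidence: float  # estimated probability for that label


# Hypothetical pre-written answer segments, keyed by (fact name, user answer).
SEGMENTS = {
    ("sets_own_hours", True): "The worker sets their own hours, which points toward contractor status.",
    ("sets_own_hours", False): "The hirer sets the hours, which points toward employment.",
    ("owns_tools", True): "The worker supplies their own tools, which points toward contractor status.",
    ("owns_tools", False): "The hirer supplies the tools, which points toward employment.",
}


def toy_classifier(facts):
    """Stand-in for the ML step: counts contractor-leaning answers.

    A real system would use a model trained on case law instead.
    """
    share = sum(1 for value in facts.values() if value) / len(facts)
    if share >= 0.5:
        return Prediction("contractor", share)
    return Prediction("employee", 1 - share)


def assemble_answer(facts, prediction):
    """Rules step: combine user inputs, model output, and canned segments."""
    opening = (
        f"Likely classification: {prediction.label} "
        f"(confidence {prediction.confidence:.0%})."
    )
    reasons = [
        SEGMENTS[(name, value)]
        for name, value in facts.items()
        if (name, value) in SEGMENTS
    ]
    return " ".join([opening] + reasons)


if __name__ == "__main__":
    user_facts = {"sets_own_hours": True, "owns_tools": True, "can_subcontract": False}
    print(assemble_answer(user_facts, toy_classifier(user_facts)))
```

The point of the split is that the lawyers control the wording while the model controls only the label and the confidence number.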

Ben said that answers for some of the classifiers exceed 98% out-of-sample accuracy (and noted that lawyers still have a duty to confirm the answer). I asked what numerator and denominator yielded this percentage. Ben explained that it is an ML-derived measure, so there is no actual fraction. He described it as the percentage of instances in which the system would be correct given the specific facts as entered by the user, based on out-of-sample testing of the underlying ML models.
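For readers unfamiliar with the term, out-of-sample testing generally means scoring a model on cases it did not see during training. The sketch below shows one common variant, leave-one-out testing, using a deliberately simple nearest-neighbour classifier. It is only an illustration of the general technique. I do not know how Blue J runs its tests, and the six cases and their yes/no facts are invented.

```python
def hamming(a, b):
    """Number of positions where two fact vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(train, facts):
    """1-nearest-neighbour: copy the label of the closest training case."""
    nearest = min(train, key=lambda case: hamming(case[0], facts))
    return nearest[1]


def leave_one_out(cases):
    """Hold each case out in turn, train on the rest, and count correct calls."""
    correct = 0
    for i, (facts, label) in enumerate(cases):
        train = cases[:i] + cases[i + 1:]  # the held-out case is never seen
        if predict(train, facts) == label:
            correct += 1
    return correct, len(cases)


# Invented decided cases: three yes/no facts and the court's classification.
CASES = [
    ((1, 1, 1), "contractor"),
    ((1, 1, 0), "contractor"),
    ((1, 0, 1), "contractor"),
    ((0, 0, 0), "employee"),
    ((0, 0, 1), "employee"),
    ((0, 1, 0), "employee"),
]

if __name__ == "__main__":
    correct, total = leave_one_out(CASES)
    print(f"out-of-sample accuracy: {correct}/{total} = {correct / total:.0%}")
```

In this kind of test there is a plain fraction (held-out cases classified correctly over held-out cases tried). The per-answer confidence Ben described is a different thing: a model’s estimate for one specific set of facts.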

To achieve such high confidence, I wondered how many cases Blue J might need to ingest. I was thinking in the hundreds, if not more. Ben suggested that the mid-to-high double digits (meaning 50 to 100) are often enough for reasonably high confidence. That surprised me and I look forward to further field experience to confirm this.

The company does not sell the software as an engine. A legal organization could not turn it loose on content and have it work. Rather, systems must be set up with the assistance of data scientists and legal researchers.

We did not talk much about the development roadmap but Ben did share that a future iteration will generate an automatic sensitivity analysis showing how inputs (user answers) affect outputs (legal answers). I think that will be useful. It may help lawyers reduce unnecessary time spent on fact gathering. If some factual answers have relatively little impact on the outcome, then why bother spending time on them? Ultimately, the client’s risk tolerance should drive that choice – an issue many lawyers do not sufficiently understand today.
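A simple form of sensitivity analysis is to change one answer at a time and see how far the output moves. Here is a minimal sketch of that idea. The logistic scoring model, its weights, and the fact names are assumptions I made up for illustration. They are not Blue J’s model or its planned feature.

```python
import math

# Hypothetical weights: positive values push toward "contractor".
WEIGHTS = {"sets_own_hours": 1.5, "owns_tools": 1.0, "wears_uniform": -0.1}
BIAS = -1.0


def contractor_probability(facts):
    """Toy logistic model over yes/no facts."""
    score = BIAS + sum(WEIGHTS[name] for name, value in facts.items() if value)
    return 1.0 / (1.0 + math.exp(-score))


def sensitivity(facts):
    """Flip each answer in turn and record how much the probability shifts.

    Returns fact names mapped to the absolute shift, largest first.
    """
    base = contractor_probability(facts)
    shifts = {}
    for name in facts:
        flipped = dict(facts)
        flipped[name] = not facts[name]
        shifts[name] = abs(contractor_probability(flipped) - base)
    return dict(sorted(shifts.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    user_facts = {"sets_own_hours": True, "owns_tools": False, "wears_uniform": True}
    for name, shift in sensitivity(user_facts).items():
        print(f"{name}: flipping this answer moves the result by {shift:.1%}")
```

A fact whose flip barely moves the result is a candidate for less fact-gathering effort, subject to how much risk the client will accept.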

Business Model, Questions, and Conclusions

As with many start-ups, the business model remains fluid. The target market today is lawyers and other professionals, with license terms similar to other enterprise software as a service. I told Ben that in the US, releasing this type of system to consumers or smaller businesses might raise unauthorized practice of law issues. (Let the record show I largely object to UPL claims.)

The company is considering licensing and distribution models. Presumably other organizations could use the engine if they had data scientists and legal researchers to run it.

I think Blue J has great potential. First, many legal questions translate nicely to ones of classification. Second, the software scales favorably: building a system for US law, which has vastly more content on most topics than Canadian law, would not take much longer than building one for the smaller content set in Canada.

The question I have about achieving the potential goes more to scaling the system-building process. Plenty of people can do legal research but we have a shortage of good data scientists and ML specialists. They command hefty salaries. Even if compensation levels come down, the question of the break-even point for building a system remains open.

The final question about potential goes to the business model. Only a handful of law firms have built content- or rule-rich client-facing systems. The economics of traditional Big Law do not favor that. And law departments appear to invest even less in such systems. So that leaves legal publishers to consider, as well as the Big Four and alternative legal providers.

Even if building systems were easy and cheap, uptake could still be slow. One Big Law friend told me that uptake of Lex Machina – which he said demonstrably improves patent analysis and representation – took several years because lawyers just did not want to learn it. Some things take years – maybe decades – to change.

[Update on 6 August 2016: A YouTube demo of Blue J by Ben is now available here.]