15 June 2017
Meghan Keary: Commentator
Graeme Grovum: Corrs Chambers Westgarth – Head of Corrs Innovation
MEGHAN KEARY: Hello and welcome to High Vis, the Corrs Chambers Westgarth Construction Podcast. My name is Meghan Keary and I am a lawyer in the Construction team. Corrs recently launched JustOCR, a unique product which scans electronic documents, giving them a quality rating and flagging those with poor searchability. The product fills an unmet need in the market, helping identify gaps in other parties’ document collections. In a construction context we are often faced with large volumes of electronic documents that need to be managed and reviewed. I am joined today by Graeme Grovum, Head of Corrs Innovation. Graeme was part of the team that developed this game-changing product. Welcome, Graeme. Can you tell us a little bit more about what drove the development of JustOCR?
GRAEME GROVUM: I sure can. We can think about this in two different streams; there are two elements to it. First of all there’s the time-cost equation, and second of all there’s our focus on delivering the best possible outcome for clients. Now, as background, OCR stands for Optical Character Recognition: it is the process of making an image that contains text searchable. Litigation and arbitrations frequently require sifting through vast quantities of data in order to identify a small number of relevant documents. Given the proliferation of data over the last decade, the process now requires that we use computer-aided searching in order to conduct this work in an efficient and proportionate way. So if we think first of all about the time-cost equation, a number of years ago now we made the decision to move into Amazon Web Services, and what that did for us was really open our eyes to the possibilities of scale in a true cloud environment. On the second point, we recognised a glaring deficiency in the standard approach that most parties adopt when searching documents in a discovery or arbitration process, and that is that most people consider the OCR process to have a binary outcome: after the processing is complete, either the document is not searchable or it is searchable. What we recognised is that it’s very much a scale or a gradient. Depending on the quality of the image that you are scanning in the first instance, you may end up, even after processing, with a document that is not very searchable, and that’s something that we needed to address in order to determine how we could provide the best outcome for our clients. We solved this by creating a technical process for determining the quality of searchable text, and creating this technology means that we can now identify the black spots in document collections.
So those are the documents that, even though they have been made “searchable”, still aren’t going to respond to your searches, and that’s because the text is still poorly OCR’d; it’s not very searchable.
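Editor's illustration: the idea of treating searchability as a gradient rather than a yes/no outcome can be sketched in a few lines of Python. This is not how JustOCR works; the transcript does not describe its method. The function names, the word-shape heuristic and the 0.8 threshold below are all assumptions made purely for illustration. The sketch scores a page's extracted text by the share of tokens that look like ordinary words or numbers, then flags low-scoring documents as potential black spots.

```python
import re
from typing import Dict, List

# Hypothetical heuristic (an assumption, not JustOCR's method): a token counts
# as "searchable" if it is made of letters only (optionally joined by an
# apostrophe or hyphen) or is a plain number such as 30 or 1,500.00.
WORD_LIKE = re.compile(r"^(?:[A-Za-z]+(?:['’-][A-Za-z]+)*|\d+(?:[.,]\d+)*)$")

# Punctuation stripped from both ends of each token before it is checked.
EDGE_PUNCTUATION = ".,;:!?()[]\"'“”‘’"


def searchability_score(text: str) -> float:
    """Return a score from 0.0 (nothing looks searchable) to 1.0 (all tokens do)."""
    tokens = [t.strip(EDGE_PUNCTUATION) for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0  # no extracted text at all
    good = sum(1 for t in tokens if WORD_LIKE.match(t))
    return good / len(tokens)


def flag_black_spots(pages: Dict[str, str], threshold: float = 0.8) -> List[str]:
    """List the documents whose score falls below the (assumed) threshold."""
    return [name for name, text in pages.items()
            if searchability_score(text) < threshold]
```

Run over a collection received from another party, a check of this kind would list the documents that keyword searches are likely to miss even though they are nominally "searchable".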
MEGHAN KEARY: Now, are you able to walk us through how this product works, perhaps in the context of existing electronic document management products?
GRAEME GROVUM: Absolutely. As the name suggests, all we do is OCR, just OCR, and the process is this. You would have a document collection, which might be in a litigation management platform or in any sort of document management database. You would export the PDFs that you would like to have OCR’d and then it’s just a matter of drag and drop to upload. The OCR processing and analysis happens automatically, you’re alerted by email as soon as the work is complete, and you click the link, go back in and download the work product as well as the OCR analysis report. That’s it. It’s very, very simple.
MEGHAN KEARY: Fantastic. It will not be a surprise to those in the construction industry that projects and disputes can generate large volumes of documents, and this grows as technology in the area develops. What are some of the time and cost benefits that the parties in construction disputes can expect to experience with JustOCR?
GRAEME GROVUM: That’s a good question. I alluded to this earlier, but here’s a real-world example. When we had just moved into AWS we were doing some testing, trying to figure out exactly what was possible within the environment, and so I created a test which took one and a half million pages to be OCR’d and ran them through the system. The previous methodology that we used was to process them on physical workstations internally. One and a half million pages, going 24 hours a day, would have taken us about six hundred and fifty hours to complete. If we instead decided that we needed to engage a bureau to do this work for us, we were looking at a cost in the area of $30,000 for the client. Now neither of these is frankly too palatable, so we needed to come up with a better way. The initial result that we got with this testing in AWS was that we were able to complete the one-and-a-half-million-page process, and complete it at a price point that was over 80% lower in both instances, which is just phenomenal. So the question there was: how do we now turn this into something that will benefit our clients first and foremost, the firm as well, but then also the industry more broadly? What we did from that was create JustOCR, which is the topic at hand here: a standalone service that anyone can come and use to upload their documents and do the processing themselves. The price point that we have set is about half a cent US per page, which is about 50% lower than the low end of the market right now.
We are also turning work around in about half the time that you would expect if you sent it out to an external provider. On top of that, we are taking the OCR analysis technology that I spoke of earlier and including it at no extra cost, so anyone who comes and uses the service is now also going to be able to identify those black spots in their collection. Or, and this is a scenario that we’ve certainly used within Corrs as well, you can analyse the other party’s data when they provide it to you, and that lets you know if the searches are not going to return everything that you might expect, on the basis that the text isn’t as searchable as it should be.
MEGHAN KEARY: So we’ve spoken about how this assists dispute lawyers, but what about those front-end lawyers? How can this assist in large-scale document reviews?
GRAEME GROVUM: The interesting thing about all of this is that OCR is a pretty boring topic for most. It’s not something that’s, you know, front of mind for most people, but the reality is that the ability to search text within a document collection is fundamental to pretty much everything we do as lawyers. So whether you are working on a due diligence, whether you are working on an arbitration, whether you are using artificial intelligence like the Beagles of the world, the Kiras for due diligence, the Luminances for due diligence, each of these technologies requires a searchable collection of data that resides underneath. And that’s what we are doing: we are making sure that people understand the limitations of their data set.
MEGHAN KEARY: Thanks, Graeme, for giving us some insight into this exciting new software. We look forward to hearing more about how it can help save time and money on matters with large volumes of documents. To our listeners, we hope you will join us again for the next episode of Corrs High Vis.
This podcast is for reference purposes only. It does not constitute legal advice and should not be relied upon as such. You should always obtain legal advice about your specific circumstances.
This publication is introductory in nature. Its content is current at the date of publication. It does not constitute legal advice and should not be relied upon as such. You should always obtain legal advice based on your specific circumstances before taking any action relating to matters covered by this publication. Some information may have been obtained from external sources, and we cannot guarantee the accuracy or currency of any such information.