Bringing data tools to Google’s legal department

Editor’s Note: Recently, at the Chief Data Scientist, USA event, we had the pleasure of being joined by @SiliconANGLE Media, Inc. (theCUBE) to interview some of our attendees about the world of Data Science. Watch the video below. This article, written by Gabriel Pesek, was originally posted on the SiliconANGLE blog.

Disruption of traditional practices has become a big selling point for various technologies in recent years. But no matter how committed to an ideal a company may be, it will likely run into severe difficulties when trying to apply it to firmly established aspects of the enterprise experience, particularly in the legal department.

At the recent Chief Data Scientist, USA event, Jay Yonamine, head of Data Science for Global Patents at Google, sat down with Jeff Frick (@JeffFrick), co-host of theCUBE, from the SiliconANGLE Media team, to discuss how the many data-crunching tools at Google’s disposal are being applied to the centuries-old process of patenting. Yonamine also discussed the biggest difficulties Google has encountered in doing so.

Culture clash

Yonamine began by noting that on the “customer-facing products” side of Google, almost everything is cutting-edge, with “machine learning, artificial intelligence, data-driven efficiency,” and so on. But on Google’s internal legal side, things aren’t quite as accelerated.

“Legal, as an industry, is sort of traditional, entrenched in established processes,” Yonamine stated, pointing out a perceived tension that arises because staff in the legal and patent departments want as much access to Google’s disruptive technologies as other departments have.

“The challenge we face is demonstrating that the new tool, the new machine learning algorithm, the new application, is actually helping us in an observable, measurable way, relative to the sort of entrenched status quo,” he explained.

As Yonamine sees it, a lot of time is spent simply figuring out how to evaluate performance of the existing processes so that a baseline exists for comparison to the newer technologies that could potentially be applied. “That mindset, of a really data-driven approach, not just to building tools, but to measuring the ROI of the tools we build, is a challenge we’re working on,” he said.

Clarity and comparisons

Yonamine also identified two big areas of opportunity for improved data-handling within the patenting process. The first, “the actual prosecution,” involves filing the patent and going through the back-and-forth, including textual adjustments, as well as determining whether the pay-off of the patented item would justify the cost of the patenting process; for Google, this is a process involving tens of thousands of applications on a regular basis.

The second is “transactions,” with companies buying, selling and licensing portfolios, and the contingent assessment of the relevance of those portfolios’ contents to the products of the involved companies.

In each of these areas, time and knowledge constraints make it infeasible for people to assess the value of these portfolios by hand, but Yonamine sees plenty of room for machine learning and algorithms to help with the determinations.

In addition to the patent-analysis tool that Google hosts, he also pointed to the “tremendous push to machine-readability” in patent data as a major step in the right direction, though he acknowledged that a few lingering data streams still provide that data as PDFs, making it “a bit clunky to get at the underlying text in a way that’s machine-readable.”

The big picture

Taking a big-picture view, Yonamine picked out cost at scale as a huge driver of improving the patenting process, but emphasized the importance of establishing a clear model of operations against which improvements could be compared.

“To the extent that ‘better’ is subjective, it tends to be a losing proposition to prove the ROI of the machine-learning application if no one can agree on how much better the final result was. So that’s kind of the cultural challenge in a nutshell,” he explained.

Watch the complete video interview below:
