They say a lawyer never asks a question before knowing the answer. If so, then what is a lawyer supposed to ask on a first date? For that matter, what is anyone supposed to ask on a first date? In 2011, the dating site OkCupid ran some statistics. According to OkCupid’s own users, if your aim on a first date is to find out quickly whether a total stranger might be your soul mate, then the three best questions for you to ask are:
Q: Is God important in your life?
Q: Is sex the most important part of a relationship?
Q: Does smoking disgust you?
But OkCupid already had a database of more than a quarter-million questions that its users had posed to one another, and more than 775 million answers. More importantly, it had a list of some 35,000 successful couples who had met on OkCupid. So OkCupid ran the numbers. The three user-suggested questions above were not terrible predictors of relationship success: about 15% of successful couples agreed on the answer to all three questions, more than twice the rate that chance alone would produce. But another set of three questions was an even better predictor: 32% of successful couples agreed on all three, 3.7 times the rate expected by chance alone. Those three questions were:
Q: Wouldn’t it be fun to chuck it all and go live on a sailboat?
Q: Do you like horror movies?
Q: Have you ever traveled around another country alone?
It’s a small example of what popular science calls “big data.” If you have a large enough pool of information (say, 775 million answers to dating questions), and if you have the ability to process it all (say, the computers at OkCupid), then you don’t need a testable hypothesis (say, that God, sex and smoking are three determinative forces underlying love). Instead, you can just run the numbers and look for the correlations. According to OkCupid’s data, sailing, horror and travel trump God, sex and smoking. Why? It doesn’t matter why! It simply is. This is the thrill of big data.
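For the curious, the arithmetic behind a figure like "3.7 times chance" can be sketched in a few lines of Python. This is only an illustration, not OkCupid's actual method: the 32% and 3.7 figures come from the numbers quoted above, while the answer distributions, the function names, and the assumption that the questions are statistically independent are all hypothetical.

```python
from math import prod


def chance_agreement(answer_shares):
    """Chance that two randomly paired people pick the same answer.

    answer_shares lists the share of users choosing each answer option
    and should sum to 1.
    """
    return sum(share ** 2 for share in answer_shares)


def chance_all_agree(questions):
    """Chance of agreeing on every question.

    ASSUMPTION: answers to different questions are independent.
    """
    return prod(chance_agreement(shares) for shares in questions)


def lift(observed_rate, chance_rate):
    """How many times more often couples agree than chance would predict."""
    return observed_rate / chance_rate


# Figures quoted in the article: 32% of successful couples agreed on all
# three questions, 3.7 times the chance rate. That implies a chance
# baseline of roughly 0.32 / 3.7, or about 8.6%.
implied_baseline = 0.32 / 3.7

# HYPOTHETICAL answer distributions for three questions, invented for
# illustration only; they are not OkCupid data.
hypothetical_questions = [
    [0.5, 0.3, 0.2],
    [0.6, 0.4],
    [0.7, 0.3],
]
hypothetical_baseline = chance_all_agree(hypothetical_questions)

print(f"Implied chance baseline from the article: {implied_baseline:.3f}")
print(f"Hypothetical chance baseline: {hypothetical_baseline:.3f}")
print(f"Lift if 32% of couples agreed: {lift(0.32, hypothetical_baseline):.2f}")
```

The point of the sketch is that the comparison needs no theory of why couples agree: it takes only an observed agreement rate and a baseline for what random pairing would produce.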
And big data is everywhere: every time we pay for something with a credit card, or post a status update on Facebook, we add some data to the pool that a company might choose to analyze for useful correlations. How are the automatons at Amazon able to accurately recommend something as personal as a book? Big data. How is something as electronic as Google able to watch for something as human as a flu outbreak? Big data. What is the latest hope for curing cancer? Sequencing every cancer genome, otherwise known as big data. And big data keeps getting bigger.
The law has not been immune. Lawyers have always proven their cases by searching for evidence of who said what to whom. With the advent of digital communication (online conference calls, online video chats, emails, IMs, social media), there is more evidence than ever. Increasingly, if you want to have your day in court, you need to be prepared for big data. And it does not come cheap: in 2009, the Compliance, Governance and Oversight Council reported that, because of the growth in electronic data, document discovery and review represented between 50% and 70% of the Fortune 500’s total litigation costs.
But it appears that the law may be turning a corner; lawyers finally have tools powerful enough to sort through all the data their clients can generate. The market for outsourcing tasks such as document review has matured over the past decade, driving down the cost per page of preparing for trial even as the number of pages has increased. At the same time, document review software has grown more sophisticated, allowing small teams of attorneys not only to search for key people and key terms but also to see the patterns in the data. What sorts of events would prompt the thieving employees to switch their conversations from email to telephone? How many fantasy football emails did the negligent CEO write on the day he should have been preparing for a call with analysts?
Looking to the future, we expect computers like IBM’s Watson to be able to assist lawyers in sorting through millions of documents at a fraction of the cost—a “digital associate” as IBM executives have put it—just as Watson is already able to help doctors review vast archives of medical research to prepare on-the-fly treatment plans for cancer patients.
The upside for big data and the law? People getting their day in court much more quickly and much more cheaply. And that means more time and more money for sailing, horror movies, and solitary trips abroad.