Law in the Age of Big Data

Introduction

The term Big Data has become a buzzword in daily life, and its importance continues to grow as the volume of data increases at an ever faster rate. Data is the oil of the digital era, and the storm of Big Data is reshaping almost every part of society (Economist, 2017). Yet, compared with other social domains, law has produced relatively few reflections on it. The connection between Big Data and the law can be thematised in several ways. On the one hand, Big Data has itself become the data, or the subject, of the law. How the law should frame, define and regulate the Big Data phenomenon is an urgent and hotly debated question. Big Data creates an ethical dilemma between its value-creating predictive power and the vulnerability of personal privacy, and it haunts data protection theories. New methodological and theoretical approaches need to be settled more quickly to show society what Big Data regulation looks like. On the other hand, Big Data provides a new tool for legal practice and the legal academy. Firstly, Big Data techniques and technologies offer the whole legal sector an opportunity to deal with the rapidly increasing volume of new legal data, and a chance to eliminate human shortcomings such as bias and error from the legal system. Secondly, Big Data has reshaped the traditional way of conducting legal research: a group of legal scholars have decoupled themselves from their ‘traditional’ role of analysing and commenting on case law and draft legislation (doctrinal research) and instead praise a data-driven empirical approach (Stolker, 2013). However, all of these arguments are one-sided. Whether a new phenomenon based mainly on linkages, probabilities and numbers can be integrated smoothly into the abstract and theory-laden legal field, which is grounded in human complexity, is still uncertain (Devins et al., 2017). This essay takes the opportunity to analyse arguments from both sides.
However, as reflection on Big Data and its associated technologies and techniques in the legal field is immature and still developing, many arguments are assumption-based and theory-based rather than built on hard facts.

Before getting straight to the point, this essay introduces some background. Firstly, what is Big Data? Secondly, what is the traditional way of legal thinking shared by both the legal academy and legal practice? Finally, as data is the core of Big Data but a rarely used word in legal research, this essay will analyse different aspects of data in the legal field and their historical development.

Even though the question requires us to focus mainly on legal research, it is hard to draw a clear line between the legal academy and the legal professions, as much legal practice is academically grounded, especially where judges are involved. There is no great difference between the way a legal scholar conducts research and the way a judge makes a decision, a legislator considers a new law or even a lawyer devises a strategy for a lawsuit (most of them apply a doctrinal approach, which is discussed later). It can also be said that everything that happens in legal practice can be the subject or data of legal research. Therefore, this essay will focus not only on legal research but also on the legal profession and legal regulation.

What exactly does Big Data mean?

The origin of the term Big Data is hard to trace. Diebold used a mildly tongue-in-cheek title, “I Coined the Term ‘Big Data’”, to claim that he was the first to use the term. This claim was later overturned by Lohr (2013), who agreed with Douglas Laney that credit should go to SGI’s chief scientist John Mashey. Nevertheless, who coined the term is not the focus of this essay; what matters more is what the term means. The most famous definition of Big Data is Laney’s ‘three Vs’ framework (2001). His report introduces three defining properties or dimensions of Big Data: the large amount of data (Volume); the high speed at which data are generated, collected and processed (Velocity); and the various categories of data (Variety). This framework keeps being expanded. More Vs, such as Value, Visualisation and Veracity, have been added to the Vs family (DeVan, 2016). However, the Vs family can hardly be considered a classical definition: it gives only the differentia specifica, describing how Big Data differs from other ‘things’, but does not provide the genus proximum to which Big Data belongs (Ződi, 2017). Given the loose and still evolving definition of Big Data, this essay narrows it down to the simplest form: Big Data are the new ways, mainly technical, of producing, collecting, processing and using data, which together, as a driving force, might ultimately change the mindset, the attitudes, and all of society (Ződi, 2017). Current Big Data technologies are based on highly automated data mining and machine learning techniques. With the help of meta datasets, they generate great “predictive power” (Siegel, 2015), which enables businesses to know more about their customers by finding more linkages between things. However, it is wrong to “assume that once all the data is collected it will be a fairly straightforward step to move seamlessly to ‘correct’, and presumably useful, conclusions” (Bryant and Raja, 2014, p.2).
That is to say, the linkages found by Big Data technologies may sometimes be meaningless, and simply applying them may lead to wrongful outcomes. This argument will be developed in detail later, when discussing the role of Big Data technologies in the legal field.
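
The risk of meaningless linkages can be illustrated with a small sketch. The Python example below is an illustration added for this point and is not a method taken from any of the cited sources. It generates purely random series (the number of variables, the number of observations, the seed and the function names are all assumptions chosen for the example) and shows that, when enough unrelated variables are searched, some of them will appear to correlate with a target by chance alone.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_variables=1000, n_observations=20, seed=0):
    """Search unrelated random variables for the best 'predictor' of a random target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_observations)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_observations)]
        best = max(best, abs(pearson(candidate, target)))
    return best


if __name__ == "__main__":
    # Every series is pure noise, so any strong correlation found here is meaningless.
    best = strongest_chance_correlation()
    print(f"Strongest correlation found by chance: {best:.2f}")
```

Because every series in the sketch is noise, whatever "linkage" the search reports carries no meaning, which is the caution Bryant and Raja raise about moving straight from collected data to conclusions.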

Traditional legal way of thinking: a doctrinal approach

Unlike in other social domains, it is fair to say that in law, scholars on the academic side and lawyers, judges and regulators on the practice side share the same doctrinal way of thinking about, analysing and solving problems. A doctrinal approach “always analyses texts, namely some important texts in the framework of normative concepts, which is partly established by the text of the laws, partly by judicial practice, and partly by legal scholars” (Ződi, 2017, p.81). Under a doctrinal approach, past developments such as earlier legislation and earlier cases are the most important components when considering a new legal development. Maintaining a consistent and foreseeable legal system is the cornerstone of doctrinal legal research. However, because law is an important part of society, creating a legal system that affects the actions of its citizens in a positive way is another important component; in legal terms, we can call it “policy reasoning”. A doctrinal approach is clearly unable to quantify the effect of a new law or a new case decision on society. Legal researchers have always predicted that effect on the basis of past experience and of social research into settings where similar legislation was introduced. As the number of empirical legal researchers increases, this job is left to them. The methodologies that empirical legal researchers use are similar to those of social science. The advantage, however, is that empirical legal research can foresee the effect of a legal change on society rather than wait for the results of social research after the change has been made.

The historical development of different aspects of legal data

Throughout history, data has been a rarely used word in the legal field. We talk more about legislation, cases, government policies and other sources of law. In fact, all of these can be described as legal data. The main form of data representation is the document. This is true for doctrinal researchers, but empirical researchers, who are eager to map law in society, require more data, such as social surveys and questionnaires. The forms in which those data are represented are more varied: tables, graphs and other types of visualisation. Therefore, when a judge decides a case, a legislator creates a new law or a legal researcher conducts research, they are all “analysing data”.

The development of techniques and technologies did change the way legal data is represented in some respects. Before legal data was digitised, most legal scholars laboriously pored over books to search for case law. The computer revolution began in 1973, when Lexis invented the red “UBIQ” terminal to let scholars search case law online. The revolution quickly moved from document searching to document creation when Wang introduced a computer dedicated to word processing (Friedmann, 2004). However, no matter how computer and digital technologies developed, their functions were limited to helping organise and manage datasets so that specific legislation and cases could be found faster. In the legal field, the job of analysis was never passed to a computer or to automated code.

As society moves into the age of Big Data, things change slightly. More and more legal contract automation technologies have emerged, and companies like PwC (interestingly, an auditing company) are providing their clients with automated contract filing services. The incorporation of Big Data technologies looks like a jump from simple data management directly to highly automated data analysis in the legal field. However, we should remember that contract filing is the simplest paperwork in law firms; this change does little more than take jobs from paralegals and junior lawyers. In other parts of the legal industry, the incorporation of Big Data technologies is limited to helping lawyers predict judges’ biases or decide their counsel fees. These things do matter for a law firm but change little in terms of legal data analysis. One reason is that most legal data are unstructured, and it is hard for a Big Data technology to predict valuable outcomes from them (even where it can, doing so is too costly and time-consuming). Another reason is simply that the legal side does not trust that a machine can handle human complexity better than a human. With the latter reason in mind, the next section analyses different scholars’ opinions on how Big Data technologies can be incorporated into law. Most of these opinions rest on assumptions, as facts are limited, but they are worth examining because they point to possible directions.

Big Data technologies as a new tool

5.1 A personalised law: uphold interests of the minority?

It is undeniable that human beings are becoming less and less able to cope with a world in which millions of new data points appear every day. It is foreseeable that Big Data technologies must step in to some extent. Let us first imagine a legal system that is mainly based on Big Data technologies. In such a world, “laws are supposedly calibrated to policy objectives and optimal human behavior, based on a machine analysis of massive amounts of data, thus cutting out human bias, incompetence, and error” (Alarie et al., 2016). In detail, policymakers might, for example, be able to dispense with statutes and regulations: using Big Data technologies, they could issue customised micro-directives, or an AI could even create automated regulations that apply only to you. Big Data is therefore driving a trend towards behavioral optimisation and “personalized law” (Devins et al., 2017). This picture of the future sounds exciting and promising. A legal system run by human beings can only preserve the interests of the majority. With personalised law and more micro-directives, we could uphold the interests of minorities.

This future incorporation of Big Data technologies (if they are ever invented) would completely overturn the legal way of “data analysis”: a personalised law that prioritises optimal human behaviour totally ignores what doctrinal researchers insist on, namely the consistency of the legal system. It is even doubtful whether there would be any written legislation at all, as the solution to a dispute might be calculated only after the dispute has occurred. In areas like tax law, this seems a good choice. Not a single tax lawyer will deny that current tax law is broken. We can see the intention of tax legislators to impose different tax burdens on different groups through income tax. However, this intention never works out well, because human beings are less and less able to manage all the conflicting interests and incentives in a world of globalisation and individualism. As a result, tax law has become hopelessly complex and inflexible, and not a single nation can keep it consistent. A Big Data-based system for calculating and collecting tax sounds like a good solution: you just type in all your personal information, and a customised tax rate is created for you based on factors like your social status or even just your level of happiness. However, because tax law is in such a mess, it is an outsider within the legal system and simply cannot be a good reflection of the whole. Sadly, even though Adam Smith () argues that tax should be designed to manage the wealth gap and preserve equality, tax nowadays is just a pure game of calculation, and its value should be rebuilt. This is not the case for other areas of law, which should be consistent enough to be “foreseeable” for citizens: the more easily a citizen can foresee the legality of an action, the better. An example is criminal law, which plays a role of “general prevention”, in which punishment deters others from committing the same crime.
However, if punishment wavers, it will encourage people to trust to luck and cause social disorder and uncertainty.

5.2 AI judges: wipe out human biases and errors

Also, in terms of equity, human beings have shortcomings when judging. We are born with biases shaped by our backgrounds, and we are affected by emotions. How to make a judge neutral is a problem that has never been resolved. However, with an AI judge based on Big Data algorithms, it seems possible to wipe out biases when interpreting discretionary categories, such as what is “reasonable” or “fair”. Current research has also argued that if the predictive power of BD analytics is so great, it is better to use these algorithms to set sentences in criminal cases in order to eliminate proven sentencing disparities (Volkov, 2016). However, we should remember that the sentencing stage comes after, and depends entirely on, the stage at which judges determine guilt. We need to admit that neither a human nor an AI can know the truth of a crime; even a 99.99% probability never equals truth. If AI judges working on probabilities could decide a person’s guilt, the world would become horrific.

Recent research by Wu and Zhang (2016) can help us foresee this situation. The authors built four classifiers based on supervised machine learning to discriminate between criminals and non-criminals by scanning only images of their faces. It reminds us of the so-called “father of modern criminology”, Cesare Lombroso, who argued that criminality was not a matter of sin or free will but could instead be a medical problem based on biological inheritance. It is simply the idea of being born with innate criminality. Just imagine that a machine has predicted that a certain group of men with a particular skin and hair colour, height and social status will commit violent crimes with 90 per cent probability. Will the authorities at least place these people under surveillance? And if they do, how will this be justified? How can any measure be justified that is based on attributes that are not under the person’s control? As Devins et al. (2017) have argued, law is theory-laden: it is abstract, value-based and built on compromise. Even if everyone in the world thinks a person has killed the victim, without an actus reus and mens rea that the law can accept, this person will be not guilty. This is the space that the legal system leaves for everyone to defend their actions and have their say. If a 99.99% probability can sentence someone to death without giving him a chance to defend himself, this society is horrific.

5.3 “Super-empirical” legal research

“Conservatives as they supposedly are, lawyers or legal scholars are often viewed as more preoccupied with the past than with the future” (Dyevre, 2016, p.2). Legal disputes, after all, are resolved on the basis of rules or cases that arose earlier. For a new draft law, no matter how advanced its topic and how forward-looking it should be, traditional doctrinal legal scholars will first turn their attention to past legislation and other sources of law to ensure that the draft will not be an “outsider” that threatens the consistency of the whole system. However, this does not mean that legal scholars pay little attention to the future. Instead, the legal sector is bound to make assumptions and predictions about the future, “however broad and tentative, over the shape of the world to come whenever we decide what our children should learn at school, what training future workers should receive or, more generally, how we should invest our scarce resources” (Dyevre, 2016, p.2). However, is the law still capable of regulating society in a positive way on the basis of doctrinal research? It has become one of the leitmotifs of methodological writing on legal science that traditional ‘doctrinal’ scholarship seems to be in crisis (Bodig, 2015). New developments arise every day because of the information explosion, with the volume of data continuing to grow at an exponential rate. The whole legal sector seems to be trapped and hesitates to make a move. If the legal system cannot respond quickly, it will lag behind and limit the prospects of society. Bodig (2015) and Dyevre (2016) both claim that a data-driven empirical approach is a solution to the current crisis in legal methodology. In fact, there are already examples showing that Big Data techniques and technologies have been successfully incorporated into legal research.
One recent study used big data and machine text analysis to compare around 300,000 court texts and demonstrated that the Supreme Court of the United States “may be edging away from its roots” (Livermore, Riddell and Rockmore, 2016). Rockmore (2016) argues that “big data has opened up the complex world of judicial opinions to allow us to understand one of our most important institutions in ways that human reading and hand-coding simply cannot.”

However, like Anderson (2008), Bodig (2015) and Dyevre (2016) both fear that a data-driven approach may make the traditional doctrinal approach obsolete, and they question whether empirical legal research is still legal science. We must remember, however, that in empirical legal research social facts are always observed or interpreted on a legal basis, that is to say, according to what a person from the legal field thinks about the facts. Empirical research also always involves collecting and processing data. For manually coded research, it is quite clear that coding sometimes requires interpretation and partially arbitrary decisions, but even in machine-made analysis, one must make certain decisions during the construction of the text-analysing algorithms that can distort the data collection itself. Given these two peculiarities, BD-based research projects in law will not supersede doctrinal efforts; rather, they will rely on them. Doctrinal scholarship will provide the theoretical framework, the concepts and the distinctions that will serve as a basis for the higher narratives on which empirical and BD projects can build. But eventually, there will be a reverse process as well. BD research projects can offer new insights and ideas from which doctrinal scholarship can begin to build new theories and narratives (Ződi, 2017).

Big Data technologies as a new subject of Law

Big Data offers new opportunities for different aspects of our society. An often-cited example is Google Flu Trends, which makes use of aggregate search queries (not originally collected with this innovative application in mind) to make early detection of disease possible. However, by the same token, for Big Data to reach its potential, data needs to be gathered at an unprecedented scale whenever possible and reused for different purposes over and over again. This puts Big Data on a direct collision course with the core principles of existing data protection laws. Lawmakers around the globe are struggling to find a new balance between the need to protect the information privacy of individuals and the demand to utilise the latent value of data. Nearly all categories of ‘traditional’ data protection are being questioned (Tene and Polonetsky, 2012; Crawford and Schultz, 2014). Traditional data protection regulation is based on the consent of the person and the legitimate purpose of data use. But in the age of Big Data, a person generates so much data that no one can control it. To return to Google Flu Trends: ‘Can you imagine Google trying to contact hundreds of millions of users for approval to use their old search queries?’ And what does a ‘legitimate purpose’ of data processing mean when ‘the most innovative secondary uses haven’t been imagined when the data is first collected’ (Mayer-Schönberger and Cukier, 2013)?

Also, the harvesting of large data sets and the use of analytics clearly raise privacy concerns. Private information about individuals, such as health, location and online activity, is exposed to scrutiny. Traditionally, organisations used various methods of de-identification to distance data from real identities and allow analysis to proceed. However, computer scientists have repeatedly shown that even anonymised data can often be re-identified and attributed to specific individuals. A recent study was able to identify the participants in confidential legal cases even though those participants had been anonymised (Vokinger and Mühlematter, 2019). By combining artificial intelligence and big data, the study’s authors mined over 120,000 public legal records and then used an algorithm to identify connections between them. Based on these linkages, the researchers succeeded in de-anonymising participants in 84% of the judgments they mined, and did so in less than one hour. Legal confidentiality is a shield for citizens, yet it seems that this shield has just been broken. Paul Ohm (2010) observed that “[r]eidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization.” This disruption may cause a crisis of confidence: if a data subject cannot trust a data owner to store his information safely or to process it in a manner that respects his privacy, he will very likely stop using the service. Big Data technologies become meaningless without data. This confidence or trust should be rebuilt not only on the self-discipline of data owners but also on well-functioning legal regulation.
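
The general idea behind such re-identification, linking records through attributes that remain after names are removed, can be sketched in a few lines. The Python example below is only an illustration under stated assumptions: the records, field names and the `link_records` function are invented for this sketch and do not reproduce the data or the algorithm of the study cited above.

```python
from collections import defaultdict

# Hypothetical "anonymised" judgments: names removed, but quasi-identifiers remain.
ANONYMISED_JUDGMENTS = [
    {"case_id": "J1", "product": "drug-a", "decision_date": "2018-03-01"},
    {"case_id": "J2", "product": "drug-b", "decision_date": "2018-05-12"},
    {"case_id": "J3", "product": "drug-c", "decision_date": "2018-07-30"},
]

# Hypothetical public register that lists the same attributes next to a name.
PUBLIC_REGISTER = [
    {"company": "Alpha AG", "product": "drug-a", "decision_date": "2018-03-01"},
    {"company": "Beta GmbH", "product": "drug-b", "decision_date": "2018-05-12"},
    {"company": "Gamma SA", "product": "drug-b", "decision_date": "2018-05-12"},
]

# Attributes assumed to appear in both sources.
QUASI_IDENTIFIERS = ("product", "decision_date")


def link_records(judgments, register, keys=QUASI_IDENTIFIERS):
    """Return {case_id: company} for judgments that match exactly one register entry."""
    index = defaultdict(list)
    for entry in register:
        index[tuple(entry[k] for k in keys)].append(entry["company"])

    reidentified = {}
    for judgment in judgments:
        candidates = index.get(tuple(judgment[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match reveals the party
            reidentified[judgment["case_id"]] = candidates[0]
    return reidentified


if __name__ == "__main__":
    matches = link_records(ANONYMISED_JUDGMENTS, PUBLIC_REGISTER)
    print(matches)
    print(f"Re-identified {len(matches)} of {len(ANONYMISED_JUDGMENTS)} judgments")
```

In this toy data only judgment J1 is re-identified: J2 matches two register entries and J3 matches none. The sketch suggests why removing names alone may not be enough when other attributes in a record are unique and publicly available elsewhere.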

Richards and King (2014) focus on a redefinition of legal privacy. Their argument can be separated into four parts. Firstly, they agree with Ohm (2010) that privacy should not be dead in the age of Big Data. However, they indicate that privacy should be perceived as ‘information rules’ rather than as ‘information we can keep secret or unknown’. This means that it is never for the data subject to define what privacy is; only the privacy that the law finds acceptable remains privacy. The law can therefore strike a balance between good practices within the Big Data industry (even where they encroach on privacy) and the protection of privacy. Secondly, we must rethink our attitudes towards sharing personal information: shared private information can still remain confidential, and that is what counts. The third part concerns transparency as an ethical standard in BD ethics. Transparency, as the authors state, ‘fosters trust by being able to hold others accountable’. However, the authors also point to the transparency paradox: ‘Transparency of sensitive corporate or government secrets could harm important interests, such as trade secrets or national security. Too little transparency can lead to unexpected outcomes and a lack of trust. Transparency also carries the risk that inadvertent disclosures will cause unexpected outcomes that harm privacy and breach confidentiality’. Finally, citizens should be empowered to define themselves against the machine-made identities produced by public institutions such as governments. However, all four arguments point out contradictions rather than provide a solution. The authors keep mentioning balance but fail to show how it can be achieved. This is common in similar research.

The European Union made a first attempt in 2016, when it reviewed its 1995 Data Protection Directive and introduced the General Data Protection Regulation (GDPR). The GDPR takes the side of customers: it requires a clearly defined purpose at the time of data collection, and data cannot be reused for a very different purpose. A data subject can also request information about ‘the existence of automated decision-making […] and meaningful information about the logic involved’ and is entitled to ask for ‘human intervention’. Surprisingly, the European Union insists on the old consent theory and requires more transparency. According to the European Commission (2014), the main purpose of this legal change was to maintain the trust of customers, which is key to building a healthy data-driven digital economy. Whether this aim has been achieved can be judged from the case of two American legal information service companies that build their services on the liberal publication policy for American court documents. These services, in their existing form, would simply be legally impossible to build in Europe.

Bibliography

Kapoor, A. (2016). Data could be key in driving India towards a $5 trillion economy. [online] The Economic Times. Available at: https://economictimes.indiatimes.com/news/economy/policy/data-could-be-the-key-in-driving-india-towards-a-5-trillion-economy/articleshow/70406051.cms [Accessed 1 Jan. 2020]
Lohr, S. (2013). The Origins of ‘Big Data’: An Etymological Detective Story. [online] Bits. Available at: https://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/ [Accessed 1 Jan. 2020]
Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity, and Variety. [online] Meta Group. Available at: https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf [Accessed 1 Jan. 2001]
DeVan, A. (2016). The 7 V’s of Big Data. [online] Impact Radius Blog. Available at: https://impact.com/marketing-intelligence/7-vs-big-data/ [Accessed 1 Jan. 2020]
Ződi, Z. (2017). Law and Legal Science in the Age of Big Data. [online] Researchgate. Available at: https://www.researchgate.net/publication/320643327_Law_and_Legal_Science_in_the_Age_of_Big_Data [Accessed 1 Jan. 2020]
Siegel, E. (2015). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Revised and Updated. New York: Wiley.
Bryant, A. and Raja, U. (2014). In the Realm of Big Data… First Monday, 19(2), pp. 1-18.
Friedmann, R. (2004). Back to the Future. [online] The American Lawyer. Available at: https://www.law.com/americanlawyer/almID/1101136517607/ [Accessed 1 Jan. 2020]
Alarie, B., Niblett, A. and Yoon, A. (2016). Regulation by Machine. [online] SSRN. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2878950 [Accessed 1 Jan. 2010]
Devins, C., Felin, T., Kauffman, S. and Koppl, R. (2017). The Law and Big Data. Cornell Journal of Law and Public Policy, 27(357), pp. 357-413.
Wu, X. and Zhang, X. (2016). Automated Inference on Criminality using Face Images. [online] Researchgate. Available at: https://www.researchgate.net/publication/310235081_Automated_Inference_on_Criminality_using_Face_Images [Accessed 1 Jan. 2020]
Dyevre, A. (2016). The Future of Legal Theory and the Law School of the Future. Cambridge: Intersentia.
Bodig, M. (2015). Legal Doctrinal Scholarship and Interdisciplinary Engagement. Erasmus Law Review, (2), pp. 43–54.
Livermore, M., Riddell, A. and Rockmore, D. (2017). The Supreme Court and the Judicial Genre. Arizona Law Review, 59, pp. 837-901.
Anderson, C. (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. [online] Wired. Available at: https://www.wired.com/2008/06/pb-theory/ [Accessed 1 Jan. 2020]
Tene, O. and Polonetsky, J. (2012). Privacy in the Age of Big Data: A Time for Big Decisions. Stanford Law Review, 64, pp. 63-69.
Mayer-Schönberger, V. and Cukier, K. (2013). Big Data: A Revolution that will Transform How We Live, Work and Think. Boston: Houghton Mifflin Harcourt.
Ohm, P. (2010). Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization. UCLA Law Review, 57(6), pp. 1701–1778.
Richards, N. and King, J. (2014). Big Data Ethics. Wake Forest Law Review, 49(2), pp. 393–432.
Communication from the Commission. (2014). Towards a thriving data-driven economy. [online] EUR-Lex. Available at: https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1404888011738&uri=CELEX:52014DC0442 [Accessed 1 Jan. 2020]
