GDPR and AI

This article summarises a study published by the European Parliament that addresses the relationship between the EU General Data Protection Regulation (GDPR) and artificial intelligence (AI).

It considers challenges and opportunities for individuals and society, and the ways in which risks can be countered and opportunities enabled through law and technology.

AI definition

AI systems are software (and possibly also hardware) systems designed by humans that, given a complex goal, act in the physical or digital dimension by perceiving their environment through data acquisition, interpreting the collected structured or unstructured data, reasoning on the knowledge, or processing the information, derived from this data and deciding the best action(s) to take to achieve the given goal. AI systems can either use symbolic rules or learn a numeric model, and they can also adapt their behaviour by analysing how the environment is affected by their previous actions.

The High-Level Expert Group characterises the scope of research in AI as follows:

“As a scientific discipline, AI includes several approaches and techniques, such as machine learning (of which deep learning and reinforcement learning are specific examples), machine reasoning (which includes planning, scheduling, knowledge representation and reasoning, search, and optimization), and robotics (which includes control, perception, sensors and actuators, as well as the integration of all other techniques into cyber-physical systems).”

An AI system’s ability to improve itself could give rise to the ‘singularity’, which would accelerate the development of science and technology so as not only to solve current human problems (poverty, underdevelopment, etc.), but also to overcome the biological limits of human existence (illness, ageing, etc.) and spread intelligence in the cosmos.

AI and algorithms

The term ‘algorithm’ is often used to refer to AI applications, e.g., through locutions such as ‘algorithmic decision-making’. However, the concept of an algorithm is more general than the concept of AI, since it includes any sequence of unambiguously defined instructions to execute a task, particularly but not exclusively through mathematical calculations. To be executed by a computer system, algorithms have to be expressed through programming languages, thus becoming machine-executable software programs. Algorithms can be very simple, specifying, for instance, how to arrange lists of words in alphabetical order or how to find the greatest common divisor of two numbers (as the so-called Euclidean algorithm, sketched below, does). They can also be very complex, such as algorithms for file encryption, the compression of digital files, speech recognition, or financial forecasting. Obviously, not all algorithms involve AI, but every AI system, like any computer system, includes algorithms, some dealing with tasks that directly concern AI functions.
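
As a concrete illustration, here is a minimal Python sketch of the Euclidean algorithm mentioned above: a short, unambiguous sequence of instructions that involves no AI at all.

```python
def gcd(a: int, b: int) -> int:
    """Euclidean algorithm: repeatedly replace the pair (a, b)
    with (b, a mod b) until the remainder is zero."""
    while b:
        a, b = b, a % b
    return a

print(gcd(48, 18))  # prints 6
```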

AI and big data

The term big data identifies vast data sets that are difficult to manage using standard techniques because of their special features, the so-called three V’s: huge Volume, high Velocity and great Variety. Other features associated with big data are low Veracity (a high possibility that at least some data are inaccurate) and high Value. Such data can be created by people, but most often they are collected by machines, which capture information from the physical world (e.g., street cameras, sensors collecting climate information, devices for medical testing, etc.) or from computer-mediated activities (e.g., systems recording transactions or tracking online behaviour, etc.).

From a social and legal perspective what is most relevant in very large data sets, and which makes them ‘big data’ from a functional perspective, is the possibility of using such data sets for analytics, namely, for discovering correlations and making predictions, often using AI techniques, as we shall see when discussing machine learning. In particular, the connection with analytics and AI makes big data specifically relevant to data protection.

Machine learning

AI has made an impressive leap forward since it began to focus on the application of machine learning to vast amounts of data. This has led to a number of successful applications in many sectors, ranging from automated translation to industrial optimisation, marketing, robotic vision, movement control, etc. – and some of these applications already have substantial economic and social impacts.

In machine learning approaches, machines are provided with learning methods, rather than, or in addition to, formalised knowledge. Using such methods, they can automatically learn how to effectively accomplish their tasks by extracting/inferring relevant information from their input data. As noted, and as Alan Turing already theorised in the 1950s, a machine that is able to learn will achieve its goals in ways that are not anticipated by its creators and trainers, and in some cases without them knowing the details of its inner workings.

The answers given by learning systems are usually called ‘predictions’. However, the context of the system’s use often determines whether its proposals are to be interpreted as forecasts or rather as suggestions to the system’s user. For instance, a system’s ‘prediction’ that a person’s application for bail or parole will be accepted can be viewed by the defendant (and his or her lawyer) as a prediction of what the judge will do, and by the judge as a suggestion guiding her decision (assuming that she prefers not to depart from previous practice). The same applies to a system’s prediction that a loan or a social entitlement will be granted.

Data protection

Data protection is at the forefront of the relationship between AI and the law, as many AI applications involve the massive processing of personal data, including the targeting and personalised treatment of individuals on the basis of such data. This explains why data protection has been the area of the law that has most engaged with AI, although other domains of the law are involved as well, such as consumer protection law, competition law, anti-discrimination law, and labour law.

A key aspect of AI systems of the machine learning type is their ability to engage in differential inference: different combinations of predictor values are correlated with different predictions. As discussed above, when the predictors concern data on individuals and their behaviour, the predictions also concern features or attitudes of such individuals.

The GDPR, as we shall see in the following sections, provides some constraints: the need for a legal basis for any processing of personal data, obligations concerning information and transparency, limitations on profiling and automated decision-making, requirements on anonymisation and pseudonymisation, etc.

Risks and opportunities

To predict a certain outcome in a new case means to jump from certain known features of that case, the so-called predictors (also called independent variables, or features), to an unknown feature of that case, the target to be predicted (also called dependent variable, or label).

This forecast is based on models that capture general aspects of the contexts being considered, on the basis of which it is possible to connect the values of predictors and targets. For instance, a model in the medical domain may connect symptoms to diseases, while a psychometric model may connect online behaviour (e.g., friends, posts and likes on a social network) to psychological attitudes, etc.

For instance, targeted advertising may be based on records linking the characteristics and behaviour of consumers (gender, age, social background, purchase history, web browsing, etc.) to their responses to ads. Similarly, the assessment of job applications may be based on records linking characteristics of previous workers (education, employment history, jobs, aptitude tests, etc.) to their work performance; the prediction of the likelihood of recidivism by a particular offender may be based on records combining characteristics of past offenders (education, employment history, family status, criminal record, psychological tests, etc.) with data or assessments on their recidivism; and the prediction of a prospective borrower’s creditworthiness may be based on records linking the characteristics of past borrowers to data or assessments about their creditworthiness, as in the sketch below.
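
To make the predictor/target pattern concrete, here is a minimal Python sketch of the creditworthiness example, assuming scikit-learn is available; the feature names, the data, and the model choice are all hypothetical, not taken from the study.

```python
from sklearn.linear_model import LogisticRegression

# Predictors (independent variables / features): [age, income_k_eur, past_defaults]
X = [
    [25, 30, 1],
    [40, 60, 0],
    [35, 45, 0],
    [50, 80, 0],
    [23, 20, 2],
    [45, 55, 1],
]
# Target (dependent variable / label): 1 = loan repaid, 0 = defaulted
y = [0, 1, 1, 1, 0, 0]

model = LogisticRegression().fit(X, y)

# The 'prediction' for a new applicant is a forecast for the bank,
# and at the same time a suggestion to the human decision-maker.
new_applicant = [[30, 40, 0]]
print(model.predict(new_applicant))        # predicted class
print(model.predict_proba(new_applicant))  # class probabilities
```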

Challenges

Through technologies we can address the grand challenges of humanity, such as maintaining a healthy environment, providing the resources for a growing population (including energy, food, and water), overcoming disease, vastly extending human longevity, and eliminating poverty. It is only by extending ourselves with intelligent technology that we can deal with the scale of complexity needed.

However, the development of AI and its convergence with big data also lead to serious risks for individuals, for groups, and for the whole of society. For one thing, AI can eliminate or devalue the jobs of those who can be replaced by machines: many risk losing the ‘race against the machine’, and therefore being excluded from or marginalised in the job market. This may lead to poverty and social exclusion, unless appropriate remedies are introduced (consider, for instance, the future impact of autonomous vehicles on taxi and truck drivers, or the impact of smart chatbots on call-centre workers).

Moreover, by enabling big tech companies to make huge profits with a limited workforce, AI contributes to concentrating wealth in those who invest in such companies or provide them with high-level expertise. This trend favours economic models in which ‘the winner takes all’. Among companies, monopoly positions tend to prevail, thanks to the network effect (users’ preference for larger networks), coupled with economies of scale (enabled by automation) and exclusive or preferential access to data and technologies.

Illegal activities

There is also a need to counter the new opportunities for illegal activities offered by AI and big data. In particular, AI and big data systems can fall subject to cyberattacks (designed to disable critical infrastructure, or steal or rig vast data sets, etc.), and they can even be used to commit crimes (e.g., autonomous vehicles can be used for killing or terrorist attacks, and intelligent algorithms can be used for fraud or other financial crimes).

Certain abuses may be incentivised by the fact that many tech companies – such as major platforms hosting user-generated content – operate in two- or many-sided markets. Their main services (search, social network management, access to content, etc.) are offered to individual consumers, but the revenue stream comes from advertisers, influencers, and opinion-makers (e.g., in political campaigns). This means not only that any information that is useful for targeted advertising will be collected and used for this purpose, but also that platforms will employ any means to capture users, so that they can be exposed to ads and attempts at persuasion.

In other cases, a training set may be biased against a certain group, since the achievement of the outcome being predicted (e.g., job performance) is approximated through a proxy that has a disparate impact on that group. Assume, for instance, that the future performance of employees (the target of interest in job hiring) is measured only by the number of hours worked in the office. This outcome criterion will lead to the past hiring of women – who usually work fewer hours than men, having to cope with heavier family burdens – being considered less successful than the hiring of men; based on this correlation (as measured on the basis of the biased proxy), the system will predict a poorer performance for female applicants, as the sketch below illustrates.
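
Here is a hedged illustration, on purely synthetic data, of how such a biased proxy can skew predictions: true work quality is drawn from the same distribution for both groups, yet training on the ‘office hours’ proxy produces systematically lower scores for one group. All variable names and numbers are invented for this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)        # 0 = men, 1 = women (stylised)
quality = rng.normal(0, 1, n)        # true performance: identical distribution
# Women log fewer office hours for the same quality of work.
hours = quality - 1.5 * group + rng.normal(0, 0.5, n)

proxy_label = (hours > 0).astype(int)  # 'successful hire' approximated by hours
# Features available at hiring: group membership and a noisy quality measure.
X = np.column_stack([group, quality + rng.normal(0, 0.5, n)])

model = LogisticRegression().fit(X, proxy_label)
scores = model.predict_proba(X)[:, 1]
print("mean predicted score, men:  ", scores[group == 0].mean())
print("mean predicted score, women:", scores[group == 1].mean())
# The gap comes from the proxy label, not from any difference in true quality.
```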

Profiling

Through AI and big data technologies – in combination with the panoply of sensors that increasingly trace every human activity – individuals can be subjected to surveillance and influence in many more cases and contexts, on the basis of a broader set of personal characteristics (ranging from economic conditions to health situation, place of residence, personal life choices and events, online and offline behaviour, etc.).

The notion of profiling in the GDPR only covers assessments or decisions concerning individuals, based on personal data, excluding the mere construction of group profiles:

‘profiling’ […] consists of any form of automated processing of personal data evaluating the personal aspects relating to a natural person, in particular to analyse or predict aspects concerning the data subject’s performance at work, economic situation, health, personal preferences or interests, reliability or behaviour, location or movements, where it produces legal effects concerning him or her or similarly significantly affects him or her.

Ethical framework

According to the High-Level Expert Group, in order to implement and achieve trustworthy AI, seven requirements should be met, building on the principles mentioned above:

–  Human agency and oversight, including fundamental rights.

–  Technical robustness and safety, including resilience to attack and security, fall-back plan and general safety, accuracy, reliability and reproducibility.

–  Privacy and data governance, including respect for privacy, quality and integrity of data, and access to data.

–  Transparency, including traceability, explainability and communication.

–  Diversity, non-discrimination and fairness, including the avoidance of unfair bias, accessibility and universal design, and stakeholder participation.

–  Societal and environmental wellbeing, including sustainability and environmental friendliness, social impact, society and democracy.

–  Accountability, including auditability, minimisation and reporting of negative impact, trade-offs and redress.

Legal framework

Moving from ethics to law, AI may both promote and demote different fundamental rights and social values included in the EU Charter and in national constitutions.

AI can indeed magnify both the positive and the negative impacts of ICTs on human rights and social values. The rights to privacy and data protection (Articles 7 and 8 of the Charter) are at the forefront, but other rights are also at stake: dignity (Article 1), the right to liberty and security (Article 6), freedom of thought, conscience and religion (Article 10), freedom of expression and information (Article 11), freedom of assembly and association (Article 12), freedom of the arts and sciences (Article 13), the right to education (Article 14), the freedom to choose an occupation and the right to engage in work (Article 15), the right to equality before the law (Article 20), the right to non-discrimination (Article 21), equality between men and women (Article 23), the rights of the child (Article 24), the right to fair and just working conditions (Article 31), the right to health care (Article 35), the right of access to services of general economic interest (Article 36), consumer protection (Article 38), the right to good administration (Article 41), and the right to an effective remedy and to a fair trial (Article 47). Besides individual rights, social values are also at stake, such as democracy, peace, welfare, competition, social dialogue, efficiency, advancement in science, art and culture, cooperation, civility, and security.

Legal regimes

Given the huge breadth of its impact on citizens’ individual and social lives, AI falls under the scope of different sectoral legal regimes. These regimes include especially, though not exclusively, data protection law, consumer protection law, and competition law.

As the European Data Protection Supervisor (EDPS) observed in Opinion 8/2018 on the legislative package ‘A New Deal for Consumers’, there is synergy between the three regimes. Consumer and data protection law share the common goals of correcting imbalances of informational and market power, and, along with competition law, they contribute to ensuring that people are treated fairly. Other domains of the law are also involved in AI: labour law, relative to the new forms of control over workers enabled by AI; administrative law, relative to the opportunities and risks in using AI to support administrative decision-making; civil liability law, relative to harm caused by AI-driven systems and machines; contract law, relative to the use of AI in preparing, executing and performing agreements; laws on political propaganda and elections, relative to the use of AI in political campaigns; military law, on the use of AI in armed conflicts; etc.

Personal data

To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.

Through pseudonymisation, the data items that identify a person (e.g., the name) are substituted with a pseudonym, but the link between the pseudonym and the identifying data items can be retraced by using separate information (e.g., a table linking pseudonyms and real names, or a cryptographic key to decode the encrypted names). Recital 26 specifies that pseudonymised data are still personal data.

Personal data which have undergone pseudonymisation, but which could be attributed to a natural person by the use of such additional information, should be considered to be information on an identifiable natural person. Likewise, if technological developments make it possible to turn anonymised data into personal data, such data are to be treated as personal data.
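
As an illustration of the mechanism just described, here is a minimal Python sketch of pseudonymisation using a keyed hash (HMAC); the key, field names, and record are hypothetical. The secret key plays the role of the separately kept ‘additional information’: whoever holds it can re-link pseudonyms to identities, which is why the data remain personal data.

```python
import hashlib
import hmac

# Hypothetical secret key; under the GDPR's logic it must be stored
# separately from the pseudonymised data set.
SECRET_KEY = b"keep-this-key-elsewhere"

def pseudonymise(name: str) -> str:
    """Replace an identifying name with a stable, key-dependent pseudonym."""
    return hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

record = {"name": "Jane Doe", "diagnosis": "asthma"}
record["name"] = pseudonymise(record["name"])
print(record)  # the pseudonym is re-linkable only by whoever holds the key
```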

Two issues

In connection with the GDPR definition of personal data, AI raises in particular two key issues:

(1) the ‘re-personalisation’ of anonymous data, namely the re-identification of the individuals to whom such data are related; and

(2) the inference of further personal information from personal data that are already available.

Thanks to AI and big data, the identifiability of data subjects has vastly increased. The personal nature of a data item is no longer a feature of that item considered separately; it has rather become a contextual feature. As shown above, an apparently anonymous data item becomes personal in the context of further personal data that enable re-identification. For instance, the identifiability of the Netflix movie reviewers supervened on the availability of their named reviews on IMDb, a linkage of the kind sketched below. As has been argued, ‘in any “reasonable” setting there is a piece of information that is in itself innocent, yet in conjunction with even a modified (noisy) version of the data yields a privacy breach.’
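
The following toy Python sketch shows the shape of such a linkage attack: an ‘anonymous’ data set is joined with public, named auxiliary data on shared quasi-identifiers. All records, names, and titles are invented for illustration.

```python
# 'Anonymous' ratings: direct identifiers removed, but quasi-identifiers
# (movie title and date) remain.
anonymous_ratings = [
    {"user_id": "u1", "movie": "Movie A", "date": "2006-03-01"},
    {"user_id": "u2", "movie": "Movie B", "date": "2006-04-15"},
]

# Public auxiliary data: named reviews posted on another site.
public_reviews = [
    {"name": "Jane Doe", "movie": "Movie B", "date": "2006-04-15"},
]

# Join on the quasi-identifiers to re-identify users.
index = {(r["movie"], r["date"]): r["name"] for r in public_reviews}
for row in anonymous_ratings:
    name = index.get((row["movie"], row["date"]))
    if name:
        print(f"re-identified {row['user_id']} as {name}")
```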

Consent

Consent according to Article 4(11) GDPR should be freely given, specific, informed and unambiguous, and be expressed through a clear affirmative action:

‘consent’ of the data subject means any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her.

The first issue pertains to the specificity of consent: does consent to the processing for a certain purpose also cover further AI-based processing, typically for data analytics and profiling? – e.g., can data on sales be used to analyse consumer preferences and send targeted advertising? This seems to be ruled out, since consent needs to be specific, so that it cannot extend beyond what is explicitly indicated. However, the fact that the data subject has only consented to processing for a certain purpose (e.g., client management) does not necessarily rule out that the data can be processed for a further legitimate purpose (e.g., business analytics): the further processing is permissible when it is covered by a legal basis, and it is not incompatible with the purpose for which the data were collected.

Purpose

A tension exists between the use of AI and big data technologies and the purpose limitation requirement. These technologies enable the useful reuse of personal data for new purposes that are different from those for which the data were originally collected.

For instance, data collected for the purpose of contract management can be processed to learn consumers’ preferences and send targeted advertising; ‘likes’ that are meant to express and communicate one’s opinion may be used to detect psychological attitudes, political or commercial preferences, etc.

To establish whether the repurposing of data is legitimate, we need to determine whether a new purpose is ‘compatible’ or ‘not incompatible’ with the purpose for which the data were originally collected.

Data subjects’ rights

There has been wide discussion on whether Article 15 of the GDPR should be read as granting data subjects the right to obtain an individualised explanation of automated assessments and decisions. Unfortunately, the formulation of Article 15 is very ambiguous, and that ambiguity is reflected in Recital 63. In particular, it is not specified whether the obligation to provide information on the ‘logic involved’ concerns only general information on the methods adopted in the system, or rather specific information on how these methods were applied to the data subject.

The use of AI makes it more likely that a decision will be based ‘solely’ on automated processing. This is due to the fact that humans may not have access to all the information that is used by AI systems, and may not have the ability to analyse and review the way in which this information is used. It may be impossible, or it may take an excessive effort to carry out an effective review – unless the system has been effectively engineered for transparency, which in some cases may be beyond the state of the art. Thus, especially when a large-scale opaque system is deployed, humans are likely to merely execute the automated suggestions by AI, even when they are formally in charge.

Data protection impact assessment

Article 35 of the GDPR requires that a data protection impact assessment be carried out in advance for processing that is likely to result in a high risk to the rights and freedoms of natural persons. The assessment is required in particular when the processing involves a systematic and extensive evaluation of personal aspects relating to natural persons which is based on automated processing, including profiling, and on which decisions are based that produce legal effects concerning the natural person or similarly significantly affect the natural person.

Thus, an impact assessment is usually required when AI-based profiling contributes to automated decision-making affecting individuals, since such profiling is likely to be ‘systematic and extensive.’

Certification

Articles 40-43 of the GDPR address codes of conduct and certification. While these provisions make no explicit reference to AI, certification procedures may be highly relevant to AI, given the risks involved in AI applications and the limited guidance provided by legal provisions.

Adherence to certification mechanisms may contribute to demonstrating compliance with the obligations of the controller and with the requirements of privacy by design. The idea of a certification for AI applications has been endorsed by the European Economic and Social Committee (EESC), which ‘calls for the development of a robust certification system based on test procedures that enable companies to state that their AI systems are reliable and safe.’ Thus, it suggests developing a ‘European trusted-AI Business Certificate based partly on the assessment list put forward by the High-Level Experts’ group on AI.’ On the other hand, some doubts about a general framework for certification have also been raised, based on the complexity of AI technologies, their diversity, and their rapid evolution.

GDPR and AI

It has been argued that the GDPR is incompatible with AI and big data, given that the GDPR is based on principles – purpose limitation, data minimisation, the special treatment of ‘sensitive data’, the limitation on automated decisions – that are incompatible with the extensive use of AI. As a consequence, the EU would be forced either to renounce the application of the GDPR or to lose the race against those information-based economies – such as the USA and China – that are able to make full use of AI and big data.

Contrary to this opinion, the report showed that it is possible – and indeed likely – that the GDPR will be interpreted in such a way as to reconcile both desiderata: protecting data subjects and enabling useful applications of AI. It is true that the full deployment of the power of AI requires collecting vast quantities of data concerning individuals and their social relations, and that it also requires processing of such data for purposes that were not fully determined at the time the data were collected. However, there are ways to understand and apply the data protection principles that are consistent with the beneficial uses of AI.

Main conclusions

In the following, the main conclusions of this report on the relations between AI and the processing of personal data are summarised.

  • The GDPR generally provides meaningful indications for data protection relative to AI applications.
  • The GDPR can be interpreted and applied in such a way that it does not hinder beneficial application of AI to personal data, and that it does not place EU companies at a disadvantage in comparison with non-European competitors.

Thus, the GDPR does not seem to require any major change in order to address AI.

Conclusion

In conclusion, controllers engaging in AI-based processing should endorse the values of the GDPR and adopt a responsible and risk-oriented approach, and they should be able to do so in a way that is compatible with the available technologies and with economic profitability (or the sustainable achievement of public interests).

However, given the complexity of the matter and the gaps, vagueness and ambiguities present in the GDPR, controllers should not be left alone in this exercise. Institutions need to promote a broad social debate on AI applications and should provide high-level guidance. Data protection authorities need to actively engage in a dialogue with all stakeholders, including controllers, processors, and civil society, in order to develop appropriate responses based on shared values and effective technologies.

Consistent application of data protection principles, when combined with the ability to use AI technology efficiently, can contribute to the success of AI applications, by generating trust and preventing risks.

Marija Boskovic Batarelo, LL.M. Law and Technology