The University of Hong Kong Bulletin, January 2018 (Vol. 19 No. 2)

Cover Story

BIG DATA’S DARK SIDE

The aggregation of big data can be put to some amazing uses. But there are also risks for the individual.

“When you’re basically saying to a computer, ‘here is all the data, make the best decision for me’ without understanding how that decision is reached, whether it is fair, whether it has unintended consequences, then you have really very challenging questions.”
– Professor John Bacon-Shone

How can individuals be protected when their personal data is constantly being collected for uses that may not be apparent until some future date? And when it may not even be obvious who is collecting that data? As giants like Google, Facebook, WeChat and Alibaba track their users every minute of the day, these questions are rising high on government agendas around the world.

In little more than a decade, it has become routine for most people to share personal information in order to gain access to services – whether socialising, shopping, seeking entertainment or checking up on their health. Even our whereabouts can be tracked at every moment if the location service on our phones is turned on.

That goldmine of information is being used by both businesses and governments to make decisions about individuals and groups, such as how much to charge certain users for services, whether to deny them access, and what trends their data reveals. And therein lie several problems.

First, the story told by big data may not be an accurate one. Professor John Bacon-Shone of the Faculty of Social Sciences, a statistician with an interest in big data and privacy who also advises the Hong Kong Government on these issues, cites the example of the Google Flu Trends web service, which aggregated search queries about flu to predict outbreaks. “The problem is, it’s just an association, not causation, and it doesn’t work well at prediction. If you have a different type of flu, the whole thing falls apart,” he said.

Big data may also contain coding mistakes or built-in biases. Another example cited by Professor Bacon-Shone concerns decisions in the US on who should be granted bail. After African Americans were shown to be less likely to be granted bail even when other factors were controlled for, the decision was computerised in the hope of removing human bias. But the data fed into the computer came from those past decisions. “The inputs already had bias in them. So you end up replicating the bias,” he said.

A third problem is that even when data is anonymised for the sake of privacy, it may be possible to re-identify a person because the data retains telling details. For example, hospital data about accident casualties will include the date and time of admission and the patient’s condition, from which inferences could be drawn about the patient’s identity. More worryingly, with big data crunching DNA information, it is becoming possible to predict a person’s hair colour, eye colour and even surname from a sample of their DNA. “There are people who have been foolish enough to put their full DNA profiles in the public domain. DNA has the potential for massive health benefits but also for massive risks,” he said.

All of this seems to cry out for regulation. But this, too, is problematic.

Traditional regulation out of step

Personal data protection laws typically require banks and other institutions to keep accurate, up-to-date information and to disclose how it will be used. But when the technology is changing rapidly, with new and unanticipated uses becoming possible, this may no longer be sufficient. Professor Anne SY Cheung of the Faculty of Law has been studying privacy and personal data protection and is co-editor of the 2015 book Privacy and Legal Issues in Cloud Computing.
[Photo: Professor John Bacon-Shone speaking at the Symposium on ‘Data Protection Law Development in the Information Age’ at the City University of Hong Kong in September 2016.]

“Recent legal reforms and position papers from
