A Better Way to Share

Drawing on her experience collecting data that was more revealing than expected, Professor Edith CH Ngai has been working on ways to decentralise data collection and machine learning.

‘The more data you have, the better the result’ is an idea that underpins AI and machine learning. By collecting huge amounts of data, it becomes possible to discern patterns and make accurate predictions or decisions. But that approach often fails to consider that the raw data usually belongs to an individual, revealing things that person may not want to share or may not even be aware of.

Professor Edith CH Ngai of the Department of Electrical and Electronic Engineering has first-hand experience of this from her research projects, including a project on smart water auditing with the HKU Water Centre and the Water Supplies Department of the Government of the HKSAR to understand household water consumption patterns in Hong Kong. Her team affixed devices to water meters at homes to take photos of the readings, digitise them and report them to a central server. This enabled automatic and continuous monitoring of water usage to see when demand was high and when it abated. Surprisingly, it also revealed individual anomalies. One family was found to take long showers, another to use the washing machine several times a day, and still another to leave the kitchen tap running for long continuous periods.

“Our original goal was to understand overall domestic water consumption in Hong Kong, but after investigating the data in detail, we found some weird behaviours that we didn’t expect,” she said. “As researchers, we want to understand the data of a certain community, but sometimes the data may also reveal things about individuals’ private lives.”

In this case, the families were not entirely unaware – they had been informed and had given consent to the monitoring, and their data was securely stored and anonymised before analysis. However, other data collection efforts may be less diligent, or less alert to the privacy threats.

[Photo: A Smart Meter Analyser that can be clamped onto a government-issued water billing meter to digitise readings and automatically transmit the data to the cloud.]

Adding noise

A person’s location, for instance, may unknowingly be revealed when using apps. Professor Ngai cited the example of users who tag photos of animals or objects outdoors to aid machine learning. If their location can be determined, this would not only be a concern for the individual’s sake but may also deter others from providing data to improve machine learning models.

“It’s a bit of a contradictory situation. If we get more data, then the model will be more accurate and powerful. But then the privacy concern becomes stronger,” she said.

One solution is to add ‘noise’ to the data. For instance, a researcher monitoring physical activity could widen the monitoring area to incorporate dozens of people at a time, without pinpointing an individual and the paths they take every day. However, that risks sacrificing accuracy, she said.
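One standard way to calibrate such noise is the Laplace mechanism from differential privacy, which scales the noise to how much any one person can change the published answer. The following is a minimal sketch assuming that approach: it releases a noisy head count for a widened monitoring area rather than any individual’s path. The function name, the epsilon value and the example figures are illustrative, not drawn from Professor Ngai’s project.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def noisy_count(true_count: int, epsilon: float = 1.0) -> float:
    """Release a count with Laplace noise (the Laplace mechanism).

    Sensitivity is 1 because adding or removing one person changes the
    count by at most 1. A smaller epsilon gives stronger privacy but a
    noisier, less accurate answer -- the trade-off described above.
    """
    scale = 1.0 / epsilon  # noise scale = sensitivity / epsilon
    return true_count + rng.laplace(loc=0.0, scale=scale)

# Report how many people were active in a widened monitoring area,
# instead of publishing any individual's exact daily path.
people_active_in_area = 37
print(noisy_count(people_active_in_area, epsilon=0.5))
```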
Professor Ngai has instead been looking at federated learning, which facilitates collaborative and distributed machine learning across different devices and users. Rather than sending raw data to a central server, individuals perform local training on their devices and send their model updates to the server, which combines the input from all users without revealing private information such as users’ locations. “In this sense, people don’t need to share personal data, but they can still work together to do machine learning model training,” she said.

Federated learning is still not perfect – malicious users may mislabel things to poison the model – so Professor Ngai is also working on ways to identify malicious updates and provide robust global aggregation.
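To make the round-trip concrete, the sketch below walks through one federated round under simple assumptions: a linear model, a handful of clients, and a coordinate-wise median as the aggregator, one well-known way to blunt a single poisoned update. The median here is a stand-in for illustration, not Professor Ngai’s own aggregation method, and all names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def local_update(global_w, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient-descent steps on a
    linear least-squares model, using only that client's own data.
    Only the updated weights leave the device, never the raw data."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def robust_aggregate(updates):
    """Coordinate-wise median instead of a plain average, so one
    poisoned update cannot drag the combined model arbitrarily far."""
    return np.median(np.stack(updates), axis=0)

# One toy round: three honest clients plus one poisoned update.
d = 3
global_w = np.zeros(d)
clients = [(rng.normal(size=(20, d)), rng.normal(size=20)) for _ in range(3)]
updates = [local_update(global_w, X, y) for X, y in clients]
updates.append(np.full(d, 100.0))  # a malicious, mislabelled client
global_w = robust_aggregate(updates)
print(global_w)
```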
Edge computing

This relates to another area she is working on – edge general intelligence. ‘Edge’ refers to smart devices such as phones and computers that sit closer to the end users and can perform computations themselves, rather than sending data to a central server; examples include Siri and other AI personal assistants. This, too, can protect privacy, she said. Professor Ngai expects many more AI applications to be developed for smart devices. For university researchers, it is also a more fruitful path, because they would not need large models and expensive computing power to conduct their studies.

Professor Ngai admitted that engineering scholars were less aware of privacy concerns in the past, possibly because the community uses many open-source datasets, but awareness is increasing, along with more stringent ethical and privacy demands. This was reflected in her collaboration with the Faculty of Medicine on a study of children’s health before, during and after the COVID-19 pandemic. “Privacy demands are much more stringent for medical studies and journals,” she said.

“A lot of people working on AI may still fight more for model accuracy and clean data. They do not want noisy or perturbed data. But as it gets closer to practical usage and applications by the general public, people will want more attention paid to privacy,” she said.