Privacy-preserving Data Aggregation in Mobile Phone Sensing
Abstract—Mobile phone sensing provides a promising paradigm for collecting sensing data and has been receiving increasing attention in recent years. Different from most existing works, which protect participants’ privacy by hiding the content of their data and allow the aggregator to compute some simple aggregation functions, we propose a new approach to protect participants’ privacy by delinking data from its sources. This approach allows the aggregator to get the exact distribution of the data aggregation, and therefore enables the aggregator to efficiently compute arbitrary/complicated aggregation functions. In particular, we first present an efficient protocol that allows an untrusted data aggregator to periodically collect sensed data from a group of mobile phone users without knowing which data belongs to which user. Assume there are n users in the group. Our protocol achieves “n-source anonymity” in the sense that the aggregator only learns that the source of a piece of data is one of the n users. Then, we consider a practical scenario where users may have different source anonymity requirements and provide a solution based on dividing users into groups. This solution optimizes the efficiency of data aggregation and meets all users’ requirements at the same time.
Index Terms—Privacy, data aggregation, cloud computing, security, mobile sensing
Mobile phone sensing provides a new paradigm for people to efficiently perform sensing tasks. In a typical mobile phone sensing application, a data aggregator recruits a group of mobile phone users to perform sensing tasks.
With various kinds of sensors embedded in their mobile phones, these users perform the sensing task and then send the data back to the data aggregator through the communication network. Owing to the strong sensing capabilities of recent smartphones and the ubiquity of mobile phone users, mobile phone sensing is gaining increasing attention from both industry and academia. A number of mobile phone sensing based applications have been developed across areas such as healthcare [12, 34], transportation [31, 42], and environment monitoring [28, 32]. In these applications, data collected by the aggregator often contains users’ private information. For example, most applications for traffic or environment monitoring collect the user’s physical location in addition to the direct POI (point of interest), e.g., the traffic congestion level or the noise level; most healthcare applications collect information relating to a user’s health, such as weight and blood pressure. Concerned about their privacy, mobile phone users may refuse to participate in the sensing, especially when the aggregator is untrusted.
Thus, protecting participants’ privacy is extremely important to mobile phone sensing applications. Recognizing this, researchers have begun to investigate privacy issues in mobile phone sensing, and a few works [22, 23, 26, 36, 38, 39] on protecting participants’ privacy have been carried out in recent years. In one such work, for example, an efficient secure protocol is designed for an untrusted aggregator to compute the sum of all participants’ time-series data. Before sending its data to the aggregator, each mobile phone user encrypts it using an additively homomorphic cipher. The encryption keeps the content of a user’s data private from the aggregator and from other users. Different cryptographic schemes are used in [36, 38, 39] to implement the same summation functionality for the aggregator. Building on secure summation, a few aggregation functions other than the sum, such as the average and the Max/Min, can also be computed without knowing each user’s data.
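To make the additively homomorphic property concrete, the following is a minimal sketch using a toy Paillier cryptosystem. Paillier is swapped in here purely for illustration; the text above does not say which cipher the cited works use, and the tiny primes, the function names, and the sample readings are all assumptions. Note also that in this sketch whoever holds the key could decrypt any single ciphertext, whereas the protocols discussed above are designed so that the aggregator learns only the sum.

```python
import math
import random

# Toy parameters (assumption): two small primes, far too small for real use.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                                              # standard simple generator choice
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)                                # modular inverse (Python 3.8+)


def encrypt(m, rng):
    """Encrypt an integer 0 <= m < N with fresh randomness r."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover the plaintext from a ciphertext."""
    return ((pow(c, LAMBDA, N2) - 1) // N * MU) % N


def add_ciphertexts(ciphertexts):
    """Multiplying ciphertexts corresponds to adding the plaintexts."""
    total = 1
    for c in ciphertexts:
        total = (total * c) % N2
    return total


rng = random.Random(42)
readings = [12, 30, 7]                      # hypothetical per-user sensor values
ciphertexts = [encrypt(m, rng) for m in readings]
recovered_sum = decrypt(add_ciphertexts(ciphertexts))
```

Here `recovered_sum` equals the sum of the readings as long as that sum stays below `N`, even though the ciphertexts were combined without being decrypted individually.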
We notice that all of the works above protect users’ privacy by “hiding” the contents of their data. The protocols proposed in these works are each designed to compute a certain aggregation function without revealing any individual data value. To compute more than one aggregation function, we often need to apply a separate protocol for each function, which is very inefficient. Furthermore, most aggregation functions studied in these works are simple functions such as the sum, the average, and the Max/Min. Non-linear functions such as the variance, the z-test function, and the F-test function are rarely studied. Unlike these works, in this paper we protect users’ privacy by delinking data from its sources. In particular, we aim to design protocols that allow the data aggregator to periodically collect a random permutation of all users’ data without being able to identify the source of any particular piece of data. This approach allows the aggregator to get the exact distribution of the data aggregation, and therefore enables the aggregator to efficiently perform complicated statistical analyses that are difficult to perform using protocols that hide the data’s contents. In addition, letting the aggregator know the data’s contents (rather than keeping them private) is necessary for some mobile sensing applications. One example is users’ location data in transportation sensing applications, which often must be made available to enable accurate analysis results.
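The point that a random permutation preserves the exact distribution can be checked with a short sketch. It models only what the aggregator ends up seeing (a shuffled copy of the values), not the anonymous aggregation protocol itself; the function names and the sample readings are hypothetical, and integer readings are used so that the comparisons are exact.

```python
import random
import statistics
from collections import Counter


def anonymized_view(values, rng):
    """Return a randomly permuted copy: the values without their order or source."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def aggregate(values):
    """Compute several aggregation functions from the same collected data."""
    return {
        "sum": sum(values),
        "mean": statistics.mean(values),
        "variance": statistics.pvariance(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical readings, e.g. body temperature in tenths of a degree Celsius.
readings = [365, 368, 392, 370, 366]
view = anonymized_view(readings, random.Random(0))

same_distribution = Counter(view) == Counter(readings)
same_statistics = aggregate(view) == aggregate(readings)
```

Both `same_distribution` and `same_statistics` are true: every function of the multiset of values, linear or not, can be computed from the permuted data with a single collection round.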
In practice, delinking data from its sources also provides satisfactory privacy protection for participants in many mobile sensing applications. For example, consider a mobile sensing application that monitors the trend of an epidemic (e.g., avian flu) in a city. The aggregator may want to collect citizens’ body temperatures. Disguising every citizen’s body temperature value protects citizens’ privacy, as does disguising the connection between a citizen and their body temperature value. Consider an application that guarantees the source of data is hidden among all citizens in a city. If all citizens send their body temperatures to the aggregator using this application, the aggregator only learns a random permutation of all citizens’ body temperatures. Although it can easily spot abnormal body temperatures that indicate some citizens are likely infected, it cannot know which particular citizens are infected, nor the body temperature of a particular citizen. Such an application protects all participants’ privacy well. Even an infected citizen would have no reason to refuse to send their abnormal body temperature in such an application.
Our paper consists mainly of two parts. In the first part, we study how to delink the data from its sources in a general mobile sensing application. Suppose there are n users and one untrusted aggregator. We propose an anonymous data aggregation protocol that allows the aggregator to collect all users’ data. Our protocol achieves “n-source anonymity” in the sense that the aggregator only learns that the source of any particular piece of data is one of the n users. In the second part of this paper, we consider the situation where n is very large and different users may have different privacy requirements. In order to improve the efficiency, we propose dividing users into groups according to their privacy requirements and allowing users in each group to execute the anonymous data aggregation protocol together. We provide an optimal grouping algorithm which finds an optimal grouping that meets all users’ privacy requirements and minimizes the total amount of data received by the aggregator at the same time.
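The grouping constraint can be illustrated with a small validity check. This is not the optimal grouping algorithm of the paper; it is a sketch under the assumption that each user's privacy requirement is a minimal anonymity level k, met when the user's group contains at least k users. The function name and the sample data are hypothetical.

```python
def grouping_is_valid(requirements, groups):
    """Check that a grouping meets every user's anonymity requirement.

    requirements: dict mapping user -> minimal group size k (assumed model)
    groups: list of lists of users; each user must appear exactly once
    """
    seen = set()
    for group in groups:
        for user in group:
            if user in seen or user not in requirements:
                return False          # duplicated or unknown user
            seen.add(user)
            if len(group) < requirements[user]:
                return False          # group too small for this user
    return seen == set(requirements)  # nobody left ungrouped


# Hypothetical users: two need 2-source anonymity, three need 3-source anonymity.
requirements = {"a": 2, "b": 2, "c": 3, "d": 3, "e": 3}

good = grouping_is_valid(requirements, [["a", "b"], ["c", "d", "e"]])
bad = grouping_is_valid(requirements, [["a", "c"], ["b", "d", "e"]])
```

In the second grouping, user `c` requires a group of at least three users but is placed in a group of two, so the check fails. Choosing among the valid groupings to minimize the data received by the aggregator is the optimization problem the paper addresses.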
Our contributions can be summarized as follows:
• Proposing a new privacy-preserving approach for mobile phone sensing data aggregation that can be applied to arbitrary aggregation functions. Our approach does not assume that the aggregator is trusted and does not require data transmissions among sensor nodes.
• Presenting an anonymous data aggregation protocol that allows the data aggregator to receive a random permutation of all users’ data without knowing the source of any particular piece of data.
• Formally proving the anonymous data aggregation protocol is n-source anonymous when there are n honest users in the application.
• When the total number of users is large and users have different minimal privacy level requirements, proposing a grouping algorithm that can be used to find an optimal grouping. Grouping users this way and allowing users within each group to execute our anonymous data aggregation protocol together would satisfy all users’ privacy requirements and optimize the entire data aggregation’s efficiency at the same time.
• Performing experiments to show the efficiency of our protocols.
The rest of this paper is organized as follows. In Section 2, we review the related work. After introducing the preliminary knowledge of our problem and solutions in Section 3, we present our anonymous data aggregation protocol in Section 4. In Section 5, we show how to deal with an extreme scenario in which the total number of users is very large. We evaluate our solutions’ performance in Section 6 and then discuss possible extensions of our protocols in Section 7. Finally, we conclude our paper in Section 8.