Big sensor data is prevalent in both industry and scientific research applications where the data is generated with high volume and velocity it is difficult to process using on-hand database management tools or traditional data processing applications. Cloud computing provides a promising platform to support the addressing of this challenge as it provides a flexible stack of massive computing, storage, and software services in a scalable manner at low cost. Some techniques have been developed in recent years for processing sensor data on cloud, such as sensor-cloud. However, these techniques do not provide efficient support on fast detection and locating of errors in big sensor data sets.

We develop a novel data error detection approach which exploits the full computation potential of cloud platform and the network feature of WSN. Firstly, a set of sensor data error types are classified and defined. Based on that classification, the network feature of a clustered WSN is introduced and analyzed to support fast error detection and location. Specifically, in our proposed approach, the error detection is based on the scale-free network topology and most of detection operations can be conducted in limited temporal or spatial data blocks instead of a whole big data set. Hence the detection and location process can be dramatically accelerated.

Furthermore, the detection and location tasks can be distributed to cloud platform to fully exploit the computation power and massive storage. Through the experiment on our cloud computing platform of U-Cloud, it is demonstrated that our proposed approach can significantly reduce the time for error detection and location in big data sets generated by large scale sensor network systems with acceptable error detecting accuracy.



Recently, we enter a new era of data explosion which brings about new challenges for big data processing. In general, big data is a collection of data sets so large and complex that it becomes difficult to process with onhand database management systems or traditional data processing applications. It represents the progress of the human cognitive processes, usually includes data sets with sizes beyond the ability of current technology, method and theory to capture, manage, and process the data within a tolerable elapsed time. Big data has typical characteristics of five ‘V’s, volume, variety, velocity, veracity and value. Big data sets come from many areas, including meteorology, connectomics, complex physics simulations, genomics, biological study, gene analysis and environmental research. According to literature since 1980s, generated data doubles its size in every 40 months all over the world. In the year of 2012, there were 2.5 quintillion (2.5  1018) bytes of data being generated every day.

Hence, how to process big data has become a fundamental and critical challenge for modern society. Cloud computing provides apromising platform for big data processing with powerful computation capability, storage, scalability, resource reuse and low cost, and has attracted significant attention in alignment with big data. One of important source for scientific big data is the data sets collected by wireless sensor networks (WSN). Wireless sensor networks have potential of significantly enhancing people’s ability to monitor and interact with their physical environment. Big data set from sensors is often subject to corruption and losses due to wireless medium of communication and presence of hardware inaccuracies in the nodes. For a WSN application to deduce an appropriate result, it is necessary that the data received is clean, accurate, and lossless. However, effective detection and cleaning of sensor big data errors is a challenging issue demanding innovative solutions. WSN with cloud can be categorized as a kind of complex network systems. In these complex network systems such as WSN and social network, data abnormality and error become an annoying issue for the real network applications.

Therefore, the question of how to find data errors in complex network systems for improving and debugging the network has attracted the interests of researchers. Some work has been done for big data analysis and error detection in complex networks including intelligence sensors networks. There are also some works related to complex network systems data error detection and debugging with online data processing techniques. Since these techniques were not designed and developed to deal with big data on cloud, they were unable to cope with current dramatic increase of data size. For example, when big data sets are encountered, previous offline methods for error detectionand debugging on a single computer may take a long time and lose real time feedback. Because those offline methods are normally based on learning or mining, they often introduce high time cost during the process of data set training and pattern matching. WSN big data error detection commonly requires powerful real-time processing and storing of the massive sensor data as well as analysis in the context of using inherently complex error models to identify and locate events of abnormalities.

In this paper, we aim to develop a novel error detection approach by exploiting the massive storage, scalability and computation power of cloud to detect errors in big data sets from sensor networks. Some work has been done about processing sensor data on cloud. However, fast detection of data errors in big data with cloud remains challenging. Especially, how to use the computation power of cloud to quickly find and locate errors of nodes in WSN needs to be explored. Cloud computing, a disruptive trend at present, poses a significant impact on current IT industry and research communities. Cloud computing infrastructure is becoming popular because it provides an open, flexible, scalable and reconfigurable platform. The proposed error detection approach in this paper will be based on the classification of error types. Specifically, nine types of numerical data abnormalities/errors are listed and introduced in our cloud error detection approach. The defined error model will trigger the error detection process. Compared to previous error detection of sensor network systems, our approach on cloud will be designed and developed by utilizing the massive data processing capability of cloud to enhance error detection speed and real time reaction. In addition, the architecture feature of complex networks will also be analyzed to combine with the cloud computing with a more efficient way. Based on current research literature review, we divide complex network systems into scale-free type and non scale-free type. Sensor network is a kind of scale-free complex network system which matches cloud scalability feature.