Network-Based Modeling and Intelligent Data Mining of Social Media for Improving Care


Intelligently extracting knowledge from social media has recently attracted great interest from the Biomedical and Health Informatics community to simultaneously improve healthcare outcomes and reduce costs using consumer-generated opinion. We propose a two-step analysis framework that focuses on positive and negative sentiment, as well as the side effects of treatment, in users’ forum posts, and identifies user communities (modules) and influential users for the purpose of ascertaining user opinion of cancer treatment. We used a self-organizing map to analyze word frequency data derived from users’ forum posts. We then introduced a novel network-based approach for modeling users’ forum interactions and employed a network partitioning method based on optimizing a stability quality measure. This allowed us to determine consumer opinion and identify influential users within the retrieved modules using information derived from both word-frequency data and network-based properties. Our approach can expand research into intelligently mining social media data for consumer opinion of various treatments to provide rapid, up-to-date information for the pharmaceutical industry, hospitals, and medical staff, on the effectiveness (or ineffectiveness) of future treatments.
Index Terms: Data mining, complex networks, neural networks, semantic web, social computing.


Social media gives patients extensive opportunities to discuss their experiences with drugs and devices, and gives companies a channel for receiving feedback on their products and services [1]–[3]. Pharmaceutical companies are prioritizing social network monitoring within their IT departments, creating an opportunity for rapid dissemination of, and feedback on, products and services to optimize and enhance delivery, increase turnover and profit, and reduce costs [4]. Social media data harvesting for bio-surveillance has also been reported [5]. Social media enables a virtual networking environment. Modeling social media using available network modeling and computational tools is one way of extracting knowledge and trends from the information ‘cloud’. A social network is a structure made of nodes and of edges that connect the nodes in various relationships. A graph is the most common way to represent this information visually. Network modeling can also be used to simulate network properties and internal dynamics.
A sociomatrix can be used to construct representations of a social network structure. Node degree, network density, and other large-scale parameters can be used to derive information about the importance of certain entities within the network. Groups of densely connected nodes form communities, also called clusters or modules. Specific algorithms can perform network clustering, one of the fundamental tasks in network analysis. Detecting particular user communities requires identifying specific, networked nodes that will allow information extraction. Healthcare providers could use patient opinion to improve their services. Physicians could collect feedback from other doctors and patients to improve their treatment recommendations and results. Patients could use other consumers’ knowledge in making better-informed healthcare decisions.
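As a minimal illustration of the quantities mentioned above, the following sketch computes node degree and network density from a sociomatrix. The five-user matrix and the function names are hypothetical and chosen only for illustration; they are not taken from the forum data analyzed in this study.

```python
import numpy as np

# Hypothetical toy sociomatrix for five forum users (undirected, unweighted).
# A[i, j] = 1 means users i and j interacted; the values are illustrative only.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])


def node_degrees(adj):
    """Number of edges incident to each node (row sums of the sociomatrix)."""
    return adj.sum(axis=1)


def network_density(adj):
    """Fraction of all possible undirected edges that are present."""
    n = adj.shape[0]
    n_edges = adj.sum() / 2  # each undirected edge is counted twice
    return n_edges / (n * (n - 1) / 2)


degrees = node_degrees(A)
density = network_density(A)
```

In this toy network the third user has the highest degree, which is the simplest sense in which a node can be considered important; density summarizes how connected the network is as a whole.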
The nature of social networks makes data collection difficult. Several methods have been employed, such as link mining [6], classification through links [7], predictions based on objects [8], links [9], existence [10], estimation [11], object [12], group [13], and subgroup detection [14], and mining the data [15], [16]. Link prediction, viral marketing, and online discussion groups (and rankings) allow for the development of solutions based on user feedback.
Traditional social sciences use surveys and involve subjects in the data collection process, resulting in small sample sizes per study. With social media, more content is readily available, particularly when combined with web-crawling and scraping software that allows real-time monitoring of changes within the network. Previous studies used technical solutions to extract user sentiment on influenza [17], technology stocks [18], context and sentence structure [19], online shopping [20], multiple classifications [21], government health monitoring [22], specific terms relating to consumer satisfaction [23], polarity of newspaper articles [24], and assessment of user satisfaction from companies [25], [26]. Despite this extensive literature, no study has identified influential users or examined how forum relationships affect network dynamics. In the first stage of our current study, we employ exploratory analysis using self-organizing maps (SOMs) to assess correlations between user posts and positive or negative opinion of the drug. In the second stage, we model the users and their posts using a network-based approach.
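To make the first stage concrete, the sketch below trains a very small SOM on word-frequency vectors. It is a generic textbook-style SOM written under stated assumptions, not the configuration used in this study: the term list, the toy counts, the 3 x 3 grid, and the linearly decaying learning rate and neighborhood width are all hypothetical.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid index (row, col) of the unit whose weight vector is closest to x."""
    dist2 = ((weights - x) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(dist2), dist2.shape)


def train_som(data, grid_shape=(3, 3), n_iter=200, lr0=0.5, sigma0=1.0, seed=0):
    """Online SOM training with a Gaussian neighborhood (illustrative settings)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 1e-3    # neighborhood width decays linearly
        x = data[rng.integers(len(data))]     # pick one post at random
        bmu = best_matching_unit(weights, x)
        grid_dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-grid_dist2 / (2 * sigma ** 2))
        # Move each unit toward x, weighted by its grid distance to the BMU.
        weights += lr * h[..., None] * (x - weights)
    return weights


# Hypothetical counts of the terms ["good", "effective", "rash", "nausea"]
# in six forum posts; each row is normalized to relative frequencies.
counts = np.array([
    [3, 2, 0, 0],
    [4, 1, 0, 1],
    [2, 3, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 4],
    [1, 0, 4, 3],
], dtype=float)
freqs = counts / counts.sum(axis=1, keepdims=True)

som = train_som(freqs)
bmus = [best_matching_unit(som, x) for x in freqs]
```

After training, posts with similar vocabulary tend to map to nearby units on the grid, which is what makes the map useful for exploring how word usage relates to positive or negative opinion.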
We build on our previous study [27] and use an enhanced method for identifying user communities (modules) and the influential users therein. The current approach searches for potential levels of organization (scales) within the networks and uncovers dense modules using a partition stability quality measure [28]. This enables us to find the optimal network partition. We subsequently enrich the retrieved modules with word-frequency information from the posts of the users they contain to derive local and global measures of user opinion and to raise flags on potential side effects of Erlotinib, a drug used in the treatment of one of the most prevalent cancers: lung cancer [29].
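The stability measure of [28] is not defined in this section. As a hedged illustration only, the sketch below assumes a common discrete-time random-walk formulation, often called Markov stability, in which the quality of a partition at time scale t is the trace of the clustered autocovariance of the walk. The function name, the toy graph, and the choice of formulation are assumptions and may differ from the measure actually used.

```python
import numpy as np


def markov_stability(adj, labels, t=1):
    """Stability of a partition at Markov time t (assumed formulation).

    adj    : symmetric adjacency matrix with no isolated nodes
    labels : community label of each node
    """
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    pi = degrees / degrees.sum()           # stationary distribution of the walk
    P = adj / degrees[:, None]             # row-stochastic transition matrix
    Pt = np.linalg.matrix_power(P, t)
    communities = np.unique(labels)
    # Indicator matrix H: H[i, c] = 1 if node i belongs to community c.
    H = (np.asarray(labels)[:, None] == communities[None, :]).astype(float)
    # Autocovariance of the walk: diag(pi) P^t - pi pi^T.
    B = np.diag(pi) @ Pt - np.outer(pi, pi)
    return np.trace(H.T @ B @ H)


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

natural = [0, 0, 0, 1, 1, 1]   # one community per triangle
mixed = [0, 1, 0, 1, 0, 1]     # communities that cut across the triangles

s_natural = markov_stability(A, natural, t=1)
s_mixed = markov_stability(A, mixed, t=1)
```

Under this formulation the value at t = 1 coincides with Newman's modularity, and scanning over t is one way to probe different levels of organization: partitions that stay near-optimal over a range of t are natural candidates for the modules sought here.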