Deep Agent: A Framework for Information Spread and Evolution in Social Networks (SocialSim)
Overview:
In the Deep Agent project (DARPA SocialSim program, $6.2M, PI Garibay), we are building a comprehensive, realistic, at-scale computational simulation of information spread and evolution in online social networks using a novel computational modeling paradigm: the Deep Agent Framework (DAF).
The Deep Agent Framework combines massively parallel computing, large-scale data analytics, and machine learning to assist model designers in mixing and matching sub-models in a semi-automated way, exploring, testing, and validating not one but tens of thousands of models against not a single real-world phenomenon but a large set of target behaviors. This process helps model designers continuously improve the best-so-far model as the problem challenges become more difficult. It can also automatically recombine and introduce variants of the models produced by the different SocialSim performers in order to obtain the overall best model. The framework will enable the creation of an accurate, at-scale simulation of information spread and evolution that can run on a typical off-the-shelf commercial computer or small cluster.
The Deep Agent Framework posits the following: (1) The modeling of social dynamics can be accomplished by a network of computational agents endowed with deep neurocognitive capabilities via emotional, cognitive, and social modules, a synthesis of the Agent Zero (Epstein, 2014) and Homo Socialis (Gintis and Helbing, 2015) frameworks. (2) Instead of creating a single hand-designed model of information spread and evolution, we create a family of modular sub-components from which multiple plausible models can be systematically assembled, tested, and validated. These sub-components are derived from both leading social-theory-driven models and data-driven models. (3) Machine learning techniques aid the expert model designers and social scientists on our team in the computer-aided exploration of tens of thousands of competing models of information spread and evolution. The search is guided by model accuracy, as measured by comparing simulated outputs with real-world social dynamics data.
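The loop below is a minimal Python sketch of that accuracy-guided search, assuming a hypothetical AssembledModel built from a tuple of sub-components and a simple stand-in metric (negative mean absolute error against an observed activity series); the actual DAF machinery is far richer than this.

```python
import itertools
import random

class AssembledModel:
    """Stand-in for a model assembled from emotional, cognitive, and
    social sub-components (here, each component is a callable t -> float)."""
    def __init__(self, components):
        self.components = components

    def simulate(self, n_steps):
        # Toy dynamics: components contribute additively to daily activity.
        return [sum(c(t) for c in self.components) for t in range(n_steps)]

def accuracy(model, observed):
    """Negative mean absolute error between simulated and observed series."""
    simulated = model.simulate(len(observed))
    return -sum(abs(s - o) for s, o in zip(simulated, observed)) / len(observed)

def search_models(component_pools, observed, budget=10_000):
    """Sample combinations of sub-components; keep the best-so-far model."""
    combos = list(itertools.product(*component_pools))
    best, best_score = None, float("-inf")
    for combo in random.sample(combos, min(budget, len(combos))):
        model = AssembledModel(combo)
        score = accuracy(model, observed)
        if score > best_score:
            best, best_score = model, score
    return best, best_score
```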
UCF SocialSim
The UCF SocialSim GitHub repository will be home to the RHPC SMPLE and GDELT GA Search algorithm toolkits.
The RHPC SMPLE toolkit is an extension to the Repast HPC agent-based modeling platform. It contains a pre-built structure for representing social media platforms, users, and content, and for representing the platforms in their full functionality, so that the affordances each platform provides are captured in detail. This is echoed in the user strategies: users can employ the functions of each platform and can engage in actions on some platforms that are impossible on others, reflecting the reality that some platforms can be exploited in ways that others cannot. The platforms are themselves agents, and can change their affordances or their strategies for providing content to users. The toolkit therefore allows the portrayal of a rich social media environment and the testing of detailed scenarios and what-if interventions. It is designed for large-scale simulations on HPC systems.
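As a rough illustration of the affordance idea (in Python, with invented names; the actual RHPC SMPLE toolkit is a C++ extension of Repast HPC), a platform agent can expose only the actions it affords, and user strategies can invoke only those:

```python
from dataclasses import dataclass, field

@dataclass
class Platform:
    """A platform agent: its affordance set bounds what users can do on it."""
    name: str
    affordances: set = field(default_factory=set)  # e.g. {"post", "retweet"}
    feed: list = field(default_factory=list)

    def act(self, user, action, content):
        if action not in self.affordances:
            raise ValueError(f"{self.name} does not afford '{action}'")
        self.feed.append((user, action, content))

twitter = Platform("Twitter", {"tweet", "retweet", "reply"})
github = Platform("GitHub", {"push", "fork", "issue"})

twitter.act("alice", "retweet", "post-42")    # allowed on this platform
# github.act("alice", "retweet", "post-42")  # would raise: not afforded
```

Because platforms are agents, a scenario can change an affordance set mid-run to test a what-if intervention.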
The GDELT GA Search algorithm is the subject of a second toolkit being released by our team. Given a time series of daily counts of social media activity (that is, event counts during the training period) and a data set from the GDELT archives covering the same period, the GDELT GA Search algorithm attempts to find criteria that act as a query to retrieve a subset of relevant GDELT events: generally a subset that can be transformed into a second time series that correlates with, and can therefore help predict, the event counts in the first time series. The criteria are then applied to the GDELT data from the unknown period to extract a time series of GDELT events, which is used, via some method of prediction (generally linear extrapolation), to arrive at expected event counts for that period. The result is a list of expected raw counts of events per day, generally computed per platform and per information ID (but not by activity type).
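A hedged sketch of the genetic-search idea follows; the GDELT-style field names and GA operators are illustrative stand-ins, not the toolkit's actual schema or implementation. A genome is a small set of (field, value) criteria, and its fitness is the Pearson correlation between the daily counts of matching events and the target series:

```python
import random
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def daily_counts(events, criteria, n_days):
    """Count events per day that satisfy every (field, value) criterion."""
    counts = Counter(e["day"] for e in events
                     if all(e.get(f) == v for f, v in criteria))
    return [counts.get(d, 0) for d in range(n_days)]

def evolve(events, target, field_values, generations=50, pop_size=30):
    """Evolve criteria sets whose matching-event counts track the target."""
    n_days = len(target)
    def fitness(genome):
        return pearson(daily_counts(events, genome, n_days), target)
    pop = [[random.choice(field_values) for _ in range(2)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        # Crossover: take one criterion from each of two survivors.
        children = [[random.choice(a), random.choice(b)]
                    for a, b in zip(survivors, reversed(survivors))]
        for child in children:  # occasional mutation of one criterion
            if random.random() < 0.2:
                child[random.randrange(len(child))] = random.choice(field_values)
        pop = survivors + children
    return max(pop, key=fitness)
```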
The MACM derives from traditional diffusion-of-information models such as the independent cascade and threshold models, but it is unique as the first of its kind to simulate the diffusion of information in the form of conversations. Following the principles of conversation theory, the MACM goes beyond merely simulating the binary adoption of a topic or opinion. It is based on three premises:
Premise 1: Diffusion of information over online social media occurs through conversations. Individuals participate in conversations due to the following factors: 1) the influence of other participants, 2) influence from information sources exogenous to the conversation, or 3) an internal need to participate in conversation.
Premise 2: Conversation participants can perform three types of actions: 1) initiating a new conversation, 2) contributing to an existing conversation, or 3) sharing existing information from a conversation.
Premise 3: Given a particular topic of interest, the influences can be determined from event time series data by measuring the ratio of the information flow from the influencing time series to the influenced time series, to the total information flow of the influenced time series (see the sketch below).
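A minimal sketch of the Premise 3 measurement, assuming binarized daily activity series and a history length of one; normalizing the transfer entropy from X to Y by the conditional entropy rate of Y is one plausible reading of "total information flow," not necessarily the MACM's exact formulation:

```python
import math
from collections import Counter

def transfer_entropy(x, y):
    """T(X -> Y) with history length 1 over discrete series x, y."""
    triples = list(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_prev, x_prev)
    n = len(triples)
    p_xyz = Counter(triples)
    p_yy = Counter((yn, yp) for yn, yp, _ in triples)
    p_yx = Counter((yp, xp) for _, yp, xp in triples)
    p_y = Counter(yp for _, yp, _ in triples)
    te = 0.0
    for (yn, yp, xp), c in p_xyz.items():
        p_cond_full = c / p_yx[(yp, xp)]          # p(y_next | y_prev, x_prev)
        p_cond_hist = p_yy[(yn, yp)] / p_y[yp]    # p(y_next | y_prev)
        te += (c / n) * math.log2(p_cond_full / p_cond_hist)
    return te

def entropy_rate(y):
    """H(Y_next | Y_prev): total new information in the influenced series."""
    pairs = list(zip(y[1:], y[:-1]))
    n = len(pairs)
    p_pair = Counter(pairs)
    p_prev = Counter(yp for _, yp in pairs)
    return -sum((c / n) * math.log2(c / p_prev[yp])
                for (yn, yp), c in p_pair.items())

def influence(x, y):
    """Ratio of information flow X -> Y to Y's total information flow."""
    h = entropy_rate(y)
    return transfer_entropy(x, y) / h if h else 0.0
```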
The MBM model is built on the Repast HPC platform, which was the original platform in which the RHPC SMPLE toolkit was developed. The MBM simulates social network evolution with multiplex networks: multi-layer network structures whose layers may share nodes. As the MBM is designed on concepts from graph theory, we refer to Online Social Network (OSN) users as nodes and user interactions as links. The model consists of a directed bipartite graph with bipartite pairs of users-repositories for GitHub, users-subreddits for Reddit, and users-users for Twitter, distinguished by multiple layers. Each of the separate user actions on a platform generates a sub-graph, and the combination of the actions generates the whole network structure. The set of user actions in this model are conversation creation, contribution, vote, and follow, which can be formalized as (Ci ∉ {C}), (Ci = Cj, Mi ∉ {M}), (Vi ∉ {V}), and (Li ∉ {LUj}) respectively, where indices represent the users performing the action, and {C}, {M}, {V}, and {LUj} refer to the sets of all conversations, contributed messages, votes, and links to followers of the user in the model up to the current time step.
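The toy structure below (Python, with assumed names rather than the MBM codebase) shows the multiplex idea: each action type populates one layer of directed edges, the layers share user nodes, and the whole network is the union of the layer sub-graphs:

```python
from collections import defaultdict

class MultiplexNetwork:
    """Directed layers keyed by action type: 'create', 'contribute',
    'vote', and 'follow'. Targets are repositories, subreddits, or users
    depending on the platform being modeled."""
    def __init__(self):
        self.layers = defaultdict(set)  # action -> set of (user, target) edges

    def add_action(self, action, user, target):
        self.layers[action].add((user, target))

    def whole_network(self):
        """The combination of all per-action sub-graphs."""
        return set().union(*self.layers.values())

net = MultiplexNetwork()
net.add_action("create", "u1", "repo-9")       # GitHub: user -> repository
net.add_action("contribute", "u2", "repo-9")
net.add_action("follow", "u2", "u1")           # Twitter: user -> user layer
```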
The cognitive factor of the MBM captures information overload, which results in higher attention to recent activities and active users. In other words, the MBM accounts for the recency bias affecting OSN users' decisions about whether to propagate information. This is implemented through age and fitness values, such that a user's influence decays over time. Content targets that have recently been the object of actions, and users who have recently acted, see their fitness decrease the least, whereas the fitness of inactive users decreases the most. The result is higher attention to influential users and targets, while their fitness is still allowed to decay over time so that they are eventually supplanted by newer elements. A node that reaches a certain age is removed from the model's node set. Consequently, the model's predictions are most affected by recent trending activities, with higher attention to more active users.
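A minimal sketch of that age/fitness mechanism, with decay rates and the removal age chosen arbitrarily for illustration rather than taken from the MBM:

```python
# Assumed parameters: active nodes decay slowly, inactive ones quickly,
# and nodes past MAX_AGE are dropped from the node set.
ACTIVE_DECAY, INACTIVE_DECAY, MAX_AGE = 0.99, 0.90, 100

def step(nodes, recently_active):
    """nodes: dict id -> {'fitness': float, 'age': int};
    recently_active: set of ids that acted (or were acted on) this step."""
    for node_id, state in list(nodes.items()):
        rate = ACTIVE_DECAY if node_id in recently_active else INACTIVE_DECAY
        state["fitness"] *= rate
        state["age"] += 1
        if state["age"] >= MAX_AGE:   # aged out: remove from the node set
            del nodes[node_id]
```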