Natural Language Processing Datasets

SocialSim Research Contributions

Social Simulation

Refereed Publications 

  1. Taylor, C. E., Mantzaris, A. V., and Garibay, I. Exploring how homophily and accessibility can facilitate polarization in social networks. Information 9, 12 (Dec. 2018), 325 

Polarization in online social networks has gathered a significant amount of attention in the research community and in the public sphere due to stark disagreements, with millions of participants, on topics surrounding politics, climate, the economy, and other areas where agreement is required. This work investigates in greater depth a type of model that can produce ideological segregation as a result of polarization, depending on the strength of homophily and the ability of users to access similarly minded individuals. Whether increased access can induce larger amounts of societal separation is important to investigate, and this work sheds further insight into the phenomenon. Central to the hypothesis of homophilic alignments in friendship generation is the notion of a discussion group or community. These are modeled, and an investigation into their effect on the dynamics of polarization is presented. The social implications demonstrate that the initial phases of an ideological exchange can result in increased polarization, although a consensus is expected in the long run, and that the separation between groups is amplified when groups are constructed with ideologically homophilic preferences. 
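The interplay between homophily strength and accessibility described in this abstract can be illustrated with a toy opinion model; this is a minimal sketch, not the authors' formulation, and all parameters (reach of the accessible pool, learning rate) are invented for illustration. Each agent samples an interaction partner, preferring the most similar-minded agents within reach, and moves its opinion toward the partner's.

```python
import random

def step(opinions, homophily, reach, lr=0.1, rng=random):
    """One interaction round: with probability `homophily` each agent picks a
    partner from its `reach` most similar agents (the accessible, like-minded
    pool), otherwise uniformly at random, then moves toward that opinion."""
    n = len(opinions)
    new = list(opinions)
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: abs(opinions[j] - opinions[i]))
        if rng.random() < homophily:
            j = rng.choice(others[:reach])   # accessible similar-minded pool
        else:
            j = rng.choice(others)
        new[i] += lr * (opinions[j] - opinions[i])
    return new

random.seed(0)
ops = [random.uniform(-1, 1) for _ in range(30)]
for _ in range(50):
    ops = step(ops, homophily=0.9, reach=5)
```

Varying `homophily` and `reach` is the kind of sweep the abstract alludes to: high homophily with a small accessible pool tends to lock in local opinion clusters, while low homophily drives consensus.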

  1. Newton, O. B., Fiore, S. M., and Song, J. Developing theory and methods to understand and improve collaboration in open source software development on GitHub. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 62, 1 (Sept. 2018), 1118–1122 

As a result of fundamental changes in organizational needs and practices, social coding, a facet of modern collaboration, has become a prevalent phenomenon in software development. While the adoption of social media platforms for social coding, like GitHub, has enabled distributed, asynchronous collaboration among software developers, the structure of such platforms introduces a novel set of socio-technical challenges that cognitive engineering is uniquely qualified to address. Towards this end, we examine GitHub's social and technical features as a means for both improving and hindering coordination and problem solving in software development. Through an integration of existing theories from the organizational sciences with recent research in social coding, we offer a set of preliminary research questions aimed at improving understanding of open source software development. 

  1. Al-Rubaye, A., and Sukthankar, G. A popularity-based model of the diffusion of innovation on GitHub. In Proceedings of the 2018 Conference of the Computational Social Science Society of the Americas (Cham, 2020), T. Carmichael and Z. Yang, Eds., Springer Proceedings in Complexity, Springer International Publishing, pp. 165–178 

Open source software development platforms are natural laboratories for studying the diffusion of innovation across human populations, enabling us to better understand what motivates people to adopt new ideas. For example, GitHub, a software repository and collaborative development tool built on the Git distributed version control system, provides a social environment where ideas, techniques, and new methodologies are adopted by other software developers. This paper proposes and evaluates a popularity-based model of the diffusion of innovation on GitHub. GitHub supports a mechanism, forking, for creating personal copies of other software repositories that can be used to measure the propagation of code between developers. We examine the effects of repository popularity on two aspects of knowledge transfer, innovation adoption and sociality, measured on a dataset of GitHub fork events. 

  1. Hajiakhoond Bidoki, N., and Sukthankar, G. Network Semantic Segmentation with Application to GitHub. In 2018 International Conference on Computational Science and Computational Intelligence (CSCI) (Las Vegas, NV, USA, Dec. 2018), IEEE, pp. 1281–1284 

In this paper we introduce the concept of network semantic segmentation for social network analysis. We consider the GitHub social coding network which has been a center of attention for both researchers and software developers. Network semantic segmentation describes the process of associating each user with a class label such as a topic of interest. We augment node attributes with network significant connections and then employ machine learning approaches to cluster the users. We compare the results with a network segmentation performed using community detection algorithms and one executed by clustering with node attributes. Results are compared in terms of community diversity within the semantic segments along with topic coverage. 

  1. Bidoki, N. H., Sukthankar, G., Keathley, H., and Garibay, I. A cross-repository model for predicting popularity in GitHub. In 2018 International Conference on Computational Science and Computational Intelligence (CSCI) (Las Vegas, Nevada, USA, Dec. 2018), IEEE Computer Society, pp. 1248–1253 

Social coding platforms, such as GitHub, can serve as natural laboratories for studying the diffusion of innovation through tracking the pattern of code adoption by programmers. This paper focuses on the problem of predicting the popularity of software repositories over time; our aim is to forecast the time series of popularity-related events (code forks and watches). In particular, we are interested in cross-repository patterns: how do events on one repository affect other repositories? Our proposed LSTM (Long Short-Term Memory) recurrent neural network integrates events across multiple active repositories, outperforming a standard ARIMA (Auto Regressive Integrated Moving Average) time series prediction based on a single repository. The ability of the LSTM to leverage cross-repository information gives it a significant edge over standard time series forecasting. 
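The paper's exact architecture is not reproduced here, but the recurrence an LSTM forecaster relies on can be sketched as a single cell rolled over a sequence of event counts from several repositories. The gate layout below is the standard LSTM formulation; the dimensions and the Poisson-distributed "fork/watch counts" are arbitrary stand-ins.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gate order i, f, o, g."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:H]))          # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))       # forget gate
    o = 1 / (1 + np.exp(-z[2*H:3*H]))     # output gate
    g = np.tanh(z[3*H:])                  # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
D, H = 3, 4   # D inputs, e.g. event counts from 3 repositories; H hidden units
W, U, b = rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for t in range(10):                       # roll the cell over a short sequence
    x = rng.poisson(2.0, size=D).astype(float)
    h, c = lstm_step(x, h, c, W, U, b)
```

The hidden state `h` carries information across repositories and time steps, which is what gives the cross-repository LSTM its edge over a per-repository ARIMA fit.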

  1. Saadat, S., Gunaratne, C., Baral, N., Sukthankar, G., and Garibay, I. Initializing agent-based models with clustering archetypes. In Social, Cultural, and Behavioral Modeling (Cham, 2018), R. Thomson, C. Dancy, A. Hyder, and H. Bisgin, Eds., Lecture Notes in Computer Science, Springer International Publishing, pp. 233–239 

Agent-based models are a powerful tool for predicting population-level behaviors; however, their performance can be sensitive to the initial simulation conditions. This paper introduces a procedure for leveraging large datasets to initialize agent-based simulations in which the population is abstracted into a set of archetypes. We show that these archetypes can be discovered using clustering and evaluate the benefits of selecting clusters based on their stability over time. Our experiments on the GitHub dataset demonstrate that simulation runs performed with the clustering archetypes are more successful at predicting large-scale activity patterns. 
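Archetype discovery of this kind can be sketched with a plain k-means clustering over user activity features. The feature names (weekly push/fork/watch counts), the synthetic two-profile population, and the cluster count below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def find_archetypes(X, k, iters=20, seed=0):
    """Cluster user feature vectors and return the k centroids
    ('archetypes') plus each user's cluster assignment."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)               # nearest-centroid assignment
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers, labels

rng = np.random.default_rng(1)
# toy "users": weekly push/fork/watch counts drawn around two behavior profiles
X = np.vstack([rng.normal([10, 1, 1], 1, size=(50, 3)),
               rng.normal([1, 5, 8], 1, size=(50, 3))])
archetypes, labels = find_archetypes(X, k=2)
```

An agent-based simulation would then be seeded with one agent population per archetype, in the proportions given by `labels`, instead of one agent per raw data row.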

  1. Senevirathna, C., Gunaratne, C., Jayalath, C., Baral, N., and Garibay, I. Evidence of influence hierarchies in GitHub’s cryptocurrency community. In Proceedings of International Conference on Computational Social Science (IC2S2-2019) (University of Amsterdam, The Netherlands, July 2019) 
  2. Garibay, I., Mantzaris, A. V., Rajabi, A., and Taylor, C. E. Polarization in social media assists influencers to become more influential: analysis and two inoculation strategies. Scientific Reports 9, 1 (Dec. 2019), 18592 

This work explores simulations of polarized discussions from a general and theoretical premise: specifically, whether a plausible avenue exists for a subgroup in an online social network to find a disagreement beneficial, and what that benefit could be. A methodological framework is proposed which represents key factors that drive social media engagement, including the iterative accumulation of influence and the dynamics of the asymmetric treatment of messages during a disagreement. It is shown that prior to a polarization event a trend towards a more uniform distribution of relative influence is achieved, which is then reversed by the polarization event. The reasons for this reversal are discussed, along with how it has a plausible analogue in real-world systems. A pair of inoculation strategies are proposed which aim at returning the trend towards uniform influence across users while refraining from violating user privacy (by remaining topic agnostic) and from user-removal operations. 

  1. Akula, R., Yousefi, N., and Garibay, I. DeepFork: Supervised prediction of information diffusion in GitHub. In Proceedings of the International Conference on Industrial Engineering and Operations Management Bangkok (Bangkok, Thailand, 2019), pp. 3640–3651

Information spreads on complex social networks extremely fast. A piece of information can go viral in no time and can be harmful, and it is often hard to stop such a spread before it causes social unrest. An intentional spread of software vulnerabilities on GitHub has caused millions of dollars in losses. GitHub is a social coding platform that enables a huge number of open source software projects to thrive. To better understand how information spreads on GitHub, we develop a deep neural network model, "DeepFork", a supervised machine learning approach that aims to predict information diffusion, considering node as well as topological features in complex social networks. In our empirical studies, we observed that information diffusion can be detected by link prediction using supervised learning. The model investigates the followee-follower influence that underlies information dynamics on social coding platforms. DeepFork outperforms other machine learning models as it better learns the discriminative patterns from the input features, and it helps us understand human influence on information spread and evolution. 

  1. Newton, O. B., Fiore, S. M., and Song, J. Expertise and complexity as mediators of knowledge loss in open source software development. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 63, 1 (Nov. 2019), 1580–1584 

This paper describes an approach integrating cognitive engineering with computational social science in the context of open source software (OSS) development. Through an analysis of large-scale collaborations in a complex operational setting, we study how expertise and task complexity predict changes in productivity when knowledge loss occurs. Using team data from thousands of software files, we model the effects of expertise, complexity, and knowledge loss on productivity. On its own, knowledge loss had a negative effect on productivity, but this effect was reversed when knowledge loss was combined with high complexity and high numbers of newcomers, suggesting that experts are better able to utilize crowdsourced work. We identify opportunities for research to inform prediction of outcomes in OSS projects based on team and task characteristics and demonstrate the value of integrating cognitive engineering with computational social science to study collaborative work in sociotechnical systems. 

  1. Schiappa, M., Chantry, G., and Garibay, I. Cyber security in a complex community: A social media analysis on common vulnerabilities and exposures. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) (Oct. 2019), pp. 13–20 

Social media platforms such as Twitter, GitHub, and Reddit are widely used forums for discussing all aspects of computer security. Common Vulnerabilities and Exposures (CVEs) are a frequent recurring topic given their potential for malicious exploitation. We want to understand more about the discussions taking place on these platforms and how they may influence a vulnerability's lifespan and "exploitability". We do this by analyzing conversations on these platforms that reference a CVE identification number, in terms of content, lifespan, and spread. We found that Twitter is the most common platform for discussion, followed by GitHub. GitHub discussions indicate the attack vector of a CVE if exploited, while Twitter indicates the impact of exploited CVEs. Our findings conclude that by tracking social media outlets, security vendors can identify CVEs with a high probability of malicious exploitation and proactively generate protection. 
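Tracking CVE references of the kind analyzed here typically starts from the standard identifier format (a four-digit year and a sequence number of four or more digits). A minimal extractor might look like the following; the sample posts are invented.

```python
import re

# CVE IDs follow the pattern CVE-YYYY-NNNN (sequence part may be longer)
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

def extract_cves(text):
    """Return the set of CVE identifiers mentioned in a post,
    normalized to upper case so mentions can be counted per platform."""
    return {m.upper() for m in CVE_RE.findall(text)}

posts = [
    "PoC for cve-2019-0708 (BlueKeep) now on GitHub",
    "Patch Tuesday covers CVE-2019-0708 and CVE-2019-1040",
    "Unrelated chatter about build failures",
]
mentions = [extract_cves(p) for p in posts]
```

Aggregating such mention sets per platform and over time gives the content/lifespan/spread measurements the study is built on.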

  1. Garibay, I., Gunaratne, C., Yousefi, N., and Scheinert, S. The Agent-Based Model Canvas: A Modeling Lingua Franca for Computational Social Science. In Social-Behavioral Modeling for Complex Systems, P. K. Davis, A. O’Mahony, and J. Pfautz, Eds. John Wiley & Sons, Ltd, 2019, pp. 521–544 

Advancement in computational social science requires the expertise of social, computational, and data scientists. The field thrives due to the diversity of backgrounds and skills that these researchers bring from their native disciplines. Researchers also bring diversity in terminology that creates challenges in maintaining the effective communication necessary to design and execute complex research projects. We posit that a tool is needed to support communication and coordination across the members of research teams in computational social science. We propose a version of that tool: the Agent-Based Model Canvas. The Canvas is composed of nine basic building blocks that provide a framework for social theory design, testing, improvement, codification, and discussion in a truly interdisciplinary environment. The purpose of the Canvas is to assist theory discovery and refinement in an iterative fashion. The first version of the Canvas will be a starting point to list untested hypotheses and rough ideas about the elements of the model. As this iterative development process continues, the hypotheses are revised based on data analysis and emerging insights. This process progressively leads to refined descriptions that are machine executable, testable hypotheses, and finally, fully tested theories. 

  1. Bidoki, N. H., Schiappa, M., Sukthankar, G., and Garibay, I. Predicting social network evolution from community data partitions. In International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (Washington, D.C., 2019), p. 10 

Social media users exhibit repetitive behavior patterns that can be leveraged to predict trends in network evolution. Our hypothesis is that these patterns exhibit greater consistency within a single community of users; hence global data distributions can be more accurately modeled by composing community data distributions. This paper compares two different strategies for predicting social media usage with sampled historical data on Reddit and GitHub. We demonstrate that our community-based model outperforms the global one at predicting population, user, and content activity, along with network topology over three different datasets. 

  1. Sukthankar, G., and Beheshti, R. Using Agent-Based Models to Understand Health-Related Social Norms. In Social-Behavioral Modeling for Complex Systems, P. K. Davis, A. O’Mahony, and J. Pfautz, Eds. John Wiley & Sons, Ltd, 2019, pp. 633–654 

Social norms have been demonstrated to strongly affect people’s health choices, yet they are often not included in health models due to the complex interdependencies of reasoning about norm adoption over a large population. This article introduces two agent-based modeling frameworks designed explicitly for reasoning about the influence of social norms, lightweight normative architecture (LNA) and cognitive social learners (CSL), and illustrates their usage for modeling smoking cessation trends. LNA models the impact of personal, social, and environmental factors on recognition, adoption, and compliance with a single smoking norm, whereas CSL is capable of reasoning about multiple social norms. By incorporating a more complex normative reasoning model, CSL can not only predict smoking trends but also accurately forecast population-level responses to surveys on the social acceptability of smoking. These social models are an important complement to existing biological models of human health and wellness. 

  1. Hajiakhoond Bidoki, N., Mantzaris, A. V., and Sukthankar, G. An LSTM Model for Predicting Cross-Platform Bursts of Social Media Activity. Information 10, 12 (Dec. 2019), 394 

Burst analysis and prediction is a fundamental problem in social network analysis, since user activities have been shown to have an intrinsically bursty nature. Bursts may also be a signal of topics of growing real-world interest. Since bursts can be caused by exogenous phenomena and are indicative of burgeoning popularity, leveraging cross-platform social media data may be valuable for predicting bursts within a single social media platform. A Long Short-Term Memory (LSTM) model is proposed in order to capture the temporal dependencies and associations based upon activity information. The data used to test the model was collected from Twitter, GitHub, and Reddit. Our results show that the LSTM-based model is able to leverage the complex cross-platform dynamics to predict bursts. In situations where information gathering from the platform of concern is not possible, the learned model can provide a prediction for whether bursts on another platform can be expected. 

  1. Rajabi, A., Gunaratne, C., Mantzaris, A. V., and Garibay, I. On Countering Disinformation with Caution: Effective Inoculation Strategies and Others that Backfire into Community Hyper-Polarization. In Social, Cultural, and Behavioral Modeling (Cham, 2020), R. Thomson, H. Bisgin, C. Dancy, A. Hyder, and M. Hussain, Eds., Lecture Notes in Computer Science, Springer International Publishing, pp. 130–139 

The increasing adoption of social media platforms as a means of communication has made them one of the main targets for disinformation and misinformation campaigns, due in part to the increased speed and decreased cost of communication these platforms provide. Given that facts and opinions are proposed, discussed, and adopted by users of these platforms, countering this threat requires a better understanding of the dynamics by which false and misleading information spreads and gets adopted. This work develops an agent-based model that simulates an organized disinformation campaign performed by a group of users referred to as conspirators, who are opposed by a parallel organization acting as a barrier to the spread of disinformation, the inoculators. Exploration of the simulation results shows that different macroscopic states with respect to a disinformation infection exist, along with stages for a macroscopic consensus. Control of the simulation through the model parameters allows the complete network to converge and separate over time. This provides insight into a plausible feature of social networks where the macrostate of the system depends upon the parameter values and can be modified. The relationship between these values is explored and provides intuition into the aspects of a community that are necessary to withstand disinformation campaigns. The results also provide an important cautionary note: after a certain degree of conspiracy countermeasures, a network may become hyper-polarized. 
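A heavily simplified version of the conspirator/inoculator mechanism might look like the following. The roles, rates, and averaging rule are assumptions made here for illustration, not the paper's calibrated model.

```python
import random

def run(n=100, n_consp=10, n_inoc=10, steps=200, lr=0.2, seed=0):
    """Belief in the false narrative on [0, 1]: conspirators broadcast 1,
    inoculators broadcast 0, ordinary users average toward a random contact."""
    rng = random.Random(seed)
    belief = [rng.random() for _ in range(n)]
    roles = (["consp"] * n_consp + ["inoc"] * n_inoc
             + ["user"] * (n - n_consp - n_inoc))
    for _ in range(steps):
        i = rng.randrange(n)
        if roles[i] == "consp":
            belief[i] = 1.0          # committed to the disinformation
        elif roles[i] == "inoc":
            belief[i] = 0.0          # committed counter-messaging
        else:
            j = rng.randrange(n)     # ordinary user drifts toward a contact
            belief[i] += lr * (belief[j] - belief[i])
    return belief, roles

belief, roles = run()
```

With two committed camps pulling in opposite directions, the belief distribution tends toward two modes rather than a single consensus, which is the hyper-polarization risk the abstract cautions about.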

  1. Rajabi, A., Gunaratne, C., Mantzaris, A. V., and Garibay, I. Modeling Disinformation and the Effort to Counter It: A Cautionary Tale of When the Treatment Can Be Worse Than the Disease. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (Richland, SC, May 2020), AAMAS ’20, International Foundation for Autonomous Agents and Multiagent Systems, pp. 1975–1977 

The problem of disinformation in online social networks has recently received a considerable amount of attention from the research community. It has been shown that online social networks are being extensively exploited to alter public opinion and individuals’ stance on a wide range of topics. This study proposes an agent-based model that simulates a disinformation campaign by a group of organized users called conspirators targeting a susceptible population, who are then opposed by a parallel organized group of users, referred to as inoculators, that try to act as a barrier to the spread of disinformation. The results of this study indicate that the process of inoculating a susceptible population against disinformation mostly comes at the price of further polarizing the population. 

  1. Higham, D. J., and Mantzaris, A. V. A network model for polarization of political opinion. Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 4 (Apr. 2020), 043109 

We propose and study a simple model for the evolution of political opinion through a population. The model includes a nonlinear term that causes individuals with more extreme views to be less receptive to external influence. Such a term was suggested in 1981 by Cobb in the context of a scalar-valued diffusion equation, and recent empirical studies support this modeling assumption. Here, we use the same philosophy in a network-based model. This allows us to incorporate the pattern of pairwise social interactions present in the population. We show that the model can admit two distinct stable steady states. This bi-stability property is seen to support polarization and can also make the long-term behavior of the system extremely sensitive to the initial conditions and to the precise connectivity structure. Computational results are given to illustrate these effects. 
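One plausible discretization of such a model, assuming the Cobb-style nonlinear term enters as a receptiveness factor (1 - x_i^2) that multiplies neighbor influence (so agents at the extremes stop responding), is sketched below. The two-clique network and all constants are invented; the point is only to exhibit the bi-stable, polarized outcome the paper analyzes.

```python
import numpy as np

def simulate(A, x0, dt=0.01, steps=2000):
    """Euler integration of x_i' = (1 - x_i^2) * sum_j A_ij x_j:
    opinion in [-1, 1]; extreme agents (x -> +-1) become unreceptive."""
    x = np.clip(np.array(x0, dtype=float), -1, 1)
    for _ in range(steps):
        x = x + dt * (1 - x**2) * (A @ x)
        x = np.clip(x, -1, 1)
    return x

# two tightly connected cliques joined by a single weak bridge
A = np.zeros((6, 6))
A[:3, :3] = 1
A[3:, 3:] = 1
np.fill_diagonal(A, 0)
A[2, 3] = A[3, 2] = 0.1
x_final = simulate(A, [0.2, 0.3, 0.1, -0.2, -0.3, -0.1])
```

Starting from mildly positive and mildly negative cliques, each clique is driven to its own extreme steady state, illustrating both bi-stability and the sensitivity to initial conditions and connectivity described above.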

  1. Bidoki, N. H., Schiappa, M., Sukthankar, G., and Garibay, I. Modeling social coding dynamics with sampled historical data. Online Social Networks and Media 16 (Mar. 2020), 100070 

The aim of our research is to forecast the propagation of information related to cybersecurity threats and software vulnerabilities on social coding platforms such as GitHub. Users on social coding platforms exhibit repetitive behavior patterns that can be leveraged to predict trends in network evolution. These patterns exhibit greater consistency within a single community of users; hence global data distributions can be more accurately modeled by composing community data distributions. A wise sampling approach based on the identification of similarities between the historical and predicted patterns in social behavior can be used to augment the performance of other approaches in order to create an at-scale simulation. This article compares two different strategies for predicting network evolution with sampled historical data on GitHub. We demonstrate that our community-based model outperforms the global one at predicting population, user, and content activity, along with network topology over three different datasets. 

  1. Bidoki, N. H., Mantzaris, A. V., and Sukthankar, G. Exploiting weak ties in incomplete network datasets using simplified graph convolutional neural networks. Machine Learning and Knowledge Extraction 2, 2 (2020), 125–146 

This paper explores the value of weak ties in classifying academic literature with the use of graph convolutional neural networks. Our experiments look at the results of treating weak ties as if they were strong ties to determine if that assumption improves performance. This is done by applying the methodological framework of the Simplified Graph Convolutional Neural Network (SGC) to two academic publication datasets: Cora and Citeseer. The performance of SGC is compared to the original Graph Convolutional Network (GCN) framework. We also examine how node removal affects prediction accuracy by selecting nodes according to different centrality measures. These experiments provide insight into which nodes are most important for the performance of SGC. When removal is based on a more localized selection of nodes, augmenting the network with both strong ties and weak ties provides a benefit, indicating that SGC successfully leverages local information of network nodes. 
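The SGC preprocessing this experiment builds on propagates node features through a symmetrically normalized adjacency matrix for k hops, with no nonlinearity between hops, before a linear classifier. A minimal sketch follows, with a tiny invented graph in which one weak tie is simply given full strong-tie weight, in the spirit of the experiment described.

```python
import numpy as np

def sgc_features(A, X, k=2):
    """SGC preprocessing: add self-loops, symmetrically normalize the
    adjacency matrix, then propagate features k hops (no nonlinearity)."""
    A_hat = A + np.eye(len(A))
    d = A_hat.sum(1)
    D_inv_sqrt = np.diag(1 / np.sqrt(d))
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt
    out = X
    for _ in range(k):
        out = S @ out
    return out

# strong citation ties (0-1, 0-2), plus a weak tie 0-3 treated as strong
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
X = np.eye(4)                 # one-hot node features, for illustration only
Z = sgc_features(A, X, k=2)
```

A logistic-regression classifier fit on `Z` completes the SGC pipeline; dropping or down-weighting the 0-3 edge is the kind of manipulation the weak-tie experiments perform.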

  1. Gunaratne, C., Baral, N., Rand, W., Garibay, I., Jayalath, C., and Senevirathna, C. The effects of information overload on online conversation dynamics. Computational and Mathematical Organization Theory 26, 2 (June 2020), 255–276 

The inhibiting effects of information overload on the behavior of online social media users can affect the population-level characteristics of information dissemination through online conversations. We introduce a mechanistic, agent-based model of information overload and investigate the effects of information overload threshold and rate of information loss on observed online phenomena. We find that conversation volume and participation are lowest under high information overload thresholds and mid-range rates of information loss. Calibrating the model to user responsiveness data on Twitter, we replicate and explain several observed phenomena: (1) responsiveness is sensitive to information overload threshold at high rates of information loss; (2) information overload threshold and rate of information loss are Pareto-optimal, and users may experience overload at inflows exceeding 30 notifications per hour; (3) local abundance of small cascades of modest global popularity and local scarcity of larger cascades of high global popularity explains why overloaded users receive, but do not respond to, large, highly popular cascades; (4) users typically work with 7 notifications per hour; (5) over-exposure to information can suppress the likelihood of response by overloading users, contrary to analogies to biologically inspired viral spread. Reconceptualizing information spread with the mechanisms of information overload creates a richer representation of online conversation dynamics, enabling a deeper understanding of how (dis)information is transmitted over social media. 
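The overload mechanism can be caricatured as a notification queue with a forgetting rate and a response threshold. The constants below echo the 30-notifications-per-hour figure from the abstract but are otherwise arbitrary; this is a toy mechanism, not the calibrated model.

```python
def simulate_user(inflow, threshold, loss_rate, hours=1000):
    """Backlog of unread notifications: `inflow` arrive each hour, a fixed
    fraction is forgotten (information loss), and the user responds only
    while the backlog sits under the overload threshold."""
    queue, responses = 0.0, 0
    for _ in range(hours):
        queue = (queue + inflow) * (1 - loss_rate)   # arrivals, then forgetting
        if queue < threshold:
            responses += 1                           # not overloaded: replies
            queue = max(0.0, queue - 1)
    return responses / hours

light = simulate_user(inflow=5, threshold=30, loss_rate=0.2)
heavy = simulate_user(inflow=60, threshold=30, loss_rate=0.2)
```

Even this caricature reproduces the headline qualitative effect: a user under light inflow stays responsive, while one whose inflow exceeds what forgetting can clear settles above threshold and stops responding.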

  1. Saadat, S., Newton, O. B., Sukthankar, G., and Fiore, S. M. Analyzing the productivity of GitHub teams based on formation phase activity. In 2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) (Dec. 2020), pp. 169–176 

Our goal is to understand the characteristics of high-performing teams on GitHub. Towards this end, we collect data from software repositories and evaluate teams by examining differences in productivity. Our study focuses on the team formation phase, the first six months after repository creation. To better understand team activity, we clustered repositories based on the proportion of their work activities and discovered three work styles in teams: toilers, communicators, and collaborators. Based on our results, we contend that early activities in software development repositories on GitHub establish coordination processes that enable effective collaborations over time. 

  1. Mutlu, E. Ç., Oghaz, T., Tütüncüler, E., and Garibay, I. Do bots have moral judgement? The difference between bots and humans in moral rhetoric. In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (Dec. 2020), pp. 222–226 

Understanding moral foundations can yield powerful results in terms of perceiving the intended meaning of the text data, as the concept of morality provides additional information on the unobservable characteristics of information processing and non-conscious cognitive processes. Considering that moral values vary significantly across cultures and yet many recurrent themes are observed and that each culture builds its societal and ideological narratives on top of its moral virtues, an enhanced understanding of morality can prove to be a valuable tool in deterring disinformation narratives by adversaries. Therefore, we investigate the evolution of latent moral loadings over time and across different sub-narratives on human and bot-generated tweets. For this purpose, we analyze the Syrian White Helmets-related tweets from April 1st, 2018 to April 30th, 2019. For the operationalization and quantification of moral rhetoric in tweets, we use the Moral Foundations Dictionary in which five psychological dimensions (Harm/Care, Subversion/Authority, Cheating/Fairness, Betrayal/Loyalty and Degradation/Purity) are considered. Our results present the significant differences between the strength and patterns of moral rhetoric for human and bot-generated content on Twitter. 
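Dictionary-based moral-loading scores of this kind reduce to counting lexicon hits per foundation and normalizing by document length. The tiny word lists below are invented stand-ins; the actual Moral Foundations Dictionary is far larger and uses wildcard stems.

```python
# Toy slice of a moral-foundations lexicon (illustrative words only)
MFD = {
    "care":     {"protect", "rescue", "suffer", "harm"},
    "fairness": {"fair", "cheat", "justice", "equal"},
    "loyalty":  {"betray", "ally", "loyal", "traitor"},
}

def moral_loadings(text):
    """Fraction of tokens in `text` that hit each foundation's word list."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    return {dim: sum(t in words for t in tokens) / n
            for dim, words in MFD.items()}

scores = moral_loadings("volunteers rescue civilians while militias betray them")
```

Comparing such per-tweet loading vectors over time, separately for bot-flagged and human accounts, is the kind of analysis the abstract describes.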

  1. Mutlu, E. Ç., Rajabi, A., and Garibay, I. CD-SEIZ: Cognition-driven SEIZ compartmental model for the prediction of information cascades on Twitter. In Proceedings of the 2020 Conference of The Computational Social Science Society of the Americas (Aug. 2020), Z. Yang and E. von Briesen, Eds., Springer International Publishing. In press 

Information spreading on social media platforms has become ubiquitous in our lives due to viral information propagation regardless of its veracity. Some information cascades turn out to be viral because they circulate rapidly on the Internet. The uncontrollable virality of manipulated or distorted true information (fake news) can be quite harmful, while the spread of true news is advantageous, especially in emergencies. We tackle the problem of predicting information cascades by presenting a novel variant of the SEIZ (Susceptible/Exposed/Infected/Skeptics) model that outperforms the original version by taking into account the cognitive processing depth of users. We define an information cascade as the set of social media users’ reactions to the original content that require at least minimal physical and cognitive effort; therefore, we consider retweet/reply/quote (mention) activities and test our framework on the Syrian White Helmets Twitter data set from April 1st, 2018 to April 30th, 2019. In the prediction of cascade patterns via traditional compartmental models, all activities are grouped and their summation is taken into account; however, transition rates between compartments should vary according to the activity type, since their requirements of physical and cognitive effort are not the same. Based on this assumption, we design a cognition-driven SEIZ (CD-SEIZ) model for the prediction of information cascades on Twitter. We tested SIS, SEIZ, and CD-SEIZ models on 1000 Twitter cascades and found that CD-SEIZ has a significantly low fitting error and provides a statistically more accurate estimation. 
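The base SEIZ compartments can be sketched with a simple Euler integration; the equation structure below follows the commonly used SEIZ rumor formulation (susceptibles contact both adopters and skeptics, and the exposed either adopt via contact or spontaneously), while all rate values and initial conditions are arbitrary. Per the abstract, CD-SEIZ would additionally let these rates vary by activity type (retweet/reply/quote), which is not modeled here.

```python
def seiz_step(s, e, i, z, beta, b, p, l, rho, eps, dt, N):
    """One Euler step of a SEIZ rumor model: S susceptible, E exposed,
    I infected (adopters), Z skeptics. Total population is conserved."""
    si, sz, ei = s * i / N, s * z / N, e * i / N
    ds = -beta * si - b * sz
    de = (1 - p) * beta * si + (1 - l) * b * sz - rho * ei - eps * e
    di = p * beta * si + rho * ei + eps * e
    dz = l * b * sz
    return s + dt * ds, e + dt * de, i + dt * di, z + dt * dz

N = 1000.0
s, e, i, z = 980.0, 0.0, 10.0, 10.0   # mostly susceptible, a few adopters/skeptics
for _ in range(500):
    s, e, i, z = seiz_step(s, e, i, z, beta=0.3, b=0.1, p=0.4,
                           l=0.3, rho=0.2, eps=0.1, dt=0.1, N=N)
```

Fitting the compartment trajectories to the observed reaction counts of a cascade, and comparing the fitting error across SIS, SEIZ, and a cognition-driven variant, mirrors the evaluation described in the abstract.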

  1. Song, J., and Fiore, S. M. For whom the tale’s told: Towards a multidimensional model of targeted narrative persuasion in information operations. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 64, 1 (Dec. 2020), 1505–1509. Publisher: SAGE Publications Inc 

The modern information environment provides an opportunity for cognitive engineering to inform the study of information operations, which involve strategic, often politically-motivated actions to manipulate a targeted audience. In this paper we integrate interdisciplinary theoretical concepts to provide a foundation for a model of persuasion in information operations. We identify sensemaking and framing as key processes and connect these to narrative and identity theories to illustrate how they can inform the study of the individual and the collective in modern sociotechnical systems. From this, we propose a model of narrative persuasion to guide research on social media information operations. Through this, we offer a set of research guidelines to demonstrate how this can serve as a foundation for empirical work blending quantitative and qualitative methods. In this way, we show how cognitive and computational sciences can be blended in support of fundamental and applied research in information operations. 

  1. Pho, P., and Mantzaris, A. V. Regularized Simple Graph Convolution (SGC) for improved interpretability of large datasets. Journal of Big Data 7, 1 (Oct. 2020), 91 

Classification of data points which correspond to complex entities such as people or journal articles is an ongoing research task. Notable applications are recommendation systems for customer behaviors based upon their features or past purchases and, in academia, labeling relevant research papers in order to reduce the reading time required. The features that can be extracted are many and result in large datasets which are a challenge to process with complex machine learning methodologies. There is also the question of how results are presented and how to interpret the parameterizations beyond the classification accuracies. This work shows how the network information contained in an adjacency matrix allows improved classification of entities through their associations, and how the framework of the SGC provides an expressive and fast approach. The proposed regularized SGC incorporates shrinkage upon three different aspects of the projection vectors, reducing the number of parameters, the size of the parameters, and the directions between the vectors, to produce more meaningful interpretations. 
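The base SGC operation the paper builds on smooths node features with K powers of the normalized adjacency matrix, S = D^(-1/2)(A + I)D^(-1/2), before a single linear classifier. The sketch below shows only this standard propagation on a toy graph; the regularized (shrinkage) variants are the paper's contribution and are not reproduced.

```python
import numpy as np

def sgc_features(A, X, K=2):
    """Return S^K @ X for adjacency A (n x n) and features X (n x d)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetric normalization
    X_out = X.copy()
    for _ in range(K):
        X_out = S @ X_out                     # K-hop feature smoothing
    return X_out

# Toy graph: two triangles joined by one edge (nodes 2 and 3).
A = np.array([[0,1,1,0,0,0],
              [1,0,1,0,0,0],
              [1,1,0,1,0,0],
              [0,0,1,0,1,1],
              [0,0,0,1,0,1],
              [0,0,0,1,1,0]], dtype=float)
X = np.eye(6)                                 # one-hot node features
Z = sgc_features(A, X, K=2)
print(Z.shape)
```

Because propagation uses only K matrix powers, a node's smoothed features depend solely on its K-hop neighborhood: here nodes 0 and 4 are three hops apart, so Z[0, 4] is exactly zero.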

  1. Mantzaris, A. V. Incorporating a monetary variable into the Schelling model addresses the issue of a decreasing entropy trace. Scientific Reports 10, 1 (Oct. 2020), 17005 

The Schelling model of segregation has been shown to have a simulation trace which decreases the entropy of its states as the aggregate number of residential agents surrounded by a threshold of equally labeled agents increases. This introduces a paradox which goes against the second law of thermodynamics, which states that entropy must increase. In efforts to bring principles of physics into the modeling of sociological phenomena, this must be addressed. A modification of the model is introduced in which a monetary variable is provided to the residential agents (sampled from reported income data), along with a dynamic which acts upon this variable when an agent changes its location on the grid. The entropy of the simulation over the iterations is estimated in terms of the aggregate residential homogeneity and the aggregate income homogeneity. The dynamic on the monetary variable shows that it can increase the entropy of the states over the simulation. The paths of the traces with both variables show that the shape of the entropy region is followed, supporting the conclusion that the decrease of entropy due to residential clustering is accompanied by a parallel and independent increase of entropy via the monetary variable. 
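A minimal sketch of the underlying Schelling mechanics, with a monetary attribute attached to each agent, is shown below. The relocation rule and the log-normal income draw are illustrative assumptions; the paper's specific income dynamic and its entropy estimation are not reproduced here.

```python
import random

# Schelling-style grid: agents carry a label and an illustrative income
# value; unsatisfied agents relocate to random empty cells.
random.seed(7)
SIZE, THRESHOLD = 20, 0.5
grid = {}
for x in range(SIZE):
    for y in range(SIZE):
        if random.random() < 0.8:             # ~80% occupancy
            grid[(x, y)] = {"label": random.choice("AB"),
                            "income": random.lognormvariate(10, 1)}
n_agents = len(grid)

def satisfied(pos):
    """True if the fraction of like-labeled neighbors meets THRESHOLD."""
    label = grid[pos]["label"]
    nbrs = [grid[(pos[0] + dx, pos[1] + dy)]
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and (pos[0] + dx, pos[1] + dy) in grid]
    if not nbrs:
        return True
    return sum(n["label"] == label for n in nbrs) / len(nbrs) >= THRESHOLD

def step():
    """Move one unsatisfied agent; return the count of unsatisfied agents."""
    unhappy = [p for p in grid if not satisfied(p)]
    if not unhappy:
        return 0
    pos = random.choice(unhappy)
    empty = [(x, y) for x in range(SIZE) for y in range(SIZE)
             if (x, y) not in grid]
    grid[random.choice(empty)] = grid.pop(pos)
    return len(unhappy)

for _ in range(200):
    if step() == 0:
        break
print("remaining unsatisfied:", sum(not satisfied(p) for p in grid))
```

As relocations accumulate, like-labeled clusters grow; tracking a statistic of the income values alongside the residential homogeneity is the kind of two-variable trace the paper's entropy analysis is built on.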

  1. Oghaz, T. A., Mutlu, E. C., Jasser, J., Yousefi, N., and Garibay, I. Probabilistic model of narratives over topical trends in social media: A discrete time model. In Proceedings of the 31st ACM Conference on Hypertext and Social Media (New York, NY, USA, July 2020), HT ’20, Association for Computing Machinery, pp. 281–290 

Online social media platforms are turning into the prime source of news and narratives about worldwide events. However, a systematic summarization-based narrative extraction that can facilitate communicating the main underlying events is lacking. To address this issue, we propose a novel event-based narrative summary extraction framework. Our proposed framework is designed as a probabilistic topic model, with categorical time distribution, followed by extractive text summarization. Our topic model identifies topics’ recurrence over time with a varying time resolution. This framework not only captures the topic distributions from the data, but also approximates the user activity fluctuations over time. Furthermore, we define significance-dispersity trade-off (SDT) as a comparison measure to identify the topic with the highest lifetime attractiveness in a timestamped corpus. We evaluate our model on a large corpus of Twitter data, including more than one million tweets in the domain of the disinformation campaigns conducted against the White Helmets of Syria. Our results indicate that the proposed framework is effective in identifying topical trends, as well as extracting narrative summaries from text corpus with timestamped data. 

  1. Winter, R., Scheinert, S., Stanfill, M., Salter, A., Newton, O. B., Song, J., Fiore, S., Rand, W., and Garibay, I. A taxonomy of user actions on social networking sites. In Proceedings of the 31st ACM Conference on Hypertext and Social Media (HT’ 20) (New York, NY, USA, July 2020), HT ’20, Association for Computing Machinery, pp. 233–234 

The spread of information within and across Social Networking Sites (SNSs) is increasingly impactful on contemporary society. As information (and misinformation) moves across multiple online platforms, it is important to be able to put these platforms in conversation with one another in order to better understand complex phenomena. This article proposes a taxonomy of actions that are consistent across SNSs to provide researchers and other stakeholders with consistent terminology that enables classifying and comparing user activities over a variety of social media platforms. The proposed taxonomy of actions indicates that although SNSs differentiate themselves in the market and at the level of user experience through unique capabilities and forms of interaction, they can be productively understood as varying means to perform the same set of underlying actions: create, vote, follow, and post. 
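In practice, such a taxonomy can be applied as a lookup from platform-specific events to the four underlying actions (create, vote, follow, post). The event names and the specific assignments below are the editor's illustrative guesses, not the paper's authoritative mapping.

```python
# Illustrative mapping of platform events onto the taxonomy's four
# underlying actions; the specific assignments are assumptions for
# demonstration, not taken from the paper.
TAXONOMY = {
    ("GitHub", "CreateEvent"): "create",
    ("GitHub", "WatchEvent"): "follow",
    ("Twitter", "tweet"): "create",
    ("Twitter", "like"): "vote",
    ("Twitter", "reply"): "post",
    ("Reddit", "submission"): "create",
    ("Reddit", "upvote"): "vote",
    ("Reddit", "comment"): "post",
}

def classify(platform, activity):
    """Map a (platform, activity) pair to a taxonomy action."""
    return TAXONOMY.get((platform, activity), "unclassified")

print(classify("Twitter", "like"))
```

A table of this shape is what lets analyses "put these platforms in conversation with one another": once events are reduced to the shared action set, cross-platform activity streams become directly comparable.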

  1. Al-Rubaye, A., and Sukthankar, G. Scoring popularity in GitHub. In 2020 International Conference on Computational Science and Computational Intelligence (CSCI) (Dec. 2020), pp. 217–223 

Popularity and engagement are the currencies of social media platforms, serving as powerful reinforcement mechanisms to keep users online. Social coding platforms such as GitHub serve a dual purpose: they are practical tools that facilitate asynchronous, distributed collaborations between software developers while also supporting passive social media style interactions. There are several mechanisms for “liking” content on GitHub: 1) forking repositories to copy their content, 2) watching repositories to be notified of updates, and 3) starring to express approval. This paper presents a study of popularity in GitHub and examines the relationship between these three quantitative measures of popularity. We introduce a weight-based popularity score (WTPS) that is extracted from the history of GitHub repositories’ popularity indicators. 
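The general shape of a weight-based score over the three "liking" mechanisms can be sketched as below. The weights here are illustrative placeholders; the paper's WTPS derives its weighting from the history of each repository's popularity indicators rather than fixing constants.

```python
# Sketch of a weighted popularity score over forks, watches, and stars.
# Weight values are illustrative assumptions, not the fitted WTPS weights.
def popularity_score(forks, watches, stars,
                     w_fork=0.5, w_watch=0.3, w_star=0.2):
    return w_fork * forks + w_watch * watches + w_star * stars

repos = {
    "repo-a": (120, 300, 950),   # (forks, watches, stars)
    "repo-b": (40, 80, 200),
}
ranked = sorted(repos, key=lambda r: popularity_score(*repos[r]),
                reverse=True)
print(ranked)
```

Ranking repositories by such a score is the basic operation behind comparing the three popularity signals against one another.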

  1. Rajabi, A., Talebzadehhosseini, S., and Garibay, I. Resistance of Communities Against Disinformation. In Proceedings of the 2019 International Conference of The Computational Social Science Society of the Americas (Cham, 2021), Z. Yang and E. von Briesen, Eds., Springer Proceedings in Complexity, Springer International Publishing, pp. 29–37 

The spread of disinformation is considered a big threat to societies and has recently received unprecedented attention. In this paper, we propose an agent-based model to simulate the dissemination of a conspiracy in a population. The model is able to compare the resistance of different network structures against the activity of conspirators. Results show that connectedness of network structure and centrality of conspirators are of crucial importance in preventing conspiracies from becoming widespread. 

  1. Garibay, I., Oghaz, T. A., Yousefi, N., Mutlu, E. C., Schiappa, M., Scheinert, S., Anagnostopoulos, G. C., Bouwens, C., Fiore, S. M., Mantzaris, A., Murphy, J. T., Rand, W., Salter, A., Stanfill, M., Sukthankar, G., Baral, N., Fair, G., Gunaratne, C., Hajiakhoond, N. B., Jasser, J., Jayalath, C., Newton, O. B., Saadat, S., Senevirathna, C., Winter, R., and Zhang, X. Deep Agent: Studying the dynamics of information spread and evolution in social networks. In Proceedings of the 2019 International Conference of The Computational Social Science Society of the Americas (Cham, 2021), Z. Yang and E. von Briesen, Eds., Springer Proceedings in Complexity, Springer International Publishing, pp. 153–169 

This paper explains the design of a social network analysis framework, developed under DARPA’s SocialSim program, with a novel architecture that models human emotional, cognitive, and social factors. Our framework is both theory- and data-driven, and utilizes domain expertise. Our simulation effort helps in understanding how information flows and evolves in social media platforms. We focused on modeling three information domains: cryptocurrencies, cyber threats, and software vulnerabilities for the three interrelated social environments GitHub, Reddit, and Twitter. We participated in the SocialSim DARPA Challenge in December 2018, in which our models were subjected to an extensive performance evaluation for accuracy, generalizability, explainability, and experimental power. This paper reports the main concepts and models utilized in our social media modeling effort in developing a multi-resolution simulation at the user, community, population, and content levels. 

  1. Mutlu, E. C., and Garibay, I. The degree-dependent threshold model: Towards a better understanding of opinion dynamics on online social networks. In Proceedings of the 2019 International Conference of The Computational Social Science Society of the Americas (Cham, 2021), Z. Yang and E. von Briesen, Eds., Springer Proceedings in Complexity, Springer International Publishing, pp. 83–94 

With the rapid growth of online social media, people become increasingly overwhelmed by the volume and the content of the information present in the environment. The fact that people express their opinions and feelings through social media channels, influence other people, and get influenced by them has led researchers from various disciplines to focus on understanding the mechanism of information and emotion contagion. The threshold model is currently one of the most common methods to capture the effect of people on others’ opinions and emotions. Although many studies employ and try to improve upon the threshold model, the search for an appropriate threshold function for defining human behavior is an essential yet unattained quest. The definition of heterogeneity in the thresholds of individuals is oftentimes poorly specified, which leads to the rather simplistic use of uniform and binary functions, although they are far from representing reality. In this study, we use a Twitter data set of 30,704,025 tweets to mimic the adoption of a new opinion. Our results show that the threshold is not only correlated with the out-degree of nodes, which contradicts other studies, but also correlated with nodes’ in-degree. Therefore, we simulated two cases in which thresholds are out-degree and in-degree dependent, separately. We concluded that the system is more likely to reach a consensus when thresholds are in-degree dependent; however, the time elapsed until all nodes fix their opinions is significantly higher in this case. Additionally, we did not observe a notable effect of mean degree on either the average opinion or the fixation time of opinions in either case, and increasing seed size has a negative effect on reaching a consensus. Although threshold heterogeneity has only a slight influence on the average opinion, the positive effect of heterogeneity on reaching a consensus is more pronounced when thresholds are in-degree dependent. 
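The in-degree-dependent case described above can be sketched as a threshold cascade on a random directed graph: a node adopts once the adopted fraction of its in-neighbors exceeds a threshold that grows with its in-degree. The graph construction and the threshold scaling function below are illustrative choices, not the ones fitted in the paper.

```python
import random

# Degree-dependent threshold cascade on a random directed graph.
random.seed(3)
N = 200
in_nbrs = {v: random.sample([u for u in range(N) if u != v],
                            random.randint(2, 10)) for v in range(N)}

def threshold(v):
    """Illustrative in-degree-dependent threshold, capped below 1."""
    return min(0.9, 0.1 + 0.04 * len(in_nbrs[v]))

adopted = set(random.sample(range(N), 10))        # seed adopters
changed = True
while changed:                                    # iterate to a fixed point
    changed = False
    for v in range(N):
        if v in adopted:
            continue
        frac = sum(u in adopted for u in in_nbrs[v]) / len(in_nbrs[v])
        if frac >= threshold(v):
            adopted.add(v)
            changed = True
print("final adopters:", len(adopted))
```

Since adoption is monotone (nodes never un-adopt), the loop is guaranteed to reach a fixed point; comparing the final adopter count across threshold functions is the kind of consensus experiment the abstract describes.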

  1. Baral, N., Gunaratne, C., Jayalath, C., Rand, W., Senevirathna, C., and Garibay, I. Negative influence gradients lead to lowered information processing capacity on social networks. In Proceedings of the 2019 International Conference of The Computational Social Science Society of the Americas (Cham, 2021), Z. Yang and E. von Briesen, Eds., Springer Proceedings in Complexity, Springer International Publishing, pp. 265–275 

Communication networks are known to exhibit asymmetric influence structures, constructed of a spectrum from highly influential individuals to highly influenced individuals. Information Processing Capacity (IPC) determines the level of responsiveness expressed by individuals when communicating with others in such networks. In this study, we explore the asymmetric influence structure of GitHub’s cryptocurrency developer community and show how it affects the IPC of the users in such networks. We use an agent-based model of information diffusion and conversation based on dynamic individual-level probabilities extracted from data on activity from cryptocurrency-related GitHub repositories. In this model, users that receive notifications from their neighbors at a rate above their IPC enter an overloaded state. We show that users who are influenced substantially more than they influence other users are typically expected to be overloaded and constantly experience lower IPC. In other words, these users are influenced more than they are able to express this magnitude of influence toward their neighbors. These results have potential implications in the design of viral marketing and reducing the harm of misinformation campaigns. 

  1. Mutlu, E. C., and Ozmen Garibay, O. Quantum contagion: A quantum-like approach for the analysis of social contagion dynamics with heterogeneous adoption thresholds. Entropy 23, 5 (May 2021), 538. Number: 5 Publisher: Multidisciplinary Digital Publishing Institute 

Modeling social contagion processes has recently attracted a substantial amount of interest from researchers due to its wide applicability in network science, multi-agent systems, information science, and marketing. Unlike in biological spreading, the existence of a reinforcement effect in social contagion necessitates considering the complexity of individuals in the systems. Although many studies acknowledge the heterogeneity of the individuals in their adoption of information, none take into account the individuals’ uncertainty during their adoption decision-making. This has resulted in less than optimal modeling of social contagion dynamics in the presence of a phase transition in the final adoption size versus transmission probability. We employed the Inverse Born Problem (IBP) to represent probabilistic entities as complex probability amplitudes in edge-based compartmental theory, and demonstrated that our novel approach performs better in the prediction of social contagion dynamics through extensive simulations on random regular networks. 

  1. Senevirathna, C., Gunaratne, C., Rand, W., Jayalath, C., and Garibay, I. Influence cascades: Entropy-based characterization of behavioral influence patterns in social media. Entropy 23, 2 (Feb. 2021), 160. Number: 2 Publisher: Multidisciplinary Digital Publishing Institute 

Influence cascades are typically analyzed using a single-metric approach, i.e., all influence is measured using one number. However, social influence is not monolithic; different users exercise different influences in different ways, and influence is correlated with user- and content-specific attributes. One such attribute could be whether the action is an initiation of a new post, a contribution to a post, or a sharing of an existing post. In this paper, we present a novel method for tracking these influence relationships over time, which we call influence cascades, and present a visualization technique to better understand these cascades. We investigate these influence patterns within and across online social media platforms using empirical data and comparing to a scale-free network as a null model. Our results show that characteristics of influence cascades and patterns of influence are, in fact, affected by the platform and the community of the users. 
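An entropy-based characterization of influence, in the spirit of the paper's title, can be sketched as the Shannon entropy of a user's influence distribution over action types. The action labels and counts below are illustrative; the paper's actual measures are defined over its influence-cascade structures.

```python
import math
from collections import Counter

def shannon_entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

# Hypothetical per-user influence counts over three action types.
influence = Counter({"initiate": 12, "contribute": 30, "share": 8})
print(f"{shannon_entropy(influence):.3f} bits")
```

A user whose influence is spread evenly across action types has maximal entropy, while a user who influences only through one action type has entropy zero, which is exactly the "not monolithic" distinction the abstract emphasizes.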

  1. Mantzaris, A. V., Chiodini, D., and Ricketson, K. Utilizing the simple graph convolutional neural network as a model for simulating influence spread in networks. Computational Social Networks 8, 1 (Mar. 2021), 12 

The ability of people and organizations to connect in the digital age has allowed the growth of networks that cover an increasing proportion of human interactions. The research community investigating networks asks a range of questions, such as which participants are most central and which community label to apply to each member. This paper deals with the question of how to label nodes based on the features (attributes) they contain, and then how to model the changes in the label assignments based on the influence they produce and receive in their networked neighborhood. The methodological approach applies the simple graph convolutional neural network in a novel setting: it can be used not only for label classification, but also for modeling the spread of the influence of nodes in their neighborhoods based on the length of the walks considered. This is done by noticing a common feature in the formulations of methods that describe information diffusion, which rely upon adjacency matrix powers, and that of graph neural networks. Examples are provided to demonstrate the ability of this model to aggregate feature information from nodes based on a parameter regulating the range of node influence, which can simulate a process of exchanges in a manner which bypasses computationally intensive stochastic simulations. 

  1. Aravamudan, A., Zhang, X., Song, J., Fiore, S. M., and Anagnostopoulos, G. C. Influence dynamics among narratives: A case study of the Venezuelan presidential crisis. In Social, Cultural, and Behavioral Modeling (Cham, 2021), R. Thomson, M. N. Hussain, C. Dancy, and A. Pyke, Eds., Lecture Notes in Computer Science, Springer International Publishing, pp. 204–213 

It is widely understood that diffusion of and simultaneous interactions between narratives—defined here as persistent point-of-view messaging—significantly contribute to the shaping of political discourse and public opinion. In this work, we propose a methodology based on Multi-Variate Hawkes Processes and our newly-introduced Process Influence Measures for quantifying and assessing how such narratives influence (Granger-cause) each other. Such an approach may aid social scientists in enhancing their understanding of socio-geopolitical phenomena as they manifest themselves and evolve in the realm of social media. To show its merits, we apply our methodology to Twitter narratives during the 2019 Venezuelan presidential crisis. Our analysis indicates a nuanced, evolving influence structure between 8 distinct narratives, part of which could be explained by landmark historical events.
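A multivariate Hawkes process of the general kind used above has conditional intensities of the form λ_i(t) = μ_i + Σ_(t_k, j) α_ij · exp(−β(t − t_k)), where cross-terms α_ij capture how events in narrative j excite narrative i. The sketch below evaluates such an intensity with illustrative parameters; the paper's Process Influence Measures are built on top of a fitted model of this form.

```python
import math

def intensity(i, t, events, mu, alpha, beta):
    """Conditional intensity of narrative i at time t given past events."""
    excitation = sum(alpha[i][j] * math.exp(-beta * (t - t_k))
                     for (t_k, j) in events if t_k < t)
    return mu[i] + excitation

mu = [0.2, 0.1]                       # baseline rates of two narratives
alpha = [[0.3, 0.5],                  # alpha[i][j]: excitation of i by j
         [0.1, 0.2]]
beta = 1.0                            # exponential decay of excitation
events = [(0.0, 1), (0.5, 0), (1.0, 1)]   # (timestamp, narrative id)

lam = intensity(0, 1.5, events, mu, alpha, beta)
print(f"{lam:.3f}")
```

Large off-diagonal α values relative to the diagonal indicate strong cross-narrative excitation, which is the Granger-causal influence the paper quantifies.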


  1. Rajabi, A., Mantzaris, A. V., Atwal, K. S., and Garibay, I. Exploring the disparity of influence between users in the discussion of Brexit on Twitter. Journal of Computational Social Science (Mar. 2021) 

The topic of political polarization has received increased attention for valid reasons. Given that an increased amount of the social exchange for opinions happens online, social media platforms provide a good source of information to investigate various aspects of the phenomena. In this work, data collected from Twitter are used to examine polarization surrounding the topic of the Brexit referendum on the membership of the European Union. The analysis specifically focuses on the question of how different tiers of users in terms of influence can project their opinions and if the polarized conditions affect the relative balance in the broadcast capabilities of the tiers. The results show that during polarization periods, users of the higher tier have increased capabilities to broadcast their information in relation to the lower tiers thereby further dominating the discussion. This validates previous modeling investigations and the hypothesis that polarization provides an opportunity for influencers to increase their relative social capital. 

  1. Akula, R., and Garibay, I. Interpretable multi-head self-attention architecture for sarcasm detection in social media. Entropy 23, 4 (Apr. 2021), 394. Number: 4 Publisher: Multidisciplinary Digital Publishing Institute 

With the online presence of more than half the world population, social media plays a very important role in the lives of individuals as well as businesses alike. Social media enables businesses to advertise their products, build brand value, and reach out to their customers. To leverage these social media platforms, it is important for businesses to process customer feedback in the form of posts and tweets. Sentiment analysis is the process of identifying the emotion, either positive, negative or neutral, associated with these social media texts. The presence of sarcasm in texts is the main hindrance in the performance of sentiment analysis. Sarcasm is a linguistic expression often used to communicate the opposite of what is said, usually something that is very unpleasant, with an intention to insult or ridicule. Inherent ambiguity in sarcastic expressions makes sarcasm detection very difficult. In this work, we focus on detecting sarcasm in textual conversations from various social networking platforms and online media. To this end, we develop an interpretable deep learning model using multi-head self-attention and gated recurrent units. The multi-head self-attention module aids in identifying crucial sarcastic cue-words from the input, and the recurrent units learn long-range dependencies between these cue-words to better classify the input text. We show the effectiveness of our approach by achieving state-of-the-art results on multiple datasets from social networking platforms and online media. Models trained using our proposed approach are easily interpretable and enable identifying sarcastic cues in the input text which contribute to the final classification score. We visualize the learned attention weights on a few sample input texts to showcase the effectiveness and interpretability of our model. 
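The multi-head self-attention module described above is, at its core, scaled dot-product attention computed per head over slices of the model dimension; the per-head attention weights are what make the cue-words inspectable. The sketch below shows only this generic mechanism with random weights and illustrative shapes; the paper pairs it with gated recurrent units and trains it end to end.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, n_heads):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_model) projections."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads, weights = [], []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)
        attn = softmax(scores, axis=-1)       # (seq_len, seq_len) weights
        heads.append(attn @ V[:, s])
        weights.append(attn)                  # inspectable per-head weights
    return np.concatenate(heads, axis=1), weights

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 8, 2
X = rng.normal(size=(seq_len, d_model))       # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn_weights = multi_head_self_attention(X, Wq, Wk, Wv, n_heads)
print(out.shape)
```

Each row of each head's attention matrix is a probability distribution over input positions, which is what gets visualized to show which tokens a trained model treats as sarcastic cues.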

  1. Jasser, J., Garibay, I., Scheinert, S., and Mantzaris, A. V. Controversial information spreads faster and further than non-controversial information in Reddit. Journal of Computational Social Science (May 2021) 

Online users discuss and converse about all sorts of topics on social networks. Facebook, Twitter, and Reddit are among the many networks where users have this freedom of information sharing. The abundance of information shared over these networks makes them an attractive area for investigating all aspects of human behavior in information dissemination. Among the many interesting behaviors, controversiality within social cascades is of high interest to us. It is known that controversiality is bound to happen within online discussions. The online social network platform Reddit has a feature to tag comments as controversial if users have mixed opinions about a comment. The difference between this study and previous attempts at understanding controversiality on social networks is that we do not investigate topics that are known to be controversial. On the contrary, we examine typical cascades with comments that the readers deemed to be controversial concerning the matter discussed. This work asks whether controversially initiated information cascades have characteristics distinct from those of non-controversial cascades on Reddit. We used data collected from Reddit consisting of around 17 million posts and their corresponding comments related to cybersecurity issues to answer these questions. The comparative analyses conducted show that controversial content travels faster and further from its origin. Understanding this phenomenon would shed light on how users or organizations might use it to help control the spread of a specific message for their benefit. 

  1. Akula, R., and Garibay, I. Explainable detection of sarcasm in social media. In Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (Online, Apr. 2021), Association for Computational Linguistics, pp. 34–39 

Sarcasm is a linguistic expression often used to communicate the opposite of what is said, usually something that is very unpleasant with an intention to insult or ridicule. Inherent ambiguity in sarcastic expressions makes sarcasm detection very difficult. In this work, we focus on detecting sarcasm in textual conversations, written in English, from various social networking platforms and online media. To this end, we develop an interpretable deep learning model using multi-head self-attention and gated recurrent units. We show the effectiveness and interpretability of our approach by achieving state-of-the-art results on datasets from social networking platforms, online discussion forums, and political dialogues. 

  1. Alsoubai, A., Song, J., Razi, A., Dacre, P., and Wisniewski, P. Social media during the COVID-19 pandemic: A public health crisis or a political battle? In Social Computing and Social Media: Applications in Marketing, Learning, and Health (Cham, 2021), G. Meiselwitz, Ed., Lecture Notes in Computer Science, Springer International Publishing, pp. 289–307

Since the start of the coronavirus disease 2019 (COVID-19) pandemic, social media platforms have been filled with discussions about the global health crisis. Meanwhile, the World Health Organization (WHO) has highlighted the importance of seeking credible sources of information on social media regarding COVID-19. In this study, we conducted an in-depth analysis of Twitter posts about COVID-19 during the early days of the COVID-19 pandemic to identify influential sources of COVID-19 information and understand the characteristics of these sources. We identified influential accounts based on an information diffusion network representing the interactions of Twitter users who discussed COVID-19 in the United States over a 24-h period. The network analysis revealed 11 influential accounts that we categorized as: 1) political authorities (elected government officials), 2) news organizations, and 3) personal accounts. Our findings showed that while verified accounts with a large following tended to be the most influential users, smaller personal accounts also emerged as influencers. Our analysis revealed that other users often interacted with influential accounts in response to news about COVID-19 cases, and that strongly contested political arguments received the most interactions overall. These findings suggest that political polarization was a major factor in COVID-19 information diffusion. We discuss the implications of political polarization on social media for COVID-19 communication. 

  1. Gunaratne, C., Rand, W., and Garibay, I. Inferring mechanisms of response prioritization on social media under information overload. Scientific Reports 11, 1 (Jan. 2021) 

Human decision-making is subject to the biological limits of cognition. The fluidity of information propagation over online social media often leads users to experience information overload. This in turn affects which information received by users is processed and responded to, imposing constraints on the volume of, and participation in, information cascades. In this study, we investigate properties contributing to the visibility of online social media notifications by highly active users experiencing information overload via cross-platform social influence. We analyze simulations of a coupled agent-based model of information overload and the multi-action cascade model of conversation with evolutionary model discovery. Evolutionary model discovery automates mechanistic inference on agent-based models by enabling random forest importance analysis on genetically programmed agent-based model rules. The mechanisms of information overload have been shown to contribute to a multitude of global properties of online information cascades. We investigate nine characteristics of online messages that may contribute to the prioritization of messages for response. Our results indicate that recency had the largest contribution to message visibility, with individuals prioritizing more recent notifications. Global popularity of the conversation originator had the second highest contribution, and reduced message visibility. Messages that presented opportunities for novel user interaction, yet high reciprocity, showed a relatively moderate contribution to message visibility. Finally, insights from the evolutionary model discovery results helped inform response prioritization rules, which improved the robustness and accuracy of the model of information overload. 

Accepted Refereed Publications 

Atwal, K., Murphy, J. T., and Garibay, I. Analysis of aging effects on preferential attachment with a Twitter dataset. In Proceedings of the 2021 IEEE Global Communications Conference (Madrid, Spain, Dec. 2021), IEEE 

Refereed Conference Presentations 

  1. Saadat, S., and Sukthankar, G. Predicting the performance of software development teams on GitHub, July 2018. Poster presented at International Conference on Computational Social Science 
  2. Gunaratne, C., Senevirathna, C., Jayalath, C., Baral, N., Rand, W., and Garibay, I. A multi-action cascade model of conversation. In Proceedings of International Conference on Computational Social Science (IC2S2-2019) (University of Amsterdam, The Netherlands, July 2019) 

Extended Abstract Attainment of understanding through coherent conversations is a unique aspect of human communication. Tumultuous changes in communication media have led to unprecedented challenges in conversation dynamics at a global scale. Autonomy and conflicting understandings of concepts drive conversation, defined in conversation theory as an act of conflict resolution. Autonomy has been both enabled through the ease of self-expression via social media, and hampered by political polarization. Instant information access via the internet, with the live streaming capabilities of social media, has increased the global understanding of even the most niche communities, and yet has provided new reasons for conflict and more grounds for conversation. However, existing models of information diffusion [1, 3, 4, 2] fail to capture the distinct mechanics of different types of actions, a salient feature of conversation, and instead are limited to modeling the action of sharing, simulating only the spreading of awareness of a topic among a population. Yet, conversations thrive on the evolution of information, driven by interdependent questions to further understanding [5]. Conversation evolution requires the distinct abilities to initiate conversations, contribute to existing conversations, share understandings, and delete existing information. Without modeling such actions, information cascades with deep inter-dependencies of concepts and dialogue between participants cannot be simulated. We developed the Multi-Action Content Cascade Model (MACM), an information-theoretic agent-based model, to address the above issues. MACM agents are conversation participants U who perform actions E on one or more target conversations T, posting possible content C. 
User V is notified of their neighbor U’s actions via message tuples (analogous, but not limited, to social media notifications), and if they decide to act on a particular message, they propagate a modified message representing their action to their friends. 

  1. Jayalath, C., Gunaratne, C., Senevirathna, C., and Garibay, I. A path dependent equivalence of the independent cascade model and the linear threshold model. In Proceedings of International Conference on Computational Social Science (IC2S2-2019) (University of Amsterdam, The Netherlands, July 2019) 
  2. Senevirathna, C., Gunaratne, C., Jayalath, C., Baral, N., and Garibay, I. Hidden patterns in influence hierarchies in GitHub’s cryptocurrency community, Oct. 2019. Poster presented at 2019 Annual Conference of the Computational Social Science Society of the Americas 
  3. Newton, O. B. Defining and promoting societal benefits in open source software development, Jan. 2020. Paper presented at GROUP4Good Workshop at the 2020 ACM International Conference on Supporting Group Work (GROUP ’20) 

My dissertation focuses on modeling membership change in open source software (OSS) projects and understanding its relationship with collective outcomes. For this research topic, I have reviewed the literature and conducted preliminary analyses to begin exploring the social and technical factors that influence membership change. An important next step then involves defining outcomes of interest for OSS projects, including positive social impacts. My goal for this workshop is to engage in a discussion with other participants to help broaden my understanding of societal benefits and build relationships with group researchers who can provide guidance on how to most effectively integrate the promotion of societal benefits in my research. 

  1. Song, J. Coordinated authentic behavior: Positive influence in online networks, Jan. 2020. Paper presented at GROUP4Good Workshop at the 2020 ACM International Conference on Supporting Group Work (GROUP ’20) 

In my ongoing doctoral research, I have begun initial analyses of how information spreads via coordinated efforts across social media platforms over time, as well as how the dynamic relationships between conflicting and complementary narratives affect the construction of collective beliefs within and across communities and cultures. I aim to consider not just how to mitigate harm (e.g., disrupting disinformation campaigns) but also how our understanding of coordination on social networking sites can be leveraged to promote and sustain positive change. To this end, my goal for this workshop is to collaborate with other researchers to gain new perspectives and a richer understanding of how to ensure research on online social influence intended to have a positive societal impact is, in fact, doing so.

  1. Mutlu, E. C., and Garibay, I. Effects of assortativity on consensus formation with heterogeneous agents. In arXiv:2004.13131 [physics] (Apr. 2020), pp. 1–8. Working paper presented at the International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS).

Despite the widespread use of Barabási’s scale-free networks and Erdős–Rényi networks, whose degree correlation (assortativity) is neutral, numerous studies have demonstrated that online social networks tend to show assortative mixing (positive degree correlation), while non-social networks show disassortative mixing (negative degree correlation). First, we analyzed the variability in the assortativity coefficients of different groups on the same platform using three different subreddits on Reddit. Our data analysis showed that Reddit is disassortative, with assortativity coefficients of the aforementioned subreddits computed as -0.0384, -0.0588 and -0.1107, respectively. Motivated by the variability in the results even within the same platform, we investigated the sensitivity of consensus-formation dynamics to the assortativity of the network. We concluded that the system is more likely to reach a consensus when the network is disassortatively mixed or neutral; however, the likelihood of consensus significantly decreases when the network is assortatively mixed. Surprisingly, the time elapsed until all nodes fix their opinions is slightly lower when the network is neutral compared to either assortative or disassortative networks. These results are more pronounced when the thresholds of agents are more heterogeneously distributed.
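The coefficient reported above is Newman's degree assortativity: the Pearson correlation of the degrees found at either end of an edge, with each undirected edge counted in both orientations. A minimal pure-Python sketch (illustrative only; the published coefficients were computed on Reddit data, not on toy graphs like these):

```python
from collections import Counter

def degree_assortativity(edges):
    """Degree assortativity of an undirected graph given as a list of edges.

    Correlates the degrees of the two endpoints of every edge, counting each
    edge in both orientations. Pearson correlation is invariant under a
    constant shift, so using full degrees here gives the same value as the
    usual excess-degree (degree - 1) formulation.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)
```

A star graph (one hub, all else leaves) is perfectly disassortative (-1), since every edge joins a high-degree node to a low-degree one; a graph whose components are all regular within themselves is perfectly assortative (+1).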

  1. Jayalath, C., Gunaratne, C., Rand, W., Senevirathna, C., and Garibay, I. Final states of threshold based complex-contagion model and independent-cascade model on directed scale-free networks under homogeneous conditions, July 2020. Poster presented at Tenth International Conference on Complex Systems 

There are a variety of information diffusion models, all of which simulate the adoption and spread of information over time. However, it is not well understood whether, despite their conceptual differences, these models represent the same underlying generative structures. Comparing the possible causal trajectories that simulations of these models may take allows us to look beyond conceptual discrepancies and identify mechanistic similarities. In this study, we analyzed the diffusion of information through social networks via agent-based simulations of a linear-threshold-based complex-contagion model and an independent-cascade model on directed scale-free networks. The linear-threshold model postulates that adoption occurs once the fraction of an individual’s adopted neighbors exceeds an internal threshold. Adoption in the independent-cascade model is governed by a Bayesian probability. We discover, empirically, that the final fraction of adopted nodes follows similar dynamics with respect to both the threshold of the linear-threshold model and the probability of adoption of the independent-cascade model. In addition, we examine the fraction of infected-to-susceptible edges that drive the spread of information in both models and discover that the fraction of these transmissible edges also follows similar dynamics toward the end states of both models. We thereby show that, despite differences in their conceptual motivations, the linear-threshold model and the independent-cascade model function equivalently and can describe the same state space under homogeneous conditions on scale-free networks. Through this study we highlight the importance of understanding the underlying causal mechanisms of models that might at first misleadingly seem to be conceptually different.
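The two mechanisms being compared can be sketched in a few lines. This is a minimal illustration on a hand-built directed graph, not the authors' simulation code: a single homogeneous threshold `theta` and a uniform adoption probability `p` stand in for the homogeneous conditions described above.

```python
import random

def in_neighbors(succ):
    """Invert a successor (out-neighbor) adjacency dict into in-neighbors."""
    pred = {u: [] for u in succ}
    for u, targets in succ.items():
        for v in targets:
            pred.setdefault(v, []).append(u)
    return pred

def linear_threshold(succ, seeds, theta):
    """Linear-threshold: a node adopts once the fraction of its adopted
    in-neighbors reaches the (homogeneous) threshold theta."""
    pred = in_neighbors(succ)
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in pred.items():
            if node not in adopted and nbrs:
                if sum(n in adopted for n in nbrs) / len(nbrs) >= theta:
                    adopted.add(node)
                    changed = True
    return adopted

def independent_cascade(succ, seeds, p, rng=random):
    """Independent-cascade: each newly adopted node gets one chance, with
    probability p, to activate each of its out-neighbors."""
    adopted = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in succ.get(u, []):
                if v not in adopted and rng.random() < p:
                    adopted.add(v)
                    nxt.append(v)
        frontier = nxt
    return adopted
```

At the extremes the two end states coincide: with `theta` low enough and `p = 1`, both processes saturate every node reachable from the seeds, while with `theta > 1` or `p = 0` neither spreads beyond the seed set.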

  1. Zhang, X., Aravamudan, A., Koufakou, A., Gunaratne, C., Garibay, I., and Anagnostopoulos, G. C. Predicting software vulnerability exploits from social media confabulations, July 2020. Poster presented at the 6th International Conference on Computational Social Science (IC2S2) 

Invited Presentations and Workshops 

  1. Cyber Security in a Complex Community: A Social Media Analysis on Common Vulnerabilities and Exposures (May 2019). Madeline Schiappa presented a talk at the session titled “Using Big Data to Catalyze Innovation and Economic Prosperity in Smart Cities” at the International Conference on Smart Tourism, Smart Cities and Enabling Technologies. 
  2. 5G Kills? A Case Study of COVID-19 Misinformation on Twitter (May 2020). Jihye Song led the research team that received 1st place in the Net-COVID workshop, a special online workshop series presented by the University of Maryland’s COMBINE program in Network Biology in partnership with the University of Vermont’s Complex Systems Center.
  3. Social Media and the Infodemic: Exploring the Relationship between COVID-19 Misinformation and Public Health Outcomes (May 2020). Jihye Song presented a lecture at the session titled “Influencing Behavior Change During and After COVID-19” at the 2020 International Symposium on Human Factors and Ergonomics in Health Care.
  4. Workshop on Inverse Generative Social Science (iGSS) Evolutionary Model Discovery (June 2021). Ivan Garibay, Bill Rand, and other prominent researchers conducted a workshop on the topic of inverse generative social science (iGSS), often called evolutionary model discovery (EMD). Most agent-based models take the agent rules as a given and optimize parameters around those rules (generative social science). Inverse generative social science provides a way for large numbers of rules and rule combinations to be informed by the data and then selected for optimal fit using machine learning techniques such as random forest. The workshop provided practitioners with current projects, applications and topics to advance this emerging approach to computational social science. This is an example of the science and research community around it that was inspired by the work of SocialSim. This year’s workshop had 187 participants, with 24 presenters from 7 countries.

Movie Review Data

Movie Review Data:

This dataset is a combination of movie review datasets for sentiment polarity, sentiment scale, and subjectivity. If you use any of these datasets, be sure to cite the corresponding version of the dataset. More information and download links are available online.

Keywords: Natural Language Processing, NLP, Movie Review, Sentiment Classification, Sentiment Polarity, Subject Rating

Large Movie Review

Large Movie Review:

This dataset contains a large collection of movie reviews and can be used for the task of sentiment classification. The training set contains 25,000 highly polar movie reviews and the test set contains another 25,000 samples. The dataset is available for download online; please refer to the README file included in the release, which contains more details.
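A minimal loader for the release can look like the following. It assumes the standard aclImdb directory layout (`pos/` and `neg/` subdirectories of one-review-per-file `.txt` files under each of `train/` and `test/`, as described in the release README); the function name is illustrative.

```python
from pathlib import Path

def load_reviews(split_dir):
    """Load (text, label) pairs from one aclImdb-style split directory.

    Expects <split_dir>/pos/*.txt and <split_dir>/neg/*.txt, one review
    per file; positive reviews are labeled 1 and negative reviews 0.
    """
    data = []
    for label_name, label in (("pos", 1), ("neg", 0)):
        for path in sorted(Path(split_dir, label_name).glob("*.txt")):
            data.append((path.read_text(encoding="utf-8"), label))
    return data
```

Calling `load_reviews("aclImdb/train")` would then yield the 25,000 labeled training reviews ready for a sentiment classifier.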

If you use this dataset, make sure to cite the paper:

Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (2011).

Keywords: Natural Language Processing, NLP, Movie Review, Sentiment Classification, Subject Rating