Social Action Tracking and Influence Analysis in Online Social Network

Social Action Tracking via Noise Tolerant Time-varying Factor Graphs (KDD'10)

Chenhao Tan, Jie Tang,Jimeng SunQuan Lin and Fengjiao Wang 

Introduction

Our work focuses on Social Action Tracking. We build a noise tolerant time varying factor graph model for dynamic social action modeling. This page is for the description of the datasets and code. This work has been accepted in SigKDD 2010. If you use data sets provided or refer to the model, please cite:

@INPROCEEDINGS{Tan:10KDD,
AUTHOR = "Chenhao Tan and Jie Tang and Jimeng Sun and Quan Lin and Fengjiao Wang",
TITLE = "Social Action Tracking via Noise Tolerant Time-varying Factor Graphs",
YEAR = {2010},
BOOKTITLE = "KDD'10",
PAGES = "807--816",
%BOOKTITLE = "Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'10)",
}

The full paper can be downloaded here[pdf]. And this is the presentation. [ppt]

Motivation

To clearly motivate our work, we perform an analysis on three real social networks: Twitter, Flickr, and Arnetminer.

On Twitter, the action is defined as whether a user discusses the topic “Haiti Earthquake” on his microblogs (tweets). On Flickr, the action is defined as whether a user adds a photo to his favorite list. On Arnetminer, the action is defined as whether a researcher publishes a paper on a specific conference (or journal).

The analysis includes three aspects:

(1) social influence;

(2) time-dependency of users’ actions;

(3) action correlation between users.

Figure 1: Social Influence

Figure 1 shows the effect of social influence. We see that with the percentage of one’s friends performing an action increasing, the likelihood that the user also performs the action is increased. For example, when the percentage of one’s friends discussing “Haiti Earthquake” on their tweets increases the likelihood that the user herself posts tweets about “Haiti Earthquake” is also increased significantly. 

 

Figure 2: Time-dependency

Figure 2 illustrates how a user’s action is dependent on his historic behaviors. It can be seen that a strong time-dependency exists for users’ actions. For instance, on Twitter, averagely users who posted tweets about “Haiti Earthquake” will have a much higher probability (+20-40%) to post tweets on this topic than those who never discussed this topic on their blogs.

 

Figure 3: Correlation

Figure 3 shows the correlation between users’ actions at the same timestamp. An interesting phenomenon is that friends may perform an action at the same time. E.g., on Twitter, two friends have a higher probability (+19.6%) to discuss “Haiti Earthquake” than two users randomly chosen from the network.

Model

To modeling and tacking the social influence and users’ actions, we propose a Noise-Tolerant Time-varying Factor Graph Model (NTT-FGM), which is based on three intuitions:

1 Users’ actions at time t are influenced by their friends’ historic actions (time < t).

2 Users’ actions at time t are usually dependent on their previous actions.

3 Users’ actions at a same time t have a (strong) correlation.

 

Figure 4: Graph Representation of Model

Moreover, the discrete variable yti only models the user’s action at a coarse level, but cannot describes the intention degree of the user to perform an action. Directly modeling the social actions Y would inevitably introduce noise to the model. A continuous variable for modeling the action bias is favorable. Thus the concept of latent action state is presented: 

Latent action state: For each user’s action yti , we define a (continuous) latent state zti ∈ [0, 1], which corresponds to a combination of the observed action yti  and a possible bias, to describe the actual intention degree of the user to perform the action.

Figure 4 shows the graphical structure of the NTT-FGM model. An action of user vi at time t, i.e., yti is modeled by using a (continuous) latent action state zti , which is dependent on friends’ historic actions zt-1∼vi (where ∼ vi represents friends of user vi in the network), users’ action correlation zt∼vi , and users’ attributes xti. Specifically, in the NTT-FGM model, each discrete action is mapped into the latent state space and the action bias is modeled using a factor function. For example, for yti  = 1, a small value of its corresponding zti suggests that a user vi has a low intention to perform the action, thus a large action bias |zti −yti |. Next, influence between users is modeled using the latent states based on the same assumption: latent states of users’ actions at time t are conditionally independent of all the previous states given the latent states at time t − 1. Finally, actions’ correlation is also modeled in the latent state space. A Markov random field is defined to model the dependency (correlation) among the continuous latent states. 

Thus, given a series of attribute augmented networks G = {Gt = (Vt,Et,Xt, Yt)}, t  {1, · · · , T } and V = V1 \cup V2  \cup. .. \cup VT , |V | = N, the joint distribution over the actions Y given G can be written as

p(Y|G) =\prod^{T}_{t=1} \prod^{N}_{i=1}f(yti |zti )f(zti |zt−1∼vi )f(zti |zt∼vi , xti

where notation ∼ vi represents neighbors of vi in the social network.

The joint probability has three types of factor functions:

  • Action bias factor: f(yti |zti) represents the posterior probability of user vi’s action yi at time t given the continuous latent state zti ;
  • Influence factor: f(zti |zt−1∼vi ) reflects friends’ influence on user vi’s action at time t;
  • Correlation factor: f(zti |zt∼vi , xti denotes the correlation between users’ action

The three factors can be instantiated in different ways, reflecting the prior knowledge for different applications. Finally, all the three factor function are defined by quadratic functions due to two reasons: the quadratic function is integrable and it offer the possibility to design an exact solution to solve the objective function (joint probability).

The model is learned using an EM-style algorithm and for scale up to large-scale data sets, a distributed learning algorithm has been designed based on the MPI (Message Passing Interface).

Experiment Results

The following charts and diagrams are some interesting results we get in the experiment.

We show the contribution of different factors by comparing the performance ignoring different factors. The result is as Figure 5 shows.

Figure 5: Contribution of different factors

We also compare the latent state and actual state as Figure 6 shows. 

Figure 6: Latent State

Our result can be also used to analyze the correlation between users as Figure 7 shows.

Figure 7: Correlation Analysis

 

Datasets

Datasets we used in the experiment of our paper "Social Action Tracking via Noise Tolerant Time-varying Factor Graphs" for KDD 2010 is publicly available.

Our data sets include three genres of real-world data sets: Twitter (a microblogging data set crawled

from twitter.com), Flickr4 (a data set of photo sharing from flickr.com), and Arnetminer (a data set from the academic search system, arnetminer.org). 

 

The Flickr data set is the data set related with the http://socialnetworks.mpi-sws.org/. You can mail to the original author to get the coarse data. You can get the subset for performance comparison here.

 

The Arnetminer data set is available by using the open service of arnetminer. And the subset for performance comparison can also be downloaded here.

 

The Twitter data set is crawled from a typical user related with Haiti Earthquake, calpedro. And the whole data set can be downloaded here. You can get the subset for performance comparison here.


Also, we synthesized some data to compare the efficiency of our distributed algorithm in different kinds of network. It is available here.

 

 

Code

We will publish the code later. The binary version is here.

 

We are still updating it. Last updated (Jun 26, 2010)