Recent Research Projects

Memory-based Anomaly Detection via a Deep Learning Approach

Given a time-varying graph, how can we quickly detect where (nodes, edges) and when (time steps) anomalies occur? To address this problem, we treat each network snapshot as an image, so that a stream of network observations can be viewed as video data. Under this view, we develop an unsupervised memory-based neural network that combines LSTM autoencoders with hypersphere learning to detect anomalies. Our method accounts for both temporal and topological dependencies. We conduct experiments on a synthetic dataset and a variety of real-world datasets, including urban transportation graphs, text-based graphs, and surveillance video.
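The hypersphere-learning step can be illustrated on its own. The sketch below is a simplified illustration under stated assumptions, not the project's model: it assumes each snapshot has already been mapped to a fixed-length embedding (for example by an LSTM autoencoder, which is not shown), takes the center c as the mean embedding of normal training data, and flags an embedding as anomalous when its squared distance to c exceeds a radius chosen as a quantile of the training distances. The function names and the quantile rule are hypothetical; in Deep SVDD-style methods the encoder is additionally trained to pull normal embeddings toward c.

```python
import math


def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def fit_hypersphere(train_embeddings, quantile=0.95):
    """Return (center, radius_sq) estimated from embeddings of normal data."""
    n = len(train_embeddings)
    dim = len(train_embeddings[0])
    # Center c: mean embedding of the (assumed normal) training data.
    center = [sum(e[d] for e in train_embeddings) / n for d in range(dim)]
    dists = sorted(squared_distance(e, center) for e in train_embeddings)
    # Assumed radius rule: the given quantile of training distances to c.
    idx = min(n - 1, max(0, math.ceil(quantile * n) - 1))
    return center, dists[idx]


def anomaly_scores(embeddings, center):
    """Anomaly score = squared distance to the hypersphere center."""
    return [squared_distance(e, center) for e in embeddings]


def detect(embeddings, center, radius_sq):
    """Flag embeddings that fall outside the hypersphere."""
    return [s > radius_sq for s in anomaly_scores(embeddings, center)]


# Toy 2-D embeddings standing in for encoder outputs (hypothetical data).
train = [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]]
center, radius_sq = fit_hypersphere(train)
print(detect([[0.5, 0.5], [3, 3]], center, radius_sq))  # -> [False, True]
```

Here the center is the origin and the radius (squared) is 1, so the point (0.5, 0.5) lies inside the hypersphere while (3, 3) is flagged.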

Event Sequence Clustering and Forecasting

Event sequences are increasingly generated in a variety of domains, such as e-commerce, health informatics, and social networks. In mining such asynchronous data, prior work has focused mainly on modeling and predicting event sequences. However, we argue that event sequence clustering is also a critical problem, as certain sub-patterns may be dominant in particular sub-populations. For example, in health informatics, patients may exhibit diverse disease progression patterns; in e-commerce, customers may show a variety of commercial behaviors. In this project, our goal is to cluster event sequences into subgroups in order to better understand the sub-patterns of each group, and to improve prediction performance based on the clustering results.

Anomaly Detection via Multi-view Time-Series Learning

The problem of anomaly detection in dynamic networks has attracted much attention in a broad range of domains, such as transportation, communication, financial systems, and social networks. Examples include detecting civil unrest from social media data, identifying crowd activities or emergencies in cities, and discovering network intrusions or network failures. With the increasing adoption of ubiquitous sensors and social mobile technologies in particular, it has become possible to continuously collect data from multiple sources in real time. Such continuously gathered data allows us to understand the temporal regularities and irregularities of a dynamic system. We propose a novel approach that combines multi-view learning with support vector data description to tackle this problem. Extensive experiments on both synthetic and real datasets show that our method outperforms state-of-the-art baselines in detecting three types of events, involving (i) time-varying features alone, (ii) time-aggregated features alone, and (iii) both types of features. Moreover, our approach performs consistently well in the presence of noise, anomaly pollution in the training data, and data imbalance.

Superspreaders in Information Cascading with First-order Transitions

In social networks, the collective behavior of large populations can be shaped by a small set of influencers through a cascading process induced by "peer pressure". For large-scale networks, efficiently identifying multiple influential spreaders with a linear-time algorithm in threshold models that exhibit a first-order transition remains a challenging task. Here we address this issue by exploring collective influence in general threshold models of behavior cascading. Our analysis reveals that the importance of spreaders is fixed by the subcritical paths along which cascades propagate: the number of subcritical paths attached to each spreader determines its contribution to global cascades. The concept of subcritical paths allows us to introduce a linearly scalable algorithm for massively large-scale networks. Results on both synthetic random graphs and real networks show that the proposed method is more effective at influence maximization than other linearly scalable heuristic approaches.
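The cascading process itself can be made concrete with a small simulation. The sketch below assumes an integer-threshold variant of the threshold model on an undirected graph: an inactive node switches on once the number of its active neighbors reaches its threshold, and activations propagate breadth-first from a seed set until nothing changes. It only evaluates how far a given seed set spreads; it does not implement the subcritical-path algorithm described above, and the function name and toy graph are illustrative assumptions.

```python
from collections import deque


def threshold_cascade(adj, seeds, threshold):
    """Return the set of active nodes once a threshold cascade stops.

    adj: dict mapping each node to an iterable of its neighbors (undirected).
    threshold: dict mapping each node to the number of active neighbors it
        needs before activating (an assumed integer-threshold variant).
    """
    active = set(seeds)
    active_neighbors = {v: 0 for v in adj}  # active-neighbor count per node
    queue = deque(active)                   # each active node is processed once
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in active:
                continue
            active_neighbors[v] += 1
            if active_neighbors[v] >= threshold[v]:
                active.add(v)
                queue.append(v)
    return active


# Toy path graph 0-1-2-3-4; node 3 needs two active neighbors.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
threshold = {0: 1, 1: 1, 2: 1, 3: 2, 4: 1}
print(sorted(threshold_cascade(adj, {0}, threshold)))     # -> [0, 1, 2]
print(sorted(threshold_cascade(adj, {4}, threshold)))     # -> [4]
print(sorted(threshold_cascade(adj, {0, 4}, threshold)))  # -> [0, 1, 2, 3, 4]
```

In this toy graph, seeds 0 and 4 reach 3 and 1 nodes on their own, but together they activate all 5, illustrating why the influence of a set of spreaders is not simply the sum of their individual contributions.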

Collective Influence and Optimal Percolation

In a network, some nodes are more important than others. Although they make up only a small fraction of the network, their activation can trigger large-scale information diffusion, while their removal can cause the network to collapse. Morone & Makse recently mapped the problem of finding the minimal set of influential nodes to optimal percolation and proposed a novel algorithm called Collective Influence (CI). Our work evaluates CI's efficiency from the perspective of real-world data analysis. We collected large-scale information-diffusion data from social platforms and scientific publications, such as Twitter, LiveJournal, Facebook, and APS journals. By reconstructing the underlying topological structures and the spreading dynamics on top of them, we seek to validate CI's effectiveness by comparing it with several other heuristic strategies, such as PageRank, adaptive high-degree, and k-shell.
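For reference, Morone & Makse define the CI score of node i at radius ℓ as CI_ℓ(i) = (k_i - 1) Σ_{j ∈ ∂Ball(i, ℓ)} (k_j - 1), where k_i is the degree of node i and ∂Ball(i, ℓ) is the set of nodes at shortest-path distance exactly ℓ from i. The sketch below computes this score by breadth-first search and applies the adaptive variant: remove the highest-CI node, recompute scores, and repeat. It is a plain, unoptimized illustration; the published method adds refinements, such as a more efficient update scheme and a stopping criterion tied to the collapse of the giant component, that are omitted here. The function names, the fixed number of removals, and the toy graph are assumptions.

```python
from collections import deque


def collective_influence(adj, node, ell):
    """CI_ell(node) = (k_node - 1) * sum of (k_j - 1) over nodes j at distance exactly ell."""
    dist = {node: 0}
    frontier = deque([node])
    boundary = []  # nodes on the frontier of the ball of radius ell
    while frontier:
        u = frontier.popleft()
        if dist[u] == ell:
            boundary.append(u)
            continue  # do not explore beyond radius ell
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return (len(adj[node]) - 1) * sum(len(adj[j]) - 1 for j in boundary)


def adaptive_ci_removal(adj, ell, n_remove):
    """Remove the highest-CI node n_remove times, recomputing CI after each removal."""
    graph = {u: set(vs) for u, vs in adj.items()}  # mutable copy of the graph
    removed = []
    for _ in range(min(n_remove, len(graph))):
        # Ties are broken by dict order (max returns the first maximal node).
        best = max(graph, key=lambda u: collective_influence(graph, u, ell))
        removed.append(best)
        for v in graph.pop(best):
            graph[v].discard(best)
    return removed


# Two hubs (0 and 4), each with three leaves, joined by the edge 0-4.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0],
       4: [0, 5, 6, 7], 5: [4], 6: [4], 7: [4]}
print(collective_influence(adj, 0, 1))  # -> 9
print(adaptive_ci_removal(adj, 1, 1))   # -> [0]
```

Leaves score zero because the factor (k_i - 1) vanishes for degree-1 nodes, so CI favors hubs whose neighborhoods at distance ℓ are themselves well connected.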

Former Research Projects

Identification of Highly-susceptible Individuals in Spreading Dynamics

Identifying highly susceptible individuals in spreading processes is of great significance for controlling outbreaks. We explore the susceptibility of individuals in susceptible-infectious-recovered (SIR) and rumor spreading dynamics, and find that individual susceptibility is sensitive to the choice of spreading dynamics. For SIR spreading, where susceptibility is highly correlated with a node's influence, the topological indicator k-shell better identifies highly susceptible individuals. In contrast, in the rumor spreading model, where nodes' susceptibility and influence show no clear correlation, degree performs best among the topological measures considered. Our findings highlight the importance of both topological features and spreading mechanisms in identifying highly susceptible populations.
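The two topological indicators compared above, degree and k-shell, can be computed as follows. The sketch assumes an undirected graph stored as an adjacency dict and obtains the k-shell index by iterative pruning: at each level k, nodes with residual degree at most k are removed repeatedly and assigned shell k. NetworkX provides the same quantity through its `core_number` function; this standard-library version only shows the procedure, and the function name and toy graph are hypothetical.

```python
def k_shell(adj):
    """Return the k-shell index of every node of an undirected graph."""
    degree = {u: len(vs) for u, vs in adj.items()}  # residual degrees
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Advance k to the smallest residual degree (never below the current k).
        k = max(k, min(degree[u] for u in remaining))
        stack = [u for u in remaining if degree[u] <= k]
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue  # already pruned after being pushed twice
            remaining.discard(u)
            shell[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        stack.append(v)
    return shell


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
degree = {u: len(vs) for u, vs in adj.items()}
print(degree)                        # -> {0: 3, 1: 2, 2: 2, 3: 1}
print(sorted(k_shell(adj).items()))  # -> [(0, 2), (1, 2), (2, 2), (3, 1)]
```

Node 0 has the largest degree, yet all three triangle nodes share k-shell index 2, so the two indicators can rank the same nodes differently.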

Alpha Magnetic Spectrometer 02 Experiment (AMS-02)

In 2012, I participated in the experiment at the European Organisation for Nuclear Research (CERN) in Geneva. The Alpha Magnetic Spectrometer (AMS-02) is a state-of-the-art particle physics detector designed to operate as an external module on the International Space Station. It uses the unique environment of space to study the universe and its origin by searching for antimatter and dark matter while performing precision measurements of cosmic-ray composition and flux. One key task is to construct an analysis framework for particle discrimination (distinguishing positrons from protons with relatively low error) using methods from computer science and statistics. We focused on this problem and developed algorithms for data analysis and data visualization using ROOT, a framework developed at CERN, on Linux.