Decentralized Network Search

Introduction

When the "small-world" property is mentioned, typically the structural properties associated with it (i.e., small diameter/shortest-paths and large clustering) are first thought of. However, apart from the structural characteristics of small-world networks there is an important algorithmic property, namely, decentralized search. The structural properties inform us that small, short-paths exists, but how can we find them in a distributed manner? Stanley Milgram's experiment after all showed us that people are good at finding these short paths.

While there have been several models and algorithms that have been proposed for decentralized search, I will focus on the Kleinberg's model. This model is simple and considers a (social) network embedded in a Euclidean space. The main idea is that in order for the network to be efficiently searchable through a decentralized algorithm, the probability of forming a link $\{u,v\}$ between two users $u$ and $v$, decays like the $d^{th}$ power of their distance $l(u,v)$, i.e., $Pr[\{u,v\}]\propto \dfrac{1}{l(u,v)^d}$, where $d$ is the dimensionality of the space. This property assures that the a node is equally as likely to form links at all distance scales.

Simply put, nodes need to have some long-range contacts that will allow them to make "large" (geographic) hops towards the target! Here I ought to examine whether digital social networks exhibit this property. In particular, I will utilize mobility and social information for users in a digital social network and examine how the probability of friendship behaves as a function of distance (in reality I will just refer to studies that have done this for me!).

An Inconsistency

I used a dataset from Gowalla social network. The reason I picked Gowalla is because being a location-based social network it also offers information about where users go and hence one can estimate the "home location" of a user. Cho et al. have identified that the probability of friendship in this network decays with distance with an exponent equal to 0.83! This means that contrary to what we know from Milgram's experiment, this social network appears to not be searchable (based on Kleinberg's model - and other extensions that have appeared in the literature - this would require an exponent of 2 as the Earth's surface is two-dimensional). Similar results have been reported for other social networks. For example, Liben-Nowell et al. have reported similar results on a dataset from LiveJournal, where the exponent is approximately 1.2. In this work, it was further shown that a simple greedy algorithm, similar to Kleinberg's proposal, could identify the short paths in a decentralized fashion. So what is going on? Kleinberg has proven that if $d$ is different than the dimensionality of the space then the time/hops required to reach the destination will be exponential. Given that in the real data the calculated exponent is significantly different than 2, how is it possible to still have searchable networks?

A Tweak

The above inconsistency appears since the "right" value of $d$ is 2. Why should it be 2 though? Even though the Earth's surface has a topological dimension of 2, the plane is clearly not filled in a uniform way from cities and people. Led by this observation, I wanted to examine another type of dimensionality, namely, fractal dimension. In contrast to topological dimensions, the fractal dimensions can take non-integer values allowing us to describe in greater detail the space of interest. While there are many different ways to define the fractal dimension of an object, I will use the fractal correlation dimension $D_2$ on a cloud of points $\mathcal{S}$. In particular, with $C(r)$ being the fraction of points from $\mathcal{S}$ that have distance smaller or equal to $r$, the intrinsic fractal dimension $D_2$ of the point set in the range of scales $r_1$ to $r_2$ is given by:
$C(r) \propto r^{D_2}~~~~r_1\le r \le r_2$

In fact, a pointset that is uniformly distributed in the unit square has intrinsic dimension $D_2 = 2$ (i.e., equal to the topological dimension), for the range of scales $[r_{min},1]$, where $r_{min}$ is the smallest distance among the pairs of the set.

In our case, I computed the home-location of every Gowalla user using an iterative density clustering on each user's check-ins (this is an interesting approach that will be the topic of another post!). Then I created 100 random sample sets of approximately 1,000 points each and computed the intrinsict fractal dimension on each set. Figure 2 presents the distribution of $D_2$ over the sets. As we see the distribution of $D_2$ "covers" the value of the power-exponent for the friendship probability with distance for Gowalla (red vertical line). In fact we perform the following statistical test:

H_0: D_2 = d

H_1: D_2 \neq d

Without making any assumptions for the distribution of $D_2$, we compute the 2.5% and 97.5% percentiles of $D_2$ from our data. This gives 0.8279 and 0.9245 respectively. Hence, the interval $[0.8279, 0.9245]$ covers 95% of the values of $D_2$ and includes the value of $d$. Hence, we cannot reject the null hypothesis that $D_2 = d$ at the 5% significance level. Hence, we can informally say that the power exponent $d$ of the friendship probability with distance is equal to the fractal correlation dimension $D_2$ of the users' home locations. While further examination is certainly required, this serves as preliminary evidence that the fractal dimension of the underlying space is the one that drives the distributed network search.

Comments

If you have any comments/thoughts (or simply want access to the data I used) please e-mail (kpele A-T pitt.edu) me or and I will post it here with the appropriate credits!