User:David C. Thompson/Projects/Networks

I was introduced to the science of social network analysis during an informal seminar in 2003. The presenter, Dr. Jonathon Doye, was discussing some work he had published the previous year wherein he had applied the techniques of network analysis to the problem of model energy landscapes [1]. He began by introducing the concept of a ‘small-world network’, and he did this through the parlor game ‘The Six Degrees of Kevin Bacon’. In this game a group of players aim to connect through film roles any actor in movie history to Kevin Bacon, in as short a number of links as possible. The actor’s ‘Bacon number’ is then a count of the number of links between the actor and Mr. Bacon, and is found to be typically less than 7; that the number of links from any actor to Kevin Bacon is so small, is a stark illustration of the small-world effect. For instance, Anthony Hopkins has a Bacon number of 2: he starred in ‘The Mask of Zorro’ with Maury Chakin, who starred in ‘Where the Truth Lies’ with Kevin Bacon [2]. The concept is familiar to us all, and I am sure that the reader has at some point exclaimed “what a small world it is” when meeting someone new who it turns out also shares some of the same friends. This small-world property has been quantified and has been found to not be limited to social networks; surprisingly short distances between points in complex networks, both natural and man-made, have been observed [3, 4].

Almost concurrent to the progress of this large body of multi-disciplinary research has been the proliferation of ‘social networking’ sites on the Internet (itself a network of routers, computers, and domains which has been shown to have the small-world property). These sites exist in a bewildering array and cater to varied tastes and groups with some of the most popular in the United States (U.S.) aimed at college/high school students (Facebook, MySpace) and professionals (LinkedIn, Plaxo). Popular social networks, as measured by the number of unique users, are ‘sticky’ in that a given user will spend longer on average at that site and thus become a prime target for directed advertising and marketing; the acquisition of MySpace by News Corp. for $580 million dollars in 2006 is an example of how seriously this is being taken in the business community, as is recent disclosure of the entrance of Yahoo and Google to an already crowded space [5].

Initially, I was at a loss to explain how, or even why, such services were important either to me or to the population in general; indeed, such services seemed like time sinks where I might engage in rather more ‘not working’ than networking. But, as I began to piece together the historical sociological research, and how this was quantified by some of the more formal aspects of these networks, I began to understand their utility and appreciate how these networks could be used to great effect; in this article I will sketch my arrival at this rationalization. I will begin with a discussion of the historical research from the field of sociology, followed by a brief overview of the formal properties of some of these networks. I will then discuss how these observations apply to social networking websites in general, and how I think this adds value to the internet and consumers as a whole. This work is opinion; it is not previously published, and is referenced where appropriate.

In a short story entitled Láncszemek (Chains) published in 1929, the Hungarian author Frigyes Karinthy introduced the notion of a shrinking world, which he saw arising as a direct result of technological advances in both communication and travel. Through this observation the author posited that any person chosen at random from amongst the populace as a whole, at that time amounting to roughly 1.5 billion individuals, could be connected to any other person, through no more than five bridging people. It was not until 1967 that this hypothesis was tested by a researcher in experimental social psychology at Harvard University. Stanley Milgram was fascinated by the inter-connectedness of individuals and of measures of their ‘social capital’, and through the following experiment, he sought to investigate this by determining the path length between two randomly selected people (while this bears striking parallels to the writings of Karinthy, it is unclear whether he had any impact on the evolution of Milgram’s ideas on this subject). The experiment Milgram constructed consisted of selected individuals in the U.S. cities of Omaha and Wichita, being asked to deliver packets of information to a target individual in Boston, for which the only known identifier was name. The packets contained a roster and each person who handled a packet on its journey recorded their involvement; should the packet successfully arrive at its destination, a count of the names on the roster gave the researchers their ‘path length’. Analysis of the 64 unbroken chains, those which successfully carried the information packet from start to finish, showed that the average length consisted of 5.5 (6) individuals to place the letter in the hands of the target person [6]. This is a remarkable finding, and is serendipitously close to Karinthy’s original estimate. The result was popularized by John Guare in his play ‘Six Degrees of Separation’ in 1990, and led to this phrase entering the everyday lexicon (and in turn gave rise to the aforementioned game ‘The Six Degrees of Kevin Bacon’). What properties do social networks possess that enables this small-world phenomenon to persist? The answer to this question came just four years after Milgram’s work and was delivered in an article which many consider one of the most important contributions to the field of sociology. In his paper, ‘The Strength of Weak Ties’, Mark Granovetter assigned ‘strengths’ to the interactions between individuals in a social network [7]. For example, familial links, or close friends, might be considered ‘strong’ ties, whilst acquaintances, or friends-of-friends, might be considered ‘weak’ ties. Dr. Granovetter demonstrated that it is the weak ties which actually support the network as a whole, and give rise to its ‘small-worldliness’. For instance, it is likely that two of my close friends, are friends, and there is thus one link between one of us, and any other one of us. Should a connection to one of my friends be removed from the network, the maximum distance from one remaining friend, to the other is now two – a small increase in absolute path length. Such a strong connection, or rather its removal, will not harm the overall integrity of the network or unduly affect its properties. However, the person you swapped notes with in philosophy class and who now lives in Belgium is the quintessential ‘weak tie’ and turns out to be a key social bridge. An occasional email, or visit, is sufficient to maintain this connection, but should this tie be broken you may never hear from each other again. It is important to note that this is not simply the loss of a connection to one other person; it is the loss of a link to a whole different community of individuals in a remote, geographically disparate location. Loss of such a weak connection would result in a distinct loss in connectivity and would dramatically alter the small-world character of the social network. Dr. Granovetter’s observation of the importance of the relative strength of interactions within a network enables us to understand a number of key features of the societal constructs we have been examining here. Firstly, the average path length across such a network is small; we have seen that this is due to the weak ties that bind disparate regions of the network to each other. Secondly, the strong ties (familial, close friends, co-workers etc.) form clusters which are tightly bound. Having identified properties we know these networks have, is it possible for us to construct a model which captures this behavior? Surprisingly, for what initially appears a rather simple problem, this question was not answered until the pioneering work of Watts and Strogatz in 1998 [3].

A social network as we have described it is simply a set of individuals (nodes) connected via ties (edges), and the natural language to describe such an object is the mathematical formalism of graph theory. In 1959 Paul Erdős (a prodigious Hungarian mathematician who, like Kevin Bacon, has his own number for describing the network of his collaborators, itself a small-world network [8]) examined a problem which can be cast in graph theoretic form, but which has remarkably practical applications. Imagine that you have been given the thankless task of designing the placement of roads to connect towns in an impoverished country. Firstly, given that resources are expensive you are constrained in the number of roads you can construct. Secondly, you have some experience with the construction capabilities of the municipal agencies and while you may ask specifically for two towns to be connected by a roadway, the agencies may simply ignore you and connect two random towns. With unlimited resources, roads between all towns would ensure complete coverage, however this is unfeasible given your limited budget, so how then to maximize the coverage between any two towns? Erdős, along with collaborator Alfréd Rényi, showed that for a country consisting of 50 towns one simply needs to build 98 random roads between any two to ensure that the great majority are linked, and while this seems like a lot, it represents a mere 8 percent of the total number of all connections between the 50 towns (1,225) [9]. What we have described is termed a random graph and Erdős and Rényi were able to show that, for a general graph, the larger the number of nodes, the smaller the percentage of links required to tie the network together in an almost completely connected fashion. Does this mean then, that our social network is simply a random graph, composed of haphazard links between individuals? The answer, of course, is no. The problem with such a random graph model of a social network is that while it does have the small-world property, it lacks the clustering which results from individuals having strong ties to family members and other close persons. What then if we built the notion of community into the description of network, and considered ‘nearest neighbor’ connections, i.e. what if we assume that people tend to know others who live near them, and that those in close proximity share common acquaintances and friends? Now the strong ties are well represented, but does such a description preserve the small-world property? If we connected groups of 50 individuals who were in close proximity to each other, and continued in this fashion until everyone on the planet was connected, it would take upwards of 10 million links to pass from one person, to someone geographically removed. The small-world property is lost, and this is clearly not a relevant model for a realistic social network, even though the clustering required to represent the strong ties is present. Something intermediate between the completely random and fully ordered descriptions is necessary to fully describe the rich complexity of a real social network.

In 1998 Watts and Strogatz proposed what would become the archetypal model for such a social construct [3]. The work itself was motivated by a desire to model the collective dynamics of complex systems such as the chirping of crickets, or the behavior of fireflies and the resulting derived description is intuitively simple. Moreover, it formalizes the observations of sociologists Granovetter and Milgram and has offered insight into myriad, seemingly disjoint, fields. As a starting point the authors considered a fully ordered circular network – to be imagined as a closed string of pearls where each ‘pearl’ represents an individual/node of the system. The authors then began by allowing each node to connect to just a few of its nearest neighbors which, as we saw above, results in a high mean path-length between any two points (not a small-world network). The authors then allowed for a small number of connections between random nodes and found that the mean path-length fell rapidly, without unduly changing the degree of clustering. This is precisely the behavior observed in a real social network, and manages to effortlessly capture the findings of both Milgram and Granovetter. This model has been applied to myriad fields in all branches of science, and the interested reader is referred to references [4] and [10] and references therein. The rationalization provided by Watts and Strogatz emphasizes the importance of the ‘random’ connections between nodes. These weak ties, in Mark Granovetter’s parlance, provide the mechanism for the small-world property to propagate throughout these networks. Social networking sites on the internet derive a lot of their current value from their ability to facilitate interactions between weakly connected individuals. However, at present this is done in a somewhat passive way and an explicit inclusion of a measure of the strengths of connections amongst these networks might further enhance the utility of these services. There exists a whole gamut of social networking websites which have at their core the desire to connect people with similar interests or backgrounds [11]. Most of these websites allow a user to add varied amounts of information ranging from education, and employment history, to tastes in music and literature. Having constructed this user profile, one then proceeds to ‘friend’ individuals in an effort to grow their network, the process of which is typically achieved through using inbuilt search functionality with name, affiliation, or interest as a search criterion. This immediately establishes connections to two groups of friends: those we are in frequent contact with (strong ties; family, co-workers etc.), and those that we are in less frequent contact with (weak ties; the friend from philosophy class who now lives in Europe, old house mates etc.). This projection of a social network onto the format of a social networking website provides the first benefit derived from weak ties. For instance, for a number of social networking sites, changes to one’s information or status (be it alterations in employment information, new connections formed, relationship status etc.) are broadcast to all common contacts in a continuous fashion. We now, passively, receive information from both groups of individuals and, while it is likely that I know a close friend is looking for a position at a competitive hedge fund, I am unlikely to know, unless specifically told otherwise, that this is also true of a friend in Belgium. The structure of the network itself pushes informational changes throughout, removing a layer of inefficiency which exists in other ‘pull’ networks such as email. How then could we further utilize the properties of networks to improve upon this? In the scenario described above, both weak and strong connections have an equal weight. One could envisage computing a ‘strength of connection’ value for everyone in a network which could then be used to improve the efficiency of broadcast messages, employment requests, or targeted marketing (something akin to this has recently been implemented by the startup VisiblePath, whereby a user’s Microsoft Outlook mailbox is scanned, along with their calendar, to determine their ‘true network’ of contacts [12]. This company is operating in the business/professional networking sector, and is similar to sites such as LinkedIn or Plaxo). The fastest path to a seemingly disparate group of individuals is through a series of weak connections – precisely Stanley Milgram’s experimental observation; and ideas such as this would facilitate ways of automatically traversing these paths through any given network.

User derived content and participation are at the forefront of recent developments on the internet. This, the Web 2.0 movement, has already seen acquisitions of startup companies such as del.icio.us, flikr, reddit, MySpace and YouTube for sums reaching into the billions of dollars. Most Web 2.0 companies have an element of community about them, and oftentimes a valuation is not explicitly based on the merit, or utility, of the idea, but on the number of users a site has (although one would hope that the two would be closely correlated). As online communities become more prevalent, and advertising and marketing continue to drive revenues for user dominated services, the weak connections tenuously linking users together will only continue to grow in importance.