December 10, 2013

Charting Social Network Analysis Tools

Working on various open data projects over the last few years, I kept coming back to the idea that there should be an intermediate layer for analyzing and contextualizing data. Such a system would help researchers, journalists and other non-technical users extract knowledge from a variety of sources. It would give them an overview of how the people, institutions and companies that appear in virtually any political or economic data set relate to each other.

The Hacks/Hackers Media Party in Buenos Aires this year was an event designed to bring together developers and reporters to discuss what technology can do to enable better journalism. After a session on network analysis, where we explored the many existing and emerging tools and platforms, I started an ongoing collection of initiatives related to this topic.

The survey gives a short description of each project, distinguishes between a few basic types of effort (libraries and tools, web platforms, desktop software or data standards), and links out to the project pages and any available documentation. Since I'm interested in reusing these tools, I also introduced two columns to mark which tools are open source, and which content (i.e. the network data itself) is openly licensed.

What fascinates me most about this survey is that it reflects the range of objectives and methods that people have started to explore within the field of network analysis. From visualization tools such as sigma.js and Immersion to data-mining projects like Mapa76, and from commercial apps such as ConnectedChina to open standards such as the Popolo Project and PoderVocabulary, people are experimenting wildly. Some of the tools listed have little in common, while others are nearly exact copies of one another, written in a different programming language or using a different database backend.

Based on this survey, I see a few efforts that would benefit many of the approaches people are working on:

  • Converge on APIs for web tools. For web-based platforms, there are many reasons why each power mapping effort would want a bespoke interface to represent its data in an appropriate form. Yet many of these efforts could easily share a common API and back-end. The fact that this hasn't happened yet, even though many of the initiatives are clearly aware of each other, is most likely due to the immaturity of the existing projects.

  • Develop clearer interaction models. For most of the content-centric efforts in the list, it's not really clear what one would do with them. While sites like Poderopedia and QuienManda allow users to contribute further information about relationships and connections, there is little explanation as to when and how one would get information out of them.

  • Link people and organizations across different databases. Being able to identify a single person across multiple databases would be a useful way to support investigations and to work towards further forms of interoperability in the future. Unfortunately, few of the people and companies in such databases meet the notability criteria of Wikipedia (and DBpedia), or appear in other authority data.

  • Package platforms for news organizations. While embedded as an OpenNews fellow at Spiegel Online, I tried to set up a network mapping platform for the organization's fact-checking team. Unfortunately, doing this is no simple task, since most packages have a variety of hard-coded settings and do not lend themselves to easy redeployment. Given the sensitive nature of some of the information that may go into such a system, Software-as-a-Service doesn't seem like an appropriate solution to me.
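To make the point about linking people and organizations across databases a bit more concrete, here is a minimal sketch of the most naive approach: normalizing names and grouping records that collide. Everything in it is an assumption for illustration. The source names (`db_a`, `db_b`), the record IDs and the helper functions are hypothetical and not taken from any tool in the survey, and exact matching on a cleaned-up name will happily merge two different people who share a name, so the output should be treated as candidates for manual review rather than confirmed identities.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase a name, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks so that "José" and "Jose" compare equal.
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace anything that is not a letter or digit with a space.
    cleaned = "".join(
        ch if ch.isalnum() else " " for ch in without_accents.lower()
    )
    return " ".join(cleaned.split())


def link_records(*sources):
    """Return candidate links between sources.

    Each source is a (source_name, {record_id: display_name}) pair.
    The result maps a normalized name to the (source_name, record_id)
    pairs that share it, keeping only names seen in more than one source.
    """
    index = defaultdict(list)
    for source_name, records in sources:
        for record_id, display_name in records.items():
            index[normalize_name(display_name)].append((source_name, record_id))
    return {
        key: refs
        for key, refs in index.items()
        if len({source for source, _ in refs}) > 1
    }


# Hypothetical sample data standing in for two independent databases.
db_a = {"a1": "José  Pérez", "a2": "Acme S.A."}
db_b = {"b7": "jose perez", "b9": "Other Corp"}

candidates = link_records(("db_a", db_a), ("db_b", db_b))
for name, refs in candidates.items():
    print(name, refs)
```

In practice one would want additional signals (birth dates, company registration numbers, shared relationships) before treating two records as the same entity, which is exactly where shared identifiers or authority data would help.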

I'm sure that some of the tools in the survey will make big strides on these challenges in the next year, so this is a fantastic time to play with network analysis tools. As part of this, I'm hoping to keep the project listing up to date, so please feel free to add initiatives directly to the spreadsheet, and email the Knight Lab or me with any suggestions!