Blog IV - It's an RDBMS... It's a NoSQL... It's a Graph Database!

In Module 3, the last module, we learned about networks, including network visualization and analysis. Networks are defined as "A collection of entities (vertices) and relationships (edges) among them." These structures are also commonly called graphs. The really cool thing about these networks is that both the entities and the relationships can hold their own data to help us learn. A social example might be a friend network from a popular platform like Facebook. Each friend entity might store information from a profile, such as name, date of birth, and interests. The "Friends With" relationship could then hold the date the friendship was made and whether it's a family relationship like father, sibling, aunt, etc.
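To make that concrete, here's a minimal sketch in plain Python of a friend network where both the entities and the relationships carry their own data. Everything in it is made up for illustration: the people, the dates, the field names, and the helper functions friends_of and relationship are all my own assumptions, not something from the lectures.

```python
# Hypothetical data: every name, date, and field below is invented for illustration.

# Entities (vertices): each person stores profile-style attributes.
people = {
    "alice": {"name": "Alice", "date_of_birth": "1990-04-12", "interests": ["hiking", "chess"]},
    "bob": {"name": "Bob", "date_of_birth": "1988-11-30", "interests": ["cooking"]},
    "carol": {"name": "Carol", "date_of_birth": "1993-02-07", "interests": ["chess", "films"]},
}

# Relationships (edges): keyed by an unordered pair, because "Friends With" is mutual.
# Each edge stores its own attributes, separate from the people it connects.
friends_with = {
    frozenset({"alice", "bob"}): {"since": "2015-06-01", "family": None},
    frozenset({"alice", "carol"}): {"since": "2010-01-15", "family": "sibling"},
}


def friends_of(person_id):
    """Return the sorted ids of everyone sharing a Friends With edge with person_id."""
    result = []
    for pair in friends_with:
        if person_id in pair:
            (other,) = pair - {person_id}  # the other end of the edge
            result.append(other)
    return sorted(result)


def relationship(a, b):
    """Return the edge attributes between a and b, or None if they are not friends."""
    return friends_with.get(frozenset({a, b}))


print(friends_of("alice"))
print(relationship("carol", "alice"))
```

The point is just that an edge is a first-class thing with its own record, which is what separates this from a plain list of friends.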

When visualizing these networks there are several different layouts to choose from. They include Force Directed, Geographical, Circular, Clustering, and Hierarchical. Each of these serves its own purpose and has its own benefits. For example, Force Directed tends to minimize node collisions and is easy to read, but if we're concerned with the sub-communities within the network we'd need to use Clustering. I shared this example of a network with our class; it shows the relationships between different news outlets as they quote each other. This particular network uses clustering, as the different colors represent different groups of outlets.

Doing network analysis is where the lectures got really muddy for me. There are many different metrics and measures we can generate for a network. Then these different numbers are used to try and get answers! The different metrics we covered are degree centrality, paths and shortest paths, betweenness centrality, closeness centrality, eigenvector centrality, reciprocity, density, clustering coefficient, average and longest distances, connected components, bridges, and cliques.
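Working a few of these out by hand on a tiny network helped me. Below is a rough sketch in plain Python covering three of the metrics from the list: degree centrality, density, and shortest path length. The five-node graph and the function names are my own invention, and the formulas are the standard ones for an undirected graph with no self-loops (degree divided by n - 1, and edges divided by n(n - 1)/2).

```python
from collections import deque

# Hypothetical undirected network stored as an adjacency dict.
# Edges: A-B, A-C, B-C, C-D, D-E
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}


def degree_centrality(g):
    """Share of the other nodes each node is directly connected to."""
    n = len(g)
    return {node: len(neighbors) / (n - 1) for node, neighbors in g.items()}


def density(g):
    """Actual edges divided by the maximum possible number of edges."""
    n = len(g)
    edge_count = sum(len(neighbors) for neighbors in g.values()) / 2  # each edge is counted twice
    return edge_count / (n * (n - 1) / 2)


def shortest_path_length(g, start, goal):
    """Number of edges on a shortest path, found with breadth-first search."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in g[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None  # goal is not reachable from start


print(degree_centrality(graph)["C"])         # 3 of the 4 other nodes -> 0.75
print(density(graph))                        # 5 of 10 possible edges -> 0.5
print(shortest_path_length(graph, "A", "E")) # A-C-D-E -> 3
```

In this toy graph, A, B, and C form a clique, and the C-D and D-E edges are bridges, since removing either one disconnects the network.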

I think it's pretty amazing what we can do with networks. One of our required readings was an interview with Albert-László Barabási, and in it he talks about some of the different projects he works on. These involve trying to control networks, biological assistance (with an emphasis on disease), and social systems and human mobility. The last one is really interesting, as he talks about the data captured from cell phones and what we can learn from it. All in all, this type of technology has a huge impact on business decisions. We no longer need to rely on surveys alone to learn about the consumer when we have impartial data to learn from.

To keep with the theme, I wanted to approach this topic from a data engineering perspective. So what I wanted to answer was "How do we store network data?" I had heard of graph databases before but had never really explored what they were. I read quite a few articles about the different types of graph databases, but I found this one by Graham Cox to be the best. In this Introduction to Graph Databases article, Cox goes over an example of a social network and shows how it would be stored in an RDBMS (e.g. Oracle), a Document Store (e.g. MongoDB), and a Graph Database (e.g. Neo4j). Not only does he show an example of how the data is stored, but also how the query/code would look to retrieve the information. I will say, though, that the Cypher language is pretty hard to look at and figure out for someone not familiar with anything similar to it.
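To get a feel for the difference myself, I put together a small sketch. This is not Cox's example: the tables, names, and functions below are my own assumptions, and I use Python's built-in sqlite3 module as a stand-in for an RDBMS and a plain adjacency dict as a stand-in for the way a graph store lets you hop from node to node. Both versions answer the same question: who are a person's friends of friends?

```python
import sqlite3

# --- Relational version: people and friendships live in separate tables. ---
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE friendship (
    person_id INTEGER NOT NULL REFERENCES person(id),
    friend_id INTEGER NOT NULL REFERENCES person(id),
    PRIMARY KEY (person_id, friend_id)
);
""")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave")],  # hypothetical people
)
pairs = [(1, 2), (2, 3), (3, 4)]
# Store each friendship in both directions so lookups work from either side.
conn.executemany("INSERT INTO friendship VALUES (?, ?)", pairs + [(b, a) for a, b in pairs])


def friends_of_friends_sql(name):
    """Each hop through the network costs another join on the friendship table."""
    rows = conn.execute(
        """
        SELECT DISTINCT p3.name
        FROM person p1
        JOIN friendship f1 ON f1.person_id = p1.id
        JOIN friendship f2 ON f2.person_id = f1.friend_id
        JOIN person p3 ON p3.id = f2.friend_id
        WHERE p1.name = ?
          AND p3.id != p1.id
          AND p3.id NOT IN (SELECT friend_id FROM friendship WHERE person_id = p1.id)
        ORDER BY p3.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


# --- Graph-style version: each node points straight at its neighbors. ---
adjacency = {
    "Alice": {"Bob"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Bob", "Dave"},
    "Dave": {"Carol"},
}


def friends_of_friends_graph(name):
    """Walk two hops out, then drop the person and their direct friends."""
    direct = adjacency[name]
    two_hops = set()
    for friend in direct:
        two_hops |= adjacency[friend]
    return sorted(two_hops - direct - {name})


print(friends_of_friends_sql("Alice"))
print(friends_of_friends_graph("Alice"))
```

Both functions return the same people. The thing I noticed is that the SQL version needs another self-join for every extra hop, while the traversal version just follows one more set of neighbors, which I take to be the main selling point of a graph database for this kind of data.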

Comments

  1. Thanks for explaining the different layouts! That's one area where I got lost. Graham Cox's article looks interesting. I will definitely read it.

  2. Wow.. great topic! I was just thinking about how graphs can be used in other industries. You do make a good point that stability and warehousing this data is also important. I wonder if graph visualizations can be applied to concepts beyond social media.. Maybe healthcare, to mine correlations between diseases and outcomes.. Just a thought?

  3. Amazing article! Thanks for sharing!
