Building a Grateful Dead Original Song Co-Writing Network in R Using the igraph Package
For my project, I am using the Grateful Dead song writers data set that I used in this series of posts from my Social & Political Networks course to examine the network features of the co-writers of original Grateful Dead songs.
The data set consists of the links between co-writers of songs played by the Grateful Dead over their 30-year touring career that I compiled.
There are 25 songwriters that contributed to the songs played over the course of the Grateful Dead history, resulting in 25 nodes in the dataset.
There are a total of 181 (updated and still under review!) unique songs played in the course of their touring career, and the various combinations of co-writing combinations are now represented in a binary affiliation matrix.
I have considered using various measures as network weights, but in the end they have all been attributes and not weights. Unless there is a new metric that rises to the forefront of my analysis, this network will continue to begin analysis as an un-weighted, bipartite matrix.
Loading the dataset and creating the network to begin my analysis:
Inspecting the first 8 columns of the data structure in the affiliation matrix format:
dim(gd_matrix)
[1] 25 181
gd_matrix[1:10, 1:4]
Alabama Getaway Alice D Millionaire Alligator Althea
Eric Andersen 0 0 0 0
John Barlow 0 0 0 0
Bob Bralove 0 0 0 0
Andrew Charles 0 0 0 0
John Dawson 0 0 0 0
Willie Dixon 0 0 0 0
Jerry Garcia 1 1 0 1
Donna Godchaux 0 0 0 0
Keith Godchaux 0 0 0 0
Gerrit Graham 0 0 0 0
Now I can create the single mode network and examine the bipartite projection. After converting the matrix to a square adjacency matrix, I can look at the full matrix.
I can also call the adjacency matrix count for co-writing incidences between certain songwriters, such as between writing partners Jerry Garcia and Robert Hunter (78) and between John Barlow and Bob Weir (21).
[1] 25 25
gd_projection[1:10, 1:4]
Eric Andersen John Barlow Bob Bralove Andrew Charles
Eric Andersen 1 0 0 0
John Barlow 0 26 1 0
Bob Bralove 0 1 3 0
Andrew Charles 0 0 0 1
John Dawson 0 0 0 0
Willie Dixon 0 0 0 0
Jerry Garcia 0 0 0 0
Donna Godchaux 0 0 0 0
Keith Godchaux 0 0 0 0
Gerrit Graham 0 0 0 0
gd_projection["Jerry Garcia", "Robert Hunter"]
[1] 78
gd_projection["John Barlow", "Bob Weir"]
[1] 21
Converting network data into igraph object using the “graph_from_incidence_matrix()” function gave me all songwriters and songs as total vertices. Graphing after the bipartite projection allowed a more accurate network.
set.seed(11)
gd_igraph_from_im <- graph_from_incidence_matrix(gd_affiliation, directed = FALSE)
Converting network data into igraph object using the “graph_from_incidence_matrix()” function gave me all songwriters and songs as total vertices. Graphing after the bipartite projection allowed a more accurate network.
set.seed(11)
gd_igraph_adj <- graph.adjacency(gd_projection,mode="undirected")
Now to check the vertices in the graph I’ve created to ensure they represent the data accurately, and confirm that all of the attributes have been represented properly (the graph is undirected, unweighted, and is bipartite):
head(V(gd_igraph_from_im)$name)
[1] "Eric Andersen" "John Barlow" "Bob Bralove"
[4] "Andrew Charles" "John Dawson" "Willie Dixon"
head(V(gd_igraph_from_im)$type)
[1] FALSE FALSE FALSE FALSE FALSE FALSE
is_directed(gd_igraph_from_im)
[1] FALSE
is_weighted(gd_igraph_from_im)
[1] FALSE
is_bipartite(gd_igraph_from_im)
[1] TRUE
igraph::vertex_attr_names(gd_igraph_from_im)
[1] "type" "name"
igraph::edge_attr_names(gd_igraph_from_im)
character(0)
Graphing directly from the incidence matrix gives a bipartite network, but when visualizing the network it is not clear if this is the way I want to represent this data.
plot(gd_igraph_from_im, layout=layout.bipartite)
head(V(gd_igraph_adj)$name)
[1] "Eric Andersen" "John Barlow" "Bob Bralove"
[4] "Andrew Charles" "John Dawson" "Willie Dixon"
head(V(gd_igraph_adj)$type)
NULL
is_directed(gd_igraph_adj)
[1] FALSE
is_weighted(gd_igraph_adj)
[1] FALSE
is_bipartite(gd_igraph_adj)
[1] FALSE
igraph::vertex_attr_names(gd_igraph_adj)
[1] "name"
igraph::edge_attr_names(gd_igraph_adj)
character(0)
Graphing from the bipartite projection and using the graph from adjacency matrix function, I have a network of the songwriters only, but I need to do more digging to see if this is the best way to represent the data.
plot(gd_igraph_adj)
Knowing this network has 25 vertices, I want to see if the triad census is working correctly by comparing the following data:
#possible triads in network
25*24*23/6
[1] 2300
sum(igraph::triad.census(gd_igraph_from_im))
[1] 1435820
The igraph created from the incidence matrix is NOT representing the triad census properly.
#possible triads in network
25*24*23/6
[1] 2300
sum(igraph::triad.census(gd_igraph_adj))
[1] 2300
The igraph network created from the adjacency matrix is representing the triad census is working the way it should be.
Looking next at the global v. average local transitivity of the network:
#get global clustering cofficient: igraph
transitivity(gd_igraph_from_im, type="global")
[1] 0
#get average local clustering coefficient: igraph
transitivity(gd_igraph_from_im, type="average")
[1] 0
This is another good sign that the correct choice is to graph from the adjacency matrix throug the bipartite projection going forward.
#get global clustering cofficient: igraph
transitivity(gd_igraph_adj, type="global")
[1] 0.5240964
#get average local clustering coefficient: igraph
transitivity(gd_igraph_adj, type="average")
[1] 0.7755587
This transitivity tells me that the average network transitivity is significantly higher than the global transitivity, indicating, from my still naive network knowledge, that the overall network is generally more loose, and that there is a more connected sub-network.
Looking at the geodesic distance tells me that on average, the path length is just over 2.
average.path.length(gd_igraph_adj,directed=F)
[1] 2.01
Getting a look at the components of the network shows that there are 2 components in the network, and 25 of the 26 nodes make up the giant component with 1 isolate.
names(igraph::components(gd_igraph_adj))
[1] "membership" "csize" "no"
igraph::components(gd_igraph_adj)$no
[1] 1
igraph::components(gd_igraph_adj)$csize
[1] 25
This is a great start - now I can get to looking at the network density, centrality, and centralization.
The network density measure: First with just the call “graph.density” and then with adding “loops=TRUE”. Since I’m using igraph, I know that its’ default output assumes that loops are not included but does not remove them, which can be corrected with the addition of “loops=TRUE” per the course tutorials when comparing output to statnet. This gives me confidence that my network density is closer to 2.26.
graph.density(gd_igraph_adj)
[1] 2.453333
graph.density(gd_igraph_adj, loops=TRUE)
[1] 2.264615
The network degree measure: This gives me a clear output showing the degree of each particular node (songwriter). I will also begin creating a dataframe for easier review going forward.
It is not surprising, knowing my subject matter, that Jerry Garcia is the highest degree node in this network as the practical and figurative head of the band. The other band members’ degree measures are not necessarily what I expected, though. I did not anticipate that his songwriting partner and key collaborator, Robert Hunter, would have a higher degree than band members Phil Lesh and Bob Weir. Further, I did not anticipate that the degree measure of band member ‘Pigpen’ would be so high given his early death in the first years of the band’s touring life. Specifically, I’m surprised that Pigpen has a higher degree than John Barlow, who was the key collaborator and writing partner of Bob Weir. This tells me that the proportion of songs that Weir wrote outside of the songs written with Barlow were much more dramatic than the proportion of songs Garcia wrote outside the songs written with Hunter.
The original lineup of Jerry Garcia, Bob Weir, Phil Lesh, Bill Kreutzmann, and Pigpen as well as Robert Hunter’s presence in the formative years of the band’s most collaborative era, means that this degree ranking makes sense intuitively.
gd_ig_nodes<-data.frame(name=V(gd_igraph_adj)$name, degree=igraph::degree(gd_igraph_adj, loops = FALSE))
gd_ig_nodes$normalized <-igraph::degree(gd_igraph_adj, loops = FALSE, normalized = TRUE)
gd_ig_nodes%>%
arrange(desc(degree))
name degree normalized
Jerry Garcia Jerry Garcia 142 5.91666667
Robert Hunter Robert Hunter 117 4.87500000
Bob Weir Bob Weir 105 4.37500000
Phil Lesh Phil Lesh 81 3.37500000
Bill Kreutzmann Bill Kreutzmann 62 2.58333333
Pigpen Pigpen 51 2.12500000
John Barlow John Barlow 29 1.20833333
Mickey Hart Mickey Hart 18 0.75000000
Brent Mydland Brent Mydland 11 0.45833333
Keith Godchaux Keith Godchaux 10 0.41666667
Bob Bralove Bob Bralove 8 0.33333333
Vince Welnick Vince Welnick 7 0.29166667
Donna Godchaux Donna Godchaux 6 0.25000000
Rob Wasserman Rob Wasserman 6 0.25000000
Dave Parker Dave Parker 5 0.20833333
Robert Petersen Robert Petersen 5 0.20833333
John Dawson John Dawson 2 0.08333333
Willie Dixon Willie Dixon 2 0.08333333
Frank Guida Frank Guida 2 0.08333333
Joe Royster Joe Royster 2 0.08333333
Eric Andersen Eric Andersen 1 0.04166667
Andrew Charles Andrew Charles 1 0.04166667
Gerrit Graham Gerrit Graham 1 0.04166667
Ned Lagin Ned Lagin 1 0.04166667
Peter Monk Peter Monk 1 0.04166667
A quick look at the summary statistics confirms for me the minimum, maximum, median, and mean node degree data.
summary(gd_ig_nodes)
name degree normalized
Length:25 Min. : 1.00 Min. :0.04167
Class :character 1st Qu.: 2.00 1st Qu.:0.08333
Mode :character Median : 6.00 Median :0.25000
Mean : 27.04 Mean :1.12667
3rd Qu.: 29.00 3rd Qu.:1.20833
Max. :142.00 Max. :5.91667
Now I want to take a step back and try to visually represent this data better.
#ggraph(gd_igraph_adj, layout = "fr") +
#geom_edge_link() +
#geom_node_point(aes(color = factor(color))) +
#geom_node_text(aes(label = name), repel = TRUE) +
#theme_void() +
#theme(legend.position = "none")
That is starting to look more meaningful!
# Set size to degree centrality
#V(gd_igraph_adj)$size = degree=igraph::degree(gd_igraph_adj)
#Additional customisation for better legibility
#ggraph(gd_igraph_adj, layout = "fr") +
#geom_edge_arc(strength = 0.2, width = 0.5, alpha = 0.15) +
#geom_node_point(aes(size = size, color = factor(color))) +
#geom_node_text(aes(label = name, size = size), repel = TRUE) +
#theme_void() +
#theme(legend.position = "none")
To examine additional centrality and power scores of the nodes, I added to the data frame with the centrality degree and normalized centrality. Now I am adding Bonacich power, rescaled Bonacich power, Eigenvector centrality scores and the breakdown of reflected and derived centrality scores. Additionally, I am adding the closeness, betweenness, and Burt centrality scores.
To calculate the reflected and derived centrality scores, I first run some operations on the adjacency matrix and keep in mind that these two scores make up the entire calculation of the Eigenvector centrality score.
#gd_adjacency <- as.matrix(as_adjacency_matrix(gd_igraph_adj))
#gd_adjacency_2 <- gd_adjacency %*% gd_adjacency
#calculate Bonacich power
#bp_ig1 <- power_centrality(gd_igraph_adj) #with a default index of "1"
#bp_ig2 <- power_centrality(gd_igraph_adj, rescale = TRUE) #rescaled so they sum to "1"
#calculate portion of reflected centrality
#gd_reflective <- diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
#gd_reflective <- ifelse(is.nan(gd_reflective),0,gd_reflective)
#calculate derived centrality
#gd_derived <- 1-diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
#gd_derived <- ifelse(is.nan(gd_derived),1,gd_derived)
#calculate closeness centrality: igraph
#close <- igraph::closeness(gd_igraph_adj)
#calculate betweenness centrality: igraph
#between <- igraph::betweenness(gd_igraph_adj, directed=FALSE)
#calculate Burt's network constraint
#constraint <- constraint(gd_igraph_adj)
#add these values to the data frame I started
#gd_ig_nodes$eigenvector <- eigen
#gd_ig_nodes$eigen_derived <- gd_derived
#gd_ig_nodes$eigen_reflective <- gd_reflective
#gd_ig_nodes$betweenness <- between
#gd_ig_nodes$closeness <- close
#gd_ig_nodes$bonacich <- bp_ig1
#gd_ig_nodes$bonacich_rescaled <- bp_ig2
#gd_ig_nodes$constraint <- constraint
#options(scipen = 999)
gd_ig_nodes<-read.csv("gd_ig_nodes.csv")
head(gd_ig_nodes[2:7])
name degree normalized eigenvector eigen_derived
1 Eric Andersen 1 0.04166667 0.04883644 0.9875776
2 John Barlow 29 1.20833333 0.07763512 0.7648126
3 Bob Bralove 8 0.33333333 0.12770600 0.9733796
4 Andrew Charles 1 0.04166667 0.04317463 0.9829060
5 John Dawson 2 0.08333333 0.07977694 0.9933775
6 Willie Dixon 2 0.08333333 0.06423801 0.9823529
eigen_reflective
1 0.012422360
2 0.235187424
3 0.026620370
4 0.017094017
5 0.006622517
6 0.017647059
I am also interested in the Eigenvector centrality scores - Both the top as well as the lowest value scores.
gd_ig_nodes%>%
arrange(desc(eigenvector))%>%
slice(1:5)
X name degree normalized eigenvector
1 Bob Weir Bob Weir 105 4.375000 0.4003206
2 Phil Lesh Phil Lesh 81 3.375000 0.3539098
3 Jerry Garcia Jerry Garcia 142 5.916667 0.3305871
4 Robert Hunter Robert Hunter 117 4.875000 0.3233581
5 Bill Kreutzmann Bill Kreutzmann 62 2.583333 0.3219131
eigen_derived eigen_reflective betweenness closeness bonacich
1 0.7826043 0.2173957 121.659478 0.03225806 -0.5325373
2 0.8536971 0.1463029 90.396640 0.02941176 -0.1771572
3 0.6538547 0.3461453 16.584364 0.02631579 -0.2501870
4 0.6286725 0.3713275 24.106816 0.02702703 -0.1701447
5 0.8947211 0.1052789 3.132042 0.02564103 -0.6875388
bonacich_rescaled constraint
1 -0.10060043 0.3367355
2 -0.03346637 0.4521996
3 -0.04726227 0.5061908
4 -0.03214166 0.6332636
5 -0.12988143 0.5159787
Bob Weir having top overall Eigenvector centrality makes sense - he is a core founding member and a prolific songwriter. He did not write as many songs as Jerry Garcia or Robert Hunter, but he wrote with a larger variety of people.
The top derived Eigenvector centrality scores are songwriters who contributed to songs but were outside the band member or key collaborator circle, which again makes sense.
Robert Hunter having the top reflective Eigenvector centrality score is not a shock - he has long held the unofficial title of band member and as the person behind the songwriting magic of the Grateful Dead. His primary songwriting partner was Jerry Garcia, but he also wrote songs with the early, full band and later with almost all of the individual members of the band.
The closeness centrality of a node is defined as the sum of the geodesic distances between that node and all other nodes in a network.
gd_ig_nodes%>%
arrange(desc(closeness))%>%
slice(1:5)
X name degree normalized eigenvector
1 Bob Weir Bob Weir 105 4.375000 0.4003206
2 Phil Lesh Phil Lesh 81 3.375000 0.3539098
3 Robert Hunter Robert Hunter 117 4.875000 0.3233581
4 Jerry Garcia Jerry Garcia 142 5.916667 0.3305871
5 Bill Kreutzmann Bill Kreutzmann 62 2.583333 0.3219131
eigen_derived eigen_reflective betweenness closeness bonacich
1 0.7826043 0.2173957 121.659478 0.03225806 -0.5325373
2 0.8536971 0.1463029 90.396640 0.02941176 -0.1771572
3 0.6286725 0.3713275 24.106816 0.02702703 -0.1701447
4 0.6538547 0.3461453 16.584364 0.02631579 -0.2501870
5 0.8947211 0.1052789 3.132042 0.02564103 -0.6875388
bonacich_rescaled constraint
1 -0.10060043 0.3367355
2 -0.03346637 0.4521996
3 -0.03214166 0.6332636
4 -0.04726227 0.5061908
5 -0.12988143 0.5159787
This evaluation is more difficult as the range is made up of much smaller scaled scores, but seem to be somewhat in line with the Eigenvector centrality scores.
In addition to node-level centrality scores, I also want to calculate the network level centralization index for closeness centrality measures. Here, I get a network level closeness measure of 0.552.
#calculate closeness centralization index: igraph
centr_clo(gd_igraph_adj)$centralization
[1] 0.5519071
Betweenness represents the number of geodesics on which a node sits.
Now I want to take the closeness and betweenness to my centrality data frame and first, sort by and take a look at the nodes with the highest betweenness:
gd_ig_nodes%>%
arrange(desc(betweenness))%>%
slice(1:5)
X name degree normalized eigenvector
1 Bob Weir Bob Weir 105 4.375000 0.4003206
2 Phil Lesh Phil Lesh 81 3.375000 0.3539098
3 Pigpen Pigpen 51 2.125000 0.2438925
4 Robert Hunter Robert Hunter 117 4.875000 0.3233581
5 Jerry Garcia Jerry Garcia 142 5.916667 0.3305871
eigen_derived eigen_reflective betweenness closeness bonacich
1 0.7826043 0.2173957 121.65948 0.03225806 -0.5325373
2 0.8536971 0.1463029 90.39664 0.02941176 -0.1771572
3 0.8868967 0.1131033 44.02857 0.02500000 -0.5155271
4 0.6286725 0.3713275 24.10682 0.02702703 -0.1701447
5 0.6538547 0.3461453 16.58436 0.02631579 -0.2501870
bonacich_rescaled constraint
1 -0.10060043 0.3367355
2 -0.03346637 0.4521996
3 -0.09738708 0.5404552
4 -0.03214166 0.6332636
5 -0.04726227 0.5061908
The most immediate observations I have is that the highest degree node (Jerry Garcia) is not the node with the highest scoring betweenness. That goes to Bob Weir, who is still a relatively high degree node, but significantly lower than Jerry Garcia given that his betweenness score is so much higher (~121 compared to Garcia’s ~16).
I can make a guess that the two highest degree nodes, Jerry Garcia and Robert Hunter, having relatively low betweenness scores can be linked to the fact that the two wrote mostly together. Although the pair wrote the most songs in the originals catalog, Bob Weir wrote many songs with a variety of other songwriters, as suspected by the results in the degree scores. This would give him a higher level of betweenness.
Similarly, Phil Lesh and Pigpen, original band members who wrote relatively fewer songs, contributed to more songs that were written by the entire band, giving them more exposure to connections on the songs that they did write.
Bonacich power is a bit complicated but I have taken away the concept that it measures how powerful a node is based on the relative strength or weakness of their alters.
The lower, negative Bonacich power values imply that nodes become more powerful as their alters become weaker; positive Bonacich power values imply that nodes becomemore powerful as their alters become more powerful.
The cooperative songwriting network is likely not going to be a good example of this power compared to examining cooperative v. antagonistic relationships.
gd_ig_nodes%>%
arrange(desc(bonacich))%>%
slice(1:5)
X name degree normalized eigenvector
1 Frank Guida Frank Guida 2 0.08333333 0.03388727
2 Joe Royster Joe Royster 2 0.08333333 0.03388727
3 Donna Godchaux Donna Godchaux 6 0.25000000 0.23834221
4 Keith Godchaux Keith Godchaux 10 0.41666667 0.27350064
5 Robert Petersen Robert Petersen 5 0.20833333 0.05639403
eigen_derived eigen_reflective betweenness closeness bonacich
1 0.9620253 0.03797468 0.000000000 0.01612903 3.010938
2 0.9620253 0.03797468 0.000000000 0.01612903 3.010938
3 0.9771689 0.02283105 0.000000000 0.02272727 1.206495
4 0.9820014 0.01799856 0.009345794 0.02325581 1.157009
5 0.9367816 0.06321839 0.000000000 0.01818182 1.096478
bonacich_rescaled constraint
1 0.5687896 0.8224000
2 0.5687896 0.8224000
3 0.2279163 0.4514219
4 0.2185680 0.5143887
5 0.2071332 0.7134697
Constraint is a measure of the redundancy of a node’s connections. It is bound between 0 and 1, with 0 being a complete lack of restraint, and 1 being complete redundancy.
gd_ig_nodes %>%
select(name,constraint)
name constraint
1 Eric Andersen 1.0000000
2 John Barlow 0.6706222
3 Bob Bralove 0.4989170
4 Andrew Charles 1.0000000
5 John Dawson 1.2945238
6 Willie Dixon 0.7040590
7 Jerry Garcia 0.5061908
8 Donna Godchaux 0.4514219
9 Keith Godchaux 0.5143887
10 Gerrit Graham 1.0000000
11 Frank Guida 0.8224000
12 Mickey Hart 0.5294014
13 Robert Hunter 0.6332636
14 Bill Kreutzmann 0.5159787
15 Ned Lagin 1.0000000
16 Phil Lesh 0.4521996
17 Peter Monk 1.0000000
18 Brent Mydland 0.9325133
19 Dave Parker 0.5591083
20 Robert Petersen 0.7134697
21 Pigpen 0.5404552
22 Joe Royster 0.8224000
23 Rob Wasserman 0.4756234
24 Bob Weir 0.3367355
25 Vince Welnick 0.5216319
Similarly to the look at derived Eigenvector centrality, the highest network constraint scores go to songwriters that were relatively isolated in their contributions. However, beyond that, the results are very diverse in how they fall. The least constrained node, Bob Weir, is also the highest Eigenvector value as well as highest in betweenness. But second least constrained - Donna Godchaux - contributed to few songs, but most of them full-band collaborations, does indicate to me more of what this measure represents.
For attribution, please cite this work as
Becvar (2022, April 20). Grateful Network: Grateful Network Creation: igraph. Retrieved from http://gratefulnetwork.live/posts/gd-network-creation/
BibTeX citation
@misc{becvar2022grateful,
author = {Becvar, Kristina},
title = {Grateful Network: Grateful Network Creation: igraph},
url = {http://gratefulnetwork.live/posts/gd-network-creation/},
year = {2022}
}