Building a Grateful Dead Original Song Co-Writing Network in R using the Statnet Package
Loading the dataset and creating the network to begin my analysis:
Inspecting the first 10 rows and 4 columns of the affiliation matrix, which has one row per songwriter (25) and one column per song (181):
dim(gd_matrix)
[1] 25 181
gd_matrix[1:10, 1:4]
Alabama Getaway Alice D Millionaire Alligator Althea
Eric Andersen 0 0 0 0
John Barlow 0 0 0 0
Bob Bralove 0 0 0 0
Andrew Charles 0 0 0 0
John Dawson 0 0 0 0
Willie Dixon 0 0 0 0
Jerry Garcia 1 1 0 1
Donna Godchaux 0 0 0 0
Keith Godchaux 0 0 0 0
Gerrit Graham 0 0 0 0
Now I can create the one-mode network from the bipartite projection. After converting the affiliation matrix to a square songwriter-by-songwriter adjacency matrix, I can look at the full matrix.
I can also look up the number of songs co-written by specific pairs of songwriters, such as writing partners Jerry Garcia and Robert Hunter (78) and John Barlow and Bob Weir (21).
dim(gd_projection)
[1] 25 25
gd_projection[1:10, 1:4]
Eric Andersen John Barlow Bob Bralove Andrew Charles
Eric Andersen 1 0 0 0
John Barlow 0 26 1 0
Bob Bralove 0 1 3 0
Andrew Charles 0 0 0 1
John Dawson 0 0 0 0
Willie Dixon 0 0 0 0
Jerry Garcia 0 0 0 0
Donna Godchaux 0 0 0 0
Keith Godchaux 0 0 0 0
Gerrit Graham 0 0 0 0
gd_projection["Jerry Garcia", "Robert Hunter"]
[1] 78
gd_projection["John Barlow", "Bob Weir"]
[1] 21
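The projection code itself isn't shown above. One common way to turn a songwriter-by-song affiliation matrix into a one-mode songwriter-by-songwriter matrix is to multiply it by its own transpose. The sketch below assumes that approach and uses a small hypothetical matrix (`aff`, `proj`) rather than the real data. In the result, each off-diagonal cell counts the songs a pair wrote together, which is how values like 78 for Garcia and Hunter arise. The diagonal holds each writer's total number of songs.

```r
# Hypothetical affiliation matrix: rows = songwriters, columns = songs,
# 1 = the writer is credited on that song
aff <- matrix(c(1, 1, 0, 1,
                1, 1, 1, 0,
                0, 0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Writer A", "Writer B", "Writer C"),
                              c("Song 1", "Song 2", "Song 3", "Song 4")))

# One-mode projection: cell [i, j] counts the songs shared by writers i and j
proj <- aff %*% t(aff)   # same result as tcrossprod(aff)
proj

# The diagonal is each writer's own song count
diag(proj)
```

For example, Writer A and Writer B share Songs 1 and 2, so `proj["Writer A", "Writer B"]` is 2.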
Coercing directly from the original affiliation matrix kept giving the error “Error: loops is FALSE, but x contains loops.”, even when I supplied the appropriate arguments. I also tried using the “intergraph” package to convert the network object created in igraph, but it cannot coerce bipartite igraph networks.
After the bipartite projection, I was able to create the statnet network object.
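set.seed(11)
# Co-writing ties are symmetric, so the network is undirected.
# as.network() ignores the matrix cell values by default (ignore.eval = TRUE),
# so the co-writing counts become unweighted ties.
gd_statnet <- as.network(gd_projection,
directed = FALSE,
bipartite = FALSE,
loops = FALSE,
connected = FALSE)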
Inspecting the features of the statnet network object:
print(gd_statnet)
Network attributes:
vertices = 25
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 65
missing edges= 0
non-missing edges= 65
Vertex attribute names:
vertex.names
No edge attributes
network::list.vertex.attributes(gd_statnet)
[1] "na" "vertex.names"
network::list.edge.attributes(gd_statnet)
[1] "na"
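The summary reports 65 edges and no edge attributes. As far as I know, `as.network()` ignores matrix cell values unless told otherwise (its `ignore.eval` argument defaults to `TRUE`). Any non-zero off-diagonal cell therefore becomes a single unweighted tie, and the co-writing counts are dropped. The sketch below mimics that on a hypothetical projection (`proj`, `adj`). In an undirected, loop-free network, the edge count is the number of non-zero cells above the diagonal.

```r
# Hypothetical projected co-writing counts for three writers (diagonal = song totals)
proj <- matrix(c(3, 2, 0,
                 2, 3, 1,
                 0, 1, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Keep only presence/absence of a tie, dropping the counts and the self-loops
adj <- (proj > 0) * 1
diag(adj) <- 0

# Undirected and loop-free: each tie appears exactly once in the upper triangle
n_edges <- sum(adj[upper.tri(adj)])
n_edges
```

Here A-B and B-C are tied and A-C is not, so `n_edges` is 2. The same count on `gd_projection` should give the 65 edges reported above.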
A first look at the basic network structure:
plot(gd_statnet)
Looking at the dyad and triad census, I have a total of 2300 triads, which is in line with the results I got in igraph as well.
sna::dyad.census(gd_statnet)
Mut Asym Null
[1,] 65 0 235
sna::triad.census(gd_statnet)
003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U
[1,] 1216 0 760 0 0 0 0 0 0 0 237 0 0
120C 210 300
[1,] 0 0 87
sum(triad.census(gd_statnet))
[1] 2300
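Both censuses are exhaustive, so their totals depend only on the number of nodes. An undirected network on n nodes has choose(n, 2) dyads and choose(n, 3) triads. Here is a quick arithmetic check against the census output above, using the 25 songwriters:

```r
n <- 25
n_dyads  <- choose(n, 2)   # 300 possible songwriter pairs
n_triads <- choose(n, 3)   # 2300 possible songwriter triples

# Dyad census above: 65 mutual + 0 asymmetric + 235 null
dyads_observed <- 65 + 0 + 235
# In an undirected graph only four triad types can occur: 003, 102, 201, 300
triads_observed <- 1216 + 760 + 237 + 87

c(dyads = dyads_observed == n_dyads, triads = triads_observed == n_triads)
```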
Next, the network transitivity is 0.5241, the same as the igraph network transitivity score.
gtrans(gd_statnet)
[1] 0.5240964
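For an undirected graph, this value can also be recovered from the triad census above. A "300" triad is a closed triangle, which contains three connected two-paths. A "201" triad is an open two-path. Transitivity is the share of two-paths that are closed. The sketch below simply plugs in the census counts:

```r
triangles     <- 87    # "300" triads from the census above
open_twopaths <- 237   # "201" triads from the census above

# Each triangle contributes three closed two-paths (one centred on each node)
transitivity <- (3 * triangles) / (3 * triangles + open_twopaths)
round(transitivity, 7)   # 0.5240964, matching gtrans()
```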
The mean geodesic (shortest-path) distance in statnet is 1.93, close to the igraph result of 2.01.
gd_gd <- geodist(gd_statnet,na.omit = TRUE, ignore.eval = TRUE, inf.replace = 0)
mean(gd_gd$gdist)
[1] 1.9296
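One likely source of the small gap is the diagonal. `geodist()` returns a full n × n distance matrix, so `mean(gd_gd$gdist)` also averages in the 25 zero distances from each songwriter to themselves. Excluding the diagonal should match the usual convention of averaging over pairs of distinct nodes. The sketch below shows the effect on a hypothetical three-node path (`d`), then applies the same correction to the reported 1.9296. On the real object, something like `mean(gd_gd$gdist[upper.tri(gd_gd$gdist)])` would be the equivalent, assuming `gdist` is a single 25 × 25 matrix.

```r
# Hypothetical geodesic distances for a path a - b - c
d <- matrix(c(0, 1, 2,
              1, 0, 1,
              2, 1, 0), nrow = 3, byrow = TRUE)
mean(d)                     # 8 / 9: includes the three zero self-distances
mean(d[row(d) != col(d)])   # 8 / 6: distinct pairs only

# Reported statnet figure: a mean over all 25 x 25 = 625 cells
n <- 25
total_distance <- 1.9296 * n^2         # about 1206
total_distance / (n * (n - 1))         # about 2.01 over the 600 ordered pairs
```

Since the network is a single component, no infinite distances were replaced by 0. The corrected figure then lines up with the igraph value of 2.01.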
There is one component in the statnet network object, as in igraph: all 25 songwriters are in the giant component, and there are no isolates.
components(gd_statnet)
[1] 1
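Network density is where the two packages diverge. statnet reports 0.2167, far from the igraph output of ~2.1, so I am not sure what is happening with this part of the calculation. The statnet figure is simply 65 ties divided by 300 possible songwriter pairs. Density cannot exceed 1 in a simple graph, so the igraph value most likely reflects multiple edges (for example, one per co-written song) or self-loops in the igraph object.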
network.density(gd_statnet)
[1] 0.2166667
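The statnet density is the number of ties divided by the number of possible undirected pairs, 65 / choose(25, 2). For contrast, the sketch below also computes a ratio that counts every co-written song as its own edge, using a hypothetical projection (`proj`). This only illustrates one way a figure above 1 can arise. I am assuming it resembles what happened in igraph, not confirming it.

```r
# Density of the statnet object: ties / possible undirected pairs
n <- 25
m <- 65
density_simple <- m / choose(n, 2)
round(density_simple, 7)   # 0.2166667

# Hypothetical projection where off-diagonal cells count co-written songs
proj <- matrix(c(3, 2, 1,
                 2, 3, 1,
                 1, 1, 2), nrow = 3, byrow = TRUE)
pairs      <- choose(nrow(proj), 2)
song_edges <- sum(proj[upper.tri(proj)])       # 2 + 1 + 1 = 4 "edges"
tie_edges  <- sum(proj[upper.tri(proj)] > 0)   # 3 distinct ties

# Counting one edge per song pushes the ratio above 1; one per tie cannot exceed 1
c(per_song = song_edges / pairs, per_tie = tie_edges / pairs)
```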
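Calculating total degree centrality, along with in-degree and out-degree centrality, gives values on a clearly different scale than igraph. By default, sna::degree() treats the graph as directed (gmode = "digraph"), so each undirected tie is counted both as an in-tie and as an out-tie. Every songwriter's total degree is therefore twice their in-degree (or out-degree), which equals their number of distinct co-writers.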
gd_stat_nodes <- data.frame(name=gd_statnet%v%"vertex.names",
totdegree=sna::degree(gd_statnet),
indegree=sna::degree(gd_statnet, cmode="indegree"),
outdegree=sna::degree(gd_statnet, cmode="outdegree"))
rescaled_degree <- sna::degree(gd_statnet, g=1, gmode="graph",
diag=FALSE, tmaxdev=FALSE,
cmode="freeman", rescale=TRUE)
gd_stat_nodes$rescaled <- rescaled_degree
#sort the nodes by total degree (arrange(), desc(), slice() and %>% come from dplyr)
arrange(gd_stat_nodes, desc(totdegree))%>%slice(1:5)
name totdegree indegree outdegree rescaled
1 Bob Weir 34 17 17 0.13076923
2 Phil Lesh 28 14 14 0.10769231
3 Robert Hunter 22 11 11 0.08461538
4 Jerry Garcia 20 10 10 0.07692308
5 Bill Kreutzmann 18 9 9 0.06923077
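The doubled totals and the rescaled column can be reproduced by hand. The sketch below uses a hypothetical three-node star (`A`) and assumes the sna conventions as I understand them. With the default gmode = "digraph", a symmetric tie is counted in both directions. With gmode = "graph", it is counted once. `rescale = TRUE` divides by the sum of all scores. On the real data, that sum is 2 × 65 = 130, so Bob Weir's rescaled score is 17 / 130.

```r
# Hypothetical undirected star: node 1 is tied to nodes 2 and 3
A <- matrix(c(0, 1, 1,
              1, 0, 0,
              1, 0, 0), nrow = 3, byrow = TRUE)

indegree  <- colSums(A)
outdegree <- rowSums(A)
total     <- indegree + outdegree        # 4 2 2: every tie counted twice
graph_deg <- rowSums(A)                  # 2 1 1: one count per tie
rescaled  <- graph_deg / sum(graph_deg)  # 0.50 0.25 0.25

# Bob Weir in the real network: 17 co-writers, 65 edges in total
round(17 / (2 * 65), 8)                  # 0.13076923
```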
The statnet total degree scores are, again, very different from igraph. In igraph, Jerry Garcia is the highest-degree node. That fits my hypothesis, given his position as the practical and figurative head of the band and the fact that he contributed to more songs than any other songwriter.
More surprising than Jerry Garcia's fourth place in total degree centrality, however, is that his songwriting partner Robert Hunter ranks above him, in third.
I can understand why Bob Weir has high centrality despite lower song counts, given his high eigenvector centrality and betweenness in the earlier igraph evaluations. However, I am surprised that he has the highest total degree, and even more surprised that Phil Lesh has the second highest. Both are co-founding members of the band and contributed to many of its songs, but not more than Jerry Garcia. Knowing the context and subject matter, it does not make sense to me that they rank higher than Garcia. One likely factor is that this statnet object is unweighted (no edge attributes were stored). Degree here therefore counts distinct co-writers rather than co-written songs. Garcia's output was concentrated with Hunter, while Weir and Lesh wrote with more different partners.
#calculate eigenvector centrality
eigen <- sna::evcent(gd_statnet, gmode="graph")
#add to nodes data frame
gd_stat_nodes$eigenvector <- eigen
gd_adjacency <- as.matrix(gd_statnet)
gd_adjacency_2 <- gd_adjacency %*% gd_adjacency
#calculate portion of reflected centrality
gd_reflective <- diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_reflective <- ifelse(is.nan(gd_reflective),0,gd_reflective)
#calculate derived centrality
gd_derived <- 1-diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_derived <- ifelse(is.nan(gd_derived),1,gd_derived)
#add to nodes data frame
gd_stat_nodes$eigen_derived <- gd_derived
gd_stat_nodes$eigen_reflective <- gd_reflective
#sort the nodes by eigenvector centrality
arrange(gd_stat_nodes, desc(eigenvector))%>%slice(1:5)
name totdegree indegree outdegree rescaled eigenvector
1 Bob Weir 34 17 17 0.13076923 0.4003206
2 Phil Lesh 28 14 14 0.10769231 0.3539098
3 Jerry Garcia 20 10 10 0.07692308 0.3305871
4 Robert Hunter 22 11 11 0.08461538 0.3233581
5 Bill Kreutzmann 18 9 9 0.06923077 0.3219131
eigen_derived eigen_reflective
1 0.8333333 0.1666667
2 0.8426966 0.1573034
3 0.8837209 0.1162791
4 0.8750000 0.1250000
5 0.8941176 0.1058824
My most immediate observation is that the highest-degree node in the igraph network, Jerry Garcia, did not have the highest eigenvector centrality, whereas in this network Bob Weir is highest in both degree and eigenvector centrality. The only change in the top five is that Jerry Garcia moves ahead of Robert Hunter, which makes sense.
The derived and reflective scores do not make sense to me; I'm not sure that the formula I used on the igraph network translates to statnet.
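One way to read these numbers: on an unweighted adjacency matrix, the diagonal of `A %*% A` is each node's degree. Each row sum of `A %*% A` is the sum of that node's neighbours' degrees. The reflected share computed above is therefore a node's degree divided by its neighbours' combined degree, and the derived share is the remainder. The check below uses a hypothetical four-node graph (`A`). The same reading fits the real output: Bob Weir's 0.1666667 corresponds to 17 / 102, and Phil Lesh's 0.1573034 to 14 / 89. The igraph version was presumably run on a different (possibly weighted) matrix, so the two sets of values need not agree.

```r
# Hypothetical undirected graph: triangle 1-2-3 plus a pendant tie 3-4
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
A2 <- A %*% A

# Same formulas as in the code above
reflective <- diag(A2) / rowSums(A2)
derived    <- 1 - reflective

# Equivalent form: own degree / sum of neighbours' degrees
deg           <- rowSums(A)              # 2 2 3 1
neighbour_deg <- as.vector(A %*% deg)    # 5 5 5 3
reflective                               # 0.4 0.4 0.6 0.333...
all.equal(reflective, deg / neighbour_deg)
```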
name totdegree indegree outdegree rescaled eigenvector
1 Bob Weir 34 17 17 0.13076923 0.4003206
2 Phil Lesh 28 14 14 0.10769231 0.3539098
3 Robert Hunter 22 11 11 0.08461538 0.3233581
4 Jerry Garcia 20 10 10 0.07692308 0.3305871
5 Bill Kreutzmann 18 9 9 0.06923077 0.3219131
eigen_derived eigen_reflective closeness
1 0.8333333 0.1666667 0.7741935
2 0.8426966 0.1573034 0.7058824
3 0.8750000 0.1250000 0.6486486
4 0.8837209 0.1162791 0.6315789
5 0.8941176 0.1058824 0.6153846
The closeness scores follow the same order as total degree, at least in the top five, and they are also in line with the igraph results.
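The closeness values are consistent with the standard definition: (n - 1) divided by the sum of a node's geodesic distances to everyone else. Bob Weir's 0.7741935, for instance, corresponds to 24 / 31. Below is a sketch on a hypothetical four-node path (`D`), with its distance matrix written out by hand:

```r
# Hypothetical geodesic distances on a four-node path a - b - c - d
D <- matrix(c(0, 1, 2, 3,
              1, 0, 1, 2,
              2, 1, 0, 1,
              3, 2, 1, 0), nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))
n <- nrow(D)

# Closeness: (n - 1) / sum of distances to all other nodes
closeness <- (n - 1) / rowSums(D)
closeness   # a = 0.50, b = 0.75, c = 0.75, d = 0.50

# Bob Weir in the real network: 24 other songwriters, distance sum of 31
round(24 / 31, 7)   # 0.7741935
```

The central nodes b and c reach everyone in fewer steps, so they score highest, just as the best-connected songwriters do here.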
#calculate betweenness
between <- sna::betweenness(gd_statnet, gmode="graph")
#add to nodes data frame
gd_stat_nodes$betweenness <- between
#sort the nodes by betweenness
arrange(gd_stat_nodes, desc(betweenness))%>%slice(1:5)
name totdegree indegree outdegree rescaled eigenvector
1 Bob Weir 34 17 17 0.13076923 0.4003206
2 Phil Lesh 28 14 14 0.10769231 0.3539098
3 Pigpen 16 8 8 0.06153846 0.2438925
4 Robert Hunter 22 11 11 0.08461538 0.3233581
5 Jerry Garcia 20 10 10 0.07692308 0.3305871
eigen_derived eigen_reflective closeness betweenness
1 0.8333333 0.1666667 0.7741935 111.73333
2 0.8426966 0.1573034 0.7058824 89.40000
3 0.8857143 0.1142857 0.6000000 44.20000
4 0.8750000 0.1250000 0.6486486 30.45000
5 0.8837209 0.1162791 0.6315789 13.23333
Again, in the igraph results the highest-degree node (Jerry Garcia) was not the node with the highest betweenness. In the statnet network, the highest-degree node, Bob Weir, also has the highest betweenness, and his score relative to Garcia's is in a similar ratio to the one in the igraph evaluation.
As in igraph, Pigpen jumps up in the betweenness rankings, likely because his contributions were primarily full-band compositions.
name totdegree indegree outdegree rescaled eigenvector
1 Donna Godchaux 12 6 6 0.046153846 0.23834221
2 Willie Dixon 4 2 2 0.015384615 0.06423801
3 Eric Andersen 2 1 1 0.007692308 0.04883644
4 Gerrit Graham 2 1 1 0.007692308 0.04883644
5 Joe Royster 4 2 2 0.015384615 0.03388727
eigen_derived eigen_reflective closeness betweenness bonacich
1 0.9062500 0.09375000 0.5454545 0 0.29459882
2 0.9090909 0.09090909 0.4528302 0 0.26898153
3 0.9411765 0.05882353 0.4444444 0 0.02561729
4 0.9411765 0.05882353 0.4444444 0 0.02561729
5 0.8000000 0.20000000 0.3870968 0 -0.07044754
write.csv(gd_stat_nodes, file = "gd_stat_nodes.csv")
# sna:: prefixes avoid clashes with igraph functions of the same name
gd_statnet %v% "degree" <- sna::degree(gd_statnet) # Degree centrality
gd_statnet %v% "eigenvector" <- sna::evcent(gd_statnet) # Eigenvector centrality
gd_statnet %v% "closeness" <- sna::closeness(gd_statnet) # Closeness centrality
gd_statnet %v% "betweenness" <- sna::betweenness(gd_statnet) # Vertex betweenness centrality
gd_statnet %v% "bonacich" <- sna::bonpow(gd_statnet) # Bonacich power
term closeness totdegree eigenvector betweenness bonacich
1 closeness .97 .96 .79 -.36
2 totdegree .97 .95 .83 -.32
3 eigenvector .96 .95 .65 -.28
4 betweenness .79 .83 .65 -.25
5 bonacich -.36 -.32 -.28 -.25
For attribution, please cite this work as
Becvar (2022, April 21). Grateful Network: Grateful Network Creation: Statnet. Retrieved from http://gratefulnetwork.live/posts/gd-network-creation-statnet/
BibTeX citation
@misc{becvar2022grateful,
author = {Becvar, Kristina},
title = {Grateful Network: Grateful Network Creation: Statnet},
url = {http://gratefulnetwork.live/posts/gd-network-creation-statnet/},
year = {2022}
}