Grateful Network Creation: igraph

network creation network analytics network visualization igraph

Building a Grateful Dead Original Song Co-Writing Network in R Using the igraph Package

Kristina Becvar http://gratefulnetwork.live/ (UMass DACSS Program (My Academic Blog Link))https://kristinabecvar.com
04-20-2022

Network Details

Background

For my project, I am using the Grateful Dead song writers data set that I used in this series of posts from my Social & Political Networks course to examine the network features of the co-writers of original Grateful Dead songs.

The data set consists of the links between co-writers of songs played by the Grateful Dead over their 30-year touring career that I compiled.

There are 25 songwriters that contributed to the songs played over the course of the Grateful Dead history, resulting in 25 nodes in the dataset.

There are a total of 181 (updated and still under review!) unique songs played in the course of their touring career, and the various combinations of co-writing combinations are now represented in a binary affiliation matrix.

I have considered using various measures as network weights, but in the end they have all been attributes and not weights. Unless there is a new metric that rises to the forefront of my analysis, this network will continue to begin analysis as an un-weighted, bipartite matrix.

Affiliation Matrix

Loading the dataset and creating the network to begin my analysis:

Show code
gd_affiliation <- read.csv('gd_affiliation_matrix.csv', row.names = 1, header = TRUE, check.names = FALSE)
gd_matrix <- as.matrix(gd_affiliation)

Inspecting the first 8 columns of the data structure in the affiliation matrix format:

Show code
dim(gd_matrix)
[1]  25 181
Show code
gd_matrix[1:10, 1:4]
               Alabama Getaway Alice D Millionaire Alligator Althea
Eric Andersen                0                   0         0      0
John Barlow                  0                   0         0      0
Bob Bralove                  0                   0         0      0
Andrew Charles               0                   0         0      0
John Dawson                  0                   0         0      0
Willie Dixon                 0                   0         0      0
Jerry Garcia                 1                   1         0      1
Donna Godchaux               0                   0         0      0
Keith Godchaux               0                   0         0      0
Gerrit Graham                0                   0         0      0

Bipartite Projection

Now I can create the single mode network and examine the bipartite projection. After converting the matrix to a square adjacency matrix, I can look at the full matrix.

I can also call the adjacency matrix count for co-writing incidences between certain songwriters, such as between writing partners Jerry Garcia and Robert Hunter (78) and between John Barlow and Bob Weir (21).

Show code
gd_projection <- gd_matrix%*%t(gd_matrix)
dim(gd_projection)
[1] 25 25
Show code
gd_projection[1:10, 1:4]
               Eric Andersen John Barlow Bob Bralove Andrew Charles
Eric Andersen              1           0           0              0
John Barlow                0          26           1              0
Bob Bralove                0           1           3              0
Andrew Charles             0           0           0              1
John Dawson                0           0           0              0
Willie Dixon               0           0           0              0
Jerry Garcia               0           0           0              0
Donna Godchaux             0           0           0              0
Keith Godchaux             0           0           0              0
Gerrit Graham              0           0           0              0
Show code
gd_projection["Jerry Garcia", "Robert Hunter"]
[1] 78
Show code
gd_projection["John Barlow", "Bob Weir"]
[1] 21

igraph

From Incidence Matrix

Converting network data into igraph object using the “graph_from_incidence_matrix()” function gave me all songwriters and songs as total vertices. Graphing after the bipartite projection allowed a more accurate network.

Show code
set.seed(11)
gd_igraph_from_im <- graph_from_incidence_matrix(gd_affiliation, directed = FALSE)

From Adjacency Matrix

Converting network data into igraph object using the “graph_from_incidence_matrix()” function gave me all songwriters and songs as total vertices. Graphing after the bipartite projection allowed a more accurate network.

Show code
set.seed(11)
gd_igraph_adj <- graph.adjacency(gd_projection,mode="undirected")

Network Analysis: igraph

Now to check the vertices in the graph I’ve created to ensure they represent the data accurately, and confirm that all of the attributes have been represented properly (the graph is undirected, unweighted, and is bipartite):

From Incidence Matrix

head(V(gd_igraph_from_im)$name)
[1] "Eric Andersen"  "John Barlow"    "Bob Bralove"   
[4] "Andrew Charles" "John Dawson"    "Willie Dixon"  
head(V(gd_igraph_from_im)$type)
[1] FALSE FALSE FALSE FALSE FALSE FALSE
is_directed(gd_igraph_from_im)
[1] FALSE
is_weighted(gd_igraph_from_im)
[1] FALSE
is_bipartite(gd_igraph_from_im)
[1] TRUE
igraph::vertex_attr_names(gd_igraph_from_im)
[1] "type" "name"
igraph::edge_attr_names(gd_igraph_from_im)
character(0)

Graphing directly from the incidence matrix gives a bipartite network, but when visualizing the network it is not clear if this is the way I want to represent this data.

Show code
plot(gd_igraph_from_im, layout=layout.bipartite)

From Adjacency Matrix (Bipartite Projection)

head(V(gd_igraph_adj)$name)
[1] "Eric Andersen"  "John Barlow"    "Bob Bralove"   
[4] "Andrew Charles" "John Dawson"    "Willie Dixon"  
head(V(gd_igraph_adj)$type)
NULL
is_directed(gd_igraph_adj)
[1] FALSE
is_weighted(gd_igraph_adj)
[1] FALSE
is_bipartite(gd_igraph_adj)
[1] FALSE
igraph::vertex_attr_names(gd_igraph_adj)
[1] "name"
igraph::edge_attr_names(gd_igraph_adj)
character(0)

Graphing from the bipartite projection and using the graph from adjacency matrix function, I have a network of the songwriters only, but I need to do more digging to see if this is the best way to represent the data.

Show code
plot(gd_igraph_adj)

Triad Check

Knowing this network has 25 vertices, I want to see if the triad census is working correctly by comparing the following data:

Show code
#possible triads in network
25*24*23/6
[1] 2300
Show code
sum(igraph::triad.census(gd_igraph_from_im))
[1] 1435820

The igraph created from the incidence matrix is NOT representing the triad census properly.

Show code
#possible triads in network
25*24*23/6
[1] 2300
Show code
sum(igraph::triad.census(gd_igraph_adj))
[1] 2300

The igraph network created from the adjacency matrix is representing the triad census is working the way it should be.

Transitivity

Looking next at the global v. average local transitivity of the network:

Show code
#get global clustering cofficient: igraph
transitivity(gd_igraph_from_im, type="global")
[1] 0
Show code
#get average local clustering coefficient: igraph
transitivity(gd_igraph_from_im, type="average")
[1] 0

This is another good sign that the correct choice is to graph from the adjacency matrix throug the bipartite projection going forward.

Show code
#get global clustering cofficient: igraph
transitivity(gd_igraph_adj, type="global")
[1] 0.5240964
Show code
#get average local clustering coefficient: igraph
transitivity(gd_igraph_adj, type="average")
[1] 0.7755587

This transitivity tells me that the average network transitivity is significantly higher than the global transitivity, indicating, from my still naive network knowledge, that the overall network is generally more loose, and that there is a more connected sub-network.

Geodesic Distance

Looking at the geodesic distance tells me that on average, the path length is just over 2.

Show code
average.path.length(gd_igraph_adj,directed=F)
[1] 2.01

Components

Getting a look at the components of the network shows that there are 2 components in the network, and 25 of the 26 nodes make up the giant component with 1 isolate.

Show code
names(igraph::components(gd_igraph_adj))
[1] "membership" "csize"      "no"        
Show code
igraph::components(gd_igraph_adj)$no 
[1] 1
Show code
igraph::components(gd_igraph_adj)$csize
[1] 25

This is a great start - now I can get to looking at the network density, centrality, and centralization.

Density

The network density measure: First with just the call “graph.density” and then with adding “loops=TRUE”. Since I’m using igraph, I know that its’ default output assumes that loops are not included but does not remove them, which can be corrected with the addition of “loops=TRUE” per the course tutorials when comparing output to statnet. This gives me confidence that my network density is closer to 2.26.

Show code
graph.density(gd_igraph_adj)
[1] 2.453333
Show code
graph.density(gd_igraph_adj, loops=TRUE)
[1] 2.264615

Degree Measure

The network degree measure: This gives me a clear output showing the degree of each particular node (songwriter). I will also begin creating a dataframe for easier review going forward.

It is not surprising, knowing my subject matter, that Jerry Garcia is the highest degree node in this network as the practical and figurative head of the band. The other band members’ degree measures are not necessarily what I expected, though. I did not anticipate that his songwriting partner and key collaborator, Robert Hunter, would have a higher degree than band members Phil Lesh and Bob Weir. Further, I did not anticipate that the degree measure of band member ‘Pigpen’ would be so high given his early death in the first years of the band’s touring life. Specifically, I’m surprised that Pigpen has a higher degree than John Barlow, who was the key collaborator and writing partner of Bob Weir. This tells me that the proportion of songs that Weir wrote outside of the songs written with Barlow were much more dramatic than the proportion of songs Garcia wrote outside the songs written with Hunter.

The original lineup of Jerry Garcia, Bob Weir, Phil Lesh, Bill Kreutzmann, and Pigpen as well as Robert Hunter’s presence in the formative years of the band’s most collaborative era, means that this degree ranking makes sense intuitively.

Show code
gd_ig_nodes<-data.frame(name=V(gd_igraph_adj)$name, degree=igraph::degree(gd_igraph_adj, loops = FALSE))
gd_ig_nodes$normalized <-igraph::degree(gd_igraph_adj, loops = FALSE, normalized = TRUE)

gd_ig_nodes%>%
  arrange(desc(degree))
                           name degree normalized
Jerry Garcia       Jerry Garcia    142 5.91666667
Robert Hunter     Robert Hunter    117 4.87500000
Bob Weir               Bob Weir    105 4.37500000
Phil Lesh             Phil Lesh     81 3.37500000
Bill Kreutzmann Bill Kreutzmann     62 2.58333333
Pigpen                   Pigpen     51 2.12500000
John Barlow         John Barlow     29 1.20833333
Mickey Hart         Mickey Hart     18 0.75000000
Brent Mydland     Brent Mydland     11 0.45833333
Keith Godchaux   Keith Godchaux     10 0.41666667
Bob Bralove         Bob Bralove      8 0.33333333
Vince Welnick     Vince Welnick      7 0.29166667
Donna Godchaux   Donna Godchaux      6 0.25000000
Rob Wasserman     Rob Wasserman      6 0.25000000
Dave Parker         Dave Parker      5 0.20833333
Robert Petersen Robert Petersen      5 0.20833333
John Dawson         John Dawson      2 0.08333333
Willie Dixon       Willie Dixon      2 0.08333333
Frank Guida         Frank Guida      2 0.08333333
Joe Royster         Joe Royster      2 0.08333333
Eric Andersen     Eric Andersen      1 0.04166667
Andrew Charles   Andrew Charles      1 0.04166667
Gerrit Graham     Gerrit Graham      1 0.04166667
Ned Lagin             Ned Lagin      1 0.04166667
Peter Monk           Peter Monk      1 0.04166667

Summary Statistics

A quick look at the summary statistics confirms for me the minimum, maximum, median, and mean node degree data.

Show code
summary(gd_ig_nodes)
     name               degree         normalized     
 Length:25          Min.   :  1.00   Min.   :0.04167  
 Class :character   1st Qu.:  2.00   1st Qu.:0.08333  
 Mode  :character   Median :  6.00   Median :0.25000  
                    Mean   : 27.04   Mean   :1.12667  
                    3rd Qu.: 29.00   3rd Qu.:1.20833  
                    Max.   :142.00   Max.   :5.91667  

Network Visualizations

Now I want to take a step back and try to visually represent this data better.

Show code
#ggraph(gd_igraph_adj, layout = "fr") +
  #geom_edge_link() + 
  #geom_node_point(aes(color = factor(color))) + 
  #geom_node_text(aes(label = name), repel = TRUE) +
  #theme_void() +
  #theme(legend.position = "none") 

That is starting to look more meaningful!

Show code
# Set size to degree centrality 
#V(gd_igraph_adj)$size = degree=igraph::degree(gd_igraph_adj)

#Additional customisation for better legibility 
#ggraph(gd_igraph_adj, layout = "fr") +
  #geom_edge_arc(strength = 0.2, width = 0.5, alpha = 0.15) + 
  #geom_node_point(aes(size = size, color = factor(color))) + 
  #geom_node_text(aes(label = name, size = size), repel = TRUE) +
  #theme_void() +
  #theme(legend.position = "none") 

Centrality Measures

To examine additional centrality and power scores of the nodes, I added to the data frame with the centrality degree and normalized centrality. Now I am adding Bonacich power, rescaled Bonacich power, Eigenvector centrality scores and the breakdown of reflected and derived centrality scores. Additionally, I am adding the closeness, betweenness, and Burt centrality scores.

To calculate the reflected and derived centrality scores, I first run some operations on the adjacency matrix and keep in mind that these two scores make up the entire calculation of the Eigenvector centrality score.

Show code
#gd_adjacency <- as.matrix(as_adjacency_matrix(gd_igraph_adj))
#gd_adjacency_2 <- gd_adjacency %*% gd_adjacency

#calculate Bonacich power
#bp_ig1 <- power_centrality(gd_igraph_adj) #with a default index of "1"
#bp_ig2 <- power_centrality(gd_igraph_adj, rescale = TRUE) #rescaled so they sum to "1"

#calculate portion of reflected centrality
#gd_reflective <- diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
#gd_reflective <- ifelse(is.nan(gd_reflective),0,gd_reflective)

#calculate derived centrality
#gd_derived <- 1-diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
#gd_derived <- ifelse(is.nan(gd_derived),1,gd_derived)

#calculate closeness centrality: igraph
#close <- igraph::closeness(gd_igraph_adj)

#calculate betweenness centrality: igraph
#between <- igraph::betweenness(gd_igraph_adj, directed=FALSE)

#calculate Burt's network constraint
#constraint <- constraint(gd_igraph_adj)

#add these values to the data frame I started
#gd_ig_nodes$eigenvector <- eigen
#gd_ig_nodes$eigen_derived <- gd_derived
#gd_ig_nodes$eigen_reflective <- gd_reflective
#gd_ig_nodes$betweenness <- between
#gd_ig_nodes$closeness <- close
#gd_ig_nodes$bonacich <- bp_ig1
#gd_ig_nodes$bonacich_rescaled <- bp_ig2
#gd_ig_nodes$constraint <- constraint
#options(scipen = 999)

gd_ig_nodes<-read.csv("gd_ig_nodes.csv")
head(gd_ig_nodes[2:7])
            name degree normalized eigenvector eigen_derived
1  Eric Andersen      1 0.04166667  0.04883644     0.9875776
2    John Barlow     29 1.20833333  0.07763512     0.7648126
3    Bob Bralove      8 0.33333333  0.12770600     0.9733796
4 Andrew Charles      1 0.04166667  0.04317463     0.9829060
5    John Dawson      2 0.08333333  0.07977694     0.9933775
6   Willie Dixon      2 0.08333333  0.06423801     0.9823529
  eigen_reflective
1      0.012422360
2      0.235187424
3      0.026620370
4      0.017094017
5      0.006622517
6      0.017647059

Eigenvector Centrality

I am also interested in the Eigenvector centrality scores - Both the top as well as the lowest value scores.

Show code
gd_ig_nodes%>%
  arrange(desc(eigenvector))%>%
  slice(1:5)
                X            name degree normalized eigenvector
1        Bob Weir        Bob Weir    105   4.375000   0.4003206
2       Phil Lesh       Phil Lesh     81   3.375000   0.3539098
3    Jerry Garcia    Jerry Garcia    142   5.916667   0.3305871
4   Robert Hunter   Robert Hunter    117   4.875000   0.3233581
5 Bill Kreutzmann Bill Kreutzmann     62   2.583333   0.3219131
  eigen_derived eigen_reflective betweenness  closeness   bonacich
1     0.7826043        0.2173957  121.659478 0.03225806 -0.5325373
2     0.8536971        0.1463029   90.396640 0.02941176 -0.1771572
3     0.6538547        0.3461453   16.584364 0.02631579 -0.2501870
4     0.6286725        0.3713275   24.106816 0.02702703 -0.1701447
5     0.8947211        0.1052789    3.132042 0.02564103 -0.6875388
  bonacich_rescaled constraint
1       -0.10060043  0.3367355
2       -0.03346637  0.4521996
3       -0.04726227  0.5061908
4       -0.03214166  0.6332636
5       -0.12988143  0.5159787

Bob Weir having top overall Eigenvector centrality makes sense - he is a core founding member and a prolific songwriter. He did not write as many songs as Jerry Garcia or Robert Hunter, but he wrote with a larger variety of people.

The top derived Eigenvector centrality scores are songwriters who contributed to songs but were outside the band member or key collaborator circle, which again makes sense.

Robert Hunter having the top reflective Eigenvector centrality score is not a shock - he has long held the unofficial title of band member and as the person behind the songwriting magic of the Grateful Dead. His primary songwriting partner was Jerry Garcia, but he also wrote songs with the early, full band and later with almost all of the individual members of the band.

Closeness

The closeness centrality of a node is defined as the sum of the geodesic distances between that node and all other nodes in a network.

Show code
gd_ig_nodes%>%
  arrange(desc(closeness))%>%
  slice(1:5)
                X            name degree normalized eigenvector
1        Bob Weir        Bob Weir    105   4.375000   0.4003206
2       Phil Lesh       Phil Lesh     81   3.375000   0.3539098
3   Robert Hunter   Robert Hunter    117   4.875000   0.3233581
4    Jerry Garcia    Jerry Garcia    142   5.916667   0.3305871
5 Bill Kreutzmann Bill Kreutzmann     62   2.583333   0.3219131
  eigen_derived eigen_reflective betweenness  closeness   bonacich
1     0.7826043        0.2173957  121.659478 0.03225806 -0.5325373
2     0.8536971        0.1463029   90.396640 0.02941176 -0.1771572
3     0.6286725        0.3713275   24.106816 0.02702703 -0.1701447
4     0.6538547        0.3461453   16.584364 0.02631579 -0.2501870
5     0.8947211        0.1052789    3.132042 0.02564103 -0.6875388
  bonacich_rescaled constraint
1       -0.10060043  0.3367355
2       -0.03346637  0.4521996
3       -0.03214166  0.6332636
4       -0.04726227  0.5061908
5       -0.12988143  0.5159787

This evaluation is more difficult as the range is made up of much smaller scaled scores, but seem to be somewhat in line with the Eigenvector centrality scores.

In addition to node-level centrality scores, I also want to calculate the network level centralization index for closeness centrality measures. Here, I get a network level closeness measure of 0.552.

#calculate closeness centralization index: igraph
centr_clo(gd_igraph_adj)$centralization
[1] 0.5519071

Betweenness

Betweenness represents the number of geodesics on which a node sits.

Now I want to take the closeness and betweenness to my centrality data frame and first, sort by and take a look at the nodes with the highest betweenness:

Show code
gd_ig_nodes%>%
  arrange(desc(betweenness))%>%
  slice(1:5)
              X          name degree normalized eigenvector
1      Bob Weir      Bob Weir    105   4.375000   0.4003206
2     Phil Lesh     Phil Lesh     81   3.375000   0.3539098
3        Pigpen        Pigpen     51   2.125000   0.2438925
4 Robert Hunter Robert Hunter    117   4.875000   0.3233581
5  Jerry Garcia  Jerry Garcia    142   5.916667   0.3305871
  eigen_derived eigen_reflective betweenness  closeness   bonacich
1     0.7826043        0.2173957   121.65948 0.03225806 -0.5325373
2     0.8536971        0.1463029    90.39664 0.02941176 -0.1771572
3     0.8868967        0.1131033    44.02857 0.02500000 -0.5155271
4     0.6286725        0.3713275    24.10682 0.02702703 -0.1701447
5     0.6538547        0.3461453    16.58436 0.02631579 -0.2501870
  bonacich_rescaled constraint
1       -0.10060043  0.3367355
2       -0.03346637  0.4521996
3       -0.09738708  0.5404552
4       -0.03214166  0.6332636
5       -0.04726227  0.5061908

The most immediate observations I have is that the highest degree node (Jerry Garcia) is not the node with the highest scoring betweenness. That goes to Bob Weir, who is still a relatively high degree node, but significantly lower than Jerry Garcia given that his betweenness score is so much higher (~121 compared to Garcia’s ~16).

I can make a guess that the two highest degree nodes, Jerry Garcia and Robert Hunter, having relatively low betweenness scores can be linked to the fact that the two wrote mostly together. Although the pair wrote the most songs in the originals catalog, Bob Weir wrote many songs with a variety of other songwriters, as suspected by the results in the degree scores. This would give him a higher level of betweenness.

Similarly, Phil Lesh and Pigpen, original band members who wrote relatively fewer songs, contributed to more songs that were written by the entire band, giving them more exposure to connections on the songs that they did write.

Bonacich Power

Bonacich power is a bit complicated but I have taken away the concept that it measures how powerful a node is based on the relative strength or weakness of their alters.

The lower, negative Bonacich power values imply that nodes become more powerful as their alters become weaker; positive Bonacich power values imply that nodes becomemore powerful as their alters become more powerful.

The cooperative songwriting network is likely not going to be a good example of this power compared to examining cooperative v. antagonistic relationships.

Show code
gd_ig_nodes%>%
  arrange(desc(bonacich))%>%
  slice(1:5)
                X            name degree normalized eigenvector
1     Frank Guida     Frank Guida      2 0.08333333  0.03388727
2     Joe Royster     Joe Royster      2 0.08333333  0.03388727
3  Donna Godchaux  Donna Godchaux      6 0.25000000  0.23834221
4  Keith Godchaux  Keith Godchaux     10 0.41666667  0.27350064
5 Robert Petersen Robert Petersen      5 0.20833333  0.05639403
  eigen_derived eigen_reflective betweenness  closeness bonacich
1     0.9620253       0.03797468 0.000000000 0.01612903 3.010938
2     0.9620253       0.03797468 0.000000000 0.01612903 3.010938
3     0.9771689       0.02283105 0.000000000 0.02272727 1.206495
4     0.9820014       0.01799856 0.009345794 0.02325581 1.157009
5     0.9367816       0.06321839 0.000000000 0.01818182 1.096478
  bonacich_rescaled constraint
1         0.5687896  0.8224000
2         0.5687896  0.8224000
3         0.2279163  0.4514219
4         0.2185680  0.5143887
5         0.2071332  0.7134697

Network Constraint (Burt)

Constraint is a measure of the redundancy of a node’s connections. It is bound between 0 and 1, with 0 being a complete lack of restraint, and 1 being complete redundancy.

Show code
gd_ig_nodes %>%
  select(name,constraint)
              name constraint
1    Eric Andersen  1.0000000
2      John Barlow  0.6706222
3      Bob Bralove  0.4989170
4   Andrew Charles  1.0000000
5      John Dawson  1.2945238
6     Willie Dixon  0.7040590
7     Jerry Garcia  0.5061908
8   Donna Godchaux  0.4514219
9   Keith Godchaux  0.5143887
10   Gerrit Graham  1.0000000
11     Frank Guida  0.8224000
12     Mickey Hart  0.5294014
13   Robert Hunter  0.6332636
14 Bill Kreutzmann  0.5159787
15       Ned Lagin  1.0000000
16       Phil Lesh  0.4521996
17      Peter Monk  1.0000000
18   Brent Mydland  0.9325133
19     Dave Parker  0.5591083
20 Robert Petersen  0.7134697
21          Pigpen  0.5404552
22     Joe Royster  0.8224000
23   Rob Wasserman  0.4756234
24        Bob Weir  0.3367355
25   Vince Welnick  0.5216319

Similarly to the look at derived Eigenvector centrality, the highest network constraint scores go to songwriters that were relatively isolated in their contributions. However, beyond that, the results are very diverse in how they fall. The least constrained node, Bob Weir, is also the highest Eigenvector value as well as highest in betweenness. But second least constrained - Donna Godchaux - contributed to few songs, but most of them full-band collaborations, does indicate to me more of what this measure represents.

Citation

For attribution, please cite this work as

Becvar (2022, April 20). Grateful Network: Grateful Network Creation: igraph. Retrieved from http://gratefulnetwork.live/posts/gd-network-creation/

BibTeX citation

@misc{becvar2022grateful,
  author = {Becvar, Kristina},
  title = {Grateful Network: Grateful Network Creation: igraph},
  url = {http://gratefulnetwork.live/posts/gd-network-creation/},
  year = {2022}
}