Grateful Network Creation: Statnet

Tags: network creation, network analytics, network visualization, statnet

Building a Grateful Dead Original Song Co-Writing Network in R using the Statnet Package

Kristina Becvar (http://gratefulnetwork.live/), UMass DACSS Program (academic blog: https://kristinabecvar.com)
2022-04-21

Load Network Data

Affiliation Matrix

Loading the dataset and creating the network to begin my analysis:

gd_affiliation <- read.csv('gd_affiliation_matrix.csv', row.names = 1, header = TRUE, check.names = FALSE)
gd_matrix <- as.matrix(gd_affiliation)

Checking the dimensions of the affiliation matrix, then inspecting its first 10 rows and 4 columns:

dim(gd_matrix)
[1]  25 181
gd_matrix[1:10, 1:4]
               Alabama Getaway Alice D Millionaire Alligator Althea
Eric Andersen                0                   0         0      0
John Barlow                  0                   0         0      0
Bob Bralove                  0                   0         0      0
Andrew Charles               0                   0         0      0
John Dawson                  0                   0         0      0
Willie Dixon                 0                   0         0      0
Jerry Garcia                 1                   1         0      1
Donna Godchaux               0                   0         0      0
Keith Godchaux               0                   0         0      0
Gerrit Graham                0                   0         0      0

Bipartite Projection

Now I can create the single-mode network by projecting the bipartite affiliation matrix onto the songwriters. Multiplying the matrix by its transpose gives a square 25 x 25 songwriter-by-songwriter matrix: each off-diagonal cell counts the songs two songwriters co-wrote, and each diagonal cell counts the songs a songwriter is credited on.

I can also look up the co-writing count for specific pairs of songwriters in this matrix, such as writing partners Jerry Garcia and Robert Hunter (78) and John Barlow and Bob Weir (21).

gd_projection <- gd_matrix%*%t(gd_matrix)
dim(gd_projection)
[1] 25 25
gd_projection[1:10, 1:4]
               Eric Andersen John Barlow Bob Bralove Andrew Charles
Eric Andersen              1           0           0              0
John Barlow                0          26           1              0
Bob Bralove                0           1           3              0
Andrew Charles             0           0           0              1
John Dawson                0           0           0              0
Willie Dixon               0           0           0              0
Jerry Garcia               0           0           0              0
Donna Godchaux             0           0           0              0
Keith Godchaux             0           0           0              0
Gerrit Graham              0           0           0              0
gd_projection["Jerry Garcia", "Robert Hunter"]
[1] 78
gd_projection["John Barlow", "Bob Weir"]
[1] 21
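
The projection step can be illustrated on a toy affiliation matrix. This is a minimal sketch in base R; the writer and song names are made up for illustration and are not taken from gd_affiliation_matrix.csv.

```r
# Toy affiliation matrix (hypothetical data): rows are writers, columns are songs
toy_affiliation <- matrix(
  c(1, 1, 1, 0,   # writer_a is credited on songs 1-3
    1, 1, 0, 0,   # writer_b is credited on songs 1-2
    0, 0, 1, 1),  # writer_c is credited on songs 3-4
  nrow = 3, byrow = TRUE,
  dimnames = list(c("writer_a", "writer_b", "writer_c"),
                  c("song_1", "song_2", "song_3", "song_4"))
)

# One-mode projection onto writers: cell [i, j] counts songs shared by i and j
toy_projection <- toy_affiliation %*% t(toy_affiliation)

# Diagonal cells hold each writer's own song count, not a co-writing tie
songs_per_writer <- diag(toy_projection)

# Off-diagonal lookup by name, in the same way as gd_projection["John Barlow", "Bob Weir"]
shared_a_b <- toy_projection["writer_a", "writer_b"]
```

The diagonal is worth keeping in mind when coercing to a network object, because non-zero diagonal cells are what a loops check would see as self-ties.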

Statnet Network

Coercing directly from the original affiliation matrix kept giving the error “Error: loops is FALSE, but x contains loops.”, even when I supplied what I believed were the appropriate arguments. I also tried using the “intergraph” package to convert the network object I had created in igraph, but it cannot coerce bipartite igraph networks.

After the bipartite projection, I was able to create the statnet network object.

set.seed(11)
gd_statnet <- as.network(gd_projection,
               directed = FALSE, 
               bipartite = FALSE,
               loops = FALSE,
               connected = FALSE)

Network Features

Looking at the features of the statnet network object, including its vertex and edge attributes:

print(gd_statnet)
 Network attributes:
  vertices = 25 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 65 
    missing edges= 0 
    non-missing edges= 65 

 Vertex attribute names: 
    vertex.names 

No edge attributes
network::list.vertex.attributes(gd_statnet)
[1] "na"           "vertex.names"
network::list.edge.attributes(gd_statnet)
[1] "na"

Visualization

And a first look at the basic network structure:

plot(gd_statnet)

Dyad & Triad Census

Looking at the dyad and triad census, I have a total of 2300 triads (every possible set of three among the 25 songwriters), which is in line with the results I got in igraph as well.

sna::dyad.census(gd_statnet)
     Mut Asym Null
[1,]  65    0  235
sna::triad.census(gd_statnet)
      003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U
[1,] 1216   0 760    0    0    0    0    0    0    0 237    0    0
     120C 210 300
[1,]    0   0  87
sum(triad.census(gd_statnet))
[1] 2300
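
The census totals can be checked by hand, since they depend only on the number of vertices. A minimal base R sketch, using the counts printed above (the variable names are my own):

```r
n_vertices <- 25

# Number of distinct pairs and triples of vertices
possible_dyads  <- choose(n_vertices, 2)
possible_triads <- choose(n_vertices, 3)

# Non-zero counts copied from the dyad.census() and triad.census() output above
dyad_counts  <- c(Mut = 65, Asym = 0, Null = 235)
triad_counts <- c("003" = 1216, "102" = 760, "201" = 237, "300" = 87)

# Both censuses should account for every possible pair and triple
dyads_match  <- sum(dyad_counts) == possible_dyads
triads_match <- sum(triad_counts) == possible_triads
```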

Transitivity

Looking next at the network transitivity, I can confirm that it is the same as the igraph network transitivity score of 0.5241.

gtrans(gd_statnet)
[1] 0.5240964

Geodesic Distance

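The mean of the geodesic distance matrix in statnet is 1.93, compared with the igraph average path length of 2.01. The gap comes from the averaging rather than from the network: mean(gd_gd$gdist) runs over all 625 cells of the 25 x 25 distance matrix, including the 25 zeros on the diagonal. Averaging over the 600 off-diagonal cells only gives 1.9296 x 625 / 600 = 2.01, which matches the igraph figure.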

gd_gd <- geodist(gd_statnet,na.omit = TRUE, ignore.eval = TRUE, inf.replace = 0)
mean(gd_gd$gdist)
[1] 1.9296
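
How the diagonal affects an average path length can be seen on a small example. The sketch below uses a made-up three-node path graph (not the songwriter network) and base R only. It compares the mean over every cell of a distance matrix with the mean over off-diagonal cells, then applies the same rescaling to the 1.9296 reported above.

```r
# Geodesic distances for a hypothetical path graph a - b - c
toy_dist <- matrix(c(0, 1, 2,
                     1, 0, 1,
                     2, 1, 0), nrow = 3, byrow = TRUE)

# Mean over all 9 cells, including the 3 zeros on the diagonal
mean_all_cells <- mean(toy_dist)

# Mean over the 6 off-diagonal cells only (distances between distinct nodes)
off_diagonal <- row(toy_dist) != col(toy_dist)
mean_off_diagonal <- mean(toy_dist[off_diagonal])

# Same adjustment for the 25-node network: 625 cells, 600 of them off-diagonal
n <- 25
adjusted_mean <- 1.9296 * n^2 / (n * (n - 1))
```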

Components

There is one component in the statnet network object, as in igraph: all 25 songwriters are in the giant component and there are no isolates.

components(gd_statnet)
[1] 1

Density

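In this case, the statnet output (0.217) is far different from the igraph output of ~2.1. The statnet value is what I would expect for a simple undirected graph: 65 ties out of 300 possible pairs of songwriters is 65/300 = 0.2167. A density above 1 is not possible for a simple graph, so the igraph figure most likely reflects how that network object was built (for example, repeated co-writing ties or self-loops being counted as separate edges) rather than a real difference in the network, though I have not confirmed this.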

network.density(gd_statnet)
[1] 0.2166667
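
The statnet density can be reproduced by hand from the network printout. A minimal base R sketch, assuming a simple undirected graph with no loops and no multiple edges (which is what the printout reports):

```r
n_vertices <- 25
n_edges <- 65   # "total edges" from the network printout

# Density of a simple undirected graph: observed ties / possible pairs
possible_ties <- n_vertices * (n_vertices - 1) / 2
density_by_hand <- n_edges / possible_ties
```

Because the numerator can never exceed the number of possible pairs under these assumptions, this ratio is bounded by 1.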

Centrality

Total Centrality

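Calculating total degree centrality along with “in-degree” and “out-degree” centrality clearly uses a different scale than igraph. One reason is that sna::degree() assumes a directed graph by default (gmode="digraph"), so in this undirected network every tie is counted once as an in-tie and once as an out-tie: the totdegree column is exactly twice the number of distinct co-writers, and the indegree and outdegree columns each equal that number. The rescaled column divides each score by the sum of all scores, so it sums to 1 across the 25 songwriters.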

gd_stat_nodes <- data.frame(name=gd_statnet%v%"vertex.names",
    totdegree=sna::degree(gd_statnet),
    indegree=sna::degree(gd_statnet, cmode="indegree"),
    outdegree=sna::degree(gd_statnet, cmode="outdegree"))
rescaled_degree <- degree(gd_statnet, g=1, gmode="graph", 
        diag=FALSE, tmaxdev=FALSE, 
        cmode="freeman", rescale=TRUE)
gd_stat_nodes$rescaled <- rescaled_degree
#sort the top total degree of nodes in the stat network
arrange(gd_stat_nodes, desc(totdegree))%>%slice(1:5)
             name totdegree indegree outdegree   rescaled
1        Bob Weir        34       17        17 0.13076923
2       Phil Lesh        28       14        14 0.10769231
3   Robert Hunter        22       11        11 0.08461538
4    Jerry Garcia        20       10        10 0.07692308
5 Bill Kreutzmann        18        9         9 0.06923077

The statnet total degree scores are, again, very different from igraph. In igraph, Jerry Garcia is the highest degree node, which fits my hypothesis given his position as the practical and figurative head of the band and the fact that he contributed to more songs than any other songwriter.

However, more surprising than Jerry Garcia’s fourth-place position in total degree centrality is that his songwriting partner Robert Hunter ranks above him, in third.

I can understand how Bob Weir has a high centrality despite lower song counts, given his high eigenvector centrality and betweenness in previous igraph evaluations. However, I am surprised that he has the highest total degree, and even more surprised that Phil Lesh has the second highest total degree centrality overall. Both are co-founding members of the band and did contribute to many songs written by the band, but not more than Jerry Garcia. Knowing the context and subject matter, it does not at first make sense to me that they are ranked higher than Jerry Garcia. A likely explanation is that the statnet object stores no edge values (the printout above reports "No edge attributes"), so degree here counts distinct co-writers rather than co-written songs: the 78 songs Garcia wrote with Hunter count as a single tie.
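
The difference between counting ties and counting songs can be sketched on a toy matrix. The example below is base R only and uses hypothetical writers a to d with made-up co-writing counts; it is not how statnet or igraph computes degree internally, only an illustration of the three quantities involved.

```r
# Hypothetical weighted co-writing matrix: cell values are shared song counts
toy_weighted <- matrix(c(0, 5, 0, 0,
                         5, 0, 1, 1,
                         0, 1, 0, 1,
                         0, 1, 1, 0), nrow = 4, byrow = TRUE,
                       dimnames = list(c("a", "b", "c", "d"),
                                       c("a", "b", "c", "d")))

# Binary version: a tie exists if at least one song was co-written
toy_binary <- (toy_weighted > 0) * 1

# Treating the symmetric matrix as directed counts every tie twice (in + out)
degree_as_digraph <- rowSums(toy_binary) + colSums(toy_binary)

# Treating it as undirected counts each tie once: the number of distinct co-writers
degree_as_graph <- rowSums(toy_binary)

# Weighted degree (strength): total co-written songs across all partners
strength <- rowSums(toy_weighted)
```

Here writer a has only one partner but five shared songs, so a ranks below c on binary degree and above c on strength. The same kind of reversal could explain a prolific writer with few distinct partners dropping in a binary degree ranking.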

Eigenvector Centrality

#calculate eigenvector centrality
eigen <- sna::evcent(gd_statnet, gmode="graph")
#add to nodes data frame
gd_stat_nodes$eigenvector <- eigen

gd_adjacency <- as.matrix(gd_statnet)
gd_adjacency_2 <- gd_adjacency %*% gd_adjacency

#calculate portion of reflected centrality
gd_reflective <- diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_reflective <- ifelse(is.nan(gd_reflective),0,gd_reflective)

#calculate derived centrality
gd_derived <- 1-diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_derived <- ifelse(is.nan(gd_derived),1,gd_derived)

#add to nodes data frame
gd_stat_nodes$eigen_derived <- gd_derived
gd_stat_nodes$eigen_reflective <- gd_reflective

#sort nodes by eigenvector centrality and show the top five
arrange(gd_stat_nodes, desc(eigenvector))%>%slice(1:5)
             name totdegree indegree outdegree   rescaled eigenvector
1        Bob Weir        34       17        17 0.13076923   0.4003206
2       Phil Lesh        28       14        14 0.10769231   0.3539098
3    Jerry Garcia        20       10        10 0.07692308   0.3305871
4   Robert Hunter        22       11        11 0.08461538   0.3233581
5 Bill Kreutzmann        18        9         9 0.06923077   0.3219131
  eigen_derived eigen_reflective
1     0.8333333        0.1666667
2     0.8426966        0.1573034
3     0.8837209        0.1162791
4     0.8750000        0.1250000
5     0.8941176        0.1058824

My most immediate observation is that in the igraph network the highest degree node, Jerry Garcia, did not have the highest eigenvector centrality, whereas in this network Bob Weir is highest in both degree and eigenvector centrality. The only change in the top five relative to the degree ranking is that Jerry Garcia moves ahead of Robert Hunter, which makes sense.

The derived and reflective scores do not make sense to me; I’m not sure that the formula I used on the igraph network translates to statnet.
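
One way to read the reflected and derived formula is in terms of two-step walks on a binary adjacency matrix. The sketch below is base R only and uses a made-up three-node path graph; the interpretation in the comments is my reading of the matrix algebra, not a statement about how either package defines these scores.

```r
# Hypothetical undirected path graph a - b - c as a binary adjacency matrix
toy_adj <- matrix(c(0, 1, 0,
                    1, 0, 1,
                    0, 1, 0), nrow = 3, byrow = TRUE,
                  dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Squaring the matrix counts walks of length two between every pair of nodes
toy_adj_2 <- toy_adj %*% toy_adj

# Diagonal: two-step walks that leave a node and come straight back (its degree)
returning_walks <- diag(toy_adj_2)

# Row sums: all two-step walks that start at the node
all_two_step_walks <- rowSums(toy_adj_2)

# Same ratios as gd_reflective and gd_derived above
toy_reflective <- returning_walks / all_two_step_walks
toy_derived <- 1 - toy_reflective
```

Under this reading, a reflective share near 1 means most two-step walks return to the node (its neighbours have few other ties), while a derived share near 1 means most of them end at other nodes.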

Closeness

#calculate closeness
close <- closeness(gd_statnet, gmode="graph")
#add to nodes data frame
gd_stat_nodes$closeness <- close
#sort nodes by closeness centrality and show the top five
arrange(gd_stat_nodes, desc(closeness))%>%slice(1:5)
             name totdegree indegree outdegree   rescaled eigenvector
1        Bob Weir        34       17        17 0.13076923   0.4003206
2       Phil Lesh        28       14        14 0.10769231   0.3539098
3   Robert Hunter        22       11        11 0.08461538   0.3233581
4    Jerry Garcia        20       10        10 0.07692308   0.3305871
5 Bill Kreutzmann        18        9         9 0.06923077   0.3219131
  eigen_derived eigen_reflective closeness
1     0.8333333        0.1666667 0.7741935
2     0.8426966        0.1573034 0.7058824
3     0.8750000        0.1250000 0.6486486
4     0.8837209        0.1162791 0.6315789
5     0.8941176        0.1058824 0.6153846

The top five songwriters by closeness are the same, in the same order, as the top five by total degree, and these results are also in line with igraph.

Betweenness

#calculate betweenness
between <- sna::betweenness(gd_statnet, gmode="graph")
#add to nodes data frame
gd_stat_nodes$betweenness <- between
#sort nodes by betweenness centrality and show the top five
arrange(gd_stat_nodes, desc(betweenness))%>%slice(1:5)
           name totdegree indegree outdegree   rescaled eigenvector
1      Bob Weir        34       17        17 0.13076923   0.4003206
2     Phil Lesh        28       14        14 0.10769231   0.3539098
3        Pigpen        16        8         8 0.06153846   0.2438925
4 Robert Hunter        22       11        11 0.08461538   0.3233581
5  Jerry Garcia        20       10        10 0.07692308   0.3305871
  eigen_derived eigen_reflective closeness betweenness
1     0.8333333        0.1666667 0.7741935   111.73333
2     0.8426966        0.1573034 0.7058824    89.40000
3     0.8857143        0.1142857 0.6000000    44.20000
4     0.8750000        0.1250000 0.6486486    30.45000
5     0.8837209        0.1162791 0.6315789    13.23333

Again, when comparing these results to the igraph results, the highest degree node there (Jerry Garcia) was not the node with the highest betweenness. In the statnet network, the highest degree node, Bob Weir, also has the highest betweenness score, and his lead over Garcia is similar in proportion to the one in the igraph network evaluation.

As in igraph, Pigpen jumps up in the rankings for betweenness, likely because his contributions were primarily full-band compositions.

Bonacich Power

#calculate bonacich power
bonpow <- sna::bonpow(gd_statnet, gmode="graph")
#add to nodes data frame
gd_stat_nodes$bonacich <- bonpow
#sort nodes by Bonacich power and show the top five
arrange(gd_stat_nodes, desc(bonacich))%>%slice(1:5)
            name totdegree indegree outdegree    rescaled eigenvector
1 Donna Godchaux        12        6         6 0.046153846  0.23834221
2   Willie Dixon         4        2         2 0.015384615  0.06423801
3  Eric Andersen         2        1         1 0.007692308  0.04883644
4  Gerrit Graham         2        1         1 0.007692308  0.04883644
5    Joe Royster         4        2         2 0.015384615  0.03388727
  eigen_derived eigen_reflective closeness betweenness    bonacich
1     0.9062500       0.09375000 0.5454545           0  0.29459882
2     0.9090909       0.09090909 0.4528302           0  0.26898153
3     0.9411765       0.05882353 0.4444444           0  0.02561729
4     0.9411765       0.05882353 0.4444444           0  0.02561729
5     0.8000000       0.20000000 0.3870968           0 -0.07044754
write.csv(gd_stat_nodes, file = "gd_stat_nodes.csv")

Add as Attributes

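# gmode="graph" treats the network as undirected, matching the calls used for gd_stat_nodes above
gd_statnet %v% "degree" <- sna::degree(gd_statnet, gmode="graph")            # Degree centrality (each tie counted once)
gd_statnet %v% "eigenvector" <- sna::evcent(gd_statnet, gmode="graph")       # Eigenvector centrality
gd_statnet %v% "closeness" <- sna::closeness(gd_statnet, gmode="graph")      # Closeness centrality
gd_statnet %v% "betweenness" <- sna::betweenness(gd_statnet, gmode="graph")  # Vertex betweenness centrality
gd_statnet %v% "bonacich" <- sna::bonpow(gd_statnet, gmode="graph")          # Bonacich power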

Correlations of Centrality Measures

correlations <- gd_stat_nodes %>% 
  select(totdegree,eigenvector,betweenness,closeness,bonacich)%>%
  correlate() %>%
  rearrange()
fashion(correlations)
         term closeness totdegree eigenvector betweenness bonacich
1   closeness                 .97         .96         .79     -.36
2   totdegree       .97                   .95         .83     -.32
3 eigenvector       .96       .95                     .65     -.28
4 betweenness       .79       .83         .65                 -.25
5    bonacich      -.36      -.32        -.28        -.25         

Citation

For attribution, please cite this work as

Becvar (2022, April 21). Grateful Network: Grateful Network Creation: Statnet. Retrieved from http://gratefulnetwork.live/posts/gd-network-creation-statnet/

BibTeX citation

@misc{becvar2022grateful,
  author = {Becvar, Kristina},
  title = {Grateful Network: Grateful Network Creation: Statnet},
  url = {http://gratefulnetwork.live/posts/gd-network-creation-statnet/},
  year = {2022}
}