Comparing Network Packages and Moving Forward with Clustering, Blockmodels, Roles, and Community Structures
I previously analyzed the network using igraph and statnet, and need to make a decision about which package serves the network best.
Loading the dataset and creating the network to begin my analysis:
set.seed(11)
gd_igraph <- graph.adjacency(gd_projection,mode="undirected", weighted = NULL)
set.seed(11)
gd_statnet <- as.network(gd_projection,
directed = FALSE,
bipartite = FALSE,
loops = FALSE,
connected = TRUE)
I am going to load the data frame I saved from both igraph and statnet package analysis as well as a dataframe with key comparable results from each package
Call:
lm(formula = igraph_degree ~ statnet_degree, data = gd_compare)
Residuals:
Min 1Q Median 3Q Max
-31.415 -16.319 0.516 7.502 76.626
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -14.4884 7.3890 -1.961 0.0621 .
statnet_degree 3.9931 0.5505 7.254 2.21e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.36 on 23 degrees of freedom
Multiple R-squared: 0.6958, Adjusted R-squared: 0.6826
F-statistic: 52.62 on 1 and 23 DF, p-value: 2.206e-07
Call:
lm(formula = ig_eigen ~ stat_eigen, data = gd_compare)
Residuals:
Min 1Q Median 3Q Max
-3.071e-17 -1.200e-17 -5.143e-18 3.155e-18 1.343e-16
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.441e-17 1.015e-17 4.377e+00 0.00022 ***
stat_eigen 1.000e+00 5.073e-17 1.971e+16 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.023e-17 on 23 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.886e+32 on 1 and 23 DF, p-value: < 2.2e-16
Call:
lm(formula = ig_between ~ stat_between, data = gd_compare)
Residuals:
Min 1Q Median 3Q Max
-7.2702 0.4472 0.6129 0.6129 4.8884
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.61294 0.51241 -1.196 0.244
stat_between 1.05057 0.01668 62.977 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.354 on 23 degrees of freedom
Multiple R-squared: 0.9942, Adjusted R-squared: 0.994
F-statistic: 3966 on 1 and 23 DF, p-value: < 2.2e-16
Call:
lm(formula = ig_close ~ stat_close, data = gd_compare)
Residuals:
Min 1Q Median 3Q Max
-5.941e-10 -2.298e-10 -7.996e-11 2.380e-10 4.002e-10
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.453e-11 3.053e-10 4.800e-02 0.962
stat_close 4.167e-02 5.819e-10 7.161e+07 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.918e-10 on 23 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 5.127e+15 on 1 and 23 DF, p-value: < 2.2e-16
Call:
lm(formula = ig_bonpow ~ stat_bonpow, data = gd_compare)
Residuals:
Min 1Q Median 3Q Max
-1.2600 -0.4787 -0.1791 0.1350 2.3819
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6720 0.2827 2.377 0.0261 *
stat_bonpow 0.6107 0.2827 2.160 0.0414 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.929 on 23 degrees of freedom
Multiple R-squared: 0.1687, Adjusted R-squared: 0.1326
F-statistic: 4.667 on 1 and 23 DF, p-value: 0.0414
There is high, nearly perfect in some cases, correlation in the scores given by the igraph and statnet packages in most measures. The largest differences are in the Bonacich power scores. The other difference, and most significant for my purposes, is the degree centrality scores. Although they are statistically correlated to a high degree, I have no independent way to verify which evaluation is “correct”, so I will go forward using both networks depending on the needs of the task and capabilities of the packages.
Creating the matrix element then taking a look at the summary using the equivalence function “sedist”, the default measure of assessing the approximate structural equivalence of actors, or “complete”.
#calculate equivalence from specified distance marix
gd_stat_se<-equiv.clust(gd_statnet, equiv.fun="sedist", method="hamming",mode="graph")
#summary of object produced by sedist()
#summary(gd_stat_se)
#plot equivalence clustering
plot(gd_stat_se,labels=gd_stat_se$glabels)
I need to look at the other methods of clustering as well.
#with average cluster.method
gd_avg_se<-equiv.clust(gd_statnet, equiv.fun="sedist", cluster.method="average", method="hamming",mode="graph")
#plot:
plot(gd_avg_se,labels=gd_stat_se$glabels)
#with average cluster.method
gd_sing_se<-equiv.clust(gd_statnet, equiv.fun="sedist", cluster.method="single", method="hamming",mode="graph")
#plot:
plot(gd_sing_se,labels=gd_stat_se$glabels)
#with average cluster.method
gd_wrd_se<-equiv.clust(gd_statnet, equiv.fun="sedist", cluster.method="ward.D", method="hamming",mode="graph")
#plot:
plot(gd_wrd_se,labels=gd_stat_se$glabels)
I understand that the number of partitions (or roles) will depend on the height at which the dendrogram is cut. Using the tutorial example, I set the height at 15 and the result is 9 clusters. Using the alternate view from the tutorial, I also set the height at 10, and identify 10 distinct clusters.
#plot equivalence clustering
plot(gd_wrd_se,labels=gd_wrd_se$glabels)
#partition the clusters
rect.hclust(gd_wrd_se$cluster,h=15)
#plot equivalence clustering
plot(gd_wrd_se,labels=gd_wrd_se$glabels)
#partition the clusters
rect.hclust(gd_wrd_se$cluster,h=10)
For my own experimenting, looking at it with a higher height (“20”), it spreads the clusters to 5.
#plot equivalence clustering
plot(gd_wrd_se,labels=gd_wrd_se$glabels)
#partition the clusters
rect.hclust(gd_wrd_se$cluster,h=20)
Inspecting the goodness of fit of the partitions that result from the clustering steps above using blockmodeling to try and get a better sense of how well the partitioning worked. Using the blockmodel command in statnet and specifying “k=x” means that “x” will indicate how many partitions to create, and “h=x” means that “x” will indicate the height to cut the dendogram.
#blockmodel and select partitions
blk_mod<-blockmodel(gd_statnet,gd_wrd_se,k=2)
#print blockmodel object
blk_mod
Network Blockmodel:
Block membership:
Eric Andersen John Barlow Bob Bralove Andrew Charles
1 1 1 1
John Dawson Willie Dixon Jerry Garcia Donna Godchaux
1 1 2 2
Keith Godchaux Gerrit Graham Frank Guida Mickey Hart
2 1 1 2
Robert Hunter Bill Kreutzmann Ned Lagin Phil Lesh
2 2 1 2
Peter Monk Brent Mydland Dave Parker Robert Petersen
1 1 2 1
Pigpen Joe Royster Rob Wasserman Bob Weir
2 1 1 2
Vince Welnick
1
Reduced form blockmodel:
Eric Andersen John Barlow Bob Bralove Andrew Charles John Dawson Willie Dixon Jerry Garcia Donna Godchaux Keith Godchaux Gerrit Graham Frank Guida Mickey Hart Robert Hunter Bill Kreutzmann Ned Lagin Phil Lesh Peter Monk Brent Mydland Dave Parker Robert Petersen Pigpen Joe Royster Rob Wasserman Bob Weir Vince Welnick
Block 1 Block 2
Block 1 0.07619048 0.1333333
Block 2 0.13333333 0.8222222
plot.block<-function(x=blk_mod, main=NULL, cex.lab=1){
plot.sociomatrix(x$blocked.data, labels=list(x$plabels,x$plabels),
main=main, drawlines = FALSE, cex.lab=cex.lab)
for (j in 2:length(x$plabels)) if (x$block.membership[j] !=
x$block.membership[j-1])
abline(v = j - 0.5, h = j - 0.5, lty = 3, xpd=FALSE)
}
plot.block(blk_mod,main="Grateful Dead Songwriting: 2 Partitions", cex.lab=.4)
#blockmodel and select partitions
blk_mod2<-blockmodel(gd_statnet, gd_wrd_se,k=5)
#print blockmodel object
blk_mod2$block.model
Block 1 Block 2 Block 3 Block 4 Block 5
Block 1 0.04545455 0.05555556 0.04166667 0.4166667 0.4166667
Block 2 0.05555556 1.00000000 0.12500000 0.0000000 1.0000000
Block 3 0.04166667 0.12500000 0.71428571 1.0000000 1.0000000
Block 4 0.41666667 0.00000000 1.00000000 NaN 1.0000000
Block 5 0.41666667 1.00000000 1.00000000 1.0000000 NaN
#plot blockmodel partitions
plot.block(blk_mod2,main="Grateful Dead Songwriting, 5 Partitions", cex.lab=.5)
library(concoR)
#select partitions with concor
concoR::concor_hca(list(gd_projection), p=2)
block vertex
1 1 Eric Andersen
2 1 John Barlow
6 2 Bob Bralove
10 3 Andrew Charles
16 4 John Dawson
7 2 Willie Dixon
17 4 Jerry Garcia
18 4 Donna Godchaux
19 4 Keith Godchaux
3 1 Gerrit Graham
11 3 Frank Guida
20 4 Mickey Hart
21 4 Robert Hunter
22 4 Bill Kreutzmann
12 3 Ned Lagin
23 4 Phil Lesh
13 3 Peter Monk
4 1 Brent Mydland
24 4 Dave Parker
14 3 Robert Petersen
25 4 Pigpen
15 3 Joe Royster
8 2 Rob Wasserman
5 1 Bob Weir
9 2 Vince Welnick
#select partitions with concor
blks<-concoR::concor_hca(list(gd_projection), p=2)
#blockmodel with concor
blk_mod <- blockmodel(gd_statnet, blks$block, plabels=blks$vertex)
#plot blockmodel object
plot.block(blk_mod,main="Grateful Dead Songwriting, concoR 4 Partitions", cex.lab=.5)
In my semester assignment posts, I looked more deeply into the role and blockmodeling of the network. In the end I was able to find a model that represents the network most intuitively - the optimized 5-partition blockmodel.
#select partitions with optimization
blks3<-blockmodeling::optRandomParC(gd_projection, k=5, rep=10, approaches="ss", blocks="com")
#blockmodel with optimized partition
blk_mod3<-blockmodel(gd_projection, blks3$best$best1$clu, plabels=rownames(gd_projection))
#print blockmodel object
blk_mod3$block.model
plot.block(blk_mod3,main="Grateful Dead Songwriting, 5 Optimized Partitions", cex.lab=.5)
#I can assign "block.membership" as a vertex attribute to my 5-partition blockmodel for igraph
V(gd_igraph)$role<-blk_mod3$block.membership[match(V(gd_igraph)$name,blk_mod3$plabels)]
#and statnet
gd_statnet%v%"role"<-blk_mod3$block.membership[match(gd_statnet%v%"vertex.names", blk_mod3$plabels)]
#add to node dataframes
#attach role to centrality dataframe to create nodes dataframe
gd_stat_nodes$block <- blk_mod3$block.membership
gd_ig_nodes$block <- blk_mod3$block.membership
I can assign “block.membership” as a vertex attribute to my 5-partition blockmodel, then use the role attribute to change the color of plotted nodes in a network plot.
#plot(gd_igraph, layout=layout_with_kk, vertex.color=V(gd_igraph)$role)
#library(GGally)
#GGally::ggnet2(gd_statnet,
#node.color="role",
#node.size=degree(gd_statnet, gmode="graph"),
#node.label = "vertex.names",
#node.alpha = .5)
The method attempts to detect dense sub-graphs by optimizing modularity scores on igraph networks that are un-directed. I’ll start with inspecting the names that are part of the new object.
The fast and greedy function was giving me an error code of:
Error in cluster_fast_greedy(gd_network_ig):At fast_community.c:660: fast-greedy community finding works only on graphs without multiple edges, Invalid value
Some community sourcing of opinions led me to run the “simplify()” function to correct this. And it did, during my semester assignment. But now, simplify() is simply not working. I’m not going too deep into this now, because frankly this community was the least intuitive when I ran it previously.
[1] FALSE
The walktrap community detection created one community containing all of the songwriters in the giant component. I will not plot this community.
#Run clustering algorithm: walktrap
wt_gd <- walktrap.community(gd_igraph)
#Inspect community membership
igraph::groups(wt_gd)
$`1`
[1] "Eric Andersen" "John Barlow" "Bob Bralove"
[4] "Andrew Charles" "John Dawson" "Willie Dixon"
[7] "Jerry Garcia" "Donna Godchaux" "Keith Godchaux"
[10] "Gerrit Graham" "Frank Guida" "Mickey Hart"
[13] "Robert Hunter" "Bill Kreutzmann" "Ned Lagin"
[16] "Phil Lesh" "Peter Monk" "Brent Mydland"
[19] "Dave Parker" "Robert Petersen" "Pigpen"
[22] "Joe Royster" "Rob Wasserman" "Bob Weir"
[25] "Vince Welnick"
Saving the scores for evaluation and later analysis; I will continue to add the other community modularity scores into this vector as I run them.
mods<-c(walktrap=modularity(wt_gd))
mods
walktrap
0.2483386
In this evaluation, each of the nodes was indicated to be in its’ own community. I will not plot this community.
#Run clustering algorithm: leading label
lab_gd<-label.propagation.community(gd_igraph)
#Inspect community membership
igraph::groups(lab_gd)
$`1`
[1] "Eric Andersen"
$`2`
[1] "John Barlow"
$`3`
[1] "Bob Bralove"
$`4`
[1] "Andrew Charles"
$`5`
[1] "John Dawson"
$`6`
[1] "Willie Dixon"
$`7`
[1] "Jerry Garcia"
$`8`
[1] "Donna Godchaux"
$`9`
[1] "Keith Godchaux"
$`10`
[1] "Gerrit Graham"
$`11`
[1] "Frank Guida"
$`12`
[1] "Mickey Hart"
$`13`
[1] "Robert Hunter"
$`14`
[1] "Bill Kreutzmann"
$`15`
[1] "Ned Lagin"
$`16`
[1] "Phil Lesh"
$`17`
[1] "Peter Monk"
$`18`
[1] "Brent Mydland"
$`19`
[1] "Dave Parker"
$`20`
[1] "Robert Petersen"
$`21`
[1] "Pigpen"
$`22`
[1] "Joe Royster"
$`23`
[1] "Rob Wasserman"
$`24`
[1] "Bob Weir"
$`25`
[1] "Vince Welnick"
#collect modularity scores to compare
mods<-c(mods, label=modularity(lab_gd))
Again, each of the nodes was indicated to be in its’ own community. I will not plot this community.
#Run clustering algorithm: edge betweenness
edge_gd <- label.propagation.community(gd_igraph)
#Inspect community membership
igraph::groups(edge_gd)
$`1`
[1] "Eric Andersen"
$`2`
[1] "John Barlow"
$`3`
[1] "Bob Bralove"
$`4`
[1] "Andrew Charles"
$`5`
[1] "John Dawson"
$`6`
[1] "Willie Dixon"
$`7`
[1] "Jerry Garcia"
$`8`
[1] "Donna Godchaux"
$`9`
[1] "Keith Godchaux"
$`10`
[1] "Gerrit Graham"
$`11`
[1] "Frank Guida"
$`12`
[1] "Mickey Hart"
$`13`
[1] "Robert Hunter"
$`14`
[1] "Bill Kreutzmann"
$`15`
[1] "Ned Lagin"
$`16`
[1] "Phil Lesh"
$`17`
[1] "Peter Monk"
$`18`
[1] "Brent Mydland"
$`19`
[1] "Dave Parker"
$`20`
[1] "Robert Petersen"
$`21`
[1] "Pigpen"
$`22`
[1] "Joe Royster"
$`23`
[1] "Rob Wasserman"
$`24`
[1] "Bob Weir"
$`25`
[1] "Vince Welnick"
#collect modularity scores to compare
mods<-c(mods, edge=modularity(edge_gd))
This method has created four communities to examine.
#Run clustering algorithm: leading eigenvector
eigen_gd <- leading.eigenvector.community(gd_igraph)
#Inspect community membership
igraph::groups(eigen_gd)
$`1`
[1] "Eric Andersen" "John Barlow" "Bob Bralove" "Willie Dixon"
[5] "Gerrit Graham" "Brent Mydland" "Rob Wasserman" "Bob Weir"
[9] "Vince Welnick"
$`2`
[1] "John Dawson" "Jerry Garcia" "Robert Hunter"
$`3`
[1] "Andrew Charles" "Frank Guida" "Bill Kreutzmann"
[4] "Ned Lagin" "Phil Lesh" "Peter Monk"
[7] "Dave Parker" "Robert Petersen" "Pigpen"
[10] "Joe Royster"
$`4`
[1] "Donna Godchaux" "Keith Godchaux" "Mickey Hart"
#collect modularity scores to compare
mods<-c(mods, eigen=modularity(eigen_gd))
igraph colors the nodes by community
#plot network with community coloring
plot(eigen_gd,gd_igraph)
This method has also created 4 communities to examine.
#Run clustering algorithm: spinglass
spin_gd <- spinglass.community(gd_igraph)
#Inspect community membership
igraph::groups(spin_gd)
$`1`
[1] "Frank Guida" "Dave Parker" "Pigpen" "Joe Royster"
$`2`
[1] "Andrew Charles" "Ned Lagin" "Phil Lesh"
[4] "Peter Monk" "Brent Mydland" "Robert Petersen"
$`3`
[1] "John Dawson" "Jerry Garcia" "Donna Godchaux"
[4] "Keith Godchaux" "Mickey Hart" "Robert Hunter"
[7] "Bill Kreutzmann"
$`4`
[1] "Eric Andersen" "John Barlow" "Bob Bralove" "Willie Dixon"
[5] "Gerrit Graham" "Rob Wasserman" "Bob Weir" "Vince Welnick"
#collect modularity scores to compare
mods<-c(mods, spin=modularity(spin_gd))
Again, igraph colors the nodes by community
#plot network with community coloring
plot(spin_gd,gd_igraph)
It would be worth comparing these scores on a weighted network in the future since it would take that into consideration. For now, I’m going to compare only the Eigenvector and Spinglass since those are the two community models that produced actual divisions.
#compare community partition modularity scores
modularity(eigen_gd)
[1] 0.4516602
#compare community partition modularity scores
modularity(spin_gd)
[1] 0.04933856
#compare community partitions using variation of information
compare(eigen_gd,spin_gd,method="vi")
[1] 0.8922415
#compare community partitions
compare(eigen_gd,spin_gd,method="nmi")
[1] 0.6568228
#compare community partitions
compare(eigen_gd,spin_gd,method="split.join")
[1] 11
#compare community partitions
compare(eigen_gd,spin_gd,method="rand")
[1] 0.81
#compare community partitions
compare(eigen_gd,spin_gd,method="adjusted.rand")
[1] 0.5103093
One method I did not explore in the semester assignment was the Louvain clustering method. This method gives 7 communities that are somewhat unexpected, but give an interesting and valid perspective on the membership. The modularity of this method is 0.455, just above the modularity score of the eigenvector community modularity and very competitive as an option.
louvain <- cluster_louvain(gd_igraph)
#collect modularity scores to compare
mods<-c(mods, louvain=modularity(louvain))
#Inspect community membership
igraph::groups(spin_gd)
$`1`
[1] "Frank Guida" "Dave Parker" "Pigpen" "Joe Royster"
$`2`
[1] "Andrew Charles" "Ned Lagin" "Phil Lesh"
[4] "Peter Monk" "Brent Mydland" "Robert Petersen"
$`3`
[1] "John Dawson" "Jerry Garcia" "Donna Godchaux"
[4] "Keith Godchaux" "Mickey Hart" "Robert Hunter"
[7] "Bill Kreutzmann"
$`4`
[1] "Eric Andersen" "John Barlow" "Bob Bralove" "Willie Dixon"
[5] "Gerrit Graham" "Rob Wasserman" "Bob Weir" "Vince Welnick"
visualizing the louvain community
On inspection of the community structures created by each algorithm, I felt that the louvain community best represented the network. The modularity scores of the communities confirm that this is a reasonable observation. The higher modularity scores represent divisions with dense edge connections between the vertices within a community but sparse connections between vertices in different communities.
A better visualization of this network with community membership
colrs <- adjustcolor( c("gray50", "tomato", "gold", "yellowgreen", "cornflowerblue", "orange"), alpha=.6)
plot(gd_igraph, vertex.color=colrs[V(gd_igraph)$community])
I am going to create a data frame to save with the modularity scores
gd_modularity <- c(mods)
gd_modularity <- as.data.frame(gd_modularity)
write.csv(gd_modularity, file = "gd_modularity.csv")
Now that I have a community membership model that makes sense, I’ll save it in the node data frames.
membership <- louvain$membership
membership
[1] 1 1 2 3 4 2 4 5 5 1 6 5 4 3 3 3 3 1 3 3 3 6 2 1 2
#adding to the gd_nodes
gd_ig_nodes$louvain<-membership
gd_stat_nodes$louvain<-membership
Finding the k-core where every node has degree of at least “k”. I will add this so my node data frames before finishing up so I have that basic measure for furter analysis.
kc <- coreness(gd_igraph, mode="all")
kc
Eric Andersen John Barlow Bob Bralove Andrew Charles
3 73 12 3
John Dawson Willie Dixon Jerry Garcia Donna Godchaux
4 4 264 12
Keith Godchaux Gerrit Graham Frank Guida Mickey Hart
15 3 4 34
Robert Hunter Bill Kreutzmann Ned Lagin Phil Lesh
264 92 3 110
Peter Monk Brent Mydland Dave Parker Robert Petersen
3 40 7 13
Pigpen Joe Royster Rob Wasserman Bob Weir
92 4 9 135
Vince Welnick
12
#adding to the gd_nodes
gd_ig_nodes$kcore<-kc
gd_stat_nodes$kcore<-kc
For attribution, please cite this work as
Becvar (2022, April 25). Grateful Network: Network Differences and Advanced Analysis. Retrieved from http://gratefulnetwork.live/posts/igraph-analysis/
BibTeX citation
@misc{becvar2022network,
author = {Becvar, Kristina},
title = {Grateful Network: Network Differences and Advanced Analysis},
url = {http://gratefulnetwork.live/posts/igraph-analysis/},
year = {2022}
}