Further Analysis

statistics quantitative data analysis final project IT Army of Ukraine

DACSS 603 Final Project Work: “Further Analysis”

Kristina Becvar
2022-04-26

Reading in Data

I continue this post by reading in the data I put together in my exploratory analysis. A full accounting of the variables and their descriptions is in the “About” tab of this GitHub Page.

#dplyr supplies as_tibble() and %>% (assumed to be loaded in a setup chunk)
library(dplyr)
#all IVS data
ivs_clean <- read.csv("ivs-df-clean.csv")
ivs_clean <- as_tibble(ivs_clean)
names(ivs_clean)[1] <- 'country'
head(ivs_clean)
# A tibble: 6 x 72
  country weight imp_family imp_friends imp_leisure imp_politics
  <chr>    <dbl>      <int>       <int>       <int>        <int>
1 Albania  0.697          2           1           2            3
2 Albania  0.697          1           1           4            4
3 Albania  0.697          1           2           2            4
4 Albania  0.697          1           2           2            4
5 Albania  0.697          1           1           2            4
6 Albania  0.697          1           3           3            4
# ... with 66 more variables: imp_work <int>, imp_religion <int>,
#   sat_happiness <int>, sat_health <int>, sat_life <int>,
#   sat_control <int>, willingness_fight <int>,
#   interest_politics <int>, prop_petition <int>,
#   prop_boycotts <int>, prop_demonstrations <int>,
#   prop_strikes <int>, self_position <int>, conf_churches <int>,
#   conf_armed <int>, conf_press <int>, conf_unions <int>, ...

Saving other data for potential review

ivs_right <- ivs_clean %>%
  group_by(country) %>%
  select(country, right_surveillance, right_monitor, right_collect)
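
The three "right" columns selected above are respondent-level answers, while the later models work with one row per country. A minimal sketch of one way such columns could be collapsed to country means is shown here. The data frame `toy_ivs` and its values are made up for illustration, and an unweighted mean is an assumption, not necessarily how the country-level file used later was built.

```r
# Hypothetical respondent-level data; values are invented for illustration.
toy_ivs <- data.frame(
  country            = c("A", "A", "B", "B", "B"),
  right_surveillance = c(1, 3, 2, 4, NA),
  right_monitor      = c(2, 2, 3, 3, 3),
  right_collect      = c(4, 2, 1, 1, 4)
)

# One row per country: the mean of each column, skipping missing answers.
# na.action = na.pass keeps rows with an NA so other columns still count.
country_means <- aggregate(
  cbind(right_surveillance, right_monitor, right_collect) ~ country,
  data = toy_ivs,
  FUN = function(x) mean(x, na.rm = TRUE),
  na.action = na.pass
)
country_means
```

Base R's `aggregate()` is used so the sketch runs without extra packages; a `group_by()` plus `summarise()` pipeline would do the same job.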
#integrated data
all_data <- read.csv("integrated_data.csv")
all_data <- as_tibble(all_data)
names(all_data)[1] <- 'country'
head(all_data)
# A tibble: 6 x 23
  country   population region    users family friends leisure politics
  <chr>     <chr>      <chr>     <int>  <dbl>   <dbl>   <dbl>    <dbl>
1 Albania   3,088,385  Southern~    57   1.02    1.73    2.01     3.30
2 Andorra   85,645     Southern~     6   1.12    1.54    1.42     2.94
3 Argentina 45,864,941 South Am~    11   1.09    1.54    1.81     2.81
4 Armenia   3,011,609  Western ~    16   1.11    1.74    1.99     2.79
5 Australia 25,809,973 Oceania     717   1.11    1.48    1.65     2.41
6 Austria   8,884,864  Western ~  3276   1.20    1.45    1.63     2.51
# ... with 15 more variables: work <dbl>, religion <dbl>,
#   willingness <dbl>, petition <dbl>, boycott <dbl>,
#   demonstration <dbl>, strikes <dbl>, identity <dbl>,
#   marital <dbl>, parents <dbl>, children <dbl>, household <dbl>,
#   education <dbl>, income <dbl>, weights <dbl>

Multiple Linear Regression

Importance to Respondents

#Linear regression of "importance" variables
mlm1 <- lm(users ~ family + friends + leisure + politics + work + religion, data = all_data, na.action = na.exclude)
summary(mlm1)

Call:
lm(formula = users ~ family + friends + leisure + politics + 
    work + religion, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1678.1  -610.4  -342.4   130.6 11437.5 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -405.6     5035.6  -0.081    0.936
family        3368.6     4464.5   0.755    0.453
friends       -730.0     1476.1  -0.495    0.623
leisure        153.5     1244.6   0.123    0.902
politics     -1405.8      877.5  -1.602    0.114
work          1052.6     1416.3   0.743    0.460
religion       168.8      512.1   0.330    0.743

Residual standard error: 1847 on 60 degrees of freedom
Multiple R-squared:  0.1275,    Adjusted R-squared:  0.04025 
F-statistic: 1.461 on 6 and 60 DF,  p-value: 0.2069

Removing the predictor with the largest p-value (leisure) first:

#Linear regression of "importance" variables
mlm1b <- lm(users ~ family + friends + politics + work + religion, data = all_data, na.action = na.exclude)
summary(mlm1b)

Call:
lm(formula = users ~ family + friends + politics + work + religion, 
    data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1698.1  -614.1  -341.5   117.6 11443.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -164.8     4604.1  -0.036    0.972
family        3289.9     4382.8   0.751    0.456
friends       -656.9     1340.8  -0.490    0.626
politics     -1407.4      870.3  -1.617    0.111
work          1068.7     1398.8   0.764    0.448
religion       157.4      499.5   0.315    0.754

Residual standard error: 1832 on 61 degrees of freedom
Multiple R-squared:  0.1273,    Adjusted R-squared:  0.05574 
F-statistic: 1.779 on 5 and 61 DF,  p-value: 0.1305

Removing the predictor with the next largest p-value (religion):

#Linear regression of "importance" variables
mlm1c <- lm(users ~ family + friends + politics + work, data = all_data, na.action = na.exclude)
summary(mlm1c)

Call:
lm(formula = users ~ family + friends + politics + work, data = all_data, 
    na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1738.3  -621.8  -293.5    60.6 11473.1 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -416.8     4501.0  -0.093    0.927
family        3689.2     4165.0   0.886    0.379
friends       -745.2     1301.6  -0.573    0.569
politics     -1407.4      863.9  -1.629    0.108
work          1263.9     1245.1   1.015    0.314

Residual standard error: 1819 on 62 degrees of freedom
Multiple R-squared:  0.1259,    Adjusted R-squared:  0.06946 
F-statistic: 2.232 on 4 and 62 DF,  p-value: 0.07575

Removing the predictor with the next largest p-value (friends):

#Linear regression of "importance" variables
mlm1d <- lm(users ~ family + politics + work, data = all_data, na.action = na.exclude)
summary(mlm1d)

Call:
lm(formula = users ~ family + politics + work, data = all_data, 
    na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1744.5  -645.5  -276.6    63.3 11525.3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -970.1     4372.6  -0.222   0.8251  
family        3039.9     3986.2   0.763   0.4485  
politics     -1512.6      839.7  -1.801   0.0764 .
work          1474.1     1183.4   1.246   0.2175  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1809 on 63 degrees of freedom
Multiple R-squared:  0.1212,    Adjusted R-squared:  0.07939 
F-statistic: 2.897 on 3 and 63 DF,  p-value: 0.04195

Removing the predictor with the next largest p-value (family):

#Linear regression of "importance" variables
mlm1e <- lm(users ~ politics + work, data = all_data, na.action = na.exclude)
summary(mlm1e)

Call:
lm(formula = users ~ politics + work, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1775.5  -704.3  -219.2    45.5 11535.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1463.3     2979.8   0.491   0.6250  
politics     -1417.5      827.6  -1.713   0.0916 .
work          1927.1     1020.2   1.889   0.0634 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1803 on 64 degrees of freedom
Multiple R-squared:  0.1131,    Adjusted R-squared:  0.08541 
F-statistic: 4.082 on 2 and 64 DF,  p-value: 0.02146
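
The manual steps above (fit, drop the predictor with the largest p-value, refit) can be sketched as a small loop. This is only an illustration on made-up data: the helper `backward_eliminate()`, the toy columns `x1` to `x3`, and the stopping threshold `alpha = 0.10` are all assumptions rather than part of the original analysis, and the helper assumes numeric predictors so that coefficient names match column names.

```r
# Toy data: column names and values are hypothetical.
set.seed(42)
n <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 5 * toy$x1 + rnorm(n)

# Drop the predictor with the largest p-value until every remaining
# predictor is at or below `alpha` (an assumed threshold) or one is left.
backward_eliminate <- function(response, predictors, data, alpha = 0.10) {
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    coefs <- summary(fit)$coefficients
    # Skip the intercept row and keep predictor names attached.
    p_values <- setNames(coefs[-1, "Pr(>|t|)"], rownames(coefs)[-1])
    if (length(predictors) == 1 || max(p_values) <= alpha) {
      return(fit)
    }
    predictors <- setdiff(predictors, names(which.max(p_values)))
  }
}

final_fit <- backward_eliminate("y", c("x1", "x2", "x3"), toy)
summary(final_fit)$coefficients
```

P-values from a model chosen this way tend to look better than they are, so the surviving terms are best read as exploratory leads.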

Political Inclinations of Respondents

#Linear regression of "politics" variables
mlm2 <- lm(users ~ willingness + petition + boycott + demonstration + strikes + identity, data = all_data, na.action = na.exclude)
summary(mlm2)

Call:
lm(formula = users ~ willingness + petition + boycott + demonstration + 
    strikes + identity, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-2609.2  -623.3  -264.6   -26.5 10359.4 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)    -212.45    5053.29  -0.042    0.967  
willingness    -687.64    1753.88  -0.392    0.697  
petition      -1328.59    1274.62  -1.042    0.302  
boycott        1454.15    1765.01   0.824    0.414  
demonstration -3373.22    1984.10  -1.700    0.095 .
strikes        3377.05    1452.75   2.325    0.024 *
identity         31.85     588.76   0.054    0.957  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1893 on 53 degrees of freedom
  (7 observations deleted due to missingness)
Multiple R-squared:  0.1779,    Adjusted R-squared:  0.08481 
F-statistic: 1.911 on 6 and 53 DF,  p-value: 0.09602

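Removing the predictor with the highest p-value (identity), which also restores the 7 observations that were dropped for missing identity values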

#Linear regression of "politics" variables
mlm2b <- lm(users ~ willingness + petition + boycott + demonstration + strikes, data = all_data, na.action = na.exclude)
summary(mlm2b)

Call:
lm(formula = users ~ willingness + petition + boycott + demonstration + 
    strikes, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-2569.1  -609.5  -241.1    20.6 10478.8 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)     -967.7     3878.5  -0.250   0.8038  
willingness     -435.5     1560.6  -0.279   0.7811  
petition       -1366.1     1175.7  -1.162   0.2498  
boycott         1371.3     1630.5   0.841   0.4036  
demonstration  -2960.9     1719.6  -1.722   0.0902 .
strikes         3327.3     1347.2   2.470   0.0163 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1770 on 61 degrees of freedom
Multiple R-squared:  0.1856,    Adjusted R-squared:  0.1189 
F-statistic: 2.781 on 5 and 61 DF,  p-value: 0.02506

Removing the predictor with the next highest p-value (willingness)

#Linear regression of "politics" variables
mlm2c <- lm(users ~ petition + boycott + demonstration + strikes, data = all_data, na.action = na.exclude)
summary(mlm2c)

Call:
lm(formula = users ~ petition + boycott + demonstration + strikes, 
    data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-2700.7  -641.6  -283.2    57.8 10493.7 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)      -1602       3119  -0.514   0.6092  
petition         -1498       1069  -1.401   0.1661  
boycott           1500       1552   0.966   0.3376  
demonstration    -2954       1707  -1.731   0.0884 .
strikes           3266       1319   2.476   0.0160 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1756 on 62 degrees of freedom
Multiple R-squared:  0.1846,    Adjusted R-squared:  0.132 
F-statistic: 3.509 on 4 and 62 DF,  p-value: 0.01202

Removing the predictor with the next highest p-value (boycott)

#Linear regression of "politics" variables
mlm2d <- lm(users ~ petition + demonstration + strikes, data = all_data, na.action = na.exclude)
summary(mlm2d)

Call:
lm(formula = users ~ petition + demonstration + strikes, data = all_data, 
    na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-2590.3  -688.4  -235.1   113.7 10732.1 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)     -80.13    2690.23  -0.030   0.9763  
petition       -870.42     848.77  -1.026   0.3090  
demonstration -2611.25    1668.38  -1.565   0.1226  
strikes        3321.60    1317.16   2.522   0.0142 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1755 on 63 degrees of freedom
Multiple R-squared:  0.1723,    Adjusted R-squared:  0.1329 
F-statistic: 4.372 on 3 and 63 DF,  p-value: 0.007358

Removing the predictor with the next highest p-value (petition)

#Linear regression of "politics" variables
mlm2e <- lm(users ~ demonstration + strikes, data = all_data, na.action = na.exclude)
summary(mlm2e)

Call:
lm(formula = users ~ demonstration + strikes, data = all_data, 
    na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-2615.7  -689.7  -272.3    99.1 10722.4 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)      815.7     2545.5   0.320 0.749678    
demonstration  -3877.1     1122.9  -3.453 0.000989 ***
strikes         3417.7     1314.3   2.600 0.011554 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1756 on 64 degrees of freedom
Multiple R-squared:  0.1585,    Adjusted R-squared:  0.1322 
F-statistic: 6.027 on 2 and 64 DF,  p-value: 0.003997

Looking at only “demonstration”

#Linear regression of "politics" variables
mlm2f <- lm(users ~ demonstration, data = all_data, na.action = na.exclude)
summary(mlm2f)

Call:
lm(formula = users ~ demonstration, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1266.7  -766.3  -367.9    54.0 11672.1 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)     5120.7     2017.4   2.538   0.0135 *
demonstration  -1906.5      864.6  -2.205   0.0310 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1832 on 65 degrees of freedom
Multiple R-squared:  0.0696,    Adjusted R-squared:  0.05528 
F-statistic: 4.862 on 1 and 65 DF,  p-value: 0.031

Plotting this model

plot(mlm2f)

Looking at only “strikes”

#Linear regression of "politics" variables
mlm2g <- lm(users ~ strikes, data = all_data, na.action = na.exclude)
summary(mlm2g)

Call:
lm(formula = users ~ strikes, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
 -827.2  -676.0  -592.6  -182.7 12475.0 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -222.8     2731.8  -0.082    0.935
strikes        355.3     1048.2   0.339    0.736

Residual standard error: 1898 on 65 degrees of freedom
Multiple R-squared:  0.001764,  Adjusted R-squared:  -0.01359 
F-statistic: 0.1149 on 1 and 65 DF,  p-value: 0.7358

Plotting the best model

plot(mlm2e)

Demographics of Respondents

#Linear regression of "demographics" variables
mlm3 <- lm(users ~ marital + parents + children + household + education + income, data = all_data, na.action = na.exclude)
summary(mlm3)

Call:
lm(formula = users ~ marital + parents + children + household + 
    education + income, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1678.0  -748.7  -399.7   136.5 11849.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3333.9     4885.7   0.682    0.498
marital       -437.5      692.4  -0.632    0.530
parents      -2125.9     2979.5  -0.714    0.478
children      -182.7      905.7  -0.202    0.841
household     -125.1      798.5  -0.157    0.876
education      534.3      405.4   1.318    0.193
income        -111.3      482.0  -0.231    0.818

Residual standard error: 1863 on 59 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1261,    Adjusted R-squared:  0.03726 
F-statistic: 1.419 on 6 and 59 DF,  p-value: 0.2226

Remove the predictor with the highest p-value (household) first

#Linear regression of "demographics" variables
mlm3b <- lm(users ~ marital + parents + children + education + income, data = all_data, na.action = na.exclude)
summary(mlm3b)

Call:
lm(formula = users ~ marital + parents + children + education + 
    income, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1705.3  -732.8  -371.8   160.1 11859.3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3484.2     4751.5   0.733    0.466
marital       -431.2      685.6  -0.629    0.532
parents      -2522.6     1558.7  -1.618    0.111
children      -259.1      757.3  -0.342    0.733
education      562.5      360.3   1.561    0.124
income        -126.4      468.5  -0.270    0.788

Residual standard error: 1847 on 60 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1258,    Adjusted R-squared:  0.05291 
F-statistic: 1.726 on 5 and 60 DF,  p-value: 0.1424

Remove children next (p = 0.733). Income has a slightly higher p-value here (0.788) and is removed in the following step.

#Linear regression of "demographics" variables
mlm3b2 <- lm(users ~ marital + parents + education + income, data = all_data, na.action = na.exclude)
summary(mlm3b2)

Call:
lm(formula = users ~ marital + parents + education + income, 
    data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1687.4  -785.4  -314.5   132.6 11881.5 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2784.6     4257.6   0.654   0.5156  
marital       -363.3      651.5  -0.558   0.5791  
parents      -2579.5     1538.6  -1.677   0.0988 .
education      589.4      349.0   1.689   0.0964 .
income        -118.2      464.5  -0.255   0.7999  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1834 on 61 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1241,    Adjusted R-squared:  0.06662 
F-statistic:  2.16 on 4 and 61 DF,  p-value: 0.08419

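Remove the predictor with the next highest p-value (income), which also restores the 1 observation that was dropped for a missing income value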

#Linear regression of "demographics" variables
mlm3c <- lm(users ~ marital + parents + education, data = all_data, na.action = na.exclude)
summary(mlm3c)

Call:
lm(formula = users ~ marital + parents + education, data = all_data, 
    na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1674.3  -777.4  -306.5    72.5 11854.1 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2201.8     3528.0   0.624   0.5348  
marital       -364.8      639.4  -0.570   0.5704  
parents      -2479.7     1455.2  -1.704   0.0933 .
education      568.9      319.1   1.783   0.0794 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1806 on 63 degrees of freedom
Multiple R-squared:  0.1244,    Adjusted R-squared:  0.08268 
F-statistic: 2.983 on 3 and 63 DF,  p-value: 0.03786

Remove the predictor with the next highest p-value (marital)

#Linear regression of "demographics" variables
mlm3d <- lm(users ~ parents + education, data = all_data, na.action = na.exclude)
summary(mlm3d)

Call:
lm(formula = users ~ parents + education, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1564.6  -760.7  -299.0    92.3 11920.8 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    874.7     2638.4   0.332   0.7413  
parents      -2186.2     1354.0  -1.615   0.1113  
education      559.7      317.0   1.766   0.0822 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1796 on 64 degrees of freedom
Multiple R-squared:  0.1199,    Adjusted R-squared:  0.09235 
F-statistic: 4.358 on 2 and 64 DF,  p-value: 0.01682

Users to Politics and Education

I’ll look at a model that combines the importance of politics to respondents with one demographic variable, education.

#Linear regression of "politics" importance and "education"
mlm4 <- lm(users ~  politics + education, data = all_data, na.action = na.exclude)
summary(mlm4)

Call:
lm(formula = users ~ politics + education, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1586.2  -754.7  -274.5   122.4 11471.3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1724.5     2634.3   0.655   0.5151  
politics     -1595.8      801.2  -1.992   0.0507 .
education      691.4      295.5   2.340   0.0224 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1778 on 64 degrees of freedom
Multiple R-squared:  0.1375,    Adjusted R-squared:  0.1105 
F-statistic:   5.1 on 2 and 64 DF,  p-value: 0.008809

Users to Strike Propensity and Education

Here I pair strike propensity, one of the political inclination variables, with education.

#Linear regression of "strikes" and "education"
mlm5 <- lm(users ~ strikes + education, data = all_data, na.action = na.exclude)
summary(mlm5)

Call:
lm(formula = users ~ strikes + education, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1746.0  -732.5  -386.1   192.2 12134.9 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -4479.7     3138.2  -1.427   0.1583  
strikes        635.5     1015.2   0.626   0.5336  
education      756.8      304.6   2.485   0.0156 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1827 on 64 degrees of freedom
Multiple R-squared:  0.08957,   Adjusted R-squared:  0.06112 
F-statistic: 3.148 on 2 and 64 DF,  p-value: 0.04964

One more combination, to see whether adding education to the best political model (demonstration + strikes) produces a better model

#Linear regression of "demonstration", "strikes" and "education"
mlm7 <- lm(users ~ demonstration + strikes + education, data = all_data, na.action = na.exclude)
summary(mlm7)

Call:
lm(formula = users ~ demonstration + strikes + education, data = all_data, 
    na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-2708.0  -701.5  -351.0   210.3 10702.2 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)    -2395.9     3043.2  -0.787  0.43406   
demonstration  -3377.8     1134.9  -2.976  0.00414 **
strikes         3225.9     1294.4   2.492  0.01534 * 
education        547.2      296.0   1.849  0.06918 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1724 on 63 degrees of freedom
Multiple R-squared:  0.2018,    Adjusted R-squared:  0.1638 
F-statistic: 5.309 on 3 and 63 DF,  p-value: 0.002512
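
Because the two-predictor model (demonstration + strikes) is nested inside this three-predictor model, an F-test is one way to check whether education adds explanatory power beyond what adjusted R-squared suggests. The sketch below uses simulated data with the same column names; the values and the objects `toy`, `small` and `large` are hypothetical, so the numbers it prints say nothing about the real data.

```r
# Simulated stand-in for the country-level data (values are invented).
set.seed(1)
n <- 67
toy <- data.frame(
  demonstration = rnorm(n),
  strikes       = rnorm(n),
  education     = rnorm(n)
)
toy$users <- 100 - 30 * toy$demonstration + 25 * toy$strikes +
  10 * toy$education + rnorm(n, sd = 40)

small <- lm(users ~ demonstration + strikes, data = toy)
large <- lm(users ~ demonstration + strikes + education, data = toy)

# F-test for the one extra term; both models must use the same rows.
comparison <- anova(small, large)
comparison

# AIC gives a second view that penalises the extra parameter.
AIC(small, large)
```

With the real data, the same two calls would take the fitted objects `mlm2e` and `mlm7` directly.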
par(mfrow = c(1,1))
plot(mlm7, 1:6)
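
The residual summaries throughout this post show one very large positive residual, so it may help to view all six diagnostic plots together and to list the most influential rows by Cook's distance. The following is a minimal sketch on constructed data, where row 20 is deliberately made an outlier; the objects `toy`, `fit` and `cooks` are hypothetical.

```r
# Constructed data: a clean linear trend with one planted outlier in row 20.
toy <- data.frame(x = 1:20)
toy$y <- 2 * toy$x + rep(c(-1, 1), 10)
toy$y[20] <- toy$y[20] + 200

fit <- lm(y ~ x, data = toy)

# Show the six lm diagnostics in a 2 x 3 grid, then restore the old layout.
old_par <- par(mfrow = c(2, 3))
plot(fit, which = 1:6)
par(old_par)

# Rank observations by Cook's distance to see which rows drive the fit.
cooks <- cooks.distance(fit)
head(sort(cooks, decreasing = TRUE), 3)
```

On the real model, `cooks.distance(mlm7)` would give the same ranking for the countries in `all_data`.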

Simple Linear Regression

Potential Correlation with Strike Propensity

I am going to look further at the potential correlation between a country’s propensity to engage in strikes and its participation in DDoS attacks. This is the same model as mlm2g above.

#Linear regression of "politics" variable "strikes"
lm_strikes <- lm(users ~ strikes, data = all_data, na.action = na.exclude)
summary(lm_strikes)

Call:
lm(formula = users ~ strikes, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
 -827.2  -676.0  -592.6  -182.7 12475.0 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -222.8     2731.8  -0.082    0.935
strikes        355.3     1048.2   0.339    0.736

Residual standard error: 1898 on 65 degrees of freedom
Multiple R-squared:  0.001764,  Adjusted R-squared:  -0.01359 
F-statistic: 0.1149 on 1 and 65 DF,  p-value: 0.7358

Potential Correlation with Education Level

I am going to look further at the potential correlation between a country’s education level and its participation in DDoS attacks.

#Linear regression of "educational level" and "users"
lm_education <- lm(users ~ education, data = all_data, na.action = na.exclude)
summary(lm_education)

Call:
lm(formula = users ~ education, data = all_data, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-1533.6  -810.1  -400.2   154.9 12164.3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -2730.8     1422.6  -1.920   0.0593 .
education      735.6      301.3   2.441   0.0174 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1818 on 65 degrees of freedom
Multiple R-squared:  0.084, Adjusted R-squared:  0.06991 
F-statistic: 5.961 on 1 and 65 DF,  p-value: 0.01736
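
For a regression with a single predictor, the slope's p-value is the same as the p-value of a Pearson correlation test, and R-squared is the squared correlation coefficient. The sketch below checks that equivalence on a small made-up data set; the values in `toy` are hypothetical.

```r
# Made-up country-level values for illustration only.
toy <- data.frame(
  education = c(2, 3, 3, 4, 5, 5, 6, 7),
  users     = c(10, 40, 15, 80, 60, 200, 90, 400)
)

fit <- lm(users ~ education, data = toy)
ct  <- cor.test(toy$education, toy$users)

# Squared Pearson correlation versus R-squared from the regression.
r_squared_from_cor <- unname(ct$estimate)^2
r_squared_from_lm  <- summary(fit)$r.squared

# p-value of the slope versus p-value of the correlation test.
p_slope <- summary(fit)$coefficients["education", "Pr(>|t|)"]
p_cor   <- ct$p.value

c(r_squared_from_cor, r_squared_from_lm)
c(p_slope, p_cor)
```

So the simple regressions in this section can be read directly as correlation tests between user counts and each predictor.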

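On their own, strike propensity shows no relationship with user counts (p = 0.74) and education only a modest one (p = 0.017, R-squared = 0.084), so there may be little correlation to be found here. But I want to run the analyses again using a column found in the codebook indicating ‘weights’ for population, which should be used when comparing multiple variables to account for population size.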

Adding Population Weights

Importance to Respondents

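I’ll re-run the model with users and importance variables, but using the weights column. This gives me a warning that this is an “essentially perfect fit” and that the summary may be unreliable. This result is consistent for each of the weighted models. The Call line in each summary shows subset = weights: because weights is passed as an unnamed argument, lm() matches it to its subset argument rather than to the regression weights. Each model is therefore fit to repeated copies of a single row (the intercept of 57 is Albania’s user count), and none of the predictors can be estimated.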

#Linear regression of "importance" variables + weighted
mlm1w <- lm(users ~ family + friends + leisure + politics + work + religion, data = all_data, na.action = na.exclude, weights)
summary(mlm1w)

Call:
lm(formula = users ~ family + friends + leisure + politics + 
    work + religion, data = all_data, subset = weights, na.action = na.exclude)

Residuals:
         1        1.1 
-5.024e-15  5.024e-15 

Coefficients: (6 not defined because of singularities)
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 5.700e+01  5.024e-15 1.134e+16   <2e-16 ***
family             NA         NA        NA       NA    
friends            NA         NA        NA       NA    
leisure            NA         NA        NA       NA    
politics           NA         NA        NA       NA    
work               NA         NA        NA       NA    
religion           NA         NA        NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.105e-15 on 1 degrees of freedom
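
The output above comes from `lm()` matching an unnamed argument by position. A minimal sketch of the difference between the positional call and a call that names the argument follows. The data frame `toy` and its values are invented, and whether the survey's `weights` column is appropriate as regression weights at the country level is a separate question that this sketch does not settle.

```r
# Invented data with a weights column, for illustration only.
toy <- data.frame(
  users     = c(57, 6, 11, 16, 717, 3276, 40, 120),
  politics  = c(3.3, 2.9, 2.8, 2.8, 2.4, 2.5, 3.0, 2.7),
  education = c(3.1, 3.4, 3.0, 4.2, 5.0, 4.8, 3.6, 4.1),
  weights   = c(1.2, 0.8, 1.5, 0.9, 1.1, 0.7, 1.3, 1.0)
)

# Unnamed: the value lands in `subset`, so the weights are used as row indices.
fit_subset <- lm(users ~ politics + education, data = toy,
                 na.action = na.exclude, weights)

# Named: the value is used as regression weights and all rows are kept.
fit_weighted <- lm(users ~ politics + education, data = toy,
                   na.action = na.exclude, weights = weights)

names(fit_subset$call)    # includes "subset"
names(fit_weighted$call)  # includes "weights"
c(nobs(fit_subset), nobs(fit_weighted))
```

Checking the Call line or `nobs()` after fitting is a quick way to confirm that the model used the rows and weights that were intended.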

Political Inclinations of Respondents

#Linear regression of "politics" variables + weighted
mlm2w <- lm(users ~ willingness + petition + boycott + demonstration + strikes + identity, data = all_data, na.action = na.exclude, weights)
summary(mlm2w)

Call:
lm(formula = users ~ willingness + petition + boycott + demonstration + 
    strikes + identity, data = all_data, subset = weights, na.action = na.exclude)

Residuals:
         1        1.1 
-5.024e-15  5.024e-15 

Coefficients: (6 not defined because of singularities)
               Estimate Std. Error   t value Pr(>|t|)    
(Intercept)   5.700e+01  5.024e-15 1.134e+16   <2e-16 ***
willingness          NA         NA        NA       NA    
petition             NA         NA        NA       NA    
boycott              NA         NA        NA       NA    
demonstration        NA         NA        NA       NA    
strikes              NA         NA        NA       NA    
identity             NA         NA        NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.105e-15 on 1 degrees of freedom

Demographics of Respondents

#Linear regression of "demographics" variables + weighted
mlm3w <- lm(users ~ marital + parents + children + household + education + income, data = all_data, na.action = na.exclude, weights)
summary(mlm3w)

Call:
lm(formula = users ~ marital + parents + children + household + 
    education + income, data = all_data, subset = weights, na.action = na.exclude)

Residuals:
         1        1.1 
-5.024e-15  5.024e-15 

Coefficients: (6 not defined because of singularities)
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 5.700e+01  5.024e-15 1.134e+16   <2e-16 ***
marital            NA         NA        NA       NA    
parents            NA         NA        NA       NA    
children           NA         NA        NA       NA    
household          NA         NA        NA       NA    
education          NA         NA        NA       NA    
income             NA         NA        NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.105e-15 on 1 degrees of freedom

The best model I generated, with the weights column added in the same way

#Linear regression of "politics" variables
mlm7b <- lm(users ~ demonstration + strikes + education, data = all_data, na.action = na.exclude, weights)
summary(mlm7b)

Call:
lm(formula = users ~ demonstration + strikes + education, data = all_data, 
    subset = weights, na.action = na.exclude)

Residuals:
         1        1.1 
-5.024e-15  5.024e-15 

Coefficients: (3 not defined because of singularities)
               Estimate Std. Error   t value Pr(>|t|)    
(Intercept)   5.700e+01  5.024e-15 1.134e+16   <2e-16 ***
demonstration        NA         NA        NA       NA    
strikes              NA         NA        NA       NA    
education            NA         NA        NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.105e-15 on 1 degrees of freedom

Government Right to Access

ivs_right <- read.csv("government_rights.csv")
names(ivs_right)[1] <- 'country'
head(ivs_right)
    country surveillance  monitor  collect users
1   Albania     2.443206 3.027178 2.951220    57
2   Andorra     2.261952 3.588645 3.569721     6
3 Argentina     2.499501 3.044865 2.775673    11
4   Armenia     2.359333 2.657333 2.552000    16
5 Australia     1.787645 2.877551 2.896856   717
6   Austria     2.495134 3.085158 3.333333  3276

Exploratory Model

right <- lm(users ~ surveillance + monitor + collect, data = ivs_right)
summary(right)

Call:
lm(formula = users ~ surveillance + monitor + collect, data = ivs_right)

Residuals:
    Min      1Q  Median      3Q     Max 
-1285.4  -692.4  -360.4    10.9 11564.8 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    -788.8     1407.0  -0.561    0.577
surveillance   -992.7      847.5  -1.171    0.246
monitor       -1297.9     1726.1  -0.752    0.455
collect        2516.6     1533.5   1.641    0.106

Residual standard error: 1855 on 63 degrees of freedom
Multiple R-squared:  0.0762,    Adjusted R-squared:  0.0322 
F-statistic: 1.732 on 3 and 63 DF,  p-value: 0.1695

Removing the predictor with the highest p-value (monitor)

right2 <- lm(users ~ surveillance + collect, data = ivs_right)
summary(right2)

Call:
lm(formula = users ~ surveillance + collect, data = ivs_right)

Residuals:
    Min      1Q  Median      3Q     Max 
-1537.0  -711.1  -431.3    10.0 11813.0 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)    -876.5     1397.4  -0.627   0.5327  
surveillance  -1278.4      755.1  -1.693   0.0953 .
collect        1487.9      690.5   2.155   0.0350 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1848 on 64 degrees of freedom
Multiple R-squared:  0.0679,    Adjusted R-squared:  0.03878 
F-statistic: 2.331 on 2 and 64 DF,  p-value: 0.1054

Removing the predictor with the highest p-value (surveillance)

right3 <- lm(users ~ collect, data = ivs_right)
summary(right3)

Call:
lm(formula = users ~ collect, data = ivs_right)

Residuals:
    Min      1Q  Median      3Q     Max 
-1070.4  -707.3  -573.1  -203.9 12247.4 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1136.8     1408.7  -0.807    0.423
collect        620.0      469.2   1.321    0.191

Residual standard error: 1875 on 65 degrees of freedom
Multiple R-squared:  0.02616,   Adjusted R-squared:  0.01117 
F-statistic: 1.746 on 1 and 65 DF,  p-value: 0.191

Citation

For attribution, please cite this work as

Becvar (2022, April 26). IT Army: Further Analysis. Retrieved from https://kbec19.github.io/it-army/posts/further-analysis/

BibTeX citation

@misc{becvar2022further,
  author = {Becvar, Kristina},
  title = {IT Army: Further Analysis},
  url = {https://kbec19.github.io/it-army/posts/further-analysis/},
  year = {2022}
}