statistics quantitative data analysis final project IT Army of Ukraine

DACSS 603 Final Project Work: “Further Analysis”

Kristina Becvar

Reading in Data

I continue this post by reading in the data I assembled in my exploratory analysis. A full accounting of the variables and their descriptions is in the “About” tab of this GitHub Page.

library(tibble)  #as_tibble()
library(dplyr)   #%>%, group_by(), select()
#all IVS data
ivs_clean <- read.csv("ivs-df-clean.csv")
ivs_clean <- as_tibble(ivs_clean)
names(ivs_clean)[1] <- 'country'
head(ivs_clean)
# A tibble: 6 x 72
  country weight imp_family imp_friends imp_leisure imp_politics
  <chr>    <dbl>      <int>       <int>       <int>        <int>
1 Albania  0.697          2           1           2            3
2 Albania  0.697          1           1           4            4
3 Albania  0.697          1           2           2            4
4 Albania  0.697          1           2           2            4
5 Albania  0.697          1           1           2            4
6 Albania  0.697          1           3           3            4
# ... with 66 more variables: imp_work <int>, imp_religion <int>,
#   sat_happiness <int>, sat_health <int>, sat_life <int>,
#   sat_control <int>, willingness_fight <int>,
#   interest_politics <int>, prop_petition <int>,
#   prop_boycotts <int>, prop_demonstrations <int>,
#   prop_strikes <int>, self_position <int>, conf_churches <int>,
#   conf_armed <int>, conf_press <int>, conf_unions <int>, ...

Saving other data for potential review

ivs_right <- ivs_clean %>%
  group_by(country) %>%
  select(country, right_surveillance, right_monitor, right_collect)
#integrated data
all_data <- read.csv("integrated_data.csv")
all_data <- as_tibble(all_data)
names(all_data)[1] <- 'country'
# A tibble: 6 x 23
  country   population region    users family friends leisure politics
  <chr>     <chr>      <chr>     <int>  <dbl>   <dbl>   <dbl>    <dbl>
1 Albania   3,088,385  Southern~    57   1.02    1.73    2.01     3.30
2 Andorra   85,645     Southern~     6   1.12    1.54    1.42     2.94
3 Argentina 45,864,941 South Am~    11   1.09    1.54    1.81     2.81
4 Armenia   3,011,609  Western ~    16   1.11    1.74    1.99     2.79
5 Australia 25,809,973 Oceania     717   1.11    1.48    1.65     2.41
6 Austria   8,884,864  Western ~  3276   1.20    1.45    1.63     2.51
# ... with 15 more variables: work <dbl>, religion <dbl>,
#   willingness <dbl>, petition <dbl>, boycott <dbl>,
#   demonstration <dbl>, strikes <dbl>, identity <dbl>,
#   marital <dbl>, parents <dbl>, children <dbl>, household <dbl>,
#   education <dbl>, income <dbl>, weights <dbl>

Multiple Linear Regression

Importance to Respondents

#Linear regression of "importance" variables
mlm1 <- lm(users ~ family + friends + leisure + politics + work + religion, data = all_data, na.action = na.exclude)

lm(formula = users ~ family + friends + leisure + politics + 
    work + religion, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1678.1  -610.4  -342.4   130.6 11437.5 

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -405.6     5035.6  -0.081    0.936
family        3368.6     4464.5   0.755    0.453
friends       -730.0     1476.1  -0.495    0.623
leisure        153.5     1244.6   0.123    0.902
politics     -1405.8      877.5  -1.602    0.114
work          1052.6     1416.3   0.743    0.460
religion       168.8      512.1   0.330    0.743

Residual standard error: 1847 on 60 degrees of freedom
Multiple R-squared:  0.1275,    Adjusted R-squared:  0.04025 
F-statistic: 1.461 on 6 and 60 DF,  p-value: 0.2069

Removing the largest p-value first:

#Linear regression of "importance" variables
mlm1b <- lm(users ~ family + friends + politics + work + religion, data = all_data, na.action = na.exclude)

lm(formula = users ~ family + friends + politics + work + religion, 
    data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1698.1  -614.1  -341.5   117.6 11443.6 

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -164.8     4604.1  -0.036    0.972
family        3289.9     4382.8   0.751    0.456
friends       -656.9     1340.8  -0.490    0.626
politics     -1407.4      870.3  -1.617    0.111
work          1068.7     1398.8   0.764    0.448
religion       157.4      499.5   0.315    0.754

Residual standard error: 1832 on 61 degrees of freedom
Multiple R-squared:  0.1273,    Adjusted R-squared:  0.05574 
F-statistic: 1.779 on 5 and 61 DF,  p-value: 0.1305

Removing the next largest p-value:

#Linear regression of "importance" variables
mlm1c <- lm(users ~ family + friends + politics + work, data = all_data, na.action = na.exclude)

lm(formula = users ~ family + friends + politics + work, data = all_data, 
    na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1738.3  -621.8  -293.5    60.6 11473.1 

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -416.8     4501.0  -0.093    0.927
family        3689.2     4165.0   0.886    0.379
friends       -745.2     1301.6  -0.573    0.569
politics     -1407.4      863.9  -1.629    0.108
work          1263.9     1245.1   1.015    0.314

Residual standard error: 1819 on 62 degrees of freedom
Multiple R-squared:  0.1259,    Adjusted R-squared:  0.06946 
F-statistic: 2.232 on 4 and 62 DF,  p-value: 0.07575

Removing the next largest p-value:

#Linear regression of "importance" variables
mlm1d <- lm(users ~ family + politics + work, data = all_data, na.action = na.exclude)

lm(formula = users ~ family + politics + work, data = all_data, 
    na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1744.5  -645.5  -276.6    63.3 11525.3 

            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -970.1     4372.6  -0.222   0.8251  
family        3039.9     3986.2   0.763   0.4485  
politics     -1512.6      839.7  -1.801   0.0764 .
work          1474.1     1183.4   1.246   0.2175  
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1809 on 63 degrees of freedom
Multiple R-squared:  0.1212,    Adjusted R-squared:  0.07939 
F-statistic: 2.897 on 3 and 63 DF,  p-value: 0.04195

Removing the next largest p-value:

#Linear regression of "importance" variables
mlm1e <- lm(users ~ politics + work, data = all_data, na.action = na.exclude)

lm(formula = users ~ politics + work, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1775.5  -704.3  -219.2    45.5 11535.7 

            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1463.3     2979.8   0.491   0.6250  
politics     -1417.5      827.6  -1.713   0.0916 .
work          1927.1     1020.2   1.889   0.0634 .
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1803 on 64 degrees of freedom
Multiple R-squared:  0.1131,    Adjusted R-squared:  0.08541 
F-statistic: 4.082 on 2 and 64 DF,  p-value: 0.02146
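
The manual steps above (refit, drop the predictor with the largest p-value, repeat) can be sketched as a small helper. This is only an illustration under stated assumptions: backward_by_p() and the data frame toy are hypothetical names, the data are simulated rather than read from integrated_data.csv, and the 0.10 stopping threshold is my assumption, not a rule the analysis above commits to.

```r
# Hypothetical helper: drop the predictor with the largest p-value,
# refit, and repeat until every remaining p-value is <= threshold.
# Assumes numeric predictors, so coefficient names match term names.
backward_by_p <- function(model, threshold = 0.10) {
  repeat {
    coefs <- summary(model)$coefficients
    # Leave the intercept out when choosing which term to drop
    tab <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
    if (nrow(tab) == 0) break
    p_vals <- setNames(tab[, "Pr(>|t|)"], rownames(tab))
    if (max(p_vals) <= threshold) break
    worst <- names(which.max(p_vals))
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
  model
}

# Simulated stand-in data (NOT the real country-level values)
set.seed(603)
n <- 67
toy <- data.frame(
  politics = rnorm(n, 2.8, 0.3),
  work     = rnorm(n, 1.6, 0.3),
  religion = rnorm(n, 2.5, 0.6)
)
toy$users <- 500 - 1500 * (toy$politics - 2.8) + rnorm(n, sd = 300)

full    <- lm(users ~ politics + work + religion, data = toy)
reduced <- backward_by_p(full)
formula(reduced)
```

P-values from a model chosen this way tend to look stronger than they really are, so the final fit is best read as exploratory rather than confirmatory.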

Political Inclinations of Respondents

#Linear regression of "politics" variables
mlm2 <- lm(users ~ willingness + petition + boycott + demonstration + strikes + identity, data = all_data, na.action = na.exclude)

lm(formula = users ~ willingness + petition + boycott + demonstration + 
    strikes + identity, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-2609.2  -623.3  -264.6   -26.5 10359.4 

              Estimate Std. Error t value Pr(>|t|)  
(Intercept)    -212.45    5053.29  -0.042    0.967  
willingness    -687.64    1753.88  -0.392    0.697  
petition      -1328.59    1274.62  -1.042    0.302  
boycott        1454.15    1765.01   0.824    0.414  
demonstration -3373.22    1984.10  -1.700    0.095 .
strikes        3377.05    1452.75   2.325    0.024 *
identity         31.85     588.76   0.054    0.957  
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1893 on 53 degrees of freedom
  (7 observations deleted due to missingness)
Multiple R-squared:  0.1779,    Adjusted R-squared:  0.08481 
F-statistic: 1.911 on 6 and 53 DF,  p-value: 0.09602

Removing the highest p-value

#Linear regression of "politics" variables
mlm2b <- lm(users ~ willingness + petition + boycott + demonstration + strikes, data = all_data, na.action = na.exclude)

lm(formula = users ~ willingness + petition + boycott + demonstration + 
    strikes, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-2569.1  -609.5  -241.1    20.6 10478.8 

              Estimate Std. Error t value Pr(>|t|)  
(Intercept)     -967.7     3878.5  -0.250   0.8038  
willingness     -435.5     1560.6  -0.279   0.7811  
petition       -1366.1     1175.7  -1.162   0.2498  
boycott         1371.3     1630.5   0.841   0.4036  
demonstration  -2960.9     1719.6  -1.722   0.0902 .
strikes         3327.3     1347.2   2.470   0.0163 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1770 on 61 degrees of freedom
Multiple R-squared:  0.1856,    Adjusted R-squared:  0.1189 
F-statistic: 2.781 on 5 and 61 DF,  p-value: 0.02506

Removing the next highest p-value

#Linear regression of "politics" variables
mlm2c <- lm(users ~ petition + boycott + demonstration + strikes, data = all_data, na.action = na.exclude)

lm(formula = users ~ petition + boycott + demonstration + strikes, 
    data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-2700.7  -641.6  -283.2    57.8 10493.7 

              Estimate Std. Error t value Pr(>|t|)  
(Intercept)      -1602       3119  -0.514   0.6092  
petition         -1498       1069  -1.401   0.1661  
boycott           1500       1552   0.966   0.3376  
demonstration    -2954       1707  -1.731   0.0884 .
strikes           3266       1319   2.476   0.0160 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1756 on 62 degrees of freedom
Multiple R-squared:  0.1846,    Adjusted R-squared:  0.132 
F-statistic: 3.509 on 4 and 62 DF,  p-value: 0.01202

Removing the next highest p-value

#Linear regression of "politics" variables
mlm2d <- lm(users ~ petition + demonstration + strikes, data = all_data, na.action = na.exclude)

lm(formula = users ~ petition + demonstration + strikes, data = all_data, 
    na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-2590.3  -688.4  -235.1   113.7 10732.1 

              Estimate Std. Error t value Pr(>|t|)  
(Intercept)     -80.13    2690.23  -0.030   0.9763  
petition       -870.42     848.77  -1.026   0.3090  
demonstration -2611.25    1668.38  -1.565   0.1226  
strikes        3321.60    1317.16   2.522   0.0142 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1755 on 63 degrees of freedom
Multiple R-squared:  0.1723,    Adjusted R-squared:  0.1329 
F-statistic: 4.372 on 3 and 63 DF,  p-value: 0.007358

Removing the next highest p-value

#Linear regression of "politics" variables
mlm2e <- lm(users ~ demonstration + strikes, data = all_data, na.action = na.exclude)

lm(formula = users ~ demonstration + strikes, data = all_data, 
    na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-2615.7  -689.7  -272.3    99.1 10722.4 

              Estimate Std. Error t value Pr(>|t|)    
(Intercept)      815.7     2545.5   0.320 0.749678    
demonstration  -3877.1     1122.9  -3.453 0.000989 ***
strikes         3417.7     1314.3   2.600 0.011554 *  
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1756 on 64 degrees of freedom
Multiple R-squared:  0.1585,    Adjusted R-squared:  0.1322 
F-statistic: 6.027 on 2 and 64 DF,  p-value: 0.003997

Looking at only “demonstration”

#Linear regression of "politics" variables
mlm2f <- lm(users ~ demonstration, data = all_data, na.action = na.exclude)

lm(formula = users ~ demonstration, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1266.7  -766.3  -367.9    54.0 11672.1 

              Estimate Std. Error t value Pr(>|t|)  
(Intercept)     5120.7     2017.4   2.538   0.0135 *
demonstration  -1906.5      864.6  -2.205   0.0310 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1832 on 65 degrees of freedom
Multiple R-squared:  0.0696,    Adjusted R-squared:  0.05528 
F-statistic: 4.862 on 1 and 65 DF,  p-value: 0.031

Plotting this model


Looking at only “strikes”

#Linear regression of "politics" variables
mlm2g <- lm(users ~ strikes, data = all_data, na.action = na.exclude)

lm(formula = users ~ strikes, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
 -827.2  -676.0  -592.6  -182.7 12475.0 

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -222.8     2731.8  -0.082    0.935
strikes        355.3     1048.2   0.339    0.736

Residual standard error: 1898 on 65 degrees of freedom
Multiple R-squared:  0.001764,  Adjusted R-squared:  -0.01359 
F-statistic: 0.1149 on 1 and 65 DF,  p-value: 0.7358
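
Strikes is significant alongside demonstration (p = 0.0116 in mlm2e) but not on its own (p = 0.736 in mlm2g), and the demonstration coefficient roughly halves when strikes is dropped. A pattern like this often appears when two predictors are correlated with each other, although the output above does not show that directly. Below is a minimal sketch of how one might check, using simulated data: the data frame toy and its values are made up for illustration and are not the real country-level means.

```r
# Simulated stand-ins for the two predictors (NOT the real data)
set.seed(603)
n <- 67
demonstration <- rnorm(n, mean = 2.3, sd = 0.25)
strikes <- 1.2 + 0.6 * demonstration + rnorm(n, sd = 0.15)
toy <- data.frame(demonstration, strikes)

# Pairwise correlation between the two predictors
r <- cor(toy$demonstration, toy$strikes, use = "complete.obs")

# Variance inflation factor computed by hand: 1 / (1 - R^2),
# where R^2 comes from regressing one predictor on the other
aux <- lm(strikes ~ demonstration, data = toy)
vif_strikes <- 1 / (1 - summary(aux)$r.squared)

r
vif_strikes
```

On the real data the same check would be cor(all_data$demonstration, all_data$strikes, use = "complete.obs"). A high correlation would mean the two coefficients are hard to interpret separately.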

Plotting the best model


Demographics of Respondents

#Linear regression of "demographics" variables
mlm3 <- lm(users ~ marital + parents + children + household + education + income, data = all_data, na.action = na.exclude)

lm(formula = users ~ marital + parents + children + household + 
    education + income, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1678.0  -748.7  -399.7   136.5 11849.2 

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3333.9     4885.7   0.682    0.498
marital       -437.5      692.4  -0.632    0.530
parents      -2125.9     2979.5  -0.714    0.478
children      -182.7      905.7  -0.202    0.841
household     -125.1      798.5  -0.157    0.876
education      534.3      405.4   1.318    0.193
income        -111.3      482.0  -0.231    0.818

Residual standard error: 1863 on 59 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1261,    Adjusted R-squared:  0.03726 
F-statistic: 1.419 on 6 and 59 DF,  p-value: 0.2226

Remove highest p-value first

#Linear regression of "demographics" variables
mlm3b <- lm(users ~ marital + parents + children + education + income, data = all_data, na.action = na.exclude)

lm(formula = users ~ marital + parents + children + education + 
    income, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1705.3  -732.8  -371.8   160.1 11859.3 

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3484.2     4751.5   0.733    0.466
marital       -431.2      685.6  -0.629    0.532
parents      -2522.6     1558.7  -1.618    0.111
children      -259.1      757.3  -0.342    0.733
education      562.5      360.3   1.561    0.124
income        -126.4      468.5  -0.270    0.788

Residual standard error: 1847 on 60 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1258,    Adjusted R-squared:  0.05291 
F-statistic: 1.726 on 5 and 60 DF,  p-value: 0.1424

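Remove children next (p = 0.733). Note that income has the highest p-value in the model above (0.788), so this step departs from strict largest-p-value order; income is removed in the following step.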

#Linear regression of "demographics" variables (children removed)
#Named mlm3b2 so it does not overwrite mlm3b above
mlm3b2 <- lm(users ~ marital + parents + education + income, data = all_data, na.action = na.exclude)

lm(formula = users ~ marital + parents + education + income, 
    data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1687.4  -785.4  -314.5   132.6 11881.5 

            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2784.6     4257.6   0.654   0.5156  
marital       -363.3      651.5  -0.558   0.5791  
parents      -2579.5     1538.6  -1.677   0.0988 .
education      589.4      349.0   1.689   0.0964 .
income        -118.2      464.5  -0.255   0.7999  
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1834 on 61 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1241,    Adjusted R-squared:  0.06662 
F-statistic:  2.16 on 4 and 61 DF,  p-value: 0.08419

Remove next highest p-value

#Linear regression of "demographics" variables
mlm3c <- lm(users ~ marital + parents + education, data = all_data, na.action = na.exclude)

lm(formula = users ~ marital + parents + education, data = all_data, 
    na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1674.3  -777.4  -306.5    72.5 11854.1 

            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2201.8     3528.0   0.624   0.5348  
marital       -364.8      639.4  -0.570   0.5704  
parents      -2479.7     1455.2  -1.704   0.0933 .
education      568.9      319.1   1.783   0.0794 .
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1806 on 63 degrees of freedom
Multiple R-squared:  0.1244,    Adjusted R-squared:  0.08268 
F-statistic: 2.983 on 3 and 63 DF,  p-value: 0.03786

Remove next highest p-value

#Linear regression of "demographics" variables
mlm3d <- lm(users ~ parents + education, data = all_data, na.action = na.exclude)

lm(formula = users ~ parents + education, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1564.6  -760.7  -299.0    92.3 11920.8 

            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    874.7     2638.4   0.332   0.7413  
parents      -2186.2     1354.0  -1.615   0.1113  
education      559.7      317.0   1.766   0.0822 .
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1796 on 64 degrees of freedom
Multiple R-squared:  0.1199,    Adjusted R-squared:  0.09235 
F-statistic: 4.358 on 2 and 64 DF,  p-value: 0.01682

Users to Politics and Education

Next I’ll combine predictors from different groups, starting with the importance of politics and education level.

#Linear regression of "politics" (importance) and "education" variables
mlm4 <- lm(users ~  politics + education, data = all_data, na.action = na.exclude)

lm(formula = users ~ politics + education, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1586.2  -754.7  -274.5   122.4 11471.3 

            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1724.5     2634.3   0.655   0.5151  
politics     -1595.8      801.2  -1.992   0.0507 .
education      691.4      295.5   2.340   0.0224 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1778 on 64 degrees of freedom
Multiple R-squared:  0.1375,    Adjusted R-squared:  0.1105 
F-statistic:   5.1 on 2 and 64 DF,  p-value: 0.008809

Users to Strike Propensity and Education

This model pairs propensity to strike with education level.

#Linear regression of "strikes" and "education" variables
mlm5 <- lm(users ~ strikes + education, data = all_data, na.action = na.exclude)

lm(formula = users ~ strikes + education, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1746.0  -732.5  -386.1   192.2 12134.9 

            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -4479.7     3138.2  -1.427   0.1583  
strikes        635.5     1015.2   0.626   0.5336  
education      756.8      304.6   2.485   0.0156 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1827 on 64 degrees of freedom
Multiple R-squared:  0.08957,   Adjusted R-squared:  0.06112 
F-statistic: 3.148 on 2 and 64 DF,  p-value: 0.04964

One more combination, to see whether mixing variables from different groups produces a better model:

#Linear regression of "politics" variables plus "education"
mlm7 <- lm(users ~ demonstration + strikes + education, data = all_data, na.action = na.exclude)

lm(formula = users ~ demonstration + strikes + education, data = all_data, 
    na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-2708.0  -701.5  -351.0   210.3 10702.2 

              Estimate Std. Error t value Pr(>|t|)   
(Intercept)    -2395.9     3043.2  -0.787  0.43406   
demonstration  -3377.8     1134.9  -2.976  0.00414 **
strikes         3225.9     1294.4   2.492  0.01534 * 
education        547.2      296.0   1.849  0.06918 . 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1724 on 63 degrees of freedom
Multiple R-squared:  0.2018,    Adjusted R-squared:  0.1638 
F-statistic: 5.309 on 3 and 63 DF,  p-value: 0.002512
#Diagnostic plots for mlm7: all six plot.lm panels, one per page
par(mfrow = c(1,1))
plot(mlm7, 1:6)
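
Comparing the candidate models by eye across many summaries is easy to get wrong, so the following is a minimal sketch of collecting a few fit statistics into one table. compare_models() and the data frame toy are hypothetical names, and the data are simulated, not the real country-level values.

```r
# Hypothetical helper: fit statistics from a named list of lm fits
compare_models <- function(models) {
  data.frame(
    model  = names(models),
    n_obs  = vapply(models, nobs, numeric(1)),
    adj_r2 = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1)),
    aic    = vapply(models, AIC, numeric(1)),
    row.names = NULL
  )
}

# Simulated stand-in data (NOT the real values)
set.seed(603)
n <- 67
toy <- data.frame(
  demonstration = rnorm(n, 2.3, 0.25),
  education     = rnorm(n, 4.5, 0.7)
)
toy$strikes <- 1.2 + 0.6 * toy$demonstration + rnorm(n, sd = 0.15)
toy$users <- 800 - 3000 * toy$demonstration + 3000 * toy$strikes +
  500 * toy$education + rnorm(n, sd = 400)

fits <- list(
  demo_only    = lm(users ~ demonstration, data = toy),
  demo_strikes = lm(users ~ demonstration + strikes, data = toy),
  demo_str_edu = lm(users ~ demonstration + strikes + education, data = toy)
)
comparison <- compare_models(fits)
comparison[order(comparison$aic), ]  # lowest AIC first
```

The n_obs column matters: AIC and adjusted R-squared are only comparable between models fitted to the same rows, and some of the fits above (mlm2 and mlm3, for example) drop observations because of missing values.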

Simple Linear Regression

Potential Correlation with Strike Propensity

I am going to look further at the potential correlation between a country’s propensity to engage in strikes and its participation in DDoS attacks.

#Linear regression of "politics" variable "strikes"
lm_strikes <- lm(users ~ strikes, data = all_data, na.action = na.exclude)

lm(formula = users ~ strikes, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
 -827.2  -676.0  -592.6  -182.7 12475.0 

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -222.8     2731.8  -0.082    0.935
strikes        355.3     1048.2   0.339    0.736

Residual standard error: 1898 on 65 degrees of freedom
Multiple R-squared:  0.001764,  Adjusted R-squared:  -0.01359 
F-statistic: 0.1149 on 1 and 65 DF,  p-value: 0.7358

Potential Correlation with Education Level

I am going to look further at the potential correlation between a country’s education level and its participation in DDoS attacks.

#Linear regression of "educational level" and "users"
lm_education <- lm(users ~ education, data = all_data, na.action = na.exclude)

lm(formula = users ~ education, data = all_data, na.action = na.exclude)

    Min      1Q  Median      3Q     Max 
-1533.6  -810.1  -400.2   154.9 12164.3 

            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -2730.8     1422.6  -1.920   0.0593 .
education      735.6      301.3   2.441   0.0174 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1818 on 65 degrees of freedom
Multiple R-squared:  0.084, Adjusted R-squared:  0.06991 
F-statistic: 5.961 on 1 and 65 DF,  p-value: 0.01736

It may be that there are no strong relationships to be found here. Still, I want to re-run the analyses using the ‘weights’ column described in the codebook, which is meant to account for population size when comparing multiple variables.

Adding Population Weights

Importance to Respondents

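I’ll re-run the model with users and the importance variables, this time trying to use the weights column. R warns that this is an “essentially perfect fit” and that the summary may be unreliable, and the same happens for each of the weighted models. The echoed call shows why: weights is passed to lm() without a name, so it is matched to the subset argument (note “subset = weights” in the call) rather than to the weights argument. Each fit below is therefore an intercept-only model on two copies of row 1 (Albania, users = 57) rather than a weighted regression, and its coefficients should not be interpreted.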

#Linear regression of "importance" variables + weighted   
mlm1w <- lm(users ~ family + friends + leisure + politics + work + religion, data = all_data, na.action = na.exclude, weights)

lm(formula = users ~ family + friends + leisure + politics + 
    work + religion, data = all_data, subset = weights, na.action = na.exclude)

         1        1.1 
-5.024e-15  5.024e-15 

Coefficients: (6 not defined because of singularities)
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 5.700e+01  5.024e-15 1.134e+16   <2e-16 ***
family             NA         NA        NA       NA    
friends            NA         NA        NA       NA    
leisure            NA         NA        NA       NA    
politics           NA         NA        NA       NA    
work               NA         NA        NA       NA    
religion           NA         NA        NA       NA    
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.105e-15 on 1 degrees of freedom

Political Inclinations of Respondents

#Linear regression of "politics" variables + weighted
mlm2w <- lm(users ~ willingness + petition + boycott + demonstration + strikes + identity, data = all_data, na.action = na.exclude, weights)

lm(formula = users ~ willingness + petition + boycott + demonstration + 
    strikes + identity, data = all_data, subset = weights, na.action = na.exclude)

         1        1.1 
-5.024e-15  5.024e-15 

Coefficients: (6 not defined because of singularities)
               Estimate Std. Error   t value Pr(>|t|)    
(Intercept)   5.700e+01  5.024e-15 1.134e+16   <2e-16 ***
willingness          NA         NA        NA       NA    
petition             NA         NA        NA       NA    
boycott              NA         NA        NA       NA    
demonstration        NA         NA        NA       NA    
strikes              NA         NA        NA       NA    
identity             NA         NA        NA       NA    
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.105e-15 on 1 degrees of freedom

Demographics of Respondents

#Linear regression of "demographics" variables + weighted
mlm3w <- lm(users ~ marital + parents + children + household + education + income, data = all_data, na.action = na.exclude, weights)

lm(formula = users ~ marital + parents + children + household + 
    education + income, data = all_data, subset = weights, na.action = na.exclude)

         1        1.1 
-5.024e-15  5.024e-15 

Coefficients: (6 not defined because of singularities)
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 5.700e+01  5.024e-15 1.134e+16   <2e-16 ***
marital            NA         NA        NA       NA    
parents            NA         NA        NA       NA    
children           NA         NA        NA       NA    
household          NA         NA        NA       NA    
education          NA         NA        NA       NA    
income             NA         NA        NA       NA    
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.105e-15 on 1 degrees of freedom

The best model I generated, re-run with the weights column added

#Linear regression of "demonstration", "strikes", and "education" + weighted
mlm7b <- lm(users ~ demonstration + strikes + education, data = all_data, na.action = na.exclude, weights)

lm(formula = users ~ demonstration + strikes + education, data = all_data, 
    subset = weights, na.action = na.exclude)

         1        1.1 
-5.024e-15  5.024e-15 

Coefficients: (3 not defined because of singularities)
               Estimate Std. Error   t value Pr(>|t|)    
(Intercept)   5.700e+01  5.024e-15 1.134e+16   <2e-16 ***
demonstration        NA         NA        NA       NA    
strikes              NA         NA        NA       NA    
education            NA         NA        NA       NA    
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.105e-15 on 1 degrees of freedom
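
Every weighted summary above echoes “subset = weights” in its call, which suggests the column was matched to the third argument of lm() (subset) instead of the weights argument. The sketch below reproduces that behaviour on simulated data and shows the named form next to it. The data frame toy and its values are made up for illustration. Whether the survey weights are the right regression weights for country-level rows is a separate question that this sketch does not address.

```r
# Simulated stand-in data (NOT the real values)
set.seed(603)
toy <- data.frame(
  education = rnorm(20, 4.5, 0.7),
  weights   = rep(c(0.7, 1.2), length.out = 20)
)
toy$users <- 200 + 700 * toy$education + rnorm(20, sd = 300)

# Unnamed: `weights` fills the `subset` slot. Used as a row index,
# 0.7 truncates to 0 (row dropped) and 1.2 truncates to 1 (row 1),
# so the model sees ten copies of row 1 and the slope is undefined.
fit_subset <- lm(users ~ education, data = toy,
                 na.action = na.exclude, weights)

# Named: weighted least squares on all 20 rows
fit_weighted <- lm(users ~ education, data = toy,
                   na.action = na.exclude, weights = weights)

nobs(fit_subset)    # 10
coef(fit_subset)    # education is NA
nobs(fit_weighted)  # 20
coef(fit_weighted)
```

Applied to the models in this section, the change would be to write weights = weights in each call, for example in mlm7b.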

Government Right to Access

ivs_right <- read.csv("government_rights.csv")
names(ivs_right)[1] <- 'country'
    country surveillance  monitor  collect users
1   Albania     2.443206 3.027178 2.951220    57
2   Andorra     2.261952 3.588645 3.569721     6
3 Argentina     2.499501 3.044865 2.775673    11
4   Armenia     2.359333 2.657333 2.552000    16
5 Australia     1.787645 2.877551 2.896856   717
6   Austria     2.495134 3.085158 3.333333  3276

Exploratory Model

right <- lm(users ~ surveillance + monitor + collect, data = ivs_right)

lm(formula = users ~ surveillance + monitor + collect, data = ivs_right)

    Min      1Q  Median      3Q     Max 
-1285.4  -692.4  -360.4    10.9 11564.8 

             Estimate Std. Error t value Pr(>|t|)
(Intercept)    -788.8     1407.0  -0.561    0.577
surveillance   -992.7      847.5  -1.171    0.246
monitor       -1297.9     1726.1  -0.752    0.455
collect        2516.6     1533.5   1.641    0.106

Residual standard error: 1855 on 63 degrees of freedom
Multiple R-squared:  0.0762,    Adjusted R-squared:  0.0322 
F-statistic: 1.732 on 3 and 63 DF,  p-value: 0.1695

Removing the highest p-value (monitor)

right2 <- lm(users ~ surveillance + collect, data = ivs_right)

lm(formula = users ~ surveillance + collect, data = ivs_right)

    Min      1Q  Median      3Q     Max 
-1537.0  -711.1  -431.3    10.0 11813.0 

             Estimate Std. Error t value Pr(>|t|)  
(Intercept)    -876.5     1397.4  -0.627   0.5327  
surveillance  -1278.4      755.1  -1.693   0.0953 .
collect        1487.9      690.5   2.155   0.0350 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1848 on 64 degrees of freedom
Multiple R-squared:  0.0679,    Adjusted R-squared:  0.03878 
F-statistic: 2.331 on 2 and 64 DF,  p-value: 0.1054

Removing the highest p-value (surveillance)

right3 <- lm(users ~ collect, data = ivs_right)

lm(formula = users ~ collect, data = ivs_right)

    Min      1Q  Median      3Q     Max 
-1070.4  -707.3  -573.1  -203.9 12247.4 

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1136.8     1408.7  -0.807    0.423
collect        620.0      469.2   1.321    0.191

Residual standard error: 1875 on 65 degrees of freedom
Multiple R-squared:  0.02616,   Adjusted R-squared:  0.01117 
F-statistic: 1.746 on 1 and 65 DF,  p-value: 0.191


