Last updated: 2022-01-15

Checks: 5 1

Knit directory: Padgett-Dissertation/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity itโ€™s best to always run the code in an empty environment.

The command set.seed(20210401) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Tracking code development and connecting the code version to the results is critical for reproducibility. To start using Git, open the Terminal and type git init in your project directory.


This project is not being versioned with Git. To obtain the full reproducibility benefits of using workflowr, please see ?wflow_start.


# Load packages & utility functions
source("code/load_packages.R")
source("code/load_utility_functions.R")
# environment options
options(scipen = 999, digits=3)

Model Comparison

Comparing Reliability

# true reliability value in population
getOmega <- function(lambda, N_items){
  theta = 1-lambda**2
  (lambda*N_items)**2/((lambda*N_items)**2 + N_items*theta)
}

getOmega(0.7, 5)
[1] 0.828
# simulated induced prior on omega
prior_lambda <- function(){
  y <- -1
  while(y < 0){
    y <- rnorm(1, 0, 2)
  }
  return(y)
}

prior_omega <- function(lambda, theta){
  (sum(lambda)**2)/(sum(lambda)**2 + sum(theta))
}
nsim=1000
sim_omega <- numeric(nsim)
for(i in 1:nsim){
  lam_vec <- c(
    prior_lambda(), prior_lambda(), prior_lambda(),prior_lambda(), prior_lambda()
  )
  tht_vec <- rep(1, 5)
  sim_omega[i] <- prior_omega(lam_vec, tht_vec)
}
prior_data <- data.frame(omega=sim_omega)
ggplot(prior_data, aes(x=omega))+
  geom_density(adjust=2)+
  labs(title="Induced prior on omega")+
  theme_classic()

# read in data
o1 <- readr::read_csv(paste0(getwd(),"/data/study_1/extracted_omega_m1.csv"))
Warning: Missing column names filled in: 'X1' [1]

-- Column specification --------------------------------------------------------
cols(
  X1 = col_double(),
  model_1 = col_double()
)
o2 <- readr::read_csv(paste0(getwd(),"/data/study_1/extracted_omega_m2.csv"))
Warning: Missing column names filled in: 'X1' [1]

-- Column specification --------------------------------------------------------
cols(
  X1 = col_double(),
  model_2 = col_double()
)
o3 <- readr::read_csv(paste0(getwd(),"/data/study_1/extracted_omega_m3.csv"))
Warning: Missing column names filled in: 'X1' [1]

-- Column specification --------------------------------------------------------
cols(
  X1 = col_double(),
  model_3 = col_double()
)
o4 <- readr::read_csv(paste0(getwd(),"/data/study_1/extracted_omega_m4.csv"))
Warning: Missing column names filled in: 'X1' [1]

-- Column specification --------------------------------------------------------
cols(
  X1 = col_double(),
  model_4 = col_double()
)
dat_omega <- cbind(o1[,2], o2[,2], o3[,2], o4[,2])

plot.dat <- dat_omega %>%
  pivot_longer(
    cols=everything(),
    names_to = "model",
    values_to = "est"
  ) %>%
  mutate(
    model = factor(model, levels=paste0('model_',1:4), labels=paste0('Model ',1:4))
  )

sum.dat <- plot.dat %>%
  group_by(model) %>%
  summarise(
    Mean = mean(est),
    SD = sd(est),
    Q025 = quantile(est, 0.025),
    Q1 = quantile(est, 0.25),
    Median = median(est),
    Q3 = quantile(est, 0.75),
    Q975 = quantile(est, 0.975),
  )

kable(sum.dat,format = "html", digits=3) %>%
  kable_styling(full_width = T)
model Mean SD Q025 Q1 Median Q3 Q975
Model 1 0.516 0.063 0.377 0.478 0.523 0.561 0.620
Model 2 0.527 0.057 0.402 0.492 0.532 0.567 0.623
Model 3 0.709 0.060 0.570 0.674 0.717 0.752 0.802
Model 4 0.747 0.051 0.633 0.714 0.753 0.784 0.829
ggplot(plot.dat,aes(x=est, y=model, group=model))+
ggdist::stat_halfeye(
    adjust=2, justification=-0.1,.width=0, point_colour=NA,
    normalize="all", fill="grey75"
  ) +
  geom_boxplot(
    width=.15, outlier.color = NA, alpha=0.5
  ) +
  geom_vline(xintercept = getOmega(0.7, 5),
             linetype="dashed")+
  annotate("text", x=0.9, y=2, label=paste0("Population\n Reliability\n", round(getOmega(0.7, 5), 3)))+
  labs(x="Reliability Estimates",
       y="Estimating Model")+
  lims(x=c(0.25, 1))+
  theme_classic()
Warning: Removed 3 rows containing missing values (stat_slabinterval).
Warning: Removed 3 rows containing non-finite values (stat_boxplot).

Test of Model Impact on Reliability Estimates

ANOVA

anova_assumptions_check(
  dat = plot.dat, outcome = 'est',
  factors = c('model'),
  model = as.formula('est ~ model'))

 ============================= 

 Tests and Plots of Normality:


 Shapiro-Wilks Test of Normality of Residuals:

    Shapiro-Wilk normality test

data:  res
W = 1, p-value <0.0000000000000002


 K-S Test for Normality of Residuals:

    One-sample Kolmogorov-Smirnov test

data:  aov.out$residuals
D = 0.4, p-value <0.0000000000000002
alternative hypothesis: two-sided
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


 ============================= 

 Tests of Homogeneity of Variance

 
 Levenes Test:  model 
 
 
Levene's Test for Homogeneity of Variance (center = "mean")
         Df F value              Pr(>F)    
group     3    45.7 <0.0000000000000002 ***
      15996                                
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit <- aov(est ~ model, data=plot.dat)
summary(fit)
               Df Sum Sq Mean Sq F value              Pr(>F)    
model           3  173.7    57.9   17229 <0.0000000000000002 ***
Residuals   15996   53.8     0.0                                
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# tukey
TukeyHSD(fit)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = est ~ model, data = plot.dat)

$model
                  diff     lwr    upr p adj
Model 2-Model 1 0.0110 0.00767 0.0143     0
Model 3-Model 1 0.1930 0.18963 0.1963     0
Model 4-Model 1 0.2311 0.22772 0.2344     0
Model 3-Model 2 0.1820 0.17862 0.1853     0
Model 4-Model 2 0.2200 0.21671 0.2234     0
Model 4-Model 3 0.0381 0.03476 0.0414     0
# ets^2
summary(lm(est ~ model, data=plot.dat))

Call:
lm(formula = est ~ model, data = plot.dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.3352 -0.0346  0.0066  0.0417  0.1655 

Coefficients:
             Estimate Std. Error t value            Pr(>|t|)    
(Intercept)  0.515732   0.000917  562.63 <0.0000000000000002 ***
modelModel 2 0.011005   0.001296    8.49 <0.0000000000000002 ***
modelModel 3 0.192956   0.001296  148.85 <0.0000000000000002 ***
modelModel 4 0.231050   0.001296  178.23 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.058 on 15996 degrees of freedom
Multiple R-squared:  0.764, Adjusted R-squared:  0.764 
F-statistic: 1.72e+04 on 3 and 15996 DF,  p-value: <0.0000000000000002

Comparison posteriors using probabilities

Next, instead of treating the posterior

ggplot(plot.dat, aes(est, group=model, color=model, linetype=model)) +
  stat_ecdf(
    geom = "step",
    pad=T
  ) +
  labs(x="Reliability (omega)",
       y="Empirical Cumulative Distribution")+
  scale_color_viridis_d()+
  theme_classic()

Manuscript Tables and Figures

Tables

print(
  xtable(
    sum.dat,
    , caption = c("Summary of posterior distribution of reliability")
    ,align = "llrrrrrrr"
  ),
  include.rownames=F,
  booktabs=T
)
% latex table generated in R 4.0.5 by xtable 1.8-4 package
% Sat Jan 15 13:43:52 2022
\begin{table}[ht]
\centering
\begin{tabular}{lrrrrrrr}
  \toprule
model & Mean & SD & Q025 & Q1 & Median & Q3 & Q975 \\ 
  \midrule
Model 1 & 0.52 & 0.06 & 0.38 & 0.48 & 0.52 & 0.56 & 0.62 \\ 
  Model 2 & 0.53 & 0.06 & 0.40 & 0.49 & 0.53 & 0.57 & 0.62 \\ 
  Model 3 & 0.71 & 0.06 & 0.57 & 0.67 & 0.72 & 0.75 & 0.80 \\ 
  Model 4 & 0.75 & 0.05 & 0.63 & 0.71 & 0.75 & 0.78 & 0.83 \\ 
   \bottomrule
\end{tabular}
\caption{Summary of posterior distribution of reliability} 
\end{table}

Figures

p <- ggplot(plot.dat,aes(x=est, y=model, group=model))+
ggdist::stat_halfeye(
    adjust=2, justification=-0.1,.width=0, point_colour=NA,
    normalize="all", fill="grey75"
  ) +
  geom_boxplot(
    width=.15, outlier.color = NA, alpha=0.5
  ) +
  geom_vline(xintercept = getOmega(0.7, 5),
             linetype="dashed")+
  annotate("text", x=0.9, y=2, label=paste0("Population\n Reliability\n", round(getOmega(0.7, 5), 3)))+
  labs(x="Reliability Estimates",
       y="Estimating Model")+
  lims(x=c(0.25, 1))+
  theme_bw() +
  theme(panel.grid = element_blank())
p
Warning: Removed 3 rows containing missing values (stat_slabinterval).
Warning: Removed 3 rows containing non-finite values (stat_boxplot).

ggsave(filename = "fig/study1_posterior_omega.pdf",plot=p,width = 7, height=4,units="in")
Warning: Removed 3 rows containing missing values (stat_slabinterval).

Warning: Removed 3 rows containing non-finite values (stat_boxplot).
ggsave(filename = "fig/study1_posterior_omega.png",plot=p,width = 7, height=4,units="in")
Warning: Removed 3 rows containing missing values (stat_slabinterval).

Warning: Removed 3 rows containing non-finite values (stat_boxplot).
ggsave(filename = "fig/study1_posterior_omega.eps",plot=p,width = 7, height=4,units="in")
Warning: Removed 3 rows containing missing values (stat_slabinterval).

Warning: Removed 3 rows containing non-finite values (stat_boxplot).
Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-transparency is
not supported on this device: reported only once per page

sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] car_3.0-10           carData_3.0-4        mvtnorm_1.1-1       
 [4] LaplacesDemon_16.1.4 runjags_2.2.0-2      lme4_1.1-26         
 [7] Matrix_1.3-2         sirt_3.9-4           R2jags_0.6-1        
[10] rjags_4-12           eRm_1.0-2            diffIRT_1.5         
[13] statmod_1.4.35       xtable_1.8-4         kableExtra_1.3.4    
[16] lavaan_0.6-7         polycor_0.7-10       bayesplot_1.8.0     
[19] ggmcmc_1.5.1.1       coda_0.19-4          data.table_1.14.0   
[22] patchwork_1.1.1      forcats_0.5.1        stringr_1.4.0       
[25] dplyr_1.0.5          purrr_0.3.4          readr_1.4.0         
[28] tidyr_1.1.3          tibble_3.1.0         ggplot2_3.3.5       
[31] tidyverse_1.3.0      workflowr_1.6.2     

loaded via a namespace (and not attached):
 [1] minqa_1.2.4          TAM_3.5-19           colorspace_2.0-0    
 [4] rio_0.5.26           ellipsis_0.3.1       ggridges_0.5.3      
 [7] rprojroot_2.0.2      fs_1.5.0             rstudioapi_0.13     
[10] farver_2.1.0         fansi_0.4.2          lubridate_1.7.10    
[13] xml2_1.3.2           splines_4.0.5        mnormt_2.0.2        
[16] knitr_1.31           jsonlite_1.7.2       nloptr_1.2.2.2      
[19] broom_0.7.5          dbplyr_2.1.0         ggdist_3.0.1        
[22] compiler_4.0.5       httr_1.4.2           backports_1.2.1     
[25] assertthat_0.2.1     cli_2.3.1            later_1.1.0.1       
[28] htmltools_0.5.1.1    tools_4.0.5          gtable_0.3.0        
[31] glue_1.4.2           Rcpp_1.0.7           cellranger_1.1.0    
[34] jquerylib_0.1.3      vctrs_0.3.6          svglite_2.0.0       
[37] nlme_3.1-152         psych_2.0.12         xfun_0.21           
[40] ps_1.6.0             openxlsx_4.2.3       rvest_1.0.0         
[43] lifecycle_1.0.0      MASS_7.3-53.1        scales_1.1.1        
[46] ragg_1.1.1           hms_1.0.0            promises_1.2.0.1    
[49] parallel_4.0.5       RColorBrewer_1.1-2   curl_4.3            
[52] yaml_2.2.1           sass_0.3.1           reshape_0.8.8       
[55] stringi_1.5.3        highr_0.8            zip_2.1.1           
[58] boot_1.3-27          rlang_0.4.10         pkgconfig_2.0.3     
[61] systemfonts_1.0.1    distributional_0.3.0 evaluate_0.14       
[64] lattice_0.20-41      labeling_0.4.2       tidyselect_1.1.0    
[67] GGally_2.1.1         plyr_1.8.6           magrittr_2.0.1      
[70] R6_2.5.0             generics_0.1.0       DBI_1.1.1           
[73] foreign_0.8-81       pillar_1.5.1         haven_2.3.1         
[76] withr_2.4.1          abind_1.4-5          modelr_0.1.8        
[79] crayon_1.4.1         utf8_1.1.4           tmvnsim_1.0-2       
[82] rmarkdown_2.7        grid_4.0.5           readxl_1.3.1        
[85] CDM_7.5-15           pbivnorm_0.6.0       git2r_0.28.0        
[88] reprex_1.0.0         digest_0.6.27        webshot_0.5.2       
[91] httpuv_1.5.5         textshaping_0.3.1    stats4_4.0.5        
[94] munsell_0.5.0        viridisLite_0.3.0    bslib_0.2.4         
[97] R2WinBUGS_2.1-21