When experiments are carried out, only one randomization actually occurs. However, this is just one of many possible randomizations. One way to test whether any covariates are related to treatment status is randomization inference. We generate many of these potential randomizations and check how unusual the observed result is compared with them.
We will be using part of the framing dataset from the mediation package. For the sake of this assignment, let’s assume that only the treatment tone was randomized. We want to test that none of the pre-treatment covariates are related to tone. For the sake of this assignment, these pre-treatment covariates are age, educ, gender, and income.
To test balance, first run a linear regression in which the outcome is treatment status (tone) and the predictors are the pre-treatment covariates. Then get the F-statistic, which tests whether the covariates jointly predict treatment better than the intercept alone would. Use summary(model)$fstatistic to extract the F-statistic vector (the statistic and its numerator and denominator degrees of freedom) from the regression object. Note: this is not perfectly correct. Treatment status is binary, so a linear model does not satisfy the normal, constant-variance error assumptions behind the F-test. It may be better to run a logistic regression and calculate a likelihood ratio test. You can do this too if you want!
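The sketch below shows both the F-statistic and the logistic-regression alternative. It is a minimal example and makes a few assumptions. It assumes tone is coded 0/1. It also uses a simulated stand-in data frame, sim_data (hypothetical), with the same column names and 130/135 split described above, so it runs without the mediation package. With the real data you would load framing instead, for example with data("framing", package = "mediation"), and check how tone and the covariates are coded.

```r
# Hypothetical stand-in for framing: same column names and 130/135 split.
# With the real data: data("framing", package = "mediation")
set.seed(1)
n <- 265
sim_data <- data.frame(
  tone   = sample(rep(c(0, 1), times = c(130, 135))),
  age    = sample(18:85, n, replace = TRUE),
  educ   = factor(sample(1:4, n, replace = TRUE)),
  gender = factor(sample(c("male", "female"), n, replace = TRUE)),
  income = sample(1:19, n, replace = TRUE)
)

# Linear (probability) model: treatment status on the pre-treatment covariates
fit_lm <- lm(tone ~ age + educ + gender + income, data = sim_data)
f_vec  <- summary(fit_lm)$fstatistic   # named vector: value, numdf, dendf
f_obs  <- unname(f_vec["value"])

# Alternative: logistic regression and a likelihood ratio test
# against the intercept-only model
fit_glm  <- glm(tone ~ age + educ + gender + income, family = binomial, data = sim_data)
fit_null <- glm(tone ~ 1, family = binomial, data = sim_data)
lrt      <- anova(fit_null, fit_glm, test = "Chisq")
lrt_stat <- lrt$Deviance[2]            # likelihood ratio chi-squared statistic
```

Here numdf is 6 because educ, as a four-level factor, contributes three coefficients.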
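However, you don’t know what the p-value for this F-statistic should be. To find it, you compare the observed statistic with the F-statistics produced by the other randomizations that could have occurred. There were many possible randomizations. The formula for the total number of possible randomizations is \(\frac{N!}{n!(N-n)!}\), where \(N\) is the total number of respondents and \(n\) is the number assigned to treatment. In our example we have 130 respondents who didn’t get tone and 135 who did, so \(N = 265\), \(n = 135\), and there are \(\frac{265!}{135!\,130!}\) possible randomizations. This is a very large number! It is far too many to compute the F-statistic for each one, so instead we draw a large random sample of randomizations.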
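You can check this count directly in R. The sketch below uses choose() for the exact value and lchoose(), which works on the log scale and avoids overflow in larger samples. It uses only the group sizes given above.

```r
N  <- 265  # total respondents (130 + 135)
n1 <- 135  # respondents assigned to tone

n_assignments     <- choose(N, n1)            # N! / (n1! (N - n1)!)
log10_assignments <- lchoose(N, n1) / log(10) # order of magnitude

# Choosing the 135 treated units is the same as choosing the 130 controls
stopifnot(isTRUE(all.equal(choose(N, n1), choose(N, N - n1))))

signif(n_assignments, 3)
```

The count is on the order of 10^78, so enumerating every randomization is out of the question.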
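Fit the regression with lm(). Use set.seed() so that your randomizations are reproducible. Remember that the functions sample() and replicate() could also help. Use both a for loop and apply to do these simulations, and show their equivalence. Compare the results with the F distribution using the pf() and qf() functions.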
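The sketch below runs the whole randomization-inference procedure. It again uses a hypothetical simulated stand-in for framing so it runs on its own.

Each draw uses sample() to permute the observed tone vector. This keeps exactly 130 controls and 135 treated and recomputes the balance F-statistic. The same 1,000 draws are generated three ways from the same seed: with replicate(), with a for loop, and with sapply() from the apply family. Because the random stream is identical, the three results match exactly.

The randomization p-value is the share of simulated F-statistics at least as large as the observed one. For comparison, pf() gives the analytic upper-tail p-value and qf() the 5% critical value from the F distribution. The helper name balance_f and the number of draws are illustrative choices, not taken from the assignment.

```r
# Hypothetical stand-in for framing (same columns and group sizes);
# swap in the real data with data("framing", package = "mediation").
set.seed(2024)
n <- 265
sim_data <- data.frame(
  tone   = sample(rep(c(0, 1), times = c(130, 135))),
  age    = sample(18:85, n, replace = TRUE),
  educ   = factor(sample(1:4, n, replace = TRUE)),
  gender = factor(sample(c("male", "female"), n, replace = TRUE)),
  income = sample(1:19, n, replace = TRUE)
)

# Balance-regression F-statistic for a given treatment vector (hypothetical helper)
balance_f <- function(treat, data) {
  data$tone <- treat
  fit <- lm(tone ~ age + educ + gender + income, data = data)
  unname(summary(fit)$fstatistic["value"])
}

obs_fit <- summary(lm(tone ~ age + educ + gender + income, data = sim_data))
f_obs <- unname(obs_fit$fstatistic["value"])
df1   <- unname(obs_fit$fstatistic["numdf"])
df2   <- unname(obs_fit$fstatistic["dendf"])

n_sims <- 1000

# 1. replicate(): each draw permutes tone with sample()
set.seed(42)
f_replicate <- replicate(n_sims, balance_f(sample(sim_data$tone), sim_data))

# 2. for loop over the same random stream
set.seed(42)
f_loop <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  f_loop[i] <- balance_f(sample(sim_data$tone), sim_data)
}

# 3. sapply() from the apply family
set.seed(42)
f_apply <- sapply(seq_len(n_sims), function(i) balance_f(sample(sim_data$tone), sim_data))

# Same seed and same sequence of sample() calls, so the three results match
stopifnot(identical(f_replicate, f_loop), identical(f_loop, f_apply))

# Randomization-inference p-value: share of draws at least as large as observed
p_ri <- mean(f_loop >= f_obs)

# Analytic F-distribution counterparts
p_f       <- pf(f_obs, df1, df2, lower.tail = FALSE)  # upper-tail p-value
f_crit    <- qf(0.95, df1, df2)                       # 5% critical value
f_crit_ri <- unname(quantile(f_loop, 0.95))           # simulated 95th percentile

c(p_ri = p_ri, p_f = p_f, f_crit = f_crit, f_crit_ri = f_crit_ri)
```

When treatment really was randomized, you would typically expect p_ri and p_f to be close. Likewise, the simulated 95th percentile should sit near qf(0.95, ...). The F distribution is only an approximation here, so some gap between them is not in itself a problem.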