One way Anova manually in r

On This Page

One-Way ANOVA (Analysis of Variance) is a statistical test used to determine if there are any statistically significant differences between the means of three or more independent (unrelated) groups.

Let $Y_{ij}$ be the $j$ th observation of $i$ th group . We can write the model as

Model

\[Y_{ij} = \mu + \tau_i + \epsilon_{ij}\]

Now using LSE of $\mu , \alpha_i,\epsilon_{ij} $ The reduced model assumes that there are no differences between group means:

\[Y_{ij} = \mu + \epsilon_{ij}\]

Required ANOVA Table

The ANOVA summary table partitions the total variation into components due to the model (between groups) and error (within groups).

Source Sum of Squares (SS) Degrees of Freedom (df) Mean Square (MS) F-statistic
Between Groups $SSB$ $k-1$ $MSB = \frac{SSB}{k-1}$ $F = \frac{MSB}{MSW}$
Within Groups $SSW$ $N-k$ $MSW = \frac{SSW}{N-k}$  
Total $SST$ $N-1$    

Example

Consider the following data:

Calculate Group Means and Overall Mean

Compute the Sum of Squares

\[SSB = \sum_{i=1}^{k} n_i (\mu_i - \mu)^2\]

where $n_i$ is the number of observations in group $i$.

\[SSB = 3(2.67 - 2.78)^2 + 3(2.67 - 2.78)^2 + 3(3.00 - 2.78)^2\] \[SSB = 3 \times (-0.11)^2 + 3 \times (-0.11)^2 + 3 \times 0.22^2\] \[SSB = 0.3636\]

Within-group sum of squares

\[SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2\]

For Group 1:

\[SSW_1 = (1 - 2.67)^2 + (2 - 2.67)^2 + (5 - 2.67)^2 = 2.7889 + 0.4489 + 5.4489 = 8.6867\]

For Group 2:

\[SSW_2 = (2 - 2.67)^2 + (4 - 2.67)^2 + (2 - 2.67)^2 = 0.4489 + 1.7689 + 0.4489 = 2.6667\]

For Group 3:

\[SSW_3 = (2 - 3.00)^2 + (3 - 3.00)^2 + (4 - 3.00)^2 = 1 + 0 + 1 = 2\] \[SSW = SSW_1 + SSW_2 + SSW_3 = 8.6867 + 2.6667 + 2 = 13.3534\]

Now calculate Mean squares and F statistics in ANOVA table

ANOVA Summary Table

Source Sum of Squares (SS) Degrees of Freedom (df) Mean Square (MS) F-statistic
Between Groups 0.3636 2 SSB/2= 0.1818 0.0817
Within Groups 13.3534 6 SSW/6 2.2256  
Total 13.717 8    

Conclusion

Compare the F-statistic to the critical value from the F-distribution table at a chosen significance level (e.g., $\alpha = 0.05$). For $df_1 = 2$ and $df_2 = 6$, the critical value is approximately 5.14. Since 0.0817 < 5.14, we fail to reject the null hypothesis and conclude that there are no significant differences between the group means.

Ref:

  1. Ftable $\alpha=0.05$
  2. Example from

R code

data <- data.frame(
  group1 = c(1, 2, 5),
  group2 = c(2, 4, 2),
  group3 = c(2, 3, 4)
)
# Calculate group means
group_means <- c(mean(data$group1), mean(data$group2), mean(data$group3))
# Calculate overall mean
overall_mean <- mean(unlist(data))
# Number of groups and total number of observations
k <- ncol(data)
N <- length(unlist(data))
# Calculate Sum of Squares Between (SSB)
ssb <- 0
for (i in 1:k) {
  ssb <- ssb + length(data[, i]) * (group_means[i] - overall_mean)^2
}
# Calculate Sum of Squares Within (SSW)
ssw <- 0
for (i in 1:k) {
  ssw <- ssw + sum((data[, i] - group_means[i])^2)
}
# Calculate Total Sum of Squares (SST)
sst <- ssw + ssb
# Degrees of Freedom
df_between <- k - 1
df_within <- N - k
df_total <- df_between + df_within
# Mean Squares
msb <- ssb / df_between
msw <- ssw / df_within
# F-statistic
f_statistic <- msb / msw

R output

Drop Your Email

Add Your Note