ANOVA analysis

If you are a statistics student, hearing the term ANOVA probably brings one question to mind: how can I perform the test in R and draw a conclusion? Do not worry. We have got you covered. Here we address that concern by walking you through a test that our diligent and experienced experts conducted.

Data preparation

Our R assignment experts did not jump straight into fitting the model. Instead, they first prepared the data so that it was easy to analyze.
The R assignment help expert summed the three readings at each site (Control, D, H1, H2, H5, H6, and X) and stored the sums in new variables named D_tot, H1_tot, H2_tot, H5_tot, H6_tot, X_tot, and Control_tot. This makes it easier to analyze the differences in totals between the sites and gives a summarized view of the pollution at each site.
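The summing step can be sketched in R as follows. This is a minimal illustration on simulated data: the raw column names (D_1, D_2, D_3, and so on), the number of rows, and the simulated values are all assumptions, since the layout of the original dataset is not shown. Only the _tot variable names come from the analysis above.

```r
set.seed(1)
sites <- c("Control", "D", "H1", "H2", "H5", "H6", "X")
n <- 100  # hypothetical number of sampling occasions

# Simulated stand-in for the raw data: three readings per site (21 columns).
readings <- as.data.frame(
  matrix(rlnorm(n * 21, meanlog = log(6), sdlog = 0.1), nrow = n)
)
names(readings) <- paste0(rep(sites, each = 3), "_", 1:3)

# Sum the three readings at each site into one total per row.
totals <- as.data.frame(sapply(sites, function(s) {
  rowSums(readings[, paste0(s, "_", 1:3)])
}))
names(totals) <- paste0(sites, "_tot")

head(totals, 3)
```

Keeping the totals in one data frame, rather than seven loose variables, makes the later steps (plots, summaries, tests) easy to apply to every site at once.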

Descriptive statistics

Descriptive statistics are the starting point of any analysis because they give general insight into the data. We begin by presenting the descriptive statistics of these sites. The boxplots of the pollution levels at the six sites and the control sample are shown in the figure below.
Figure 1: Boxplots of pollution levels at the six sample sites and the control site.
The boxplots indicate that there are no outliers in the dataset. The average levels of pollution at the six sites are close to each other and also close to the pollution level in the control sample. There is substantial overlap between the treatment samples and the control sample, which suggests that we may not find statistically significant differences in pollution levels between them. We also note that the tails of the distributions are not equal: the right tails are noticeably longer than the left tails, which points to positive skew.
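A boxplot like the one described can be drawn with base R. The sketch below uses simulated, right-skewed totals as a placeholder for the real data, so the values, the sample size, and any outliers it shows are assumptions and will not match Figure 1.

```r
set.seed(1)
sites <- c("Control", "D", "H1", "H2", "H5", "H6", "X")

# Simulated, right-skewed totals standing in for the real *_tot variables.
totals <- as.data.frame(sapply(sites, function(s) {
  17 + rlnorm(200, meanlog = 0, sdlog = 0.5)
}))
names(totals) <- paste0(sites, "_tot")

# One box per site; las = 2 rotates the axis labels so they do not overlap.
bp <- boxplot(totals, las = 2,
              ylab = "Total pollution level",
              main = "Pollution levels by site")

# Points beyond the whiskers (1.5 * IQR rule) are returned in bp$out.
length(bp$out)
```

Checking bp$out is a quick way to confirm in numbers what the plot suggests about outliers.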
Figure 2: Descriptive statistics of the control site and test samples
The average pollution level at the control site is 18.034, while the average pollution level at tributary D is 18.062, at site H1 is 18.073, at site H2 is 18.078, at site H5 is 18.07, at site H6 is 18.056, and at tributary X is 18.098. The lowest average is at the control site, although the seven means span a range of only about 0.06.
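A table of per-site summaries can be produced in a few lines. This is a sketch on simulated totals (an assumption, as the real data are not shown), so the numbers it prints will differ from those in Figure 2.

```r
set.seed(1)
sites <- c("Control", "D", "H1", "H2", "H5", "H6", "X")

# Simulated totals standing in for the real *_tot variables.
totals <- as.data.frame(sapply(sites, function(s) {
  17 + rlnorm(200, meanlog = 0, sdlog = 0.5)
}))
names(totals) <- paste0(sites, "_tot")

# Mean, median, and standard deviation for every site.
summary_table <- data.frame(
  mean   = sapply(totals, mean),
  median = sapply(totals, median),
  sd     = sapply(totals, sd)
)
round(summary_table, 3)
```

Comparing the mean and median columns also gives a first hint of skew: in a right-skewed distribution the mean tends to sit above the median.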
To find out whether there is a statistically significant difference in pollution levels between the treatment samples and the control sample, we could use a one-way ANOVA to test for a difference in the means.

Stating the hypotheses and testing the assumptions

Hypothesis statement
The hypotheses for the ANOVA are stated below:
H0: µControl = µD = µH1 = µH2 = µH5 = µH6 = µX
H1 (alternative hypothesis, not to be confused with site H1): At least one of the means differs from the others
In the hypothesis statement, µY indicates the average pollution level at site “Y.”
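If the assumptions held, these hypotheses could be tested with a one-way ANOVA along the following lines. This is only a sketch on simulated data (the values and sample size are assumptions); it shows the reshaping that aov() needs, with one row per observation and a factor identifying the site.

```r
set.seed(1)
sites <- c("Control", "D", "H1", "H2", "H5", "H6", "X")

# Simulated totals standing in for the real *_tot variables.
totals <- as.data.frame(sapply(sites, function(s) {
  17 + rlnorm(200, meanlog = 0, sdlog = 0.5)
}))
names(totals) <- paste0(sites, "_tot")

# Reshape from wide (one column per site) to long (value + site label).
long <- stack(totals)
names(long) <- c("pollution", "site")

# One-way ANOVA: does mean pollution differ by site?
fit <- aov(pollution ~ site, data = long)
summary(fit)
```

With seven groups, the site term has six degrees of freedom, and the F test in the summary corresponds to H0 above.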
Testing for the ANOVA assumptions
Before proceeding with the test, we check the assumptions for ANOVA. One of the basic assumptions is that the data in each group are normally distributed. This can be checked using the Shapiro Wilk test, whose null hypothesis is that the data come from a normal distribution. The significance level for the test is 5%. Thus, if the test returns a p-value greater than 0.05, we fail to reject normality and treat the data as normally distributed; otherwise, we conclude that the data are not normally distributed.
Figure 3: Result of the Shapiro Wilk test for pollution levels at tributary D (D_tot variable)
The Shapiro Wilk test was conducted for the variable “D_tot.” The W statistic is 0.94, with a p-value < 0.001. Thus, at the 5% significance level, we reject the null hypothesis that “D_tot” is normally distributed and conclude that it is not.
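In R, this check is a single call to shapiro.test(). The sketch below runs it on a simulated, right-skewed stand-in for D_tot (the data and sample size are assumptions), so its W statistic and p-value will not match Figure 3.

```r
set.seed(1)

# Simulated, right-skewed stand-in for the real D_tot variable.
D_tot <- 17 + rlnorm(1000, meanlog = 0, sdlog = 0.5)

# Shapiro-Wilk test; H0: the data come from a normal distribution.
sw <- shapiro.test(D_tot)
sw$statistic  # W
sw$p.value

# Decision at the 5% significance level.
reject_normality <- sw$p.value < 0.05
reject_normality
```

Note that shapiro.test() accepts between 3 and 5000 observations, so a larger sample would need to be subsampled or checked another way.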
Figure 4: Histogram of pollution levels at tributary D
The histogram shown in the figure is for the pollution levels from samples taken at tributary D. The histogram is not symmetric, as one would expect from a normal distribution; instead, it shows positive skew. Since the normality assumption has failed, we have to use another test.
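The visual impression of skew can be backed up with a number. The following sketch draws the histogram and computes a simple moment-based sample skewness by hand; the data are simulated (an assumption), and the skewness formula is one common definition rather than necessarily the one used in the original analysis.

```r
set.seed(1)

# Simulated, right-skewed stand-in for the real D_tot variable.
D_tot <- 17 + rlnorm(1000, meanlog = 0, sdlog = 0.5)

h <- hist(D_tot, breaks = 30,
          main = "Histogram of D totals",
          xlab = "Total pollution level")

# Moment-based sample skewness: positive values mean a longer right tail.
skew <- mean((D_tot - mean(D_tot))^3) / sd(D_tot)^3
skew
```

A value near zero would be consistent with symmetry, while a clearly positive value agrees with the long right tail seen in the plot.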

Mann Whitney’s U test
Because the normality assumption failed, our online R homework tutors used Mann Whitney’s U test, a non-parametric alternative, to test for pairwise differences between each sample site and the control group.
Figure 5: Result of Mann Whitney’s U test of pollution levels at tributary D and Control sample
Mann Whitney’s U test was conducted on the pollution levels at tributary D and the control site. The W statistic is 500940, and the p-value is 0.942 > 0.05. Thus, we fail to reject the null hypothesis: there is no statistically significant difference in pollution levels between the control site and tributary D.
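In R, Mann Whitney’s U test is run with wilcox.test() on two independent samples, which is why the statistic is reported as W. The sketch below uses simulated data for both groups (an assumption), so its output will differ from Figure 5.

```r
set.seed(1)

# Simulated stand-ins for the real Control_tot and D_tot variables.
Control_tot <- 17 + rlnorm(1000, meanlog = 0, sdlog = 0.5)
D_tot       <- 17 + rlnorm(1000, meanlog = 0, sdlog = 0.5)

# Two-sided Mann-Whitney U (Wilcoxon rank-sum) test for two independent samples.
mw <- wilcox.test(D_tot, Control_tot)
mw$statistic  # W
mw$p.value

# A p-value above 0.05 means we fail to reject H0 of no difference.
mw$p.value > 0.05
```

With 1000 observations per group, W can range from 0 to 1,000,000, and a value near the midpoint indicates that neither group tends to rank above the other.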
Figure 6: Result of Mann Whitney’s U test for pollution levels at sample site H1 and Control sample
Mann Whitney’s U test was conducted on the pollution levels at site H1 and the control site. The W statistic is 500950, and the p-value is 0.9413 > 0.05. Thus, we fail to reject the null hypothesis: there is no statistically significant difference in pollution levels between the control site and sample site H1.
Figure 7: Result of Mann Whitney’s U test for pollution levels at sample site H2 and Control sample
Mann Whitney’s U test was conducted on the pollution levels at site H2 and the control site. The W statistic is 500620, and the p-value is 0.9616 > 0.05. Thus, we fail to reject the null hypothesis: there is no statistically significant difference in pollution levels between the control site and sample site H2.
Figure 8: Result of Mann Whitney’s U test of pollution levels at sample sites H5, H6 & X and Control sample
The Mann Whitney’s U tests for sites H5, H6, and X against the control sample all return p-values greater than 0.05. Thus, there is no statistically significant difference in pollution levels between any of these sites and the control site.
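Rather than running each comparison by hand, the site-versus-control tests can be looped. The sketch below does this on simulated totals (an assumption). The Holm adjustment for multiple comparisons is an addition for illustration and is not part of the analysis described above.

```r
set.seed(1)
sites <- c("Control", "D", "H1", "H2", "H5", "H6", "X")

# Simulated totals standing in for the real *_tot variables.
totals <- as.data.frame(sapply(sites, function(s) {
  17 + rlnorm(200, meanlog = 0, sdlog = 0.5)
}))
names(totals) <- paste0(sites, "_tot")

# Test every non-control site against the control site.
test_sites <- setdiff(names(totals), "Control_tot")
p_raw <- sapply(test_sites, function(v) {
  wilcox.test(totals[[v]], totals$Control_tot)$p.value
})

# Optional extra (not in the original analysis): Holm-adjusted p-values.
p_holm <- p.adjust(p_raw, method = "holm")
data.frame(p_raw, p_holm)
```

Adjusted p-values are never smaller than the raw ones, so when every raw p-value is already above 0.05, the adjustment does not change the conclusion.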
We also conduct Mann Whitney’s U tests between the samples collected at Hun River and those collected at tributaries D and X.
Figure 9: Result of Mann Whitney’s U test for pollution levels at sample sites at Hun River and tributary X.
The Mann Whitney’s U tests conducted between the sample sites at Hun River and tributary X return p-values greater than 0.05. Thus, there is no statistically significant difference in pollution levels between them.
Figure 10: Result of Mann Whitney’s U test of pollution levels at sample sites at Hun River and tributary D.
The Mann Whitney’s U tests conducted between the sample sites at Hun River and tributary D return p-values greater than 0.05. Thus, there is no statistically significant difference in pollution levels between them. Further, we tested for consistent up-regulation or down-regulation across the test sites. The results do not indicate any such trend.
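The river-versus-tributary comparisons can be organized as a grid of pairs. This sketch assumes that H1, H2, H5, and H6 are the Hun River sample sites, which the text implies but does not state outright, and it runs on simulated totals rather than the real data.

```r
set.seed(1)
sites <- c("Control", "D", "H1", "H2", "H5", "H6", "X")

# Simulated totals standing in for the real *_tot variables.
totals <- as.data.frame(sapply(sites, function(s) {
  17 + rlnorm(200, meanlog = 0, sdlog = 0.5)
}))
names(totals) <- paste0(sites, "_tot")

# Assumption: H1, H2, H5, H6 are the Hun River sites; D and X are tributaries.
hun  <- c("H1_tot", "H2_tot", "H5_tot", "H6_tot")
trib <- c("D_tot", "X_tot")

# Every river site paired with every tributary.
pairs <- expand.grid(hun = hun, trib = trib, stringsAsFactors = FALSE)
pairs$p_value <- unname(mapply(function(h, t) {
  wilcox.test(totals[[h]], totals[[t]])$p.value
}, pairs$hun, pairs$trib))

pairs
```

Laying the results out as one table makes it easy to see at a glance whether any pair falls below the 0.05 threshold.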

Results

We conducted several tests to evaluate the research hypotheses set out at the beginning. The results indicate that the pollution levels at the test sites (Hun River, Tributaries D & X) and the control site are close to each other, with no statistically significant differences.

Discussion

The analysis used Mann Whitney’s U test to compare the pollution levels at Hun River and Tributaries D & X with one another and with the pollution levels at the control site. The tests indicate that pollution levels do not differ significantly across the sites.