Tuesday, April 26, 2016

Using R to work through Sokal and Rohlf's Biometry: Chapter 4 (Descriptive Statistics), exercises

Previously in this series: Chapter 4 (sections 4.6-4.9 in Sokal and Rohlf's Biometry).

I sure fell off the Sokal and Rohlf horse.  Time to get back on track!  Let's finish up Chapter 4.  From this point onwards, I will do only the exercises that add to the code base I've written for the rest of the chapter.  In this case there is just one.

Exercises 4 (selected)

#Exercise 4.3
#Use the data to get mean, standard deviation, and coefficient of variation,
#then repeat using groupings.
e2.7<-c(4.32, 4.25, 4.82, 4.17, 4.24, 4.28, 3.91, 3.97, 4.29, 4.03, 4.71, 4.20,
        4.00, 4.42, 3.96, 4.51, 3.96, 4.09, 3.66, 3.86, 4.48, 4.15, 4.10, 4.36,
        3.89, 4.29, 4.38, 4.18, 4.02, 4.27, 4.16, 4.24, 3.74, 4.38, 3.77, 4.05,
        4.42, 4.49, 4.40, 4.05, 4.20, 4.05, 4.06, 3.56, 3.87, 3.97, 4.08, 3.94,
        4.10, 4.32, 3.66, 3.89, 4.00, 4.67, 4.70, 4.58, 4.33, 4.11, 3.97, 3.99,
        3.81, 4.24, 3.97, 4.17, 4.33, 5.00, 4.20, 3.82, 4.16, 4.60, 4.41, 3.70,
        3.88, 4.38, 4.31, 4.33, 4.81, 3.72, 3.70, 4.06, 4.23, 3.99, 3.83, 3.89,
        4.67, 4.00, 4.24, 4.07, 3.74, 4.46, 4.30, 3.58, 3.93, 4.88, 4.20, 4.28,
        3.89, 3.98, 4.60, 3.86, 4.38, 4.58, 4.14, 4.66, 3.97, 4.22, 3.47, 3.92,
        4.91, 3.95, 4.38, 4.12, 4.52, 4.35, 3.91, 4.10, 4.09, 4.09, 4.34, 4.09)

#a. without groupings
(e2.7.mean<-mean(e2.7))
(e2.7.sd<-sd(e2.7))
(cv.e2.7<-(e2.7.sd*100/e2.7.mean))

#b. with groupings.
#To create groupings, we use hist() but with plot=FALSE, which we did not use before.
(e2.7.hist<-hist(e2.7,
                 breaks=seq(min(e2.7),
                            max(e2.7),
                            length.out=11), #use 11 to get 10 groups (number of groups you want + 1)
                            #R will choose the binning automatically if you do not use the breaks argument.
                 plot=FALSE))
#hist() returns a list, so we pull out the counts and the midpoints
#(or "class marks" in the book's terminology) and put them in a data frame.
#We also multiply the two to get class sums as in section 4.1 (and box 4.2).
(e2.7.grouped<-data.frame(frequencies=e2.7.hist$counts,
                          classmark=e2.7.hist$mids,
                          classsums=e2.7.hist$counts*e2.7.hist$mids))

(e2.7.samplesize<-sum(e2.7.grouped$frequencies))
#Sum the class sums and divide by the sample size to get the grouped mean.
(e2.7.summing<-sum(e2.7.grouped$classsums))
(mean.moo<-e2.7.summing/e2.7.samplesize)

#The grouped mean is directly comparable to the ungrouped mean from part a
#(close to it, though grouping usually shifts it slightly) because I didn't code the class marks.
#You can see how to do that in the previous post.
#In the modern computing age we are unlikely to group the data to get averages,
#but you can see where it would be useful if you are given data in that format,
#or want to calculate by hand.
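
The exercise also asks for the standard deviation and coefficient of variation from the grouped data, and the code above stops at the mean. Here is a minimal sketch of how that could be done, assuming the usual frequency-weighted formula (sum of f times the squared deviation of each class mark from the grouped mean, divided by n - 1). The function name grouped_stats and the demo vector are made up for illustration; they are not from the book or from earlier posts.

```r
# Hypothetical helper (not from the original post): grouped mean, SD, and CV
# computed from class marks and frequencies only.
grouped_stats <- function(x, n_classes = 10) {
  # Equal-width classes spanning the data range, as in part b
  h <- hist(x,
            breaks = seq(min(x), max(x), length.out = n_classes + 1),
            plot = FALSE)
  f <- h$counts   # class frequencies
  y <- h$mids     # class marks (midpoints)
  n <- sum(f)     # sample size recovered from the frequencies
  grouped_mean <- sum(f * y) / n
  # Frequency-weighted sum of squares around the grouped mean
  grouped_sd <- sqrt(sum(f * (y - grouped_mean)^2) / (n - 1))
  c(n = n,
    mean = grouped_mean,
    sd = grouped_sd,
    cv = grouped_sd * 100 / grouped_mean)
}

# Made-up illustrative values, NOT the exercise data
demo <- c(3.5, 3.9, 4.0, 4.1, 4.1, 4.2, 4.2, 4.3, 4.4, 4.6, 4.7, 5.0)
(demo.grouped <- grouped_stats(demo, n_classes = 5))

# Ungrouped values for comparison
c(mean = mean(demo), sd = sd(demo), cv = sd(demo) * 100 / mean(demo))
```

Calling grouped_stats(e2.7) should give the grouped versions of the three statistics from part a. Expect them to be close to the ungrouped values but not identical, since every observation is replaced by its class mark.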

Tuesday, April 05, 2016

Winter: the sequel

The snow keeps melting and then coming back.  Seven Sisters Falls, Manitoba.  20 March 2016.