Tuesday, November 15, 2016

Using R to work through Sokal and Rohlf's Biometry: Chapter 5 (Descriptive Statistics), section 5.3

Previously in this series: Chapter 5 (Introduction to Probability Distributions: Binomial and Poisson), sections 5.1-5.2.

Section 5.3: the Poisson distribution

?Distributions #Let's look in help again to find the Poisson distribution.
?dpois #We can see that R does similar calculations for Poisson in addition to the binomial distribution.

#To get column 3 (expected absolute frequencies) in table 5.5,
#there is the recursion formula given in equations 5.11 and 5.12.
#5.12 is for calculations with sample means to get the relative expected frequency,
#and in the text below it notes what term to add to get the absolute ones given in column 3 of table 5.5

abs.expected.freq.pois<-400*dpois(c(0:9),
                                  lambda=1.8,
                                  log=FALSE)

#I also want to do equation 5.12 manually for relative expected frequencies.
#This is not normally needed because R does it so nicely with the base stats function dpois().

#Some reading:
#http://stackoverflow.com/questions/5273857/are-recursive-functions-used-in-r
#this one has a simple one for factorials that makes it clearest to me:
#http://www.programiz.com/r-programming/recursion

rel.exp.freq.pois<-function(samplemean, i) {
  if (i==0) return (exp(-samplemean))
  else      return (rel.exp.freq.pois(samplemean,
                                      i-1)*(samplemean/i))
}

#To get absolute frequencies, multiply by 400.
#For one example:
400*rel.exp.freq.pois(1.8, 1)

#To get all the frequencies at once, use lapply.
rel.freq.pois<-unlist(lapply(c(0:9),
                                     rel.exp.freq.pois,
                                     samplemean=1.8))
(abs.freq.pois<-400*rel.freq.pois)

#On page 82, they show how to calculate the coefficient of dispersion.  Here is a function that will do it.
coefficient.of.dispersion<-function(data) {
  var(data)/mean(data)
}
#You can input any set of data here and get the coefficient of dispersion.

#Figure 5.3 shows Poisson distributions with different means.
#We can make this with the dpois() function generating y data.
#Oddly, to add the extra lines, it is easiest to use points() with
#type="l" (for lines).
plot(dpois(c(0:18),
      lambda=0.1,
      log=FALSE),
     type="l",
     ylab="Relative expected frequency",
     xlab="Number of rare events per sample")
points(dpois(c(0:18),
           lambda=1,
           log=FALSE),
       type="l")
points(dpois(c(0:18),
             lambda=2,
             log=FALSE),
       type="l")
points(dpois(c(0:18),
             lambda=3,
             log=FALSE),
       type="l")
points(dpois(c(0:18),
             lambda=10,
             log=FALSE),
       type="l")

#The remaining tables and examples in this section do not add anything new to code.

No comments:

Post a Comment

Comments and suggestions welcome.