Tuesday, November 29, 2016

Using R to work through Sokal and Rohlf's Biometry: Chapter 5 (Descriptive Statistics), section 5.4 and selected exercises

Previously in this series: Chapter 5 (Descriptive Statistics), section 5.3.


Section 5.4: Other Discrete Probability Distributions

?dhyper #hypergeometric
?dmultinom #multinomial
?dnbinom #negative binomial
#Various packages also let you fit regressions with these distributions (more on that later),
#although of these I've only ever used the negative binomial.
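As a quick sanity check on the three functions above, here is a minimal sketch that evaluates each one at values small enough to verify by hand. The numbers (urn sizes, cell probabilities, success counts) are made up for illustration and do not come from the book.

```r
# Hypergeometric: draw k = 2 balls from an urn with m = 2 white and n = 2 black.
# P(exactly x = 1 white) = choose(2,1) * choose(2,1) / choose(4,2) = 4/6.
p_hyper <- dhyper(x = 1, m = 2, n = 2, k = 2)

# Multinomial: 3 trials, 3 equally likely categories, one outcome in each.
# P = 3! / (1! 1! 1!) * (1/3)^3 = 6/27.
p_multi <- dmultinom(x = c(1, 1, 1), prob = c(1, 1, 1) / 3)

# Negative binomial: in R, x is the number of failures before the size-th success.
# P(2 failures before the 3rd success, p = 0.5) = choose(4,2) * 0.5^5 = 6/32.
p_nbinom <- dnbinom(x = 2, size = 3, prob = 0.5)

print(c(hypergeometric = p_hyper,
        multinomial = p_multi,
        negative_binomial = p_nbinom))
```

Note that dnbinom counts failures rather than total trials, which is easy to mix up with textbook parameterizations.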

#The logarithmic distribution requires an extra package.
#https://cran.r-project.org/web/views/Distributions.html
#extraDistr, VGAM, or gamlss.dist
#I installed extraDistr to test it out.
library(extraDistr)
?LogSeries
#and
?dlgser #The distributions in this package follow the same use of d, p, and q prefixes.
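To see what the logarithmic (log-series) distribution looks like without relying on a package, here is a small sketch that writes out its probability mass function by hand, P(X = k) = -theta^k / (k * log(1 - theta)) for k = 1, 2, ... The helper name log_series_pmf and the value theta = 0.5 are my own choices for illustration. The comparison with dlgser only runs if extraDistr is installed, and assumes dlgser uses this same theta parameterization.

```r
# Hand-written log-series PMF (hypothetical helper, not part of any package).
log_series_pmf <- function(k, theta) {
  -theta^k / (k * log(1 - theta))
}

theta <- 0.5
probs <- log_series_pmf(1:200, theta)

# The probabilities should sum to (very nearly) 1 over a long enough range.
total <- sum(probs)
print(total)

# Optional cross-check against extraDistr, only if the package is available.
if (requireNamespace("extraDistr", quietly = TRUE)) {
  print(all.equal(probs[1:5], extraDistr::dlgser(1:5, theta = theta)))
}
```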


Exercises 5

#I'm doing the exercises that require new coding beyond what we have done already.

#Exercise 5.5
#The organism is present or absent in any given slide, assuming the person is diseased
#(if the person has the disease, their samples contain the organism but it's not very common).
#We want a false negative <1% of the time.  The organism is visible in 20% of the slides.
#At first I thought this was a Bayesian problem but I don't see how to do it that way.
#p=0.2 (organism visible on a slide), thus q=0.8 (organism present but not visible on that slide).
#A false negative means every slide misses the organism, so (treating slides as independent)
#we need the smallest number of slides n for which 0.8^n is 0.01 or smaller.
#So, I made a while loop that keeps raising 0.8 to the next power until the result
#drops to 0.01 or below.  This explanation helped me figure out how to do this type of loop.

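n<-0 #number of slides examined so far
i<-1 #probability that all n slides miss the organism (0.8^n)
while (i > 0.01) {
  n<-n+1
  i<-0.8^n
  print(c(n, i)) #slide count and its false negative probability
}
n
#the final value of n (21) is the number of slides needed.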
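The loop result can be cross-checked without iterating. This is a sketch under the same assumption of independent slides: solving 0.8^n <= 0.01 for n gives n >= log(0.01)/log(0.8), and the same miss probability is the binomial probability of zero "visible" slides out of n. The variable names (q, target, n_needed) are my own.

```r
q <- 0.8        # probability a single slide misses the organism
target <- 0.01  # largest acceptable false negative rate

# Smallest whole number of slides with q^n <= target.
n_needed <- ceiling(log(target) / log(q))
print(n_needed)  # 21

# False negative rate with that many slides, computed two ways:
# directly, and as P(0 successes in n_needed binomial trials with p = 1 - q).
miss_direct <- q^n_needed
miss_binom <- dbinom(0, size = n_needed, prob = 1 - q)
print(c(miss_direct, miss_binom))
```

With one slide fewer (20), the miss probability is still slightly above 0.01, which is why 21 is the minimum.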
