Tuesday, November 24, 2015

"Winter is here"

So declared one of the Environment Canada weather bulletins last Thursday morning.  And several days before that, a hilarious "Winter is coming" preceded their more detailed forecast.  Sure enough, we had several centimeters/an inch or two of snow!
Snow, with dog for scale.  I first saw that phrase in an older paper about... sparrows?  grasslands?  I wish I could remember what paper it was, but it appears to be a not-uncommon unit of measurement, as a google search found several other people using this helpful scale.  19 November 2015, Winnipeg, Manitoba, Canada.

Tuesday, November 17, 2015

Error bars on points in R

Putting error bars on figures is surprisingly complicated sometimes (even in Excel...).  I have to google it every time for R.  Various packages can do it, but base R also has some functions that will help (segments and arrows).  I like arrows() because you don't have to draw extra bits to get the nice line on the end of your error bars, which it seems you need to do if you use segments().

#Plot your center points; you could use your averages or other value here.
#Play around with the values in x and y to see what numbers move which part.
plot(x=c(1,2),
     y=c(2,3),
     xlim=c(0,3),
     ylim=c(0,5),
     xlab="x",
     ylab="y",
     pch=21, bg="black")

#Here are error bars on the left point at (1,2).
arrows(x0=c(1,1),
       y0=c(2,2),
       x1=c(1,1),
       y1=c(1,3),
       length=0.25,
       angle=90)

#You can draw each error bar one at a time to get a better idea of how the code works.
#I show these in blue so you can see each one as it draws over the original bar on the plot.
#Bottom error bar.
arrows(x0=c(1),
       y0=c(2),
       x1=c(1),
       y1=c(1),
       length=0.25,
       angle=90,
       col="blue")

#top error bar.
arrows(x0=c(1),
       y0=c(2),
       x1=c(1),
       y1=c(3),
       length=0.25,
       angle=90,
       col="blue")

#Then let's re-do the plot and draw error bars for both points at the same time.
plot(x=c(1,2),
     y=c(2,3),
     xlim=c(0,3),
     ylim=c(0,5),
     xlab="x",
     ylab="y",
     pch=21, bg="black")

arrows(x0=c(1,1,2,2),
       y0=c(2,2,3,3),
       x1=c(1,1,2,2),
       y1=c(1,3,2,4),
       length=0.25,
       angle=90)

#You can see each argument ends up as a vector of coordinates, one value per bar,
#so you can set these values from other calculations you make of
#standard deviation, standard error, confidence intervals,
#or whatever you are using.
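
As a sketch of that last point, here is one way the arrow endpoints might be calculated from raw data instead of typed in by hand.  The two groups of measurements are made up for illustration, and standard error is just one possible choice for the length of the bars.  It also uses the code=3 argument of arrows(), which puts a head on both ends, so a single arrow per point draws the whole bar.

```r
#Made-up example data (hypothetical values): two groups of measurements.
groups <- list(a=c(1.5, 2.0, 2.5),
               b=c(2.0, 3.0, 4.0))

#Mean and standard error (sd divided by the square root of n) for each group.
means <- sapply(groups, mean)
ses <- sapply(groups, function(v) sd(v)/sqrt(length(v)))

#Put the groups at x = 1, 2, ...
xs <- seq_along(means)

plot(x=xs,
     y=means,
     xlim=c(0,3),
     ylim=c(0,5),
     xlab="x",
     ylab="y",
     pch=21, bg="black")

#Each bar runs from (mean - se) to (mean + se).
#code=3 draws the flat end on both ends of the arrow.
arrows(x0=xs,
       y0=means-ses,
       x1=xs,
       y1=means+ses,
       length=0.25,
       angle=90,
       code=3)
```

Swapping ses for a standard deviation or half the width of a confidence interval should only require changing the one line that calculates it.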

Tuesday, November 10, 2015

Sorting, grouping, and selecting data in R

I got started sorting data in SQL.  Nice select statements where some variable equals some value, and you can get distinct or unique values.  R confused me.  I was delighted to find an R package that allows the use of SQL selects in R, but it can occasionally be a bit clumsy due to differences in table and object naming.  I kept seeing references to dplyr as the modern way to use R natively to organize my data.  So, I have decided it is time to start learning dplyr, and already it is helping a ton.  I recommend starting with the dplyr vignette.  I also found this tutorial with different sample data to be helpful.  I tried out chaining (the mysterious %>% operator I have seen lurking in code occasionally) and it was fantastic.  No more weird intermediate variables!  The tutorial describes chaining as coming from a different package, 'magrittr', but dplyr provides the operator as well (the help file says it was formerly '%.%' but the '%>%' version has become standard), so it worked fine even though I hadn't installed magrittr separately.  So far the biggest help to me is the distinct() function, which gets unique combinations of factors, just as I get when I do simpler and simpler SQL selects.  (When I try to subset in base R by some variable, I instead get repeated rows of categorical data, because the table has additional columns with unique values that I am not currently interested in.)
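
Here is a minimal sketch of the kind of chain I mean.  The data frame and its column names are made up for illustration, and the SQL in the comment is only a rough equivalent.  It assumes dplyr is installed.

```r
library(dplyr)

#Made-up example data (hypothetical): repeated site/species combinations,
#plus a count column that I am not interested in for the final result.
birds <- data.frame(site=c("A","A","A","B","B"),
                    species=c("sparrow","sparrow","wren","wren","wren"),
                    count=c(3,5,1,2,4))

#Roughly: SELECT DISTINCT site, species FROM birds
#         WHERE count > 1 ORDER BY site, species
combos <- birds %>%
  filter(count > 1) %>%      #keep rows where count is greater than 1
  select(site, species) %>%  #keep only the columns of interest
  distinct() %>%             #drop repeated site/species combinations
  arrange(site, species)     #sort the result

combos
```

Each step hands its result to the next one, so there are no intermediate variables to name, and distinct() leaves one row per site/species combination.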

Tuesday, November 03, 2015

Installing libraries on a Linux operating system

Now that my computer can dual-boot into Ubuntu and Windows, I am going to start getting things set up for my original need for a Linux-type operating system: a program for estimation of genomic clines called bgc.  Installing the GNU Scientific Library was first.  I searched for it in the software center, and I was surprised to have 15+ options in the search results.  I eventually chose "GNU Scientific Library (GSL) -- library package".  It noted in "more info" that "To compile your own programs you also need to install libgsl0-dev", which I figure I do since bgc requires compiling.  I installed that one too.  Easy enough!

HDF5 at first seemed like it would be similarly easy, but all the binaries ready for installing were for either Windows or CentOS Linux, which I assume would be different.  So I have to compile it.  I followed "Compiling the easy way" from the Ubuntu website.

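Step 1.  I wasn't sure if I was supposed to remove the $ before $USER.  Turns out I was, at least when putting in my own name.  I got an error when I tried $claire for my user name: "chown: missing operand after ‘/usr/local/src’".  It worked fine when I just put claire though.  (The shell treats $claire as a variable that isn't set, so it expands to nothing and chown never sees a user name; typing $USER exactly as written should also work, since the shell replaces it with the current user name.)  The hdf5 instructions say I need 'gcc' to compile it.  This Ubuntu help says build-essential contains gcc, and the "Compiling the easy way" did have me install it in this step.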

Step 2. Obtain the package!  I got the .tar with Unix line endings (the other options were Mac or Windows line endings), since .tar.gz seemed more complicated.  I moved it to the /usr/local/src folder (which is under "Home") and right-clicked, then "Extract here".  I looked in the folder and there seemed to be some configure files, so I'll go ahead and try the next step.

Step 3. The link to auto-apt confused me, so I went with the manual version, installing apt-file and its update ("sudo apt-get install apt-file" then "sudo apt-file update").  The update did not take as long as the instructions threatened on my reasonably fast connection, maybe five minutes at most.  I went into my directory (/usr/local/src) in the terminal as instructed, and ran ./configure and it gave me a different error message.  I navigated one level lower (to /usr/local/src/hdf5-1.8.15-patch1) and then it worked!  It did not end with any dependency errors, but it also did not obviously end with "config.status: creating Makefile".  I scrolled back up and found it, though, along with lots of other creating of filenames in various directories, and "no obvious errors" as the Ubuntu page said, so on to Step 4.

Step 4. It says to give the command "make".  I did.  Many things began happening, including warnings about overflows and floating points and such.  However, it kept going, so I let it be.  It took maybe five minutes.  The hdf5 instructions said to "make install" in their step 4.3.1, but as I read the Ubuntu instructions further, "sudo checkinstall" replaces "make install", so hopefully using it instead didn't mess anything up.  I did the sudo checkinstall and the terminal asked if I wanted to create a default set of package docs.  Only one option ("y") so I said yes.  Then it wanted me to write a description for the package and end it with an empty line or EOF.  I wrote hdf5 and then pressed enter.  But enter just gave me a new line prefixed with >>.  So, end of file (EOF) is ctrl+D in Linux?  But a blank line might work too?  I tried the blank line first.  That worked (pressing enter twice to create the blank line).  Next up it gave me a list of values for building the package (maintainer, summary, name, version, etc).  I couldn't think of any changes so I pressed enter to continue.  Then I messed up and tried to copy the message and ctrl+c did something, not sure what.  So I ran "sudo checkinstall" again and it seemed happy to return to the same message.  I pressed "y" when it asked me this:

Some of the files created by the installation are inside the build
directory: /usr/local/src/hdf5-1.8.15-patch1

You probably don't want them to be included in the package,
especially if they are inside your home directory.
Do you want me to list them?  [n]: y

This got me to a weird screen with (END) at the bottom.  Help me, google!  Pressing Q and then enter got me back to the main screen where it asked:

Should I exclude them from the package? (Saying yes is a good idea)  [y]: y

The install failed.

dpkg-deb: error: parsing file '/var/tmp/tmp.d7tq23dvxW/package/DEBIAN/control' near line 7 package 'hdf5-1.8.15':
 error in 'Version' field string 'patch1-1': version number does not start with digit

So I tried again.

I keep forgetting and pressing ctrl+c.
*** SIGINT received ***

Restoring overwritten files from backup...OK

Cleaning up...OK

Bye.

According to Wikipedia, if I understand correctly, this just interrupts the current process.  So, it should be okay.

Anyway, the new install with version number changed to 1.8.15-patch1 installed fine!

******************************************************************
 Done. The new package has been installed and saved to

 /usr/local/src/hdf5-1.8.15-patch1/hdf5-1.8.15_1.8.15-patch1-1_amd64.deb

 You can remove it from your system anytime using:

      dpkg -r hdf5-1.8.15
*******************************************************************

I'm still not really sure if that's where I wanted it installed, but this stackexchange answer seems to indicate it is an appropriate place and I think section 4.6 in the hdf5 instructions agrees.

Step 4.5 in the hdf5 instructions suggests testing the library, so I ran "make check" and waited for it to go through its paces, which took about three minutes.  It didn't return any errors so I appear to have installed it!   Soon I will compile bgc and see if that works; presumably if the hdf5 install didn't go right in a non-obvious way, I will find out then.