1. Creating the haplotypes
To get the haplotypes, I aligned my sequences in Geneious. Presumably you could do this in whatever your program of choice is. I made sure my sequences had appropriate sample numbers (to merge with attributes later in R). I trimmed the sequences so they matched in length. End gaps could be counted as the same length, but then you'll get lots of haplotypes that are simply different lengths of gaps at the end. I exported the data from Geneious as a .fasta file.
2. Creating the network
Now you have a .fasta file of sequences that are all the same length. I used code from this blog post to create the haplotype network. It worked great using functions in the R package pegas. The code there uses site to make up the slices of the pie, but you can also use other attributes (I used phenotype and transect) of the sampled individuals.
3a. Manually changing the colors
The haplotype network produced from step 2 has a rainbow color palette, which is pretty, but it may not fit with your desired color schemes or with black-and-white printing in a publication. Some of the colors can be hard to tell apart too. In my case, it was hard to interpret because I needed light and dark versions of two sets of birds within the haplotypes. I was having trouble with the igraph package that was suggested to edit colors, so eventually I decided to just export the figure as a scalable vector graphic (.svg).
I then edited the .svg file's colors in Inkscape, a free, open-source vector graphics editor. Once you open your figure in Inkscape, click on the figure. That selects it. Then you can right-click and find "ungroup" in the menu or go to the Object menu for "ungroup". Usually then you can select individual pieces of the figure, but if not, keep ungrouping pieces of the figure until you can.
Now, you should be able to grab pieces of the pie and change their color. You can do more than one by holding down the shift key while selecting additional pie slices and the legend (to make sure the legend color matches your pie slice color). Just double-click on the color you want in the bar on the bottom part of the screen. Repeat as needed for the other pie slices.
One problem I ran into was distinguishing a few of the colors on the original figure. You can use the ink dropper tool (press F7 or find it in the toolbar on the left side of the screen) to make sure you are correctly identifying particular colors as the same ones and not accidentally merging different sites or attributes in your final figure. By holding the ink dropper tool over a color, you show the color's hexadecimal code in a gray area below the color palette bar on the bottom of the screen.
From within Inkscape, you can then use "save as" to get your image in another vector graphic format (.ps or .eps) for publication, or use "export bitmap" (also in the File menu) to create a new raster graphic .png for publication or presentation.
3b. Updated 13 Feb. 2018: Automated colors!
I have figured out how to use the R package igraph to automate the coloring of the pie slices.
Unfortunately I haven't figured out how to put the tick marks back on for number of changes between haplotypes, but the line lengths are still proportional to number of changes. (If anyone knows how to do that, please let me know.)
Below is a complete example that will take you from creating the network to the final plot.
library(pegas)
library(igraph)
library(plyr)
library(reshape)
########################
#Step 1a. Load your alignment and use haploNet() to create the haplotype network.
#Load the pegas package sample data "woodmouse".
#This step is adapted from pegas help.
data(woodmouse)
set.seed(42)
x <- woodmouse[sample(15, size = 30, replace = TRUE), ]
h <- haplotype(x)
(net <- haploNet(h))
plot(net)
########################
#Step 1b. Have sites.
#I'm going to add in some arbitrary sample attributes that we'll use for pie chart testing.
#For your actual dataset, you would load a dataframe of localities for each sample.
#The samples and their sites should be in the same order as the samples in the file you used for haplotypes in 1a.
#Here, we randomly generate assignments to sites A, B, and C for the woodmouse dataset, which now has 30 samples.
sitenames <- c("A", "B", "C")
set.seed(84)
sites <- data.frame("site" = sample(sitenames,
size = 30,
replace = TRUE))
########################
#Step 2. Assign haplotypes to your site dataframe.
#This code is adapted from a blog post by Jimmy O'Donnell (jodonnellbio at gmail) with help from Kim Tenggardjaja.
#Original posting: http://jimmyodonnell.wordpress.com/2013/11/23/haplotype-networks-in-r-updated/
#Adaptations to this code by Claire Curry (curryclairem@gmail.com)
#Now, fill in the haplotypes for each site.
#You will have to do this with your real dataset.
#The site and haplotype dataframes must be in the SAME SAMPLE ORDER.
#Add empty column to sites dataframe for haplotypes
sites<-cbind(sites,
haplotype=rep(NA,
nrow(sites)))
#Assign haplotypes to the samples automatically using a loop
for (i in 1:length(labels(h))){
sites$haplotype[attr(h,
"index")[[i]]]<-i
}
########################
#Step 3. Convert to igraph format from haplonet.
#Here's my new adaptation to plot colors automatically in igraph, as suggested on the original blog post.
#Get igraph network
net.igraph <- as.igraph(net,
directed = FALSE, #Leave network arrows off as they are not relevant to our case.
use.labels = TRUE,
altlinks = FALSE) #only get main links between haplotypes or you'll end up with tons of extra lines
#Get the haplotype numbers in their weird order so you can order colors and sizes correctly.
newnums <- as.roman(as.character(V(net.igraph)$name))
V(net.igraph)$name <- newnums
########################
#Step 4.Count up haplotype frequencies and site frequencies.
#This code is adapted from a blog post by Jimmy O'Donnell (jodonnellbio at gmail) with help from Kim Tenggardjaja.
#Original posting: http://jimmyodonnell.wordpress.com/2013/11/23/haplotype-networks-in-r-updated/
#Adaptations to this code by Claire Curry (curryclairem@gmail.com)
#Count how many individuals for each haplotype
count.hap0<-ddply(sites,
.(haplotype),
summarise,
freq=length(haplotype))
#Be sure to add in our new ordering!
count.hap <- count.hap0[order(match(rownames(count.hap0),
newnums)),"freq"]
#for each plot, build frequency matrix that tells how to fill the pies by site frequency.
dfcount_site<-ddply(sites,
.(site, haplotype),
summarise,
freq=length(site))
dc.site00 <- cast(haplotype ~ site,
data = dfcount_site,
value = "freq",
fill = 0)
dc.site0 <-dc.site00
########################
#Step 5. Convert the site frequencies into the correct list format and the correct order to match the igraph object.
#Now, onto more igraph adaptations.
dc.site0$haplotype<-NULL #Remove the haplotype row before turning this into a matrix.
#Turn the haplotype frequencies into a matrix in the NEW IGRAPH ORDER.
dc.site1<-t(as.matrix(dc.site0[order(match(rownames(dc.site0),
newnums)),]))
#You must do this step to match the ordering of the cast dataframe haplotypes to the ordering used in igraph,
#otherwise the colors and sizes will be off.
#Then split the matrix out into a list.
dc.site <- split(dc.site1,
rep(1:ncol(dc.site1),
each = nrow(dc.site1)))
#Remember, if you go back to your haplotype counts to check that colors and sizes are correct,
#that we rearranged dc.site to be in the order that the haplotypes are stored
#You can check V(net.igraph)$name to see them again.
#dc.site0's haplotype column shows the haplotype numbers, but doing order(match() then puts dc.site1 and dc.site into
#the order the haplotypes are stored in the igraph network.
#Hence, we use the following step to make it more obvious by naming each list element with the haplotype number.
names(dc.site) <- V(net.igraph)$name
########################
#Step 6. Plot the network.
plot(net.igraph, #the igraph object is always a list of 10 so far as I can tell, regardless of how many haplotypes you used.
vertex.shape="pie",
vertex.pie = dc.site,
#vertex.pie specifies where you are getting the pie slice counts.
vertex.pie.color=list(c("black",
"orange",
"blue")),
#The colors are the number of categories you have, must be in a list, in the same order as your sites.
vertex.size = 5*(count.hap),
#number of samples for each haplotype, multiplied to make bigger if desired.
#vertex.label = NA)#, #for no vertex labels (ie no hapotype numbers added)
#vertex.label.cex = 1)#, #if you want bigger haplotype numbers.
vertex.label.dist = 2.5) #how far from pie you want the haplotype numbers)
#Add a legend.
legend(x= "bottomleft",
bty="n",
pt.bg=c("black",
"orange",
"blue"),
col="black",
pch=22,
legend=c("Site A",
"Site B",
"Site C"))
No comments:
Post a Comment
Comments and suggestions welcome.