Monday, January 13, 2014

Melting your data with FIRE!

Actually, it's just with reshape2, a package for restructuring your data.  This is handy if you have an analysis that requires data in a format other than how you have it, or if you are given data in a format not to your liking.  So far my main use of this has been to shift my data for profile analysis (multivariate repeated measures).

First, let's generate a dataset.

groups<-rep(c("group1","group2","group3","group4"),10)
individual<-rep(1:10,each=4)
levels<-rep(c("A","B"),each=20)
response<-c((rnorm(20, mean = 2.3, sd = 1)),rnorm(20, mean = 4.7, sd = 2.3))

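#The response values are random draws from two normal distributions, generated with rnorm.
#The first argument is n (i.e., how many numbers you want in each set).
#No seed is set, so your values will differ from the ones printed in this post.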
(original.data<-data.frame(individual,levels,groups, response))

#So, here's how I have my data to begin with:


#   individual levels groups   response
#1           1      A group1  3.9294112
#2           1      A group2  1.7416195
#3           1      A group3  3.2474873
# etc. to row 40 (40 observations of 10 individuals with
#4 observations each and 1 observation per group).


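#Use str(original.data) to check that response is numeric, individual is an integer,
#and levels and groups are factors (or character vectors in R 4.0.0 and later,
#where data.frame no longer converts strings to factors by default).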
str(original.data) #yep!
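
If you want the grouping columns to be factors regardless of R version, and want the random draws to repeat between runs, one option is to set a seed and call factor() explicitly. This is a minimal sketch, not part of the original example: the seed value 42 and the name check.data are arbitrary choices made here for illustration.

```r
set.seed(42) # arbitrary seed (an assumption) so the random draws repeat between runs

# Same layout as original.data, but with the three classifying columns
# converted to factors explicitly.
check.data <- data.frame(
  individual = factor(rep(1:10, each = 4)),
  levels     = factor(rep(c("A", "B"), each = 20)),
  groups     = factor(rep(c("group1", "group2", "group3", "group4"), 10)),
  response   = c(rnorm(20, mean = 2.3, sd = 1), rnorm(20, mean = 4.7, sd = 2.3))
)

str(check.data) # three factor columns and one numeric column
```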

#And here's what I want it to look like:


#   individual levels    group1    group2   group3   group4
#1           1      A  2.510969 2.6601334 2.968813 4.294844
#2           2      A  3.096240 2.5438054 2.316189 2.561755
#3           3      A  2.641018 1.7682642 1.540212 1.674456
#4           4      A  3.720303 1.7542154 2.829152 3.165893
#5           5      A  2.339273 0.7974890 3.084249 3.243764
#6           6      B  6.960439 8.5711567 5.805974 7.056487
#7           7      B  5.642962 4.5452340 6.922783 5.644429
#8           8      B  4.449354 6.0973501 5.214315 3.128638
#9           9      B  5.358301 5.5186123 6.792033 5.655760
#10         10      B -2.564175 0.8425047 7.772034 5.613354

 
#Start the reshape2 package.
library(reshape2)

melt.data<-melt(original.data,id.vars=c("levels", "groups", "individual"),
  measure.vars=c("response")) #the measured variable is response
melt.data #show data



#   levels groups individual variable      value
#1       A group1          1 response  0.8034790
#2       A group2          1 response  2.7244518
#3       A group3          1 response  2.0623880
#Etc. on to 40 rows.

melt.data$variable<-melt.data$groups
#Overwrite the variable column with groups.
#"groups" is the factor whose levels will make up our four new columns.


melt.data
#Here is what melt.data looks like now:


#   levels groups individual variable      value
#1       A group1          1   group1  0.8034790
#2       A group2          1   group2  2.7244518
#3       A group3          1   group3  2.0623880
#Etc. on to 40 rows.
  
data.columns<-dcast(melt.data, individual+levels~variable, fun.aggregate=mean, value.var="value")
#On the left of ~, list the factors that do NOT become the new columns.
#Because we want groups to be the new columns, list only individual and levels.
#After ~ put "variable", then give mean as the aggregation function
#to get the mean of the measurement for each combination of individual and levels.
#In this example there is just one observation per group for each individual in each level,
#so the values should be the same as in your original data.
#If you took more than one observation for each combination of individual, level, and group,
#and the factor distinguishing them is not listed with individual and levels,
#using mean averages across those observations.
#This might happen if you took multiple observations and hadn't averaged them yet,
#or had an additional factor that you are not examining in this analysis.
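
To see the averaging described above, here is a small sketch with two observations in one cell. The data frame dup.data and its values are made up for illustration.

```r
library(reshape2)

# Hypothetical data: individual 1 has two observations in group1.
dup.data <- data.frame(
  individual = c(1, 1, 1, 2, 2),
  levels     = c("A", "A", "A", "A", "A"),
  groups     = c("group1", "group1", "group2", "group1", "group2"),
  value      = c(2, 4, 5, 1, 3)
)

# mean collapses the two group1 rows for individual 1 into a single cell.
dup.wide <- dcast(dup.data, individual + levels ~ groups,
                  fun.aggregate = mean, value.var = "value")
dup.wide # individual 1 gets group1 = 3, the mean of 2 and 4
```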

#Use dcast if you want a data frame as the resulting object;
#use acast if you want a matrix or vector.
#("cast" is the old function from the original reshape package;
#with the old function you had to specify data.frame if you wanted a data frame).
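
A quick sketch of the difference between the two, using a tiny made-up data frame (tiny.data is hypothetical, and levels is left out to keep it small):

```r
library(reshape2)

# Hypothetical long data: two individuals, two groups, one value each.
tiny.data <- data.frame(
  individual = c(1, 1, 2, 2),
  groups     = c("group1", "group2", "group1", "group2"),
  value      = c(2.5, 3.1, 4.0, 1.2)
)

tiny.df  <- dcast(tiny.data, individual ~ groups, value.var = "value")
tiny.mat <- acast(tiny.data, individual ~ groups, value.var = "value")

is.data.frame(tiny.df) # TRUE; individual is kept as an ordinary column
is.matrix(tiny.mat)    # TRUE; individual becomes the row names instead
```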


data.columns
#shows your dataset, now with measurements for each individual in a row,
#with one column each for group1, group2, group3, and group4
#instead of classified by factors in the old "groups" column.

#It's just like we wanted at the beginning!
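
As a possible shortcut, dcast can also be pointed at the long data directly by naming the column that holds the measurements with value.var, which skips both the melt step and the overwrite of variable. A minimal sketch, rebuilding the example data so the block runs on its own (direct.columns is a name made up here):

```r
library(reshape2)

# Rebuild the example dataset from the top of the post.
groups <- rep(c("group1", "group2", "group3", "group4"), 10)
individual <- rep(1:10, each = 4)
levels <- rep(c("A", "B"), each = 20)
response <- c(rnorm(20, mean = 2.3, sd = 1), rnorm(20, mean = 4.7, sd = 2.3))
original.data <- data.frame(individual, levels, groups, response)

# groups supplies the new column names; response supplies the cell values.
direct.columns <- dcast(original.data, individual + levels ~ groups,
                        value.var = "response")
dim(direct.columns) # 10 rows, 6 columns
```

No aggregation function is needed here because each combination of individual, levels, and groups has exactly one observation.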


#Did you start with data in the column form?
#Or do you want to put your data back the way it was,
#but now as means for each combination of factors?
#For example, one use I've found for this is to
#put my data in the multi-column form, use na.omit
#to get rid of any individuals with missing data,
#and then convert back to long form.
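
That workflow might look like the sketch below. The wide data frame wide.data and its missing value are invented for illustration.

```r
library(reshape2)

# Hypothetical wide data: individual 2 is missing its group2 measurement.
wide.data <- data.frame(
  individual = 1:3,
  levels     = c("A", "A", "B"),
  group1     = c(2.1, 3.4, 5.0),
  group2     = c(1.9, NA, 4.2)
)

complete.data <- na.omit(wide.data) # drops every row that contains an NA
complete.long <- melt(complete.data, id.vars = c("individual", "levels"))
complete.long # 4 rows: individuals 1 and 3, each in group1 and group2
```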
 
data.longform<-melt(data.columns, id.vars=c("individual","levels")) #melt already returns a data frame
data.longform #view the dataset back in long form, with all values in one column.



#   individual levels variable      value
#1           1      A   group1  2.5109692
#2           2      A   group1  3.0962398
#3           3      A   group1  2.6410184
#4           4      A   group1  3.7203027

#Etc. until row 40 (end of data).
