R - Updating only certain values of a data frame based on a match
I'm trying to update a variable (popsnp) in a higher scope from within lapply, on the basis of a match. I can't quite figure out the syntax for updating only the matched values, though. What I have overwrites the existing values with NA:
lapply(1:22, function(i){
  in.name <- paste("/data/mdp14aps/ld/chr", i, ".ld", sep="")
  out.name <- paste("/data/mdp14aps/r/ldatachr", i, ".rda", sep="")
  ldata <- read.csv(in.name, sep="", header=TRUE, colClasses=c(NA,NA,NA,NA,NA,NA,"NULL"))
  freq <- count(ldata, c("snp_a", "chr_a", "bp_a"))
  # the part I'm not sure about
  popsnp$chrom <<- freq[match(popsnp$marker, freq$snp_a),2]
  popsnp$position <<- freq[match(popsnp$marker, freq$snp_a),3]
  popsnp$freq <<- freq[match(popsnp$marker, freq$snp_a),4]
  save(ldata, file=out.name)
  rm(ldata, freq)
})
I want to preserve the values I'm setting between iterations of lapply and end up with popsnp containing all the values of chrom, position, and freq, not just those from the last iteration.
I feel this should be straightforward, but I'm still unfamiliar with R.
A toy example:
test <- data.frame(a = c("a", "b", "c", "d", "e"), b = c(rep(NA,5)))
test1 <- data.frame(a = c("a", "b"), b = c(1, 2))
test2 <- data.frame(a = c("c", "d", "e"), b = c(3, 4, 5))
test$b <- test1[match(test$a, test1$a), 2]
test$b <- test2[match(test$a, test2$a), 2]
I want test$b to have the values 1-5 in it.
Update: toy example
You need to subset both sides of the assignment, and convert the conditions to logical subsetting vectors.
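idx1 <- match(test$a, test1$a)  # row of test1 for each row of test, NA where there is no match
idx2 <- match(test$a, test2$a)
logical1 <- !is.na(idx1)  # TRUE/FALSE
logical2 <- !is.na(idx2)
test$b[logical1] <- test1$b[idx1[logical1]]  # updates only the TRUE rows
test$b[logical2] <- test2$b[idx2[logical2]]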
I recommend you look at each element individually so you can see what's happening.
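To make that inspection concrete, here is a minimal sketch that rebuilds two of the toy data frames from the question and prints the intermediate values one at a time. The name idx is introduced here purely for illustration.

```r
# Toy data from the question (test2 is left out to keep this short)
test  <- data.frame(a = c("a", "b", "c", "d", "e"), b = NA)
test1 <- data.frame(a = c("a", "b"), b = c(1, 2))

# Position in test1$a of each value of test$a, NA where there is no match
idx <- match(test$a, test1$a)
idx             # 1 2 NA NA NA

# Logical vector marking the rows of test that actually have a match
hit <- !is.na(idx)
hit             # TRUE TRUE FALSE FALSE FALSE

# Indexing with the raw match result carries the NAs along,
# which is why assigning it to the whole column wipes the unmatched rows
looked_up <- test1[idx, 2]
looked_up       # 1 2 NA NA NA
```

The last value shows where the NA values in the question come from: the right-hand side already contains them before the assignment happens.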
Previously...
I'm not sure I understand what your example is trying to accomplish. I'm going to provide a toy example of subsetting:
dat <- data.frame(
  a = sample(letters[3:26], 26, replace = TRUE),
  b = runif(26)
)
# replaces the values in column b where column a == "c"
dat[dat$a == "c", "b"] <- 1
# dat$a == "c" returns a TRUE/FALSE vector, "b" selects column "b".
Best practice is to use TRUE/FALSE conditions while subsetting, to avoid future errors. If you subset by row number, it gets messy.
It's important to note that the use of <<- pushes the change of the variable to the parent environment, outside of the scope of the function. This can lead to unexpected results in the future. It's better to supply the variable you want to change and return it again at the end of the manipulation function. That way you have a clear sequence of events.
myfun <- function(x,y) {
  # ... do stuff to y
  return(y)
}
y <- myfun(x,y)
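Applied to the toy data from the question, the supply-and-return pattern might look like the sketch below. update_matched is a hypothetical helper written for this illustration, not a function from any package, and it assumes the key column has the same name in both data frames and that the columns being filled already exist in the target.

```r
# Hypothetical helper: fill `cols` of `target` from `lookup` wherever the
# key values match, leaving unmatched rows of `target` untouched.
update_matched <- function(target, lookup, key, cols) {
  idx <- match(target[[key]], lookup[[key]])  # NA where there is no match
  hit <- !is.na(idx)                          # rows of target to update
  for (col in cols) {
    target[[col]][hit] <- lookup[[col]][idx[hit]]
  }
  target  # return the modified copy instead of using <<-
}

test  <- data.frame(a = c("a", "b", "c", "d", "e"), b = NA_real_)
test1 <- data.frame(a = c("a", "b"), b = c(1, 2))
test2 <- data.frame(a = c("c", "d", "e"), b = c(3, 4, 5))

# Each pass reassigns test, so values set earlier survive later passes
for (lookup in list(test1, test2)) {
  test <- update_matched(test, lookup, key = "a", cols = "b")
}
test$b  # 1 2 3 4 5
```

A for loop is used rather than lapply because each pass needs the result of the previous one. The same shape should carry over to the per-chromosome loop in the question, with popsnp reassigned on each pass, though that adaptation is not tested here.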
Final update
Lastly, with respect to dropping unnecessary columns: typical practice is to drop them after import, either by name (best practice) or by reference number (changes in the data will break this).
ldata[c('col1','col2',...)] <- NULL # drop
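As a small runnable illustration of both approaches, the sketch below uses a made-up three-column data frame standing in for the imported data. The column names col1, col2, and col3 are placeholders, not the real column names of the file in the question.

```r
# Hypothetical stand-in for an imported table
ldata <- data.frame(col1 = 1:3, col2 = c("x", "y", "z"), col3 = c(0.1, 0.2, 0.3))

# Drop by name: keeps working if the column order changes
by_name <- ldata
by_name[c("col1", "col2")] <- NULL
names(by_name)      # "col3"

# Drop by position: silently removes the wrong columns if the layout changes
by_position <- ldata[-c(1, 2)]
names(by_position)  # "col3"
```

Both give the same result here, but only the by-name version stays correct if a column is later added or reordered in the source file.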