For some modeling work, I needed to draw a training sample and a test sample from a larger data frame. Making the training sample was easy enough (see my earlier post), but I was going crazy trying to figure out how to draw a second sample that excluded the rows I had already picked for the first one.
After trying out some options myself, looking extensively on the net, and asking for help on the r-help forum, I came up with the following function that finally does what I need it to do:
To summarize the function: you pass in the big data frame first (here termed “main.df”), then the first sample data frame, which holds the ID values you want to exclude (here termed “sample1.df”), then your sample size, and finally the names of the ID variable in each of the two data frames, enclosed in quotes.
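A minimal sketch of a function with that argument order might look like the following. The function name `sample.exclude`, the argument names `n`, `main.id`, and `sample1.id`, and the toy data are placeholders chosen for illustration, so treat this as one possible way to do it rather than the definitive version.

```r
# Hypothetical helper: draw n rows from main.df, skipping any row whose ID
# already appears in sample1.df. All names here are illustrative.
sample.exclude <- function(main.df, sample1.df, n, main.id, sample1.id) {
  # Keep only the rows whose ID is NOT in the first sample
  keep <- !(main.df[[main.id]] %in% sample1.df[[sample1.id]])
  remaining <- main.df[keep, , drop = FALSE]

  # Fail early if there are not enough rows left to sample from
  if (n > nrow(remaining)) {
    stop("sample size exceeds the number of rows left after exclusion")
  }

  # Sample row positions without replacement and return those rows
  remaining[sample.int(nrow(remaining), n), , drop = FALSE]
}

# Toy data, made up for this example
set.seed(1)
main.df <- data.frame(id = 1:100, x = rnorm(100))
sample1.df <- main.df[sample.int(nrow(main.df), 30), ]

# Second sample of 20 rows that shares no IDs with the first
sample2.df <- sample.exclude(main.df, sample1.df, 20, "id", "id")
```

The exclusion relies on `%in%` to flag IDs that already appear in the first sample, so it assumes the ID column uniquely identifies rows in the big data frame.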
Functions like this certainly make my working life with R easier, since they save me from typing out that kind of syntax every time I need the task done.