Automate large tasks with an R script / by Chris T

Time for my yearly post, it seems.

Lately I’ve been swimming in data at both SoundGuys and AA, so I’ve needed to make myself some tools to parse and apply functions to large numbers of discrete files. While it’s tempting to make macros in Excel, it can’t do this as quickly or as well as R scripting can.

That means you need to use a consistent approach to format your data in a way that R can easily use, especially if you’re using a package like ggvis or ggplot2 in order to visualize what you want.

However, you might be working with people who don’t know much about R, or you might not use these scripts often enough to retain minutia on how to batch-process CSV files. In that case, I’ve drawn up a basic program for you to use if you want to not go searching on stackexchange.

## Detect packages and install/load if missing
if (!require("tidyverse")) {
  install.packages("tidyverse", dependencies = TRUE)
  library(tidyverse)
}
if (!require("rstudioapi")) {
  install.packages("rstudioapi", dependencies = TRUE)
  library(rstudioapi)
}
## point to source folder/where the program file lives
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

## read all files, add to a list for future processing
## define the filetype in the (pattern="*.extension") argument
temp <- list.files(pattern="*.csv")
for (i in 1:length(temp))
  assign(temp[i], read.csv(temp[i]))

## Use the list in the prior step to scrape together the data
## in the files identified by the list
myDB <- do.call("rbind", lapply(temp, function(x) {
  dat <- read.csv(x, header=TRUE)
  ## Define column names here
  colnames(dat)[2] <- "Enter column name here"
  colnames(dat)[3] <- "Enter column name here"
  ## If you need to add which file the data came from, this line will
  ## add a column that identifies it
  dat$model <- tools::file_path_sans_ext(basename(x))
  dat
}))

## Create a data frame with the columns not numbered
myDB <- select(myDB, -X)

## Output a CSV of the resulting data frame
write.csv(myDB, file = "YourFilenameHere.csv", row.names = FALSE)

Edit the column names to your preference and then save it. That way, you can instruct your colleagues to simply paste the file into the folder they want concatenated into one large file, and just hit run. Easy peasy, no thought required! The result will be one CSV that can then be read by R to use in Shiny apps, dataviz, or other applications.

Of course, this will only work on properly formatted files of the same column names, so be sure to have a method to process your files if you need to alter variations in format.