Automate large tasks with an R script by Chris T

Time for my yearly post, it seems.

Lately I’ve been swimming in data at both SoundGuys and AA, so I’ve needed to make myself some tools to parse and apply functions to large numbers of discrete files. While it’s tempting to make macros in Excel, it can’t do this as quickly or as well as R scripting can.

That means you need to use a consistent approach to format your data in a way that R can easily use, especially if you’re using a package like ggvis or ggplot2 in order to visualize what you want.

However, you might be working with people who don’t know much about R, or you might not use these scripts often enough to retain the minutiae of batch-processing CSV files. In that case, I’ve drawn up a basic program for you to use if you’d rather not go searching on Stack Exchange.

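## Detect packages and install/load if missing
if (!require("tidyverse")) {
  install.packages("tidyverse", dependencies = TRUE)
  library(tidyverse)
}
if (!require("rstudioapi")) {
  install.packages("rstudioapi", dependencies = TRUE)
  library(rstudioapi)
}
## point to source folder/where the program file lives
## (assumed call; it only works when the script is run from RStudio)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))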

## read all files, add to a list for future processing
## define the filetype in the pattern argument (it's a regular expression)
temp <- list.files(pattern = "\\.csv$")
for (i in seq_along(temp))
  assign(temp[i], read.csv(temp[i]))

## Use the list in the prior step to scrape together the data
## in the files identified by the list
myDB <- do.call("rbind", lapply(temp, function(x) {
  dat <- read.csv(x, header=TRUE)
  ## Define column names here
  colnames(dat)[2] <- "Enter column name here"
  colnames(dat)[3] <- "Enter column name here"
  ## If you need to add which file the data came from, this line will
  ## add a column that identifies it
  dat$model <- tools::file_path_sans_ext(basename(x))
  ## Return the data frame so rbind can stack it
  dat
}))

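## Drop the row-number column ("X") that read.csv creates when the source
## files were saved with row names; nothing happens if there is no such column
myDB <- select(myDB, -any_of("X"))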

## Output a CSV of the resulting data frame
write.csv(myDB, file = "YourFilenameHere.csv", row.names = FALSE)

Edit the column names to your preference and then save it. That way, you can instruct your colleagues to simply paste the file into the folder they want concatenated into one large file, and just hit run. Easy peasy, no thought required! The result will be one CSV that can then be read by R to use in Shiny apps, dataviz, or other applications.

Of course, this will only work on properly formatted files that share the same column names, so be sure to have a method for cleaning up any files whose format varies.
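One way to catch mismatched files before the merge is to compare headers first. This is only a minimal sketch in base R: `check_headers` is a made-up helper name, not part of the script above, and it assumes every file is a plain CSV with a header row.

```r
## Hypothetical helper: list the CSV files whose column names differ
## from those of the first file. An empty result means all headers match.
check_headers <- function(files) {
  ## Read just the header (plus at most one row) from each file
  headers <- lapply(files, function(f) names(read.csv(f, nrows = 1)))
  ## Compare each header against the first file's header
  same <- vapply(headers, function(h) identical(h, headers[[1]]), logical(1))
  files[!same]
}

## Usage: check every CSV in the working directory before merging
mismatched <- check_headers(list.files(pattern = "\\.csv$"))
if (length(mismatched) > 0) {
  message("Fix these files before running the merge: ",
          paste(mismatched, collapse = ", "))
}
```

If the message lists any files, rename or reorder their columns first, then run the merge script.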

Logistic regression in individual test scoring by Chris T

If you've read any online reviews in the last, say, 10 years, you probably look for that bigass number in bold at the top and move along after seeing it's not a 10.

Some outlets are moving toward a system of objective scoring, but problems arise when limited datasets and auto-scaling create anomalies in product rankings. Sometimes that 10/10 really is meaningless.

That's why philosophy is so important, and I'll cover just one of the ways I look at metrics here. 

For any test that records results that rely on human perception to determine if something is "good" or "bad," you may want to use a logistic regression instead of a linear or relational model in order to properly score a product. The truth is, there are lots of products out there that have wildly inflated scores on some review sites based on mathematically irrelevant differences in test readings. Humans aren't going to be able to discern the difference between screen black levels of 0.003cd/m^2 and 0.002cd/m^2. Similarly, it won't matter if a smartphone's peak brightness is 1,000cd/m^2 vs. 100,000cd/m^2, so it makes no sense to score these against a linear model, lest all other scores be rendered "shit" compared to a ridiculous outlier of zero utility.

The benefit of a logistic regression is that we can set limits for scoring at the average human limits, and award points (or take them away) in an exponentially decreasing fashion. Thus, something that's slightly better than anything a human could see gets a 95/100, and something that's right at the limit gets 90/100. In the brightness example above, a normal relational auto-scaling model would give that outlier 100/100 and push the "decent" result to 1/100. Instead of relating all scores to each other, we weigh the results against what someone could actually experience.
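To see how a relational auto-scaling model produces that 1/100, here's a minimal sketch in R. It assumes the simplest possible version of such a model, where every result is scored as a percentage of the best result in the dataset; real review sites may scale differently.

```r
## Assumed relational model: score = percentage of the best reading
brightness <- c(decent = 1000, outlier = 100000)  # cd/m^2, from the example above
relative_score <- 100 * brightness / max(brightness)
print(relative_score)
```

Under this assumption the outlier lands at 100 and the perfectly usable 1,000cd/m^2 screen lands at 1.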

Let's look at screen brightness.

Chart: Brightness score (Regression). X = screen brightness (cd/m^2), Y = score/100

Using an equation, f(x) = 100 / (1 + 200e^(-0.009x)), where x is the screen brightness in cd/m^2, we can make a chart that shows what the limits could be.

Looking at the chart, we can see that the curve starts to climb around 350cd/m^2 and the crest is around 850. That's no accident: 350cd/m^2 is the minimum brightness needed to see an image in direct sunlight, and 800cd/m^2 is the threshold of pain in a well-lit room. These aren't scientific limits; they're just the ones I chose for this illustration.
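The curve is easy to check in R. This sketch assumes the equation above is read as 100 / (1 + 200e^(-0.009x)); `brightness_score` is just an illustrative name.

```r
## Logistic brightness score, assuming f(x) = 100 / (1 + 200 * e^(-0.009 * x))
brightness_score <- function(x) {
  100 / (1 + 200 * exp(-0.009 * x))
}

## Readings to check, in cd/m^2
readings <- c(350, 800, 850, 1000, 100000)
scores <- round(brightness_score(readings), 1)
print(data.frame(brightness = readings, score = scores))
```

With these constants, 350cd/m^2 scores close to 10, 850cd/m^2 lands a little above 90, and anything far past 1,000cd/m^2 gains almost nothing.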

Note how the score doesn't reward ludicrous screen brightnesses past a certain point? See how the algorithm keeps sub-350 readouts in the scoring basement? That's by design. We set acceptable limits for what people need based on the philosophy of the product. By establishing the necessary parameters, we can then score against them.

We also don't want to discourage reaching for better heights, so we do reward the brighter screens, though we also want to preserve the philosophical integrity of the system. If we were to score our original hypothetical, 100,000cd/m^2 would get a hair under 100/100, while 1,000cd/m^2 would get about 98/100. Both are ultra-high levels, but one is completely ludicrous and the other is realistic.

The user is the object of the product, not the scoring algorithm, so replacing scoring that rewards beating the pack with scoring that observes the user's needs is the right way forward.

Flexing my brain... might. by Chris T

So I'm extremely happy at AndroidAuthority. Wanna know why? Because they let me be a damn expert and help educate rather than gloss over critical concepts.

To that end, I've been having a ball writing explainers on concepts relating to personal audio and imaging. Here are some of my best lately:

But I gotta say, my two standouts have to be How smartphone cameras work and What is isolation.

That's it for now, but more on the way!


Back in Tech Land by Chris T

Holy hell it sucks to work for yourself.

After freelancing and working for myself for the better part of a year, I'm back with a steady gig, this time at AndroidAuthority! Well, SoundGuys for now, but that may change as time goes on.

I'll miss cameras, but at least I'm back into doing product photography, and I'm much better at video work now. I wish my D600 hadn't bit the dust with a shard of its own mirror, but whaddaya gonna do. Anyways, onto content: