December 2017 – Quantitative Archaeology

See executable Rstudio notebook of this post at this Github repo!

I have had a thing lately with aggregating analyses into sequences of purrr::map() function calls and dense tibbles. If I see a loop – map() it. Have many outputs – well then put it in a tibble. Better yet, apply a sequence of functions over multiple model outputs, put them in a tibble and map() it! That is the basic approach I take here for modeling over a sequence of random hyperparameters (Hypurrr-ameters???); plus using future to do it in parallel. The idea for the code in this post came after an unsettled night of dreaming in the combinatoric magnitude of repeated K-folds CV over the hyperparameters for a multilayer perceptron. anyway…

“Correlated Hyperparameters”“ – Zoroaster Clavis Artis 1738 (inspired by @vboykis #devart)

This post is based on the hyperparameter grid search example, but I am going to use it as a platform to go over some of the cool features of purrr that make it possible to put such an analysis in this tibble format. Further, I hope this post gives people some examples that make the idea of purrr “click”; I know it took me some time to get there. By no means a primer on purrr, the text will hopefully make some connections between the ideas of list-columns, purrr::map() functions, and purrr:nest() to show off what I interpret as the Tidy-Purrr philosophy. The part about using future to parallelize this routine is presented towards the end. If already know this stuff, then skip to the code examples at the end or see this repo. However, if you are purrr-curious, give it a read and check out some of the amazing tutorials out in the wild. If you want to interact with the code and this Rmd file, head over to this Github repo where you can launch an instance of Rstudio server and execute the code!

Objective: Demonstrate an approach to randomly searching over model hyperparameters in parallel storing the results in a tibble.
Pre-knowledge: Beginner to Moderate R; introductory modeling concepts; Tidy/Purrr framework
Software: R 3.4.0, tidyverse 1.2.1 (contains tibble, dplyr, purrr, tidyr, and ggplot2), rsample 0.0.2, future 1.6.2

Optimizing hyperparameters

Hyperparameters are the tuning knobs for statistical/machine learning models. Basic models such as linear regression and GLM family models don’t typically have hyperparameters, but once you get into Ridge or Lasso regression and GAMs there are parts of the model that need tuning (e.g. penalties or smoothers). Methods such as Random Forest or Gradient Boosting, Neural Networks, and Gaussian Processes have even more tuning knobs and even less theory for how they should be set. Setting such hyperparameters can be a dark-art and require experience and a bit of faith. However, even with experience in these matters it is rarely clear what combination of hyperparameters will lead to the best out-of-sample prediction on a given dataset. For this reason, it is often desired to test over a range of hyperparameters to find the best pairing.

Continue reading “Hypurrr-ameter Grid Search with Purrr and Future” →

Quantitative Archaeology

thoughts, models, and ephemera

Month: December 2017

Hypurrr-ameter Grid Search with Purrr and Future

See executable Rstudio notebook of this post at this Github repo!

Optimizing hyperparameters