Quick post to archive my 2015 Society for American Archaeology (SAA) meeting presentation (San Francisco, CA). The SlideShare is at the bottom of the post, or download it here: MattHarris_SAA2015_final
This presentation was all about the completion of the Pennsylvania Predictive model and some post-project expansion with a new testing scheme and the Extreme Gradient Boosting (XGB) classifier.
The presentation starts with a bit about the context for Archaeological Predictive Modeling (APM) and the basics of the machine learning approach. I call it Machine Learning (ML) here (because I was in San Francisco after all), but I generally think of it as Statistical Learning (SL). The slight shift in perspective comes from SL's focus on assumptions and statistical models, whereas ML is more oriented to applied maths, comp sci., and DEEP LEARNING OF BIG DATA! It just depends on how you want to look at it.
The presentation then moves to the characteristics of archaeological data that make it unique and challenging. I think this is a critical area that gets glossed over and offers us so many excuses not to pursue model-based approaches. Okay, yes, our data kind of stink most of the time. Let’s accept that, plan for it, and move along.
After my typical lecture on how the Bias/Variance trade-off should keep you up at night, I go into schematic descriptions of the learning algorithms: Logistic Regression, Multivariate Adaptive Regression Splines (MARS), Random Forest, and XGB. Then I try to show how each algorithm, regardless of how “fancy”, can be conceptualized in a “simple” way. The remainder of the presentation is a tour of prediction metrics for the four models applied to a portion of the state.
Unfortunately, this portion was only developed after the project had completed. This is partially because of the timing of the contract, but also because some of these methods were not developed until later in the project, and by that time, I needed to follow the same general methods that the project started with for consistency.
The two big takeaways from this part of the presentation are that 1) XGB “won” the model bake-off, as it led to the lowest error on an independent site sample across most sub-regions and sub-samples. It was the most consistent and accurate (to the positive class) learner of the four; and 2) error can be viewed in two important ways: a) the percent of observations within sites that are correctly classified and b) the percent of sites that are correctly classified. Since each site is recorded as a measurement of each ~10×10-meter cell in a site, our error measurement can go either way. If I say there is a 20% error rate, does that mean that 20% of each site is misclassified or that 20% of all sites are misclassified? That is a subtle but important distinction. The methods here calculate both aspects and then combine both measures into a (poorly named) measure called Gain or Balance. The penultimate slide gives a bunch of views of these metrics across the entire study area.
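To make the two views of error concrete, here is a minimal Python sketch on toy data. It is not the code from the project. The function name, the data layout (one record per site cell with a flag for whether the model predicted "site present"), and especially the 50% cutoff for calling a whole site "correctly classified" are assumptions made for illustration. It also stops short of combining the two numbers, since the Gain/Balance formula is not spelled out in this post.

```python
from collections import defaultdict


def site_error_views(cells, site_threshold=0.5):
    """Summarize positive-class accuracy two ways.

    cells: iterable of (site_id, predicted_present) pairs, one per
        ~10x10-meter cell inside a known site (hypothetical layout).
    site_threshold: ASSUMED cutoff; a site counts as correctly
        classified when at least this fraction of its cells is
        predicted present. The real rule may differ.
    """
    per_site = defaultdict(list)
    for site_id, predicted_present in cells:
        per_site[site_id].append(bool(predicted_present))

    # View (a): fraction of cells within each site that are correct.
    within_site = {sid: sum(p) / len(p) for sid, p in per_site.items()}

    # View (b): fraction of sites that count as correct overall.
    n_correct_sites = sum(
        1 for frac in within_site.values() if frac >= site_threshold
    )
    across_sites = n_correct_sites / len(within_site)

    # Pooled cell-level accuracy, ignoring site boundaries.
    n_cells = sum(len(p) for p in per_site.values())
    pooled_cells = sum(sum(p) for p in per_site.values()) / n_cells

    return within_site, across_sites, pooled_cells


# Toy data: four sites of five cells each (1 = predicted present).
toy_cells = (
    [("A", v) for v in (1, 1, 1, 1, 0)]
    + [("B", v) for v in (1, 0, 0, 0, 0)]
    + [("C", v) for v in (1, 1, 1, 1, 1)]
    + [("D", v) for v in (1, 1, 1, 1, 0)]
)

within_site, across_sites, pooled_cells = site_error_views(toy_cells)
print(within_site)   # per-site fraction of correct cells
print(across_sites)  # fraction of sites counted as correct
print(pooled_cells)  # fraction of all site cells that are correct
```

In this toy case the pooled cell-level error is 30%, while the site-level error is 25%, and that 25% is one site (B) that is almost entirely missed. Same predictions, different stories, which is why it helps to report both.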
All in all, I am relatively proud of this presentation in that it is the culmination of 2 years of intensive work that addressed many issues in APM that existed for 20 years. It got over that hump and found a bunch of new issues that are a bit more contemporaneous with SL/ML/general modeling approaches. Some interesting ways to view prediction error were developed, and they were visualized in a way that (at least to me) is pretty satisfying. Let me know what you think!