The basis of this post is the inTrees (interpretable trees) package in R [CRAN]. The inTrees package summarizes the typically complex result of tree based ensemble learners/algorithms into rule-based learners referred to as Simplified Tree Ensemble Learner (STEL). This approach solves the following problem: Tree based ensemble learners, such as RandomForest (RF) and Gradient Boosting Machines (GBM), often deliver superior predictive results compare to more simple decision tree or other low complexity models, but are difficult to interpret or offer insight. I will use an archaeological example here (because my blog title mandates it), but this package/method can be applied to any domain.
Oh Archaeology… We need to talk:
Ok, let’s get real here. Archaeology has an illogical and unhealthy repugnance to predictive modeling. IMHO, illogical because the vast majority of archaeological thought process are based on empirically derived patterns based on from previous observation. All we do is predict based on what data we know (a.k.a. predict from models, a.k.a. predictive modeling). Unhealthy, because it stifles creative thought, advancement of knowledge, and relevance of the field. Why is this condition so? Again, IMHO, because archaeologists are stuck in the past (no pun intended), feel the need to cling to images of self-sufficient site-based discovery, have little impetus to change , and would rather stick to our area of research then coalesce to help solve real-world problems. But, what the hell do I know.
What the heck does this have to do with tree based statistical learning? Nothing and everything. In no direct sense do these two things overlap, but indirectly, the methods I am discussing here have (once again) IMHO, the potential to bridge the gap between complex multidimensional modeling and practical archaeological decision making. Further, not only do I think this is true for archaeology, but likely for other fields that have significant stalwart factions. That is a bold claim for sure, but having navigated this boundary for 20 years, I am really excited about the potential for this method; more so than most any other.
Bottom lines is that the method discussed below, interpretable trees from rule-based learners in the inTrees package for R, allows for the derivation of simple-to-interpret rule based decision making based on empirically derived complex and difficult-to-interpret tree based statistical learning methods. Or more generally, use fancy-pants models to make simple rules that even the most skeptical archaeologist will have to reason with. Please follow me…