Quantifying the present and predicting the past : theory, method, and application of archaeological predictive modeling


The volume entitled "Quantifying the present and predicting the past: theory, method, and application of archaeological predictive modeling," published in 1989 by the U.S. Bureau of Land Management (BLM), is the foundation for all approaches to Archaeological Predictive Modeling (APM).  Past or present, no single volume goes into greater depth on the true underlying issues associated with APM or attempts to synthesize the variety of basic approaches to the data.  I received my first copy from the back of a BLM warehouse around 1999 or 2000, but I have recently run across these scanned copies online.  A PDF is available at this link from the BLM, Quantifying the Present and Predicting the Past [PDF], and there is also an Alternative Version on Archive.org that is searchable and allows download in a variety of formats. (The Archive.org link is part of the FEDLINK library and BLM collection, which have a ton of great, hard-to-find publications from days gone by.)

This most ambitious undertaking was compiled and edited by a large team of archaeological all-stars (Sebastian, Kvamme, Kohler, Altschul, et al.) and sought to identify the fundamental questions that APM should address, the issues that are specific to our data, and methods to implement these models.  The volume is a synthesis of the previous decade's worth of model-based archaeology that had developed out of "New Archaeology" and was directed by the needs of large-area land surveys for the newly formed Cultural Resources Management (CRM) industry.  Editorially, I will suggest that this was the peak of APM study as a central theme to which such leaders in the field would devote themselves.  In the years following, easy access to computer core cycles and push-button GIS made glossing over the details of these methods all too easy.  (I am a guilty contributor to this cycle, so no finger pointing here.)

From the methodological side, the technical details in this volume are not particularly sexy or exciting.  Logistic regression and PCA (Kvamme, Chapter 8) are not nearly as cool to talk about as Machine Learning and Deep Convolutional Neural Nets, but they are incredibly important to understanding how we attempt to model site locations and are still appropriate models in many situations.  That is not to say that modern statistical learning algorithms are not important or do not perform well, just that you cannot ignore the past (bad pun intended).  It has to be recognized that this book was released when personal computers were just becoming widely available and desktop GIS was only getting off the ground.  As such, model output that used ASCII characters as indicators of raster quantities was very cool for its day; it is still pretty cool, in a retro sort of way.  While the modeling methods may seem outdated, the content in every chapter is well worth the read, if not for learning the information the first time, then for understanding the basis for most of the APM literature in the decades that followed.
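To put that older toolkit in concrete terms, a basic logistic regression site-presence model takes only a few lines of modern R.  This is a minimal, generic sketch with invented data (a site presence/absence flag plus two made-up environmental covariates), not a recipe from the volume itself:

# hypothetical survey data: one row per surveyed location
set.seed(123)
surveyed <- data.frame(
  site     = rbinom(200, 1, 0.3),   # 1 = site present, 0 = absent (invented)
  slope    = runif(200, 0, 45),     # percent slope (invented)
  h2o_dist = runif(200, 0, 2000)    # distance to water in meters (invented)
)

# logistic regression of site presence on the environmental covariates
fit <- glm(site ~ slope + h2o_dist, data = surveyed, family = binomial)
summary(fit)

# predicted probability of site presence, usable as a relative sensitivity value
surveyed$p_site <- predict(fit, type = "response")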

To give one example of why the original text matters, many people recite Kvamme's (pp. 327-328) basic archaeological assumptions for location-based models, but wrap them in odd alternative meanings that were likely never intended.  Read the original and you will get the context and a better understanding.  Speaking of Kvamme, you also get to read his introduction of the "Kvamme Gain," the most widely applied (and misapplied) measure of an archaeological model's predictive ability.  (Having studied many of the modern-day metrics for classification models, I will say that for as simple as it is, it is quite a good measure and holds up very well over time.)

Kvamme Gain
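The statistic itself is simple: one minus the ratio of the proportion of study area classified as site-likely to the proportion of known sites captured by that area.  A quick R sketch with made-up numbers:

kvamme_gain <- function(pct_area, pct_sites) {
  # G = 1 - (% of total area classified as sensitive / % of known sites in that area)
  1 - (pct_area / pct_sites)
}

# e.g., a model that captures 85% of known sites within 30% of the study area
kvamme_gain(pct_area = 0.30, pct_sites = 0.85)
## [1] 0.6470588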

I recommend reading this whole damn thing cover to cover.  But if you don't have a few extra weeks of spare time and are looking for a place to start, I recommend Sebastian's discussion of correlative and explanatory models (I have MUCH more to say about this, but will hold my tongue for now); dip into Kohler's contextual discussion in Chapter 2; read the intro and model types of Altschul (Chapter 3); hit some high points in Ebert and Kohler; become very cozy with Chapter 5 by Rose and Altschul; read every word of Chapter 7 by Kvamme; and give some serious thought to what Judge and Martin say in Chapter 12.  Chapters 8 through 11 are very interesting, but not as critical as the others.


Arbitrary Combined Weights (ACW) Archaeological Sensitivity Model – The Basics


In archaeology, there is a strong need and desire to project our understanding of where we know sites are located onto areas where no one has yet looked for sites.  The practical extension of this desire is to use that projection to better protect and manage archaeological sites by identifying areas with a heightened sensitivity for archaeological material.  Such an approach is necessary because time and funding seldom allow archaeologists to survey 100% of a study area.  The manner in which observational data are projected onto the broader landscape can take many forms, both qualitative and quantitative.  The universe of approaches for doing so is typically considered under the umbrella of "Archaeological Predictive Modeling" (APM).

This blog will contain many different takes on APM and discuss many different techniques.  As a first post, what follows is a description of, and code for, the most basic and most commonly employed form of quantitative APM: the Arbitrary Combined Weights (ACW) model.  Simply put, the ACW model assigns quantitative weights to values of one or more explanatory variables, which are then combined in some mathematical form to signify a relative level of archaeological sensitivity.  Typically, a continuous variable is binned based on its characteristics (e.g., quantiles, natural breaks) or on the investigator's experience, and a weight of sensitivity is assigned to each bin.  Discrete variables may have weights assigned to each class or group of classes.  The assigned weights may be on any scale, such as 0 to 1 or 0 to 100, but a greater weight value almost always indicates a higher sensitivity for archaeological material.  Weights may be assigned to indicate the sensitivity of each bin individually or distributed across the bins relative to each other.

Once weighted, the combination can take any arithmetic form that fits the model's intention.  The most common approaches are summing or averaging the weights.  Many other linear combinations are possible, including applying a higher-level weight to each variable so that the weights within a variable are adjusted relative to the overall importance the investigator places on that variable as a whole (see the sketch below).  The overall goal of this method is to identify the intersections of variables that contain the highest combination of weights and, ostensibly, the highest sensitivity for archaeological material.  The method described here is essentially the very basic Weighted Sum Model used in the Multi-Criteria Decision Analysis field, which is a decent place to explore for more background.
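To make that variable-level weighting concrete, here is a schematic of one location's bin weights combined three ways; the bin weights and variable importance values are invented purely for illustration:

# bin weights for one hypothetical location (one value per variable)
cell_wgts <- c(slope = 0.95, h2o = 0.75, soil = 0.65)

# overall importance the investigator places on each variable (invented; sums to 1)
var_imp <- c(slope = 0.5, h2o = 0.3, soil = 0.2)

sum(cell_wgts)            # simple sum: 2.35
mean(cell_wgts)           # simple mean: ~0.78
sum(cell_wgts * var_imp)  # importance-weighted sum: 0.83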

Clearly, the ACW method is very unstructured and leaves a great deal of room for error and interpretation.  Despite this, it is commonly used because it is easy to describe, easy to implement with or without GIS, and coherent with our basic conceptual models of site distribution.  With this ease come many reasons why ACW is a poor representation of archaeological sensitivity, highlighted by the arbitrary nature of bins and weights, ad hoc arithmetic, lack of any measure of variable contribution, and no inferential ability, to state a few.  However, despite a laundry list of theoretical and quantitative incompatibilities, the ACW model will continue to be used extensively throughout archaeology and deserves analysis.  This post implements a schematic version of the ACW model in R code and explores some of the output.  It will be followed by deeper exploration and extensions of this basic model.

This code does most of the work with base R functions, but uses ggplot2, reshape2, and scales to format data for plotting.

require(ggplot2)
require(reshape2)
require(scales)
## Loading required package: ggplot2
## Loading required package: reshape2
## Loading required package: scales

The first step is to create a table of the bins and weights for each of three variables: h20_cls for distance to a water source in meters, slp_cls for percent topographic slope, and soil_cls_lbl for soil drainage class.  The objects h20_wgt, slp_wgt, and soil_wgt are the weights assigned to each bin.  For example, a weight of 0.85 is assigned to distances of 0 to 100 meters from a source of water.  A weight of 0.95 is assigned to slopes between 0% and 3%, followed by a weight of 0.85 for slopes of 3% to 5%.

# weights as probability within bin
h20_wgt <- c(0.85, 0.75, 0.45, 0.25, 0.1)
h20_cls <- c(0,100,300,500,900,9999)
slp_wgt <- c(0.95, 0.85, 0.65, 0.2, 0.1)
slp_cls <- c(0,3,5,8,15,999)
soil_wgt <- c(0.30, 0.65, 0.85)
soil_cls <- c(-1,0,1,2)
soil_cls_lbl <- c("poor", "moderate", "well")

As described in the model methods above, the bins and weights are based on the judgement of the investigator.  While they likely have some grounding in real-world observation, they do not result directly from an analysis of a set of known site locations.  Models that explicitly use known sites to derive bin widths or weights are considered proportionally weighted models.

Example of Weights: Distance to Water Source

from    to      weight
0       100     0.85
100     300     0.75
300     500     0.45
500     900     0.25
900     9999    0.10
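By way of contrast, a proportionally weighted approach would derive those weights from known sites directly.  A minimal sketch, assuming a purely hypothetical vector of distances from recorded sites to the nearest water source:

# hypothetical distances (m) from known sites to the nearest water source
known_site_h2o <- c(40, 75, 120, 160, 250, 310, 420, 610, 880, 1500)

# the proportion of known sites falling in each distance bin becomes the weight
site_bins <- cut(known_site_h2o, breaks = h20_cls, include.lowest = TRUE, dig.lab = 4)
prop.table(table(site_bins))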

Next, I simulate some environmental data to apply my weights to.  Typically, this step would be done by extracting raster cell values or by using the raster calculator in a GIS program.  The variables below draw values of slope and distance to water with exponentially decaying probabilities to mimic the skewed distributions often found in nature.  In R, the raster package can be used to extract values from rasters and perform the following analysis.
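If you are working from real rasters rather than a simulation, that extraction step might look something like the sketch below; the file names and point coordinates are hypothetical:

library(raster)

# hypothetical rasters of percent slope and distance to water (meters)
slope_rast <- raster("slope_percent.tif")
h20_rast   <- raster("h20_distance_m.tif")

# hypothetical sample point coordinates in the rasters' coordinate system
pts <- cbind(x = c(450120, 450310, 450575),
             y = c(4421050, 4421220, 4421400))

# extract the cell values at each point
slope_vals <- extract(slope_rast, pts)
h20_vals   <- extract(h20_rast, pts)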

# simulate a landscape. this would typically be enviro rasters
set.seed(1)  # make the simulated landscape reproducible
slope_sim <- sample(seq(0, 100), 1000, replace = TRUE, prob = dexp(seq(0, 100), 0.1))
h20_sim   <- sample(seq(0, 5000), 1000, replace = TRUE, prob = dexp(seq(0, 5000), 0.005))
soil_sim  <- sample(c(0, 1, 2), 1000, replace = TRUE, prob = c(0.25, 0.5, 0.25))

The plot below shows the distribution and correlation of the simulated data points for log slope and log distance to water, faceted by soil drainage.

(plot 1: simulated log slope vs. log distance to water, faceted by soil drainage)

(code for above plot)

sim_landscape <- data.frame(slope = slope_sim, h20 = h20_sim, 
                            soil = factor(soil_sim, labels = soil_cls_lbl))
ggplot(data = sim_landscape, aes(x = log(slope), y = log(h20), color = soil)) +
  geom_point() +
  facet_grid(soil~.) +
  theme_bw() +
  ylab("Log Distance to H20 (meters)") +
  xlab("Log Slope (degrees)") +
  theme(legend.title = element_blank(),
        legend.title = element_blank(),
        axis.title = element_text(face = "bold"),
        strip.text.y = element_text(face = "bold", size = 12),
        strip.background = element_rect(colour = "gray10", fill = "gray90"),
        legend.position = "none") +
  scale_color_brewer(palette="Paired")

The values extracted from the explanatory rasters are then recoded to the weight values based on the predetermined bins.  The cut function is used to recode, and as.numeric(as.character(.)) is a bit of a hacky way to convert the factors resulting from cut back to numeric values.

# reclass vectors: recode raw values to their bin weights
h20_reclass   <- cut(h20_sim, breaks = h20_cls, labels = h20_wgt, include.lowest = TRUE)
slope_reclass <- cut(slope_sim, breaks = slp_cls, labels = slp_wgt, include.lowest = TRUE)
soil_reclass  <- cut(soil_sim, breaks = soil_cls, labels = soil_wgt)
h20_reclass   <- as.numeric(as.character(h20_reclass))
slope_reclass <- as.numeric(as.character(slope_reclass))
soil_reclass  <- as.numeric(as.character(soil_reclass))

var_matrix <- data.frame(slope_wght = slope_reclass, 
                         h20_wght = h20_reclass,
                         soil_wght = soil_reclass)
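As an aside, one way to skip the factor round-trip entirely is to index the weight vectors by bin position with findInterval.  This is a small alternative, not what the code above uses, and note that values sitting exactly on a break fall into a different bin than they do with cut:

# alternative recode: look up each value's bin index and pull the matching weight
h20_reclass_alt   <- h20_wgt[findInterval(h20_sim, h20_cls, rightmost.closed = TRUE)]
slope_reclass_alt <- slp_wgt[findInterval(slope_sim, slp_cls, rightmost.closed = TRUE)]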

The plot below shows the density of weight values for each variable across the simulated landscape.  Notice the peaked density at the high-sensitivity values of distance to water compared to the generally low, even densities for slope.  This results from the interaction of the arbitrary weights, the bins, and the nature of the landscape being modeled.

(plot 2: density of sensitivity weights for each variable)

(code for above plot)

var_matrix_melt <- melt(var_matrix)
ggplot(data = var_matrix_melt, aes(x = value, fill = as.factor(variable))) +
  geom_density() +
  xlab("Sensitivity Weight") +
  facet_grid(variable~.) +
  theme_bw() +
  theme(
    legend.title = element_blank(),
    axis.title = element_text(face = "bold"),
    strip.text.y = element_text(face = "bold", size = 12),
    strip.background = element_rect(colour = "gray10", fill = "gray90"),
    legend.position = "none")+
  scale_fill_brewer(palette="Paired") +
  scale_x_continuous(breaks=seq(0,1,0.1))

The final step in the model is to combine the weights across variables to form an overall sensitivity value.  As described above, this can be done in any number of ways, but it is typically a simple sum or mean of the weights.  There is little guiding theory to help inform what should be done here, but the intention is to show locations where the more desirable attributes intersect.  The mean and median have some attraction because the final sensitivity stays on the same [0,1] scale.  However, these are essentially sample means/medians and show a Central Limit Theorem-like effect where values trend toward the center: there are many likely ways to combine weights that equal 0.5, but far fewer likely combinations that equal 0.9.  Summing the values has a similar issue and also changes the scale of the combined sensitivity based on the number of variables in the model.  These issues are among the most important to understand when working with ACW.  The plots below show how the final sensitivity distribution changes based on the method used to construct it: sum, mean, or median.

# calculate aggregations of sensitivity values
var_matrix$sum <- apply(var_matrix[ , 1:3], 1, sum)
var_matrix$mean <- apply(var_matrix[ , 1:3], 1, mean)
var_matrix$median <- apply(var_matrix[ , 1:3], 1, median)

(plot 3: densities of the summed, mean, and median sensitivity values)

(code for above plots)

results_melt <- melt(var_matrix[,c("sum","mean","median")])
ggplot(results_melt, aes(x = value, fill = as.factor(variable))) +
  geom_density() +
  xlab("Sensitivity Weight") +
  facet_grid(variable~.) +
  theme_bw() +
  theme(
    legend.title = element_blank(),
    axis.title = element_text(face = "bold"),
    strip.text.y = element_text(face = "bold", size = 12),
    strip.background = element_rect(colour = "gray10", fill = "gray90"),
    legend.position = "none")+
  scale_fill_brewer(palette="Paired") +
  scale_x_continuous(breaks=seq(0,3,0.1))

These final plots illustrate the densities of the final sensitivity values for the various combination methods relative to the weight densities of each variable.  Notice that the sum method (rescaled to [0,1]) produces a more diverse range of values that peaks toward 0.5, a region of low density for all three variables.  The mean of the sensitivities results in a similar but less dispersed pattern, still aggregated toward the center.  Finally, the median has a higher fidelity to the peak weight densities of each variable.  Which of these methods is the best?  Well, there is probably not a good answer to that beyond either your theory or cross-validated prediction error.  However, if you had the former you would probably not use the ACW model, because there are better approaches for modeling that theory; and if you attempted the latter you would need a larger data set and could then use statistical models with a better mathematical foundation.  That being said, there are times when small amounts of data, limited time, and the need to convince a skeptical audience require a model that is easy to create and explain, such as the ACW model.  Much more on this at a later date.

(plot 4: variable weight densities with the rescaled sum overlaid)

(plot 5: variable weight densities with the mean overlaid)

(plot 6: variable weight densities with the median overlaid)

(data munge for above plots)

compare_weights_plot <- rbind(results_melt, var_matrix_melt)
compare_weights_plot$variable <- as.character(compare_weights_plot$variable)
compare_weights_sum_plot <- compare_weights_plot[which(compare_weights_plot$variable %in%
                              c("slope_wght","h20_wght","soil_wght")),]
compare_weights_sum <- compare_weights_plot[which(compare_weights_plot$variable %in%
                              c("sum")),]
compare_weights_sum$value <- scales::rescale(compare_weights_sum$value,to = c(0, 1))
# function(x){(x-min(x))/(max(x)-min(x))}
compare_weights_sum_plot <- rbind(compare_weights_sum_plot, compare_weights_sum)
compare_weights_mean_plot <- compare_weights_plot[which(compare_weights_plot$variable %in%
                              c("mean","slope_wght","h20_wght","soil_wght" )),]
compare_weights_median_plot <- compare_weights_plot[which(compare_weights_plot$variable %in%
                              c("median","slope_wght","h20_wght","soil_wght")),]

compare_weights_sum_plot$variable <- factor(compare_weights_sum_plot$variable,
                        levels =  c("slope_wght","h20_wght","soil_wght","sum"))
compare_weights_mean_plot$variable <- factor(compare_weights_mean_plot$variable,
                        levels =  c("slope_wght","h20_wght","soil_wght","mean"))
compare_weights_median_plot$variable <- factor(compare_weights_median_plot$variable,
                        levels =  c("slope_wght","h20_wght","soil_wght","median"))

(code for above plots)

ggplot(data = compare_weights_sum_plot, aes(x = value, color = variable,
                                             fill = variable, alpha = variable)) +
  geom_density() +
  scale_alpha_manual(values = c(0, 0, 0, 0.2)) +
  xlab("Sensitivity Weight") +
  theme_bw() +
  theme(
    legend.title = element_blank(),
    axis.title = element_text(face = "bold"))+
  scale_fill_brewer(palette="Paired") +
  scale_color_brewer(palette="Paired") +
  scale_x_continuous(limits=c(0, 1), breaks=seq(0,1,0.1))
ggplot(data = compare_weights_mean_plot, aes(x = value, color = variable,
                                             fill = variable, alpha = variable)) +
  geom_density() +
  scale_alpha_manual(values = c(0, 0, 0, 0.2)) +
  xlab("Sensitivity Weight") +
  theme_bw() +
  theme(
    legend.title = element_blank(),
    axis.title = element_text(face = "bold"))+
  scale_fill_brewer(palette="Paired") +
  scale_color_brewer(palette="Paired") +
  scale_x_continuous(limits=c(0, 1), breaks=seq(0,1,0.1))
ggplot(data = compare_weights_median_plot, aes(x = value, color = variable,
                                             fill = variable, alpha = variable)) +
  geom_density() +
  scale_alpha_manual(values = c(0, 0, 0, 0.2)) +
  xlab("Sensitivity Weight") +
  theme_bw() +
  theme(
    legend.title = element_blank(),
    axis.title = element_text(face = "bold"))+
  scale_fill_brewer(palette="Paired") +
  scale_color_brewer(palette="Paired") +
  scale_x_continuous(limits=c(0, 1), breaks=seq(0,1,0.1))

Plotting Shovel Test Unit Profiles in R with the aqp package

Examples of STU profiles

The code in the following post was first written as a quick way to create a large number of Shovel Test Unit (STU) soil profiles with some consistency and a few added features.  This method uses the aqp package in R to generate the profiles from data in a *.csv file.  The package was originally developed for geology and soil science data, but we adapted it a bit for our purpose.  Since developing this initial code, others in the office have further developed the function to fit their workflows and profiles.  The basics are here, and adapting it is pretty straightforward if you dive into the guts of the aqp package and read the docs.

Features shown here include converting Munsell colors to hex, artifact densities, exporting to PDF, and a few other tweaks. Hopefully, you will find it useful!

aqp package http://aqp.r-forge.r-project.org/

The first thing to do is load the packages. The first time you do this you will need to install the packages; after that, you can just import them.

# this is the main package for profiles
install.packages("aqp") 
# these may come in handy
install.packages("RColorBrewer")
install.packages("latticeExtra")
install.packages("plyr")
install.packages("reshape")

Once the packages are installed, we need to import them into the R environment like so:

require(aqp)
require(RColorBrewer)
require(latticeExtra)
require(plyr)
require(reshape)

Now these packages are imported, which means we can use all the functions and data that come with them. Google searches will return documentation on these packages, tell you what functions they contain, and provide some examples.

Data

Time to load the data.

# this sets the working directory to a folder that contains your data and will
# contain your output.  Setting this is not mandatory, but makes it easier to get
# and put data with less typing [Notice that the slashes in the path are forward
# slashes /, not the standard Windows backslash \]
setwd("[Your File Location or Working Directory]")

# in this working directory, I made a folder called data to hold the 
# CSV of STU profiles
# I read this using the read.csv function and assign it to the variable "dat"
dat <- read.csv(paste(c(getwd(), "/data/Soil_Profiles.csv"), collapse=''), 
                stringsAsFactors = FALSE)[,-1]

# that "paste" fucntion concatenates the working directory and the 
# file location and name
# set "stringsAsFactors = FALSE" tells the functin to leave text fields as text 
# in the data
# the "[,-1]" at the end drops the index column that is automatically added

Let’s see our data

# this will print the whole thing.  use head(dat,#) where # is replaced with the
# number of rows you want to print
print(dat)
##         id top bottom name    p1 soil_color
## 1  Pole 13   0     19   Ap 6.963    #5E4423
## 2  Pole 13  19     55    B 5.951    #815A21
## 3  Pole 14   0     24   Ap 5.245    #3D2F21
## 4  Pole 14  24     45   Ab 5.069    #5E4423
## 5  Pole 14  45     50    E 5.069    #BEAD97
## 6  Pole 14  50     60    B 5.338    #B88E50
## 7  Pole 22   0     19   Ap 6.454    #7A5C36
## 8  Pole 22  19     30   Ab 5.844    #554636
## 9  Pole 22  30     38    B 5.437    #94764F
## 10 Pole 22  38     50    C 5.181    #9D7337
## 11 Pole 37   0     31   Ap 5.019    #7A5C36
## 12 Pole 37  31     63  Ap1 5.283    #554636
## 13 Pole 37  63     90  Ap2 6.440    #5E4423
## 14 Pole 37  90    117    B 6.670    #94764F
# the "str" function reports the structure of the data
str(dat)
## 'data.frame':    14 obs. of  6 variables:
##  $ id        : chr  "Pole 13" "Pole 13" "Pole 14" "Pole 14" ...
##  $ top       : int  0 19 0 24 45 50 0 19 30 38 ...
##  $ bottom    : int  19 55 24 45 50 60 19 30 38 50 ...
##  $ name      : chr  "Ap" "B" "Ap" "Ab" ...
##  $ p1        : num  6.96 5.95 5.24 5.07 5.07 ...
##  $ soil_color: chr  "#5E4423" "#815A21" "#3D2F21" "#5E4423" ...
# class tells us what class the object "dat" is
class(dat)
## [1] "data.frame"
# summary gives us a look into the range of the data
summary(dat)
##       id                 top            bottom          name          
##  Length:14          Min.   : 0.00   Min.   : 19.0   Length:14         
##  Class :character   1st Qu.: 4.75   1st Qu.: 30.2   Class :character  
##  Mode  :character   Median :27.00   Median : 47.5   Mode  :character  
##                     Mean   :29.21   Mean   : 49.4                     
##                     3rd Qu.:43.25   3rd Qu.: 58.8                     
##                     Max.   :90.00   Max.   :117.0                     
##        p1        soil_color       
##  Min.   :5.02   Length:14         
##  1st Qu.:5.20   Class :character  
##  Median :5.39   Mode  :character  
##  Mean   :5.71                     
##  3rd Qu.:6.32                     
##  Max.   :6.96

Label

Add a label with horizon and Munsell color to the data.frame

dat <- within(dat,  label <- paste(name, soil_color, sep=" - "))

print(dat)
##         id top bottom name    p1 soil_color         label
## 1  Pole 13   0     19   Ap 6.963    #5E4423  Ap - #5E4423
## 2  Pole 13  19     55    B 5.951    #815A21   B - #815A21
## 3  Pole 14   0     24   Ap 5.245    #3D2F21  Ap - #3D2F21
## 4  Pole 14  24     45   Ab 5.069    #5E4423  Ab - #5E4423
## 5  Pole 14  45     50    E 5.069    #BEAD97   E - #BEAD97
## 6  Pole 14  50     60    B 5.338    #B88E50   B - #B88E50
## 7  Pole 22   0     19   Ap 6.454    #7A5C36  Ap - #7A5C36
## 8  Pole 22  19     30   Ab 5.844    #554636  Ab - #554636
## 9  Pole 22  30     38    B 5.437    #94764F   B - #94764F
## 10 Pole 22  38     50    C 5.181    #9D7337   C - #9D7337
## 11 Pole 37   0     31   Ap 5.019    #7A5C36  Ap - #7A5C36
## 12 Pole 37  31     63  Ap1 5.283    #554636 Ap1 - #554636
## 13 Pole 37  63     90  Ap2 6.440    #5E4423 Ap2 - #5E4423
## 14 Pole 37  90    117    B 6.670    #94764F   B - #94764F

Conversion

Now we need to put these data into the format used by the aqp package to do the cool stuff. The “depths” function takes the csv data and converts it into an object of class “SoilProfileCollection” for future use. By doing so, the aqp package conforms the data to a type it knows well and can use in the functions the package provides.

# the depths function uses the id field in the data to set the STU number
# it uses top to set the top of the STU and bottom for the... well, the bottom.
depths(dat) <- id ~ top + bottom

# now we see that the class of dat has been changed from data.frame to "SoilProfileCollection"
str(dat) # you can see that the original data is now within this object with other attributes
## Formal class 'SoilProfileCollection' [package "aqp"] with 7 slots
##   ..@ idcol     : chr "id"
##   ..@ depthcols : chr [1:2] "top" "bottom"
##   ..@ metadata  :'data.frame':   1 obs. of  1 variable:
##   .. ..$ depth_units: chr "cm"
##   ..@ horizons  :'data.frame':   14 obs. of  7 variables:
##   .. ..$ id        : chr [1:14] "Pole 13" "Pole 13" "Pole 14" "Pole 14" ...
##   .. ..$ top       : int [1:14] 0 19 0 24 45 50 0 19 30 38 ...
##   .. ..$ bottom    : int [1:14] 19 55 24 45 50 60 19 30 38 50 ...
##   .. ..$ name      : chr [1:14] "Ap" "B" "Ap" "Ab" ...
##   .. ..$ p1        : num [1:14] 6.96 5.95 5.24 5.07 5.07 ...
##   .. ..$ soil_color: chr [1:14] "#5E4423" "#815A21" "#3D2F21" "#5E4423" ...
##   .. ..$ label     : chr [1:14] "Ap - #5E4423" "B - #815A21" "Ap - #3D2F21" "Ab - #5E4423" ...
##   ..@ site      :'data.frame':   4 obs. of  1 variable:
##   .. ..$ id: chr [1:4] "Pole 13" "Pole 14" "Pole 22" "Pole 37"
##   ..@ sp        :Formal class 'SpatialPoints' [package "sp"] with 3 slots
##   .. .. ..@ coords     : num [1, 1] 0
##   .. .. ..@ bbox       : logi [1, 1] NA
##   .. .. ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
##   .. .. .. .. ..@ projargs: chr NA
##   ..@ diagnostic:'data.frame':   0 obs. of  0 variables
class(dat)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
summary(dat)
##                Length                 Class                  Mode 
##                     4 SoilProfileCollection                    S4

Plotting

What’s left to do is plot! The plot function takes the “SoilProfileCollection” object called dat and uses the “name” field in the data to assign soil horizons… simple as that!

plot(dat, name = 'label')
Basic profile with correct Munsell colors

In RStudio this can be saved to a PDF from the plot window, but in base R you can use the following:

# this will save a pdf called "profiles" to your working directory 
# after you run "dev.off()"
pdf("profiles.pdf")
plot(dat, name = 'label')
dev.off()
## pdf 
##   2

Adding Munsell colors from field notation (e.g. “10YR3/2”)

# lets make up some example data.  Put the Hue, Value, and Chroma into 
# separate columns
# this should probably be included in the original CSV
hue <- c("10YR", "10YR", "5YR", "2.5YR", "2.5YR")
valu <- c(3, 5, 4, 6, 2)
chroma <- c(2, 4, 4, 2, 3)
# We join these together into a data.frame called munsell 
# note "stringsAsFactors = FALSE"
munsell <- data.frame(hue, valu, chroma, stringsAsFactors = FALSE)
print(munsell)
##     hue valu chroma
## 1  10YR    3      2
## 2  10YR    5      4
## 3   5YR    4      4
## 4 2.5YR    6      2
## 5 2.5YR    2      3
# now use the "munsell2rgb" function to convert by giving it the three column 
# of hue, value, chroma.  This is added as a field called "color" to the munsell data.frame
munsell$color <- munsell2rgb(munsell$hue, munsell$valu, munsell$chroma)

#see!
print(munsell)
##     hue valu chroma     color
## 1  10YR    3      2 #554636FF
## 2  10YR    5      4 #947650FF
## 3   5YR    4      4 #805840FF
## 4 2.5YR    6      2 #A79086FF
## 5 2.5YR    2      3 #482A22FF
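If your field notes record the color as a single string like “10YR3/2”, you would first need to split it into those three columns. A minimal sketch using base R regular expressions, not part of the original workflow:

# hypothetical field notation strings
notes <- c("10YR3/2", "10YR5/4", "5YR4/4", "2.5YR6/2")

# split "10YR3/2" into hue ("10YR"), value (3), and chroma (2)
hue    <- sub("^([0-9.]+[A-Z]+).*$", "\\1", notes)
valu   <- as.numeric(sub("^[0-9.]+[A-Z]+([0-9]+)/.*$", "\\1", notes))
chroma <- as.numeric(sub("^.*/([0-9]+)$", "\\1", notes))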

Artifact Counts, fractions, and depth of cultural material

Reloading the original csv data here to add some fake artifact counts to it

# reload data
dat <- read.csv(paste(c(getwd(), "/data/Soil_Profiles.csv"), collapse=''), 
                stringsAsFactors = FALSE)[,-1]
# add field for count of jasper flakes
dat$jasper <- sample(1:20, nrow(dat), replace=TRUE)
# a little function to compute the proportion of flakes in each horizon 
# (e.g 50% in Ap)
dat$jasperpcnt <- unlist(sapply(split(dat, f = dat$id), 
                  function(x) (x$jasper/sum(x$jasper)*100), simplify=TRUE))
# add a label field that has the horizon and jasper flake count
dat <- within(dat, jasperlabel <- paste(name, jasper, sep=" - "))

# see the data
print(dat)
##         id top bottom name    p1 soil_color jasper jasperpcnt jasperlabel
## 1  Pole 13   0     19   Ap 6.963    #5E4423      3     42.857      Ap - 3
## 2  Pole 13  19     55    B 5.951    #815A21      4     57.143       B - 4
## 3  Pole 14   0     24   Ap 5.245    #3D2F21     18     42.857     Ap - 18
## 4  Pole 14  24     45   Ab 5.069    #5E4423      5     11.905      Ab - 5
## 5  Pole 14  45     50    E 5.069    #BEAD97      7     16.667       E - 7
## 6  Pole 14  50     60    B 5.338    #B88E50     12     28.571      B - 12
## 7  Pole 22   0     19   Ap 6.454    #7A5C36      3      6.977      Ap - 3
## 8  Pole 22  19     30   Ab 5.844    #554636     20     46.512     Ab - 20
## 9  Pole 22  30     38    B 5.437    #94764F     13     30.233      B - 13
## 10 Pole 22  38     50    C 5.181    #9D7337      7     16.279       C - 7
## 11 Pole 37   0     31   Ap 5.019    #7A5C36     20     38.462     Ap - 20
## 12 Pole 37  31     63  Ap1 5.283    #554636      2      3.846     Ap1 - 2
## 13 Pole 37  63     90  Ap2 6.440    #5E4423     18     34.615    Ap2 - 18
## 14 Pole 37  90    117    B 6.670    #94764F     12     23.077      B - 12
# convert to apq object 
depths(dat) <- id ~ top + bottom

# plot with jasper quantity as color and lable with counts
plot(dat, name = "jasperlabel", color = "jasper")

# add overlay symbolizing the percent of jasper in each horizon
addVolumeFraction(dat, "jasperpcnt")
Color by artifact densities
# add in brackets to show depth of cultural material (fake in this example case)
fake.tops <- rep(0, times=length(dat))
fake.bottoms <- runif(n=length(dat), min=0, max=20)
par(mar=c(0,0,3,0)) # tighter figure margins
plot(dat, name = "jasperlabel", color = "soil_color")
addVolumeFraction(dat, "jasperpcnt")
addBracket(fake.tops, fake.bottoms, col='red') # add depth brackets

Showing the inclusion of artifact densities and depth brackets.