Building a ggplot2 Step by Step
I have included this viz on my blog before, as an afterthought to a more complex viz of the same data. However, I was splitting out the steps to the plot for another purpose and thought it would be worthwhile to post this as a step-by-step how-to. The post below steps through the making of this plot using R and ggplot2 (the dev version from GitHub). Each code chunk and accompanying image adds a little bit to the plot on its way to the final plot, depicted here. Hopefully this can help or inspire someone to take their plot beyond the basics.
Step 1 is to load the proper libraries, which include ggplot2 (of course), ggalt for the handy ggsave function, and extrafont for the use of a range of fonts in your plot. I did this on a MacBook Pro and using the extra fonts was pretty easy. I have done the same on a Windows machine and it was more of a pain. Google around and it should work out.
library("ggplot2")   # dev version
library("ggalt")     # dev version, for ggsave()
library("extrafont")
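If the fonts do not show up on your machine, the setup usually looks something like the sketch below. This is not from my original workflow notes, just a minimal outline that assumes extrafont is installed and that your system has TrueType fonts it can find. The import step can take several minutes.

```r
library("extrafont")

# One-time setup: scan the system font folders and build extrafont's
# font database. prompt = FALSE skips the interactive confirmation.
font_import(prompt = FALSE)

# Register the imported fonts with the PDF device. This is only needed in
# the session where font_import() was run; later sessions register them
# when the package is loaded.
loadfonts(device = "pdf", quiet = TRUE)

# On Windows, the on-screen and bitmap devices need their own registration.
if (.Platform$OS.type == "windows") {
  loadfonts(device = "win", quiet = TRUE)
}

# List the first few font family names that plots can now refer to.
head(fonts())
```

Any family name returned by fonts() can then be used wherever a plot asks for a font family.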
Create the data… I included R code at the end of the post to recreate the data.frame that is used in the plots below. You will have to run that code block to make an object called dat so that the rest of this code will work.
# run the code block for making the 'dat' data.frame.
# code is located at bottom of this post
The whole point of making this plot was to show a trend in population change over time and over geography. The hypothesis put forth by the author of this study (Zubrow, 1974) is that over time population decreased in the east and migrated to the west. The other blog post I did covers all of this in more detail.
If one were starting out in R, they may go straight to the base plot function to plot population over time as shown below. [To be fair, there are many expert R users who will go straight to the base plot function and make fantastic plots. I am just not one of them. That is a battle I will leave to the pros.]
# base R line plot of population over year
plot(dat$Population ~ dat$Year, type = "l")
That result is underwhelming and not very informative. A jumble of tightly spaced lines doesn't tell us much. Further, this is plotting data across all of the 19 sites we have data on as a single line. Instead of pursuing this further in base plot, I will move to ggplot2. ggplot2 is built on the Grammar of Graphics model; check out a quick intro here. The basic idea is to plot by layers using data and geometries intuitively. To me, plotting in ggplot2 is much like building a GIS map. Data, layers, and geometries: it all sounds pretty cartographic to me.
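To make the layering idea concrete, here is a minimal sketch. It uses a tiny made-up data frame called toy, not the dat object from this post, and the Site column name is an assumption for illustration only. Year and Population mirror the columns used in the base plot call.

```r
library("ggplot2")

# Toy data for illustration only; these values are made up and are NOT the
# Zubrow data in 'dat'. The 'Site' column name is an assumption.
toy <- data.frame(
  Site       = rep(c("A", "B"), each = 3),
  Year       = rep(c(1000, 1100, 1200), times = 2),
  Population = c(50, 40, 20, 10, 30, 60)
)

# Start with the data and the mapping of columns to x, y, and grouping.
# Nothing is drawn yet; this is just the empty "map frame".
p <- ggplot(data = toy, aes(x = Year, y = Population, group = Site))

# Add a geometry layer: one line per site instead of one tangled line.
p <- p + geom_line()

print(p)
```

Grouping by site is what keeps each site's line separate, which is the piece the single base plot call was missing.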