ggplot_build() again

May 6, 2018    ggplot2

I was playing around with extracting data from ggplot graphs, which I have done previously using ggplot2_build(). This time it was getting some of the modelling results that are plotted using geom_smooth(). Take the quick example below.

library(ggplot2)
p = ggplot(cars, aes(x = speed, y = dist)) +
geom_point() +
geom_smooth(method = "loess")
p

The plot has two layers and so ggplot_build(p)$data will return a list of two data frames. If we are interested in the results from geom_smooth() we need the second one as geom_smooth() is the second layer. head(ggplot_build(p)$data[[2]], n = 10)
          x         y        ymin     ymax       se PANEL group  colour   fill
1  4.000000  5.893628 -14.0214308 25.80869 9.885466     1    -1 #3366FF grey60
2  4.265823  6.369796 -12.1702114 24.90980 9.202917     1    -1 #3366FF grey60
3  4.531646  6.867702 -10.3935937 24.12900 8.568188     1    -1 #3366FF grey60
4  4.797468  7.387181  -8.6930267 23.46739 7.981917     1    -1 #3366FF grey60
5  5.063291  7.928070  -7.0698299 22.92597 7.444680     1    -1 #3366FF grey60
6  5.329114  8.490205  -5.5250240 22.50543 6.956900     1    -1 #3366FF grey60
7  5.594937  9.073423  -4.0591073 22.20595 6.518745     1    -1 #3366FF grey60
8  5.860759  9.677561  -2.6717841 22.02691 6.129986     1    -1 #3366FF grey60
9  6.126582 10.302454  -1.3616644 21.96657 5.789853     1    -1 #3366FF grey60
10 6.392405 10.947940  -0.1259761 22.02186 5.496887     1    -1 #3366FF grey60
size linetype weight alpha
1     1        1      1   0.4
2     1        1      1   0.4
3     1        1      1   0.4
4     1        1      1   0.4
5     1        1      1   0.4
6     1        1      1   0.4
7     1        1      1   0.4
8     1        1      1   0.4
9     1        1      1   0.4
10    1        1      1   0.4

The y values are the results from the loess model. So far, so good but I wanted to check if these values were the same as the results from direct calculation. Using base R functions give us the following by default.

loess_results = predict(loess(dist~speed,cars), cars$speed) head(loess_results) [1] 5.893628 5.893628 12.499786 12.499786 15.281082 18.446568 These results don’t match but a quick look at the documentation shows that geom_smooth() defaults to using n = 80 points and these are in equal steps from the minimum to the maximum values of speed. Providing the same sequence of speed values # recalculate the values loess_results = predict(loess(dist~speed,cars), seq(min(cars$speed), max(cars$speed), length.out = 80)) produces the same results as geom_smooth(), so we don’t need to do both. We can double check just to be sure. # are they the same as those from ggplot? identical(loess_results, ggplot_build(p)$data[[2]]\$y)
[1] TRUE