I’m a little late to the party on this, but I thought I would add a post about it because it relates to my teaching and I was slightly surprised by the conclusion I came to. The topic is piping, and the strengths and weaknesses of the `magrittr` pipe `%>%` versus the (relatively) new base pipe `|>`.

The background to this post is that I was making a general move towards switching from `%>%` to `|>`. That is, when I started writing some brand new code in a brand new RStudio project (and git repo), I changed to using `|>`. Some of this new code generated a weird error and, on fixing it, I ended up doing a deeper dive into the base pipe, which led to this post.

Before I get too much further I would just add that I am a big fan of the pipe and piped operations. In my opinion it generates far more readable code than base `R` (prior to the pipe). I personally use it all the time and introduce it to my students as soon as I think is reasonably sensible.

Anyway, onto the discussion and the secondary reason for the post: to give me something to refer back to when I forget the specifics. I’ve used the magrittr pipe `%>%` for ages, and the code below (from the old version of the `map()` help page) is a good example of where I might use it.

```
library(tidyverse)
mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")
```

```
        4         6         8
0.5086326 0.4645102 0.4229655
```

On digging into the base pipe, which I was using as part of a `map()` call, I used `?map` and got the updated version of the above code.

```
mtcars |>
  group_split(mtcars$cyl) |>
  map(\(df) lm(mpg ~ wt, data = df)) |>
  map(summary) |>
  map_dbl("r.squared")
```

`[1] 0.5086326 0.4645102 0.4229655`

These two pieces of code look reasonably similar, but you’ll notice the lack of `.` notation in the second. This led me to read about `|>` and the differences. In turn I found this excellent blog post that provided better examples than I found in other documentation. For me the crux of the difference between `|>` and `%>%` is that:

- the base pipe only ever passes the left-hand side of the pipe into the first argument of the right-hand side;
- and (because of this) we can no longer use the `.` notation.
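The two points above can be seen in a minimal sketch. It assumes R 4.1 or later and that `magrittr` is installed; the numbers and variable names are made up purely for illustration.

```r
library(magrittr)  # provides %>%

# Base pipe: the left-hand side always becomes the FIRST argument.
base_result <- 10 |> seq(2)      # same call as seq(10, 2)

# magrittr pipe: a `.` used as a direct argument marks where the
# left-hand side goes, so it is no longer inserted first.
dot_result <- 10 %>% seq(2, .)   # same call as seq(2, 10)

print(base_result)
print(dot_result)
```

The same value is piped in both cases; only the argument position it lands in differs.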

The anonymous function defined above as `\(df) lm(mpg ~ wt, data = df)` is a way around the issue with `.` notation. For the most part, the first point above (LHS into the first argument of the RHS) is rarely going to be an issue, particularly with `dplyr` functions. For example:

```
mtcars %>%
  group_by(cyl) %>%
  summarise(av_mpg = mean(mpg))
```

```
# A tibble: 3 × 2
    cyl av_mpg
  <dbl>  <dbl>
1     4   26.7
2     6   19.7
3     8   15.1
```

is exactly the same as

```
mtcars |>
  group_by(cyl) |>
  summarise(av_mpg = mean(mpg))
```

```
# A tibble: 3 × 2
    cyl av_mpg
  <dbl>  <dbl>
1     4   26.7
2     6   19.7
3     8   15.1
```

The problem for me, and the reason that I don’t think I’ll be switching to `|>` in the near future, is the `.` notation for when we don’t want to pipe into the first argument. I quite often find myself piping into a function where the data isn’t the first argument. To provide an example, let’s create a small dataset.

```
ex_df <- tibble(
  grp1 = rnorm(100, mean = 10, sd = 5),
  grp2 = rnorm(100, mean = 20, sd = 5)
)
head(ex_df)
```

```
# A tibble: 6 × 2
   grp1  grp2
  <dbl> <dbl>
1  1.63 14.3
2  5.09  8.97
3 24.2  19.6
4  4.13 31.0
5 10.2  27.0
6  7.96 32.1
```

Suppose we wanted to perform a t-test. We have a number of choices. The most direct is to pass the two columns as vectors:

`t.test(ex_df$grp1, ex_df$grp2)`

```
Welch Two Sample t-test
data: ex_df$grp1 and ex_df$grp2
t = -13.731, df = 197.69, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.784637 -8.824786
sample estimates:
mean of x mean of y
9.962203 20.266915
```

Alternatively we might want to change the data to long format, maybe add a cleaning step (which isn’t needed for this example) and run the test using the formula notation.

```
ex_df %>%
  pivot_longer(everything(), names_to = "group", values_to = "value") %>%
  # additional cleaning step ... %>%
  t.test(value ~ group, data = .)
```

```
Welch Two Sample t-test
data: value by group
t = -13.731, df = 197.69, p-value < 2.2e-16
alternative hypothesis: true difference in means between group grp1 and group grp2 is not equal to 0
95 percent confidence interval:
-11.784637 -8.824786
sample estimates:
mean in group grp1 mean in group grp2
9.962203 20.266915
```

Now if we try that with `|>`, we know in advance that it won’t work, but I’m doing it here to generate the error.

```
ex_df |>
  pivot_longer(everything(), names_to = "group", values_to = "value") |>
  # additional cleaning step ... |>
  t.test(value ~ group, data = .)
```

```
Error in `vec_c()`:
! Can't combine `group` <character> and `value` <double>.
```
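A small sketch of why this fails, assuming R 4.1 or later: the base pipe is rewritten when the code is parsed, so `quote()` can show the call that R actually sees. The name `df` is a hypothetical stand-in for the piped data and is never evaluated here.

```r
# quote() captures the parsed expression without running it.
expanded <- quote(df |> t.test(value ~ group, data = .))
print(expanded)

# The piped object has been inserted as the first argument,
# ahead of the formula; `.` is left untouched.
same_call <- identical(
  expanded,
  quote(t.test(df, value ~ group, data = .))
)
print(same_call)
```

The data ends up in the first argument and the formula is pushed to the second, while `.` is passed along as an ordinary symbol rather than acting as a placeholder.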

We can get around this if we really want, with an anonymous function.

```
ex_df |>
  pivot_longer(everything(), names_to = "group", values_to = "value") |>
  (\(df) t.test(value ~ group, data = df))()
```

```
Welch Two Sample t-test
data: value by group
t = -13.731, df = 197.69, p-value < 2.2e-16
alternative hypothesis: true difference in means between group grp1 and group grp2 is not equal to 0
95 percent confidence interval:
-11.784637 -8.824786
sample estimates:
mean in group grp1 mean in group grp2
9.962203 20.266915
```

I quite like the notation of the anonymous functions but I’m not going to go into detail on them here because others have already provided excellent explanations. The blog post I linked above is one of them and I encourage you to read that too.
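For completeness, here is a hedged sketch of one further option that the example above does not use. Assuming R 4.2.0 or later, the base pipe accepts `_` as a placeholder, but only when it is passed to a named argument. The toy data below (`toy_long`, the seed and the group sizes) are invented for illustration and are not the data used in this post.

```r
# Requires R >= 4.2.0 for the `_` placeholder.
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical long-format data built with base R only.
toy_long <- data.frame(
  group = rep(c("grp1", "grp2"), each = 30),
  value = c(rnorm(30, mean = 10, sd = 5), rnorm(30, mean = 20, sd = 5))
)

# `_` must be supplied to a NAMED argument (here `data =`).
via_placeholder <- toy_long |> t.test(value ~ group, data = _)

# The anonymous-function form, for comparison.
via_lambda <- toy_long |> (\(df) t.test(value ~ group, data = df))()

print(all.equal(via_placeholder$statistic, via_lambda$statistic))
```

Both forms run the same test; the placeholder is shorter, but it is more restricted than the `magrittr` `.` because it has to be tied to a named argument.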

The conclusion I came to after this little bit of reading was that I’ll stick to the magrittr pipe `%>%`, as I think the notation it leads to (including the `.` notation) is easier to read. The exception to this would be if I was writing a package. In that case I do try to limit the dependencies and it might be worthwhile using `|>` and anonymous functions.

As part of the reading I did, I also revisited the other pipes available from the `magrittr` package, and it was a really good reminder of what the options were. The ones I think I’ll end up using the most are the assignment pipe `%<>%` and what I’ll probably end up calling the *column* pipe (the exposition pipe) `%$%`.

The `%$%` pipe provides an alternative to the above `t.test()` notation as it allows us to access the columns directly.

```
library(magrittr)
ex_df %$%
  t.test(grp1, grp2)
```

```
Welch Two Sample t-test
data: grp1 and grp2
t = -13.731, df = 197.69, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.784637 -8.824786
sample estimates:
mean of x mean of y
9.962203 20.266915
```

The assignment pipe saves me some typing for things I do a lot by replacing

```
ex_df <- ex_df %>%
  pivot_longer(everything(), names_to = "group", values_to = "value")
```

with

```
ex_df %<>% pivot_longer(everything(), names_to = "group", values_to = "value")
head(ex_df)
```

```
# A tibble: 6 × 2
  group value
  <chr> <dbl>
1 grp1   1.63
2 grp2  14.3
3 grp1   5.09
4 grp2   8.97
5 grp1  24.2
6 grp2  19.6
```

If you are reading this and haven’t already, then I would really encourage you to read the blog post. I’m a big fan of the pipe and will continue to use it but, for my personal work, I think I’ll stick with the `magrittr` pipe rather than switch to the base pipe.