R: Why does adding 1 actually add 2 in sparklyr when using dynamic variable names?

Posted by tsm1rwdh on 2021-07-13 in Spark


When I run the code below, I expect the Sepal_Width_2 column to hold Sepal_Width + 1, but it actually holds Sepal_Width + 2. What gives?

require(dplyr)
require(sparklyr)

Sys.setenv(SPARK_HOME='/usr/lib/spark')
sc <- spark_connect(master="yarn")

# for this example these variables are hard coded
# but in my actual code these are named dynamically
sw_name <- as.name('Sepal_Width')
sw2 <- "Sepal_Width_2"
sw2_name <- as.name(sw2)

ir <- copy_to(sc, iris)

print(head(ir %>% mutate(!!sw2 := sw_name))) # so far so good

# Source: spark<?> [?? x 6]
# Sepal_Length Sepal_Width Petal_Length Petal_Width Species Sepal_Width_2
# <dbl>       <dbl>        <dbl>       <dbl> <chr>           <dbl>
# 5.1         3.5          1.4         0.2 setosa            3.5
# 4.9         3            1.4         0.2 setosa            3
# 4.7         3.2          1.3         0.2 setosa            3.2
# 4.6         3.1          1.5         0.2 setosa            3.1
# 5           3.6          1.4         0.2 setosa            3.6
# 5.4         3.9          1.7         0.4 setosa            3.9

print(head(ir %>% mutate(!!sw2 := sw_name) %>% mutate(!!sw2 := sw2_name + 1))) # i guess 2+2 != 4?

# Source: spark<?> [?? x 6]
# Sepal_Length Sepal_Width Petal_Length Petal_Width Species Sepal_Width_2
# <dbl>       <dbl>        <dbl>       <dbl> <chr>           <dbl>
# 5.1         3.5          1.4         0.2 setosa            5.5
# 4.9         3            1.4         0.2 setosa            5
# 4.7         3.2          1.3         0.2 setosa            5.2
# 4.6         3.1          1.5         0.2 setosa            5.1
# 5           3.6          1.4         0.2 setosa            5.6
# 5.4         3.9          1.7         0.4 setosa            5.9

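My use case requires the dynamic variable naming shown above. In this example it is rather silly (compared with using the variables directly), but in my real code I run the same function across hundreds of different Spark tables. They all share the same "schema" in terms of the number of columns and what each column is (the outputs of some machine learning models), but the names differ because each table contains the output of a different model. The names are predictable, but since they vary, I construct them dynamically as you see here rather than hard-coding them.
Spark seems to know how to add 2 and 2 when the names are hard-coded, but it suddenly breaks when the names are dynamic.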

apache-spark r sparklyr dplyr

Source: https://stackoverflow.com/questions/66162709/why-does-adding-by-1-actually-add-by-2-in-sparklyr-when-using-dynamic-variable-n

1 answer


8hhllhi21#

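You are probably misusing as.name, which is leading sparklyr to misinterpret your input.
Note that your code raises an error when it works with a local table only: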

sw_name <- as.name('Sepal.Width') # swap "_" to "." to match variable names
sw2 <- "Sepal_Width_2"
sw2_name <- as.name(sw2)
data(iris)

print(head(iris %>% mutate(!!sw2 := sw_name)))

# Error: Problem with `mutate()` input `Sepal_Width_2`.
# x object 'Sepal.Width' not found
# i Input `Sepal_Width_2` is `sw_name`.

Note that you are using the !! operator from rlang alongside as.name from base R, but you are not using them together as demonstrated in this question.
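To illustrate what "using them together" means, here is a minimal local sketch, assuming only that dplyr and rlang are installed. It shows that as.name and rlang's sym build the same symbol from a string, and that putting !! in front of the symbol is what makes mutate read it as a column reference. The object names sym_base, sym_rlang and res are illustrative and do not come from the question.

```r
library(dplyr)
library(rlang)

data(iris)

sw2 <- "Sepal_Width_2"
sym_base  <- as.name("Sepal.Width")  # symbol built with base R
sym_rlang <- sym("Sepal.Width")      # symbol built with rlang from the same string

# Compare the two symbols; they are expected to be the same object
identical(sym_base, sym_rlang)

# Unquoting with !! injects the symbol into the expression,
# so mutate() treats Sepal.Width as a column of the data
res <- iris %>% mutate(!!sw2 := !!sym_base)
head(res)
```

This runs against a local data frame only; it has not been checked against a Spark connection.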
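I recommend using sym and !! from the rlang package instead of as.name, and applying both to the character string that holds the column name. The following works locally and is consistent with the non-standard evaluation guidance, so it should translate to Spark: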

library(dplyr)
data(iris)

sw <- 'Sepal.Width'
sw2 <- paste0(sw, "_2")

head(iris %>% mutate(!!sym(sw2) := !!sym(sw)))
head(iris %>% mutate(!!sym(sw2) := !!sym(sw)) %>% mutate(!!sym(sw2) := !!sym(sw2) + 1))
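Since the question describes running the same logic over many tables whose column names differ, the pattern above could be wrapped in a function that takes the column name as a string. This is only a sketch under stated assumptions: the helper name add_shifted_copy and its increment argument are hypothetical, and it has been exercised on a local data frame only, not on a Spark table.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: copy column `col` into "<col>_2",
# then add `increment` to the copy
add_shifted_copy <- function(tbl, col, increment = 1) {
  new_col <- paste0(col, "_2")
  tbl %>%
    mutate(!!sym(new_col) := !!sym(col)) %>%
    mutate(!!sym(new_col) := !!sym(new_col) + increment)
}

data(iris)
out <- add_shifted_copy(iris, "Sepal.Width")
head(out[, c("Sepal.Width", "Sepal.Width_2")])
```

Passing the name as a plain string keeps all symbol construction inside the function, so callers that loop over many tables only need to supply each table and its column name.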

