Change string in DF using hive command and mutate with sparklyr

Posted by 6uxekuva on 2021-06-26 in Hive


Using the Hive command regexp_extract, I am trying to change the following strings from:

201703170455 to 2017-03-17:04:55

and from:

2017031704555675 to 2017-03-17:04:55.0010

I am doing this in sparklyr, trying to reuse code that works with gsub in R:

newdf<-df%>%mutate(Time1 = regexp_extract(Time, "(....)(..)(..)(..)(..)", "\\1-\\2-\\3:\\4:\\5"))

and this code:

newdf<-df%>mutate(TimeTrans = regexp_extract("(....)(..)(..)(..)(..)(....)", "\\1-\\2-\\3:\\4:\\5.\\6"))

but it does not work at all. Any suggestions on how to use regexp_extract?

Hive apache-spark r sparklyr gsub

Source: https://stackoverflow.com/questions/44332886/change-string-in-df-using-hive-command-and-mutate-with-sparklyr

1 Answer


r6hnlfcb1#

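Apache Spark uses the Java regular expression dialect, not R's, so groups in the replacement string should be referenced with $ (for example $1) rather than \\1. Furthermore, regexp_extract is used to extract only a single group, selected by its numeric index.
You can use regexp_replace instead: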

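library(sparklyr)
library(dplyr)

# Connect to Spark; the output below came from a local[8] master
sc <- spark_connect(master = "local[8]")

df <- data.frame(time = c("201703170455", "2017031704555675"))
sdf <- copy_to(sc, df)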

sdf %>% 
  mutate(time1 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)$", "$1-$2-$3 $4:$5" )) %>%
  mutate(time2 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)(....)$", "$1-$2-$3 $4:$5.$6"))

Source:   query [2 x 3]
Database: spark connection master=local[8] app=sparklyr local=TRUE

# A tibble: 2 x 3
              time            time1                 time2
             <chr>            <chr>                 <chr>
1     201703170455 2017-03-17 04:55          201703170455
2 2017031704555675 2017031704555675 2017-03-17 04:55.5675
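
For comparison, here is a minimal sketch of what regexp_extract is meant for: pulling out one capture group at a time by its index. It assumes a local Spark installation reachable by sparklyr. The column names year, month and day are illustrative and not part of the question. The index is written as an integer literal (1L) so that it reaches Spark SQL as an integer.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally and sparklyr can start it.
sc <- spark_connect(master = "local")

df <- data.frame(time = c("201703170455", "2017031704555675"))
sdf <- copy_to(sc, df, overwrite = TRUE)

# regexp_extract is passed through to Spark SQL untranslated;
# the third argument selects which capture group to return.
sdf %>%
  mutate(
    year  = regexp_extract(time, "^(....)(..)(..)", 1L),  # first group
    month = regexp_extract(time, "^(....)(..)(..)", 2L),  # second group
    day   = regexp_extract(time, "^(....)(..)(..)", 3L)   # third group
  )
```

Since each call returns only one group, building the full timestamp this way would require concatenating several extracted columns, which is why regexp_replace with $1-style references is the shorter route here.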
