How to extract the year from a date column of a DataFrame using pyspark

Posted by 0kjbasz6 on 2021-07-14 in Spark

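I recently started using pyspark, and I am looking for different ways to extract the year from the date_added column of a DataFrame and store it in a new column named year in the same DataFrame.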

+-------+-------+-----+---------+-----------------+
|show_id|   type|title|  country|       date_added|
+-------+-------+-----+---------+-----------------+
|     s1|TV Show|   3%|   Brazil|  August 14, 2020|
|     s2|  Movie| 7:19|   Mexico|December 23, 2016|
|     s3|  Movie|23:59|Singapore|December 20, 2018|
+-------+-------+-----+---------+-----------------+

Answer #1 (nwsw7zdq)

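You can use substring with a negative start position, which takes the last four characters of the string. This works here because every date_added value ends with a four-digit year: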

import pyspark.sql.functions as F

df2 = df.withColumn('year', F.expr('substring(date_added, -4)'))
df2.show()
+-------+-------+-----+---------+-----------------+----+
|show_id|   type|title|  country|       date_added|year|
+-------+-------+-----+---------+-----------------+----+
|     s1|TV Show|   3%|   Brazil|  August 14, 2020|2020|
|     s2|  Movie| 7:19|   Mexico|December 23, 2016|2016|
|     s3|  Movie|23:59|Singapore|December 20, 2018|2018|
+-------+-------+-----+---------+-----------------+----+
