Converting a string-type column to datetime in PySpark

xpcnnkqh  posted on 2021-05-19 in Spark

I have a column named Time in my Spark DataFrame. It is of string type, and I need to convert it to a datetime (timestamp) type. I tried the following:

data.select(unix_timestamp(data.Time, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias("timestamp"))

data.printSchema()

The output is:

root
 |-- Time: string (nullable = true)

If I save the result in a new DataFrame, I lose all the other columns.

ijxebb2r #1

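You can use withColumn instead of select:
```
from pyspark.sql.functions import unix_timestamp
from pyspark.sql.types import TimestampType

data = spark.createDataFrame([('1997/02/28 10:30:00',"test")], ['Time','Col_Test'])

df = data.withColumn("timestamp",unix_timestamp(data.Time, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()))
```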

Output:

df.show()
+-------------------+--------+-------------------+
|               Time|Col_Test|          timestamp|
+-------------------+--------+-------------------+
|1997/02/28 10:30:00|    test|1997-02-28 10:30:00|
+-------------------+--------+-------------------+

data.printSchema()
root
 |-- Time: string (nullable = true)
 |-- Col_Test: string (nullable = true)

df.printSchema()
root
 |-- Time: string (nullable = true)
 |-- Col_Test: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)
