How do I rename the columns of a table built with VALUES in Spark SQL?

xam8gpfp  posted on 2021-05-27  in Spark
Follow (0) | Answers (1) | Views (295)

Given a table built with Spark SQL (2.4.*) like this:

scala> spark.sql("with some_data (values ('A',1),('B',2)) select * from some_data").show()
+----+----+
|col1|col2|
+----+----+
|   A|   1|
|   B|   2|
+----+----+

I can't set the column names (they just get the defaults, col1 and col2). Is there a way to rename these columns to label and value?


vfh0ocws  1#

Modify the query to -

spark.sql("with some_data (values ('A',1),('B',2) T(label, value)) select * from some_data").show()

    /**
      * +-----+-----+
      * |label|value|
      * +-----+-----+
      * |    A|    1|
      * |    B|    2|
      * +-----+-----+
      */
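An alternative sketch (not part of the original answer, but using only standard Spark SQL): keep the CTE and alias the columns in an explicit select instead of relying on T(); the names label and value are the ones asked for in the question.

spark.sql("with some_data as (select col1 as label, col2 as value from values ('A',1),('B',2)) select * from some_data").show()
// expected to print the same label/value table as shown above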

Or use this example as a reference -

val df = spark.sql(
      """
        |select Class_Name, Customer, Date_Time, Median_Percentage
        |from values
        |   ('ClassA', 'A', '6/13/20', 64550),
        |   ('ClassA', 'B', '6/6/20', 40200),
        |   ('ClassB', 'F', '6/20/20', 26800),
        |   ('ClassB', 'G', '6/20/20', 18100)
        |  T(Class_Name, Customer, Date_Time, Median_Percentage)
      """.stripMargin)
    df.show(false)
    df.printSchema()

    /**
      * +----------+--------+---------+-----------------+
      * |Class_Name|Customer|Date_Time|Median_Percentage|
      * +----------+--------+---------+-----------------+
      * |ClassA    |A       |6/13/20  |64550            |
      * |ClassA    |B       |6/6/20   |40200            |
      * |ClassB    |F       |6/20/20  |26800            |
      * |ClassB    |G       |6/20/20  |18100            |
      * +----------+--------+---------+-----------------+
      *
      * root
      * |-- Class_Name: string (nullable = false)
      * |-- Customer: string (nullable = false)
      * |-- Date_Time: string (nullable = false)
      * |-- Median_Percentage: integer (nullable = false)
      */

Note that T(Class_Name, Customer, Date_Time, Median_Percentage) supplies the column names as needed.
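If changing the SQL itself is not an option, the renaming can also be done afterwards through the DataFrame API. A minimal sketch, assuming the original query from the question; toDF and withColumnRenamed are standard Dataset methods:

val someData = spark.sql("with some_data (values ('A',1),('B',2)) select * from some_data")

// toDF replaces all column names positionally
someData.toDF("label", "value").show()

// withColumnRenamed renames one column at a time
someData.withColumnRenamed("col1", "label").withColumnRenamed("col2", "value").show()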
