我们可以重新排序sparkDataframe的列吗?

ltqd579y  于 2021-05-29  发布在  Spark
关注(0)|答案(2)|浏览(315)

我按照给定的模式创建dataframe,然后我想通过对现有的dataframe重新排序来创建新的dataframe。sparkDataframe中的列可以重新排序吗?

object Demo extends Context {

  def main(args: Array[String]): Unit = {
    val emp = Seq((1,"Smith",-1,"2018","10","M",3000),
      (2,"Rose",1,"2010","20","M",4000),
      (3,"Williams",1,"2010","10","M",1000),
      (4,"Jones",2,"2005","10","F",2000),
      (5,"Brown",2,"2010","40","",-1),
      (6,"Brown",2,"2010","50","",-1)
    )
    val empColumns = Seq("emp_id","name","superior_emp_id","year_joined",
      "emp_dept_id","gender","salary")
    import sparkSession.sqlContext.implicits._
    val empDF = emp.toDF(empColumns: _*)
    empDF.show(false)
  }
}

Current DF:
+------+--------+---------------+-----------+-----------+------+------+
|emp_id|name    |superior_emp_id|year_joined|emp_dept_id|gender|salary|
+------+--------+---------------+-----------+-----------+------+------+
|1     |Smith   |-1             |2018       |10         |M     |3000  |
|2     |Rose    |1              |2010       |20         |M     |4000  |
|3     |Williams|1              |2010       |10         |M     |1000  |
|4     |Jones   |2              |2005       |10         |F     |2000  |
|5     |Brown   |2              |2010       |40         |      |-1    |
|6     |Brown   |2              |2010       |50         |      |-1    |
+------+--------+---------------+-----------+-----------+------+------+

I want output as this following df, where gender and salary column re-ordered
New DF:
+------+--------+------+------+---------------+-----------+-----------+
|emp_id|name    |gender|salary|superior_emp_id|year_joined|emp_dept_id|
+------+--------+------+------+---------------+-----------+-----------+
|1     |Smith   |M     |3000  |-1             |2018       |10         |
|2     |Rose    |M     |4000  |1              |2010       |20         |
|3     |Williams|M     |1000  |1              |2010       |10         |
|4     |Jones   |F     |2000  |2              |2005       |10         |
|5     |Brown   |      |-1    |2              |2010       |40         |
|6     |Brown   |      |-1    |2              |2010       |50         |
+------+--------+------+------+---------------+-----------+-----------+
yws3nbqq

yws3nbqq1#

只是使用 select() 要重新排列列,请执行以下操作:

df = df.select('emp_id','name','gender','salary','superior_emp_id','year_joined','emp_dept_id')

它将根据您的订单显示在 select() 争论。

ijxebb2r

ijxebb2r2#

这是一种方法

//Order the column names as you want
    val columns = Array("emp_id","name","gender","salary","superior_emp_id","year_joined","emp_dept_id")
        .map(col)

    //Pass it to select
    df.select(columns: _*)

相关问题