Spark Column expressions

up9lanfz · published 2021-05-27 in Spark
Follow (0) | Answers (1) | Views (283)

I came across the expression below and I think I know what it means: department("name") is used to reference the column named "name" — I hope that is right? What I'd like to know is what it actually resolves to, since it looks like a call to an auxiliary constructor. Please share your thoughts.
From https://spark.apache.org/docs/2.4.5/api/java/index.html?org/apache/spark/sql/dataframewriter.html:

// To create Dataset[Row] using SparkSession
   val people = spark.read.parquet("...")
   val department = spark.read.parquet("...")

   people.filter("age > 30")
     .join(department, people("deptId") === department("id"))
     .groupBy(department("name"), people("gender"))
     .agg(avg(people("salary")), max(people("age")))

ycl3bljg1#

department("name") is just syntactic sugar for calling the apply method: department.apply("name"). It returns a Column. From the Spark API, on the Dataset class:

/**
   * Selects column based on the column name and returns it as a [[Column]].
   *
   * @note The column name can also reference to a nested column like `a.b`.
   *
   * @group untypedrel
   * @since 2.0.0
   */
  def apply(colName: String): Column = col(colName)
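
To illustrate, here is a minimal sketch of the equivalence. It assumes a local SparkSession and a small in-memory department DataFrame in place of the parquet files from the question:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: running locally; the question reads parquet files instead.
val spark = SparkSession.builder().master("local[*]").appName("apply-demo").getOrCreate()
import spark.implicits._

val department = Seq((1, "Sales"), (2, "HR")).toDF("id", "name")

// All three forms resolve to the same Column:
val c1 = department("name")        // sugar for department.apply("name")
val c2 = department.apply("name")  // the explicit apply call
val c3 = department.col("name")    // what apply delegates to internally

department.select(c1).show()
```

So in the joined example above, department("name") and people("deptId") are not constructor calls at all; they are Column objects produced by apply, which is why they can be combined with operators like === in join conditions.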
