sparkDataframe在加入后未正确分割

rfbsl7qr  于 2021-07-14  发布在  Spark
关注(0)|答案(1)|浏览(365)

我有3个sparkDataframe,想把它们加入baq列,并从division创建2个新列( dtfAvgEnd("avg(AAG)")/dtfAvgWeek("avg(AAG)" )以及 dtfAvgLong("avg(AAG)")/dtfAvgWeek("avg(AAG)") ).

scala> dtfAvgWeek.filter("BAQ='3310101041401034198668'").show(10,false)
+----------------------+-----------------+                                      ]
|BAQ                   |avg(AAG)         |
+----------------------+-----------------+
|3310101041401034198668|147.6660606060606|
+----------------------+-----------------+

scala> dtfAvgEnd.filter("BAQ='3310101041401034198668'").show(10,false)
+----------------------+------------------+                                     ]
|BAQ                   |avg(AAG)          |
+----------------------+------------------+
|3310101041401034198668|58.360833333333325|
+----------------------+------------------+

scala> dtfAvgLong.filter("BAQ='3310101041401034198668'").show(10,false)
+----------------------+------------------+                                     1]
|BAQ                   |avg(AAG)          |
+----------------------+------------------+
|3310101041401034198668|121.46857142857144|
+----------------------+------------------+

scala> val dtfRatiConsSing=dtfAvgWeek.
     |   filter("BAQ='3310101041401034198668'").
     |   join(dtfAvgEnd,Seq("BAQ"),"inner").
     |   join(dtfAvgLong,Seq("BAQ"),"inner").
     |   withColumn("Rati_End",dtfAvgEnd("avg(AAG)")/dtfAvgWeek("avg(AAG)")).
     |   withColumn("Rati_long",dtfAvgLong("avg(AAG)")/dtfAvgWeek("avg(AAG)"));
dtfRatiConsSing: org.apache.spark.sql.DataFrame = [BAQ: string, avg(AAG): double ... 4 more fields]

我得到了这个。所以rati d u很早就解决了,但最后没有。我不明白出了什么问题。

scala>   dtfRatiConsSing.
     |   show(20,false);
+----------------------+------------------+------------------+------------------+--------+------------------+
|BAQ                   |avg(AAG)          |avg(AAG)          |avg(AAG)          |Rati_End|Rati_long         |
+----------------------+------------------+------------------+------------------+--------+------------------+
|3310101041401034198668|147.66606060606063|58.360833333333346|121.46857142857142|1.0     |0.8225896386077629|
njthzxwz

njthzxwz1#

scala> dtfRatiCons.filter("BAQ='3310101041401034198668'").show(10,false);
+----------------------+------------------+-----------------+------------------+------------------+------------------+
|BAQ                   |AVGWeek           |AVGEnd           |AVGLong           |Rati_End          |Rati_long         |
+----------------------+------------------+-----------------+------------------+------------------+------------------+
|3310101041401034198668|147.66606060606063|58.36083333333334|121.46857142857142|0.3952217123813354|0.8225896386077629|
+----------------------+------------------+-----------------+------------------+------------------+------------------+

我重新命名了avg列,它成功了。

相关问题