I have a scenario where the final DataFrame, the result of joining the stage and base tables, looks like this:
+-------------------+------------------------+---------------+-----------------------+------------+------------+------------+-----------------------+------------+------------+------------+
|ID_key |ICC_key |suff_key |stage_{timestamp} |stage_{code}|stage_{dol1}|stage_{dol2}|final_{timestamp} |final_{code}|final_{dol1}|final_{dol2}|
+-------------------+------------------------+---------------+-----------------------+------------+------------+------------+-----------------------+------------+------------+------------+
|222 |222 |1 |2019-02-02 21:50:25.585|9123 |20.00 |1000.00 |2019-03-02 21:50:25.585|7123 |30.00 |200.00 |
|333 |333 |1 |2020-03-03 21:50:25.585|8123 |30.00 |200.00 |2020-01-03 21:50:25.585|823 |30.00 |200.00 |
|444 |444 |1 |2020-04-03 21:50:25.585|8123 |30.00 |200.00 |null |null |null |null |
|555 |333 |1 |null |null |null |null |2020-05-03 21:50:25.585|813 |30.00 |200.00 |
|111 |111 |1 |2020-01-01 21:50:25.585|A123 |10.00 |99.00 |null |null |null |null |
+-------------------+------------------------+---------------+-----------------------+------------+------------+------------+-----------------------+------------+------------+------------+
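I am looking for logic that, for every row where final_{timestamp} > stage_{timestamp}, replaces the values of all columns whose names start with stage_ with null.
The expected output is shown below: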
+-------------------+------------------------+---------------+-----------------------+------------+------------+------------+-----------------------+------------+------------+------------+
|ID_key |ICC_key |suff_key |stage_{timestamp} |stage_{code}|stage_{dol1}|stage_{dol2}|final_{timestamp} |final_{code}|final_{dol1}|final_{dol2}|
+-------------------+------------------------+---------------+-----------------------+------------+------------+------------+-----------------------+------------+------------+------------+
|222 |222 |1 |null |null |null |null |2019-03-02 21:50:25.585|7123 |30.00 |200.00 |
|333 |333 |1 |2020-03-03 21:50:25.585|8123 |30.00 |200.00 |2020-01-03 21:50:25.585|823 |30.00 |200.00 |
|444 |444 |1 |2020-04-03 21:50:25.585|8123 |30.00 |200.00 |null |null |null |null |
|555 |333 |1 |null |null |null |null |2020-05-03 21:50:25.585|813 |30.00 |200.00 |
|111 |111 |1 |2020-01-01 21:50:25.585|A123 |10.00 |99.00 |null |null |null |null |
+-------------------+------------------------+---------------+-----------------------+------------+------------+------------+-----------------------+------------+------------+------------+
It would be great if you could help me with the logic.
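For reference, the rule described above can be modeled on a single row in plain Python. This is only a sketch of the logic, not Spark code: the function name `null_stage_if_final_newer`, the helper `ts`, and the dict-based row representation are assumptions made for illustration, and only a subset of the columns is shown. Rows where either timestamp is null are left unchanged, which matches rows 444, 555, and 111 in the expected output.

```python
from datetime import datetime
from typing import Any, Dict

Row = Dict[str, Any]


def null_stage_if_final_newer(row: Row) -> Row:
    """Return a copy of row with stage_* values set to None when final is newer."""
    stage_ts = row.get("stage_{timestamp}")
    final_ts = row.get("final_{timestamp}")
    # With a null on either side the comparison cannot be true, so the row is kept.
    if stage_ts is None or final_ts is None or final_ts <= stage_ts:
        return dict(row)
    # final is strictly newer: blank out every stage_* column, keep the rest.
    return {k: (None if k.startswith("stage_") else v) for k, v in row.items()}


def ts(text: str) -> datetime:
    """Parse a timestamp written as in the tables above (hypothetical helper)."""
    return datetime.fromisoformat(text)


rows = [
    {"ID_key": "222",
     "stage_{timestamp}": ts("2019-02-02 21:50:25.585"), "stage_{code}": "9123",
     "final_{timestamp}": ts("2019-03-02 21:50:25.585"), "final_{code}": "7123"},
    {"ID_key": "333",
     "stage_{timestamp}": ts("2020-03-03 21:50:25.585"), "stage_{code}": "8123",
     "final_{timestamp}": ts("2020-01-03 21:50:25.585"), "final_{code}": "823"},
    {"ID_key": "444",
     "stage_{timestamp}": ts("2020-04-03 21:50:25.585"), "stage_{code}": "8123",
     "final_{timestamp}": None, "final_{code}": None},
]

result = [null_stage_if_final_newer(r) for r in rows]
```

Only the first row (ID_key 222) has a final timestamp later than its stage timestamp, so it is the only one whose stage_ values are cleared.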
3 answers
ryhaxcpt #1
Check the code below.
Condition
Matched columns condition
Not matched columns
Combine not matched & matched columns
Final result
t9eec4r0 #2
The code is as follows:
2hh7jdfx #3
PySpark solution:
The result is: