PySpark: create a MapType from an ordered dictionary

z2acfund, posted 5 months ago in Spark

I'm trying to convert an ordered dict into a PySpark MapType:

from pyspark.sql.functions import create_map, lit
from pyspark.sql import SparkSession
from collections import OrderedDict

# Sample ordered dictionary
ordered_dict = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
create_map([lit(k) for k in ordered_dict.keys()], [lit(v) for v in ordered_dict.values()])

This gives the following error:

TypeError: Invalid argument, not a string or column: [Column<'a'>, Column<'b'>, Column<'c'>] of type <class 'list'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.


This is on Spark 3.2. Any suggestions for fixing this would be greatly appreciated. Thanks.


bsxbgnwa1#

F.create_map expects a flat sequence of alternating keys and values:

from pyspark.sql import SparkSession, functions as F
from collections import OrderedDict
from itertools import chain

spark = SparkSession.builder.getOrCreate()

ordered_dict = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
# Flatten the dict into [k1, v1, k2, v2, ...], the form create_map expects
kv_flat = list(chain.from_iterable(ordered_dict.items()))

map_col = F.create_map([F.lit(e) for e in kv_flat]).alias('map_col')
df = spark.range(1).select(map_col)
df.printSchema()
df.show(1, False)

# root
#  |-- map_col: map (nullable = false)
#  |    |-- key: string
#  |    |-- value: integer (valueContainsNull = false)

# +------------------------+
# |map_col                 |
# +------------------------+
# |{a -> 1, b -> 2, c -> 3}|
# +------------------------+
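
Once the map column is built, individual values can be read back with bracket indexing; a minimal sketch continuing from the df above ('b' is just an example key):

# Look up one entry of the map column by key
df.select(F.col('map_col')['b'].alias('b_value')).show()

# +-------+
# |b_value|
# +-------+
# |      2|
# +-------+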



qacovj5a2#

Does the following help?

from pyspark.sql.functions import create_map, lit
from itertools import chain

simple_dict = {"a": 1, "b": 2, "c": 3}
# chain(*simple_dict.items()) yields k1, v1, k2, v2, ..., the flat form create_map expects
mapping_expr = create_map([lit(x) for x in chain(*simple_dict.items())])
print(type(mapping_expr))  # <class 'pyspark.sql.column.Column'>
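
A common use of such a map expression is translating the values of an existing column via bracket indexing; a minimal sketch (the sample DataFrame and its letter column are made up for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",), ("b",), ("x",)], ["letter"])
# Keys present in the dict map to their values; missing keys yield null
df.withColumn("number", mapping_expr[df["letter"]]).show()

# +------+------+
# |letter|number|
# +------+------+
# |     a|     1|
# |     b|     2|
# |     x|  null|
# +------+------+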

