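I am trying to perform spatial operations on data that has lat/long values, together with a static GeoJSON file. I need to load the GeoJSON and use an intersection test to find which location each lat/long row of the DataFrame falls in.
Source data: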
1111:150458,025.22826N,055.30022E,348,39,JOB_ONBOARD
2222:150448,025.22746N,055.29962E,32,48, CAR_AVAILABLE
3333,20072023:150612,025.30559N,055.38272E,130,50,CAR_AVAILABLE
4444,20072023:150740,025.21794N,055.28569E,0,0,JOB_ONBOARD
I tried to follow the Apache Sedona documentation, but without success.
Please point me in the right direction. Thank you.
// Imports assume the Sedona 1.4.x package layout.
import org.apache.sedona.core.serde.SedonaKryoRegistrator
import org.apache.sedona.sql.utils.SedonaSQLRegistrator
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark: SparkSession = SparkSession.builder()
  .appName("test")
  .config("spark.master", "local[*]")
  .config("spark.serializer", classOf[KryoSerializer].getName)
  .config("spark.kryo.registrator", classOf[SedonaKryoRegistrator].getName)
  .getOrCreate()
SedonaSQLRegistrator.registerAll(spark) // Register the ST_* SQL functions.
val inputLocation = "C:\\communities_0.geojson"
val schema = "type string, crs string, totalFeatures long, features array<struct<type string, geometry string, properties map<string, string>>>"
spark.read.schema(schema).json(inputLocation)
  .selectExpr("explode(features) as features") // Explode the envelope to get one feature per row.
  .select("features.*") // Unpack the features struct.
  .withColumn("geometry", expr("ST_GeomFromGeoJSON(geometry)")) // Convert the geometry string.
  .printSchema()
The final result DataFrame should look like this:
1111:150458,025.22826N,055.30022E,348,39,JOB_ONBOARD, community_A
2222:150448,025.22746N,055.29962E,32,48, CAR_AVAILABLE, community_B
3333,20072023:150612,025.30559N,055.38272E,130,50,CAR_AVAILABLE, community_C
4444,20072023:150740,025.21794N,055.28569E,0,0,JOB_ONBOARD, community_D
1 Answer
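Your source data is not GeoJSON or any other typical geospatial format. Consider first stripping the letters N and E so that the latitude and longitude are plain numeric values.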
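The cleanup step could be sketched in plain Scala roughly as follows. This assumes each coordinate is decimal degrees followed by a single hemisphere letter, as in `025.22826N`. The helper name `parseCoord` is hypothetical, and treating S and W as negative is an assumption that goes beyond the sample data.

```scala
// Hypothetical helper: turns "025.22826N" into 25.22826.
// Assumption: the value is decimal degrees followed by one hemisphere letter.
def parseCoord(raw: String): Double = {
  val trimmed = raw.trim
  val hemisphere = trimmed.last.toUpper
  val degrees = trimmed.dropRight(1).toDouble // leading zeros are accepted by toDouble
  hemisphere match {
    case 'N' | 'E' => degrees
    case 'S' | 'W' => -degrees // assumption: southern/western values are negative
    case other =>
      throw new IllegalArgumentException(s"Unexpected hemisphere '$other' in '$raw'")
  }
}

// Values taken from the first row of the source data.
val lat = parseCoord("025.22826N")
val lon = parseCoord("055.30022E")
```

In a Spark job the same logic could be wrapped in a UDF or expressed with built-in string functions; the plain function is shown only to make the transformation explicit.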
You can then use Sedona's ST_Point to create the geometry column: https://sedona.apache.org/1.4.1/api/sql/Constructor/#st_point
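One possible way to put this together is sketched below. It is written for a Scala script or worksheet with Spark and Sedona 1.4.x on the classpath, and it is not a definitive implementation. The two point rows come from the question, already cleaned to numeric values. The `community_A` polygon is a made-up stand-in for the GeoJSON features, and the view and column names (`points`, `communities`, `geom`, `boundary`) are illustrative assumptions.

```scala
import org.apache.sedona.core.serde.SedonaKryoRegistrator
import org.apache.sedona.sql.utils.SedonaSQLRegistrator
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("point-in-community")
  .master("local[*]")
  .config("spark.serializer", classOf[KryoSerializer].getName)
  .config("spark.kryo.registrator", classOf[SedonaKryoRegistrator].getName)
  .getOrCreate()
SedonaSQLRegistrator.registerAll(spark)
import spark.implicits._

// Two rows from the question, with N/E already stripped.
val points = Seq(
  ("1111", 25.22826, 55.30022, "JOB_ONBOARD"),
  ("2222", 25.22746, 55.29962, "CAR_AVAILABLE")
).toDF("id", "lat", "lon", "status")
  // ST_Point takes (x, y), which means (longitude, latitude).
  .selectExpr("*",
    "ST_Point(CAST(lon AS DECIMAL(24, 20)), CAST(lat AS DECIMAL(24, 20))) AS geom")

// Hypothetical polygon standing in for one feature of the GeoJSON file.
val communities = Seq(
  ("community_A",
    "POLYGON ((55.29 25.22, 55.31 25.22, 55.31 25.23, 55.29 25.23, 55.29 25.22))")
).toDF("community", "wkt")
  .selectExpr("community", "ST_GeomFromWKT(wkt) AS boundary")

points.createOrReplaceTempView("points")
communities.createOrReplaceTempView("communities")

// Inner spatial join: keeps each point that lies inside some boundary.
val tagged = spark.sql(
  """SELECT p.id, p.lat, p.lon, p.status, c.community
    |FROM points p JOIN communities c
    |  ON ST_Contains(c.boundary, p.geom)""".stripMargin)
tagged.show(false)
```

With the real data, `communities` would be the DataFrame built from the GeoJSON file via ST_GeomFromGeoJSON, and the community name would come from whichever key the file's `properties` map uses. Because this is an inner join, points that fall outside every polygon are dropped from the result.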