I have set up Apache Sedona on an Amazon EMR cluster, following https://sedona.apache.org/1.5.0/setup/emr/.
I connected the EMR cluster to a notebook workspace in Amazon EMR Studio.
First, I set the configuration that lets me read Delta tables registered in the AWS Glue Catalog:
%%configure -f
{
    "conf": {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
    }
}
When I run
%%sql
select _time, _coordinate
from my_db.my_delta_table
order by _time desc
limit 5
it gives me the results:
[screenshot of the returned rows]
My _coordinate column is in WKT string format.
Now I try to run a spatial Spark SQL query with Apache Sedona:
%%sql
select
_time,
ST_Distance(
ST_GeomFromWKT('POINT(37.335480 -121.893028)'),
ST_GeomFromWKT(_coordinate)
) as `Distance to San Jose, CA`
from my_db.my_delta_table
order by _time desc
limit 5
I get an error:
An error was encountered:
[UNRESOLVED_ROUTINE] Cannot resolve function `ST_Distance` on search path [`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].; line 3 pos 4
Traceback (most recent call last):
File "/mnt1/yarn/usercache/livy/appcache/application_1699328410941_0006/container_1699328410941_0006_01_000001/pyspark.zip/pyspark/sql/session.py", line 1440, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
File "/mnt1/yarn/usercache/livy/appcache/application_1699328410941_0006/container_1699328410941_0006_01_000001/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1323, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/mnt1/yarn/usercache/livy/appcache/application_1699328410941_0006/container_1699328410941_0006_01_000001/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_ROUTINE] Cannot resolve function `ST_Distance` on search path [`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].; line 3 pos 4
I think I need to register the Apache Sedona SQL functions somehow. How do I register them? Thanks!
1 Answer
Basically, you need to run a short registration script in a notebook cell.
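For Sedona 1.5.x, the documented Python entry point is SedonaContext.create, which registers all of Sedona's ST_* SQL functions on the running Spark session. A minimal sketch, assuming the Sedona Python package and jars are already installed per the EMR setup guide:

from sedona.spark import SedonaContext

# Registers Sedona's SQL functions (ST_Distance, ST_GeomFromWKT, ...) and
# geometry types on the Spark session that Livy already created as `spark`.
sedona = SedonaContext.create(spark)

On Sedona releases before 1.4.1, the equivalent call is SedonaRegistrator.registerAll(spark) from the sedona.register module. Once the cell has run, the ST_Distance query above should resolve. Alternatively, you can append org.apache.sedona.sql.SedonaSqlExtensions to spark.sql.extensions in the %%configure cell, so the functions are registered automatically when the session starts.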