Apache Spark: How do I register Apache Sedona SQL functions in an Amazon EMR notebook?

mpbci0fu · asked 6 months ago · in Apache

I have set up Apache Sedona on an Amazon EMR cluster, following https://sedona.apache.org/1.5.0/setup/emr/.
I attached the EMR cluster to a notebook in Amazon EMR.
First, I set the configuration that lets me read Delta tables registered in the AWS Glue Catalog:

%%configure -f
{
  "conf": {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
  }
}

When I run

%%sql
select _time, _coordinate
from my_db.my_delta_table
order by _time desc
limit 5


it returns results (screenshot of the rows omitted).
My _coordinate column is a WKT string.
Now I try to run a spatial Spark SQL query from Apache Sedona:

%%sql
select
    _time,
    ST_Distance(
        ST_GeomFromWKT('POINT(37.335480 -121.893028)'),
        ST_GeomFromWKT(_coordinate)
    ) as `Distance to San Jose, CA`
from my_db.my_delta_table
order by _time desc
limit 5


I get an error:

An error was encountered:
[UNRESOLVED_ROUTINE] Cannot resolve function `ST_Distance` on search path [`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].; line 3 pos 4
Traceback (most recent call last):
  File "/mnt1/yarn/usercache/livy/appcache/application_1699328410941_0006/container_1699328410941_0006_01_000001/pyspark.zip/pyspark/sql/session.py", line 1440, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
  File "/mnt1/yarn/usercache/livy/appcache/application_1699328410941_0006/container_1699328410941_0006_01_000001/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1323, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/mnt1/yarn/usercache/livy/appcache/application_1699328410941_0006/container_1699328410941_0006_01_000001/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
    raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_ROUTINE] Cannot resolve function `ST_Distance` on search path [`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].; line 3 pos 4


I think I need to register the Apache Sedona SQL functions somehow. How do I register them? Thanks!


hfyxw5xn1#

Basically, you need to put the following in a cell of your notebook:

%%configure -f
{
  "conf": {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension,org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
  }
}
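
If you would rather not restart the Spark session with %%configure, a minimal alternative sketch (assuming the apache-sedona Python package and the Sedona jars from the EMR setup guide are already installed on the cluster) is to register the functions programmatically on the running SparkSession:

# Assumption: the apache-sedona Python package is installed and the Sedona
# jars are on the cluster classpath, as in the EMR setup guide.
from sedona.spark import SedonaContext

# Register Sedona's spatial types and ST_* SQL functions (ST_GeomFromWKT,
# ST_Distance, ...) on the SparkSession the notebook already provides.
sedona = SedonaContext.create(spark)

# Re-run the spatial query from the question against the registered functions.
sedona.sql("""
    select
        _time,
        ST_Distance(
            ST_GeomFromWKT('POINT(37.335480 -121.893028)'),
            ST_GeomFromWKT(_coordinate)
        ) as `Distance to San Jose, CA`
    from my_db.my_delta_table
    order by _time desc
    limit 5
""").show(truncate=False)

Note that %%configure -f replaces the session configuration and restarts the Livy session, so that cell has to run before anything else; the programmatic route works on a session that is already running, but it still requires the Sedona jars to be on the cluster's classpath.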

