PySpark and Hive/Impala

7gcisfzg posted on 2021-06-26 in Hive

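I want to build a classification model in PySpark. The input to the model is the result of a SELECT query or a view in Hive or Impala. Is there a way to include this query in the PySpark code itself, instead of storing the results in a text file and feeding that file to the model?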

Hive impala pyspark logistic-regression

Source: https://stackoverflow.com/questions/42394105/pyspark-and-hive-impala

1 Answer

yvt65v4c1#

Yes. To do this, use a HiveContext together with your SparkContext. Here is an example:

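from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext.getOrCreate()  # reuses the existing context (e.g. in the pyspark shell) if there is one
sqlContext = HiveContext(sc)
tableData = sqlContext.sql("SELECT * FROM TABLE")  # TABLE is a placeholder for your table or view name

# tableData is a DataFrame that carries the table's schema; check it with tableData.printSchema()

rows = tableData.collect()  # collect() runs the query and returns all rows to the driver as a list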

Alternatively, see the Spark SQL programming guide: https://spark.apache.org/docs/1.6.0/sql-programming-guide.html
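
Since the question is about feeding the query result into a classifier, here is a minimal sketch of how that could look end to end. It is not from the original answer and rests on several assumptions: Spark 2.0 or later (where SparkSession with enableHiveSupport() is the documented replacement for HiveContext), a table visible in the Hive metastore, numeric feature columns without nulls, and a numeric 0/1 label column. The helper names build_select and train_from_hive are hypothetical, as are any table and column names you pass in.

```python
def build_select(table, feature_cols, label_col):
    """Build the SELECT statement that pulls the model inputs from Hive."""
    columns = ", ".join(list(feature_cols) + [label_col])
    return "SELECT {} FROM {}".format(columns, table)


def train_from_hive(table, feature_cols, label_col):
    """Query Hive and fit a logistic regression on the result (sketch)."""
    # Imported here so build_select can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    # enableHiveSupport() connects the session to the Hive metastore.
    spark = (SparkSession.builder
             .appName("hive-logistic-regression")
             .enableHiveSupport()
             .getOrCreate())

    # The query result is a DataFrame, so no intermediate text file is needed.
    df = spark.sql(build_select(table, feature_cols, label_col))

    # Spark ML estimators expect the features in a single vector column.
    assembler = VectorAssembler(inputCols=list(feature_cols),
                                outputCol="features")
    train_df = assembler.transform(df)

    lr = LogisticRegression(featuresCol="features", labelCol=label_col)
    return lr.fit(train_df)
```

You would call it as something like train_from_hive("mydb.customers", ["age", "income"], "churned"), with your own table and column names. Note that the DataFrame goes straight into the model, so there is no need to collect() the rows to the driver first.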

Answered 2021-06-26
