Data transfer from a Hive table to a Teradata table

Posted by a0zr77ik 5 months ago in Hadoop

I have a table (employee) in a Hive database (company) with more than 41 million records. The table is partitioned on the current_date column.

select count(*) from company.employee

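Running the query above in the Hive query editor returns 41,547,896.
The main task now is to copy this data into a table (employee_td) in a Teradata database (company_td).
Below is the PySpark code I wrote to transfer the data from Hive to Teradata.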

# creating a dataframe
df = spark.sql("select * from company.employee")

# removing the duplicate records
df = df.distinct()

td_url = 'jdbc:teradata://*****/Database=company_td, LOGMECH=LDAP'

# writing the dataframe to Teradata
df.write.format('jdbc') \
        .option('url', td_url) \
        .option('user', db_user) \
        .option('password', password) \
        .option('dbtable', "employee_td") \
        .option('driver','com.teradata.jdbc.TeraDriver') \
        .mode('append').save()

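When I run the code above, I get the error below, and only some of the records are copied to the Teradata table. The number of records copied varies from run to run: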

py4j.protocol.Py4JJavaError: An error occurred while calling o204.save.: org.apache.spark.SparkException: Job aborted due to stage failure: Task 75 in stage 2.0 failed 4 times, most recent failure: Lost task 75.3 in stage 2.0 (TID 97, *******.***.******.com, executor 39): 
java.sql.BatchUpdateException: [Teradata JDBC Driver] [TeraJDBC 16.10.00.05] [Error 1338] [SQLState HY000] A failure occurred while executing a PreparedStatement batch request. Details of the failure can be found in the exception chain that is accessible with getNextException.

As the exception states:

Details of the failure can be found in the exception chain that is accessible with getNextException

Can someone help me get at the detailed exception chain behind the error above?
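
One way to look at that chain from PySpark is to catch the `Py4JJavaError` on the driver and walk the Java exception it carries through `java_exception`. The sketch below is only an illustration: `collect_exception_chain` and `_call_or_none` are hypothetical helper names, and it assumes the executor-side `BatchUpdateException` is actually attached as a cause of the driver-side `SparkException`, which may not hold in every Spark version or configuration. If it is not attached, the executor logs are the place to look.

```python
def _call_or_none(obj, method_name):
    """Call a zero-argument method on obj; return None if it is missing or fails."""
    try:
        return getattr(obj, method_name)()
    except Exception:
        return None


def collect_exception_chain(java_exc, limit=50):
    """Collect toString() of an exception and everything reachable from it.

    Follows both getNextException() (the JDBC-specific chain mentioned in the
    error message) and getCause() (the ordinary Java cause chain).
    `limit` bounds the walk in case the chain is cyclic or very long.
    """
    messages = []
    pending = [java_exc]
    while pending and len(messages) < limit:
        exc = pending.pop(0)
        if exc is None:
            continue
        messages.append(str(exc.toString()))
        pending.append(_call_or_none(exc, "getNextException"))
        pending.append(_call_or_none(exc, "getCause"))
    return messages


# Intended use around the failing write (not executed here):
#
# from py4j.protocol import Py4JJavaError
# try:
#     df.write.format('jdbc')...save()
# except Py4JJavaError as err:
#     for line in collect_exception_chain(err.java_exception):
#         print(line)
```

The nested messages usually name the underlying Teradata error that caused the batch to fail, which is more specific than the generic Error 1338 wrapper.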

Source: https://stackoverflow.com/questions/72273446/data-transfer-from-hive-table-to-teradata-table

1 Answer

arknldoa1#

You could try changing your

df.write...

to

df.coalesce(1).write...

because I suspect Teradata may not support parallel writes into the table.
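
Spelled out in full, the suggestion might look like the sketch below. Spark's JDBC writer opens one connection per DataFrame partition, so `coalesce(1)` funnels every row through a single connection. The function name `write_with_limited_parallelism` and its parameters are hypothetical, introduced only for illustration. `batchsize` is a standard Spark JDBC write option (default 1000), and the value here is not tuned for Teradata.

```python
def write_with_limited_parallelism(df, url, user, password, table,
                                   connections=1, batch_size=1000):
    """Append df to a JDBC table using at most `connections` concurrent writers.

    coalesce() reduces the partition count without a full shuffle; each
    remaining partition is written over its own JDBC connection.
    """
    (df.coalesce(connections)
       .write.format("jdbc")
       .option("url", url)
       .option("user", user)
       .option("password", password)
       .option("dbtable", table)
       .option("driver", "com.teradata.jdbc.TeraDriver")
       .option("batchsize", batch_size)  # rows sent per JDBC batch
       .mode("append")
       .save())


# Intended use with the names from the question (not executed here):
#
# write_with_limited_parallelism(df, td_url, db_user, password, "employee_td")
```

With about 41 million rows, a single connection will be slow, so this is more useful for confirming whether concurrent writers cause the failure than as a final setup.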
