Oozie Sqoop problem

q8l4jmvw · Posted 2021-06-03 in Sqoop

I am trying to run an Oozie Sqoop job to import data from Teradata into Hive.
Sqoop runs fine from the CLI, but I am running into problems when scheduling it through Oozie.
Note: I can run a shell action in Oozie and it works fine.
Please find the error log and workflow below.
Error log:

Log Type: stderr
Log Upload Time: Wed Feb 01 04:19:00 -0500 2017
Log Length: 513

log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:WARN No appenders could be found for logger (org.apache.hadoop.yarn.client.RMProxy).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
No such sqoop tool: sqoop. See 'sqoop help'.
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

Log Type: stdout
Log Upload Time: Wed Feb 01 04:19:00 -0500 2017
Log Length: 158473
Showing 4096 bytes of 158473 total.
curity.ShellBasedUnixGroupsMapping
dfs.client.domain.socket.data.traffic=false
dfs.client.read.shortcircuit.streams.cache.size=256
fs.s3a.connection.timeout=200000
dfs.datanode.block-pinning.enabled=false
mapreduce.job.end-notification.max.retry.interval=5000
yarn.acl.enable=true
yarn.nm.liveness-monitor.expiry-interval-ms=600000
mapreduce.application.classpath=$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH
mapreduce.input.fileinputformat.list-status.num-threads=1
dfs.client.mmap.cache.size=256
mapreduce.tasktracker.map.tasks.maximum=2
yarn.scheduler.fair.user-as-default-queue=true
yarn.timeline-service.ttl-enable=true
yarn.nodemanager.linux-container-executor.resources-handler.class=org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler
dfs.namenode.max.objects=0
dfs.namenode.service.handler.count=10
dfs.namenode.kerberos.principal.pattern=*
yarn.resourcemanager.state-store.max-completed-applications=${yarn.resourcemanager.max-completed-applications}
dfs.namenode.delegation.token.max-lifetime=604800000
mapreduce.job.classloader=false
yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size=10000
mapreduce.job.hdfs-servers=${fs.defaultFS}
yarn.application.classpath=$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
dfs.datanode.hdfs-blocks-metadata.enabled=true
mapreduce.tasktracker.dns.nameserver=default
dfs.datanode.readahead.bytes=4193404
mapreduce.job.ubertask.maxreduces=1
dfs.image.compress=false
mapreduce.shuffle.ssl.enabled=false
yarn.log-aggregation-enable=false
mapreduce.tasktracker.report.address=127.0.0.1:0
mapreduce.tasktracker.http.threads=40
dfs.stream-buffer-size=4096
tfile.fs.output.buffer.size=262144
fs.permissions.umask-mode=022
dfs.client.datanode-restart.timeout=30
dfs.namenode.resource.du.reserved=104857600
yarn.resourcemanager.am.max-attempts=2
yarn.nodemanager.resource.percentage-physical-cpu-limit=100
ha.failover-controller.graceful-fence.connection.retries=1
mapreduce.job.speculative.speculative-cap-running-tasks=0.1
hadoop.proxyuser.hdfs.groups=*
dfs.datanode.drop.cache.behind.writes=false
hadoop.proxyuser.HTTP.hosts=*
hadoop.common.configuration.version=0.23.0
mapreduce.job.ubertask.enable=false
yarn.app.mapreduce.am.resource.cpu-vcores=1
dfs.namenode.replication.work.multiplier.per.iteration=2
mapreduce.job.acl-modify-job= 
io.seqfile.local.dir=${hadoop.tmp.dir}/io/local
yarn.resourcemanager.system-metrics-publisher.enabled=false
fs.s3.sleepTimeSeconds=10
mapreduce.client.output.filter=FAILED
------------------------

Sqoop command arguments :
             sqoop
             import
             --connect
             "jdbc:teradata://xx.xxx.xx:xxxx/DATABASE=Database_name"
             --verbose
             --username
             xxx
             -password
             'xxx'
             --table
             BILL_DETL_EXTRC
             --split-by
             EXTRC_RUN_ID
             --m
             1
             --fields-terminated-by
             ,
             --hive-import
             --hive-table
             OPS_TEST.bill_detl_extr213
             --target-dir
             /hadoop/dev/TD_archive/bill_detl_extrc
Fetching child yarn jobs
tag id : oozie-56ea2084fcb1d55591f8919b405f0be0
Child yarn jobs are found - 
=================================================================

Invoking Sqoop command line now >>>

3324 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
Intercepting System.exit(1)

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://namenode:8020/user/hadoopadm/oozie-oozi/0000039-170123205203054-oozie-oozi-W/sqoop-action--sqoop/action-data.seq

Oozie Launcher ends

Log Type: syslog
Log Upload Time: Wed Feb 01 04:19:00 -0500 2017
Log Length: 16065
Showing 4096 bytes of 16065 total.
adoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Job jar is not present. Not adding any jar to the list of resources.
2017-02-01 04:18:51,990 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /user/hadoopadm/.staging/job_1485220715968_0219/job.xml
2017-02-01 04:18:52,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #5 tokens and #1 secret keys for NM use for launching container
2017-02-01 04:18:52,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of containertokens_dob is 6
2017-02-01 04:18:52,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle token in serviceData
2017-02-01 04:18:52,174 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://svacld001.bcbsnc.com:8020]
2017-02-01 04:18:52,240 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapred.JobConf: Task java-opts do not specify heap size. Setting task attempt jvm max heap size to -Xmx820m
2017-02-01 04:18:52,243 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1485220715968_0219_m_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2017-02-01 04:18:52,243 INFO [uber-EventHandler] org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1485220715968_0219_01_000001 taskAttempt attempt_1485220715968_0219_m_000000_0
2017-02-01 04:18:52,245 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1485220715968_0219_m_000000_0] using containerId: [container_1485220715968_0219_01_000001 on NM: [svacld005.bcbsnc.com:8041]
2017-02-01 04:18:52,246 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: mapreduce.cluster.local.dir for uber task: /disk1/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk10/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk11/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk12/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk2/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk3/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk4/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk5/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk6/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk7/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk8/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk9/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219
2017-02-01 04:18:52,247 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1485220715968_0219_m_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2017-02-01 04:18:52,247 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1485220715968_0219_m_000000 Task Transitioned from SCHEDULED to RUNNING
2017-02-01 04:18:52,249 INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-02-01 04:18:52,258 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2017-02-01 04:18:52,324 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.oozie.action.hadoop.OozieLauncherInputFormat$EmptySplit@9c73765
2017-02-01 04:18:52,329 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2017-02-01 04:18:52,340 INFO [uber-SubtaskRunner] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id

Workflow:

<workflow-app xmlns="uri:oozie:workflow:0.5" name="oozie-wf">
<start to="sqoop-wf"/>
<action name="sqoop-wf">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>xx.xx.xx:8032</job-tracker>
            <name-node>hdfs://xx.xxx.xx:8020</name-node>
            <command>import  --connect "jdbc:teradata://ip/DATABASE=EDW_EXTRC_TAB_HST" --connection-manager "com.cloudera.connector.teradata.TeradataManager" --verbose --username HADOOP -password 'xxxxx' --table BILL_DETL_EXTRC --split-by EXTRC_RUN_ID  --m 1  --fields-terminated-by , --hive-import --hive-table OPS_TEST.bill_detl_extrc1 --target-dir /hadoop/dev/TD_archive/data/PDCRDATA_TEST/bill_detl_extrc </command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
</action> 
<kill name="fail">
    <message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
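
For reference: the stderr above ends with "No such sqoop tool: sqoop", which Sqoop prints when the literal word "sqoop" reaches it as the tool name, and the stdout shows "sqoop" as the first command argument, so whatever command the launcher received should begin with the tool ("import") only. Oozie also splits the <command> body on whitespace without interpreting quotes, so the quotes around the JDBC URL and the password are passed to Sqoop literally. A rough sketch of the same action expressed with <arg> elements (supported by the sqoop-action:0.2 schema), which avoids the whitespace splitting; hosts and credentials are masked exactly as in the question:

<!-- Sketch only: same Sqoop import expressed as one <arg> per token -->
<action name="sqoop-wf">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>xx.xx.xx:8032</job-tracker>
            <name-node>hdfs://xx.xxx.xx:8020</name-node>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:teradata://ip/DATABASE=EDW_EXTRC_TAB_HST</arg>
            <arg>--connection-manager</arg>
            <arg>com.cloudera.connector.teradata.TeradataManager</arg>
            <arg>--username</arg>
            <arg>HADOOP</arg>
            <arg>-password</arg>
            <arg>xxxxx</arg>
            <arg>--table</arg>
            <arg>BILL_DETL_EXTRC</arg>
            <arg>--split-by</arg>
            <arg>EXTRC_RUN_ID</arg>
            <arg>--m</arg>
            <arg>1</arg>
            <arg>--fields-terminated-by</arg>
            <arg>,</arg>
            <arg>--hive-import</arg>
            <arg>--hive-table</arg>
            <arg>OPS_TEST.bill_detl_extrc1</arg>
            <arg>--target-dir</arg>
            <arg>/hadoop/dev/TD_archive/data/PDCRDATA_TEST/bill_detl_extrc</arg>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
</action>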

Job properties:

oozie.wf.application.path=hdfs:///hadoop/dev/TD_archive/workflow1.xml
oozie.use.system.libpath=true
security_enabled=True
dryrun=False
jobtracker=xxx.xxx:8032
nameNode=hdfs://xx.xx:8020
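
For completeness, a workflow driven by these properties is submitted with the Oozie CLI roughly as below; the Oozie server URL is a placeholder assumption, not taken from the question:

# Submit and start the workflow (replace the -oozie URL with your Oozie server)
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run

# Inspect the resulting workflow job and the failing action's error message
oozie job -oozie http://oozie-host:11000/oozie -info <workflow-job-id>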

Note: We are using Cloudera CDH 5.5, and all the required jars (sqoop-connector-teradata-1.5c5.jar, tdgssconfig.jar, terajdbc4.jar) are placed in /var/lib/sqoop and also in HDFS.
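
Jars under /var/lib/sqoop are picked up by the Sqoop CLI on that node, but the Oozie Sqoop action runs inside a YARN container and usually only sees jars that are on HDFS, either in a lib/ directory next to the workflow or in a directory referenced by oozie.libpath. A minimal sketch of staging the connector jars, assuming a hypothetical HDFS directory /hadoop/dev/TD_archive/lib:

# Stage the Teradata connector jars on HDFS (target directory is an assumption)
hdfs dfs -mkdir -p /hadoop/dev/TD_archive/lib
hdfs dfs -put /var/lib/sqoop/sqoop-connector-teradata-1.5c5.jar \
              /var/lib/sqoop/tdgssconfig.jar \
              /var/lib/sqoop/terajdbc4.jar \
              /hadoop/dev/TD_archive/lib/

# Then reference that directory from job.properties:
# oozie.libpath=hdfs:///hadoop/dev/TD_archive/lib
# oozie.use.system.libpath=true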
