Spark HDFS_DELEGATION_TOKEN can't be found in cache

ttp71kqs · posted 2021-05-24 · in Spark
I am running the simplest driver-only, long-running job to reproduce this error.
Hadoop version:          2.7.3.2.6.5.0-292
spark-core_2.11 version: 2.3.0.2.6.5.0-292
Code:
FileSystem fs = tmpPath.getFileSystem(sc.hadoopConfiguration());
log.info("Path {} is {}", tmpPath, fs.exists(tmpPath));

Behavior: My job runs for 17-18 hours without any issue. After new keys are issued as part of the job by HadoopFSDelegationTokenProvider, the job keeps running with the newly issued delegation token, but within 1 hour of the delegation token renewal the job fails with the "token can't be found in cache" error. I have also programmatically generated my own delegation tokens via dfs.addDelegationTokens for the namenodes involved, and I see the same behavior.
Questions:
How can the delegation token get removed from the server, and which properties control this?
Do the server-side logs show this token being removed or evicted from the cache?

Path /test/abc.parquet is true
Path /test/abc.parquet is true
INFO Successfully logged into KDC
INFO getting token for DFS[DFSClient][clientName=DFSClient_NONMAPREDUCE_2324234_29,ugi=qa_user@ABC.com(auth:KERBEROS)](org.apache.spark.deploy.security.HadoopFSDelagationTokenProvider)
INFO Created HDFS_DELEGATION_TOKEN token 31615466 for qa_user on ha:hdfs:hacluster
INFO getting token for DFS[DFSClient][clientName=DFSClient_NONMAPREDUCE_2324234_29,ugi=qa_user@ABC.com(auth:KERBEROS)](org.apache.spark.deploy.security.HadoopFSDelagationTokenProvider)
INFO Created HDFS_DELEGATION_TOKEN token 31615467 for qa_user on ha:hdfs:hacluster
INFO writing out delegation tokens to hdfs://abc/user/qa/.sparkstaging/application_121212.....tmp
INFO delegation tokens written out successfully, renaming file to hdfs://.....
INFO delegation token file rename complete(org.apache.spark.deploy.yarn.security.AMCredentialRenewer)
Scheduling login from keytab in 64799125 millis
Path /test/abc.parquet is true
Path /test/abc.parquet is true

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 31615466 for qa_user) can't be found in cache
 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
 at org.apache.hadoop.ipc.Client.call(Client.java:1498)
 at org.apache.hadoop.ipc.Client.call(Client.java:1398)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
 at com.sun.proxy.$Proxy13.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:620)
 at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
FYI, submitted in yarn-cluster mode with:
--keytab /path/to/the/headless-keytab,
--principal principalNameAsPerTheKeytab 
--conf spark.hadoop.fs.hdfs.impl.disable.cache=true
Note that the token renewer is issuing new keys, and the new keys work too, but the old token somehow gets revoked on the server side; the AM logs give no clue about this.
t40tm48m1#

Answering my own question:
A few points are very important here.
The delegation tokens are a single copy stored in UserGroupInformation.getCredentials().getAllTokens(), and they can be updated by any other thread running in the same JVM. My problem was resolved by setting mapreduce.job.complete.cancel.delegation.tokens=false for all other jobs running in the same context, especially jobs running in a MapReduce context.
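A hedged sketch of where that property can be set; the jar, class, and job names below are placeholders, not from the original post. For plain MapReduce jobs it goes on the job configuration, and for Spark jobs the spark.hadoop. prefix copies it into the Hadoop Configuration:

```shell
# MapReduce job sharing the same credentials (placeholder jar/class):
# keep it from cancelling the shared delegation tokens on completion.
hadoop jar my-job.jar com.example.MyJob \
  -Dmapreduce.job.complete.cancel.delegation.tokens=false

# Spark job: spark.hadoop.* keys are copied into the Hadoop Configuration.
spark-submit \
  --conf spark.hadoop.mapreduce.job.complete.cancel.delegation.tokens=false \
  my-app.jar
```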
HadoopFSDelegationTokenProvider should obtain new tokens every (fraction * renewal time), i.e. by default 0.75 * 24 hrs, provided you submit the job with --keytab and --principal.
Be sure to set fs.hdfs.impl.disable.cache=true for the HDFS filesystem, i.e. you get a new FileSystem object every time. This is a costly operation, but you are guaranteed a fresh FS object carrying the new keys, rather than one served from CACHE.get(fsname).
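As an alternative to disabling the cache globally, Hadoop's FileSystem.newInstance() bypasses the cache for a single lookup. A minimal sketch, assuming a Kerberized cluster is reachable; the cluster URI and path are placeholders (no test is included since this needs a live HDFS):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FreshFs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // newInstance() always returns a fresh FileSystem object, skipping
        // FileSystem.CACHE, so it picks up the current credentials/tokens.
        FileSystem fs = FileSystem.newInstance(URI.create("hdfs://hacluster"), conf);
        try {
            System.out.println(fs.exists(new Path("/test/abc.parquet")));
        } finally {
            fs.close(); // the caller owns non-cached instances
        }
    }
}
```

This keeps cached, cheap lookups for the rest of the application while still giving token-sensitive code a fresh FS object.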
If that doesn't work, you can create your own delegation tokens by calling new Credentials() and FileSystem#addDelegationTokens, https://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/api/org/apache/hadoop/fs/filesystem.html#adddelegationtokens(java.lang.string,%20org.apache.hadoop.security.credentials), but this must be done inside kerberosUGI.doAs({});
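A hedged sketch of that last point, using only documented Hadoop APIs; the principal, keytab path, renewer name, and cluster URI are placeholders for illustration, and no test is included since this requires a Kerberized cluster:

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class TokenRefresh {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder principal/keytab; use the pair passed to spark-submit.
        UserGroupInformation ugi = UserGroupInformation
            .loginUserFromKeytabAndReturnUGI("qa_user@ABC.com", "/path/to/headless.keytab");
        Credentials creds = new Credentials();
        // Token acquisition must run as the Kerberos-authenticated user.
        ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
            FileSystem fs = new Path("hdfs://hacluster/").getFileSystem(conf);
            // "yarn" is a placeholder renewer principal; adjust for your cluster.
            Token<?>[] tokens = fs.addDelegationTokens("yarn", creds);
            for (Token<?> t : tokens) {
                System.out.println("Fetched " + t.getKind() + " token");
            }
            return null;
        });
        // Make the fresh tokens visible to threads using the current UGI.
        UserGroupInformation.getCurrentUser().addCredentials(creds);
    }
}
```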
