I wrote a program that uses Spark Streaming to insert data into a Kerberos-enabled HBase. In one batch, one task failed with the following error:
java.io.IOException: Login failure for myuser@example.com from keytab ./user.keytab
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1160)
at com.framework.common.HbaseUtil$.InsertToHbase(HbaseUtil.scala:81)
at com.framework.realtime.RDDUtil$$anonfun$dwsTodwd$2.apply(RDDUtil.scala:203)
at com.framework.realtime.RDDUtil$$anonfun$dwsTodwd$2.apply(RDDUtil.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.security.auth.login.LoginException: Receive timed out
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:767)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687)
at javax.security.auth.login.LoginContext.login(LoginContext.java:595)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1149)
... 13 more
Caused by: java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:146)
at java.net.DatagramSocket.receive(DatagramSocket.java:816)
at sun.security.krb5.internal.UDPClient.receive(NetClient.java:207)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:390)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:343)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.krb5.KdcComm.send(KdcComm.java:327)
at sun.security.krb5.KdcComm.send(KdcComm.java:219)
at sun.security.krb5.KdcComm.send(KdcComm.java:191)
at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:319)
at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:364)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:735)
... 25 more
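On the second attempt, however, the task succeeded. My guess is that the authentication took too long the first time, so it failed, while on the retry it was quick enough. Am I right? Either way, how can I solve this problem? My code is as follows: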
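import java.security.PrivilegedAction
import org.apache.hadoop.hbase.client.{HConnection, HConnectionManager, HTableInterface}
import org.apache.hadoop.security.UserGroupInformation

val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(princ, keytab)
ugi.doAs(new PrivilegedAction[Unit]() {
  def run(): Unit = {
    var conn: HConnection = null
    var htable: HTableInterface = null
    try {
      conn = HConnectionManager.createConnection(conf)
      htable = conn.getTable(tableName)
      // Buffer puts on the client instead of sending one RPC per record
      htable.setAutoFlushTo(false)
      for (record <- partitionOfRecords) {
        htable.put(record)
      }
      // With auto-flush off, buffered puts must be flushed explicitly
      htable.flushCommits()
    } finally {
      // Release the table and connection even if a put fails
      if (htable != null) htable.close()
      if (conn != null) conn.close()
    }
  }
})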
1 answer
From "Hadoop and Kerberos: The Madness Beyond the Gate", chapter "Error Messages to Fear":
Receive timed out
Usually in a stack trace like
Caused by: java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
...at sun.security.krb5.internal.UDPClient.receive(NetClient.java:207)
... UDP socket ... Switch to TCP: at the very least, it will fail faster. And furthermore:
Switching Kerberos to use TCP rather than UDP
In /etc/krb5.conf:
[libdefaults]
udp_preference_limit = 1
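To confirm that the setting is actually in place on each executor host, a small check like the following can help. This is a minimal sketch, not part of the quoted guide: the object name Krb5ConfCheck and the helper libdefault are illustrative, the parser handles only simple key = value lines, and falling back to /etc/krb5.conf when the java.security.krb5.conf system property is unset is an assumption about a typical Linux setup.

```scala
import scala.io.Source

// Illustrative helper (hypothetical name): reads one [libdefaults] key from krb5.conf-style text.
object Krb5ConfCheck {

  /** Returns the value of `key` inside the [libdefaults] section, if present.
    * Simplistic on purpose: no include directives, no nested blocks. */
  def libdefault(confText: String, key: String): Option[String] = {
    var inLibdefaults = false
    var found: Option[String] = None
    for (raw <- confText.split("\\r?\\n")) {
      val line = raw.trim
      if (line.startsWith("[")) {
        // A new section header starts; remember whether it is [libdefaults]
        inLibdefaults = line == "[libdefaults]"
      } else if (inLibdefaults && !line.startsWith("#") && line.contains("=")) {
        val Array(k, v) = line.split("=", 2)
        if (k.trim == key) found = Some(v.trim)
      }
    }
    found
  }

  def main(args: Array[String]): Unit = {
    // Assumption: use the JVM override if set, otherwise the usual Linux location
    val path = sys.props.getOrElse("java.security.krb5.conf", "/etc/krb5.conf")
    val src = Source.fromFile(path)
    try {
      val value = libdefault(src.mkString, "udp_preference_limit")
      println(s"$path: udp_preference_limit = ${value.getOrElse("(not set)")}")
    } finally src.close()
  }
}
```

Running the same check inside a Spark task rather than on the driver shows what the executors see, which is where the login in the question fails.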
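In general, many flaky Kerberos issues seem to occur only with UDP, so it is unfortunate that it is used by default.
Note that Java also supports a kdc_timeout configuration parameter, but it is a dirty mess:
- it is not mentioned in the MIT Kerberos documentation
- it is not mentioned in Unix/Linux documentation, except for BSD
- it is mentioned only in the darkest corners of the Java documentation (for example, in the Java 9 documentation), with an interesting side note that the default value changed at some point from 30s implicitly expressed in milliseconds to 30s
- a few weeks ago the Cloudera support team issued a recommendation about that setting, because the 30-second default timeout may cause cascading failures in HDFS High Availability or similar setups; but the poor guys had no clue what they were recommending, so they randomly advise "3" or "3s" or "3000" as an explicit timeout value
Note also that if you have multiple KDCs for high availability, and these KDCs are explicitly listed in krb5.conf (or implicitly listed, e.g. via a DNS alias set up with a round-robin rule), then in case of a "KDC timeout" Java should retry with the next KDC in line. Unless you have reached a global timeout.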
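Separately from KDC failover inside the JVM, and since the question reports that the second task attempt succeeded, a bounded retry around the keytab login may absorb a transient timeout without failing the whole task. The following is a minimal sketch under stated assumptions, not a recommendation from the quoted guide: LoginRetry and retry are hypothetical names, the attempt count and backoff are arbitrary, and only java.io.IOException (the type thrown in the question's stack trace) is retried.

```scala
import java.io.IOException

// Illustrative helper (hypothetical name): retries an operation that may fail with IOException.
object LoginRetry {

  /** Runs `op` up to `maxAttempts` times, sleeping `backoffMs * attempt` after each failure.
    * The last IOException propagates once attempts are exhausted; other exceptions propagate at once. */
  def retry[T](maxAttempts: Int, backoffMs: Long)(op: => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    var attempt = 1
    var result: Option[T] = None
    while (result.isEmpty) {
      try {
        result = Some(op)
      } catch {
        // Retry only while attempts remain; otherwise the guard fails and the exception escapes
        case _: IOException if attempt < maxAttempts =>
          Thread.sleep(backoffMs * attempt)
          attempt += 1
      }
    }
    result.get
  }
}
```

With the question's code, the login line would become `val ugi = LoginRetry.retry(3, 500L) { UserGroupInformation.loginUserFromKeytabAndReturnUGI(princ, keytab) }`. Keeping the retry count small matters here: each attempt can still block for the full KDC timeout, so retries multiply the worst-case delay.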