Spark 2.x cannot work with the Hadoop 2.7.x timeline service

Environment: Spark 2.1 + Hadoop 2.7.3

When running Spark on YARN in client deploy mode, submitting a Spark job from the client fails with the following error.

[xxx@master spark]$ bin/spark-shell --master yarn --deploy-mode client
21/01/29 18:21:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:152)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
… 47 elided
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
… 61 more


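Debugging shows that the failure comes from a conflict between Jersey library versions. The Hadoop YARN timeline client is built against Jersey 1.x (the com.sun.jersey packages; the Hadoop 2.7.x distribution bundles 1.9), whereas Spark 2.x ships Jersey 2.22 (the org.glassfish.jersey packages). The Jersey 1.x class com.sun.jersey.api.client.config.ClientConfig that the timeline client needs is therefore missing from the Spark classpath, which is exactly what the stack trace reports. Dependency conflicts of this kind are common when building applications on distributed computing frameworks.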
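To see the mismatch on a given machine, it can help to list the Jersey jars that each installation ships. The following is only a minimal sketch: list_jersey_jars is a hypothetical helper name, and the two directories in the commented usage lines are assumed default layouts that may differ on your installation.

```shell
# Hypothetical helper: list the Jersey jars found directly inside a jar directory.
# Jersey 1.x jars contain com.sun.jersey classes; Jersey 2.x jars contain
# org.glassfish.jersey classes, so the version in the file name tells them apart.
list_jersey_jars() {
  jar_dir="$1"
  # Print only the file names, sorted, so two directories are easy to compare.
  find "$jar_dir" -maxdepth 1 -type f -name '*jersey*.jar' -exec basename {} \; | sort
}

# Assumed locations (adjust to the local installation):
# list_jersey_jars "$SPARK_HOME/jars"
# list_jersey_jars "$HADOOP_HOME/share/hadoop/yarn/lib"
```

If the Spark directory shows only 2.x versions while the Hadoop directory shows 1.x versions, that matches the conflict described above.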

The resolution is straightforward: disable the timeline service on the YARN side.

Add the following property to yarn-site.xml:

<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>
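After editing the file, a quick check can confirm that the client actually sees the new value. This is a rough sketch rather than a robust XML parser: timeline_service_value is a hypothetical helper name, it assumes the value element sits on the same line as the name element or on the line right after it (as in the snippet above), and the path in the commented usage line assumes that HADOOP_CONF_DIR points at the client configuration directory.

```shell
# Hypothetical helper: print the configured value of yarn.timeline-service.enabled
# from a yarn-site.xml file. Prints nothing if the property is not present.
timeline_service_value() {
  site_file="$1"
  # Take the <name> line plus the line after it, then extract the <value> text.
  grep -A1 '<name>yarn.timeline-service.enabled</name>' "$site_file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Assumed location of the client-side configuration:
# timeline_service_value "$HADOOP_CONF_DIR/yarn-site.xml"
```

If yarn-site.xml cannot be changed on the client, a commonly used alternative is to pass --conf spark.hadoop.yarn.timeline-service.enabled=false to spark-shell or spark-submit, since Spark copies spark.hadoop.* properties into the Hadoop configuration it builds. That variant was not tested as part of this post.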
