Zeppelin notebook - sc.textFile does not work for HDI with ESP

Steven Lai 1 Reputation point
2021-02-11T15:14:14.283+00:00

We have an HDI cluster with ESP enabled.

From our Zeppelin notebook, reading data into a Dataset (spark.read.text) works, but reading it into an RDD (sc.textFile) throws an authentication exception:

66978-screen.png

Note that while sc.textFile fails in Zeppelin, it works fine from spark-shell. Moreover, "spark.read.text(path).rdd" (which just reads the data into a Dataset and converts it to an RDD) also works in Zeppelin.
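
To make the comparison concrete, the two reads can be sketched roughly as below. This is only an illustration in PySpark, assuming a notebook paragraph where `sc` and `spark` are already provided by the interpreter. The helper names and the `path` argument are hypothetical placeholders, not something taken from our cluster.

```python
def read_with_rdd_api(sc, path):
    # RDD API read: this is the call that raises the authentication
    # exception in Zeppelin (the same call works from spark-shell).
    return sc.textFile(path)


def read_via_dataset(spark, path):
    # Dataset/DataFrame read converted to an RDD: this one works in Zeppelin.
    # spark.read.text yields rows with a single "value" column, so each row
    # is mapped to its string to get the same shape as sc.textFile returns.
    return spark.read.text(path).rdd.map(lambda row: row.value)
```

In a paragraph, something like `read_via_dataset(spark, path).take(5)` returns lines, while `read_with_rdd_api(sc, path).take(5)` on the same path is where the exception appears.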

I found some related information online, such as https://community.cloudera.com/t5/Support-Questions/How-to-make-Zeppelin-s-User-Impersonation-work-with-Kerberos/td-p/212817, but I get other errors if I enable 'User Impersonate'.

Could you please advise whether I need to enable 'User Impersonate' in order for sc.textFile to work?

My livy2 interpreter config is as follows:

livy.spark.driver.cores
livy.spark.driver.memory
livy.spark.dynamicAllocation.cachedExecutorIdleTimeout
livy.spark.dynamicAllocation.enabled
livy.spark.dynamicAllocation.initialExecutors
livy.spark.dynamicAllocation.maxExecutors
livy.spark.dynamicAllocation.minExecutors
livy.spark.executor.cores
livy.spark.executor.instances
livy.spark.executor.memory
livy.spark.jars abfss://rtgasia-negotiation@e9vpzaab1y7fz2q1xprivate.dfs.core.windows.net/apps/rtgasia/pyspark_mapping_engine_v1.0/spark-avro_2.11-4.0.0.jar,abfss://rtgasia-negotiation@e9vpzaab1y7fz2q1xprivate.dfs.core.windows.net/apps/rtgasia/demo/circle-poc-assembly-1.0.jar
livy.spark.jars.packages
zeppelin.interpreter.localRepo /usr/hdp/current/zeppelin-server/local-repo/2C8A4SZ9T_livy2
zeppelin.interpreter.output.limit 102400
zeppelin.livy.concurrentSQL false
zeppelin.livy.displayAppInfo true
zeppelin.livy.keytab /etc/security/keytabs/zeppelin.server.kerberos.keytab
zeppelin.livy.principal zeppelin-ecialyxizp-projectspark@SGAZUREPRD.ONMICROSOFT.COM
zeppelin.livy.pull_status.interval.millis 1000
zeppelin.livy.session.create_timeout 120
zeppelin.livy.spark.sql.maxResult 1000
zeppelin.livy.url http://hn0-ecialy.sgazureprd.onmicrosoft.com:8998
zeppelin.spark.keytab /etc/security/keytabs/zeppelin.server.kerberos.keytab
zeppelin.spark.principal zeppelin-ecialyxizp-projectspark@SGAZUREPRD.ONMICROSOFT.COM

Azure HDInsight

1 answer

  1. PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
    2021-02-22T06:32:00.287+00:00

    Hello @Steven Lai ,

    When we tried to reproduce this on our end, we were able to see the results without any issues.

    Note: The example should use a text file, not the jar file.

    70511-image.png.

    I suggest you retry with a text file and let us know the result.

    Hope this helps. Do let us know if you have any further queries.

    ------------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you. This can be beneficial to other community members.