HDInsight Spark Cluster Customization with Bootstrapping and Custom Action Scripts

Christoph Kiefer 141 Reputation points
2020-08-27T18:29:32.387+00:00

Hello All

We use both bootstrapping (via ARM templates) and action scripts to provision our HDInsight Spark Cluster (HDI 3.6, Spark 2.3).

We face several challenges (in no particular order):

First, some of the bootstrapping statements are not applied to the final cluster. For instance, this snippet has no effect:

"yarn-site": {  
    "yarn.acl.enable": "true",  
    "yarn.admin.acl": "*"  
}  

On the final cluster, property yarn.admin.acl has the value 'ckiefer-cloud, aq-xxx, abc' instead of '*'.

Second, how should we deal with files that cannot be addressed with bootstrapping, for instance the YARN capacity scheduler configuration in /etc/hadoop/conf/capacity-scheduler.xml? We tried to solve this with the attached code snippet from our post-creation cluster customization script action: 21021-code-snippet.txt

However, for some reason it has no effect; the values are still different on the final cluster. It looks as if another process is reverting these changes, but we don't know which one.

Third, we manually edited the file /etc/hadoop/conf/hdfs-site.xml. When we restart the HDFS service from Ambari, our change is discarded and the file is overwritten with some kind of default value. However, if we change the value from within Ambari first and then restart the HDFS service, the value persists. How is that possible? It looks as if there is some other "source" of configuration values that is used to overwrite these settings.

We are a bit lost and would appreciate any help or feedback that gets us further.

BR, Christoph

Azure HDInsight

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 85,346 Reputation points Microsoft Employee
    2020-09-02T04:40:49.64+00:00

    Hello @Christoph Kiefer ,

    HDInsight relies on Ambari to manage all the configurations. If you only edit the underlying configuration files on the cluster nodes, Ambari does not pick up the change, and its own stored values are written back when a service is restarted. The right way to do this is to change the configuration through the Ambari REST API and then restart the corresponding services.
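To see which hand-edited values differ from what Ambari holds, a small comparison can help. The following is only a sketch: it assumes the standard Hadoop `<configuration><property><name>/<value>` file layout, and the helper names `parse_hadoop_site_xml` and `diff_properties` are made up for illustration. The Ambari-side values are passed in as a plain dictionary.

```python
import xml.etree.ElementTree as ET


def parse_hadoop_site_xml(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(local, managed):
    """Return {name: (local_value, managed_value)} for properties that differ.

    A property that exists on only one side shows up with None on the other.
    """
    names = set(local) | set(managed)
    return {
        name: (local.get(name), managed.get(name))
        for name in sorted(names)
        if local.get(name) != managed.get(name)
    }
```

Any property reported by `diff_properties` is a candidate for being overwritten at the next service restart, because the file on disk and the managed configuration disagree.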

    For more details, refer to https://cwiki.apache.org/confluence/display/AMBARI/Modify+configurations
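As a rough illustration of that workflow, here is a minimal Python sketch using only the standard library. It follows the read-modify-write pattern from the linked Ambari page: look up the current tag for a config type, fetch its properties, then PUT a new `desired_config` with a fresh tag. The function names, the `base_url` value, and the credentials are placeholders I chose for this example, not anything from your cluster, and I have not run this against HDInsight 3.6 specifically.

```python
import base64
import json
import time
import urllib.request


def build_desired_config(config_type, current_properties, overrides, tag=None):
    """Build the PUT body that asks Ambari to create a new config version."""
    merged = dict(current_properties)
    merged.update(overrides)  # keep all existing properties, change only the overrides
    return {
        "Clusters": {
            "desired_config": {
                "type": config_type,
                # Ambari requires a tag that is unique per config type.
                "tag": tag or "version{}".format(int(time.time() * 1000)),
                "properties": merged,
            }
        }
    }


def ambari_request(base_url, user, password, path, body=None):
    """Send a GET (no body) or a PUT (with JSON body) to the Ambari REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    method = "PUT" if body is not None else "GET"
    req = urllib.request.Request(base_url + path, data=data, method=method)
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    req.add_header("Authorization", "Basic " + token.decode("ascii"))
    req.add_header("X-Requested-By", "ambari")  # Ambari rejects modifying calls without it
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def update_config(base_url, cluster, user, password, config_type, overrides):
    """Read the active config of one type and submit a modified copy."""
    # 1. Find the tag of the currently applied configuration.
    desired = ambari_request(
        base_url, user, password,
        "/api/v1/clusters/{}?fields=Clusters/desired_configs".format(cluster),
    )
    tag = desired["Clusters"]["desired_configs"][config_type]["tag"]
    # 2. Fetch the full property set stored under that tag.
    current = ambari_request(
        base_url, user, password,
        "/api/v1/clusters/{}/configurations?type={}&tag={}".format(cluster, config_type, tag),
    )
    properties = current["items"][0]["properties"]
    # 3. Submit the merged properties as the new desired configuration.
    body = build_desired_config(config_type, properties, overrides)
    return ambari_request(base_url, user, password, "/api/v1/clusters/{}".format(cluster), body)
```

A call would look something like `update_config("https://CLUSTERNAME.azurehdinsight.net", "CLUSTERNAME", "admin", "PASSWORD", "yarn-site", {"yarn.admin.acl": "*"})`, with all four connection values replaced by your own. The same call shape should apply to other Ambari config types such as `capacity-scheduler` or `hdfs-site`. The new values only take effect after the affected services are restarted, for example from the Ambari UI.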

    Hope this helps. Do let us know if you have any further queries.


