Hello all,
We use both bootstrapping (via ARM templates) and action scripts to provision our HDInsight Spark Cluster (HDI 3.6, Spark 2.3).
We face several challenges (in no particular order):
First, some of the bootstrapping statements are not applied on the final cluster. For instance, this snippet has no effect:
    "yarn-site": {
        "yarn.acl.enable": "true",
        "yarn.admin.acl": "*"
    }
On the final cluster, the property yarn.admin.acl is set to 'ckiefer-cloud, aq-xxx, abc' rather than '*'.
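For reference, this is roughly how the value on disk can be compared with the value we expect from bootstrapping. It is only a minimal sketch, assuming the usual Hadoop layout of `<configuration>` / `<property>` / `<name>` / `<value>`. The function names and the inline sample are illustrative and are not taken from our actual scripts.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> element is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(expected, actual):
    """Return {name: (expected, actual)} for every expected property that differs."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Illustrative sample mirroring what we observe on the final cluster.
SAMPLE_YARN_SITE = """<configuration>
  <property><name>yarn.acl.enable</name><value>true</value></property>
  <property><name>yarn.admin.acl</name><value>ckiefer-cloud, aq-xxx, abc</value></property>
</configuration>"""

# The values we set via bootstrapping.
expected = {"yarn.acl.enable": "true", "yarn.admin.acl": "*"}

mismatches = diff_properties(expected, read_hadoop_properties(SAMPLE_YARN_SITE))
print(mismatches)
```

On a cluster node, the same check could be run against the real file by passing the contents of /etc/hadoop/conf/yarn-site.xml to `read_hadoop_properties` instead of the inline sample.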
Second, how should we handle files that bootstrapping cannot address, for instance the YARN capacity scheduler configuration in /etc/hadoop/conf/capacity-scheduler.xml? We tried to solve this with the attached code snippet from the post-creation script action that customizes our cluster (21021-code-snippet.txt).
However, it has no effect: the values on the final cluster are still different. It looks as if some other process reverts these changes, but we cannot tell which one.
Third, we manually edited the file /etc/hadoop/conf/hdfs-site.xml. When we restart the HDFS service from Ambari, the change is discarded and the file is overwritten with what appears to be a default value. However, if we change the same value in Ambari first and then restart the HDFS service, the value persists. How is that possible? It looks as if there is another "source" of configuration values that is used to overwrite these files.
We are a bit lost and would appreciate any help or feedback that gets us further.
BR, Christoph