Run MapReduce jobs with Apache Hadoop on HDInsight using PowerShell

06/15/2024

This document explains how to use Azure PowerShell to run a MapReduce job on Hadoop in an HDInsight cluster.

Prerequisites

An Apache Hadoop cluster on HDInsight. See the article on creating Apache Hadoop clusters using the Azure portal.
The PowerShell Az module, installed.

Run a MapReduce job

Azure PowerShell provides cmdlets that let you run MapReduce jobs on HDInsight remotely. Internally, PowerShell makes REST calls to WebHCat (formerly called Templeton) running on the HDInsight cluster.
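
To make the REST layer a little more concrete, the following sketch builds the WebHCat status URL for a cluster. The helper name `Get-WebHCatStatusUri` is hypothetical and not part of the Az module, and the URL pattern assumes the cluster is exposed through the default `CLUSTERNAME.azurehdinsight.net` gateway.

```powershell
# Hypothetical helper (not part of Az.HDInsight): builds the WebHCat status URL.
# Assumption: the cluster is reachable at the default <name>.azurehdinsight.net gateway.
function Get-WebHCatStatusUri {
    param(
        [Parameter(Mandatory)]
        [string]$ClusterName
    )
    "https://${ClusterName}.azurehdinsight.net/templeton/v1/status"
}

# Usage (requires a live cluster and its login, so it is left commented out):
# $creds = Get-Credential -Message "Enter the login for the cluster"
# Invoke-RestMethod -Uri (Get-WebHCatStatusUri -ClusterName "CLUSTERNAME") -Credential $creds
```

If the request succeeds, WebHCat is reachable with the supplied cluster login, which is the same endpoint family the job cmdlets rely on.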

The following cmdlets are used when running MapReduce jobs on a remote HDInsight cluster.

Cmdlet	Description
Connect-AzAccount	Authenticates Azure PowerShell to your Azure subscription.
New-AzHDInsightMapReduceJobDefinition	Creates a new "job definition" by using the specified MapReduce information.
Start-AzHDInsightJob	Sends the job definition to HDInsight and starts the job. A "job" object is returned.
Wait-AzHDInsightJob	Uses the job object to check the status of the job. It waits until the job completes or the wait time is exceeded.
Get-AzHDInsightJobOutput	Retrieves the output of the job.

The following steps show how to use these cmdlets to run a job on your HDInsight cluster.

Using an editor, save the following code as mapreducejob.ps1.

# Login to your Azure subscription
$context = Get-AzContext
if ($null -eq $context)
{
    Connect-AzAccount
}
$context

# Get cluster info
$clusterName = Read-Host -Prompt "Enter the HDInsight cluster name"
$creds=Get-Credential -Message "Enter the login for the cluster"

#Get the cluster info so we can get the resource group, storage, etc.
$clusterInfo = Get-AzHDInsightCluster -ClusterName $clusterName
$resourceGroup = $clusterInfo.ResourceGroup
$storageAccountName=$clusterInfo.DefaultStorageAccount.split('.')[0]
$container=$clusterInfo.DefaultStorageContainer
#NOTE: This assumes that the storage account is in the same resource
#      group as the cluster. If it is not, change the
#      -ResourceGroupName parameter to the group that contains storage.
$storageAccountKey=(Get-AzStorageAccountKey `
    -Name $storageAccountName `
    -ResourceGroupName $resourceGroup)[0].Value

#Create a storage context
$context = New-AzStorageContext `
    -StorageAccountName $storageAccountName `
    -StorageAccountKey $storageAccountKey

#Define the MapReduce job
#NOTE: If using an HDInsight 2.0 cluster, use hadoop-examples.jar instead.
# -JarFile = the JAR containing the MapReduce application
# -ClassName = the class of the application
# -Arguments = The input file, and the output directory
$wordCountJobDefinition = New-AzHDInsightMapReduceJobDefinition `
    -JarFile "/example/jars/hadoop-mapreduce-examples.jar" `
    -ClassName "wordcount" `
    -Arguments `
        "/example/data/gutenberg/davinci.txt", `
        "/example/data/WordCountOutput"

#Submit the job to the cluster
Write-Host "Start the MapReduce job..." -ForegroundColor Green
$wordCountJob = Start-AzHDInsightJob `
    -ClusterName $clusterName `
    -JobDefinition $wordCountJobDefinition `
    -HttpCredential $creds

#Wait for the job to complete
Write-Host "Wait for the job to complete..." -ForegroundColor Green
Wait-AzHDInsightJob `
    -ClusterName $clusterName `
    -JobId $wordCountJob.JobId `
    -HttpCredential $creds

# Download the output
Get-AzStorageBlobContent `
    -Blob 'example/data/WordCountOutput/part-r-00000' `
    -Container $container `
    -Destination output.txt `
    -Context $context

# Print the output of the job.
Get-AzHDInsightJobOutput `
    -ClusterName $clusterName `
    -JobId $wordCountJob.JobId `
    -HttpCredential $creds

Open an Azure PowerShell command prompt. Change to the directory that contains the mapreducejob.ps1 file, then run the script with the following command:
```
.\mapreducejob.ps1
```
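When you run the script, you are prompted for the HDInsight cluster name and the cluster login. You may also be prompted to authenticate to your Azure subscription.
When the job completes, output similar to the following is returned: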
```
Cluster         : CLUSTERNAME
ExitCode        : 0
Name            : wordcount
PercentComplete : map 100% reduce 100%
Query           :
State           : Completed
StatusDirectory : f1ed2028-afe8-402f-a24b-13cc17858097
SubmissionTime  : 12/5/2014 8:34:09 PM
JobId           : job_1415949758166_0071
```
This output indicates that the job completed successfully.

Note

If ExitCode is a value other than 0, see Troubleshooting.
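
The ExitCode check can also be done in the script itself. The sketch below is one possible approach: `Test-JobSucceeded` is a hypothetical helper, and it assumes the object returned by `Wait-AzHDInsightJob` exposes an `ExitCode` property, as the sample output above suggests.

```powershell
# Hypothetical helper: reports whether a job object represents a successful run.
# Assumption: the object has an ExitCode property, as in the sample output.
function Test-JobSucceeded {
    param(
        [Parameter(Mandatory)]
        $Job
    )
    # A missing ExitCode ($null) is treated as a failure.
    $Job.ExitCode -eq 0
}

# Usage inside mapreducejob.ps1 (capture the object that Wait-AzHDInsightJob returns):
# $result = Wait-AzHDInsightJob -ClusterName $clusterName `
#     -JobId $wordCountJob.JobId -HttpCredential $creds
# if (-not (Test-JobSucceeded -Job $result)) {
#     Write-Warning "Job ended with ExitCode $($result.ExitCode); see Troubleshooting."
# }
```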

This example also stores the downloaded file as output.txt in the directory from which the script is run.

View output

To see the words and their counts produced by the job, open the output.txt file in a text editor.
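
As an alternative to reading the file by eye, the following sketch lists the most frequent words. It assumes output.txt uses the standard WordCount layout of one `word<TAB>count` pair per line; the function name `Get-TopWords` and the `Occurrences` property are illustrative only.

```powershell
# Hypothetical helper: returns the most frequent words from a WordCount output file.
# Assumption: each line has the form "word<TAB>count".
function Get-TopWords {
    param(
        [Parameter(Mandatory)]
        [string]$Path,
        [int]$Top = 10
    )
    Get-Content -Path $Path |
        Where-Object { $_ -match "`t" } |      # skip lines without a tab separator
        ForEach-Object {
            $word, $count = $_ -split "`t", 2
            [pscustomobject]@{ Word = $word; Occurrences = [int]$count }
        } |
        Sort-Object -Property Occurrences -Descending |
        Select-Object -First $Top
}

# Usage, after mapreducejob.ps1 has downloaded the file:
# Get-TopWords -Path .\output.txt -Top 10
```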

Note

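The output files of a MapReduce job are immutable, and the job fails if its output directory already exists. If you rerun this sample, change the name of the output location (the /example/data/WordCountOutput value passed in -Arguments) and the matching -Blob path.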
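
One way to avoid editing the name by hand on every rerun is to append a timestamp to the output location. This is only a sketch: `New-WordCountOutputPath` is a hypothetical helper, and the default base path is the one used by the script above.

```powershell
# Hypothetical helper: returns an output directory name that is unique per run.
# Assumption: one-second timestamp resolution is enough to keep reruns apart.
function New-WordCountOutputPath {
    param(
        [string]$BasePath = "/example/data/WordCountOutput"
    )
    $stamp = Get-Date -Format "yyyyMMdd-HHmmss"
    "${BasePath}-${stamp}"
}

# Usage in mapreducejob.ps1:
# $outputPath = New-WordCountOutputPath
# Pass $outputPath as the second value of -Arguments, and download the blob
# ($outputPath.TrimStart('/') + '/part-r-00000') with Get-AzStorageBlobContent.
```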

Troubleshooting

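If no information is returned when the job completes, check the job for errors. To view error information for this job, add the following command to the end of the mapreducejob.ps1 file. Then save the file and rerun the script.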

# Print the standard error output of the WordCount job.
Write-Host "Display the standard error output ..." -ForegroundColor Green
Get-AzHDInsightJobOutput `
    -ClusterName $clusterName `
    -JobId $wordCountJob.JobId `
    -HttpCredential $creds `
    -DisplayOutputType StandardError

This cmdlet returns the information that was written to STDERR while the job ran.

Next steps

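As shown here, Azure PowerShell provides an easy way to run MapReduce jobs on an HDInsight cluster, monitor the job status, and retrieve the output. For information about other ways to use Hadoop on HDInsight, see the HDInsight documentation.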
