Control network traffic in Azure HDInsight
Network traffic in an Azure virtual network can be controlled using the following methods:
Network security groups (NSG) allow you to filter inbound and outbound traffic to the network. For more information, see the Filter network traffic with network security groups document.
Network virtual appliances (NVA) can be used with outbound traffic only. NVAs replicate the functionality of devices such as firewalls and routers. For more information, see the Network Appliances document.
As a managed service, HDInsight requires unrestricted access to the HDInsight health and management services for both incoming and outgoing traffic from the VNET. When using NSGs, you must ensure that these services can still communicate with the HDInsight cluster.
HDInsight with network security groups
If you plan on using network security groups to control network traffic, perform the following actions before installing HDInsight:
Identify the Azure region that you plan to use for HDInsight.
Identify the service tags required by HDInsight for your region. There are multiple ways to obtain these service tags:
- Consult the list of published service tags in Network security group (NSG) service tags for Azure HDInsight.
- If your region isn't present in the list, use the Service Tag Discovery API to find a service tag for your region.
- If you are unable to use the API, download the service tag JSON file and search for your desired region.
Create or modify the network security groups for the subnet that you plan to install HDInsight into.
- Network security groups: allow inbound traffic on port 443 from the IP addresses covered by the service tags identified in the previous step. This ensures that HDInsight management services can reach the cluster from outside the virtual network. For clusters with Kafka REST proxy enabled, also allow inbound traffic on port 9400. This ensures that the Kafka REST proxy server is reachable.
For more information on network security groups, see the overview of network security groups.
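The lookup and rule steps above can be sketched in Python. This is a minimal sketch, not an official tool. It assumes that the downloaded service tag file has a top-level `values` list whose entries carry `name`, `properties.region`, and `properties.systemService`, and that the rule uses the property names of an Azure Resource Manager security rule. The function names, the rule name, and the priority value are hypothetical and chosen only for illustration.

```python
import json


def load_service_tags(path):
    """Parse a downloaded service tag JSON file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_hdinsight_tags(service_tags, region):
    """Return the HDInsight entries whose region matches, for example 'West US 2'."""
    # Assumption: regions appear in the file as lowercase names without spaces.
    wanted = region.replace(" ", "").lower()
    matches = []
    for entry in service_tags.get("values", []):
        props = entry.get("properties", {})
        if props.get("systemService") != "HDInsight":
            continue
        if (props.get("region") or "").lower() == wanted:
            matches.append(entry)
    return matches


def inbound_allow_rule(source, port, priority):
    """Build a dict describing one inbound TCP allow rule (illustrative only)."""
    return {
        "name": f"Allow-Inbound-{port}",  # hypothetical rule name
        "properties": {
            "protocol": "Tcp",
            "sourceAddressPrefix": source,  # a service tag name or a CIDR range
            "sourcePortRange": "*",
            "destinationAddressPrefix": "VirtualNetwork",
            "destinationPortRange": str(port),
            "access": "Allow",
            "direction": "Inbound",
            "priority": priority,
        },
    }
```

For example, `find_hdinsight_tags(load_service_tags("ServiceTags.json"), "West US 2")` would return the matching entries, and the `name` of an entry could then be passed as `source` to `inbound_allow_rule(source, 443, 300)`. The file name here is a placeholder. The sketch deliberately leaves the source of the port 9400 rule to the caller.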
Controlling outbound traffic from HDInsight clusters
For more information on controlling outbound traffic from HDInsight clusters, see Configure outbound network traffic restriction for Azure HDInsight clusters.
Forced tunneling to on-premises
Forced tunneling is a user-defined routing configuration where all traffic from a subnet is forced to a specific network or location, such as your on-premises network or firewall. Forcing all data transfer back to on-premises isn't recommended because of the large data volumes involved and the potential performance impact.
Customers who want to set up forced tunneling should use custom metastores and set up the appropriate connectivity from the cluster subnet or on-premises network to these custom metastores.
To see an example of the UDR setup with Azure Firewall, see Configure outbound network traffic restriction for Azure HDInsight clusters.
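To illustrate why a forced-tunneling route captures all outbound traffic, the following minimal sketch models route selection by longest prefix match, which is how a default route (0.0.0.0/0) applies to any destination that no more specific route covers. The route table, addresses, and next-hop labels are hypothetical examples, not values taken from an HDInsight deployment.

```python
import ipaddress


def select_route(routes, destination):
    """Return the next hop of the most specific route that contains the destination."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix seen so far.
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return best[1] if best else None


# Hypothetical route table: a default route that forces traffic to a firewall,
# plus a more specific route that keeps virtual network traffic local.
routes = [
    ("0.0.0.0/0", "firewall"),
    ("10.1.0.0/16", "vnet-local"),
]
```

With this table, `select_route(routes, "10.1.2.3")` picks the `vnet-local` route, while any address outside 10.1.0.0/16 falls through to the `firewall` route.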
Required ports
If you plan to use a firewall and access the cluster from outside on certain ports, you might need to allow traffic on the ports that your scenario requires. By default, no special port filtering is needed as long as the Azure management traffic described earlier is allowed to reach the cluster on port 443.
For a list of ports for specific services, see the Ports used by Apache Hadoop services on HDInsight document.
For more information on firewall rules for virtual appliances, see the virtual appliance scenario document.
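After changing firewall or NSG rules, a quick TCP connection test from a client machine can show whether a port is reachable. The following is a minimal sketch that uses only the Python standard library. The function name and the timeout default are assumptions made for illustration, and a successful connection shows only that the port accepts TCP connections, not that the service behind it is healthy.

```python
import socket


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_is_reachable("CLUSTERNAME.azurehdinsight.net", 443)` would test the public cluster endpoint, where `CLUSTERNAME` is a placeholder for your own cluster name.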
Next steps
- For code samples and examples of creating Azure Virtual Networks, see Create virtual networks for Azure HDInsight clusters.
- For an end-to-end example of configuring HDInsight to connect to an on-premises network, see Connect HDInsight to an on-premises network.
- For more information on Azure virtual networks, see the Azure Virtual Network overview.
- For more information on network security groups, see Network security groups.
- For more information on user-defined routes, see User-defined routes and IP forwarding.
- For more information on virtual networks, see Plan VNETs for HDInsight.