Microsoft Azure now has a preview of HDInsight cluster version 3.1. In this blog post, I will walk you through the setup process. When you launch the HDInsight cluster creation wizard in the Azure Management Portal, select the Custom Create option (Screenshot 1).
On the New HDInsight Cluster page, pick version 3.1 (preview, HDP 2.1, Hadoop 2.4). This is HDInsight cluster version 3.1, running Hadoop 2.4 on Hortonworks Data Platform (HDP) 2.1 (see Screenshot 2). Remember to pick the region that you want.
On the next page (Configure Cluster User), assign a user name and a password. The password must contain at least:
- one uppercase letter
- one lowercase letter
- one number
- one special character
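If you want to check a candidate password before typing it into the wizard, the four criteria translate directly into a small Python function. This is a minimal sketch that checks only those four rules. The portal may enforce others, such as a minimum length, which are not modelled here. Treating ASCII punctuation as the set of "special characters" is my own assumption.

```python
import string


def meets_password_criteria(password: str) -> bool:
    """Return True if the password has at least one uppercase letter, one
    lowercase letter, one digit and one special character.

    Only the four criteria listed in the wizard are checked; any other
    portal rules (e.g. minimum length) are NOT modelled here.
    """
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_digit = any(c.isdigit() for c in password)
    # Assumption: "special character" means ASCII punctuation.
    has_special = any(c in string.punctuation for c in password)
    return has_upper and has_lower and has_digit and has_special
```

For example, "Hadoop#2014" passes, while "hadoop2014" fails because it has no uppercase letter and no special character.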
If you want to enable the “Enter the Hive/Oozie Metastore” option, then you will need at least one SQL database created and configured in the same data center as the HDInsight cluster.
The next task is to configure the storage for the cluster. On the Storage Account page, you can choose from three options:
- Creating a new Storage Account
- Using an existing Storage Account
- Using a Storage Account from an existing subscription
I chose to create a new Storage Account which allowed me to specify the Storage Account name and the container name.
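The storage account name and container name together identify the cluster's default storage. HDInsight addresses that storage with URIs of the form wasb://container@account.blob.core.windows.net/path. The sketch below builds such a URI and rejects names that break the usual Azure naming rules. The regular expressions reflect my understanding of those rules, and the account, container, and path names in the example are hypothetical. Check the current Azure documentation before relying on them.

```python
import re

# Assumed Azure naming rules (verify against current docs):
#  - account: 3-24 characters, lowercase letters and digits only
#  - container: 3-63 characters of lowercase letters, digits and single
#    hyphens, starting and ending with a letter or digit
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
CONTAINER_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def default_storage_uri(account: str, container: str, path: str = "") -> str:
    """Build a wasb:// URI for `path` inside the cluster's default container."""
    if not ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not CONTAINER_RE.fullmatch(container):
        raise ValueError(f"invalid container name: {container!r}")
    # Drop a leading slash so the URI has no double slash after the host.
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"
```

With a hypothetical account "mystorageacct" and container "myclustercontainer", the path "/example/data/sample.log" maps to wasb://myclustercontainer@mystorageacct.blob.core.windows.net/example/data/sample.log.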
Once you have provided the storage configuration for your HDInsight cluster, you are on your way to creating the cluster. Azure does its magic and, voilà, after a few minutes of waiting you will have your HDInsight cluster staring back at you in the Azure Management Portal. Looking into the configuration, I find that I have 4 data nodes and 2 head nodes, as requested.
Click HDINSIGHT in the left pane, and you will see the cluster you created. Click the name of the cluster on which you want to run Hive jobs, then click MANAGE CLUSTER at the bottom of the page to open the HDInsight cluster dashboard. The dashboard opens in a new browser tab at <cluster name>.azurehdinsight.net/Home/, where I can log in using the user name and password that I created during configuration. This URL is also accessible outside the Azure Management Portal. From the home page, I get access to:
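Because the dashboard URL depends only on the cluster name, you can also reach it from a script. This sketch prepares, but does not send, a request for the dashboard home page using only the Python standard library. It assumes the dashboard accepts HTTP Basic authentication with the cluster user created in the wizard. That is my assumption, not something stated by the portal. The cluster name and credentials in the usage note are hypothetical.

```python
import base64
import urllib.request


def dashboard_request(cluster_name: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the cluster dashboard.

    Assumption: the dashboard accepts HTTP Basic authentication with the
    cluster user name and password chosen in the creation wizard.
    """
    url = f"https://{cluster_name}.azurehdinsight.net/Home/"
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

For a hypothetical cluster "mycluster", dashboard_request("mycluster", "admin", "Hadoop#2014") targets https://mycluster.azurehdinsight.net/Home/. Passing the result to urllib.request.urlopen would actually send it.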
- Hive Editor, to write and submit Hive queries
- Job History, to view the status of submitted jobs and look into their details
- File Browser, to list the files available
Azure HDInsight clusters are stateless: the data lives in the storage account rather than on the cluster nodes, so it is easy to drop and re-create a cluster. When you are not using your cluster, especially during evaluation or development, you can drop it to save costs and re-create it when you need it again.
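How much dropping an idle cluster saves is simple arithmetic: the cluster's hourly cost multiplied by the hours it would otherwise sit idle. In the sketch below, the per-node hourly rates are placeholders, not real HDInsight prices. Look up current figures on the HDInsight pricing page. Storage charges are left out because the storage account, and its data, remains after the cluster is dropped.

```python
def idle_savings(head_nodes: int, head_rate: float,
                 data_nodes: int, data_rate: float,
                 idle_hours: float) -> float:
    """Return the compute cost avoided by not running the cluster for
    `idle_hours`.

    Rates are per node per hour and are PLACEHOLDERS, not real prices.
    Storage is excluded: the storage account is kept when the cluster
    is dropped, so it is billed either way.
    """
    hourly_cluster_cost = head_nodes * head_rate + data_nodes * data_rate
    return hourly_cluster_cost * idle_hours
```

For the 2 head node and 4 data node cluster in this post, hypothetical rates of 1.00 per head node hour and 0.50 per data node hour give idle_savings(2, 1.00, 4, 0.50, 100) = 400.0 for 100 idle hours.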
If you want Remote Desktop access to your cluster, you need to enable remote access to the HDInsight cluster using a user account that does not already exist on the nodes. This means that the cluster user created earlier cannot be used. Remote access can be granted for a maximum of 7 days only!
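The two remote access rules are that the user must differ from the cluster user and that access lasts at most 7 days. The sketch below checks both rules and computes the expiry date. The function and parameter names are hypothetical. It rejects requests longer than 7 days rather than silently shortening them, so a mistake is reported instead of hidden. Windows account names are case-insensitive, so the user names are compared that way.

```python
from datetime import date, timedelta

MAX_REMOTE_ACCESS_DAYS = 7  # limit described in this post


def remote_access_expiry(cluster_user: str, rdp_user: str,
                         start: date, days: int) -> date:
    """Validate a remote desktop request and return the date access expires."""
    # Windows user names are case-insensitive, so compare them that way.
    if rdp_user.casefold() == cluster_user.casefold():
        raise ValueError("the remote desktop user must differ from the cluster user")
    if not 1 <= days <= MAX_REMOTE_ACCESS_DAYS:
        raise ValueError(f"remote access can be granted for 1 to {MAX_REMOTE_ACCESS_DAYS} days")
    return start + timedelta(days=days)
```

For example, remote_access_expiry("admin", "rdpuser", date(2014, 6, 1), 7) returns date(2014, 6, 8), while a request for 8 days, or one reusing "Admin", raises ValueError.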
When you look into your storage account, you will find a large number of files in the container that you chose to host the cluster files as shown in Screenshot 3.
Once remote access is configured and you log on to the head node, you will find that the machine is running Windows Server 2012 R2.
The version information available from the command line confirms that I am on Hadoop 2.4. On the desktop of the head node, I have access to three HTML files:
- Hadoop Name Node Status
- Hadoop Service Availability
- Hadoop Yarn Status
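The version check mentioned above can also be scripted. The first line printed by "hadoop version" starts with "Hadoop" followed by the version number. The sketch below pulls the major and minor numbers out of that line so that a script can confirm it is running on 2.4. The sample text in the usage note is illustrative only. The full build string on your head node will differ.

```python
import re
from typing import Tuple


def hadoop_major_minor(version_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the first line of `hadoop version` output."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    # The first line is expected to look like "Hadoop 2.4.0...".
    match = re.match(r"Hadoop (\d+)\.(\d+)", lines[0])
    if match is None:
        raise ValueError(f"unexpected first line: {lines[0]!r}")
    return int(match.group(1)), int(match.group(2))
```

Given output whose first line is "Hadoop 2.4.0", the function returns (2, 4).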
So that was a quick view of what I get in the preview. In future blog posts, I shall share new adventures with HDInsight!