Hive: Struct

This tutorial will show you how to use the STRUCT data type. If you haven't installed Hive yet, please follow this tutorial.

Create Table with Struct:

CREATE TABLE test_struct (
    columnA STRING,
    columnB VARCHAR(15),
    columnC INT,
    columnD TIMESTAMP,
    columnE DATE,
    columnF STRUCT<key:STRING, value:INT>
)
STORED AS ORC;

Insert Data:

INSERT INTO test_struct 
SELECT '1', '2', 1, '2019-02-07 20:58:27', '2019-02-07', NAMED_STRUCT('key', 'val', 'value', 1);

Select Data:
This will return the values of the "key" and "value" fields from columnF.

SELECT columnF.key, columnF.value
FROM test_struct;
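
You can also select the whole struct column at once; Hive returns it as a single complex value. A minimal sketch against the test_struct table above:

```sql
-- Select the entire struct as one column
SELECT columnF
FROM test_struct;
```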

Hive: Map

This tutorial will show you how to use the MAP data type. If you haven't installed Hive yet, please follow this tutorial.

Create Table with Map:

CREATE TABLE test_map (
    columnA STRING,
    columnB VARCHAR(15),
    columnC INT,
    columnD TIMESTAMP,
    columnE DATE,
    columnF MAP<STRING, INT>
)
STORED AS ORC;

Insert Data:

INSERT INTO test_map 
SELECT '1', '2', 1, '2019-02-07 20:58:27', '2019-02-07', MAP('Val', 1);

Select Data:
This will return the value stored under the key "Val" in columnF.

SELECT columnF['Val']
FROM test_map;
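
If you don't know the keys ahead of time, Hive's built-in map functions can enumerate them. A small sketch using the test_map table from above:

```sql
-- map_keys/map_values return arrays of the map's keys and values;
-- size returns the number of entries in the map
SELECT MAP_KEYS(columnF), MAP_VALUES(columnF), SIZE(columnF)
FROM test_map;
```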

Hive: Misc

This tutorial will show you some miscellaneous functions for working with Hive. If you haven't installed Hive yet, please follow this tutorial.

NVL:
Checks if the first value is null and, if so, substitutes the second value.

SELECT NVL(columnA, 'was null')
FROM test;

CAST:
Converts a value from one type to another. For example, if columnE is a DATE and you want it as a STRING:

SELECT CAST(columnE AS STRING)
FROM test;

Concat:
This will concatenate the strings together, giving "Test the code!".

SELECT CONCAT('Test', ' the ', 'code!');

CONCAT_WS:
This will concatenate the strings together using the first argument as the separator, giving "Test the code".

SELECT CONCAT_WS(' ', 'Test', 'the', 'code');

MIN:
This will give the minimum value of columnC.

SELECT MIN(columnC) AS Min_ColumnC
FROM test;

MAX:
This will give the maximum value of columnC.

SELECT MAX(columnC) AS Max_ColumnC
FROM test;

DISTINCT:
This will select the distinct combinations of columnA and columnB.

SELECT DISTINCT columnA, columnB
FROM test;

CASE:
This will return 'Is Val' when columnA equals 'val' and 'Not Val' otherwise.

SELECT CASE WHEN columnA = 'val' THEN 'Is Val' ELSE 'Not Val' END
FROM test;

COLLECT_SET:
This will collect the unique values of columnC into an array and select the element at index 0.

SELECT COLLECT_SET(columnC)[0]
FROM test;
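
COLLECT_SET removes duplicates as it aggregates. If you want to keep every value, duplicates included, COLLECT_LIST performs the same aggregation without de-duplicating:

```sql
-- collect_list keeps duplicates; collect_set does not
SELECT COLLECT_LIST(columnC)
FROM test;
```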

CURRENT_DATE:
This will give you the current date in the format yyyy-MM-dd.

SELECT CURRENT_DATE();

UNIX_TIMESTAMP:
This will give you the current Unix timestamp in seconds since the epoch, e.g. 1549591492.

SELECT UNIX_TIMESTAMP();

FROM_UNIXTIME:
This will take a Unix timestamp and display it in the format yyyy-MM-dd HH:mm:ss.

SELECT FROM_UNIXTIME(1549591492);
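
FROM_UNIXTIME also accepts a second argument with a Java SimpleDateFormat pattern if you want a different output format:

```sql
-- Format the epoch seconds as just a date
SELECT FROM_UNIXTIME(1549591492, 'yyyy-MM-dd');
```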

TO_DATE:
This will extract the date portion of a timestamp string, returning yyyy-MM-dd.

SELECT TO_DATE('2019-02-01 01:01:01');

YEAR:
This will extract the year from columnE.

SELECT YEAR(columnE)
FROM test;

MONTH:
This will extract the month from columnE.

SELECT MONTH(columnE)
FROM test;

DAY:
This will extract the day of the month from columnE.

SELECT DAY(columnE)
FROM test;

DATE_ADD:
This will add the given number of days to a date. The following returns 2019-01-05.

SELECT DATE_ADD('2019-01-01', 4);
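
The related DATE_SUB and DATEDIFF functions subtract days from a date and compute the number of days between two dates:

```sql
-- Returns 2018-12-28
SELECT DATE_SUB('2019-01-01', 4);

-- Returns 31
SELECT DATEDIFF('2019-02-01', '2019-01-01');
```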

UPPER:
This will convert columnA to upper case.

SELECT UPPER(columnA)
FROM test;

LOWER:
This will convert columnA to lower case.

SELECT LOWER(columnA)
FROM test;

TRIM:
This will trim leading and trailing spaces in columnA.

SELECT TRIM(columnA)
FROM test;

LITERALS:
If a column name contains a space you will need to use backtick literals, both with the AS keyword and when defining the column in a CREATE TABLE statement. Although I don't recommend this, it is possible.

SELECT columnA AS `test column`
FROM test;


Hive: Tables

This tutorial will show you some common usage for working with tables. If you haven't installed Hive yet, please follow this tutorial.

Show Tables:

SHOW TABLES;
SHOW TABLES LIKE '*test*';

Table Creation:

CREATE TABLE test (
columnA STRING,
columnB VARCHAR(15),
columnC INT,
columnD TIMESTAMP,
columnE DATE
)
STORED AS ORC;

Table Creation with Partitioning:

CREATE TABLE test_partition (
columnA STRING,
columnB VARCHAR(15),
columnC INT,
columnD TIMESTAMP,
columnE DATE
)
PARTITIONED BY (columnF INT)
STORED AS ORC;
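
When inserting into the partitioned table you either name the partition value explicitly (static partitioning) or let Hive derive it from the trailing column(s) of the SELECT (dynamic partitioning). A minimal static example:

```sql
-- Insert a row into the columnF=1 partition of test_partition
INSERT INTO test_partition PARTITION (columnF = 1)
SELECT '1', '2', 1, '2019-02-07 20:58:27', '2019-02-07';
```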

Inline Table Creation:

CREATE TABLE test_inline STORED AS ORC AS
SELECT *
FROM test;

Temporary Table Creation:

CREATE TEMPORARY TABLE temp (
columnA STRING,
columnB VARCHAR(15),
columnC INT,
columnD TIMESTAMP,
columnE DATE
)
STORED AS ORC;

DESC Table:
This will show you the basic definition (column names and types) of a table.

DESC test;

DESC EXTENDED Table:
This will show you the extended definition of a table.

DESC EXTENDED test;

Drop Table:

DROP TABLE IF EXISTS temp;

Hive & Java: Connect to Remote Kerberos Hive using KeyTab

In this tutorial I will show you how to connect to a remote Kerberized Hive cluster using Java. If you haven't installed Hive yet, follow the tutorial.

Import SSL Cert to Java:

Follow this tutorial on installing the unlimited strength encryption Java libraries.

If you are on Windows, do the following:

#Import it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"

#Check it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

#If you want to delete it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

POM.xml:

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-jdbc</artifactId>
	<version>2.3.3</version>
	<exclusions>
		<exclusion>
			<groupId>jdk.tools</groupId>
			<artifactId>jdk.tools</artifactId>
		</exclusion>
	</exclusions>
</dependency>

Imports:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

Connect:

// Setup the configuration object.
final Configuration config = new Configuration();

config.set("fs.defaultFS", "swebhdfs://hadoop:50470");
config.set("hadoop.security.authentication", "kerberos");
config.set("hadoop.rpc.protection", "integrity");

System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
System.setProperty("java.security.krb5.realm", "REALM.CA");
System.setProperty("java.security.krb5.kdc", "REALM.CA");
System.setProperty("sun.security.krb5.debug", "true");
System.setProperty("javax.net.debug", "all");
System.setProperty("javax.net.ssl.keyStorePassword","changeit");
System.setProperty("javax.net.ssl.keyStore","C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStorePassword","changeit");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

UserGroupInformation.setConfiguration(config);
UserGroupInformation.setLoginUser(UserGroupInformation.loginUserFromKeytabAndReturnUGI("hive/hadoop@REALM.CA", "c:\\data\\hive.service.keytab"));

System.out.println(UserGroupInformation.getLoginUser());
System.out.println(UserGroupInformation.getCurrentUser());

//Add the hive driver
Class.forName("org.apache.hive.jdbc.HiveDriver");

//Connect to hive jdbc
Connection connection = DriverManager.getConnection("jdbc:hive2://hadoop:10000/default;principal=hive/hadoop@REALM.CA");
Statement statement = connection.createStatement();

//Create a table
String createTableSql = "CREATE TABLE IF NOT EXISTS "
		+" employee ( eid int, name String, "
		+" salary String, designation String)"
		+" COMMENT 'Employee details'"
		+" ROW FORMAT DELIMITED"
		+" FIELDS TERMINATED BY '\t'"
		+" LINES TERMINATED BY '\n'"
		+" STORED AS TEXTFILE";

System.out.println("Creating Table: " + createTableSql);
statement.executeUpdate(createTableSql);

//Show all the tables to ensure we successfully added the table
String showTablesSql = "show tables";
System.out.println("Show All Tables: " + showTablesSql);
ResultSet res = statement.executeQuery(showTablesSql);

while (res.next()) {
	System.out.println(res.getString(1));
}

//Drop the table
String dropTablesSql = "DROP TABLE IF EXISTS employee";

System.out.println("Dropping Table: " + dropTablesSql);
statement.executeUpdate(dropTablesSql);

//Close the statement and connection
statement.close();
connection.close();

System.out.println("Finish!");

Hive Kerberos Installation

We are going to install Hive on top of Hadoop and perform a basic query. Ensure you have installed Kerberos and Hadoop with Kerberos first.

This assumes your hostname is “hadoop”

Download Hive:

wget http://apache.forsale.plus/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
tar -xzf apache-hive-2.3.3-bin.tar.gz
sudo mv apache-hive-2.3.3-bin /usr/local/hive
sudo chown -R root:hadoopuser /usr/local/hive/

Setup .bashrc:

 sudo nano ~/.bashrc

Add the following to the end of the file.

#HIVE VARIABLES START
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:/usr/local/hive/lib/*
#HIVE VARIABLES STOP

 source ~/.bashrc

Create Warehouse on HDFS

kinit -kt /etc/security/keytabs/myuser.keytab myuser/hadoop@REALM.CA
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir /tmp
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse

Create Kerberos Principals

cd /etc/security/keytabs
sudo kadmin.local
addprinc -randkey hive/hadoop@REALM.CA
addprinc -randkey hivemetastore/hadoop@REALM.CA
addprinc -randkey hive-spnego/hadoop@REALM.CA
xst -kt hive.service.keytab hive/hadoop@REALM.CA
xst -kt hivemetastore.service.keytab hivemetastore/hadoop@REALM.CA
xst -kt hive-spnego.service.keytab hive-spnego/hadoop@REALM.CA
q

Set Keytab Permissions/Ownership

sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*

hive-env.sh

cd $HIVE_HOME/conf
sudo cp hive-env.sh.template hive-env.sh

sudo nano /usr/local/hive/conf/hive-env.sh

#locate "HADOOP_HOME" and change to be this
export HADOOP_HOME=/usr/local/hadoop

#locate "HIVE_CONF_DIR" and change to be this
export HIVE_CONF_DIR=/usr/local/hive/conf

hive-site.xml

Check out this link for the configuration properties.

sudo cp /usr/local/hive/conf/hive-default.xml.template /usr/local/hive/conf/hive-site.xml

sudo nano /usr/local/hive/conf/hive-site.xml

#Modify the following properties

<property>
	<name>system:user.name</name>
	<value>${user.name}</value>
</property>
<property>
	<name>javax.jdo.option.ConnectionURL</name>
	<value>jdbc:postgresql://myhost:5432/metastore</value>
</property>
<property>
	<name>javax.jdo.option.ConnectionDriverName</name>
	<value>org.postgresql.Driver</value>
</property>
<property>
	<name>hive.metastore.warehouse.dir</name>
	<value>/user/hive/warehouse</value>
</property>
<property>
	<name>javax.jdo.option.ConnectionUserName</name>
	<value>hiveuser</value>
</property>
<property>
	<name>javax.jdo.option.ConnectionPassword</name>
	<value>PASSWORD</value>
</property>
<property>
	<name>hive.exec.local.scratchdir</name>
	<value>/tmp/${system:user.name}</value>
	<description>Local scratch space for Hive jobs</description>
</property>
<property>
	<name>hive.querylog.location</name>
	<value>/tmp/${system:user.name}</value>
	<description>Location of Hive run time structured log file</description>
</property>
<property>
	<name>hive.downloaded.resources.dir</name>
	<value>/tmp/${hive.session.id}_resources</value>
	<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
	<name>hive.server2.logging.operation.log.location</name>
	<value>/tmp/${system:user.name}/operation_logs</value>
	<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
	<name>hive.metastore.uris</name>
	<value>thrift://0.0.0.0:9083</value>
	<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
	<name>hive.server2.webui.host</name> 
	<value>0.0.0.0</value>
</property>
<property>
	<name>hive.server2.webui.port</name> 
	<value>10002</value>
</property>
<property>
	<name>hive.metastore.port</name>
	<value>9083</value>
</property>
<property>
	<name>hive.server2.transport.mode</name>
	<value>binary</value>
</property>
<property>
	<name>hive.server2.thrift.sasl.qop</name>
	<value>auth-int</value>
</property>
<property>
	<name>hive.server2.authentication</name>
	<value>KERBEROS</value>
	<description>authenticationtype</description>     
</property>
<property>
	<name>hive.server2.authentication.kerberos.principal</name>
	<value>hive/_HOST@REALM.CA</value>
	<description>HiveServer2 principal. If _HOST is used as the FQDN portion, it will be replaced with the actual hostname of the running instance.</description>
</property>
<property>
	<name>hive.server2.authentication.kerberos.keytab</name>
	<value>/etc/security/keytabs/hive.service.keytab</value>
	<description>Keytab file for HiveServer2 principal</description>  
</property>
<property>
	<name>hive.metastore.sasl.enabled</name>
	<value>true</value>
	<description>If true, the metastore thrift interface will be secured with SASL. Clients
	must authenticate with Kerberos.</description>
</property>
<property>
	<name>hive.metastore.kerberos.keytab.file</name>
	<value>/etc/security/keytabs/hivemetastore.service.keytab</value>
	<description>The path to the Kerberos Keytab file containing the metastore thrift 
	server's service principal.</description>
</property>
<property>
	<name>hive.metastore.kerberos.principal</name>
	<value>hivemetastore/_HOST@REALM.CA</value>
	<description>The service principal for the metastore thrift server. The special string _HOST will be replaced automatically with the correct host name.</description>
</property>
<property>
	<name>hive.security.authorization.enabled</name>
	<value>true</value>
	<description>enable or disable the hive client authorization</description>
</property>
<property>
	<name>hive.metastore.pre.event.listeners</name>
	<value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
	<description>List of comma separated listeners for metastore events.</description>
</property>
<property>
	<name>hive.security.metastore.authorization.manager</name>
	<value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
	<description>
	Names of authorization manager classes (comma separated) to be used in the metastore
	for authorization. The user defined authorization class should implement interface
	org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider.
	All authorization manager classes have to successfully authorize the metastore API
	call for the command execution to be allowed.
</description>
</property>
<property>
	<name>hive.security.metastore.authenticator.manager</name>
	<value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
	<description>
	authenticator manager class name to be used in the metastore for authentication.
	The user defined authenticator should implement interface org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider.
</description>
</property>
<property>
	<name>hive.security.metastore.authorization.auth.reads</name>
	<value>true</value>
	<description>If this is true, metastore authorizer authorizes read actions on database, table</description>
</property>
<property>     
	<name>datanucleus.autoCreateSchema</name>     
	<value>false</value>
</property>

Hadoop core-site.xml

Notice here that it is the hive user (hadoop.proxyuser.hive.*) that is used with the storage-based authentication.

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
	<name>hadoop.proxyuser.hive.hosts</name>
	<value>*</value>
</property>
<property>
	<name>hadoop.proxyuser.hive.groups</name>
	<value>*</value>
</property>

Install Postgres 9.6

Follow this tutorial for installing PostgreSQL 9.6.

sudo su - postgres
psql

CREATE USER hiveuser WITH PASSWORD 'PASSWORD';
CREATE DATABASE metastore;
GRANT ALL PRIVILEGES ON DATABASE metastore TO hiveuser;
\q
exit

Initialize Postgres Schema

schematool -dbType postgres -initSchema

Start Metastore & HiveServer2

nohup /usr/local/hive/bin/hive --service metastore --hiveconf hive.log.file=hivemetastore.log >/var/log/hive/hivemetastore.out 2>/var/log/hive/hivemetastoreerr.log &

nohup /usr/local/hive/bin/hiveserver2 --hiveconf hive.metastore.uris=" " --hiveconf hive.log.file=hiveserver2.log >/var/log/hive/hiveserver2.out 2> /var/log/hive/hiveserver2err.log &

Auto Start

sudo mkdir /var/log/hive/
sudo chown root:hduser /var/log/hive
sudo chmod 777 /var/log/hive

crontab -e

#Add the following
@reboot nohup /usr/local/hive/bin/hive --service metastore --hiveconf hive.log.file=hivemetastore.log >/var/log/hive/hivemetastore.out 2>/var/log/hive/hivemetastoreerr.log &
@reboot nohup /usr/local/hive/bin/hiveserver2 --hiveconf hive.metastore.uris=" " --hiveconf hive.log.file=hiveserver2.log >/var/log/hive/hiveserver2.out 2> /var/log/hive/hiveserver2err.log &

Now you can check the Hive version:

 hive --version

Hive Web URL

http://hadoop:10002/

Beeline

#We first need to have a ticket to access beeline using the hive kerberos user we setup earlier.
kinit -kt /etc/security/keytabs/hive.service.keytab hive/hadoop@REALM.CA

#Now we can get into beeline using that principal
beeline -u "jdbc:hive2://0.0.0.0:10000/default;principal=hive/hadoop@REALM.CA;"

#You can also just get into beeline then connect from there
beeline
beeline>!connect jdbc:hive2://0.0.0.0:10000/default;principal=hive/hadoop@REALM.CA

#Disconnect from beeline
!q

References

http://www.bogotobogo.com/Hadoop/BigData_hadoop_Hive_Install_On_Ubuntu_16_04.php
https://maprdocs.mapr.com/home/Hive/Config-RemotePostgreSQLForHiveMetastore.html
https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool#HiveSchemaTool-TheHiveSchemaTool
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration