Hive: Misc

This tutorial will show you some misc usage for working with Hive. If you have no installed Hive yet please follow this tutorial.

NVL:
Check if value is null then substitute other value.

SELECT NVL(columnA, 'was null')
FROM test;

CAST:
If columnE was a date and you wanted it to be a string.

SELECT CAST(columnE AS STRING)
FROM test;

Concat:
This will concat the strings together giving “Test the code!”

SELECT CONCAT('Test', ' the ', 'code!');

CONCAT_WS:
This will concat the strings together starting at “Test” and use the first position as the seperator giving “Test the code”

SELECT CONCAT_WS(' ', 'Test', 'the', 'code');

MIN:
This will give the minimum value of columnC

SELECT MIN(columnC) AS Max_ColumnC
FROM test;

MAX:
This will give the maximum value of columnC

SELECT MAX(columnC) AS Max_ColumnC
FROM test;

DISTINCT:
This will select the distinct columnA and columnB

SELECT DISTINCT columnA, columnB
FROM test;

CASE:

SELECT CASE WHEN columnA = 'val' THEN 'Is Val' ELSE 'Not Val' END
FROM test;

COLLECT_SET:
This will collect all values in columnC and select the first index.

SELECT COLLECT_SET(columnC)[0]
FROM test;

CURRENT_DATE:
This will give you the current date in the format YYYY-MM-dd.

SELECT CURRENT_DATE();

UNIX_TIMESTAMP:
This will give you the current timestamp from EPOCH. IE: 1549591492

SELECT UNIX_TIMESTAMP();

FROM_UNIXTIME:
This will take a timestamp and display it in the format YYYY-MM-dd HH:MM:SS.

SELECT FROM_UNIXTIME(1549591492);

TO_DATE:
This will convert a date to YYYY-MM-dd.

SELECT TO_DATE('2019-02-01 01:01:01');

YEAR:

SELECT YEAR(columnE)
FROM test;

MONTH:

SELECT MONTH(columnE)
FROM test;

DAY:

SELECT DAY(columnE)
FROM test;

DATE_ADD:

SELECT DATE_ADD('2019-01-01', 4);

UPPER:
This will upper case columnA.

SELECT UPPER(columnA)
FROM test;

LOWER:
This will lower case columnA.

SELECT LOWER(columnA)
FROM test;

TRIM:
This will trim leading and trailing spaces in columnA.

SELECT TRIM(columnA)
FROM test;

LITERALS:
If a column contains a space you will need to use literal in order to use the AS keyword or when you are defining it in the create table command. Although I don’t recommend this it is possible.

SELECT columnA AS `test column`
FROM test;

 

 

 

Python: Working with DateTimes

In this tutorial I will show you the different ways of working with dates and times in python. Which includes working with milliseconds. You should note this isn’t all available options just some that I have encountered over the years.

Install Python Packages:

Open cmd/terminal and if required navigate to your sites working folder. (note: if you are working in a virtual env you should ensure you source it first).

pip install python-dateutil

There are many different packages that we can use to work with date and times. You need to decide what is right for you.

dateutil:

The following will convert the date string you give it fast and easily. This gives you back the datetime object. Notice how we don’t need to pass it a date time format. To me this is very convenient.

from dateutil import parser

date_str = '2017-06-06'
date_time_str = '2017-06-07 12:34'
date_time_str_2 = '2017-06-07 12:34:46'
date_time_str_3 = '2017-06-07 12:34:42.234'

result = parser.parse(date_str)
print(result) #2017-06-06 00:00:00
result = parser.parse(date_time_str)
print(result) #2017-06-07 12:34:00
result = parser.parse(date_time_str_2)
print(result) #2017-06-07 12:34:46
result = parser.parse(date_time_str_3)
print(result) #2017-06-07 12:34:42.234000

datetime:

The following will convert the date string you give it fast and easily. This gives you back the datetime object. Notice how we need to pass the format of the datetime. If you don’t you will get an exception. This is a convenient way if you know the format before hand. But that might not always be the case.

import datetime

date_str = '2017-06-06'
date_time_str = '2017-06-07 12:34'
date_time_str_2 = '2017-06-07 12:34:46'
date_time_str_3 = '2017-06-07 12:34:42.234'

result = datetime.datetime.strptime(date_str, "%Y-%m-%d")
print(result) #2017-06-06 00:00:00
result = datetime.datetime.strptime(date_time_str, "%Y-%m-%d %H:%M")
print(result) #2017-06-07 12:34:00
result = datetime.datetime.strptime(date_time_str_2, "%Y-%m-%d %H:%M:%S")
print(result) #2017-06-07 12:34:46
result = datetime.datetime.strptime(date_time_str_3, "%Y-%m-%d %H:%M:%S.%f")
print(result) #2017-06-07 12:34:42.234000

The above all works however the following example will not. Why do you think this is?

import datetime

date_time_str = '2017-06-07 12:34:46'

try:
    datetime.datetime.strptime(date_time_str, "%Y-%m-%d %H:%M:%S")
except:
    pass #just for this example don't do this lol

The reason is because datetime expects the correct format to be supplied. We gave it hour minute second but not milliseconds. You will get the following exception (ValueError: unconverted data remains: .234)

Timestamps:

Sometimes we want to convert the date to unix (epoch) time or vise versa.

From Date:
from dateutil import parser
from datetime import timezone

date_time_str = '2017-06-07 17:34:42.234'
result = parser.parse(date_time_str)

timestamp = result.replace(tzinfo=timezone.utc).timestamp()
print(timestamp) #1496856882.234

This gives us the timestamp as a float as 1496856882.234.

From Timestamp:
from dateutil import parser
import datetime

timestamp = 1496856882.234

result = datetime.datetime.fromtimestamp(timestamp)
print(result) #2017-06-07 13:34:42.234000

result = datetime.datetime.utcfromtimestamp(timestamp)
print(result) #2017-06-07 17:34:42.234000

Get Date Parts:

If you want to get specific date parts such as the year, month, day, hour, etc.

import datetime
from dateutil import parser

result = parser.parse(date_time_str_3)
print(result) #2017-06-07 12:34:42.234000

year = result.year #2017
month = result.month #6
day = result.day #7
hour = result.hour #12
minute = result.minute #34
second = result.second #42
millisecond = result.microsecond #234000

Add To Date:

If you want to add time to a date.

import datetime
from dateutil import parser
from datetime import timezone, timedelta

date_time_str = '2017-06-07 17:34:42.234'
result = parser.parse(date_time_str)
print(result) #2017-06-07 17:34:42.234000

timestamp = result.replace(tzinfo=timezone.utc).timestamp()
print(timestamp) #1496856882.234

#Add 10 seconds to datetime
new_time = int((datetime.datetime.fromtimestamp(timestamp) + timedelta(milliseconds=10000)).timestamp() * 1000)
print(new_time) #1496856892234

As you can see you can 10 seconds has been added the datetime.

datetime strftime

from datetime import datetime

now = datetime.now()
datetime_str = now.strftime("%Y-%m-%d %H:%M:%S")
print(datetime_str)

datetime fromisoformat

from datetime import datetime

print(datetime.fromisoformat("2024-04-09 13:48:20"))