In Hadoop, Generally null values are represented as blank in HDFS file. But in databases null value has a special meaning. When Hive SQL is used to generate reports, then its common to use IS NULL construct. But Hive does not treat blank and null in the same way. Hence output of Hive SQL query with IS NULL construct many not be as per your expectation if you come from database background.
Lets look at an example.
1. Create sample input file We will create an input file with three columns ( name , title, birth_year ) Column values are seperated by pipe(‘|’) character.
If you see the output of this SQL, you will find that it returns zero results which is unexpected.
We were expecting that the result should contain 2 rows.
How to handle this scenario?
Hive developers have already thought about it and Hive provides you a table property“serialization.null.format” which can be used to treat a character of your choice as null in Hive SQL. In this case we want to use blank as null.
Drop above Hive table and use following DDL statement instead of the one specified above