Tuesday, 21 October 2014

Apache Hive – Getting Started

The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL. Source: hive.apache.org

This post is a fast-paced, instruction-based tutorial that dives directly into using Hive.

Creating a database

A database can be created using the CREATE DATABASE command at the Hive prompt.
Syntax:
CREATE DATABASE <database_name>;
E.g.
hive> CREATE DATABASE test_hive_db;
OK
Time taken: 0.048 seconds
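By default, the CREATE DATABASE command creates a directory named <database_name>.db in HDFS under the Hive warehouse directory, /user/hive/warehouse (configurable through the hive.metastore.warehouse.dir property).
The location can be verified using the DESCRIBE DATABASE command.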
Syntax:
DESCRIBE DATABASE <database_name>;
E.g.
hive> DESCRIBE DATABASE test_hive_db;
OK
test_hive_db hdfs://localhost:54310/user/hive/warehouse/test_hive_db.db
Time taken: 0.042 seconds, Fetched: 1 row(s)
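CREATE DATABASE also accepts a few optional clauses that are handy in scripts. The sketch below uses a hypothetical database name (sales_db) and a hypothetical HDFS path (/data/hive/sales_db), neither of which is part of this walkthrough. IF NOT EXISTS, COMMENT, LOCATION and DESCRIBE DATABASE EXTENDED are standard HiveQL, but the output will depend on your installation.

```sql
-- Create the database only if it is not already present, so re-running is safe.
-- sales_db and the LOCATION path are assumed names for illustration;
-- omit the LOCATION clause to use the default warehouse directory.
CREATE DATABASE IF NOT EXISTS sales_db
COMMENT 'Hypothetical example database'
LOCATION '/data/hive/sales_db';

-- List all databases known to the metastore.
SHOW DATABASES;

-- Show the comment, location and any properties of the new database.
DESCRIBE DATABASE EXTENDED sales_db;
```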

Using a database

To make a database the current one, so that subsequent statements run against it, use the USE command.
Syntax:
USE <database_name>;
E.g.
hive> USE test_hive_db;
OK
Time taken: 0.045 seconds
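USE prints only OK, so it can be useful to confirm which database is active. The sketch below shows two ways to do that. It assumes Hive 0.13 or later for the current_database() function, and hive.cli.print.current.db is a setting of the Hive CLI, so behaviour in other clients may differ.

```sql
-- Show the active database name in the CLI prompt, e.g. hive (test_hive_db)>
SET hive.cli.print.current.db=true;

USE test_hive_db;

-- Return the name of the database currently in use (assumes Hive 0.13+).
SELECT current_database();

-- Switch back to the built-in default database.
USE default;
```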

Dropping a database

To drop a database, use the DROP DATABASE command.
Syntax:
DROP DATABASE <database_name>;
E.g.
hive> DROP DATABASE test_hive_db;
OK
Time taken: 0.233 seconds
By default, DROP DATABASE fails if the database still contains tables. To drop such a database along with its tables, add the CASCADE keyword to the DROP DATABASE command.
Syntax:
DROP DATABASE <database_name> CASCADE;
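For example, the following sketch removes the test_hive_db database used in this post together with any tables in it. IF EXISTS is an optional standard clause that suppresses the error when the database is absent. It is added here on the assumption that a cleanup script should be safe to re-run.

```sql
-- Drop the database and every table inside it; do nothing if it does not exist.
DROP DATABASE IF EXISTS test_hive_db CASCADE;

-- Confirm that test_hive_db no longer appears in the list.
SHOW DATABASES;
```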
