Hive Table usage in Hadoop
In the data engineering world, we often have to change the schema of an existing table or rebuild its data, for example because a bug slipped through the cracks or a new business requirement means the data must be rebuilt. The first thing to understand in such a scenario is the impact: who has been using that data over the past 0-6 months, or even longer, depending on the criticality of the data set. The following code snippet is useful for identifying the impacted users:

Step 1:

    hadoop jar /apache/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.2.4.2.66-4.jar \
      -Dstream.non.zero.exit.is.failure=false \
      -Dmapred.job.queue.name=<QUEUE Name> \
      -Dmapred.job.name="grepper" \
      -Dmapred.reduce.tasks=1 \
      -input /logs/<HADOOP NAMENODE>/auditlog/YYYY-* \
      -output <a HDFS location where your account has write access>/ \
      -mapper ...