Monday, 4 January 2016

Running sample examples on YARN

Running the bundled sample MapReduce programs on YARN is simple. The Hadoop distribution ships with some basic MapReduce examples.

You can find them inside $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-<HADOOP_VERSION>.jar.

The location of the file may differ depending on your Hadoop
installation folder structure.

Let’s store this directory in the YARN_EXAMPLES environment variable:
$ export YARN_EXAMPLES=$HADOOP_HOME/share/hadoop/mapreduce
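
Because the jar file name embeds the Hadoop version, it can help to look the file up rather than type it by hand. The following is a minimal sketch, not part of Hadoop: find_examples_jar is a hypothetical helper name, and it assumes the jar follows the hadoop-mapreduce-examples-<HADOOP_VERSION>.jar naming described earlier.

```shell
# Hypothetical helper (not shipped with Hadoop): print the first examples jar
# found in the directory given as the first argument.
find_examples_jar() {
  dir="$1"
  for jar in "$dir"/hadoop-mapreduce-examples-*.jar; do
    # If the glob matched nothing, $jar holds the literal pattern,
    # so check that it is a real file before printing it.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no examples jar found in $dir" >&2
  return 1
}
```

For example, EXAMPLES_JAR=$(find_examples_jar "$YARN_EXAMPLES") would store the jar path in a variable (EXAMPLES_JAR is likewise only an illustrative name).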

Now the YARN_EXAMPLES environment variable points to the directory that holds the examples jar. To list all the available examples, run the jar without any arguments on the console:

$ yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar

The command replies that an example program must be given as the first argument, and then prints the valid program names:

  • aggregatewordcount : This is an aggregate-based map/reduce program that counts the words in the input files
  • aggregatewordhist : This is an aggregate-based map/reduce program that computes the histogram of the words in the input files
  • bbp : This is a map/reduce program that uses Bailey-Borwein-Plouffe to compute the exact digits of Pi
  • dbcount : This is an example job that counts the page view counts from a database
  • distbbp : This is a map/reduce program that uses a BBP-type formula to compute the exact bits of Pi
  • grep : This is a map/reduce program that counts the matches of a regex in the input
  • join : This is a job that effects a join over sorted, equally-partitioned datasets
  • multifilewc : This is a job that counts words from several files
  • pentomino : This is a map/reduce tile-laying program that finds solutions to pentomino problems
  • pi : This is a map/reduce program that estimates Pi using a quasi-Monte Carlo method
  • randomtextwriter : This is a map/reduce program that writes 10 GB of random textual data per node
  • randomwriter : This is a map/reduce program that writes 10 GB of random data per node
  • secondarysort : This is an example that defines a secondary sort to the reduce
  • sort : This is a map/reduce program that sorts the data written by the random writer
  • sudoku : This is a sudoku solver
  • teragen : This generates data for the terasort
  • terasort : This runs the terasort
  • teravalidate : This checks the results of terasort
  • wordcount : This is a map/reduce program that counts the words in the input files
  • wordmean : This is a map/reduce program that computes the average length of the words in the input files
  • wordmedian : This is a map/reduce program that computes the median length of the words in the input files
  • wordstandarddeviation : This is a map/reduce program that computes the standard deviation of the length of the words in the input files


These are the sample programs that ship with the Hadoop distribution by default.
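
To run one of these programs, pass its name and its own arguments after the jar path. The sketch below assumes the jar path is stored in a variable named EXAMPLES_JAR (an illustrative name, not something Hadoop defines). run_example is a hypothetical wrapper, and setting DRY_RUN=1 only prints the command instead of submitting a job.

```shell
# Hypothetical wrapper: run one example program by name.
# Assumes EXAMPLES_JAR holds the full path to the examples jar.
run_example() {
  # Stop with a message if EXAMPLES_JAR is unset or empty.
  : "${EXAMPLES_JAR:?set EXAMPLES_JAR to the path of the examples jar}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: show the command that would be executed.
    echo "yarn jar $EXAMPLES_JAR $*"
  else
    yarn jar "$EXAMPLES_JAR" "$@"
  fi
}
```

For instance, run_example pi 10 100 would ask the pi example to use 10 map tasks with 100 samples each, while run_example wordcount <in> <out> takes an input and an output directory.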

