MapReduce

How Does MapReduce Fit into My Organization and Advanced Analytics?

MapReduce can make data processing within your organization more efficient and can drive down costs significantly compared with conventional data processing technologies.
Depending on your analytics needs, you can apply MapReduce or SQL-MapReduce in several ways and gain deep insights into your data through queries that were previously difficult or impossible to express.
Here are some examples of how companies are using MapReduce and SQL-MapReduce to improve their ROI and gain insights across their entire data set:
  • Fraud Detection – A large online gaming company now catches cases of fraud that its previous queries could not detect. The company also cut its fraud analytics cycle time from one week to 15 minutes, and query response time dropped from 90 minutes to 90 seconds.
  • Graph Analysis – A social media company uses the SQL-MapReduce function nPath for graph analysis to understand how its users are connected and to strengthen the networks within its community.
  • Sharing Behavior – ShareThis uses MapReduce to reduce query times when analyzing the items people share online, so that it can understand sharing behavior.
  • Sessionization – A social network uses the SQL-MapReduce function "sessionize" to split user activity into sessions based on the time elapsed between successive actions on the network. With sessionize, the SQL code shrank from more than 1,000 lines to fewer than 100, and performance improved dramatically.
  • Search Behavior – An online media company uses the SQL-MapReduce function nPath to understand the paths its users follow after running a search, and uses that understanding to improve search results.
  • Transformations – Where data transformations previously required multiple complex self-joins, a media company now uses the SQL-MapReduce function nPath to make a single pass over its data, significantly simplifying the code and improving performance.
  • Machine Learning – Research shows that algorithms that fit the Statistical Query model can be written in a certain "summation form," in which the required statistics are sums over the training examples, which allows them to be easily parallelized on multicore computers. The researchers adapted Google’s map-reduce paradigm to demonstrate this parallel speedup on a variety of learning algorithms, including locally weighted linear regression (LWLR), k-means, logistic regression (LR), naive Bayes (NB), support vector machines (SVM), independent component analysis (ICA), principal component analysis (PCA), Gaussian discriminant analysis (GDA), expectation maximization (EM), and backpropagation (NN).
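The sessionization idea from the list above can be sketched as a plain map/shuffle/reduce pipeline. This is not the SQL-MapReduce "sessionize" function itself; it is a minimal Python illustration under stated assumptions: the input is a list of hypothetical (user_id, timestamp) pairs with timestamps in seconds, the function names are made up for this example, and a new session starts whenever the gap since the same user's previous event exceeds a chosen timeout.

```python
from collections import defaultdict


def map_phase(events):
    # Mapper: key each event by user so that all of a user's events
    # end up at the same reducer.
    for user_id, timestamp in events:
        yield user_id, timestamp


def shuffle(pairs):
    # Shuffle: group emitted values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sessionize(user_id, timestamps, timeout):
    # Reducer: walk one user's events in time order and start a new
    # session when the gap to the previous event exceeds `timeout`.
    session_id = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is not None and ts - previous > timeout:
            session_id += 1
        yield user_id, ts, session_id
        previous = ts


def sessionize(events, timeout):
    # Hypothetical driver: returns (user_id, timestamp, session_id) rows,
    # ordered by user and then by time.
    grouped = shuffle(map_phase(events))
    rows = []
    for user_id in sorted(grouped):
        rows.extend(reduce_sessionize(user_id, grouped[user_id], timeout))
    return rows


events = [("alice", 0), ("alice", 100), ("alice", 2000), ("bob", 50), ("bob", 60)]
print(sessionize(events, timeout=1800))  # 30-minute timeout (assumed)
```

Because each user's events are handled independently in the reduce step, different users can be processed on different machines, which is what lets this kind of query scale out.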
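To make the "summation form" idea from the machine learning example concrete, here is a minimal sketch that fits a one-feature least-squares line y = a + b·x. It is a simpler stand-in for the algorithms listed above, not the researchers' code, and all function names are hypothetical. Each mapper computes the sums n, Σx, Σy, Σx², and Σxy over its own chunk of data; the reducer adds the partial sums; and the final coefficients are computed from the totals. The chunks are processed sequentially here, but since they are independent, they could be dispatched to separate cores (for example, with `multiprocessing.Pool.map`).

```python
from functools import reduce


def chunk(data, n_chunks):
    """Split data into roughly equal contiguous chunks, one per worker."""
    if not data or n_chunks < 1:
        raise ValueError("need non-empty data and at least one chunk")
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_partial_sums(points):
    """Mapper: sufficient statistics for one chunk of (x, y) points."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def reduce_sums(a, b):
    """Reducer: partial sums from different chunks simply add up."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(data, n_chunks=4):
    """Least-squares intercept and slope computed from the summed statistics."""
    partials = map(map_partial_sums, chunk(data, n_chunks))
    n, sx, sy, sxx, sxy = reduce(reduce_sums, partials)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


data = [(x, 2 * x + 1) for x in range(10)]
print(fit_line(data, n_chunks=3))
```

Because addition is associative, the result does not depend on how the data is partitioned. That property is what makes summation-form algorithms easy to parallelize.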
