Today in Manageable Monday we’re going to walk through the MapReduce process and how I could potentially use it for the CronDose website.
For the case study we’re going to imagine that my tutorial suggestion page has millions of content suggestions (not really, but let’s pretend).
It would take too long to look at each suggestion manually, so I’m going to use a MapReduce job to analyze and organize the data.
I’d follow the steps below:
- Pass each suggestion to the map method to tokenize it (convert the suggestion into an array of words)
- The map method will return a series of key/value pairs, one per word: (“algorithms”, suggestion_1), (“Ajax”, suggestion_2), etc.
- The MapReduce framework would sort the returned pairs by key, so that all of the values for the same word end up grouped together
- The reduce method would be called once per word, iterate over that word’s values, and tally up the word’s popularity.
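To make the steps above a little more concrete, here’s a minimal sketch in plain Python. It’s not a real MapReduce framework (everything runs in a single process), and the names `map_suggestion`, `shuffle`, `reduce_word`, and `word_popularity`, along with the sample suggestions, are made up for illustration. I’m also assuming words get lowercased so that “Algorithms” and “algorithms” count as the same word.

```python
import re
from collections import defaultdict


def map_suggestion(suggestion_id, text):
    """Map step: tokenize one suggestion and emit (word, suggestion_id) pairs."""
    # Assumption: lowercase everything and keep only letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(word, suggestion_id) for word in words]


def shuffle(pairs):
    """Sort step: group the emitted values by key, in sorted key order.

    In a real framework this happens for you between map and reduce.
    """
    grouped = defaultdict(list)
    for word, suggestion_id in pairs:
        grouped[word].append(suggestion_id)
    return dict(sorted(grouped.items()))


def reduce_word(word, suggestion_ids):
    """Reduce step: tally how many times a single word was emitted."""
    return word, len(suggestion_ids)


def word_popularity(suggestions):
    """Run map, sort, and reduce over a dict of {suggestion_id: text}."""
    pairs = []
    for suggestion_id, text in suggestions.items():
        pairs.extend(map_suggestion(suggestion_id, text))
    return dict(reduce_word(word, ids) for word, ids in shuffle(pairs).items())


# Hypothetical sample data standing in for the millions of suggestions.
sample = {
    "suggestion_1": "Algorithms in Ruby",
    "suggestion_2": "Ajax pagination",
    "suggestion_3": "Sorting algorithms",
}
counts = word_popularity(sample)
print(counts)
```

The payoff of splitting the work this way is that each call to `map_suggestion` and `reduce_word` only looks at its own input, which is what lets a real framework spread those calls across many machines.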
So a final result could potentially be:
“Algorithms” -> 500 times
“OOP” -> 300 times
“Pagination” -> 200 times
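The reduce step only produces the tallies; putting the most popular topics first is one more small step. Here’s a rough sketch of that ordering, using the pretend counts from the result above. The `rank_words` helper is a hypothetical name of mine, not part of any framework.

```python
def rank_words(counts):
    """Sort (word, count) pairs by count, highest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Pretend tallies from the example result, deliberately out of order.
example_counts = {"Pagination": 200, "Algorithms": 500, "OOP": 300}
ranked = rank_words(example_counts)

for word, count in ranked:
    print(f"{word} -> {count} times")
```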