Hadoop Streaming in F# and MapReduce (summary)
Having written several recent posts on Hadoop Streaming, I thought it would be useful to summarize them in a single post. The main objective of those posts was to put together a codebase that lets F# developers write Map/Reduce libraries through a simple API.
The full code is posted here: https://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850
The idea was to provide reusable code so that a developer only needs to implement the Map and Reduce functions, using the following prototypes:
For Text Streaming:
Map : string -> (string * obj) option
Reduce : string -> seq<string> -> obj option
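To make the text streaming prototypes a little more concrete, here is a minimal sketch of a Map/Reduce pair that fits those signatures. The input format ("category,amount" per line), the helper tryParseDouble and the sample data are all assumptions made up for illustration; they are not taken from the posted codebase. The small pipeline at the end only imitates the group-by-key step that Hadoop performs between the two phases, and it assumes F# 4.0 or later for List.groupBy.

```fsharp
open System
open System.Globalization

// Hypothetical helper: culture-invariant parse that returns None on failure.
let tryParseDouble (text: string) : float option =
    match Double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture) with
    | true, value -> Some value
    | _ -> None

// Map : string -> (string * obj) option
// Assumes each input line looks like "<category>,<amount>" (an invented format).
// Lines that do not match are dropped by returning None.
let Map (line: string) : (string * obj) option =
    match line.Split([| ',' |]) with
    | [| category; amount |] ->
        tryParseDouble amount
        |> Option.map (fun value -> category.Trim(), box value)
    | _ -> None

// Reduce : string -> seq<string> -> obj option
// Values arrive as strings, so they are parsed again before being summed.
let Reduce (key: string) (values: seq<string>) : obj option =
    let total = values |> Seq.choose tryParseDouble |> Seq.sum
    Some (box total)

// Local simulation of map -> group by key -> reduce, for experimentation only.
let sampleLines = [ "books,10.5"; "games,3"; "books,4.5"; "bad line" ]

let results =
    sampleLines
    |> List.choose Map
    |> List.groupBy fst
    |> List.map (fun (key, pairs) ->
        let values =
            pairs |> List.map (fun (_, v) -> Convert.ToString(v, CultureInfo.InvariantCulture))
        key, Reduce key values)
```

Returning an option from Map means a malformed line can simply be skipped rather than failing the whole job, which is presumably why the prototype is shaped that way.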
For Binary Streaming:
Map : WordprocessingDocument -> seq<string * obj>
Map : PdfReader -> seq<string * obj>
Reduce : string -> seq<string> -> obj option
For XML Streaming:
Map : XElement -> seq<(string * string) * obj>
Reduce : string * string -> seq<string> -> obj option
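The XML prototypes differ mainly in that the key is a tuple and the mapper can emit several pairs per element. The following is a rough sketch of that shape, assuming an invented <order> element with country and product attributes and a <quantity> child; the element names, the helper tryParseInt and the sample document are hypothetical and not part of the posted code. It assumes F# 4.0 or later for Option.ofObj, and when run as a script on the .NET Framework it would also need a reference to System.Xml.Linq.

```fsharp
open System
open System.Globalization
open System.Xml.Linq

// Hypothetical helper: culture-invariant integer parse.
let tryParseInt (text: string) : int option =
    match Int32.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture) with
    | true, n -> Some n
    | _ -> None

// Map : XElement -> seq<(string * string) * obj>
// Emits one ((country, product), quantity) pair per well-formed <order> element,
// and an empty sequence when an attribute or the quantity is missing.
let Map (element: XElement) : seq<(string * string) * obj> =
    let attribute (name: string) =
        element.Attribute(XName.Get name)
        |> Option.ofObj
        |> Option.map (fun a -> a.Value)
    let quantity =
        element.Element(XName.Get "quantity")
        |> Option.ofObj
        |> Option.bind (fun q -> tryParseInt q.Value)
    match attribute "country", attribute "product", quantity with
    | Some country, Some product, Some n -> Seq.singleton ((country, product), box n)
    | _ -> Seq.empty

// Reduce : string * string -> seq<string> -> obj option
// Sums the quantities for one (country, product) key; None if nothing parses.
let Reduce (key: string * string) (values: seq<string>) : obj option =
    let parsed = values |> Seq.choose tryParseInt |> Seq.toList
    if List.isEmpty parsed then None else Some (box (List.sum parsed))

// Local simulation over a small made-up document.
let sample =
    XElement.Parse """<orders>
  <order country="UK" product="tea"><quantity>2</quantity></order>
  <order country="FR" product="coffee"><quantity>1</quantity></order>
  <order country="UK" product="tea"><quantity>3</quantity></order>
  <order country="UK"><quantity>9</quantity></order>
</orders>"""

let totals =
    sample.Elements(XName.Get "order")
    |> Seq.collect Map
    |> Seq.groupBy fst
    |> Seq.map (fun (key, pairs) ->
        let values =
            pairs |> Seq.map (fun (_, v) -> Convert.ToString(v, CultureInfo.InvariantCulture))
        key, Reduce key values)
    |> Seq.toList
```

Using a tuple key lets a single job aggregate along two dimensions at once without concatenating them into one string by hand.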
Here is the full list of posts:
Hadoop Streaming and F# MapReduce
Using Hadoop on Azure JS Console for Data Visualizations
MapReduce Tester: A Quick Word
Hadoop Binary Streaming and F# MapReduce
Hadoop Binary Streaming and PDF File Inclusion
Hadoop Streaming and Reporting
Hadoop Streaming and Windows Azure Blob Storage
Hadoop XML Streaming and F# MapReduce
Look out for more Hadoop posts in the coming months.