Found in 1 comment on Hacker News
ahalan · 2011-12-17 · Original thread
MapReduce is applicable wherever you can partition the data and process each part independently of the others.

I used Hadoop/HBase for EEG time-series analysis, looking for certain oscillation patterns (basically classic time-series classification), and it was an embarrassingly parallel problem:


1. Partition the data into fixed segments (either temporal, say 1-hour chunks, or location-based, say 10x10 blocks of pixels). Alternatively you can use a 'sliding window' and extract features as you go. In some cases you can use a symbolic representation/piecewise approximation to reduce dimensionality, as in iSAX, "sketches", or some other time-series segmentation technique.

2. Extract features for each segment (either linear statistics/moments or non-linear signatures). The most difficult part here has nothing to do with MapReduce: it is deciding which features carry the most information. I found the ID3 criterion helpful.


3. Aggregate the results into a hash table where the keys are segment signatures/features/fingerprints, and the values are arrays of pointers to the corresponding segments. (Depending on its size, this table can either sit on a single machine or be distributed across multiple HDFS nodes.)
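
The three steps above can be sketched in plain Python. This is only an illustration under assumptions, not the code I ran on Hadoop: the function names, the choice of quantised (mean, standard deviation) as the signature, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def partition(series, size):
    """Step 1: split a series into fixed-length segments, yielding (start, segment)."""
    for start in range(0, len(series) - size + 1, size):
        yield start, series[start:start + size]

def signature(segment, step=1.0):
    """Step 2 (hypothetical features): mean and standard deviation, quantised to `step`."""
    return (round(mean(segment) / step), round(pstdev(segment) / step))

def map_phase(series, size):
    """Map: emit (signature, segment_start); each segment is handled independently."""
    for start, segment in partition(series, size):
        yield signature(segment), start

def reduce_phase(pairs):
    """Step 3 / Reduce: group segment pointers (start offsets) by signature."""
    index = defaultdict(list)
    for key, start in pairs:
        index[key].append(start)
    return dict(index)

# Toy data: two flat segments at 0, one flat segment at 5, one oscillating segment.
series = [0, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0, 0, 10, 0, 10]
index = reduce_phase(map_phase(series, size=4))
```

Here the two flat segments at offsets 0 and 8 end up under the same key, which is the kind of grouping step 3 describes. On a real cluster the map and reduce functions would run on separate nodes; in this sketch they are ordinary generators chained together.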

Essentially you do time-series clustering at the Reduce stage with each 'basket' in a hash-table containing a group of similar segments. It can be used as an index for similarity or range searches (for fast in-memory retrieval you can use HBase which sits on top of HDFS). You can also have multiple indices for different feature sets.
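
As a rough sketch of how such an index could serve similarity and range searches, assume signatures are (rounded mean, rounded standard deviation) pairs and the index is an ordinary in-memory dict. Both the signature and the sample index below are hypothetical stand-ins, not what HBase would store.

```python
from statistics import mean, pstdev

def signature(segment):
    """Hypothetical signature: rounded mean and rounded standard deviation."""
    return (round(mean(segment)), round(pstdev(segment)))

# Assumed output of the Reduce stage: signature -> offsets of matching segments.
index = {(0, 0): [0, 8], (5, 0): [4], (5, 5): [12]}

def similar_segments(query, index):
    """Similarity search: return segments that share the query's signature."""
    return index.get(signature(query), [])

def range_search(query, index, radius=1):
    """Range search: return segments whose signature lies within `radius`
    of the query's signature in every feature."""
    q_mean, q_std = signature(query)
    hits = []
    for (seg_mean, seg_std), starts in index.items():
        if abs(seg_mean - q_mean) <= radius and abs(seg_std - q_std) <= radius:
            hits.extend(starts)
    return sorted(hits)
```

An exact lookup costs one hash access. The range search here scans every key, which is fine for a small table; a larger one would want a sorted or spatial structure over the keys.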


The hard part is problem decomposition, i.e. dividing the work into independent units: replacing one big nested loop/sigma over the entire dataset with smaller loops that can run in parallel on parts of the dataset. Once you've done that, MapReduce is just a natural way to execute the job and aggregate the results.
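
A small example of that kind of decomposition, assuming the "big sigma" is a mean and variance over the whole dataset: each chunk produces a partial (count, sum, sum of squares), and the partials are merged at the end. The function names are hypothetical, and the thread pool only stands in for a cluster (CPython threads will not actually speed up this arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def partial_stats(chunk):
    """Small loop over one part: count, sum and sum of squares."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)

def merge(a, b):
    """Combine two partial results; associative, so merge order does not matter."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def mean_and_variance(data, chunk_size):
    """Population mean and variance computed from independent per-chunk partials."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_stats, chunks))  # the "map" step
    n, total, total_sq = reduce(merge, partials, (0, 0, 0))  # the "reduce" step
    mean = total / n
    return mean, total_sq / n - mean * mean

data = list(range(1, 9))
result = mean_and_variance(data, chunk_size=3)
```

The key property is that `merge` is associative, so it does not matter how the data is chunked or in what order the partial results arrive.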
