Custom InputFormat, Hadoop

Question

Please help. I have the following sample data:

-21.33,45.677,1234,1245,1267,1290,1212,1111,10000,1902
-21.34,45.677,1264,1645,1266,1260,1612,1611,16000,1602
-21.35,45.677,1244,1445,1467,1240,1242,1211,11000,1912
-21.36,45.677,1231,1215,1217,1210,1212,1111,10010,1902

I want my Hadoop MapReduce code to treat the first two float entries as the key (-21.33,45.677) and the remaining integer entries as the value (1234,1245,1267,1290,1212,1111,10000,1902).
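If the two leading columns should stay numeric instead of becoming a string, the key needs its own type. Below is a minimal sketch, assuming a hypothetical `LocationKey` class. It has the `write`/`readFields`/`compareTo` shape of Hadoop's `WritableComparable`, but depends only on `java.io` so it can be compiled and tested on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical composite key for the two leading float columns.
// A real job would declare "implements WritableComparable<LocationKey>";
// this version uses only java.io so it runs without Hadoop.
class LocationKey implements Comparable<LocationKey> {
    private double first;
    private double second;

    LocationKey() { }  // Hadoop creates keys reflectively, so keep a no-arg constructor

    LocationKey(double first, double second) {
        this.first = first;
        this.second = second;
    }

    double getFirst() { return first; }

    double getSecond() { return second; }

    // Same shape as Writable.write(DataOutput)
    void write(DataOutput out) throws IOException {
        out.writeDouble(first);
        out.writeDouble(second);
    }

    // Same shape as Writable.readFields(DataInput)
    void readFields(DataInput in) throws IOException {
        first = in.readDouble();
        second = in.readDouble();
    }

    // Order by the first column, then by the second
    @Override
    public int compareTo(LocationKey other) {
        int c = Double.compare(first, other.first);
        return c != 0 ? c : Double.compare(second, other.second);
    }

    // equals/hashCode matter because the default HashPartitioner uses hashCode()
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LocationKey)) return false;
        LocationKey k = (LocationKey) o;
        return Double.compare(first, k.first) == 0 && Double.compare(second, k.second) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(first) + Double.hashCode(second);
    }

    // Serialize to bytes and read back, as keys are during the shuffle
    static LocationKey roundTrip(LocationKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            LocationKey copy = new LocationKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the key numeric means the sort order during the shuffle follows the actual values rather than their string form.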

I am not sure whether this can be done with the existing FileInputFormats. How should I go about it, given that the value needs to be used as an array rather than as text?
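For the array-valued side, the usual approach is a value type that serializes its own length followed by its elements. Hadoop also ships `org.apache.hadoop.io.ArrayWritable`, which is commonly subclassed with a no-arg constructor that fixes the element class to `IntWritable`. The sketch below instead assumes a hypothetical `IntArrayValue` class that mirrors the `Writable` methods using only `java.io`, so it runs without Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical array-valued type for the integer columns. A real job would
// declare "implements org.apache.hadoop.io.Writable"; only java.io is used here.
class IntArrayValue {
    private int[] values = new int[0];

    IntArrayValue() { }  // no-arg constructor for reflective instantiation

    IntArrayValue(int[] values) {
        this.values = values.clone();
    }

    int[] get() {
        return values.clone();
    }

    // Parse the comma-separated integer columns, e.g. "1234,1245,..."
    static IntArrayValue parse(String csv) {
        String[] parts = csv.split(",");
        int[] parsed = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            parsed[i] = Integer.parseInt(parts[i].trim());
        }
        return new IntArrayValue(parsed);
    }

    // Length prefix first, so readFields knows how many ints follow
    void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (int v : values) {
            out.writeInt(v);
        }
    }

    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = in.readInt();
        }
    }

    // Serialize to bytes and read back, as happens between map and reduce
    static IntArrayValue roundTrip(IntArrayValue value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            value.write(new DataOutputStream(bytes));
            IntArrayValue copy = new IntArrayValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```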

Also, how should I change the InputSplit so that the map function receives multiple records at the same time for computation?
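Each call to `map()` receives a single record. One way to compute over several records at once, sketched here under the assumption that batching within one mapper is acceptable, is to buffer records in a field of the mapper, process them in groups, and flush the remainder in `cleanup()`. Because Hadoop reuses the key and value objects it passes to `map()`, the buffer must hold copies (for example the line's string form), not the objects themselves. `RecordBatcher` is a hypothetical helper shown without Hadoop dependencies; a custom RecordReader that returns several lines as one value is another option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects records and hands them to a callback in groups
// of batchSize. In a Mapper, add() would be called from map() and flush() from
// cleanup(), so the final partial batch is not lost.
class RecordBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> onBatch;
    private final List<T> buffer = new ArrayList<>();

    RecordBatcher(int batchSize, Consumer<List<T>> onBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.onBatch = onBatch;
    }

    void add(T record) {
        buffer.add(record);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    // Emits whatever is buffered, even if fewer than batchSize records
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        onBatch.accept(new ArrayList<>(buffer));  // hand over a copy, then reuse the buffer
        buffer.clear();
    }
}
```

With the four sample lines and a batch size of 3, the callback sees one batch of three records and, after the final `flush()`, one batch of one.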

And please don't re-post duplicates: stackoverflow.com/questions/11689972/…
– Chris White
Commented Jul 27, 2012 at 22:05

Niels Basjes · Accepted Answer · 2012-07-28 14:11:29Z

1

The easiest way is to use the TextInputFormat and have your mapper make the split between the key and the value. The output key and value of your mapper could then both be Text.
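A minimal sketch of that split, assuming every line has at least three comma-separated fields. `LineSplitter` is a hypothetical helper; inside `map()` its two results would be wrapped in `Text` objects and passed to `context.write`. Using `indexOf` avoids splitting the whole line when only the first two fields form the key.

```java
// Hypothetical helper: everything before the second comma becomes the key,
// everything after it becomes the value.
final class LineSplitter {
    private LineSplitter() { }

    static String[] splitKeyValue(String line) {
        int firstComma = line.indexOf(',');
        int secondComma = firstComma < 0 ? -1 : line.indexOf(',', firstComma + 1);
        if (secondComma < 0) {
            throw new IllegalArgumentException("expected at least three fields: " + line);
        }
        return new String[] {
            line.substring(0, secondComma),   // e.g. "-21.33,45.677"
            line.substring(secondComma + 1)   // e.g. "1234,1245,...,1902"
        };
    }
}
```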


I was hesitant to use this because it adds an extra computation step. I was hoping for a custom data type that reads my input file directly as an array, so that I only have to write the mapper.
– ayush singhal
Commented Jul 30, 2012 at 17:55
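A custom data type like that would come from a custom InputFormat whose RecordReader emits a typed key and an array value directly. The sketch below mimics the method names of Hadoop's `RecordReader` (`nextKeyValue`, `getCurrentKey`, `getCurrentValue`) but reads from a plain `java.io.Reader`. `LocationRecordReader` is hypothetical; a real one would extend `org.apache.hadoop.mapreduce.RecordReader`, also implement `initialize`, `getProgress` and `close`, and usually delegate line reading to `LineRecordReader`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical reader with the same method names as Hadoop's RecordReader.
// It reads from a plain java.io.Reader so it runs without Hadoop.
class LocationRecordReader {
    private final BufferedReader in;
    private double[] currentKey;   // the two leading float columns
    private int[] currentValue;    // the remaining integer columns

    LocationRecordReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    // Advances to the next non-blank line; returns false at end of input
    boolean nextKeyValue() throws IOException {
        String line;
        do {
            line = in.readLine();
            if (line == null) {
                return false;
            }
        } while (line.trim().isEmpty());

        String[] fields = line.split(",");
        if (fields.length < 3) {
            throw new IOException("malformed record: " + line);
        }
        currentKey = new double[] {
            Double.parseDouble(fields[0].trim()),
            Double.parseDouble(fields[1].trim())
        };
        currentValue = new int[fields.length - 2];
        for (int i = 2; i < fields.length; i++) {
            currentValue[i - 2] = Integer.parseInt(fields[i].trim());
        }
        return true;
    }

    double[] getCurrentKey() { return currentKey; }

    int[] getCurrentValue() { return currentValue; }
}
```

Note that the parsing work does not disappear with this design; it moves from the mapper into the reader, which keeps the mapper itself free of string handling.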


Chris White · Accepted Answer · 2012-07-27 22:04:20Z

0

Is there any reason why you can't just use TextInputFormat's <LongWritable, Text> input types and perform the extraction and transformation in the mapper?

If that really isn't acceptable, consider using ChainMapper: use one mapper to do the extraction, then pass its results to a second mapper that expects the required key/value types.
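The two-stage idea can be modelled outside Hadoop with plain `java.util.function.Function` composition: stage 1 extracts a typed record from a text line, and stage 2 computes over it. In a real job each stage would be its own `Mapper` class registered with `ChainMapper.addMapper`. The `Reading` record and the sum computed in stage 2 are illustrative assumptions, not part of the question.

```java
import java.util.Arrays;
import java.util.function.Function;

// Sketch of a two-stage chain: extraction first, computation second.
final class ChainSketch {
    private ChainSketch() { }

    // Hypothetical typed record handed from stage 1 to stage 2
    static final class Reading {
        final double first;
        final double second;
        final int[] values;

        Reading(double first, double second, int[] values) {
            this.first = first;
            this.second = second;
            this.values = values;
        }
    }

    // Stage 1: raw text line -> typed record
    static final Function<String, Reading> EXTRACT = line -> {
        String[] f = line.split(",");
        if (f.length < 3) {
            throw new IllegalArgumentException("expected at least three fields: " + line);
        }
        int[] values = new int[f.length - 2];
        for (int i = 2; i < f.length; i++) {
            values[i - 2] = Integer.parseInt(f[i].trim());
        }
        return new Reading(Double.parseDouble(f[0].trim()), Double.parseDouble(f[1].trim()), values);
    };

    // Stage 2: typed record -> result (here, the sum of the integer columns)
    static final Function<Reading, Long> SUM =
        r -> Arrays.stream(r.values).asLongStream().sum();

    // The whole chain, analogous to two mappers run back to back
    static final Function<String, Long> PIPELINE = EXTRACT.andThen(SUM);
}
```

For the first sample line, `PIPELINE` returns the sum of its eight integer columns, 19261.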



DebD · Accepted Answer · 2013-12-03 06:54:14Z

0

The easiest way is to split the record on the ',' delimiter. Then, in your mapper, take the first two values and concatenate them to form the key. You have to use Text because the two values must be combined into a single key. Some computation will be required to convert the key back to numerical values.
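On the reducer side, the concatenated `Text` key has to be parsed back into numbers. A minimal sketch, assuming the key was built as the first two fields joined by a single comma; `KeyParser` is a hypothetical helper.

```java
// Hypothetical helper: turns a key such as "-21.33,45.677" back into two doubles.
final class KeyParser {
    private KeyParser() { }

    static double[] parseKey(String key) {
        int comma = key.indexOf(',');
        if (comma < 0 || key.indexOf(',', comma + 1) >= 0) {
            throw new IllegalArgumentException("expected exactly two fields: " + key);
        }
        return new double[] {
            Double.parseDouble(key.substring(0, comma).trim()),
            Double.parseDouble(key.substring(comma + 1).trim())
        };
    }
}
```

Keep in mind that `Text` keys are compared byte by byte, not numerically, so the order in which the reducer sees these keys will not follow the numeric order of the two values.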


