Monday, February 6, 2012

Reuse values in reducer

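Inside reduce(key, Iterable<Type> values, context), values.iterator() returns the same single-pass iterator every time, so the values can be traversed only once.
To reuse them, copy them into a list structure, for example. Copy each value rather than storing the reference, because Hadoop reuses the same value object on every iteration.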
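
The note above can be sketched in plain Java without a Hadoop dependency. This is only an illustration under stated assumptions: MutableInt and singlePass are hypothetical stand-ins for a Writable (such as IntWritable) and for the Iterable that Hadoop hands to reduce(). In a real reducer the same idea would mean copying each value's contents into the list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Plain-Java sketch. MutableInt and singlePass are hypothetical stand-ins
// for a Hadoop Writable and for the Iterable passed to reduce().
final class ReuseValuesSketch {

    // Stand-in for a Writable such as IntWritable: one mutable holder.
    static final class MutableInt {
        int value;
    }

    // Mimics the reducer's values: every call to iterator() shares one
    // cursor, and every next() returns the SAME holder with new contents.
    static Iterable<MutableInt> singlePass(int... data) {
        final MutableInt holder = new MutableInt();
        final int[] pos = {0};
        final Iterator<MutableInt> it = new Iterator<MutableInt>() {
            @Override public boolean hasNext() { return pos[0] < data.length; }
            @Override public MutableInt next() {
                if (!hasNext()) throw new NoSuchElementException();
                holder.value = data[pos[0]++];
                return holder;
            }
        };
        return () -> it;
    }

    // Copy each value into a list so it can be traversed again later.
    static List<Integer> buffer(Iterable<MutableInt> values) {
        List<Integer> copy = new ArrayList<>();
        for (MutableInt v : values) {
            copy.add(v.value); // copy the contents, not the reference
        }
        return copy;
    }

    public static void main(String[] args) {
        Iterable<MutableInt> values = singlePass(3, 1, 2);
        List<Integer> buffered = buffer(values);

        // First pass over the buffered copy: sum.
        int sum = 0;
        for (int v : buffered) sum += v;

        // Second pass over the same copy: count values above the mean.
        double mean = (double) sum / buffered.size();
        int aboveMean = 0;
        for (int v : buffered) if (v > mean) aboveMean++;

        System.out.println("sum=" + sum + ", aboveMean=" + aboveMean);
        // The original iterable is exhausted after the first traversal.
        System.out.println("left in values: " + values.iterator().hasNext());
    }
}
```

Buffering trades memory for the second pass, so it only suits keys whose values fit in the reducer's heap.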

Credit goes to CornerCases

Friday, February 3, 2012

Too many fetch failures: a matter of HTTP threads. See page 25 of the slides below.
Hadoop Troubleshooting 101
http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-cloudera

Thursday, February 2, 2012

If tasktrackers cannot come up on the slaves, use ps -ef to check whether leftover processes started by mapred or hdfs are still running.
The same goes for datanodes that fail to start. If any such processes remain, kill them:

sudo pkill -u mapred

sudo pkill -u hdfs

If the slave datanodes are not up, check whether their namespace IDs match the namenode's.

If not, delete /app/hadoop/tmp (or, more generally, the directory set by hadoop.tmp.dir) and format the namenode again. Be aware that formatting erases everything stored in HDFS.
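
The namespace ID comparison can be sketched as a small Java helper. It assumes the IDs are stored as a namespaceID entry in a properties-style VERSION file under the current/ subdirectory of the namenode and datanode storage directories. The class name NamespaceIdCheck and its methods are made up for this illustration, and the two file paths are passed in rather than guessed.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of Hadoop: compares the namespaceID
// entries of two VERSION files given on the command line.
final class NamespaceIdCheck {

    // Reads the namespaceID entry from a VERSION file (key=value lines).
    // Returns null if the entry is missing.
    static String namespaceId(Path versionFile) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(versionFile, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        return props.getProperty("namespaceID");
    }

    // True only if both files carry the same, non-missing namespaceID.
    static boolean compatible(Path nameNodeVersion, Path dataNodeVersion) throws IOException {
        String nn = namespaceId(nameNodeVersion);
        return nn != null && nn.equals(namespaceId(dataNodeVersion));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: NamespaceIdCheck <namenode VERSION> <datanode VERSION>");
            System.exit(2);
        }
        Path nn = Paths.get(args[0]);
        Path dn = Paths.get(args[1]);
        System.out.println("namenode namespaceID: " + namespaceId(nn));
        System.out.println("datanode namespaceID: " + namespaceId(dn));
        System.out.println(compatible(nn, dn) ? "compatible" : "MISMATCH");
    }
}
```

Since the datanode's VERSION file lives on the slave, it would have to be copied over or the check run on a machine that can see both files.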