
I am using DistributedCache, but there are no files in the cache after the code runs. I have referred to other similar questions, but their answers do not solve my issue.

Please find the code below:

   Configuration conf = new Configuration();
   Job job1 = new Job(conf, "distributed cache");
   Configuration conf1 = job1.getConfiguration();
   DistributedCache.addCacheFile(new Path("File").toUri(), conf1);
   System.out.println("distributed cache file "+DistributedCache.getLocalCacheFiles(conf1));

This prints null.

The same call, when made inside the mapper, also gives null. Please let me know your suggestions.

Thanks

Does the file File exist in HDFS? Also, the final call to getLocalCacheFiles will not work in your driver code (but it should in your mapper; I'm assuming you are only showing this line as an example). Find your job's job.xml in the job tracker web UI and post back the value of mapred.cache.files – Chris White 9 hours ago

3 Answers

I believe this is (at least partly) due to what Chris White wrote here:

After you create your Job object, you need to pull the Configuration object back from it, because Job makes a copy of it; configuring values in the original Configuration after you create the job will have no effect on the job itself. Try this:

job = new Job(new Configuration());
Configuration conf2 = job.getConfiguration();
job.setJobName("Join with Cache");
DistributedCache.addCacheFile(new URI("hdfs://server:port/FilePath/part-r-00000"), conf2);

If it still does not work, there is another problem somewhere, but that does not make Chris White's point any less correct.
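For what it's worth, here is a minimal sketch (my own illustration, not the asker's code) of where getLocalCacheFiles is expected to return something: the cache files are only localized on the task nodes when the job actually runs, so checking them in the mapper's setup is more meaningful than checking them in the driver. The class name CacheAwareMapper is just a placeholder.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Placeholder mapper: the cached file is only localized on the task nodes,
    // so this is where getLocalCacheFiles should return non-null paths.
    public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
            if (localFiles != null) {
                for (Path p : localFiles) {
                    System.out.println("localized cache file: " + p);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // ... use the cached data here ...
        }
    }

In the driver, calling getLocalCacheFiles before the job has run is expected to print null, because nothing has been localized yet (this is the point Chris White made in his comment above).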

Yes, I have gone through those answers; I did not mean that they are incorrect. I am still facing the issue even after trying these things, so please let me know if there is any other point I am missing about DistributedCache – Neethu Prem 11 hours ago

You might want to try the -files option, which is much simpler. See my answer here:

Distributed Caching in Hadoop File Not Found Exception
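In case it helps, here is a rough sketch of what that looks like, assuming the driver implements Tool so that GenericOptionsParser picks up the -files option; the class name MyDriver, the HDFS path, and the file name lookup.txt below are made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // getConf() already carries whatever -files added on the command line,
            // so no explicit DistributedCache calls are needed here.
            Job job = new Job(getConf(), "distributed cache via -files");
            job.setJarByClass(MyDriver.class);
            // Mapper/reducer configuration omitted for brevity.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }

Then something like

    hadoop jar myjob.jar MyDriver -files hdfs://namenode:8020/path/lookup.txt input output

should make lookup.txt available in each task's working directory under its base name (the jar name and paths here are illustrative).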


When distributing, don't forget the local link name, preferably using a relative path:

The URI is of the form hdfs://host:port/absolute-path#local-link-name

When reading:

  • if you don't use the distributed cache facilities, you are supposed to use HDFS's FileSystem API to access hdfs://host:port/absolute-path
  • if you use the distributed cache, then you have to use standard Java file utilities to access the local-link-name (see the sketch after this list)
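To make that concrete, here is a rough sketch of the fragment approach; the HDFS path and the link name "lookup" are invented for illustration, and depending on the Hadoop version you may also need DistributedCache.createSymlink(conf) in the driver for the link to appear:

    // In the driver, before submitting the job (illustrative path and link name):
    //   DistributedCache.createSymlink(job.getConfiguration());   // may be needed on older versions
    //   DistributedCache.addCacheFile(
    //       new URI("hdfs://namenode:8020/data/lookup/part-r-00000#lookup"),
    //       job.getConfiguration());

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // In the mapper, the symlink "lookup" shows up in the task's working
    // directory, so plain Java file utilities are enough to read it.
    public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            BufferedReader reader = new BufferedReader(new FileReader("lookup"));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    // ... load each line into an in-memory lookup structure ...
                }
            } finally {
                reader.close();
            }
        }
    }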