How do I fix the erratic behavior of a String value set in a MapReduce mapper's setup method?

Question — votes: 0, answers: 2

I am new to MapReduce and am learning how to implement the setup method. The new String value supplied by the configuration prints correctly, but when I try to process it further, the initial value of the String takes effect instead. I know Strings are immutable, but the variable should still hand the current value to other methods.

public class EMapper extends Mapper<LongWritable, Text, Text, Text> {

    String wordstring = "abcd"; //initialized wordstring with "abcd"


    public void setup(Context context) {
        Configuration config = new Configuration(context.getConfiguration());
        wordstring = config.get("mapper.word"); // As string is immutable,
        // wordstring should now point to
        // value given by mapper.word
        //Here mapper.word="ankit" by 
        //using -D in hadoop command

    }

    String def = wordstring;
    String jkl = String.valueOf(wordstring); //tried to copy the current value,
    //but string jkl prints the
    //initial value.

    public void map(LongWritable key, Text value, Context context)
    throws InterruptedException, IOException {
        context.write(new Text("wordstring=" + wordstring + "   " + "def=" + 
                def),
            new Text("jkl=" + jkl));
    }
}


public class EDriver extends Configured implements Tool {

    private static Logger logger = LoggerFactory.getLogger(EDriver.class);


    public static void main(String[] args) throws Exception {
        logger.info("Driver started");

        int res = ToolRunner.run(new Configuration(), new EDriver(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s <input path> <output path>%n",
                getClass().getSimpleName());
            return -1;
        }
        Configuration conf = getConf();
        Job job = new Job(conf);
        job.setJarByClass(EDriver.class);
        job.setJobName("E Record Reader");

        job.setMapperClass(EMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(EReducer.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setInputFormatClass(ExcelInputFormat.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

} 

I expected the output to be

   wordstring=ankit   def=ankit   jkl=ankit

The actual output is

   wordstring=ankit   def=abcd    jkl=abcd
string hadoop mapreduce
2 Answers
0 votes

This has nothing to do with String mutability; it is about the order in which your code executes.

The setup method is only called after all the class-level field initializers have run. The order in which you wrote the code changes nothing. If you rewrite the top of your class in the order it actually executes, you get:

public class EMapper extends Mapper<LongWritable, Text, Text, Text> {
    String wordstring = "abcd";
    String jkl = String.valueOf(wordstring);

    public void setup(Context context) {
        Configuration config = new Configuration(context.getConfiguration());
        wordstring = config.get("mapper.word"); //By the time this is called, jkl has already been assigned to "abcd"
    }

So it is no surprise that jkl is still abcd. You should set jkl inside the setup method, like this:

public class EMapper extends Mapper<LongWritable, Text, Text, Text> {
    String wordstring;
    String jkl;

    public void setup(Context context) {
        Configuration config = new Configuration(context.getConfiguration());
        wordstring = config.get("mapper.word");
        jkl = wordstring;
        //Here, jkl and wordstring are both different variables pointing to "ankit"
    }

    //Here, jkl and wordstring are null, as setup(Context context) has not yet run

    public void map(LongWritable key, Text value, Context context)
        throws InterruptedException, IOException {
        //Here, jkl and wordstring are both different variables pointing to "ankit"
        context.write(new Text("wordstring=" + wordstring),
            new Text("jkl=" + jkl));
    }

Of course, you don't actually need jkl at all; you can just use wordstring directly.
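The initializer-ordering point above can be checked with a plain-Java sketch (no Hadoop needed; the class and method names here are invented purely for illustration):

```java
// Field initializers run once, when the object is constructed,
// before any method such as setup() is ever called.
public class InitOrderDemo {
    String wordstring = "abcd";               // runs first, at construction
    String jkl = String.valueOf(wordstring);  // also runs at construction -> "abcd"

    void setup(String configured) {
        // Reassigns wordstring only; jkl keeps the value it
        // captured at construction time.
        wordstring = configured;
    }

    public static void main(String[] args) {
        InitOrderDemo d = new InitOrderDemo();
        d.setup("ankit");
        System.out.println("wordstring=" + d.wordstring + " jkl=" + d.jkl);
    }
}
```

Running this prints `wordstring=ankit jkl=abcd`, mirroring the output the question reports.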


0 votes

The problem is solved. Actually, I was running Hadoop in distributed mode, where SETUP, MAPPER, REDUCER and CLEANUP run on different JVMs, so data could not be transferred directly from SETUP to MAPPER. The first wordstring object was initialized in the mapper as "abcd". I tried to change wordstring in SETUP (which created another wordstring object), and that actually happened in another JVM. So when I tried to copy "wordstring" into jkl with

String jkl = String.valueOf(wordstring);

the first value of wordstring (created by the mapper and initialized to "abcd") was copied into jkl.

If I ran Hadoop in standalone mode, it would use a single JVM, and the value given to wordstring by SETUP would be copied into jkl.

So jkl was initialized with the copy of wordstring set to "abcd", not with the copy given by SETUP.

I used

HashMap<String, String> map = new HashMap<>();

to transfer data from SETUP to MAPPER, and then jkl got a copy of the value that SETUP gave the string.
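As a rough plain-Java sketch of that HashMap approach (no Hadoop jars; the class name SetupToMapDemo and the configuredWord parameter are invented for illustration), assuming setup() and map() are called on the same mapper instance:

```java
import java.util.HashMap;
import java.util.Map;

// An instance-level HashMap filled in setup() is visible to map(),
// because both methods run against the same object's fields.
public class SetupToMapDemo {
    private final Map<String, String> shared = new HashMap<>();

    void setup(String configuredWord) {
        // Stand-in for config.get("mapper.word") in the real mapper.
        shared.put("mapper.word", configuredWord);
    }

    String map() {
        // Reads the value stored by setup().
        return shared.get("mapper.word");
    }

    public static void main(String[] args) {
        SetupToMapDemo m = new SetupToMapDemo();
        m.setup("ankit");
        System.out.println(m.map());
    }
}
```

The map is only a container here; the essential point is that both methods read and write the same instance field rather than relying on field initializers.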
