How do I make this code thread-safe?

Question (votes: 1, answers: 1)

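This code is part of a method. It uses two for loops to iterate over two lists. I would like to know whether multithreading can speed up these two loops, and my main concern is how to make it thread-safe.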

EDITED: more complete code

    static class Similarity {
        double similarity;
        String seedWord;
        String candidateWord;

        public Similarity(double similarity, String seedWord, String candidateWord) {
            this.similarity = similarity;
            this.seedWord = seedWord;
            this.candidateWord = candidateWord;
        }

        public double getSimilarity() {
            return similarity;
        }

        public String getSeedWord() {
            return seedWord;
        }

        public String getCandidateWord() {
            return candidateWord;
        }
    }

    static class SimilarityTask implements Callable<Similarity> {
        Word2Vec vectors;
        String seedWord;
        String candidateWord;
        Collection<String> label1;
        Collection<String> label2;

        public SimilarityTask(Word2Vec vectors, String seedWord, String candidateWord, Collection<String> label1, Collection<String> label2) {
            this.vectors = vectors;
            this.seedWord = seedWord;
            this.candidateWord = candidateWord;
            this.label1 = label1;
            this.label2 = label2;
        }

        @Override
        public Similarity call() {
            double similarity = cosineSimForSentence(vectors, label1, label2);
            return new Similarity(similarity, seedWord, candidateWord);
        }
    }

Now, is this 'compute' method thread-safe? Three variables are involved:

1) vectors;
2) tokenizerFactory;
3) similarities;

public static void compute() throws Exception {

        File modelFile = new File("sim.bin");
        Word2Vec vectors = WordVectorSerializer.readWord2VecModel(modelFile);

        TokenizerFactory tokenizerFactory = new TokenizerFactory();

        List<String> seedList = loadSeeds();
        List<String> candidateList = loadCandidates();

        log.info("Computing similarity: ");

        ExecutorService POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Future<Similarity>> tasks = new ArrayList<>();
        int totalCount=0;
        for (String seed : seedList) {
            Collection<String> label1 = getTokens(seed.trim(), tokenizerFactory);
            if (label1.isEmpty()) {
                continue;
            }
            for (String candidate : candidateList) {
                Collection<String> label2 = getTokens(candidate.trim(), tokenizerFactory);
                if (label2.isEmpty()) {
                    continue;
                }
                Callable<Similarity> callable = new SimilarityTask(vectors, seed, candidate, label1, label2);
                tasks.add(POOL.submit(callable));
                log.info("TotalCount:" + (++totalCount));
            }
        }

        Map<String, Set<String>> similarities = new HashMap<>();
        int validCount = 0;
        for (Future<Similarity> task : tasks) {
            Similarity simi = task.get();
            Double similarity = simi.getSimilarity();
            String seedWord = simi.getSeedWord();
            String candidateWord = simi.getCandidateWord();

            Set<String> similarityWords = similarities.get(seedWord);
            if (similarity >= 0.85) {
                if (similarityWords == null) {
                    similarityWords = new HashSet<>();
                }
                similarityWords.add(candidateWord);
                log.info(seedWord + " " + similarity + " " + candidateWord);
                log.info("ValidCount: "  + (++validCount));
            }

            if (similarityWords != null) {
                similarities.put(seedWord, similarityWords);
            }
        }
}

I have added one more relevant method, which the call() method uses:

    public static double cosineSimForSentence(Word2Vec vectors, Collection<String> label1, Collection<String> label2) {
        try {
            return Transforms.cosineSim(vectors.getWordVectorsMean(label1), vectors.getWordVectorsMean(label2));
        } catch (Exception e) {
            log.warn("OOV: " + label1.toString() + " " + label2.toString());
            //e.getMessage();
            //e.printStackTrace();
            return 0.0;
        }
    }

java future executorservice

1 Answer

0 votes

(Answer updated for the changed question.)

In general, you should profile code before trying to optimize it, particularly when the code is fairly complex.

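For threading, you need to identify which mutable state is shared between threads. Ideally, eliminate as much of that shared state as possible before resorting to locks and concurrent data structures. Mutable state confined to a single thread is not a problem. Immutable objects are great.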
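
As an illustration of thread-confined state, here is a minimal sketch in which worker tasks only return fresh values and a single thread merges them into a plain HashMap. The class name ConfinedAggregation, the method squares, and the pool size of 4 are hypothetical and chosen only for this example; they are not taken from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConfinedAggregation {
    /** Computes x*x for 0..n-1 on a pool, then merges the results on the calling thread. */
    public static Map<Integer, Integer> squares(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an arbitrary choice
        try {
            List<Future<int[]>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int x = i;
                // Each task reads only its own captured value and returns a fresh array
                Callable<int[]> task = () -> new int[] {x, x * x};
                futures.add(pool.submit(task));
            }
            // This map is only ever touched by the calling thread, so HashMap is enough
            Map<Integer, Integer> result = new HashMap<>();
            for (Future<int[]> future : futures) {
                int[] pair = future.get(); // blocks until that task has finished
                result.put(pair[0], pair[1]);
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(squares(5).get(3));
    }
}
```

With this shape, no lock is needed around the map, because the workers never see it.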

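I am assuming that nothing passed to your tasks is modified, although that is hard to tell from the code. Marking fields final is a good idea. Collections can be placed in unmodifiable wrappers, but that does not stop them from being modified through other references, and the unmodifiability is not reflected in their static types.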
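
The difference between an unmodifiable wrapper and a defensive copy can be sketched as follows. The class ImmutableTaskInput and its members are hypothetical names used only for illustration, and List.copyOf assumes Java 10 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ImmutableTaskInput {
    // final fields cannot be reassigned after construction
    private final String seedWord;
    private final List<String> tokens;

    public ImmutableTaskInput(String seedWord, List<String> tokens) {
        this.seedWord = seedWord;
        // Defensive copy: later changes to the caller's list are not visible here
        this.tokens = List.copyOf(tokens);
    }

    public String getSeedWord() { return seedWord; }

    public List<String> getTokens() { return tokens; }

    /** An unmodifiable wrapper is only a view, so it still reflects changes to the backing list. */
    public static int wrapperSizeAfterBackingChange() {
        List<String> backing = new ArrayList<>(List.of("a"));
        List<String> view = Collections.unmodifiableList(backing);
        backing.add("b"); // modifies what the view shows
        return view.size();
    }

    /** A defensive copy is isolated from later changes to the original list. */
    public static int copySizeAfterBackingChange() {
        List<String> backing = new ArrayList<>(List.of("a"));
        ImmutableTaskInput input = new ImmutableTaskInput("seed", backing);
        backing.add("b"); // does not affect the copy held by input
        return input.getTokens().size();
    }

    public static void main(String[] args) {
        System.out.println(wrapperSizeAfterBackingChange());
        System.out.println(copySizeAfterBackingChange());
    }
}
```

Note that both the wrapper and the copy are declared as List<String>, so nothing in the type tells a reader which one is safe to share.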

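Assuming you do not break up the inner loop, the only shared mutable state appears to be similarities and the values it contains.

You may or may not find that you still end up doing too much work serially and need to change similarities to a concurrent map: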

    ConcurrentMap<String, Set<String>> similarities = new ConcurrentHashMap<>();

The get and put operations on similarities would then need to be thread-safe. I suggest always creating the Set:

        Set<String> similarityWords = similarities.getOrDefault(seed, new HashSet<>());

or

        Set<String> similarityWords = similarities.computeIfAbsent(seed, key -> new HashSet<>());
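
The two calls behave differently in one respect that matters here: getOrDefault returns the default without storing it, so a separate put is still required, whereas ConcurrentHashMap.computeIfAbsent creates and stores the value atomically. A minimal sketch of that difference follows; the class name GetOrCreateDemo and the sample keys are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class GetOrCreateDemo {
    /** Returns whether the map contains the key after using getOrDefault. */
    public static boolean getOrDefaultStores() {
        ConcurrentMap<String, Set<String>> similarities = new ConcurrentHashMap<>();
        // The default set is handed back to the caller but never placed in the map
        Set<String> words = similarities.getOrDefault("seed", new HashSet<>());
        words.add("candidate");
        return similarities.containsKey("seed");
    }

    /** Returns whether the map contains the key and value after using computeIfAbsent. */
    public static boolean computeIfAbsentStores() {
        ConcurrentMap<String, Set<String>> similarities = new ConcurrentHashMap<>();
        // The new set is created and stored in one atomic step
        Set<String> words = similarities.computeIfAbsent("seed", key -> new HashSet<>());
        words.add("candidate");
        return similarities.containsKey("seed")
                && similarities.get("seed").contains("candidate");
    }

    public static void main(String[] args) {
        System.out.println(getOrDefaultStores());
        System.out.println(computeIfAbsentStores());
    }
}
```

In both cases the HashSet itself is still not thread-safe, so access to it needs its own protection.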

You could use a thread-safe Set (for example, via Collections.synchronizedSet), but I suggest holding the relevant lock for the whole of the inner loop:

synchronized (similarityWords) {
    ...
}
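
A runnable sketch of this locking pattern, under assumptions: several worker threads add to the same per-seed HashSet, and each one holds the set's monitor for its whole loop. The class SynchronizedSetDemo, the key "seed", and the generated candidate names are hypothetical and stand in for the real similarity work.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SynchronizedSetDemo {
    /** Adds threads * perThread distinct words to one shared set and returns its final size. */
    public static int run(int threads, int perThread) {
        ConcurrentMap<String, Set<String>> similarities = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                Runnable task = () -> {
                    Set<String> words =
                            similarities.computeIfAbsent("seed", key -> new HashSet<>());
                    // One lock acquisition covers the whole inner loop
                    synchronized (words) {
                        for (int i = 0; i < perThread; i++) {
                            words.add("candidate-" + id + "-" + i);
                        }
                    }
                };
                futures.add(pool.submit(task));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every worker and surface any failure
            }
            Set<String> result = similarities.get("seed");
            synchronized (result) {
                return result.size();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

Taking the lock once per loop rather than once per add keeps the locking overhead low, at the cost of serializing workers that target the same seed.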

If you want to create similarityWords lazily, it gets "more fun".
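
One possible way to handle the lazy case, offered as a sketch rather than the approach the answer had in mind: create the set only once a pair passes the threshold, and use a concurrent set so that no external lock is needed. The class LazySimilarityMap and its methods are hypothetical; the 0.85 threshold is the one from the question, and ConcurrentHashMap.newKeySet() is swapped in here in place of the synchronized HashSet.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LazySimilarityMap {
    private static final double THRESHOLD = 0.85; // value taken from the question

    private final ConcurrentMap<String, Set<String>> similarities = new ConcurrentHashMap<>();

    /** Safe to call from several threads; a set is created only for seeds with a match. */
    public void record(String seedWord, String candidateWord, double similarity) {
        if (similarity >= THRESHOLD) {
            // computeIfAbsent creates the set at most once per seed, atomically
            similarities
                    .computeIfAbsent(seedWord, key -> ConcurrentHashMap.newKeySet())
                    .add(candidateWord);
        }
    }

    /** Read-only view of the current results. */
    public Map<String, Set<String>> view() {
        return Collections.unmodifiableMap(similarities);
    }

    public static void main(String[] args) {
        LazySimilarityMap map = new LazySimilarityMap();
        map.record("cat", "kitten", 0.91);
        map.record("cat", "car", 0.40);
        System.out.println(map.view());
    }
}
```

Seeds with no candidate above the threshold never get an entry, which matches the behavior of the original single-threaded loop.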
