ConcurrentHashMap值未更新

Question

我正在尝试使用一组包含单词的文档来创建一个简单的多线程字典/索引。字典存储在带有字符串键和Vector值的ConcurrentHashMap中。对于字典中的每个单词，都有一个外观列表，该外观列表是带有一系列Tuple对象（自定义对象）的向量。（在我的情况下，Tuple是2个数字的组合）。

每个线程将一个文档作为输入，在其中找到所有单词，然后尝试更新ConcurrentHashMap。另外，我必须指出，2个线程可能会尝试通过添加其值（新的元组）来更新Map的相同键。我只对Vector进行写操作。

下面您将看到用于提交新线程的代码。如您所见，我将字典作为输入，该字典是带有字符串键和向量值的ConcurrentHashMap

public void run(Crawler crawler) throws InterruptedException {
        while (!crawler.getFinishedPages().isEmpty()) {
            this.INDEXING_SERVICE.submit(new IndexingTask(this.dictionary, sources, 
                                                          crawler.getFinishedPages().take()));
        }
        this.INDEXING_SERVICE.shutdown();
}

下面您可以看到和索引线程的代码：

public class IndexingTask implements Runnable {

    private ConcurrentHashMap<String, Vector<Tuple>> dictionary;
    private HtmlDocument document;

    public IndexingTask(ConcurrentHashMap<String, Vector<Tuple>> dictionary,
                        ConcurrentHashMap<Integer, String> sources, HtmlDocument document) {
        this.dictionary = dictionary;
        this.document = document;
        sources.putIfAbsent(document.getDocId(), document.getURL());
    }

    @Override
    public void run() {

        for (String word : document.getTerms()) {

            this.dictionary.computeIfAbsent(word, k -> new Vector<Tuple>())
                    .add(new Tuple(document.getDocId(), document.getWordFrequency(word)));

        }
    }
}

该代码似乎正确，但是字典未正确更新。我的意思是原始词典中缺少某些单词（键），而其他一些键在其Vector中的项目较少。

我已经进行了一些调试，我发现在终止线程实例之前，它已计算出正确的键和值。尽管线程中作为输入提供的原始词典（请看第一段代码）没有正确更新，您有任何想法或建议吗？

Answer 1

import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class Tuple {
    private Integer key;
    private String value;

    public Tuple(Integer key, String value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public String toString() {
        return "(" + key + ", " + value + ")";
    }
}

class HtmlDocument {

    private int docId;
    private String URL;
    private List<String> terms;

    public int getDocId() {
        return docId;
    }

    public void setDocId(int docId) {
        this.docId = docId;
    }

    public String getURL() {
        return URL;
    }

    public void setURL(String URL) {
        this.URL = URL;
    }

    public List<String> getTerms() {
        return terms;
    }

    public void setTerms(List<String> terms) {
        this.terms = terms;
    }

    public String getWordFrequency(String word) {
        return "query";
    }
}

class IndexingTask implements Runnable {

    private ConcurrentHashMap<String, Vector<Tuple>> dictionary;
    private HtmlDocument document;

    public IndexingTask(ConcurrentHashMap<String, Vector<Tuple>> dictionary,
                        ConcurrentHashMap<Integer, String> sources, HtmlDocument document) {
        this.dictionary = dictionary;
        this.document = document;
        sources.putIfAbsent(document.getDocId(), document.getURL());
    }

    @Override
    public void run() {

        for (String word : document.getTerms()) {

            this.dictionary.computeIfAbsent(word, k -> new Vector<Tuple>())
                    .add(new Tuple(document.getDocId(), document.getWordFrequency(word)));

        }
        Crawler.RUNNING_TASKS.decrementAndGet();
    }
}

class Crawler {

    protected BlockingQueue<HtmlDocument> finishedPages = new LinkedBlockingQueue<>();

    public static final AtomicInteger RUNNING_TASKS = new AtomicInteger();

    public BlockingQueue<HtmlDocument> getFinishedPages() {
        return finishedPages;
    }
}

public class ConcurrentHashMapExample {

    private ConcurrentHashMap<Integer, String> sources = new ConcurrentHashMap<>();
    private ConcurrentHashMap<String, Vector<Tuple>> dictionary = new ConcurrentHashMap<>();

    private static final ExecutorService INDEXING_SERVICE = Executors.newSingleThreadExecutor();

    public void run(Crawler crawler) throws InterruptedException {
        while (!crawler.getFinishedPages().isEmpty()) {
            Crawler.RUNNING_TASKS.incrementAndGet();
            this.INDEXING_SERVICE.submit(new IndexingTask(this.dictionary, sources,
                    crawler.getFinishedPages().take()));
        }
        //when you call ```this.INDEXING_SERVICE.shutdown()``` may 'IndexingTask' has not run yet
        while (Crawler.RUNNING_TASKS.get() > 0)
            Thread.sleep(3);
        this.INDEXING_SERVICE.shutdown();
    }

    public ConcurrentHashMap<Integer, String> getSources() {
        return sources;
    }

    public ConcurrentHashMap<String, Vector<Tuple>> getDictionary() {
        return dictionary;
    }

    public static void main(String[] args) throws Exception {
        ConcurrentHashMapExample example = new ConcurrentHashMapExample();
        Crawler crawler = new Crawler();
        HtmlDocument document = new HtmlDocument();
        document.setDocId(1);
        document.setURL("http://127.0.0.1/abc");
        document.setTerms(Arrays.asList("hello", "world"));
        crawler.getFinishedPages().add(document);
        example.run(crawler);
        System.out.println("source: " + example.getSources());
        System.out.println("dictionary: " + example.getDictionary());
    }

}

输出：

source: {1=http://127.0.0.1/abc}
dictionary: {world=[(1, query)], hello=[(1, query)]}

我认为，在您的企业中，应该使用“生产者”，“消费者”设计模式

ConcurrentHashMap值未更新

问题描述投票：-1回答：1

1个回答

最新问题

ConcurrentHashMap值未更新

问题描述 投票：-1回答：1

1个回答

最新问题

问题描述投票：-1回答：1