如何对结构的向量进行二值搜索并在适当的索引处插入

问题描述 投票:0回答:1

我想做什么:

我想要一个排序的文本表(大小相等),该表显示每个文本的出现次数。在将文本插入表中时对表进行排序。将单词插入表中时,将检查它们是否已经存在,在这种情况下,它们的ref_count会增加。如果它们是新的,则将它们的[[ref_count设置为1,并插入到正确索引处的向量中,以确保该表仍在排序。

我做了什么:

我创建了一个结构和结构的矢量,定义如下。然后,我使用所示的二进制搜索来识别使用

std :: insert()

函数的适当索引。

我的问题:

我的二进制搜索实现未返回正确的索引位置。 #define WORD_LENGTH 6 typedef struct RC_table { char word[WORD_LENGTH+1];//+1 for ‘\0’ unsigned int ref_count; }RC_table; std::vector<RC_table>RC; void update_RC(char *word_to_insert) { int index = 0; bool found=binary_search_RC(word_to_insert, &index); if (found == TRUE) { //increment reference count RC[index].ref_count++; } else { //insert new word at index RC_table new_entry; memcpy(new_entry.word, word_to_insert, WORD_LENGTH); new_entry.word[WORD_LENGTH] = '\0'; new_entry.ref_count = 1; if(index==0) RC.insert(RC.begin(),new_entry); else if(index==RC.size()-1) RC.insert(RC.end(),new_entry); else { RC.insert(RC.begin() + index +1, new_entry); } } } bool binary_search_RC(char *word, int *index) { int left = 0; int right = RC.size() -1; int middle = (left + right) / 2; bool found = false; while (left<=right) { middle = (left + right) / 2; *index = middle; if (strncmp(word, RC[middle].word, WORD_LENGTH) < 0) { right = middle - 1; } else if(strncmp(word, RC[middle].word, WORD_LENGTH) > 0) { left = middle + 1; } else { found = true; break; } } *index = middle; return found; }

编辑:

我尝试使用lower_bound()。它仍然没有给出预期的输出(即排序表)。typedef struct RC_table { char word[WORD_LENGTH+1];//+1 for ‘\0’ unsigned int ref_count; bool operator<(const RC_table&r){ return word<r.word; } }RC_table;
使用以下方法插入表格:

auto itr=lower_bound(RC.begin(),RC.end(),new_enry); RC.insert(itr,new_entry);

c++ vector binary-search insert-update
1个回答
0
投票
算法的最大问题是,您正在尝试使用您认为可搜索的元素对分区的两侧进行隔离。那不是你这样做的方式;在最坏的情况下,分区最终可能(并且将会)陷入1/2的重复整数除法运算。当您执行此操作时,数学运算就变得容易得多:

    分区的左端引用第一个元素,并被视为可疑元素。
  • 分区的右端是指过去的元素。并且不被视为可疑。
  • 结果是一种更简单的算法,更易于理解和维护。

    bool binary_search_RC(const char *word, int *index) { *index = 0; int left = 0; int right = static_cast<int>(RC.size()); bool found = false; while (!found && left < right) { int middle = *index = left + (right-left) / 2; int res = strncmp(word, RC[middle].word, WORD_LENGTH); if (res < 0) right = middle; else if (res > 0) *index = left = middle+1; else found = true; } return found; }

    付诸实践,一个简单的小测试工具,可以从一个简单的三个字母的字母表中生成随机的三个字符的字符串。那应该呈现出许多独特的插入物,并最终带来大量发现。最后,我们将打印整个表格,如果可以的话,最好将其排序。

    代码

    #include <iostream> #include <vector> #include <random> #define WORD_LENGTH 6 typedef struct RC_table { char word[WORD_LENGTH+1];//+1 for ‘\0’ unsigned int ref_count; } RC_table; std::vector<RC_table>RC; bool binary_search_RC(const char *word, int *index) { *index = 0; int left = 0; int right = static_cast<int>(RC.size()); bool found = false; while (!found && left < right) { int middle = *index = left + (right-left) / 2; int res = strncmp(word, RC[middle].word, WORD_LENGTH); if (res < 0) right = middle; else if (res > 0) *index = left = middle+1; else found = true; } return found; } void update_RC(const char *word_to_insert) { int index = 0; bool found = binary_search_RC(word_to_insert, &index); if (found) { ++RC[index].ref_count; std::cout << "Found entry for " << word_to_insert; std::cout << " (" << RC[index].ref_count << ")\n"; } else { std::cout << "Adding entry for " << word_to_insert << '\n'; //insert new word at index RC_table new_entry; strncpy(new_entry.word, word_to_insert, WORD_LENGTH); new_entry.word[WORD_LENGTH] = 0; new_entry.ref_count = 1; if(index==0) RC.insert(RC.begin(),new_entry); else if(index == RC.size()) RC.insert(RC.end(),new_entry); else RC.insert(RC.begin() + index, new_entry); } } int main() { // generate some random values and start adding them. a few dozen // with a severely limited alphabet should suffice. const char alphabet[] = "abc"; std::mt19937 rng{ std::random_device{}() }; std::uniform_int_distribution<std::size_t> dist(0, sizeof alphabet - 2); for (int i=0; i<50; ++i) { char word[WORD_LENGTH+1] = {}; for (int j=0; j<3; ++j) word[j] = alphabet[ dist(rng) ]; update_RC(word); } // print the table for (auto const& x : RC) std::cout << x.word << " : " << x.ref_count << '\n'; }

    输出(显然是变化的)

    Adding entry for cab Adding entry for cac Adding entry for bcc Adding entry for bbb Adding entry for cbc Adding entry for abb Found entry for cab (2) Adding entry for aba Adding entry for cca Adding entry for acc Found entry for aba (2) Found entry for bcc (2) Adding entry for cbb Found entry for cac (2) Found entry for cac (3) Adding entry for aaa Found entry for acc (2) Adding entry for bbc Adding entry for baa Adding entry for acb Found entry for aaa (2) Found entry for cca (2) Found entry for baa (2) Found entry for cbb (2) Adding entry for aac Found entry for cac (4) Adding entry for aca Adding entry for ccc Found entry for bbc (2) Adding entry for bba Adding entry for bac Adding entry for aab Found entry for bac (2) Found entry for aca (2) Found entry for bcc (3) Adding entry for caa Found entry for aaa (3) Found entry for bbc (3) Found entry for caa (2) Found entry for abb (2) Found entry for baa (3) Found entry for acc (3) Found entry for bba (2) Found entry for bbb (2) Found entry for cbc (2) Found entry for aaa (4) Found entry for baa (4) Adding entry for cba Found entry for bac (3) Found entry for bbc (4) aaa : 4 aab : 1 aac : 1 aba : 2 abb : 2 aca : 2 acb : 1 acc : 3 baa : 4 bac : 3 bba : 2 bbb : 2 bbc : 4 bcc : 3 caa : 2 cab : 2 cac : 4 cba : 1 cbb : 2 cbc : 2 cca : 2 ccc : 1
    我没有理会数学,而是将这些引用计数加起来,您应该发现它们的总和为50,即我们执行的插入次数。

    祝你好运。

  • © www.soinside.com 2019 - 2024. All rights reserved.