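I'm trying to remove duplicates from a vector by using an unordered_set. But my design creates an unordered_set that doesn't maintain the order correctly. In this example, "z" is not at the end. What am I doing wrong? Thanks in advance.
Edit: Sorry if I wasn't clear about what I'm looking for. I want the output to be "e, d, a, b, c, z". I want to keep the original ordering but remove the duplicates. I currently do this with about 3 different for loops and an extra copy of the init vector. I'm just looking for a cleaner STL feature.
Output produced:
e d a b c a a a b b b b z
printing unordered set
e d a z b c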
#include <iostream>
#include <iterator>
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector> // needed for std::vector
using namespace std;
int main() {
    vector<string> terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b", "b", "b", "b", "c", "z" };
    for (vector<string>::iterator it = terminals.begin(); it != terminals.end(); it++) // print given vector
        cout << *it << " ";
    cout << endl;
    unordered_set<string> newSet;
    copy(terminals.begin(), terminals.end(), inserter(newSet, newSet.end()));
    cout << "printing unordered set" << endl;
    for (unordered_set<string>::iterator it = newSet.begin(); it != newSet.end(); it++)
        cout << *it << " ";
    cout << endl;
    //system("pause");
    return 0;
}
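Internally, the elements of an unordered_set are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once the hash is computed, it refers to the exact bucket the element is placed into.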
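To make the bucket idea concrete, here is a minimal sketch that asks a std::unordered_set which bucket each element ended up in, using the standard bucket() member. The helper name bucket_of_each is made up for this illustration, and the actual bucket numbers depend on your standard library's hash function, so don't expect any particular values or any particular iteration order.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): pairs every element of
// the set with the index of the bucket it is currently stored in.
std::vector<std::pair<std::string, std::size_t>>
bucket_of_each(const std::unordered_set<std::string>& s) {
    std::vector<std::pair<std::string, std::size_t>> result;
    // The iteration order here is unspecified and need not match the order
    // in which the elements were inserted.
    for (const std::string& value : s)
        result.emplace_back(value, s.bucket(value));
    return result;
}
```

Printing the pairs for the six unique terminals should show that neighbouring input values can land in unrelated buckets, which is why "z" is not guaranteed to come out last.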
If you need the unique terminals in sorted order, use std::set:
#include <iostream>
#include <vector>
#include <string>
#include <set>
int main() {
    std::vector<std::string> terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b", "b", "b", "b", "c", "z" };
    for(const std::string& terminal : terminals) // print given vector
        std::cout << terminal << " ";
    std::cout << "\n";
    // populate the set directly from the vector's iterators:
    std::set<std::string> newSet(terminals.begin(), terminals.end());
    std::cout << "printing the (ordered) set:" << "\n";
    for(const std::string& terminal : newSet)
        std::cout << terminal << " ";
    std::cout << "\n";
}
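If you want to maintain the original order, you can't use either kind of set as the final storage, but you can use a std::unordered_set as a cache/blacklist of the values that have already been inserted into the final storage.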
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <unordered_set>
int main() {
    std::vector<std::string> terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b", "b", "b", "b", "c", "z" };
    for(const std::string& terminal : terminals) // print given vector
        std::cout << terminal << " ";
    std::cout << "\n";
    std::vector<std::string> newSet;        // not really a set anymore
    std::unordered_set<std::string> cache;  // blacklist
    // try to insert all terminals and only when an insert is successful,
    // put the terminal in newSet
    std::for_each(terminals.begin(), terminals.end(),
        [&](const std::string& terminal) {
            // structured bindings require C++17
            auto [it, inserted] = cache.insert(terminal);
            if(inserted)
                newSet.push_back(terminal);
        }
    );
    std::cout << "printing the vector of unique terminals:" << "\n";
    for(const std::string& terminal : newSet)
        std::cout << terminal << " ";
    std::cout << "\n";
}
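The cache/blacklist approach above can also be wrapped in a small reusable function. The following is only a sketch: the name unique_in_order and its signature are assumptions made for this illustration, and it swaps std::for_each for std::copy_if, which lets the result of insert() decide directly whether a value is copied. It assumes the element type can be hashed with std::hash and compared with ==.

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hypothetical helper (name and signature are assumptions for this sketch):
// returns the first occurrence of every value, in the original order.
template <typename T>
std::vector<T> unique_in_order(const std::vector<T>& input) {
    std::vector<T> result;
    std::unordered_set<T> seen; // values already copied to result
    // insert().second is true only the first time a value is seen,
    // so only first occurrences are copied.
    std::copy_if(input.begin(), input.end(), std::back_inserter(result),
                 [&seen](const T& value) { return seen.insert(value).second; });
    return result;
}
```

Called on the terminals vector from the question, this should give e d a b c z while leaving the input untouched.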
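If you want the original order and don't mind making the changes directly to the original terminals vector, you can use std::remove_if in combination with an unordered_set, which is nice since it doesn't require a new vector. This is an annotated variant of @Marek R's answer:
Read first: Erase–remove idiom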
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <unordered_set>
int main() {
    std::vector<std::string> terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b", "b", "b", "b", "c", "z" };
    for(const std::string& terminal : terminals) // print given vector
        std::cout << terminal << " ";
    std::cout << "\n";
    std::unordered_set<std::string> cache; // blacklist
    // remove_if() shifts every entry for which the UnaryPredicate(*) returns
    // false towards the front of the container, keeping their relative order.
    // It returns an iterator to the new logical end. The entries from there
    // on have unspecified values, which makes the iterator a suitable
    // starting point for a subsequent erase().
    //
    // (*) UnaryPredicate: A callable that returns true or false given a single
    //     value.
    // auto is deduced as std::vector<std::string>::iterator here
    auto past_new_end = std::remove_if(terminals.begin(), terminals.end(),
        // this lambda is the UnaryPredicate
        [&](const std::string& terminal) {
            // insert returns a std::pair<Iterator, bool>
            // where the bool (.second in the pair) is false
            // if the value was not inserted (=it was already present)
            return cache.insert(terminal).second == false;
        }
    );
    std::cout << "display all the entries (now with unspecified values) "
                 "that will be erased:\n";
    std::copy(past_new_end, terminals.end(),
              std::ostream_iterator<std::string>(std::cout, "<"));
    std::cout << "\n";
    // erase all the leftover entries
    terminals.erase(past_new_end, terminals.end());
    std::cout << "printing the unique terminals:" << "\n";
    for(const std::string& terminal : terminals)
        std::cout << terminal << " ";
    std::cout << "\n";
}
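If you want to maintain the original order but enforce uniqueness, you probably want to:
If you want the output sorted (so, in your example, the output would be "a b c d e z"), then you can either insert the items into a std::set, or use std::sort followed by std::unique to get exactly one of each unique element in the input.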
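The std::sort plus std::unique route mentioned above could look roughly like this. It is a sketch only, and the helper name sorted_unique is made up for the illustration. Note that it gives up the original ordering, so it does not produce the "e, d, a, b, c, z" output asked for in the question.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns a sorted copy of
// the input with duplicates removed. The parameter is taken by value so the
// caller's vector is left unchanged.
std::vector<std::string> sorted_unique(std::vector<std::string> values) {
    // std::unique only collapses adjacent duplicates, so sort first.
    std::sort(values.begin(), values.end());
    // std::unique returns the new logical end; erase the leftover tail.
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}
```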
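It looks like you want to use an (ordered) set.
Edit: Actually, it looks like you don't. A std::vector would work, but it's probably not the cleanest way to solve this.
You could also use an unordered map, storing the items as the keys of the map and the index as the corresponding value for each key.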
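One possible reading of the unordered map suggestion is sketched below: remember the index of the first occurrence of every item, then keep only the positions that match their recorded index. The function name unique_by_first_index and the choice to store the first (rather than the last) index are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from the original post): keeps the first
// occurrence of every item, in the original order.
std::vector<std::string>
unique_by_first_index(const std::vector<std::string>& input) {
    std::unordered_map<std::string, std::size_t> first_index;
    // emplace() leaves an existing entry untouched, so each key ends up
    // mapped to the index of its first occurrence.
    for (std::size_t i = 0; i < input.size(); ++i)
        first_index.emplace(input[i], i);

    std::vector<std::string> result;
    result.reserve(first_index.size());
    // Walk the input again and keep a position only if it is the recorded
    // first occurrence of its value.
    for (std::size_t i = 0; i < input.size(); ++i)
        if (first_index.at(input[i]) == i)
            result.push_back(input[i]);
    return result;
}
```

This needs two passes and stores an index per key, so when only the deduplicated sequence is needed, the unordered_set variants are probably simpler. The map is mainly useful if the positions themselves are wanted afterwards.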
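"I'm trying to remove duplicates from a vector by using an unordered_set."
Why do you think an unordered_set would preserve any order? The name clearly says there is no particular order.
You should use the unordered_set only to keep track of whether an item has already been found in the sequence. Based on that, you can remove items from the sequence, so it should look something like this: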
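#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Data is not defined in the original snippet; a vector of strings is assumed
using Data = std::vector<std::string>;

void removeDuplicates(Data &data)
{
    std::unordered_set<std::string> foundItems;
    auto newEnd = std::remove_if(data.begin(), data.end(), [&foundItems](const auto &s)
    {
        return !foundItems.insert(s).second;
    });
    data.erase(newEnd, data.end());
}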