我有一个充满文件中单词的向量,我必须输出文件中出现的第三个最常见的单词。我无法使用映射。我一直得到错误的输出,它不输出第三个单词。
void getThirdMostOften(vector<string>& v){
int counter=0;
int maxCounter=0;
string mostOftenWord="";
for(int i=0;i<v.size();i++){
counter=0;
for(int k=0;k<v.size();k++){
if(v.at(k)==v.at(i)){
counter++;
}
}
if(counter>maxCounter){
maxCounter=counter;
mostOftenWord=v.at(i);
}
}
sort(v.begin(), v.end(), greater<string>());
cout << v.at(2) << endl;
}
你做错了两件事:
sort(v.begin(), v.end(), greater<string>())
按字典顺序对向量进行排序,这不考虑频率。考虑使用实际的地图而不是成对的向量:
#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
void getThirdMostOften(const std::vector<std::string>& v) {
std::unordered_map<std::string, int> freq_map;
// Count frequency in O(n)
for (const auto& word : v) {
++freq_map[word];
}
// Check for sufficient unique words
if (freq_map.size() < 3) {
std::cout << "Less than 3 unique words\n";
return;
}
// Transfer to vector
std::vector<std::pair<std::string, int>> freq_vec(freq_map.begin(), freq_map.end());
// Partial sort to get top 3 in O(n)
std::partial_sort(
freq_vec.begin(),
std::next(freq_vec.begin(), 3),
freq_vec.end(),
[](const auto& a, const auto& b) {
return a.second > b.second;
}
);
// Output third most frequent, no guarantee which in case of ties.
std::cout << freq_vec[2].first << '\n';
}
int main() {
std::vector<std::string> words = {"apple", "banana", "apple", "orange", "banana", "apple"};
getThirdMostOften(words);
return 0;
}
对于小型数据集,由于更好的缓存局部性和更低的常数因子,
unordered_map
中的哈希表操作的开销实际上可能会使简单的基于向量的方法更快,在这种情况下,您可以使用std::nth_element
:
#include <iostream>
#include <vector>
#include <algorithm>
void getThirdMostOften(const std::vector<std::string>& v) {
std::vector<std::pair<int, std::string>> freq_vec;
// Count frequency in O(n^2), okay for small n
for (const auto& word : v) {
long count = std::count(v.begin(), v.end(), word);
freq_vec.emplace_back(-count, word);
}
// Unique and sort
std::sort(freq_vec.begin(), freq_vec.end());
freq_vec.erase(std::unique(freq_vec.begin(), freq_vec.end()), freq_vec.end());
// Check for sufficient unique words
if (freq_vec.size() < 3) {
std::cout << "Less than 3 unique words\n";
return;
}
// Output third most frequent, lexicographically breaking ties.
std::cout << freq_vec[2].second << '\n';
}
int main() {
std::vector<std::string> words = {"apple", "banana", "apple", "cherry", "banana", "apple", "orange"};
getThirdMostOften(words);
return 0;
}