arr is an array of strings, for example: ["hello", "world", "stack", "overflow", "hello", "again"].

What is a concise and elegant way to check whether arr contains any duplicates and, if so, to return one of them (no matter which)?

Examples:

["A", "B", "C", "B", "A"] # => "A" or "B"
["A", "B", "C"] # => nil
a = ["A", "B", "C", "B", "A"]
a.detect { |e| a.count(e) > 1 }

I know this is not a very good answer, but I love it. It's a beautiful one-liner, and it works perfectly well unless you need to process a huge data set.

Looking for a faster solution? Here you go!
def find_one_using_hash_map(array)
  map = {}
  dup = nil
  array.each do |v|
    map[v] = (map[v] || 0) + 1
    if map[v] > 1
      dup = v
      break
    end
  end
  dup
end
It's linear, O(n), but now there are more lines of code to manage, test cases are needed, and so on.

If you need an even faster solution, maybe try C instead :)

Here is a gist comparing the different solutions: https://gist.github.com/naveed-ahmad/8f0b926ffccf5fbd206a1cc58ce9743e
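On Ruby 2.7+, Enumerable#tally builds the same element-to-count hash in one call; a minimal sketch of the hash-map approach using it (not part of the original answer, and the method name is made up):

```ruby
# Find one duplicate (or nil) using Enumerable#tally (Ruby 2.7+).
# tally builds the same element => count hash as the hand-rolled loop.
def find_one_using_tally(array)
  counts = array.tally
  array.find { |e| counts[e] > 1 }
end

find_one_using_tally(["A", "B", "C", "B", "A"]) # => "A"
find_one_using_tally(["A", "B", "C"])           # => nil
```

Unlike the loop above it does not short-circuit on the first duplicate, but it stays O(n) and is considerably less code.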
class ActiveRecordClass < ActiveRecord::Base
  # has two columns, a primary key (id) and an email_address (string)
end

ActiveRecordClass.group(:email_address).having("count(*) > 1").count.keys
find_all returns an Array, not an Enumerator, containing all of the matching elements; tack uniq on the end if you only want each duplicate listed once.
This is my take for a large data set, such as an old dBase table, for finding duplicate parts:
a.each_with_object(Hash.new(0).merge(dup: [])) { |x, h| h[:dup] << x if (h[x] += 1) == 2 }[:dup]
a.inject(Hash.new(0).merge(dup: [])) { |h, x| h[:dup] << x if (h[x] += 1) == 2; h }[:dup]
# Assuming ps is an array of 20,000 part numbers and we want to find duplicates
# (I actually had to do this recently).
# Having a result hash with the part number and the number of times it is
# duplicated is much more convenient in a real-world application.
# Takes about 6 seconds to run on my data set,
# not too bad for an export script handling 20,000 parts.
h = {} # result hash
ps.select { |e|
  ct = ps.count(e)
  h[e] = ct if ct > 1
}; nil # so that the huge result of select doesn't print in the console
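On Ruby 2.7+, the same part-number-to-count hash can be built in O(n) with tally, instead of calling count inside select (which is O(n²)). A sketch, with ps reduced to a few made-up sample part numbers:

```ruby
ps = ["P100", "P200", "P100", "P300", "P200", "P100"] # sample part numbers

# tally counts every element in one pass; keep only parts seen more than once
h = ps.tally.select { |_part, count| count > 1 }
p h # => {"P100"=>3, "P200"=>2}
```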
If you are comparing two different arrays (instead of one against itself), a very fast way is the intersect operator & provided by Ruby's Array class.

For counting within a single array, each_with_object is your friend!

input = [:bla, :blubb, :bleh, :bla, :bleh, :bla, :blubb, :brrr]

# to get the counts of the elements in the array:
> input.each_with_object({}) { |x, h| h[x] ||= 0; h[x] += 1 }
=> {:bla=>3, :blubb=>2, :bleh=>2, :brrr=>1}

# to get only the counts of the non-unique elements in the array:
> input.each_with_object({}) { |x, h| h[x] ||= 0; h[x] += 1 }.reject { |k, v| v < 2 }
=> {:bla=>3, :blubb=>2, :bleh=>2}
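A minimal sketch of the intersect-operator approach for two different arrays (a and b are made-up sample data):

```ruby
a = ["hello", "world", "stack"]
b = ["overflow", "hello", "again"]

# Array#& returns the elements common to both arrays, duplicates removed,
# preserving the order of the first array
common = a & b
p common      # => ["hello"]
p common.any? # => true
```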
I needed to find out how many duplicates there were and what they were, so I wrote a function building on what Naveed had posted earlier:
def print_duplicates(array)
  puts "Array count: #{array.count}"
  map = {}
  total_dups = 0

  array.each do |v|
    map[v] = (map[v] || 0) + 1
  end

  map.each do |k, v|
    if v != 1
      puts "#{k} appears #{v} times"
      total_dups += 1
    end
  end

  puts "Total items that are duplicated: #{total_dups}"
end
def duplication(given_array)
  duplicate_array = []
  given_array.each_with_index do |num, index|
    0.upto(given_array.length - 1) do |ind|
      unless ind == index
        if given_array[ind] == given_array[index] && !duplicate_array.include?(given_array[ind])
          duplicate_array << given_array[ind]
        end
      end
    end
  end
  duplicate_array
end
You can do this in a few ways, with the first option being the fastest:
ary = ["A", "B", "C", "B", "A"]
ary.group_by{ |e| e }.select { |k, v| v.size > 1 }.map(&:first)
ary.sort.chunk{ |e| e }.select { |e, chunk| chunk.size > 1 }.map(&:first)
and the O(n²) option (i.e. less efficient):
ary.select{ |e| ary.count(e) > 1 }.uniq
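On Ruby 2.7+ there is also a compact O(n) variant built on tally (a sketch, not from the original answer):

```ruby
ary = ["A", "B", "C", "B", "A"]

# tally counts each element in one pass; keep the keys seen more than once
ary.tally.select { |_e, count| count > 1 }.keys # => ["A", "B"]
```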
Just find the first instance where the index of the object (counting from the left) does not equal the index of the object (counting from the right).

arr.detect { |e| arr.rindex(e) != arr.index(e) }

If there are no duplicates, the return value will be nil.

I believe this is also the fastest solution posted in the thread so far, as it doesn't rely on creating additional objects, and #index and #rindex are implemented in C. The big-O runtime is n², so it is slower than Sergio's, but since the "slow" part runs in C, the wall-clock time may be much faster.
detect only finds one duplicate. find_all will find all of them:
a = ["A", "B", "C", "B", "A"]
a.find_all { |e| a.count(e) > 1 }
Here are two ways to find a duplicate.

Use a Set

require 'set'

def find_a_dup_using_set(arr)
  s = Set.new
  arr.find { |e| !s.add?(e) }
end
find_a_dup_using_set arr
#=> "hello"
Use select in place of find to return an array of all the duplicates.
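That select variant might look like the following sketch (same Set technique; the method name is made up):

```ruby
require 'set'

# Return every element after its first occurrence, i.e. all duplicates.
# Set#add? returns nil when the element is already in the set.
def find_all_dups_using_set(arr)
  s = Set.new
  arr.select { |e| !s.add?(e) }
end

find_all_dups_using_set(["A", "B", "C", "B", "A"]) # => ["B", "A"]
```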
Use Array#difference
class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end
def find_a_dup_using_difference(arr)
  arr.difference(arr.uniq).first
end
find_a_dup_using_difference arr
#=> "hello"
Drop .first to return an array of all the duplicates.

Both methods return nil if there are no duplicates.
I have proposed that Array#difference be added to the Ruby core. More information is in my answer here.
Benchmark

Let's compare the proposed methods. First, we need an array for testing:
and a helper to build test arrays with a given number of elements and duplicates:
CAPS = ('AAA'..'ZZZ').to_a.first(10_000)

def test_array(nelements, ndups)
  arr = CAPS[0, nelements - ndups]
  arr.concat(arr[0, ndups]).shuffle
end
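A quick check of what the helper produces (CAPS and test_array restated so the snippet runs on its own):

```ruby
CAPS = ('AAA'..'ZZZ').to_a.first(10_000)

def test_array(nelements, ndups)
  arr = CAPS[0, nelements - ndups]
  arr.concat(arr[0, ndups]).shuffle
end

arr = test_array(10, 2)
arr.size      # => 10
arr.uniq.size # => 8 (two values each appear twice)
```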
I have not included @JjP's answer because only one duplicate is to be returned, and when his/her answer is modified to do that it is the same as @Naveed's earlier answer. Nor have I included @Marin's answer which, while posted before @Naveed's answer, returned all duplicates rather than just one (a minor point, but there is no point evaluating both, as they are identical when returning just one duplicate).

I also modified some other answers that returned all duplicates to return just the first one found, but that should have essentially no effect on performance, as they compute all duplicates before selecting one.
The results of each benchmark are listed from fastest to slowest. First, suppose the array contains 100 elements:
require 'fruity'

def benchmark(nelements, ndups)
  arr = test_array nelements, ndups
  puts "\n#{ndups} duplicates\n"
  compare(
    Naveed:    -> { arr.detect { |e| arr.count(e) > 1 } },
    Sergio:    -> { (arr.inject(Hash.new(0)) { |h, e| h[e] += 1; h }.find { |k, v| v > 1 } || [nil]).first },
    Ryan:      -> { (arr.group_by { |e| e }.find { |k, v| v.size > 1 } || [nil]).first },
    Chris:     -> { arr.detect { |e| arr.rindex(e) != arr.index(e) } },
    Cary_set:  -> { find_a_dup_using_set(arr) },
    Cary_diff: -> { find_a_dup_using_difference(arr) }
  )
end
benchmark(100, 0)
0 duplicates
Running each test 64 times. Test will take about 2 seconds.
Cary_set is similar to Cary_diff
Cary_diff is similar to Ryan
Ryan is similar to Sergio
Sergio is faster than Chris by 4x ± 1.0
Chris is faster than Naveed by 2x ± 1.0
benchmark(100, 1)
1 duplicates
Running each test 128 times. Test will take about 2 seconds.
Cary_set is similar to Cary_diff
Cary_diff is faster than Ryan by 2x ± 1.0
Ryan is similar to Sergio
Sergio is faster than Chris by 2x ± 1.0
Chris is faster than Naveed by 2x ± 1.0
benchmark(100, 10)
10 duplicates
Running each test 1024 times. Test will take about 3 seconds.
Chris is faster than Naveed by 2x ± 1.0
Naveed is faster than Cary_diff by 2x ± 1.0 (results differ: AAC vs AAF)
Cary_diff is similar to Cary_set
Cary_set is faster than Sergio by 3x ± 1.0 (results differ: AAF vs AAC)
Sergio is similar to Ryan
Now consider an array with 10,000 elements:

benchmark(10000, 0)
0 duplicates
Running each test once. Test will take about 4 minutes.
Ryan is similar to Sergio
Sergio is similar to Cary_set
Cary_set is similar to Cary_diff
Cary_diff is faster than Chris by 400x ± 100.0
Chris is faster than Naveed by 3x ± 0.1
benchmark(10000, 1)
1 duplicates
Running each test once. Test will take about 1 second.
Cary_set is similar to Cary_diff
Cary_diff is similar to Sergio
Sergio is similar to Ryan
Ryan is faster than Chris by 2x ± 1.0
Chris is faster than Naveed by 2x ± 1.0
benchmark(10000, 10)
10 duplicates
Running each test once. Test will take about 11 seconds.
Cary_set is similar to Cary_diff
Cary_diff is faster than Sergio by 3x ± 1.0 (results differ: AAE vs AAA)
Sergio is similar to Ryan
Ryan is faster than Chris by 20x ± 10.0
Chris is faster than Naveed by 3x ± 1.0
benchmark(10000, 100)
100 duplicates
Cary_set is similar to Cary_diff
Cary_diff is faster than Sergio by 11x ± 10.0 (results differ: ADG vs ACL)
Sergio is similar to Ryan
Ryan is similar to Chris
Chris is faster than Naveed by 3x ± 1.0
Note that find_a_dup_using_difference(arr) would be considerably more efficient if Array#difference were implemented in C, which would be the case if it were added to the Ruby core.
Conclusion
Many of the answers are reasonable, but using a Set is the clear best choice. It is fastest in the medium-hard cases, joint-fastest in the hardest, and only in computationally trivial cases, when your choice won't matter anyway, can it be beaten. The one very special case in which you might pick Chris's solution instead is if you want to use the method to separately de-duplicate thousands of small arrays and expect to find a duplicate typically fewer than 10 items in. It is slightly faster there, as it avoids the small additional overhead of creating the Set.
Alas, most of the answers are O(n²).

Ruby array objects have a great method, select. Something like this would work:
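A sketch of what that could look like, under the assumption that a one-pass counting hash feeds select so the whole thing stays O(n):

```ruby
arr = ["hello", "world", "stack", "overflow", "hello", "again"]

# Count every element in one pass, then select the values seen more than once.
counts = Hash.new(0)
arr.each { |e| counts[e] += 1 }
counts.select { |_e, count| count > 1 }.keys # => ["hello"]
```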
I know this thread is specifically about Ruby, but I landed here looking for how to do this with ActiveRecord in Ruby on Rails and thought I would share my solution as well.