Ruby has a helpful method for removing duplicates from an array, the uniq method. However, there are times when you simply want to know which elements in an array are duplicates. In this guide we’ll add a method to the Array class that returns all duplicates.
Summary
Build a method that returns all of the duplicates from an array in Ruby.
Exercise File
Exercise Description
Add a new method to Ruby’s Array class that returns all duplicate values.
Example Input/Output
require 'date'

ints = [1, 2, 1, 4]
ints.find_duplicates # => [1]

invoices = [
  { company: 'Google', amount: 500, date: Date.new(2017, 01, 01).to_s, employee: 'Jon Snow' },
  { company: 'Yahoo', amount: 500, date: Date.new(2017, 01, 01).to_s, employee: 'Jon Snow' },
  { company: 'Google', amount: 500, date: Date.new(2015, 07, 31).to_s, employee: 'Jon Snow' },
  { company: 'Google', amount: 500, date: Date.new(2017, 01, 01).to_s, employee: 'Jon Snow' },
  { company: 'Google', amount: 500, date: Date.new(2017, 01, 01).to_s, employee: 'Jon Snow' },
  { company: 'Google', amount: 500, date: Date.new(2017, 01, 01).to_s, employee: 'Jon Snow', notes: 'Some notes' },
  { company: 'Google', amount: 500, date: Date.new(2017, 01, 01).to_s, employee: 'Jon Snow', notes: 'Some notes' },
]

invoices.find_duplicates
# => [
# =>   {:company=>"Google", :amount=>500, :date=>'2017-01-01', :employee=>"Jon Snow"},
# =>   {:company=>"Google", :amount=>500, :date=>'2017-01-01', :employee=>"Jon Snow"},
# =>   {:company=>"Google", :amount=>500, :date=>'2017-01-01', :employee=>"Jon Snow", :notes=>"Some notes"}
# => ]
Real World Usage
I got the idea for this exercise when I accidentally submitted a duplicate expense into Freshbooks and the system did a great job of letting me know that I might have a duplicate expense. Additionally, Ruby's Array class has a very helpful method, uniq, that removes all duplicates from an array. However, Ruby doesn’t have a simple way to find all duplicates in a collection, so this exercise will help you examine how to parse through arrays efficiently to return all of the duplicate values.
Solution
The solution can be found on the solutions branch on GitHub.
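For readers who want a starting point before checking the repository, here is a minimal sketch of one possible approach. It is not necessarily the code on the solutions branch. It assumes, based on the example output above (where the repeated Google invoice appears twice), that every occurrence after the first should be returned rather than one entry per duplicated value.

```ruby
# Illustrative sketch only: reopens Array and adds find_duplicates.
class Array
  # Returns each element whose first position in the array is earlier
  # than its current position, i.e. every repeat of an earlier value.
  def find_duplicates
    select.with_index { |element, i| index(element) != i }
  end
end

p [1, 2, 1, 4].find_duplicates # prints [1]
```

Because index scans from the start of the array for every element, this sketch does quadratic work on large arrays. Chaining uniq onto the result would collapse the repeats into one entry per duplicated value.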
I believe you can actually do this faster using a hash lookup. The following script shows benchmarks between your version and what looks to be a faster version I came up with.
# Create a very large array, with a random number of duplicates
ary = [].tap { |a| 100_000.times { a.push rand(15_000_000) } }

# Crondose find_dups method
def fast_dups(ary)
  ary.select.with_index { |el, i| ary.index(el) != i }.uniq
end

# My find_dups method
def faster_dups(ary)
  found = {}
  dups = []
  ary.each do |el|
    dups.push el if found[el]
    found[el] = true
  end
  dups.uniq
end

# Sanity check to make sure they are both returning the same values
puts 'Sanity Check'
puts fast_dups(ary).sort == faster_dups(ary).sort

require 'benchmark'

puts 'Cron Dose Benchmark'
puts Benchmark.measure {
  fast_dups(ary)
}

puts 'My Dup Benchmark'
puts Benchmark.measure {
  faster_dups(ary)
}
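
As a side note, the same hash-lookup idea can be sketched more compactly with Enumerable#tally, assuming Ruby 2.7 or newer. The name tally_dups is made up for this illustration and is not part of the script above.

```ruby
# Hypothetical helper; requires Ruby 2.7+ for Enumerable#tally.
def tally_dups(ary)
  # tally builds a hash of element => occurrence count in a single pass,
  # then we keep only the elements that were seen more than once.
  ary.tally.select { |_el, count| count > 1 }.keys
end

p tally_dups([1, 2, 1, 4, 2, 2]) # prints [1, 2]
```

The result contains each duplicated value once, ordered by first appearance. That ordering can differ from the hash-and-push version, so sort both before comparing them.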