But wait! There is more to object equality. Overriding == worked for simple equality comparisons, but there are cases where that just isn't enough.
In the following example, we build an array of duplicate Item objects and apply uniq to it. See what happens:
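A minimal sketch of the situation, assuming a hypothetical Item class with item_name and qty attributes (these names are illustrative) that overrides only ==:

```ruby
# Hypothetical Item class: only == is overridden, nothing else.
class Item
  attr_reader :item_name, :qty

  def initialize(item_name, qty)
    @item_name = item_name
    @qty = qty
  end

  # Two items are equal when their name and quantity match.
  def ==(other)
    other.is_a?(Item) &&
      item_name == other.item_name &&
      qty == other.qty
  end
end

items = [Item.new("abcd", 1), Item.new("abcd", 1), Item.new("abcd", 1)]

puts items[0] == items[1]  # true: our == override works
puts items.uniq.size       # 3: uniq still treats every item as distinct
```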
We expected Array#uniq to return only one element since the rest were duplicates, but it returned everything. Clearly, #uniq did not work. We did override the == method to return true if the items are identical, and we verified that it works by comparing an Item to its clone. So, what went wrong?
The short answer is that we failed to implement two other methods that are crucial to getting object equality right: the eql? and hash methods. Why do we need these two over and above the simple ==?
There are a lot of operations in Ruby that need to check the equality of two objects. While == serves the purpose well, it can be slow. For operations that involve a large number of equality checks (like Array#uniq and Hash lookups), that cost adds up and becomes an overhead. To get around this, Ruby provides a hash method on every object. It returns a numeric value: objects that are equal must return the same value, and distinct objects usually return different ones.
In the following example, we print the hash values for different objects. Take a look:
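A small sketch that prints hash codes for a few built-in objects. The sample values are arbitrary, and the exact numbers printed vary between Ruby processes, so no output is shown for them:

```ruby
# Every object responds to #hash and returns an Integer.
puts 1.hash
puts "ruby".hash
puts :ruby.hash
puts [1, 2].hash
puts Object.new.hash

# Within one process, equal strings produce the same hash code.
same = "ruby".hash == "ruby".hash
puts same  # true
```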
Do not confuse the method hash, which returns a hash code, with the data structure Hash. A hash code of an object is usually a short (and in Ruby, always numeric) identifier of an object. Hash is a data structure that uses the hash code of objects for fast key lookup, which is where it gets its name.
So instead of comparing two objects using ==, which could be expensive when the objects are large, Ruby uses the hash of the object when possible. Since a hash code is a simple numeric value, comparing two of them is almost always faster than comparing the various instance variables of the underlying objects.
The Array#uniq method, as you might have guessed, uses the result of hash to compare objects and identify duplicates. Let us see how this works out in practice:
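A minimal sketch of an Item that also defines hash and eql?, again assuming hypothetical item_name and qty attributes. The puts inside eql? is there only to make the calls visible:

```ruby
# Hypothetical Item class with ==, eql? and hash all defined.
class Item
  attr_reader :item_name, :qty

  def initialize(item_name, qty)
    @item_name = item_name
    @qty = qty
  end

  def ==(other)
    other.is_a?(Item) &&
      item_name == other.item_name &&
      qty == other.qty
  end

  # Used by Array#uniq and Hash when two hash codes match.
  def eql?(other)
    puts "#eql? invoked"
    self == other
  end

  # XOR of the hash codes of the instance variables that define the state.
  def hash
    item_name.hash ^ qty.hash
  end
end

items = [Item.new("abcd", 1), Item.new("abcd", 1), Item.new("abcd", 1)]
puts items.uniq.size  # 1
```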
Array#uniq now works correctly for Item objects. This is because we implemented two methods: hash and eql?.
What is the hash method doing? The ^ operator used is the binary XOR. The hash method returns the result of XORing the hash codes of all the instance variables that determine the state of the object. This makes it very likely that whenever the state of the object changes, the hash code changes as well. Distinct hash codes for distinct objects are an extremely desirable property, because they make operations on collections faster.
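To make the XOR step concrete, here is a small sketch with arbitrary sample values. It also shows one limitation worth keeping in mind: XOR does not care about order, so swapping two field values yields the same combined code.

```ruby
name_hash = "abcd".hash
qty_hash  = 1.hash

# Combine the two hash codes into one Integer.
combined = name_hash ^ qty_hash
puts combined.is_a?(Integer)  # true

# The same inputs always combine to the same code within a process.
again = "abcd".hash ^ 1.hash
puts combined == again        # true

# XOR is symmetric: the order of the operands does not matter.
swapped = qty_hash ^ name_hash
puts combined == swapped      # true
```

Collisions like the swapped case are harmless for correctness, since eql? still has the final say, but they can cost some speed.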
We also introduced the eql? method in the above example. In fact, Array#uniq called it twice to check the equality of the elements of the array. Even though we use == to check objects for equality, routines like Array#uniq use eql? instead: when two elements have the same hash code, eql? confirms that they are really equal. This means that we must implement the eql? method as well whenever we override ==. In most cases, these two methods will be identical, so you can implement the actual comparison in one method and have the other method just call it.
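One way to share the comparison is to alias eql? to ==. The sketch below assumes the same hypothetical Item attributes as before, and the "aisle 3" value is made up for illustration. It also shows the payoff for Hash lookups:

```ruby
# Hypothetical Item class where eql? simply reuses ==.
class Item
  attr_reader :item_name, :qty

  def initialize(item_name, qty)
    @item_name = item_name
    @qty = qty
  end

  def ==(other)
    other.is_a?(Item) &&
      item_name == other.item_name &&
      qty == other.qty
  end

  # eql? becomes another name for the == defined just above.
  alias eql? ==

  def hash
    item_name.hash ^ qty.hash
  end
end

# An equal (but not identical) Item now finds the same Hash entry.
stock = { Item.new("abcd", 1) => "aisle 3" }
puts stock[Item.new("abcd", 1)]  # aisle 3
```

Note that alias copies the definition of == as it exists at that point, so it has to appear after def ==.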
To summarize, if you ever override any of the ==, eql? or hash methods, you must override the others as well.