Chris's coding blog

String.Equals(), == and String.Compare()

March 02, 2010

Most C# books will tell you early on that you should override Equals in your own classes instead of relying on the base Object.Equals. As mentioned a few times previously, this matters most for value types, because the base ValueType.Equals() method (which overrides Object.Equals) falls back to reflection to compare the field values whenever it cannot simply compare the raw bits of the two objects. The source code for ValueType.Equals (from the Shared Source CLI) is shown below.

public override bool Equals(object obj)
{
	if (obj == null)
	{
		return false;
	}
	RuntimeType type = (RuntimeType)base.GetType();
	RuntimeType type2 = (RuntimeType)obj.GetType();
	if (type2 != type)
	{
		return false;
	}
	object a = this;
	if (CanCompareBits(this))
	{
		return FastEqualsCheck(a, obj);
	}
	FieldInfo[] fields = type.GetFields(BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance);
	for (int i = 0; i < fields.Length; i++)
	{
		object obj3 = ((RtFieldInfo)fields[i]).InternalGetValue(a, false);
		object obj4 = ((RtFieldInfo)fields[i]).InternalGetValue(obj, false);
		if (obj3 == null)
		{
			if (obj4 != null)
			{
				return false;
			}
		}
		else if (!obj3.Equals(obj4))
		{
			return false;
		}
	}
	return true;
}
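
To make the point about value types concrete, here is a minimal sketch of a struct that supplies its own Equals so the reflection path in ValueType.Equals is never reached. The Point type, its fields and the PointDemo class are hypothetical names made up for this example, and the hash code formula is just one common choice.

```csharp
using System;

// Hypothetical value type, used only for illustration.
public struct Point : IEquatable<Point>
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Strongly typed comparison: no boxing and no reflection.
    public bool Equals(Point other)
    {
        return X == other.X && Y == other.Y;
    }

    // Override of Object.Equals that forwards to the typed overload.
    public override bool Equals(object obj)
    {
        return obj is Point && Equals((Point)obj);
    }

    // Equal values must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (X * 397) ^ Y;
        }
    }
}

public static class PointDemo
{
    public static void Main()
    {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        Point c = new Point(3, 4);

        Console.WriteLine(a.Equals(b));         // True
        Console.WriteLine(a.Equals(c));         // False
        Console.WriteLine(a.Equals((object)b)); // True
        Console.WriteLine(a.Equals("text"));    // False
    }
}
```

Implementing IEquatable<Point> as well as overriding Equals(object) means callers that already hold a Point avoid boxing altogether.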

For reference types, the base Object.Equals() method simply calls an internal method implemented inside the CLR, as its source shows:

public virtual bool Equals(object obj)
{
	return InternalEquals(this, obj);
}

[MethodImpl(MethodImplOptions.InternalCall)]
internal static extern bool InternalEquals(object objA, object objB);

The InternalEquals method is implemented in the CLR VM source code, which is available as part of the Shared Source CLI 2.0 download.

Assuming you know some basic C++, you can see from the InternalEquals source that all the sanity checks are done inside the VM rather than in the Object.Equals() source shown above. These include checking that neither object is null and that both are the same type (using a method table lookup). The memory addresses of the two references are then compared, returning true or false.

For reference types that don’t overload the operator, == checks whether the two references point to the same memory address. ReferenceEquals() performs the same check. The MSDN docs say: “To check for reference equality, use ReferenceEquals. To check for value equality, use Equals.” This is a little misleading, as the base Object.Equals() method checks for reference equality, so reference types that don’t override Equals() will be compared by reference by default.
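
A small sketch of that default behaviour, using a hypothetical Person class that overrides neither Equals() nor ==. All three checks agree, because all three end up comparing references:

```csharp
using System;

// Hypothetical class that overrides neither Equals nor the == operator.
public class Person
{
    public string Name;

    public Person(string name)
    {
        Name = name;
    }
}

public static class ReferenceEqualityDemo
{
    public static void Main()
    {
        Person first = new Person("Chris");
        Person second = new Person("Chris"); // same data, different object
        Person alias = first;                // same object, second reference

        // Two distinct objects: every check reports "not equal".
        Console.WriteLine(first == second);                       // False
        Console.WriteLine(first.Equals(second));                  // False
        Console.WriteLine(object.ReferenceEquals(first, second)); // False

        // Two references to one object: every check reports "equal".
        Console.WriteLine(first == alias);                        // True
        Console.WriteLine(first.Equals(alias));                   // True
        Console.WriteLine(object.ReferenceEquals(first, alias));  // True
    }
}
```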

With String.Equals(), the String class has overridden Object.Equals() so that it does its own comparison checking instead of the default reference check:

public override bool Equals(object obj)
{
	string strB = obj as string;
	if ((strB == null) && (this != null))
	{
		return false;
	}
	return EqualsHelper(this, strB);
}

EqualsHelper is the internal method that determines whether two strings are the same. It used to be inside the VM source in 1.1, but has since been moved into the BCL source code. The first check it does is to determine whether the strings are the same length, returning false if they’re not. After this it checks each character (in blocks of 10, working backwards through the string) 2 bytes at a time (i.e. one UTF-16 character).
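
The idea behind that helper can be sketched in safe C#. This is a simplified stand-in and not the BCL code: the OrdinalSketch class and OrdinalEquals method are names invented here, and the sketch compares one character per iteration rather than unrolling into blocks.

```csharp
using System;

public static class OrdinalSketch
{
    // Simplified illustration of an ordinal equality check.
    // Not the real EqualsHelper, which uses unsafe code and unrolled loops.
    public static bool OrdinalEquals(string a, string b)
    {
        if (object.ReferenceEquals(a, b))
        {
            return true; // same object, or both null
        }
        if ((object)a == null || (object)b == null)
        {
            return false; // exactly one of them is null
        }
        if (a.Length != b.Length)
        {
            return false; // different lengths can never be equal
        }
        // Compare the UTF-16 code units, working backwards through the string.
        for (int i = a.Length - 1; i >= 0; i--)
        {
            if (a[i] != b[i])
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(OrdinalEquals("abc", "abc"));  // True
        Console.WriteLine(OrdinalEquals("abc", "abd"));  // False
        Console.WriteLine(OrdinalEquals("abc", "ABC"));  // False
        Console.WriteLine(OrdinalEquals("abc", "abcd")); // False
        Console.WriteLine(OrdinalEquals(null, null));    // True
    }
}
```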

The String class also overrides the == and != operators:

public static bool operator ==(string a, string b)
{
	return Equals(a, b);
}
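
Here is a quick sketch of what that overload means in practice. The second string is built at run time so that it is a separate object from the literal, and the StringOperatorDemo class name is made up for the example. The last two lines assume nothing beyond normal C# operator binding: operators are resolved at compile time from the static types, so variables typed as object fall back to the reference comparison.

```csharp
using System;

public static class StringOperatorDemo
{
    public static void Main()
    {
        string a = "hello";
        // Built from a char array, so this is a new object rather than the interned literal.
        string b = new string(new char[] { 'h', 'e', 'l', 'l', 'o' });

        Console.WriteLine(object.ReferenceEquals(a, b)); // False: two different objects
        Console.WriteLine(a == b);                       // True: String's == compares contents
        Console.WriteLine(a.Equals(b));                  // True
        Console.WriteLine(a != b);                       // False

        // Typed as object, == binds to the default reference comparison,
        // while Equals is virtual and still reaches String.Equals.
        object objA = a;
        object objB = b;
        Console.WriteLine(objA == objB);      // False
        Console.WriteLine(objA.Equals(objB)); // True
    }
}
```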

Now say you do the check on two Unicode strings that contain characters outside Windows-1252, the default Western European code page on US/UK Windows installs:

string a = "Unicode text 1";
string b = "Unicode text 2";
if ( a == b )
	Console.WriteLine("Equal from ==");
if ( a.Equals(b) )
	Console.WriteLine("Equals from Equals()");

Equals() compares the strings 2 bytes at a time, since .NET stores strings internally as UTF-16. I’ve heard or read in the past that “Equals is faster than ==”. Here’s the IL for the code above:

IL_0000: nop
IL_0001: ldstr "Unicode text 1"
IL_0006: stloc.0
IL_0007: ldstr "Unicode text 2"
IL_000c: stloc.1
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: call bool [mscorlib]System.String::op_Equality(string,string)
IL_0014: ldc.i4.0
IL_0015: ceq
IL_0017: stloc.2
IL_0018: ldloc.2
IL_0019: brtrue.s IL_0026
IL_001b: ldstr "Equal from =="
IL_0020: call void [mscorlib]System.Console::WriteLine(string)
IL_0025: nop
IL_0026: ldloc.0
IL_0027: ldloc.1
IL_0028: callvirt instance bool [mscorlib]System.String::Equals(string)
IL_002d: ldc.i4.0
IL_002e: ceq
IL_0030: stloc.2
IL_0031: ldloc.2
IL_0032: brtrue.s IL_003f
IL_0034: ldstr "Equals from Equals()"
IL_0039: call void [mscorlib]System.Console::WriteLine(string)
IL_003e: nop
IL_003f: ret

Is it really faster/more optimised to call Equals()? The == operator goes through one extra method call (op_Equality, which in turn calls Equals), so arguably yes. The speed difference is unlikely to be noticeable in 99% of applications, and I prefer == and will always use it unless asked not to. It comes down to the preference of the programmer and which languages have influenced them in the past. There is an argument that you should always use Equals because it tells other programmers exactly what type of comparison you are doing (ordinal and case-sensitive). I can see the point, but I still prefer ==.
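
If spelling out the type of comparison is the goal, one option is the String.Equals overload that takes a StringComparison value. The sketch below uses made-up file names and a made-up ExplicitComparisonDemo class; it only shows how the intent becomes visible at the call site.

```csharp
using System;

public static class ExplicitComparisonDemo
{
    public static void Main()
    {
        string a = "Config.xml";
        string b = "config.XML";

        // == is always ordinal and case-sensitive.
        Console.WriteLine(a == b); // False

        // The same comparison, with the rules stated explicitly.
        Console.WriteLine(string.Equals(a, b, StringComparison.Ordinal)); // False

        // Still ordinal, but ignoring case.
        Console.WriteLine(string.Equals(a, b, StringComparison.OrdinalIgnoreCase)); // True
    }
}
```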

MSDN describes what == does (or the Equals method it calls):

This method performs an ordinal (case-sensitive and culture-insensitive) comparison.

When would you ever need a non-ordinal comparison (an ordinal comparison being one based on the numeric value of each character), i.e. String.Compare instead of == or Equals()? Inside a method that sorts, for example an IComparer implementation. Sorting è, é, ê and ë is one example. The Equals() method, by comparison, is better suited to checking internal strings such as filenames, resource names and database connection strings.
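
As a rough sketch of the sorting case, the comparer below wraps String.Compare with a CultureInfo. CultureStringComparer, SortDemo and SortWith are hypothetical names for this example, and the invariant culture is used only to keep the sample portable; a specific culture could be passed instead. The ordinal order follows directly from the code points, whereas the linguistic order depends on the runtime's globalization data, so no output is shown for it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical comparer that delegates to the culture-aware String.Compare.
public class CultureStringComparer : IComparer<string>
{
    private readonly CultureInfo culture;

    public CultureStringComparer(CultureInfo culture)
    {
        this.culture = culture;
    }

    public int Compare(string x, string y)
    {
        return string.Compare(x, y, culture, CompareOptions.None);
    }
}

public static class SortDemo
{
    // Returns a sorted copy so the input is left untouched.
    public static List<string> SortWith(IEnumerable<string> items, IComparer<string> comparer)
    {
        List<string> copy = new List<string>(items);
        copy.Sort(comparer);
        return copy;
    }

    public static void Main()
    {
        string[] letters = { "f", "é", "e", "ë", "è", "ê" };

        // Ordinal: sorted purely by code point, so "f" (U+0066) comes before "è" (U+00E8).
        List<string> ordinal = SortWith(letters, StringComparer.Ordinal);
        Console.WriteLine(string.Join(" ", ordinal.ToArray())); // e f è é ê ë

        // Linguistic: the accented letters are normally grouped with "e", ahead of "f".
        List<string> linguistic = SortWith(letters, new CultureStringComparer(CultureInfo.InvariantCulture));
        Console.WriteLine(string.Join(" ", linguistic.ToArray()));
    }
}
```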


I'm Chris Small, a software engineer working in London. This is my tech blog.