String Comparison in Python
Abstract
String comparison is the process of checking whether two strings are equal and, if they are not, which one comes first in lexicographical order. It is a fundamental operation in computer science. String comparison in Python can be case-sensitive (using == or !=) or case-insensitive (after normalizing case with lower() or upper()).
Scope
- The article covers various techniques used to compare strings in Python.
- It also includes a detailed explanation of those techniques.
- Each technique is followed by an example illustrated with source code and output.
Introduction
Suppose you need to sort a set of names in lexicographical (alphabetical) order. How would you decide which of two names comes first? The answer is string comparison. Read along to know more.
String comparison in Python is the process of comparing two strings and deciding whether they are equal. If they are not equal, one string is lexicographically larger than the other, and finding the larger or smaller string is also part of string comparison. It is used in tasks such as sorting, searching, and validating input.
How to Compare Strings in Python?
String comparison is an important operation in Python. We can compare strings in Python in the following ways:
Using is and is not
Python (specifically the CPython implementation) uses a technique known as string interning. Because strings are immutable, Python can cache certain strings, such as short identifier-like literals, and reuse a single object wherever the same value appears. Every object has an identity (ID), which the built-in id() function returns. Interning helps reduce memory usage.
Suppose a new string variable is assigned a literal with the same value as a previously created string. If that string is interned, Python binds the new name to the existing object instead of creating a new object in memory. In such a case, both names refer to the same object and have the same ID. The is and is not operators compare strings based on their IDs, not their values. Interning is an implementation detail and is not guaranteed for every string, so is should not be relied on to test whether two strings have the same value.
is operator
The is operator compares two strings by checking their IDs. If both strings have the same ID, it returns True; otherwise it returns False.
Syntax: str1 is str2
In this syntax, str1 and str2 are the strings to be compared using the is operator.
Example
In this example, we create the strings str1, str2, and str3. When we check str1 against str2 using the is operator, we get True, as both names refer to the same cached string and therefore have the same ID. When we check str1 against str3, we get False, as the strings are different and have different IDs. If we then append 's' to str1 and compare str1 with str3 using is, we still get False: even though both strings now have the same value, their IDs are different, and the is operator only checks the IDs of the two strings.
In the given example, str1 and str2 initially pointed to the same cached string. Because strings are immutable, appending 's' did not modify that object. Instead, Python created a new string object with its own ID and rebound str1 to it. That new object is distinct from the object str3 refers to, leading to the result below.
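The original listing is not reproduced here, so the following is only a minimal sketch of the scenario described, with hypothetical sample values. The identity results marked "usually" depend on CPython's interning behavior and are not guaranteed by the language.

```python
# Hypothetical sample values; the article's own listing is not available.
str1 = "scaler"
str2 = "scaler"
str3 = "scalers"

print(str1 is str2)   # usually True in CPython: equal literals share one object
same_object_different_value = str1 is str3
print(same_object_different_value)   # False: different values never share an object

str1 += "s"           # builds a new string object and rebinds str1 to it
print(str1 == str3)   # True: the values now match
print(str1 is str3)   # usually False in CPython: two separate objects
```

The last two lines show why is is unsuitable for checking whether two strings hold the same text.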
Output
is not operator
The is not operator compares two strings by checking their IDs. If both strings have the same ID, it returns False; otherwise it returns True.
Syntax: str1 is not str2
Here str1 and str2 are the strings to be compared using the is not operator.
Example
In this example, we create the strings str1, str2, and str3. When we check str1 against str2 using the is not operator, we get False, as both names refer to the same cached string and therefore have the same ID. When we check str1 against str3, we get True, as the strings are different and have different IDs. If we then append 's' to str1 and compare str1 with str3 using is not, we still get True: even though both strings now have the same value, they are separate objects with different IDs, and the is not operator only checks the IDs of the two strings.
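A minimal sketch of this scenario follows, using hypothetical sample values. As with is, the results marked "usually" reflect CPython behavior rather than a language guarantee.

```python
# Hypothetical sample values; the article's own listing is not available.
str1 = "scaler"
str2 = "scaler"
str3 = "scalers"

print(str1 is not str2)   # usually False in CPython: both names share one object
distinct_before = str1 is not str3
print(distinct_before)    # True: different values are always different objects

str1 += "s"               # str1 now refers to a newly created string
print(str1 is not str3)   # usually True in CPython: equal values, separate objects
print(str1 != str3)       # False: the values are the same
```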
Output
Using Relational Operators
Relational operators are used for comparing values in Python. They are most often used with numeric values, but they can also be applied to strings. A relational operator returns True or False depending on whether the condition holds. For strings, the operators compare the two strings character by character, from left to right, according to each character's Unicode code point. The first pair of differing characters decides the result, and if one string is a prefix of the other, the shorter string is the smaller one. Because the comparison proceeds character by character, relational operators follow lexicographical order.
Operators
- less than (<): This operator returns True if the first string is lexicographically smaller than the second string. Otherwise, it returns False. It has the syntax: str1 < str2 where str1 and str2 are the strings to be compared.
- greater than (>): This operator returns True if the first string is lexicographically larger than the second string. Otherwise, it returns False. It has the syntax: str1 > str2 where str1 and str2 are the strings to be compared.
- less than or equal to (<=): This operator returns True if the first string is lexicographically smaller than or equal to the second string. Otherwise, it returns False. It has the syntax: str1 <= str2 where str1 and str2 are the strings to be compared.
- greater than or equal to (>=): This operator returns True if the first string is lexicographically larger than or equal to the second string. Otherwise, it returns False. It has the syntax: str1 >= str2 where str1 and str2 are the strings to be compared.
Example
In this example, we have created two strings str1 and str2. Then we compared these two strings using the above-mentioned relational operators.
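As an illustration, the comparisons could look like the sketch below. The sample strings are hypothetical and chosen so that the first characters already differ.

```python
# Hypothetical sample values.
str1 = "apple"
str2 = "banana"

print(str1 < str2)    # True: 'a' (code point 97) comes before 'b' (98)
print(str1 > str2)    # False
print(str1 <= str2)   # True
print(str1 >= str2)   # False

# Uppercase letters have smaller code points than lowercase letters,
# so the ordering is not purely alphabetical when cases are mixed.
print("Zebra" < "apple")   # True: 'Z' (90) comes before 'a' (97)
```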
Output
Comparison using User-Defined Function
1. Checking if Strings are equal
We can also have a user-defined function to compare two strings. This function would return whether the strings are equal or not.
Here we have implemented the comp_str function to compare two strings. First, the function checks whether the strings are of the same length. If they have different lengths, it returns that the strings are not equal. Otherwise, it compares characters at the same indices of both strings. If a mismatch is found, it returns that the strings are not equal. If no mismatch is found after comparing all the indices, it returns that the strings are equal.
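The steps above can be sketched as follows. Returning a boolean is an assumption made here for brevity; the original function may have printed or returned a message instead.

```python
def comp_str(str1, str2):
    """Return True if both strings are equal, character by character."""
    # Strings of different lengths can never be equal.
    if len(str1) != len(str2):
        return False
    # Compare the characters at the same index in both strings.
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            return False
    # No mismatch was found at any index.
    return True


print(comp_str("scaler", "scaler"))    # True
print(comp_str("scaler", "scalar"))    # False
print(comp_str("scaler", "scalers"))   # False
```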
Output
2. Finding Lexicographical order of Strings
We can also have a user-defined function that returns the string that is lexicographically (alphabetically, as in a dictionary) larger. Here we have implemented the comp_str function to compare two strings. It compares the characters at the same indices of both strings. If a character of one string has a larger code point (ASCII value, for plain English text) than the corresponding character of the other string, that string is the lexicographically larger one. Otherwise, the comparison continues to the next character. If one string runs out of characters first, the longer string is the larger one. The strings are equal if they have the same length and all their characters have the same values.
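A minimal sketch of such a function is shown below. Returning None for equal strings is an assumption made for this illustration.

```python
def comp_str(str1, str2):
    """Return the lexicographically larger string, or None if they are equal."""
    for ch1, ch2 in zip(str1, str2):
        # ord() gives the code point of a character.
        if ord(ch1) > ord(ch2):
            return str1
        if ord(ch1) < ord(ch2):
            return str2
    # All shared positions match, so the longer string (if any) is larger.
    if len(str1) > len(str2):
        return str1
    if len(str2) > len(str1):
        return str2
    return None


print(comp_str("apple", "apply"))   # apply
print(comp_str("app", "apple"))     # apple
print(comp_str("same", "same"))     # None
```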
Output
String Comparison using ==
The == operator compares the values of two strings and returns whether they are equal: True if they are, False otherwise. The difference between is and == is that is checks the IDs of the strings, whereas == checks the values stored in them. The == operator is case-sensitive.
Syntax: str1 == str2
Here str1 and str2 are strings to be compared using the == operator.
Example
In this example, we create the strings str1, str2, and str3. When we check str1 == str2, we get True because the strings have the same value. When we check str1 == str3, we get False because the strings have different values. Then we append 's' to str1 to make it equal to str3. Now, if we check str1 == str3, we get True because both strings have the same value. This is what differentiates == from is: the is operator compares the IDs of the strings, whereas == compares their values.
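The scenario can be sketched like this, with hypothetical sample values:

```python
# Hypothetical sample values.
str1 = "scaler"
str2 = "scaler"
str3 = "scalers"

print(str1 == str2)   # True: same value
print(str1 == str3)   # False: different values

str1 += "s"           # str1 now holds "scalers"
print(str1 == str3)   # True: values match, regardless of object identity
```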
Output
String Comparison using !=
The != operator compares the values of two strings and returns whether they are different. If the strings are equal, it returns False. Otherwise, it returns True. The difference between is not and != is that is not checks the IDs of the strings, whereas != checks the values stored in them. The != operator is case-sensitive.
Syntax: str1 != str2
Here str1 and str2 are the strings to be compared using the != operator.
Example
In this example, we create the strings str1, str2, and str3. When we check str1 != str2, we get False because both strings are equal. When we check str1 != str3, we get True: the two strings contain the same letters but in different cases, and since != is case-sensitive, it treats them as different strings.
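A short sketch of this comparison, with hypothetical sample values:

```python
# Hypothetical sample values; str3 differs from str1 only in case.
str1 = "scaler"
str2 = "scaler"
str3 = "Scaler"

print(str1 != str2)   # False: the values are identical
print(str1 != str3)   # True: 's' and 'S' are different characters
```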
Output
String comparison using re.match()
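re.match() is a function of Python's re (regular expression) module. It checks whether a regular expression pattern matches at the beginning of a string and returns a match object if it does, or None otherwise. Syntax: re.match(pattern, string, flags=0)
Here pattern is the regular expression (in our case, the second string, str2), string is the string to be matched (str1), and flags is an optional parameter for modifiers such as re.IGNORECASE. The re module must be imported before re.match() can be used. Note that re.match() only anchors the match at the start of the string, so a pattern also matches any longer string that begins with it, and regex metacharacters in the pattern string are interpreted rather than matched literally. When exact equality is needed, re.escape() and re.fullmatch() avoid both issues.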
Examples
1. Case-insensitive comparison using re.match()
A case-insensitive comparison can be made with re.match() by using one string as the regular expression and the other as the input string, and passing the re.IGNORECASE flag.
In this example, we have compared str1 with str2 and str3. In both cases, we get the output that the strings are equal even though str3 is differently cased because we have used case-insensitive comparison using re.IGNORECASE flag.
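A possible sketch of this comparison is shown below. The helper name matches_ignore_case and the sample values are hypothetical. Note that the pattern is the first argument of re.match() and the string to test is the second; re.escape() is added here as a precaution so that the compared string is matched literally.

```python
import re

# Hypothetical sample values.
str1 = "scaler"
str2 = "scaler"
str3 = "SCALER"


def matches_ignore_case(candidate, text):
    """Return True if text starts with candidate, ignoring case."""
    return re.match(re.escape(candidate), text, re.IGNORECASE) is not None


for other in (str2, str3):
    verdict = "equal" if matches_ignore_case(other, str1) else "not equal"
    print(f"{str1!r} and {other!r} are {verdict}")
```

Both comparisons report "equal" because the flag makes the match ignore letter case.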
Output
2. Case-sensitive comparison using re.match()
A case-sensitive comparison can be made with re.match() in the same way, with one string as the regular expression and the other as the input string. The re.IGNORECASE flag is omitted so that the matching is case-sensitive.
In this example, we have compared str1 with str2 and str3. In the first case, we get the output that the strings are equal, but in the second case, we get the output that the strings are not equal because str3 is differently cased and thus is treated as a different string.
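The case-sensitive variant might look like the sketch below (the helper name and sample values are hypothetical). It also shows that re.match() accepts a mere prefix, and that re.fullmatch() is the stricter choice when the whole string must match.

```python
import re

# Hypothetical sample values.
str1 = "scaler"
str2 = "scaler"
str3 = "SCALER"


def matches_exact_case(candidate, text):
    """Return True if text starts with candidate, respecting case."""
    return re.match(re.escape(candidate), text) is not None


print(matches_exact_case(str2, str1))   # True
print(matches_exact_case(str3, str1))   # False: the case differs

# re.match() only anchors at the start, so a prefix also "matches".
print(matches_exact_case("scal", "scaler"))                    # True
print(re.fullmatch(re.escape("scal"), "scaler") is not None)   # False
```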
Output
String comparison using finditer()
The finditer() function belongs to Python's re (regular expression) module. It returns an iterator that yields a match object for every non-overlapping occurrence of the given regular expression in the string.
Syntax: re.finditer(pattern, string, flags=0)
Here pattern is the regular expression to be searched for, and string is the string to be searched.
Example
In this example, we compare str1 with str2 and str3. In the first case, we get the output that the strings are equal because finditer() found an occurrence of str1 in str2. In the second case, we get the output that the strings are not equal because str3 is differently cased and thus is treated as a different string. Note that finditer() reports occurrences anywhere in the string, so this is strictly a substring check: it also succeeds when str1 is only a part of str2.
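A minimal sketch of this check follows. The helper name occurs_in and the sample values are hypothetical; the pattern is passed as the first argument of re.finditer() and the string to search as the second.

```python
import re

# Hypothetical sample values.
str1 = "scaler"
str2 = "scaler"
str3 = "SCALER"


def occurs_in(pattern_text, text):
    """Return True if pattern_text occurs anywhere in text (case-sensitive)."""
    # finditer() is lazy; next() fetches the first match, or None if there is none.
    return next(re.finditer(re.escape(pattern_text), text), None) is not None


print(occurs_in(str1, str2))   # True
print(occurs_in(str1, str3))   # False: the case differs

# Each yielded match object records where the occurrence was found.
spans = [m.span() for m in re.finditer("ab", "abcab")]
print(spans)                   # [(0, 2), (3, 5)]
```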
Output
Comparison using __eq__()
Python strings have a special method, __eq__(), which checks whether two strings are equal. It returns True if the strings are equal and False if they are not. The == operator calls this method behind the scenes. It is case-sensitive.
Syntax
str1.__eq__(str2)
Here str1 and str2 are the strings to be compared using the __eq__() method.
Example
In this example, we compare str1 with str2 and str3. In the first case, we get the output that the strings are equal because __eq__() found that the strings are equal. In the second case, we get the output that the strings are not equal because str3 is differently cased and thus is treated as a different string.
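For illustration, a sketch with hypothetical sample values:

```python
# Hypothetical sample values; str3 differs from str1 only in case.
str1 = "scaler"
str2 = "scaler"
str3 = "Scaler"

print(str1.__eq__(str2))   # True
print(str1.__eq__(str3))   # False: the comparison is case-sensitive

# When the other operand is not a string, the method returns NotImplemented,
# whereas the == operator would fall back to False.
print(str1.__eq__(42))     # NotImplemented
```

In everyday code, str1 == str2 is the usual spelling; calling __eq__() directly is mainly useful for understanding what the operator does.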
Output
Case insensitive comparison with upper() or lower()
As we have seen, most of the methods mentioned above are case-sensitive, treating uppercase and lowercase characters differently. As a result, strings that differ only in casing do not match. This problem can be overcome by using Python's lower() and upper() string methods.
The lower() method is called on a string, takes no arguments, and returns a copy of the string with all cased characters converted to lowercase. Syntax: str.lower() where str is the string.
The upper() method is called on a string, takes no arguments, and returns a copy of the string with all cased characters converted to uppercase. Syntax: str.upper() where str is the string.
How can these methods help in string comparison? We can first convert both input strings to a common case, either uppercase or lowercase, and then compare the results using the == operator.
Examples
- Using lower()
In this example, we compare the strings str1 and str2. The strings have different casing, so using the == operator directly would report that they are not equal. Instead, we call the .lower() method on both strings before applying ==, so both are converted to lowercase and then compared. Thus we get the output that both strings are equal.
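A short sketch of this approach, with hypothetical sample values:

```python
# Hypothetical sample values that differ only in casing.
str1 = "Scaler Topics"
str2 = "sCALER tOPICS"

print(str1 == str2)                   # False: direct comparison is case-sensitive
print(str1.lower() == str2.lower())   # True: both become "scaler topics"
```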
Output
- Using upper()
In this example, we compare the strings str1 and str2. The strings have different casing, so using the == operator directly would report that they are not equal. Instead, we call the .upper() method on both strings, so both are converted to uppercase and then compared using the == operator. Thus we get the output that both strings are equal.
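The same idea with upper() can be wrapped in a small helper, sketched below. The function name equal_ignore_case is hypothetical.

```python
def equal_ignore_case(first, second):
    """Compare two strings after converting both to uppercase."""
    return first.upper() == second.upper()


# Hypothetical sample values that differ only in casing.
str1 = "Scaler Topics"
str2 = "sCALER tOPICS"

print(str1 == str2)                    # False
print(equal_ignore_case(str1, str2))   # True: both become "SCALER TOPICS"
```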
Output
Comparison using casefold()
The casefold() method in Python returns a case-folded copy of a string, intended for caseless matching. It is similar to lower() but more aggressive: it removes all case distinctions in a string, including some that lower() leaves in place. It has the syntax:
str.casefold() where str is the string to be changed
We can use casefold() in the same way we used lower() to compare strings. We first call casefold() on both of our strings and then compare them using == to get case-insensitive results.
The differentiating factor between lower() and casefold() is how they treat characters that do not have a simple one-to-one lowercase form. Both methods work on Unicode strings (Unicode 14.0 defines 144,697 characters covering the alphabets of most modern and historic scripts), but lower() only maps each character to its lowercase form, whereas casefold() applies Unicode case folding. For example, the German letter 'ß' is already lowercase, so lower() leaves it unchanged, while casefold() converts it to 'ss'. For English text the two methods give the same result, whereas casefold(), being more thorough, is the more reliable choice for text in other languages.
Example
In this example, we compare the strings str1 and str2. The strings have different casing, so using the == operator directly would report that they are not equal. Instead, we call the .casefold() method on both strings before applying ==, so both are case-folded and then compared. Thus we get the output that both strings are equal.
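The sketch below uses hypothetical sample values, including a German word, to show a case where casefold() and lower() give different answers.

```python
# Hypothetical sample values.
str1 = "Scaler"
str2 = "sCALER"
print(str1.casefold() == str2.casefold())   # True

# The German letter 'ß' is folded to "ss" by casefold() but kept by lower().
word1 = "Straße"
word2 = "STRASSE"
print(word1.lower() == word2.lower())         # False: "straße" vs "strasse"
print(word1.casefold() == word2.casefold())   # True: both become "strasse"
```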
Output
String comparison using sorted()
The sorted() function in Python sorts the given data. When we apply sorted() to a string, it returns a sorted list of the characters of the string. It has the syntax:
sorted(str) where str is the string to be sorted
sorted() is useful when two strings are to be compared irrespective of the position of their characters, which means that all the anagrams of a string match. To apply this technique, we first call sorted() on both of our strings and then compare the resulting lists using the == operator.
Example
In this example, we compare the strings str1 and str2. The strings have the same characters but in a different order, so using the == operator directly would report that they are not equal. Instead, we call sorted() on both strings before applying ==, so the characters of each are arranged in sorted order and the resulting lists are compared. Thus we get the output that both strings are equal.
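A brief sketch with hypothetical sample values that are anagrams of each other:

```python
# Hypothetical sample values: the same letters in a different order.
str1 = "listen"
str2 = "silent"

print(str1 == str2)                   # False: the character order differs
print(sorted(str1))                   # ['e', 'i', 'l', 'n', 's', 't']
print(sorted(str1) == sorted(str2))   # True: same characters once sorted
```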
Output
Compare strings using collections.Counter()
collections.Counter is a class in Python's collections module. It counts hashable objects: given an iterable, it builds a dictionary-like object (a subclass of dict) with each distinct item as a key and its count as the value.
collections.Counter can be used to compare two strings in a way similar to sorted(), that is, irrespective of the positions of their characters.
To use collections.Counter for string comparison, we pass each input string to Counter(). This gives two counters with the characters of each string as keys and their frequencies as values. Then we compare the two counters using the == operator. If they are equal, we can say that the strings are permutations of each other and thus equal in this sense.
Example
In this example, we are using two strings str1 and str2. First, we have created their counter dictionaries, count_str1 and count_str2, respectively. Then we compared the counter dictionaries using the == method. We have got the output as True as both of our input strings are permutations of each other.
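A minimal sketch of this comparison is given below. The names count_str1 and count_str2 follow the description above; the sample values are hypothetical.

```python
from collections import Counter

# Hypothetical sample values: permutations of each other.
str1 = "listen"
str2 = "silent"

count_str1 = Counter(str1)   # character -> number of occurrences
count_str2 = Counter(str2)

print(count_str1 == count_str2)         # True: same characters, same counts
print(Counter("aab") == Counter("abb")) # False: the counts differ
```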
Output
Conclusion
- In this article, we have discussed the process of string comparison.
- This is followed by different methods to compare strings.
- Then we have also seen the process of comparing strings both in a case-sensitive and case-insensitive manner.
- That is followed by methods to compare different permutations of a string.