String Split Function in Python
Overview
We can split a string in Python into a list of substrings by using the split() function, which breaks the given string at every occurrence of a specified separator. Both the separator and maxsplit (the maximum number of splits) are optional, and we pass them as arguments when calling split().
Introduction to String Split() Function of Python
The string split() function breaks a string into parts at particular points. Those points are the positions in the string where the separator occurs.
A real-life example is seen in MS Excel, where the string entered in the cell is split based on some delimiter.
In the above picture, the string entered in each cell is split into separate columns at the delimiter, which is a hyphen ("-"). For example, the string in the first cell, "Dress-Blue-S", is split into ["Dress", "Blue", "S"], and the rest of the cells are split in the same way. This is exactly how the string split() function works in Python.
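The spreadsheet behavior described above can be sketched in Python as follows. Only the first cell value, "Dress-Blue-S", comes from the text; the other cell values and the names cells and rows are assumptions made for illustration.

```python
# Hypothetical cell values; only "Dress-Blue-S" appears in the article.
cells = ["Dress-Blue-S", "Shirt-Red-M", "Jeans-Black-L"]

# Split every cell at the hyphen, one list of columns per cell.
rows = [cell.split("-") for cell in cells]

for row in rows:
    print(row)
# The first row printed is ['Dress', 'Blue', 'S']
```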
Working of Split Function in Python
In the picture given above, we see that the input string is split into three parts, i.e., "Apple", "Mango" and "Orange", and it is split at those points where there is a semicolon present.
Just as punctuation marks separate two sections of a sentence, delimiters separate two regions in a stream of data. Characters such as ',', ';', '@', '&', ':', '(' and '>' can all serve as delimiters. Here, the semicolon is the delimiter at which the given string is split.
The string split() function in Python splits a given string at a delimiter (also called a separator) and returns a list of strings. As in the figure given above, after splitting the string, we get a list of words -> ['Apple', 'Mango', 'Orange']
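A minimal sketch of the semicolon example, assuming the input string is "Apple;Mango;Orange" (the variable names fruits and parts are ours):

```python
# Assumed input: three words joined by semicolons.
fruits = "Apple;Mango;Orange"

# Split at every semicolon; the semicolons themselves are discarded.
parts = fruits.split(";")

print(parts)  # ['Apple', 'Mango', 'Orange']
```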
Syntax of Python String Split
str.split(separator, maxsplit)
str: variable containing the input string. Datatype: string.
Parameters of Split Function in Python
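- separator: This is the delimiter. The string splits at every occurrence of this delimiter. This is optional, i.e., you may or may not specify a separator. In case no separator has been specified (or None is passed), the string is split at runs of whitespace (spaces, tabs, newlines), and leading and trailing whitespace is ignored. The separator can be a single character or a longer string, even several words; a multi-character separator is matched as a whole, not as a set of alternative characters. An empty string is not allowed as a separator and raises a ValueError.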
- maxsplit: This specifies the maximum number of times the string should be split, so the returned list has at most maxsplit + 1 elements. This is optional, i.e., you may or may not specify the maxsplit count. In case it has not been specified, the default value is -1, i.e., no limit on the number of splits. Any negative value works the same as when no value is specified.
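The effect of the two parameters can be sketched as below. The sample strings and variable names are assumptions chosen for illustration.

```python
text = "red, green, blue, yellow"  # hypothetical sample string

# Separator only: split at every ", ".
no_limit = text.split(", ")    # ['red', 'green', 'blue', 'yellow']

# Separator and maxsplit: at most 2 splits, so at most 3 parts.
limited = text.split(", ", 2)  # ['red', 'green', 'blue, yellow']

# A negative maxsplit behaves like no limit at all.
negative = text.split(", ", -3)

# No separator: runs of whitespace count as one delimiter,
# and leading/trailing whitespace is ignored.
whitespace = "  red   green\tblue\n".split()  # ['red', 'green', 'blue']

# An explicit " " separator does NOT merge consecutive spaces.
single_space = "red  green".split(" ")        # ['red', '', 'green']

print(no_limit, limited, negative, whitespace, single_space, sep="\n")
```

The difference between split() and split(" ") is easy to miss: the latter produces empty strings wherever two spaces are adjacent.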
Return Type of Split() String Function
The split() function returns a list of strings.
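A short sketch of the return value, using sample inputs of our own. It also shows what comes back when the separator never occurs or when the string is empty.

```python
result = "2024-01-15".split("-")  # hypothetical date string

print(type(result))  # <class 'list'>
print(result)        # ['2024', '01', '15'] (the items stay strings)

# Separator not found: a one-element list holding the whole string.
unsplit = "hello".split(",")      # ['hello']

# Empty string with an explicit separator: a list with one empty string.
empty_with_sep = "".split(",")    # ['']

# Empty string with the default whitespace splitting: an empty list.
empty_default = "".split()        # []
```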
Corner Cases
- The split() function discussed here is a string method. In case we call it on an object of a type that has no such method, such as an integer or a list, Python raises an AttributeError at run time (not a syntax error).
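- If we want to specify the maxsplit count without a separator, we cannot simply leave the first argument blank. A call such as print(str.split(,5)) is a syntax error, because Python does not allow an empty positional argument. Instead, pass None as the separator or pass maxsplit by keyword, i.e., str.split(None, 5) or str.split(maxsplit=5).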
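Both corner cases can be sketched as follows. The values and variable names are assumptions made for illustration.

```python
number = 12345  # hypothetical non-string value

# Integers have no split() method, so this raises AttributeError.
try:
    number.split("3")
except AttributeError as exc:
    attribute_error = str(exc)
    print("AttributeError:", attribute_error)

# Convert to a string first, then split.
converted = str(number).split("3")  # ['12', '45']

sentence = "one two three four"

# Two valid ways to limit the splits while keeping whitespace splitting.
by_keyword = sentence.split(maxsplit=2)  # ['one', 'two', 'three four']
by_none = sentence.split(None, 2)        # same result

print(converted, by_keyword, by_none, sep="\n")
```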
Example of Python String Split()
Without Specifying Any Maxsplit
Case 1:
In the above case, no separator has been specified, and hence the default behavior applies: the string is split at whitespace, with consecutive whitespace characters treated as a single separator. Also, no maxsplit count is given, so the default value is -1, i.e., no limit on the number of splits. So the string is split wherever whitespace is found.
Output:
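A minimal sketch of this case, assuming a sample sentence of our own (the variable text and its contents are illustrative):

```python
# Assumed sample; note the run of three spaces in the middle.
text = "Python makes   splitting easy"

# No separator and no maxsplit: split at every run of whitespace.
words = text.split()

print(words)  # ['Python', 'makes', 'splitting', 'easy']
```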
Case 2:
In the above case, the separator has been specified as a comma (","). Since there is no maxsplit count, the default value is again -1, i.e., no limit on the number of splits, so the string is split at every comma.
Output:
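A minimal sketch of this case, assuming sample comma-separated strings of our own:

```python
text = "apple,banana,cherry,date"  # assumed sample string

# Split at every comma; there is no limit on the number of splits.
items = text.split(",")
print(items)  # ['apple', 'banana', 'cherry', 'date']

# Only the comma is removed, so a space after it stays in the item.
spaced = "apple, banana".split(",")
print(spaced)  # ['apple', ' banana']
```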
Case 3:
In the above case, the separator is ("is a"), a multi-character substring. It is matched as a whole, so the string is split only where the exact substring ("is a") occurs, and at all such points since there is no maxsplit count.
Output:
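A minimal sketch of this case. The separator "is a" comes from the text; the sample sentence is an assumption.

```python
# Assumed sample sentence containing "is a" twice.
text = "Python is a language and split is a method"

# The separator is matched as a whole substring.
parts = text.split("is a")

print(parts)  # ['Python ', ' language and split ', ' method']
```

The letter "a" inside "language" and "and" is not a split point, because only the complete substring "is a" counts as the separator. The surrounding spaces are kept in the parts.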
With Specifying Maxsplit
Case 1:
In the above case, since only one positional argument has been given, the interpreter takes it as the delimiter rather than as the maxsplit count. Since the delimiter must be a string (or None), passing an integer raises a TypeError when we run the code.
Output:
We get a TypeError at run time -> TypeError: must be str or None, not int.
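A sketch of this case and one way to fix it, assuming a sample string of our own. The exact wording of the error message is the one reported above and may vary between Python implementations.

```python
text = "one two three four"  # assumed sample string

# A lone integer is taken as the separator, which must be str or None.
try:
    text.split(2)
except TypeError as exc:
    error = exc
    print(type(exc).__name__, "->", exc)

# Passing the count by keyword keeps the default whitespace splitting.
fixed = text.split(maxsplit=2)
print(fixed)  # ['one', 'two', 'three four']
```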
Case 2:
In the above case, the separator argument is left blank, and the maxsplit count is 2. We might expect the default delimiter, i.e., whitespace, to be used. Instead, we get a syntax error, because Python does not allow an empty positional argument in a function call. Thus an important point to note is that to give a maxsplit count while keeping the default delimiter, we must either pass None as the separator or pass maxsplit as a keyword argument.
Output:
We get a syntax error -> SyntaxError: invalid syntax
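Because a syntax error stops a script before it runs, the sketch below checks the faulty call with the built-in compile() function instead of executing it directly, and then shows two valid alternatives. The sample string and names are assumptions.

```python
# The faulty call, kept as source text so this script itself still runs.
source = "text.split(, 2)"

try:
    compile(source, "<example>", "eval")
except SyntaxError as exc:
    caught = type(exc).__name__
    print(caught)  # SyntaxError

text = "alpha beta gamma delta"  # assumed sample string

# Valid ways to split at whitespace at most twice.
with_none = text.split(None, 2)
with_keyword = text.split(maxsplit=2)

print(with_none)  # ['alpha', 'beta', 'gamma delta']
```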
Case 3:
In the above case, the separator is a multi-word substring, i.e., ("a and"), and the maxsplit count is 5. The maxsplit count exceeds the number of occurrences of the delimiter; in such cases, the string is split at every point where the delimiter is present. The result is the same as when the maxsplit count is -1 or has not been specified.
Output:
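A minimal sketch of this case. The separator "a and" and the count 5 come from the text; the sample sentence is an assumption.

```python
# Assumed sample sentence containing "a and" twice.
text = "Pick a and b, or a and c"

# maxsplit (5) is larger than the number of occurrences (2).
parts = text.split("a and", 5)
print(parts)  # ['Pick ', ' b, or ', ' c']

# The result matches an unlimited split.
unlimited = text.split("a and")
explicit_default = text.split("a and", -1)
print(parts == unlimited == explicit_default)  # True
```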
Summary
- We learned about the split() function and how it works.
- We also learned about maxsplit and separator parameters and their functionality.
- We saw that both the split() function parameters are optional. If they have not been specified, the interpreter takes the default values.
- We saw that the split() function discussed here is a string method. So if we have a non-string object, such as a number, we first need to convert it to a string with str() and then call split() on the result.