Some unexpected characters are numeric or digits
@July 21, 2023
We found an interesting trait of str.isnumeric() and str.isdigit() that we didn’t expect…
Let’s first look at the help texts (Python 3.10.6)
isdigit(self, /)
Return True if the string is a digit string, False otherwise. A string is a digit string if all characters in the string are digits and there is at least one character in the string.
and similarly
isnumeric(self, /)
Return True if the string is a numeric string, False otherwise. A string is numeric if all characters in the string are numeric and there is at least one character in the string.
This can give the impression that a string consisting only of digits can always be converted to a number, and indeed, in most cases this works as expected:
>>> num_str = "1234"
>>> num_str.isdigit()
True
>>> num_str.isnumeric()
True
>>> int(num_str)
1234
But there are also superscript digits and fraction characters (Unicode characters outside ASCII) that we didn’t expect and that you should take into account:
>>> pow3 = '³'
>>> pow3.isdigit()
True
>>> pow3.isnumeric()
True
>>> int(pow3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '³'
>>> quarter = '¼'
>>> quarter.isdigit()
False
>>> quarter.isnumeric()
True
>>> float(quarter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '¼'
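One way to see where this behaviour comes from is to look at the Unicode properties of the characters with the standard unicodedata module. The following is only a small sketch; describe is a hypothetical helper name introduced here for illustration, not something from the Python docs.

```python
import unicodedata


def describe(ch):
    """Collect the Unicode properties that the str.is*() checks rely on.

    `describe` is a hypothetical helper for illustration only.
    The second argument (None) is the default returned when a
    character does not have the requested property.
    """
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }


for ch in "1³¼":
    print(repr(ch), describe(ch))
```

'³' (SUPERSCRIPT THREE) has a digit value but no decimal value, and '¼' (VULGAR FRACTION ONE QUARTER) has only a numeric value. That matches the results above: isdigit() accepts '³', isnumeric() accepts both, and int() wants decimal characters. If a plain method call is enough, str.isdecimal() is the stricter check and returns False for both '³' and '¼'.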
So consider using good old (and slower) regular expressions to validate input before it is fed to int() or float():
import re
re.fullmatch(r'\d+', pow3)  # returns None: this pretend-digit is no match for regex :D
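The regex check can be wrapped in a small validation helper. This is a minimal sketch under a few assumptions: parse_int is a hypothetical name, only ASCII digits with an optional sign are accepted (stricter than what int() itself allows), and invalid input yields None instead of raising.

```python
import re

# Assumption: only an optional sign followed by ASCII digits counts as valid.
_INT_RE = re.compile(r"[+-]?[0-9]+")


def parse_int(text):
    """Return int(text) if text is a plain ASCII integer, otherwise None.

    `parse_int` is a hypothetical helper for illustration only.
    fullmatch() requires the whole string to match, so inputs like
    "12abc" are rejected as well.
    """
    if _INT_RE.fullmatch(text):
        return int(text)
    return None


print(parse_int("1234"))
print(parse_int("³"))
print(parse_int("¼"))
print(parse_int("12abc"))
```

Only the first call returns a number; the other three return None. Input meant for float() would need a different pattern (decimal point, exponent), which is left out here.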