I am passing the name of an existing file as a parameter to my Python 3 script, and the name was not passed correctly when it contained characters outside the system’s code page (cp1251 in my case).
Here is the code I used to test:
#!/usr/bin/env python3
import sys, locale, os
print('Python location: ' + sys.exec_prefix)
print('sys.getdefaultencoding: ' + sys.getdefaultencoding())
print('sys.getfilesystemencoding: ' + sys.getfilesystemencoding())
print('locale.getpreferredencoding: ' + locale.getpreferredencoding())
print('sys.stdin.encoding: ' + str(sys.stdin.encoding))
print('sys.argv[1] type: ' + str(type(sys.argv[1])))
if os.path.exists(sys.argv[1]):
    print('File found.\n')
else:
    print('File not found.\n')
print('sys.argv[1] value: ' + sys.argv[1])
Initially I had results like this:
Python location: c:\program files (x86)\python
sys.getdefaultencoding: utf-8
sys.getfilesystemencoding: mbcs
locale.getpreferredencoding: cp1251
sys.stdin.encoding: cp1251
sys.argv[1] type: <class 'str'>
File not found.
When I passed a filename containing only Latin or Cyrillic characters (i.e. characters covered by my cp1251 code page), everything worked fine and the file could be located. But when it contained Latin characters with diacritics (â, è, etc.), they were converted to similar-looking characters from basic Latin, and naturally a file with that name couldn’t be found. Then I added PYTHONIOENCODING=utf-8 to the system environment variables and confirmed that sys.stdin.encoding was now shown as utf-8. With these settings, my script started working correctly with Unicode input arguments, but only when I executed it from the command line! When I launched it from Komodo IDE (entering the argument in the Debugging Options window), the Unicode characters were lost. This problem seems to be specific to Komodo IDE, as it occurs neither when launching from the command line nor from another IDE (PyCharm). Should I change something in the settings?
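To narrow down where the characters get lost, a check along these lines could be run both from the command line and from the IDE. It is only a minimal sketch: the helper name describe_arg is made up for illustration, and the idea is to print the argument in ASCII-only form so that the console encoding cannot mask what the script actually received. It also prints PYTHONIOENCODING as seen by the process, on the assumption that the IDE may launch the script with a different environment.

```python
#!/usr/bin/env python3
import os
import sys
import unicodedata


def describe_arg(arg):
    """Return one ASCII-only line per character: code point and Unicode name.

    describe_arg is a hypothetical helper, not part of any library.
    """
    lines = []
    for ch in arg:
        name = unicodedata.name(ch, 'UNKNOWN')
        lines.append('U+{:04X} {}'.format(ord(ch), name))
    return lines


if __name__ == '__main__':
    # Shows whether the environment variable actually reaches this process.
    print('PYTHONIOENCODING: ' + str(os.environ.get('PYTHONIOENCODING')))
    if len(sys.argv) > 1:
        # ascii() escapes every non-ASCII character, so the output is
        # unaffected by the console or IDE output encoding.
        print('sys.argv[1] escaped: ' + ascii(sys.argv[1]))
        for line in describe_arg(sys.argv[1]):
            print(line)
```

If the IDE run reports U+0061 LATIN SMALL LETTER A where the command-line run reports U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX, the argument was already altered before Python received it.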