# Unrecognized backslash escapes in string literals

9 messages

## Unrecognized backslash escapes in string literals

In Python, unrecognized escape sequences are treated literally, without (as far as I can tell) any sort of warning or anything. This can mask bugs, especially when Windows path names are used:

    >>> 'C:\sqlite\Beginner.db'
    'C:\\sqlite\\Beginner.db'
    >>> 'c:\sqlite\beginner.db'
    'c:\\sqlite\x08eginner.db'

To a typical Windows user, the two strings should be equivalent - case insensitive file names, who cares whether you say "Beginner" or "beginner"? But to Python, one of them will happen to work, the other will fail badly.

Why is it that Python interprets them this way, and doesn't even give a warning? What happened to errors not passing silently? Or, looking at this the other way: is there a way to enable such warnings/errors? I can't see one in `python[3] -h`, but if there's some way elsewhere, that would be a useful thing to recommend to people (I already recommend running Python 2 with `-tt`).

ChrisA
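The two REPL results above can be reproduced with a small sketch that feeds each literal's source text through `ast.literal_eval`, which applies the same escape rules as the compiler. The helper name `as_python_literal` is made up for this illustration, and it silences warnings only because newer Python versions warn about escapes like `\s`.

```python
import ast
import warnings


def as_python_literal(source: str) -> str:
    """Evaluate `source` (the text of a string literal) as the compiler would."""
    with warnings.catch_warnings():
        # Newer Pythons warn about unrecognized escapes such as "\s";
        # silence that here, since we only want the resulting value.
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


upper = as_python_literal(r"'C:\sqlite\Beginner.db'")
lower = as_python_literal(r"'c:\sqlite\beginner.db'")

print(repr(upper))  # 'C:\\sqlite\\Beginner.db'   (\s and \B are unrecognized, so kept)
print(repr(lower))  # 'c:\\sqlite\x08eginner.db'  (\b became a backspace)
print(upper.lower() == lower.lower())  # False: not the same path, even ignoring case
```

Lower-casing both results does not make them equal, which is exactly the trap described: the difference is in the escape processing, not in the case of the letters.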

## Unrecognized backslash escapes in string literals

Chris Angelico writes:

> In Python, unrecognized escape sequences are treated literally,
> without (as far as I can tell) any sort of warning or anything.

Right. Text string literals are documented to work that way, which refers the reader to the language reference.

> Why is it that Python interprets them this way, and doesn't even give
> a warning?

Because the interpretation of those literals is unambiguous and correct.

It's unfortunate that MS Windows inherited the incompatible “backslash is a path separator”, long after backslash was already established in many programming languages as the escape character.

> Is there a way to enable such warnings/errors?

A warning or error for a correctly formatted literal with an unambiguous meaning would be an un-Pythonic thing to have.

I can see the motivation, but really the best solution is to learn that the backslash is an escape character in Python text string literals.

This has the advantage that it's the same escape character used for text string literals in virtually every other programming language, so you're not needing to learn anything unusual.

-- 
“The deepest sin against the human mind is to believe things without evidence.” (Thomas Henry Huxley, *Evolution and Ethics*, 1893)

Ben Finney
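Once the backslash is understood as an escape character, the usual ways to write such a path so the backslashes survive are sketched below (variable names are illustrative). The `pathlib.PureWindowsPath` comparison goes beyond the thread and assumes Python 3.4+, where `pathlib` exists; it accepts `/` and `\` as separators and compares case-insensitively, which matches the intuition in the original question.

```python
from pathlib import PureWindowsPath

# Three spellings that keep the backslashes (or avoid them entirely).
doubled = "c:\\sqlite\\beginner.db"  # escape every backslash explicitly
raw = r"c:\sqlite\beginner.db"       # raw string: backslashes are not escapes
forward = "c:/sqlite/beginner.db"    # forward slashes need no escaping at all

print(doubled == raw)  # True

# PureWindowsPath accepts either separator and compares case-insensitively,
# so these all describe the same Windows path.
print(PureWindowsPath(forward) == PureWindowsPath(raw))                  # True
print(PureWindowsPath("C:/sqlite/Beginner.db") == PureWindowsPath(raw))  # True
print(PureWindowsPath(forward))  # c:\sqlite\beginner.db
```

Forward slashes are often the least error-prone choice, since most Windows file APIs accept them as well.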

## Unrecognized backslash escapes in string literals

On 02/22/2015 09:29 PM, Chris Angelico wrote:

> In Python, unrecognized escape sequences are treated literally,
> without (as far as I can tell) any sort of warning or anything. This
> can mask bugs, especially when Windows path names are used:
>
>     >>> 'C:\sqlite\Beginner.db'
>     'C:\\sqlite\\Beginner.db'
>     >>> 'c:\sqlite\beginner.db'
>     'c:\\sqlite\x08eginner.db'
>
> To a typical Windows user, the two strings should be equivalent - case
> insensitive file names, who cares whether you say "Beginner" or
> "beginner"? But to Python, one of them will happen to work, the other
> will fail badly.
>
> Why is it that Python interprets them this way, and doesn't even give
> a warning? What happened to errors not passing silently? Or, looking
> at this the other way: Is there a way to enable such warnings/errors?
> I can't see one in 'python[3] -h', but if there's some way elsewhere,
> that would be a useful thing to recommend to people (I already
> recommend running Python 2 with -tt).
>
> ChrisA

I've long thought they should be errors, but in Python they're not even warnings. It's one thing to let a user be sloppy on a shell's commandline, but in a program, if you have an invalid escape sequence, it should be an invalid string literal, full stop.

And Python doesn't even treat these invalid sequences the same (broken) way C does. The documentation explicitly says it's different than C. If you're going to be different, at least be strict.

-- 
DaveA

## Unrecognized backslash escapes in string literals

On Mon, Feb 23, 2015 at 1:41 PM, Ben Finney wrote:

> Chris Angelico writes:
>
>> Why is it that Python interprets them this way, and doesn't even give
>> a warning?
>
> Because the interpretation of those literals is unambiguous and correct.

And it also implies that never, in the entire infinite future of Python development, will any additional escapes be invented, because then it'd be ambiguous (in versions up to X, `"\s"` means `"\\s"`, and after that, `"\s"` means something else).

> It's unfortunate that MS Windows inherited the incompatible “backslash
> is a path separator”, long after backslash was already established in
> many programming languages as the escape character.

I agree, the fault is primarily with Windows. But I've seen similar issues when people use `/-\|` for box drawing and framing and such; Windows paths are by far the most common case of this, but not the only one.

>> Is there a way to enable such warnings/errors?
>
> A warning or error for a correctly formatted literal with an unambiguous
> meaning would be an un-Pythonic thing to have.
> ...
> This has the advantage that it's the same escape character used for text
> string literals in virtually every other programming language, so you're
> not needing to learn anything unusual.

And yet the treatment of the edge case differs. In C, for instance, you get a compiler warning, and then the backslash is removed and you're left with just the other character. The trouble isn't that people need to learn that backslashes are special in Python string literals. The trouble is that, especially when file names are frequently being written with uppercase first letters, it's very easy to have code that just so happens to work, without being reliable.

Having spent some time working with paths like these:

    fn = "C:\Foo\Bar\Asdf.ext"

and then to find that each of these fails, but in a different way:

    path = "C:\Foo\Bar\"; fn = path + "Asdf.ext"
    fn = "c:\foo\bar\asdf.ext"
    fn = "c:\users\myname\blah"

would surely count as surprising. Particularly since the last one will work fine in Python 2 sans `unicode_literals`, and will then blow up in Python 3, because, contrary to the "no additional escapes" assumption, Unicode strings introduced new escapes, which means that `"\u0123"` has a different meaning in byte strings and Unicode strings. In fact, that's an exception to the usual rule of "upper case is safe", and it's one that *will* trip people up, thanks to the `C:\Users` directory on a modern Windows system. What's the betting people will blame the failure on Python 3 and/or Unicode, rather than on the sloppy use of escapes and the poor choice of path separator on a popular platform?

ChrisA
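The failure modes listed above can be checked mechanically. This sketch uses a hypothetical helper, `compiles`, which compiles each snippet under Python 3 with escape-related warnings silenced and reports whether it is even a valid literal. The all-lowercase case compiles, but silently changes the text.

```python
import warnings


def compiles(source: str) -> bool:
    """Return True if `source` is a valid Python 3 expression."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # don't print invalid-escape warnings
            compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# The trailing backslash escapes the closing quote, so the literal never ends;
# a raw string does not help, because r"...\" cannot end in a lone backslash.
print(compiles(r'"C:\Foo\Bar\"'))    # False
print(compiles(r'r"C:\Foo\Bar\"'))   # False

# In a str literal, \u must begin a \uXXXX escape, so "\users" is an error...
print(compiles(r'"c:\users\myname\blah"'))   # False
# ...while a bytes literal has no \u escape and keeps the backslash.
print(compiles(r'b"c:\users\myname\blah"'))  # True

# All-lowercase path: \f, \b and \a are real escapes, so this compiles
# but silently contains a form feed, a backspace and a bell character.
print(repr("c:\foo\bar\asdf.ext"))  # 'c:\x0coo\x08ar\x07sdf.ext'
```

The bytes case roughly corresponds to a Python 2 `str` literal without `unicode_literals`, which is why that spelling only starts failing after the move to Python 3.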

## Unrecognized backslash escapes in string literals

On 02/22/2015 09:41 PM, Ben Finney wrote:

> Chris Angelico writes:
>
>> In Python, unrecognized escape sequences are treated literally,
>> without (as far as I can tell) any sort of warning or anything.
>
> Right. Text string literals are documented to work that way,
> which refers the reader to the language reference.
>
>> Why is it that Python interprets them this way, and doesn't even give
>> a warning?
>
> Because the interpretation of those literals is unambiguous and correct.

Correct according to a misguided language definition.

> It's unfortunate that MS Windows inherited the incompatible “backslash
> is a path separator”, long after backslash was already established in
> many programming languages as the escape character.

Windows "inherited" it from DOS. But since Windows was nothing but a DOS shell for several years, that's not surprising. The historical problem came from CP/M's use of the forward slash for a switch character. Since MSDOS/PCDOS/QDOS was trying to permit transliterated CP/M programs, and because subdirectories were an afterthought (version 2.0), they felt they needed to pick a different character. At one time, the switch character could be set by the user, but most programs ignored that, so it died.

>> Is there a way to enable such warnings/errors?
>
> A warning or error for a correctly formatted literal with an unambiguous
> meaning would be an un-Pythonic thing to have.
>
> I can see the motivation, but really the best solution is to learn that
> the backslash is an escape character in Python text string literals.
>
> This has the advantage that it's the same escape character used for text
> string literals in virtually every other programming language, so you're
> not needing to learn anything unusual.

I might be able to buy that argument if it was done the same way, but as it says at https://docs.python.org/3/reference/lexical_analysis.html#strings :

> Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)

The word "broken" is an admission that this was a flawed approach. If it's broken, it should be an error.

I'm not suggesting that the implementation should falsely trigger an error, but that the language definition should be changed to define it as an error.

-- 
DaveA
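For what it's worth, later CPython releases moved partway in this direction: since Python 3.6 an unrecognized escape emits a `DeprecationWarning`, and since 3.12 a `SyntaxWarning`. Assuming one of those versions, the sketch below (with a hypothetical `check_escapes` helper) promotes warnings to errors while compiling a snippet. Running `python -W error` applies a similar filter to a whole program, though only to source that is actually compiled rather than loaded from a cached `.pyc`.

```python
import warnings
from typing import Optional


def check_escapes(source: str) -> Optional[str]:
    """Compile `source` with all warnings promoted to errors.

    Returns the error message if compilation fails, else None.
    Assumes Python 3.6+, where an unrecognized escape triggers a warning;
    in CPython that warning surfaces as a SyntaxError once it is an error.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            compile(source, "<snippet>", "exec")
        except (SyntaxError, SyntaxWarning, DeprecationWarning) as exc:
            return str(exc)
    return None


print(check_escapes(r"db = 'C:\sqlite\Beginner.db'"))   # an "invalid escape sequence" message
print(check_escapes(r"db = r'C:\sqlite\Beginner.db'"))  # None
```

The exact wording of the message differs between versions, so the sketch only reports it rather than parsing it.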

## Unrecognized backslash escapes in string literals

On Mon, Feb 23, 2015 at 1:41 PM, Ben Finney wrote:

> Right. Text string literals are documented to work that way,
> which refers the reader to the language reference.

BTW, quoting from that:

> Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)

I'm not sure it's more obviously broken. Comparing Python and Pike:

    >>> "asdf\qwer"
    'asdf\\qwer'

    > "asdf\qwer";
    (1) Result: "asdfqwer"

Which one is "more easily recognized as broken" depends on what the actual intention was. If you wanted to have a backslash (e.g. a path name), then the second one is, because you've just run two path components together. If you wanted to have some sort of special character (`"\n"`), then they're both going to be about the same: you'd expect to see `"\n"` in the output; one has added a backslash (assuming you're looking at the repr), the other has removed it. Likewise if you wanted some other symbol (e.g. forward slash), they're about the same (a doubled backslash, or a complete omission, same diff). But if you just fat-fingered a backslash into a string where it completely doesn't belong, then seeing a doubled backslash is definitely better than seeing just the following character (which would mask the error entirely).

Since the interpreter can't know what the intention was, it obviously has to do just one thing and stick with it. I'm not convinced this is really an advantage. Python has been aiming more and more towards showing problems immediately, rather than having them depend on your data. For instance, instead of letting you treat bytes and characters as identical until you hit something that isn't ASCII, Py3 forces you to distinguish from the start.

That said, though, there's probably a lot of code out there that depends on backslashes being non-special, so it's quite probably something that can't be changed. But it'd be nice to be able to turn on a warning for it.

ChrisA
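The three behaviours discussed in the thread can be compared side by side with a toy decoder: Python keeps an unrecognized escape, C (after its warning) and Pike drop the backslash, and the strict alternative rejects it. This is a hypothetical illustration, not how CPython's tokenizer is written. `decode_escapes` and its `policy` values are invented names, and only one-character escapes are modelled.

```python
# Hypothetical mini-decoder for the body of a string literal (the text
# between the quotes). Only one-character escapes are modelled; octal,
# \x, \u, \U and \N{...} escapes are deliberately left out of this sketch.
SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", '"': '"',
    "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v",
}

POLICIES = ("keep", "drop", "error")


def decode_escapes(body: str, policy: str = "keep") -> str:
    """Decode backslash escapes in `body` under one of three policies.

    keep:  an unrecognized escape stays as written (Python's rule)
    drop:  the backslash is removed, leaving the next character (C, Pike)
    error: an unrecognized escape is rejected outright
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\" or i + 1 == len(body):
            # Ordinary character, or a lone trailing backslash: copy it.
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
        elif policy == "keep":
            out.append("\\" + nxt)
        elif policy == "drop":
            out.append(nxt)
        else:
            seq = "\\" + nxt
            raise ValueError(f"unrecognized escape sequence {seq} at offset {i}")
        i += 2
    return "".join(out)


for policy in ("keep", "drop"):
    print(policy, repr(decode_escapes(r"asdf\qwer", policy)))
# keep 'asdf\\qwer'
# drop 'asdfqwer'
```

With `policy="error"`, `decode_escapes(r"asdf\qwer", "error")` raises `ValueError`, while recognized escapes such as `\n` still decode normally under every policy.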

## Unrecognized backslash escapes in string literals

Chris Angelico writes:

> That said, though, there's probably a lot of code out there that
> depends on backslashes being non-special, so it's quite probably
> something that can't be changed. But it'd be nice to be able to turn
> on a warning for it.

If you're motivated to see such warnings, an appropriate place to implement them would be in PyLint or another established static code analysis tool.

-- 
“The whole area of [treating source code as intellectual property] is almost assuring a customer that you are not going to do any innovation in the future.” (Gary Barnett)

Ben Finney
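Here is a rough sketch of what such a static check involves, built on the standard `tokenize` module. It follows the same idea as pylint's `anomalous-backslash-in-string` message but is not pylint's actual code. The function name is hypothetical, and on Python 3.12+ f-strings are skipped because the tokenizer no longer emits them as single `STRING` tokens.

```python
import io
import re
import tokenize

# Characters that may legitimately follow a backslash in a str literal:
# a newline (line continuation), \\ \' \" \a \b \f \n \r \t \v,
# octal digits, \x, \N{...}, \u and \U.
STR_ESCAPES = set("\n\\'\"abfnrtv01234567xNuU")
# Bytes literals have no \N, \u or \U escapes.
BYTES_ESCAPES = STR_ESCAPES - set("NuU")

# The string prefix (r, b, rb, u, ...) is the run of letters before the quote.
PREFIX_RE = re.compile(r"[A-Za-z]*")


def anomalous_backslashes(source):
    """Yield (line, column, escape) for unrecognized escapes in non-raw literals."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        text = tok.string
        prefix = PREFIX_RE.match(text).group().lower()
        if "r" in prefix:
            continue  # raw literal: backslashes are literal by design
        valid = BYTES_ESCAPES if "b" in prefix else STR_ESCAPES
        i = len(prefix)
        while i < len(text) - 1:
            if text[i] != "\\":
                i += 1
                continue
            if text[i + 1] not in valid:
                # Translate the offset inside the token into a line/column.
                line = tok.start[0] + text.count("\n", 0, i)
                last_nl = text.rfind("\n", 0, i)
                col = tok.start[1] + i if last_nl == -1 else i - last_nl - 1
                yield line, col, text[i:i + 2]
            i += 2  # skip the escaped character, so "\\\\" is handled


SAMPLE = r'''print("C:\alpha")
print("C:\beta")
print("C:\gamma")
'''
print(list(anomalous_backslashes(SAMPLE)))  # [(3, 9, '\\g')]
```

Only `\g` is reported: `\a` and `\b` are valid escapes, so no check of this kind can tell that they were meant as path separators.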
## Unrecognized backslash escapes in string literals

Ben Finney wrote:

> Chris Angelico writes:
>
>> That said, though, there's probably a lot of code out there that
>> depends on backslashes being non-special, so it's quite probably
>> something that can't be changed. But it'd be nice to be able to turn
>> on a warning for it.
>
> If you're motivated to see such warnings, an appropriate place to
> implement them would be in PyLint or another established static code
> analysis tool.

Pylint already produces a warning. However, it cannot read the author's mind:

    $ cat tmp.py
    print("C:\alpha")
    print("C:\beta")
    print("C:\gamma")
    $ pylint tmp.py
    ************* Module tmp
    W:  3, 0: Anomalous backslash in string: '\g'. String constant might be missing an r prefix. (anomalous-backslash-in-string)
    C:  1, 0: Missing module docstring (missing-docstring)

The same would go for a warning built into the compiler. Maybe having editors highlight the special combinations would be the more helpful approach. A tooltip could explain the meaning.