Unicode user and file names (and v2.7.1)


Jeff Allen-2
I've been working on http://bugs.jython.org/issue2356 which I'd like to
get in 2.7.1 -- it seems rather poor that Jython simply does not run for
users whose names have an un-American character ;). I know this issue is
not a blocker in most minds.

I've made pretty good progress by allowing file names to be unicode
objects more often than they would be in CPython 2, which usually
returns them as bytes in some encoding that we may not know. I've got
the launcher to work properly, and straightened the logic in our
printing of trace-backs and exceptions from Java. Unicode file names
seem the way to go for Jython because:

 1. Java gives us competently decoded unicode file names, from
    java.io.File, etc. Re-encoding the result will be a pain (and
    easily overlooked).
 2. We appear not to have the codec we need ('mbcs'), which CPython
    reports on Windows via sys.getfilesystemencoding().
 3. We do this already. In 2.7.0, os.getcwd() returns unicode if necessary.
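
Point 2 can be checked on any interpreter. The following is a minimal sketch, not taken from the Jython source: `codec_available` is a hypothetical helper built on the standard `codecs.lookup` call, which raises `LookupError` for a codec name it does not know.

```python
import codecs
import sys


def codec_available(name):
    """Return True if a codec is registered under this name (hypothetical helper)."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# 'mbcs' is normally registered only on Windows builds of CPython, so the
# second result depends on the platform and the interpreter in use.
print(sys.getfilesystemencoding())
print(codec_available("mbcs"))
print(codec_available("utf-8"))
```

If the encoding reported by `sys.getfilesystemencoding()` fails this check, byte file names cannot be decoded with it, which is the situation the list describes.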

Most regression tests pass. However, I'm struggling with test_doctest.
Problems arise when mixing unicode and bytes where any byte is 128 or
over. This happens in ''.join(list) and formatted output like "%s %s" %
(ustr, bstr). The behaviour of these is identical with CPython: they
raise UnicodeDecodeError because the bytes are promoted to characters
with a strict ascii interpretation. This happens a lot in doctest.py and
traceback.py, for example, where file paths, and stack dumps that include
them, are now frequently unicode, while other inputs are byte data
containing file paths presented in the console encoding.
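
The failure described above comes from the implicit ASCII decode that Python 2 performs when bytes meet unicode. A minimal sketch of the usual remedy, decoding explicitly at the boundary, follows. The helper `to_text`, the sample paths and the choice of 'latin-1' as the console encoding are all assumptions for illustration; none of them comes from doctest.py or traceback.py.

```python
def to_text(value, encoding):
    """Decode byte strings explicitly; pass text through (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# A text path, such as Java would supply, and the same directory as bytes
# in an assumed console encoding.
upath = u"/home/\u00c9preuve/test.py"
bpath = u"/home/\u00c9preuve".encode("latin-1")

# This is the strict ASCII interpretation that Python 2 applies implicitly
# when the two types are mixed; it fails on the byte 0xC9.
try:
    bpath.decode("ascii")
    ascii_decodes = True
except UnicodeDecodeError:
    ascii_decodes = False

# Decoding explicitly before joining or formatting avoids the implicit step.
joined = u"".join([upath, u" ", to_text(bpath, "latin-1")])
formatted = u"%s %s" % (upath, to_text(bpath, "latin-1"))
print(joined == formatted)
```

The cost of this approach is that every place where the two types meet needs such a call, and the right encoding must be known there.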

I can beat this into submission with enough customisation of the stdlib
modules, but that always makes me uncomfortable. I usually see that as a
hint that user code might also need to change. This may be unfounded. I
can probably ensure no impact to users of only ascii paths, and the
others seem unable to run Jython at all (in the scope of this issue).
However, I'm seriously wondering if I should pursue the approach where
file names from Java are re-encoded to bytes (maybe as utf-8
everywhere), but that's grim.

Thoughts?

--
Jeff Allen


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Jython-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jython-dev

Re: Unicode user and file names (and v2.7.1)

Jeff Allen-2
I went for sys.getfilesystemencoding() == 'utf-8' and it works pretty
well. Rather than just push directly I have published to here:

https://bitbucket.org/tournesol/jython-utf8
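
A minimal sketch of what that choice implies for code that handles file names as bytes. It assumes UTF-8 is the file system encoding, as stated above; `fs_encode` and `fs_decode` are hypothetical helpers for illustration, not functions from the branch.

```python
FS_ENCODING = "utf-8"  # assumption: the value sys.getfilesystemencoding() reports


def fs_encode(name):
    """Return the file name as bytes in the file system encoding."""
    if isinstance(name, bytes):
        return name
    return name.encode(FS_ENCODING)


def fs_decode(name):
    """Return the file name as text, decoding bytes if necessary."""
    if isinstance(name, bytes):
        return name.decode(FS_ENCODING)
    return name


# UTF-8 can represent any user name, so the round trip is lossless.
for user in (u"\u00c9preuve", u"\u7528\u6237"):
    print(fs_decode(fs_encode(user)) == user)
```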

I write to ask for a second or third pair of eyes on it. Please tell me
you can see it and whether it breaks things you care about.

I touched a lot of files in the core and import system: quite a lot of
tricky stuff with loaders and search paths has been adjusted. I think it
a good sign that I changed hardly anything in the standard library we
inherit from CPython, that we hadn't already specialised.

By "works pretty well" above, I mean that the regression tests run
cleanly for me when my user name is "Épreuve", where previously Jython
died horribly. The launcher works from a Chinese user name too, as long
as I localise Windows to China (CPython 2.7 feature). I can use the
prompt and run some tests with that setup, but I can't run the
regression test yet, and printing a stack dump is fatal, so there's a
bit more to do for Chinese.

I think this means we have solid support for "latin-1" languages, but
there are still places where we fatally assume bytes are Unicode code
points.
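
The distinction can be illustrated in a few lines. Treating each byte as a Unicode code point is the same as decoding with 'latin-1', so it happens to give the right answer for names that latin-1 can represent and the wrong one otherwise. The sample names and the helper `bytes_as_code_points` are assumptions for illustration, and the second name is encoded as UTF-8 only as an example.

```python
def bytes_as_code_points(data):
    """Map each byte to the code point of the same value (latin-1 decoding)."""
    return data.decode("latin-1")


name_latin = u"\u00c9preuve"     # representable in latin-1
name_chinese = u"\u7528\u6237"   # not representable in latin-1

# For a latin-1 name the assumption is harmless.
latin_ok = bytes_as_code_points(name_latin.encode("latin-1")) == name_latin

# For a name held as UTF-8 bytes it yields a different, longer string.
mangled = bytes_as_code_points(name_chinese.encode("utf-8"))
chinese_ok = mangled == name_chinese

print(latin_ok, chinese_ok, len(name_chinese), len(mangled))
```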

Jeff Allen


Re: Unicode user and file names (and v2.7.1)

Darjus Loktevic
Hey Jeff,

It seems your last commit to this branch was three days ago. Is this ready for review? BTW, your changes look good to me.
I'm a little hesitant to merge this since we've had an RC and REALLY have to release 2.7.1. It's miles better than 2.7.0.

Cheers,
Darjus


Re: Unicode user and file names (and v2.7.1)

Jeff Allen-2

Hi Darjus.

On inclusion, I'm happy to go with the community view, as always. On one of the related tickets (http://bugs.jython.org/issue1839), Jim said we'd get it in if timing allowed and there was some user support.

I'm very keen to see a 2.7.1 too. The last (soft) RC was unsuccessful, and we're still making changes, so I assume we're talking about another RC first rather than a release?

The UTF-8 work is nearly there, but not quite: there is one Linux defect to fix, as noted on the same issue by James against the "latin-1" version. After all the additions in the last couple of weeks (to get full BMP support), I'm happy to find from my Linux laptop that it is still the only thing I have to do. It looks trivial. I've been unable to code at all for a few days, so haven't looked into a solution, but now I'm back I expect to nail it for us today or tomorrow.

I can, of course, merge all this myself and will. I shared your hesitancy initially, hence the fork repository, but it's turned out so well I feel it's now low risk, as long as we still have a few days.

I will now dive under the desk and wire up my Linux dev box.

Jeff Allen

Re: Unicode user and file names (and v2.7.1)

Darjus Loktevic

Hey Jeff,

Sounds good. Let's do another RC, but to be honest I'm not even sure the RC matters much if nobody is trying it except us.

Thoughts?
Darjus



Re: Unicode user and file names (and v2.7.1)

Stefan Richthofer
AFAIK every release happens by having a successful RC that is renamed to 'release' after a while. So, by definition, another RC is inevitable.
That said, I suppose we should get http://bugs.jython.org/issue2487 fixed before we can release. I guess Jeff's work will be ready by then. At least that decision can be postponed until an RC is actually doable.
 
 

Re: Unicode user and file names (and v2.7.1)

Jim Baker-2
I don't necessarily see http://bugs.jython.org/issue2487 as a blocker, but it would be nice. It just hasn't come up in real usage, unlike the earlier iteration of the bug which Darjus hacked around by busy waiting. I did spend some time on trying to get the publication to work without racing, using the approach I detail in that bug, but no luck yet. (But mostly because of an utter lack of time to spend on the issue.)

Merging in Jeff's recent work on Unicode is important and we should get it in. I haven't had a chance to test myself, but given Jeff's amazing attention to detail, I'm sure it's ready.

The blocker for the RC, which arose because we lost OS X support for the executables that setuptools installs, is fixed, as I just finally confirmed: http://bugs.jython.org/issue2570


On Fri, May 19, 2017 at 8:26 PM, Stefan Richthofer <[hidden email]> wrote:
AFAIK every release happens by having a successful RC that is renamed to 'release' after a while. So, per definition another RC is inevitable.
That said, I suppose we should get http://bugs.jython.org/issue2487 fixed before we can release. I guess Jeff's work will be ready until then. At least that decision can be postponed until an RC is actually doable.
 
 
Gesendet: Samstag, 20. Mai 2017 um 02:35 Uhr
Von: "Darjus Loktevic" <[hidden email]>
An: "Jeff Allen" <[hidden email]>, "Jython Developers" <[hidden email]>
Betreff: Re: [Jython-dev] Unicode user and file names (and v2.7.1)

Hey Jeff,

Sounds good. Let's do another rc but to be honest I'm not even sure the RC matters much if there aren't people trying it except us.

Thoughts?
Darjus

 
On Fri, May 19, 2017, 1:19 AM Jeff Allen <[hidden email]> wrote:

Hi Darjus.

On inclusion, I'm happy to go with the community view, as always. On one of the related tickets (http://bugs.jython.org/issue1839), Jim said we'd get it in if timing allowed and there was some user support.

I'm very keen to see a 2.7.1 too. The last (soft) RC was unsuccessful, and we're still making changes, so I assume we're talking about another RC first rather than a release?

The UTF-8 work is nearly there, but not quite: one Linux defect to fix, as noted on the same issue by James against the "latin-1" version. After all the additions in the last couple of weeks (to get full BMP support), I'm happy to find from my Linux laptop that it is still the only thing I have to do. It looks trivial. I've been unable to code at all for a few days, so haven't looked into a solution, but now I'm back I expect to nail it for us today or tomorrow.

I can, of course, merge all this myself and will. I shared your hesitancy initially, hence the fork repository, but it's turned out so well I feel it's now low risk, as long as we still have a few days.

I will now dive under the desk and wire up my Linux dev box.

Jeff Allen
On 16/05/2017 21:46, Darjus Loktevic wrote:
Hey Jeff,
 
It seems your last commit to this branch was three days ago. Is this ready for review? BTW, your changes look good to me.
I'm a little hesitant to merge this since we've had an RC and REALLY have to release 2.7.1. It's miles better than 2.7.0.
 
Cheers,
Darjus
 
On Mon, May 1, 2017 at 6:34 AM Jeff Allen <[hidden email]> wrote:
I went for sys.getfilesystemencoding() == 'utf-8' and it works pretty
well. Rather than just push directly I have published to here:

https://bitbucket.org/tournesol/jython-utf8

I write to ask for a second or third pair of eyes on it. Please tell me
you can see it and whether it breaks things you care about.

I touched a lot of files in the core and import system: quite a lot of
tricky stuff with loaders and search paths has been adjusted. I think it
a good sign that I changed hardly anything in the standard library we
inherit from CPython, that we hadn't already specialised.

By "works pretty well" above, I mean that the regression tests run
cleanly for me when my user name is "Épreuve", where previously Jython
died horribly. The launcher works from a Chinese user name too, as long
as I localise Windows to China (CPython 2.7 feature). I can use the
prompt and run some tests with that setup, but I can't run the
regression test yet, and printing a stack dump is fatal, so there's a
bit more to do for Chinese.

I think this means we have solid support for "latin-1" languages, but
there are still places where we fatally assume bytes are Unicode code
points.
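
A minimal sketch of why "latin-1" names can work while Chinese ones still fail, assuming the failing code paths store each character as a single byte (which amounts to a latin-1 encode). The helper names are hypothetical and are not taken from the Jython source:

```python
# -*- coding: utf-8 -*-
# Illustration only: hypothetical helpers, not Jython internals.

def as_code_point_bytes(name):
    """Store each character as one byte: only possible for code points < 256."""
    return name.encode('latin-1')

def as_utf8_bytes(name):
    """Encode with UTF-8, which can represent any Unicode character."""
    return name.encode('utf-8')

french = u'\xc9preuve'        # "Épreuve": every code point is below 256
chinese = u'\u4e2d\u6587'     # two CJK characters, code points far above 255

# The one-byte-per-character assumption happens to hold for the French name...
french_ok = as_code_point_bytes(french).decode('latin-1') == french

# ...but it cannot represent the Chinese name at all.
try:
    as_code_point_bytes(chinese)
    chinese_ok = True
except UnicodeEncodeError:
    chinese_ok = False

# UTF-8 round-trips both names without loss.
utf8_ok = all(as_utf8_bytes(n).decode('utf-8') == n for n in (french, chinese))
```

Under that assumption, any place that still equates bytes with code points would need to go through a real codec before names outside latin-1 can work.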

Jeff Allen

On 05/04/2017 08:57, Jeff Allen wrote:
> I've been working on http://bugs.jython.org/issue2356 which I'd like to
> get in 2.7.1 -- it seems rather poor that Jython simply does not run for
> users whose names have an un-American character ;). I know this issue is
> not a blocker in most minds.
>
> I've made pretty good progress by allowing file names to be unicode
> objects more often than they would be in CPython 2, which usually
> returns them as bytes in some encoding that we may not know. I've got
> the launcher to work properly, and straightened the logic in our
> printing of trace-backs and exceptions from Java. Unicode file names
> seem the way to go for Jython because:
>
>   1. Java gives us competently decoded unicode file names, from
>      java.io.File, etc. Re-encoding the result will be a pain (and
>      overlooked).
>   2. We appear not to have the codec we need ('mbcs'), that CPython
>      reports on Windows via sys.getfilesystemencoding().
>   3. We do this already. In 2.7.0, os.getcwd() returns unicode if necessary.
>
> Most regression tests pass. However, I'm struggling with test_doctest.
> Problems arise when mixing unicode and bytes where any byte is 128 or
> over. This happens in ''.join(list) and formatted output like "%s %s" %
> (ustr, bstr). The behaviour of these is identical with CPython: they
> raise UnicodeDecodeError because the bytes are promoted to characters
> with a strict ascii interpretation. This happens a lot in doctest.py and
> traceback.py, for example, where file paths and stack dumps that include
> them, are now frequently unicode, while other inputs are byte data
> containing file paths presented in the console encoding.
>
> I can beat this into submission with enough customisation of the stdlib
> modules, but that always makes me uncomfortable. I usually see that as a
> hint that user code might also need to change. This may be unfounded. I
> can probably ensure no impact to users of only ascii paths, and the
> others seem unable to run Jython at all (in the scope of this issue).
> However, I'm seriously wondering if I should pursue the approach where
> file names from Java are re-encoded to bytes (maybe as utf-8
> everywhere), but that's grim.
>
> Thoughts?
>
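
The mixed unicode/bytes failure described in the quoted message can be sketched as follows. On Python 2 (and Jython 2.7) the promotion of bytes to unicode is implicit; the hypothetical promote() helper below performs the same strict ASCII decode explicitly, so the sketch behaves the same way on Python 3. The path and file names are made up for illustration:

```python
from __future__ import print_function

def promote(value):
    """Mimic Python 2's implicit bytes-to-unicode promotion (strict ASCII)."""
    if isinstance(value, bytes):
        return value.decode('ascii')   # raises UnicodeDecodeError on bytes >= 128
    return value

def join_mixed(parts):
    """Join a mix of unicode and byte strings, as u''.join() would in Python 2."""
    return u''.join(promote(p) for p in parts)

ustr = u'C:\\Users\\\xc9preuve'               # unicode path, as Java supplies it
bstr_ascii = b'test.py'                       # pure ASCII bytes promote cleanly
bstr_high = u'\xc9preuve'.encode('latin-1')   # contains byte 0xC9, which is >= 128

# Mixing unicode with ASCII-only bytes works.
ok = join_mixed([ustr, b'\\', bstr_ascii])

# Mixing unicode with bytes that contain a high byte fails.
try:
    join_mixed([ustr, b'\\', bstr_high])
    failed = False
except UnicodeDecodeError:
    failed = True

print(repr(ok), failed)
```

This is why users with ASCII-only paths see no change: the failure needs both a unicode operand and a byte string with at least one byte of 128 or over.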


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Jython-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jython-dev

Re: Unicode user and file names (and v2.7.1)

Darjus Loktevic
Agreed regarding not blocking on 2487. That whole area needs a rewrite and we could potentially utilize libraries available for Java 8.

Let's get Jeff's work in and do an RC?

Darjus

On Fri, May 19, 2017 at 7:51 PM Jim Baker <[hidden email]> wrote:
I don't necessarily see http://bugs.jython.org/issue2487 as a blocker, but a fix would be nice. It just hasn't come up in real usage, unlike the earlier iteration of the bug which Darjus hacked around by busy waiting. I did spend some time on trying to get the publication to work without racing, using the approach I detail in that bug, but no luck yet. (But mostly because of an utter lack of time to spend on the issue.)

Merging in Jeff's recent work on Unicode is important and we should get it in. I haven't had a chance to test myself, but given Jeff's amazing attention to detail, I'm sure it's ready.

The blocker for the RC (we had lost setuptools support for installed executables on OSX) is fixed, as I just finally confirmed: http://bugs.jython.org/issue2570



Re: Unicode user and file names (and v2.7.1)

Jim Baker-2
+100

On Sat, May 20, 2017 at 12:33 PM, Darjus Loktevic <[hidden email]> wrote:
Agreed regarding not blocking on 2487. That whole area needs a rewrite and we could potentially utilize libraries available for Java 8.

Let's get Jeff's work in and do an RC?

Darjus


Re: Unicode user and file names (and v2.7.1)

Stefan Richthofer
I'd like to have http://bugs.jython.org/issue2536 "fixed" by removing that infinite recursion test; maybe we could even close this annoying issue then. A new Jython release should have a reliable test suite. However, I don't know the best way to remove or blacklist a certain test.
Jeff, that's your domain IIRC...? Suggestions?
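
One possible way to exclude a single test, sketched with the standard unittest.skipIf decorator. The test class, the test names and the IS_JYTHON flag are hypothetical; only the decorator and the sys.platform check are standard, and the real test behind issue2536 may need a different mechanism:

```python
import sys
import unittest

# Jython reports a platform string beginning with 'java'.
IS_JYTHON = sys.platform.startswith('java')

class RecursionTests(unittest.TestCase):   # hypothetical test class
    @unittest.skipIf(IS_JYTHON, "unreliable on the JVM, see issue2536")
    def test_infinite_recursion(self):
        def recurse():
            return recurse()
        # CPython stops runaway recursion with a RuntimeError (or subclass).
        self.assertRaises(RuntimeError, recurse)

    def test_unrelated(self):
        # A neighbouring test that keeps running on every platform.
        self.assertEqual(1 + 1, 2)

# Run the class without a console runner and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RecursionTests)
result = unittest.TestResult()
suite.run(result)
```

A skip keeps the test visible in reports as skipped, which may be preferable to deleting it if the underlying issue is to be revisited later.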
 
Also, we might want to update (some) extlibs to current versions, e.g. I tested current Guava while working on #2536 two months ago and found no issues with it (at least not regarding regrtests). http://bugs.jython.org/issue2582 requires a current jnr version in order to decide further steps on that issue. There are probably more. I don't suggest spending much time on this. Just do updates that cause no issues and leave everything else as it is. Yes, the list of extlibs to check is long. Maybe we can split this work between us and maybe more volunteers?
 
@Jim: Could you share some more insights about #2487, what you tried and why it failed? I'd like to give it at least another look before the next RC.
 
I once tried to update lib-python to the CPython 2.7.13 version, but that caused lots of additional regrtest failures, 35 or so. It would probably take another year to resolve this stuff. I guess that time would be better invested in Jython 3. However, maybe we can revisit this for Jython 2.7.2.
 
BTW, has anyone thought about a path for porting Jython 2.7.1 features and fixes to Jython 3? Jython 3 was forked shortly after the 2.7.0 release and has already received a serious amount of work. So we already have a notable divergence between Jython 3 and Jython 2.7.1. IMO this is a high priority topic right after the Jython 2.7.1 release, because every further piece of 2.7.x work can make this worse.
 
 
-Stefan
 
 
Sent: Saturday, 20 May 2017 at 20:48
From: "Jim Baker" <[hidden email]>
To: "Darjus Loktevic" <[hidden email]>
Cc: "Stefan Richthofer" <[hidden email]>, "Jython Developers" <[hidden email]>
Subject: Re: [Jython-dev] Unicode user and file names (and v2.7.1)
+100
 
On Sat, May 20, 2017 at 12:33 PM, Darjus Loktevic <[hidden email]> wrote:
Agreed regarding not blocking on 2487. That whole area needs a rewrite and we could potentially utilize libraries available for Java 8.
 
Let's get Jeff's work in and do an RC?
 
Darjus
 
On Fri, May 19, 2017 at 7:51 PM Jim Baker <[hidden email]> wrote:
I don't necessarily see http://bugs.jython.org/issue2487 as a blocker, but it would be nice. It just hasn't come up in real usage, unlike the earlier iteration of the bug which Darjus hacked around by busy waiting. I did spend some time on trying to get the publication to work without racing, using the approach I detail in that bug, but no luck yet. (But mostly because an utter lack of time to spend on the issue.)
 
Merging in Jeff's recent work on Unicode is important and we should get it in. I haven't had a chance to test myself, but given Jeff's amazing attention to detail, I'm sure it's ready.
 
The blocker for the RC - because we lost OSX support of setuptools support of installed executables - is fixed, as I just finally confirmed: http://bugs.jython.org/issue2570
 
 
On Fri, May 19, 2017 at 8:26 PM, Stefan Richthofer <[hidden email]> wrote:
AFAIK every release happens by having a successful RC that is renamed to 'release' after a while. So, per definition another RC is inevitable.
That said, I suppose we should get http://bugs.jython.org/issue2487 fixed before we can release. I guess Jeff's work will be ready until then. At least that decision can be postponed until an RC is actually doable.
 
 
Gesendet: Samstag, 20. Mai 2017 um 02:35 Uhr
Von: "Darjus Loktevic" <[hidden email]>
An: "Jeff Allen" <[hidden email]>, "Jython Developers" <[hidden email]>
Betreff: Re: [Jython-dev] Unicode user and file names (and v2.7.1)

Hey Jeff,

Sounds good. Let's do another rc but to be honest I'm not even sure the RC matters much if there aren't people trying it except us.

Thoughts?
Darjus

 
On Fri, May 19, 2017, 1:19 AM Jeff Allen <[hidden email]> wrote:

Hi Darjus.

On inclusion, I'm happy to go with the community view, as always. On one of the related tickets (http://bugs.jython.org/issue1839), Jim said we'd get it in if timing allowed and there was some user support.

I'm very keen to see a 2.7.1 too. The last (soft) RC was unsuccessful, and we're still making changes, so I assume we're talking about another RC first rather than a release?

The UTF-8 work is nearly there, but not quite: one Linux defect to fix, as noted on the same issue by James against the "latin-1" version. After all the additions in the last couple of weeks (to get full BMP support), I'm happy to find from my Linux laptop that it is still the only thing I have to do. It looks trivial. I've been unable code at all for a few days, so haven't looked into a solution, but now I'm back I expect to nail it for us today or tomorrow.

I can, of course, merge all this myself and will. I shared your hesitancy initially, hence the fork repository, but it's turned out so well I feel it's now low risk, as long as we still have a few days.

I will now dive under the desk and wire up my Linux dev box.

Jeff Allen
On 16/05/2017 21:46, Darjus Loktevic wrote:
Hey Jeff,
 
It seems your last commit to this branch is of three days ago. Is this ready for review? BTW, your changes look good to me.
I'm a little hesitant to merge this since we've had an RC and REALLY have to release 2.7.1 It's miles better than 2.7.0.
 
Cheers,
Darjus
 
On Mon, May 1, 2017 at 6:34 AM Jeff Allen <[hidden email]> wrote:
I went for sys.getfilesystemencoding() == 'utf-8' and it works pretty
well. Rather than just push directly I have published to here:

https://bitbucket.org/tournesol/jython-utf8

I write to ask for a second or third pair of eyes on it. Please tell me
you can see it and whether it breaks things you care about.

I touched a lot of files in the core and import system: quite a lot of
tricky stuff with loaders and search paths has been adjusted. I think it
a good sign that I changed hardly anything in the standard library we
inherit from CPython, that we hadn't already specialised.

By "works pretty well" above, I mean that the regression tests run
cleanly for me when my user name is "Épreuve", where previously Jython
died horribly. The launcher works from a Chinese user name too, as long
as I localise Windows to China (CPython 2.7 feature). I can use the
prompt and runs some tests with that setup, but I can't run the
regression test yet, and printing a stack dump is fatal, so there's a
bit more to do for Chinese.

I think this means we have solid support for "latin-1" languages, but
there are still places where we fatally assume bytes are Unicode code
points.

Jeff Allen

On 05/04/2017 08:57, Jeff Allen wrote:
> I've been working on http://bugs.jython.org/issue2356 which I'd like to
> get in 2.7.1 -- it seems rather poor that Jython simply does not run for
> users whose names have an un-American character ;). I know this issue is
> not a blocker in most minds.
>
> I've made pretty good progress by allowing file names to be unicode
> objects more often than they would be in CPython 2, which usually
> returns them as bytes in some encoding that we may not know. I've got
> the launcher to work properly, and straightened the logic in our
> printing of trace-backs and exceptions from Java. Unicode file names
> seems the way to go for Jython because:
>
>   1. Java gives us competently decoded unicode file names, from
>      java.io.File, etc.. Re-encoding the result will be a pain (and
>      overlooked).
>   2. We appear not to have the codec we need ('mbcs'), that CPython
>      reports on Windows via sys.getfilesystemencoding().
>   3. We do this already. In 2.7.0, os.getcwd() returns unicode if necessary.
>
> Most regression tests pass. However, I'm struggling with test_doctest.
> Problems arise when mixing unicode and bytes when one byte is 128 and
> over. This happens in ''.join(list) and formatted output like "%s %s" %
> (ustr, bstr). The behaviour of these is identical with CPython: they
> raise UnicodeDecodeError because the bytes are promoted to characters
> with a strict ascii interpretation. This happens a lot in doctest.py and
> traceback.py, for example, where file paths and stack dumps that include
> them, are now frequently unicode, while other inputs are byte data
> containing file paths presented in the console encoding.
>
> I can beat this into submission with enough customisation of the stdlib
> modules, but that always makes me uncomfortable. I usually see that as a
> hint that user code might also need to change. This may be unfounded. I
> can probably ensure no impact to users of only ascii paths, and the
> others seem unable to run Jython at all (in the scope of this issue).
> However, I'm seriously wondering if I should pursue the approach where
> file names from Java are re-encoded to bytes (maybe as utf-8
> everywhere), but that's grim.
>
> Thoughts?
>
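
The strict-ascii promotion described in the quoted post can be sketched as follows. This is only an illustration, not Jython code: Python 2 performs the promotion implicitly, so the hypothetical helper promote() writes it out explicitly (which also lets the sketch run on Python 3), and cp1252 is merely an assumed console encoding.

```python
# -*- coding: utf-8 -*-
# Sketch only: promote() is a hypothetical stand-in for the implicit
# bytes-to-unicode promotion Python 2 performs in u"%s %s" % (ustr, bstr).

def promote(bstr, encoding="ascii"):
    """Decode byte data with strict error handling (the default)."""
    return bstr.decode(encoding)

ustr = u"C:\\Users\\\u00c9preuve"         # unicode path, as Java would supply it
bstr = u"\u00c9preuve".encode("cp1252")   # same name as bytes (assumed console encoding)

try:
    joined = ustr + u" " + promote(bstr)  # strict ascii: byte 0xC9 is >= 128
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the encoding the bytes were really written in succeeds.
repaired = ustr + u" " + promote(bstr, "cp1252")
```

The point of the sketch is that the failure depends only on the data: an all-ascii path would pass through promote() unchanged, which matches the observation that users with ascii-only paths should see no impact.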


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Jython-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jython-dev

Re: Unicode user and file names (and v2.7.1)

Jeff Allen-2
In reply to this post by Jim Baker-2

Thanks all. +1 on the RC. Nearly there with my bit.

I have fixed the test_runpy failure James reported. It's not Linux-specific; I just had to quieten the unlink() error to see it on Windows. Bonus: we now pass the standard CPython test_runpy. The regrtest has been running one last time as I typed. I've pushed to https://bitbucket.org/tournesol/jython-utf8 just now.

I will next merge into the Jython trunk. That may not be totally smooth because of the pervasive change. And now I think about it, it's worth a note in NEWS. My time is a little limited today, so it could be much later today or tomorrow evening.

Jeff
Jeff Allen
On 20/05/2017 19:48, Jim Baker wrote:
+100

On Sat, May 20, 2017 at 12:33 PM, Darjus Loktevic <[hidden email]> wrote:
Agreed regarding not blocking on 2487. That whole area needs a rewrite and we could potentially utilize libraries available for Java 8.

Let's get Jeff's work in and do an RC?

Darjus

On Fri, May 19, 2017 at 7:51 PM Jim Baker <[hidden email]> wrote:
I don't necessarily see http://bugs.jython.org/issue2487 as a blocker, but it would be nice. It just hasn't come up in real usage, unlike the earlier iteration of the bug which Darjus hacked around by busy waiting. I did spend some time trying to get the publication to work without racing, using the approach I detail in that bug, but no luck yet. (But mostly because of an utter lack of time to spend on the issue.)

Merging in Jeff's recent work on Unicode is important and we should get it in. I haven't had a chance to test myself, but given Jeff's amazing attention to detail, I'm sure it's ready.

The blocker for the RC (we had lost setuptools support for installed executables on OSX) is fixed, as I have finally confirmed: http://bugs.jython.org/issue2570


On Fri, May 19, 2017 at 8:26 PM, Stefan Richthofer <[hidden email]> wrote:
AFAIK every release happens by having a successful RC that is renamed to 'release' after a while. So, by definition, another RC is inevitable.
That said, I suppose we should get http://bugs.jython.org/issue2487 fixed before we can release. I guess Jeff's work will be ready by then. At least that decision can be postponed until an RC is actually doable.
 
 
Sent: Saturday, 20 May 2017 at 02:35
From: "Darjus Loktevic" <[hidden email]>
To: "Jeff Allen" <[hidden email]>, "Jython Developers" <[hidden email]>
Subject: Re: [Jython-dev] Unicode user and file names (and v2.7.1)

Hey Jeff,

Sounds good. Let's do another rc but to be honest I'm not even sure the RC matters much if there aren't people trying it except us.

Thoughts?
Darjus

 
On Fri, May 19, 2017, 1:19 AM Jeff Allen <[hidden email]> wrote:

Hi Darjus.

On inclusion, I'm happy to go with the community view, as always. On one of the related tickets (http://bugs.jython.org/issue1839), Jim said we'd get it in if timing allowed and there was some user support.

I'm very keen to see a 2.7.1 too. The last (soft) RC was unsuccessful, and we're still making changes, so I assume we're talking about another RC first rather than a release?

The UTF-8 work is nearly there, but not quite: one Linux defect to fix, as noted on the same issue by James against the "latin-1" version. After all the additions in the last couple of weeks (to get full BMP support), I'm happy to find from my Linux laptop that it is still the only thing I have to do. It looks trivial. I've been unable to code at all for a few days, so haven't looked into a solution, but now I'm back I expect to nail it for us today or tomorrow.

I can, of course, merge all this myself and will. I shared your hesitancy initially, hence the fork repository, but it's turned out so well I feel it's now low risk, as long as we still have a few days.

I will now dive under the desk and wire up my Linux dev box.

Jeff Allen
On 16/05/2017 21:46, Darjus Loktevic wrote:
Hey Jeff,
 
It seems your last commit to this branch is from three days ago. Is this ready for review? BTW, your changes look good to me.
I'm a little hesitant to merge this since we've had an RC and REALLY have to release 2.7.1. It's miles better than 2.7.0.
 
Cheers,
Darjus
 
On Mon, May 1, 2017 at 6:34 AM Jeff Allen <[hidden email]> wrote:
I went for sys.getfilesystemencoding() == 'utf-8' and it works pretty
well. Rather than just push directly I have published to here:

https://bitbucket.org/tournesol/jython-utf8

I write to ask for a second or third pair of eyes on it. Please tell me
you can see it and whether it breaks things you care about.

I touched a lot of files in the core and import system: quite a lot of
tricky stuff with loaders and search paths has been adjusted. I think it
a good sign that I changed hardly anything in the standard library we
inherit from CPython, that we hadn't already specialised.

By "works pretty well" above, I mean that the regression tests run
cleanly for me when my user name is "Épreuve", where previously Jython
died horribly. The launcher works from a Chinese user name too, as long
as I localise Windows to China (CPython 2.7 feature). I can use the
prompt and run some tests with that setup, but I can't run the
regression test yet, and printing a stack dump is fatal, so there's a
bit more to do for Chinese.

I think this means we have solid support for "latin-1" languages, but
there are still places where we fatally assume bytes are Unicode code
points.
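
To make the last point concrete, here is a minimal sketch of why treating bytes as Unicode code points breaks non-ascii names once the file system encoding is 'utf-8'. All function names are hypothetical and chosen for illustration; none of this is taken from the Jython source, and the sample names are made up.

```python
# -*- coding: utf-8 -*-
# Sketch only: illustrative helpers, not Jython internals.
FS_ENCODING = "utf-8"   # the value the post reports choosing

def to_fs_bytes(name):
    """Encode a unicode file name to bytes in the file system encoding."""
    return name.encode(FS_ENCODING)

def from_fs_bytes(data):
    """Decode file-name bytes correctly, using the same encoding."""
    return data.decode(FS_ENCODING)

def bytes_as_code_points(data):
    """The faulty assumption: one byte equals one code point (latin-1)."""
    return data.decode("latin-1")

latin = u"\u00c9preuve"        # a name with one accented character
chinese = u"\u6d4b\u8bd5"      # a two-character name outside latin-1

# Encoding and decoding with the same codec round-trips both names.
round_trip_ok = all(from_fs_bytes(to_fs_bytes(n)) == n for n in (latin, chinese))

# Reading the bytes as code points yields six unrelated characters.
mangled = bytes_as_code_points(to_fs_bytes(chinese))
```

The mangled value does not raise an error, which is what makes this class of defect easy to miss until the name is printed or used to open a file.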

Jeff Allen


Re: Unicode user and file names (and v2.7.1)

Darjus Loktevic
Hey Guys,

Regarding Jython3, looks like Isaiah has done a ton of work in 2016 (CCd). Not sure how far he progressed, but indeed merging will be hard and therefore I'd say we should not diverge further while developing on both branches, but instead try to finalize 2.7 and switch to Jython3 full-time.

Feel free to disagree, but here's my thinking on it:
  1. Release Jython 2.7.1
  2. Modernize the codebase. I think it's important for the project to feel modern for us to attract new contributors.
    1. Java8 as the minimum (may be too much for Jython2).
    2. Github/core-workflow
    3. (Ideally) ANTLR4 for both branches, but worst case, Jython3 only. ANTLR3 is not getting much love and ANTLR4 is quite different (does not generate AST).
    4. Gradle, directory structure.
  3. Develop Jython3 primarily. Only bugfixes for 2.7 series.
    1. Target 3.6 (really like the typing improvements).
    2. Merge JyNI if possible.
Cheers,
Darjus


Re: Unicode user and file names (and v2.7.1)

Jim Baker-2
Agreed. Jython 3 is inherently much more interesting to work on, and at this point we should be more or less done with must have functionality for 2.7.1. Get it out now! :)

I would not worry so much about backporting Antlr changes to Jython 2.7. We should restrict backports to runtime fixes, which has been the focus of all recent 2.7 work anyway.

Because of async/await and standard typing support (https://docs.python.org/3/whatsnew/3.6.html#pep-526-syntax-for-variable-annotations is particularly relevant), we should focus on 3.6, although 3.7 is not so different, at least yet (https://docs.python.org/3.7/whatsnew/3.7.html)
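
For readers who have not met the PEP 526 syntax linked above, a short sketch of variable annotations as they exist in CPython 3.6 and later. The names here (retries, Launcher and so on) are invented for illustration and have nothing to do with Jython's code.

```python
from typing import Dict, List

# Module-level variables annotated with and without an initial value.
retries: int = 3
names: List[str] = []
scores: Dict[str, float] = {}

class Launcher:
    # Class-level annotations are recorded in Launcher.__annotations__.
    encoding: str = "utf-8"
    args: List[str]   # annotation only: no class attribute is created
```

An annotation without a value only records the type; it does not bind the name, which is why Launcher has no args attribute until something assigns one.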

Personally I could see two things I might work on:
- Jim


Updating extlibs

Stefan Richthofer
Hey all,
I spent some effort exploring the feasibility of updating the extlibs, mostly minor versions but in some cases major versions too (e.g. guava and icu4j).
My results so far, tested with regrtest on Linux and Windows 10 using Java 8 ("okay" means I did not observe additional regrtest failures):

ASM 5.0.4                  -> 5.2         okay
bouncycastle:
bcpkix-jdk15on-1.54        -> 1.57        okay
bcprov-jdk15on-1.54        -> 1.57        okay

commons-compress-1.12      -> 1.14        okay
guava-20.0                 -> 22.0rc1     okay
icu4j-58.1                 -> 59_1        okay
Netty 4.1.6                -> 4.1.11      okay
java-sizeof-0.0.5          -- still current
jffi-1.2.13                -> 1.2.15      okay
jnr-ffi-2.1.0              -> 2.1.5       okay
jnr-netdb-1.1.6            -- still current
jnr-posix-3.0.31           -> 3.0.41      okay
jnr-constants-0.9.5        -> 0.9.9       okay
New platforms: jffi-aarch64-Linux.jar, jffi-ppc64le-Linux.jar, can be added...?
Updated various other changed platform specific jars to jffi-1.2.15 (okay as far as tested)
xercesImpl-2.11.0          -- still current
jline-2.14.2               -> 2.14.3      okay (didn't try jline-3 this time)
jarjar-1.4                 -- still current
mysql-connector-java-5.1.6 -> 5.1.42      okay
postgresql-8.3-603.jdbc4   -> 42.1.1-jre7 okay

---------- These failed, so I am leaving them as they are:
Antlr 3.1.3                -> 3.5.2  fails
junit-4.10                 -> 4.12 or 4.11: class file for org.hamcrest.Matcher not found
(staying with 4.10 for now to avoid a new dependency on hamcrest-matcher)
javax.servlet-api-2.5      -> 3.1.0  fails
mockrunner      (better not to touch; whole structure changed)
cpptasks        (better not to touch)

- I will be able to test with Java 7 on Tuesday, because I left my old laptop in the office.
- I will upload a fork containing these updates. It would be good if someone else could also test, especially on OSX.
Some things may not be covered by the regrtests. However, the chances that the updates solve issues and that they create issues are probably about equal, and I'd prefer to focus on fixing issues with current versions rather than older ones.
So I'm strongly in favour of getting this in if the regrtests pass on Java 7 and OSX. The 2.7.1 RC phase will be a good opportunity to confirm that these updates work. Any concerns?

-Stefan


Updating extlibs

Stefan Richthofer
I uploaded the mentioned updates to
https://github.com/Stewori/jython.
See detailed changes at
https://github.com/Stewori/jython/commit/ee2b1263306f779bf8e9499afc6d02267648ac36

This might not yet work smoothly with Java 7; I will check and adjust that tomorrow.
Some packages were built from source using Java 8, and I'm not sure whether the gradle
scripts always configured Java 7 source compatibility properly.
However, it would be nice if some people could test it, especially on OSX.

Best

Stefan




Re: Updating extlibs

Jim Baker-2
These changes look good to me. I will test out your patch, but all of this is in line with similar updates we have made in the past, usually around this time in the dev cycle.

I'm glad that moving to Gradle will make this re-pinning to upstream dependencies much easier going forward!



Re: Unicode user and file names (and v2.7.1)

Alan Kennedy-2
In reply to this post by Darjus Loktevic
Hi folks,

Great to see a solid 2.7.1 jython, and work begin in earnest on jython 3.

I have only one small suggestion to make: if jython 2.7.1 is going to be one of the last 2.7 releases, maybe consider numbering it in a way that indicates it is derived from the latest version of cpython 2.7.12. This could indicate that it is as up-to-date as it can be, i.e. not derived from cpython 2.7.1 and then abandoned.

Perception of abandonment is often a problem for jython: I think it's worth an effort to counter this mis-perception.

Regards,

Alan.


On Sun, May 21, 2017 at 4:35 PM, Darjus Loktevic <[hidden email]> wrote:
Hey Guys,

Regarding Jython3, looks like Isaiah has done a ton of work in 2016 (CCd). Not sure how far he progressed, but indeed merging will be hard and therefore I'd say we should not diverge further while developing on both branches, but instead try to finalize 2.7 and switch to Jython3 full-time.

Feel free to disagree, but here's my thinking on it:
  1. Release Jython 2.7.1
  2. Modernize the codebase. I think it's important for the project to feel modern for us to attract new contributors.
    1. Java8 as the minimum (may be too much for Jython2).
    2. Github/core-workflow
    3. (Ideally) ANTLR4 for both branches, but worst case, Jython3 only. ANTLR3 is not getting much love and ANTLR4 is quite different (does not generate AST).
    4. Gradle, directory structure.
  3. Develop Jython3 primarily. Only bugfixes for 2.7 series.
    1. Target 3.6 (really like the typing improvements).
    2. Merge JyNI if possible.
Cheers,
Darjus

On Sat, May 20, 2017 at 11:45 PM Jeff Allen <[hidden email]> wrote:

Thanks all. +1 on the RC. Nearly there with my bit.

I have fixed the test_runpy failure James reported. It's not Linux-specific; I just had to quieten the unlink() error to see it on Windows. Bonus: we now pass the standard CPython test_runpy. The regrtest has been running one last time as I typed. I've pushed to https://bitbucket.org/tournesol/jython-utf8 just now.

I will next merge into the Jython trunk. That may not be totally smooth because of the pervasive change. And now I think about it, it's worth a note in NEWS. My time is a little limited today, so it could be much later today or tomorrow evening.

Jeff

Jeff Allen
On 20/05/2017 19:48, Jim Baker wrote:
+100

On Sat, May 20, 2017 at 12:33 PM, Darjus Loktevic <[hidden email]> wrote:
Agreed regarding not blocking on 2487. That whole area needs a rewrite and we could potentially utilize libraries available for Java 8.

Let's get Jeff's work in and do an RC?

Darjus

On Fri, May 19, 2017 at 7:51 PM Jim Baker <[hidden email]> wrote:
I don't necessarily see http://bugs.jython.org/issue2487 as a blocker, but fixing it would be nice. It just hasn't come up in real usage, unlike the earlier iteration of the bug, which Darjus hacked around by busy waiting. I did spend some time trying to get the publication to work without racing, using the approach I detail in that bug, but no luck yet. (But mostly because of an utter lack of time to spend on the issue.)

Merging in Jeff's recent work on Unicode is important and we should get it in. I haven't had a chance to test myself, but given Jeff's amazing attention to detail, I'm sure it's ready.

The blocker for the RC, the loss on OSX of setuptools support for installed executables, is fixed, as I just finally confirmed: http://bugs.jython.org/issue2570


On Fri, May 19, 2017 at 8:26 PM, Stefan Richthofer <[hidden email]> wrote:
AFAIK every release happens by having a successful RC that is renamed to 'release' after a while. So, by definition, another RC is inevitable.
That said, I suppose we should get http://bugs.jython.org/issue2487 fixed before we can release. I guess Jeff's work will be ready by then. At least that decision can be postponed until an RC is actually doable.
 
 
Sent: Saturday, 20 May 2017 at 02:35
From: "Darjus Loktevic" <[hidden email]>
To: "Jeff Allen" <[hidden email]>, "Jython Developers" <[hidden email]>
Subject: Re: [Jython-dev] Unicode user and file names (and v2.7.1)

Hey Jeff,

Sounds good. Let's do another rc but to be honest I'm not even sure the RC matters much if there aren't people trying it except us.

Thoughts?
Darjus

 
On Fri, May 19, 2017, 1:19 AM Jeff Allen <[hidden email]> wrote:

Hi Darjus.

On inclusion, I'm happy to go with the community view, as always. On one of the related tickets (http://bugs.jython.org/issue1839), Jim said we'd get it in if timing allowed and there was some user support.

I'm very keen to see a 2.7.1 too. The last (soft) RC was unsuccessful, and we're still making changes, so I assume we're talking about another RC first rather than a release?

The UTF-8 work is nearly there, but not quite: one Linux defect to fix, as noted on the same issue by James against the "latin-1" version. After all the additions in the last couple of weeks (to get full BMP support), I'm happy to find from my Linux laptop that it is still the only thing I have to do. It looks trivial. I've been unable to code at all for a few days, so haven't looked into a solution, but now I'm back I expect to nail it for us today or tomorrow.

I can, of course, merge all this myself and will. I shared your hesitancy initially, hence the fork repository, but it's turned out so well I feel it's now low risk, as long as we still have a few days.

I will now dive under the desk and wire up my Linux dev box.

Jeff Allen
On 16/05/2017 21:46, Darjus Loktevic wrote:
Hey Jeff,
 
It seems your last commit to this branch is from three days ago. Is this ready for review? BTW, your changes look good to me.
I'm a little hesitant to merge this since we've had an RC and REALLY have to release 2.7.1. It's miles better than 2.7.0.
 
Cheers,
Darjus
 
On Mon, May 1, 2017 at 6:34 AM Jeff Allen <[hidden email]> wrote:
I went for sys.getfilesystemencoding() == 'utf-8' and it works pretty
well. Rather than just push directly I have published to here:

https://bitbucket.org/tournesol/jython-utf8

I write to ask for a second or third pair of eyes on it. Please tell me
you can see it and whether it breaks things you care about.

I touched a lot of files in the core and import system: quite a lot of
tricky stuff with loaders and search paths has been adjusted. I think it
a good sign that I changed hardly anything in the standard library we
inherit from CPython, that we hadn't already specialised.

By "works pretty well" above, I mean that the regression tests run
cleanly for me when my user name is "Épreuve", where previously Jython
died horribly. The launcher works from a Chinese user name too, as long
as I localise Windows to China (CPython 2.7 feature). I can use the
prompt and run some tests with that setup, but I can't run the
regression test yet, and printing a stack dump is fatal, so there's a
bit more to do for Chinese.

I think this means we have solid support for "latin-1" languages, but
there are still places where we fatally assume bytes are Unicode code
points.
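
To illustrate the last point: treating each byte as a Unicode code point is equivalent to decoding as latin-1, which is why names such as "Épreuve" can survive that shortcut while a Chinese name cannot. The sketch below is written for Python 3; the function name and the sample Chinese name are assumptions for illustration and are not taken from the Jython code:

```python
# Sketch of why "bytes are code points" only holds for latin-1 names.
# The helper and the sample names are hypothetical.

def bytes_as_code_points(data):
    """Map every byte straight to the code point with the same number."""
    return "".join(chr(b) for b in data)


latin_name = "\u00c9preuve"    # 'Épreuve', representable in latin-1
chinese_name = "\u6d4b\u8bd5"  # an assumed two-character Chinese user name

# For latin-1 bytes the shortcut happens to give the right text back.
latin_ok = bytes_as_code_points(latin_name.encode("latin-1")) == latin_name

# A Chinese name has no latin-1 form at all ...
try:
    chinese_name.encode("latin-1")
    latin1_can_hold_chinese = True
except UnicodeEncodeError:
    latin1_can_hold_chinese = False

# ... and its UTF-8 bytes turn into unrelated characters under the shortcut.
utf8_bytes = chinese_name.encode("utf-8")
garbled = bytes_as_code_points(utf8_bytes)

# Decoding with the declared file system encoding recovers the name.
round_trip = utf8_bytes.decode("utf-8")
```

With 'utf-8' reported as the file system encoding, the round trip in the last line works for any name, provided every consumer decodes the bytes instead of taking the shortcut.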

Jeff Allen

On 05/04/2017 08:57, Jeff Allen wrote:
> I've been working on http://bugs.jython.org/issue2356 which I'd like to
> get in 2.7.1 -- it seems rather poor that Jython simply does not run for
> users whose names have an un-American character ;). I know this issue is
> not a blocker in most minds.
>
> I've made pretty good progress by allowing file names to be unicode
> objects more often than they would be in CPython 2, which usually
> returns them as bytes in some encoding that we may not know. I've got
> the launcher to work properly, and straightened the logic in our
> printing of trace-backs and exceptions from Java. Unicode file names
> seems the way to go for Jython because:
>
>   1. Java gives us competently decoded unicode file names, from
>      java.io.File, etc.. Re-encoding the result will be a pain (and
>      overlooked).
>   2. We appear not to have the codec we need ('mbcs'), that CPython
>      reports on Windows via sys.getfilesystemencoding().
>   3. We do this already. In 2.7.0, os.getcwd() returns unicode if necessary.
>
> Most regression tests pass. However, I'm struggling with test_doctest.
> Problems arise when mixing unicode and bytes when one byte is 128 and
> over. This happens in ''.join(list) and formatted output like "%s %s" %
> (ustr, bstr). The behaviour of these is identical with CPython: they
> raise UnicodeDecodeError because the bytes are promoted to characters
> with a strict ascii interpretation. This happens a lot in doctest.py and
> traceback.py, for example, where file paths and stack dumps that include
> them, are now frequently unicode, while other inputs are byte data
> containing file paths presented in the console encoding.
>
> I can beat this into submission with enough customisation of the stdlib
> modules, but that always makes me uncomfortable. I usually see that as a
> hint that user code might also need to change. This may be unfounded. I
> can probably ensure no impact to users of only ascii paths, and the
> others seem unable to run Jython at all (in the scope of this issue).
> However, I'm seriously wondering if I should pursue the approach where
> file names from Java are re-encoded to bytes (maybe as utf-8
> everywhere), but that's grim.
>
> Thoughts?
>



Re: Unicode user and file names (and v2.7.1)

Jim Baker-2
Alan,

That's a great suggestion. 2.7 was specifically chosen to show this correspondence. In the past, we were not as focused on compatibility, but most of the changes (and the corresponding delays) in what we have been planning to call 2.7.1 are because of the continued development on CPython 2.7, which backports fixes from successive versions of CPython 3.

So calling it 2.7.12 helps illustrate this. Any other thoughts on Alan's proposal?

- Jim

On Mon, May 22, 2017 at 11:25 AM, Alan Kennedy <[hidden email]> wrote:
Hi folks,

Great to see a solid 2.7.1 jython, and work begin in earnest on jython 3.

I have only one small suggestion to make: if jython 2.7.1 is going to be one of the last 2.7 releases, maybe consider numbering it in a way that indicates it is derived from the latest version of cpython 2.7.12. This could indicate that it is as up-to-date as it can be, i.e. not derived from cpython 2.7.1 and then abandoned.

Perception of abandonment is often a problem for jython: I think it's worth an effort to counter this mis-perception.

Regards,

Alan.


On Sun, May 21, 2017 at 4:35 PM, Darjus Loktevic <[hidden email]> wrote:
Hey Guys,

Regarding Jython3, looks like Isaiah has done a ton of work in 2016 (CCd). Not sure how far he progressed, but indeed merging will be hard and therefore I'd say we should not diverge further while developing on both branches, but instead try to finalize 2.7 and switch to Jython3 full-time.

Feel free to disagree, but here's my thinking on it:
  1. Release Jython 2.7.1
  2. Modernize the codebase. I think it's important for the project to feel modern for us to attract new contributors.
    1. Java8 as the minimum (may be too much for Jython2).
    2. Github/core-workflow
    3. (Ideally) ANTLR4 for both branches, but worst case, Jython3 only. ANTLR3 is not getting much love and ANTLR4 is quite different (does not generate AST).
    4. Gradle, directory structure.
  3. Develop Jython3 primarily. Only bugfixes for 2.7 series.
    1. Target 3.6 (really like the typing improvements).
    2. Merge JyNI if possible.
Cheers,
Darjus

On Sat, May 20, 2017 at 11:45 PM Jeff Allen <[hidden email]> wrote:

Thanks all. +1 on the RC. Nearly there with my bit.

I have fixed the  test_runpy failure James reported. It's not Linux-specific, just I had to quieten the unlink() error to see it on Windows. Bonus: we now pass the standard CPython test_runpy. The regrtest has been running one last time as I typed. I've pushed to https://bitbucket.org/tournesol/jython-utf8 just now.

I will next merge into the Jython trunk. That may not be totally smooth because of the pervasive change. And now I think about it, it's worth a note in NEWS. My time is a little limited today, so it could be much later today or tomorrow evening.

Jeff

Jeff Allen
On 20/05/2017 19:48, Jim Baker wrote:
+100

On Sat, May 20, 2017 at 12:33 PM, Darjus Loktevic <[hidden email]> wrote:
Agreed regarding not blocking on 2487. That whole area needs a rewrite and we could potentially utilize libraries available for Java 8.

Let's get Jeff's work in and do an RC?

Darjus

On Fri, May 19, 2017 at 7:51 PM Jim Baker <[hidden email]> wrote:
I don't necessarily see http://bugs.jython.org/issue2487 as a blocker, but it would be nice. It just hasn't come up in real usage, unlike the earlier iteration of the bug which Darjus hacked around by busy waiting. I did spend some time on trying to get the publication to work without racing, using the approach I detail in that bug, but no luck yet. (But mostly because an utter lack of time to spend on the issue.)

Merging in Jeff's recent work on Unicode is important and we should get it in. I haven't had a chance to test myself, but given Jeff's amazing attention to detail, I'm sure it's ready.

The blocker for the RC - because we lost OSX support of setuptools support of installed executables - is fixed, as I just finally confirmed: http://bugs.jython.org/issue2570


On Fri, May 19, 2017 at 8:26 PM, Stefan Richthofer <[hidden email]> wrote:
AFAIK every release happens by having a successful RC that is renamed to 'release' after a while. So, per definition another RC is inevitable.
That said, I suppose we should get http://bugs.jython.org/issue2487 fixed before we can release. I guess Jeff's work will be ready until then. At least that decision can be postponed until an RC is actually doable.
 
 
Gesendet: Samstag, 20. Mai 2017 um 02:35 Uhr
Von: "Darjus Loktevic" <[hidden email]>
An: "Jeff Allen" <[hidden email]>, "Jython Developers" <[hidden email]>
Betreff: Re: [Jython-dev] Unicode user and file names (and v2.7.1)

Hey Jeff,

Sounds good. Let's do another rc but to be honest I'm not even sure the RC matters much if there aren't people trying it except us.

Thoughts?
Darjus

 
On Fri, May 19, 2017, 1:19 AM Jeff Allen <[hidden email]> wrote:

Hi Darjus.

On inclusion, I'm happy to go with the community view, as always. On one of the related tickets (http://bugs.jython.org/issue1839), Jim said we'd get it in if timing allowed and there was some user support.

I'm very keen to see a 2.7.1 too. The last (soft) RC was unsuccessful, and we're still making changes, so I assume we're talking about another RC first rather than a release?

The UTF-8 work is nearly there, but not quite: one Linux defect to fix, as noted on the same issue by James against the "latin-1" version. After all the additions in the last couple of weeks (to get full BMP support), I'm happy to find from my Linux laptop that it is still the only thing I have to do. It looks trivial. I've been unable code at all for a few days, so haven't looked into a solution, but now I'm back I expect to nail it for us today or tomorrow.

I can, of course, merge all this myself and will. I shared your hesitancy initially, hence the fork repository, but it's turned out so well I feel it's now low risk, as long as we still have a few days.

I will now dive under the desk and wire up my Linux dev box.

Jeff Allen
On 16/05/2017 21:46, Darjus Loktevic wrote:
Hey Jeff,
 
It seems your last commit to this branch is of three days ago. Is this ready for review? BTW, your changes look good to me.
I'm a little hesitant to merge this since we've had an RC and REALLY have to release 2.7.1 It's miles better than 2.7.0.
 
Cheers,
Darjus
 
On Mon, May 1, 2017 at 6:34 AM Jeff Allen <[hidden email]> wrote:
I went for sys.getfilesystemencoding() == 'utf-8' and it works pretty
well. Rather than just push directly I have published to here:

https://bitbucket.org/tournesol/jython-utf8

I write to ask for a second or third pair of eyes on it. Please tell me
you can see it and whether it breaks things you care about.

I touched a lot of files in the core and import system: quite a lot of
tricky stuff with loaders and search paths has been adjusted. I think it
a good sign that I changed hardly anything in the standard library we
inherit from CPython, that we hadn't already specialised.

By "works pretty well" above, I mean that the regression tests run
cleanly for me when my user name is "Épreuve", where previously Jython
died horribly. The launcher works from a Chinese user name too, as long
as I localise Windows to China (CPython 2.7 feature). I can use the
prompt and runs some tests with that setup, but I can't run the
regression test yet, and printing a stack dump is fatal, so there's a
bit more to do for Chinese.

I think this means we have solid support for "latin-1" languages, but
there are still places where we fatally assume bytes are Unicode code
points.

Jeff Allen

On 05/04/2017 08:57, Jeff Allen wrote:
> I've been working on http://bugs.jython.org/issue2356 which I'd like to
> get in 2.7.1 -- it seems rather poor that Jython simply does not run for
> users whose names have an un-American character ;). I know this issue is
> not a blocker in most minds.
>
> I've made pretty good progress by allowing file names to be unicode
> objects more often than they would be in CPython 2, which usually
> returns them as bytes in some encoding that we may not know. I've got
> the launcher to work properly, and straightened the logic in our
> printing of trace-backs and exceptions from Java. Unicode file names
> seems the way to go for Jython because:
>
>   1. Java gives us competently decoded unicode file names, from
>      java.io.File, etc.. Re-encoding the result will be a pain (and
>      overlooked).
>   2. We appear not to have the codec we need ('mbcs'), that CPython
>      reports on Windows via sys.getfilesystemencoding().
>   3. We do this already. In 2.7.0, os.getcwd() returns unicode if necessary.
>
> Most regression tests pass. However, I'm struggling with test_doctest.
> Problems arise when mixing unicode and bytes when one byte is 128 and
> over. This happens in ''.join(list) and formatted output like "%s %s" %
> (ustr, bstr). The behaviour of these is identical with CPython: they
> raise UnicodeDecodeError because the bytes are promoted to characters
> with a strict ascii interpretation. This happens a lot in doctest.py and
> traceback.py, for example, where file paths and stack dumps that include
> them, are now frequently unicode, while other inputs are byte data
> containing file paths presented in the console encoding.
>
> I can beat this into submission with enough customisation of the stdlib
> modules, but that always makes me uncomfortable. I usually see that as a
> hint that user code might also need to change. This may be unfounded. I
> can probably ensure no impact to users of only ascii paths, and the
> others seem unable to run Jython at all (in the scope of this issue).
> However, I'm seriously wondering if I should pursue the approach where
> file names from Java are re-encoded to bytes (maybe as utf-8
> everywhere), but that's grim.
>
> Thoughts?
>
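
The failure described in the quoted message can be reproduced in miniature. This is a sketch, not Jython code: in Python 2, combining a unicode object with a byte string implicitly decodes the bytes as strict ASCII, and that step is spelled out explicitly here so the example also runs on Python 3. The path and the choice of latin-1 as the console encoding are assumptions made for illustration.

```python
# Sketch of the implicit promotion Python 2 performs in u"%s %s" % (ustr, bstr).

ustr = u"/home/\u00c9preuve/test.py"   # a unicode path (hypothetical)
bstr = ustr.encode("latin-1")          # the same path as console-encoded bytes


def promote_like_python2(data):
    """Decode bytes the way Python 2 does implicitly: strict ASCII."""
    return data.decode("ascii")


try:
    promote_like_python2(bstr)
    failed = False
except UnicodeDecodeError:
    # Byte 0xC9 is >= 128, so the strict ASCII decode is refused.
    failed = True

# Decoding explicitly, with the encoding the bytes were produced in,
# avoids the error and lets the two values be combined safely.
joined = u"%s %s" % (ustr, bstr.decode("latin-1"))
```

The explicit decode is the kind of change that would otherwise have to be made at each mixing point in modules such as doctest.py and traceback.py.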


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Jython-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jython-dev

Re: Unicode user and file names (and v2.7.1)

Stefan Richthofer
If we move away from 2.7.1 at all, the micro version should IMO match the point at which we last updated python-lib. I doubt that is 2.7.12; I suspect it is 2.7.6 or so (sorry if I'm wrong). Does someone know an efficient way to look it up?
A different numbering scheme for marketing purposes would be misleading and might disappoint users even more.
 
Also, I don't think Jython 2.7.1 should be the last Jython 2.7 release, or anything like it. There will continue to be progress (maybe minor), and we should release it at regular intervals (6 months was the plan, wasn't it?). IMO it's not so important that huge progress happens from version to version; the progress from 2.7.0 to 2.7.1 is actually far too large. It is much more important that there is progress at all and that frequent releases show it to the community.
 
-Stefan
 
Sent: Monday, 22 May 2017 at 19:50
From: "Jim Baker" <[hidden email]>
To: "Alan Kennedy" <[hidden email]>
Cc: "Darjus Loktevic" <[hidden email]>, "Jeff Allen" <[hidden email]>, "Stefan Richthofer" <[hidden email]>, "Jython Developers" <[hidden email]>
Subject: Re: [Jython-dev] Unicode user and file names (and v2.7.1)
Alan,
 
That's a great suggestion. 2.7 was specifically chosen to show this correspondence. In the past, we were not as focused on compatibility, but most of the changes (and the corresponding delays) in what we have been planning to call 2.7.1 are due to the continued development of CPython 2.7, which backports fixes from successive versions of CPython 3.
 
So calling it 2.7.12 helps illustrate this. Any other thoughts on Alan's proposal?
 
- Jim
 
On Mon, May 22, 2017 at 11:25 AM, Alan Kennedy <[hidden email]> wrote:
Hi folks,
 
Great to see a solid 2.7.1 jython, and work begin in earnest on jython 3.
 
I have only one small suggestion to make: if jython 2.7.1 is going to be one of the last 2.7 releases, maybe consider numbering it in a way that indicates it is derived from the latest version of cpython 2.7.12. This could indicate that it is as up-to-date as it can be, i.e. not derived from cpython 2.7.1 and then abandoned.
 
Perception of abandonment is often a problem for jython: I think it's worth an effort to counter this mis-perception.
 
Regards,
 
Alan.
 
 
On Sun, May 21, 2017 at 4:35 PM, Darjus Loktevic <[hidden email]> wrote:
Hey Guys,
 
Regarding Jython3, looks like Isaiah has done a ton of work in 2016 (CCd). Not sure how far he progressed, but indeed merging will be hard and therefore I'd say we should not diverge further while developing on both branches, but instead try to finalize 2.7 and switch to Jython3 full-time.
 
Feel free to disagree, but here's my thinking on it:
  1. Release Jython 2.7.1
  2. Modernize the codebase. I think it's important for the project to feel modern for us to attract new contributors.
    1. Java8 as the minimum (may be too much for Jython2).
    2. Github/core-workflow
    3. (Ideally) ANTLR4 for both branches, but worst case, Jython3 only. ANTLR3 is not getting much love and ANTLR4 is quite different (does not generate AST).
    4. Gradle, directory structure.
  3. Develop Jython3 primarily. Only bugfixes for 2.7 series.
    1. Target 3.6 (really like the typing improvements).
    2. Merge JyNI if possible.
Cheers,
Darjus
 
On Sat, May 20, 2017 at 11:45 PM Jeff Allen <[hidden email]> wrote:

Thanks all. +1 on the RC. Nearly there with my bit.

I have fixed the test_runpy failure James reported. It's not Linux-specific; I just had to quieten the unlink() error to see it on Windows. Bonus: we now pass the standard CPython test_runpy. The regrtest has been running one last time as I typed. I've pushed to https://bitbucket.org/tournesol/jython-utf8 just now.

I will next merge into the Jython trunk. That may not be totally smooth because of the pervasive change. And now I think about it, it's worth a note in NEWS. My time is a little limited today, so it could be much later today or tomorrow evening.

Jeff
 
Jeff Allen
On 20/05/2017 19:48, Jim Baker wrote:
+100
 
On Sat, May 20, 2017 at 12:33 PM, Darjus Loktevic <[hidden email]> wrote:
Agreed regarding not blocking on 2487. That whole area needs a rewrite and we could potentially utilize libraries available for Java 8.
 
Let's get Jeff's work in and do an RC?
 
Darjus
 
On Fri, May 19, 2017 at 7:51 PM Jim Baker <[hidden email]> wrote:
I don't necessarily see http://bugs.jython.org/issue2487 as a blocker, but it would be nice. It just hasn't come up in real usage, unlike the earlier iteration of the bug which Darjus hacked around by busy waiting. I did spend some time on trying to get the publication to work without racing, using the approach I detail in that bug, but no luck yet. (But mostly because of an utter lack of time to spend on the issue.)
 
Merging in Jeff's recent work on Unicode is important and we should get it in. I haven't had a chance to test myself, but given Jeff's amazing attention to detail, I'm sure it's ready.
 
The blocker for the RC - because we lost OSX support of setuptools support of installed executables - is fixed, as I just finally confirmed: http://bugs.jython.org/issue2570
 
 
On Fri, May 19, 2017 at 8:26 PM, Stefan Richthofer <[hidden email]> wrote:
AFAIK every release happens by having a successful RC that is renamed to 'release' after a while. So, by definition, another RC is inevitable.
That said, I suppose we should get http://bugs.jython.org/issue2487 fixed before we can release. I guess Jeff's work will be ready by then. At least that decision can be postponed until an RC is actually doable.
 
 
Sent: Saturday, 20 May 2017 at 02:35
From: "Darjus Loktevic" <[hidden email]>
To: "Jeff Allen" <[hidden email]>, "Jython Developers" <[hidden email]>
Subject: Re: [Jython-dev] Unicode user and file names (and v2.7.1)

Hey Jeff,

Sounds good. Let's do another rc but to be honest I'm not even sure the RC matters much if there aren't people trying it except us.

Thoughts?
Darjus

 
On Fri, May 19, 2017, 1:19 AM Jeff Allen <[hidden email]> wrote:

Hi Darjus.

On inclusion, I'm happy to go with the community view, as always. On one of the related tickets (http://bugs.jython.org/issue1839), Jim said we'd get it in if timing allowed and there was some user support.

I'm very keen to see a 2.7.1 too. The last (soft) RC was unsuccessful, and we're still making changes, so I assume we're talking about another RC first rather than a release?

The UTF-8 work is nearly there, but not quite: there is one Linux defect to fix, as noted on the same issue by James against the "latin-1" version. After all the additions in the last couple of weeks (to get full BMP support), I'm happy to find from my Linux laptop that it is still the only thing I have to do. It looks trivial. I've been unable to code at all for a few days, so haven't looked into a solution, but now I'm back I expect to nail it for us today or tomorrow.

I can, of course, merge all this myself and will. I shared your hesitancy initially, hence the fork repository, but it's turned out so well I feel it's now low risk, as long as we still have a few days.

I will now dive under the desk and wire up my Linux dev box.

Jeff Allen
On 16/05/2017 21:46, Darjus Loktevic wrote:
Hey Jeff,
 
It seems your last commit to this branch was three days ago. Is this ready for review? BTW, your changes look good to me.
I'm a little hesitant to merge this since we've had an RC and REALLY have to release 2.7.1. It's miles better than 2.7.0.
 
Cheers,
Darjus
 
On Mon, May 1, 2017 at 6:34 AM Jeff Allen <[hidden email]> wrote:
I went for sys.getfilesystemencoding() == 'utf-8' and it works pretty
well. Rather than just push directly I have published to here:

https://bitbucket.org/tournesol/jython-utf8
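
As a rough picture of what a 'utf-8' file system encoding implies for byte-string file names, here is a minimal sketch. It assumes that byte-string names are simply the UTF-8 encoding of the unicode names Java supplies; the helper names are hypothetical and are not taken from the repository above.

```python
import sys

# The value the message settles on, hard-coded for the illustration.
FS_ENCODING = "utf-8"


def to_fs_bytes(name):
    """Encode a unicode file name to bytes for a bytes-returning API."""
    return name.encode(FS_ENCODING)


def from_fs_bytes(data):
    """Decode a byte-string file name back to unicode."""
    return data.decode(FS_ENCODING)


name = u"\u00c9preuve"                  # "Épreuve"
encoded = to_fs_bytes(name)             # b"\xc3\x89preuve"
round_tripped = from_fs_bytes(encoded)  # equal to name again

# What the running interpreter itself reports; this varies by platform
# and implementation, so it is only read here, not relied upon.
reported = sys.getfilesystemencoding()
```

Because UTF-8 can encode every code point, the round trip is lossless for any name Java can supply, which is not true of a single-byte encoding such as latin-1.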

I write to ask for a second or third pair of eyes on it. Please tell me
you can see it and whether it breaks things you care about.

I touched a lot of files in the core and import system: quite a lot of
tricky stuff with loaders and search paths has been adjusted. I think it
a good sign that I changed hardly anything in the standard library we
inherit from CPython, that we hadn't already specialised.

By "works pretty well" above, I mean that the regression tests run
cleanly for me when my user name is "Épreuve", where previously Jython
died horribly. The launcher works from a Chinese user name too, as long
as I localise Windows to China (CPython 2.7 feature). I can use the
prompt and run some tests with that setup, but I can't run the
regression test yet, and printing a stack dump is fatal, so there's a
bit more to do for Chinese.

I think this means we have solid support for "latin-1" languages, but
there are still places where we fatally assume bytes are Unicode code
points.

Jeff Allen




Re: Unicode user and file names (and v2.7.1)

Javen O'Neal
In reply to this post by Jim Baker-2
+0.5

Calling it 2.7.12 implicitly claims that it is 100% compatible with CPython 2.7.12. If you need to make a small security fix or bug fix, would it be released as 2.7.12.1 or 2.7.13?
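
For what it is worth, either candidate would order correctly after 2.7.12, so the choice is about what the number implies rather than about sorting. A small sketch, with a hypothetical helper and the version strings taken from this thread:

```python
def parse_version(text):
    """Turn a dotted version such as '2.7.12.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


candidates = ["2.7.13", "2.7.12.1", "2.7.12", "2.7.1"]

# Tuples compare element by element, and a shorter tuple that is a
# prefix of a longer one sorts first, so 2.7.12 < 2.7.12.1 < 2.7.13.
ordered = sorted(candidates, key=parse_version)
```

Under that comparison a four-part 2.7.12.1 slots between 2.7.12 and 2.7.13, whereas reusing 2.7.13 would collide with the CPython release of that number.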

On May 22, 2017 10:51, "Jim Baker" <[hidden email]> wrote:
Alan,

That's a great suggestion. 2.7 was specifically chosen to show this correspondence. In the past, we were not as focused on compatibility, but most of the changes (and the corresponding delays) in what we have been planning to call 2.7.1 are due to the continued development of CPython 2.7, which backports fixes from successive versions of CPython 3.

So calling it 2.7.12 helps illustrate this. Any other thoughts on Alan's proposal?

- Jim


