Using Python instead of Bash


Cecil Westerhof
I help someone who has problems reading. For this I take photos of
text, use convert from ImageMagick to increase the contrast (the original
paper is grey) and use lpr to print it a little bigger.

Normally I would implement this in Bash, but I thought it a good idea
to implement it in Python. This is my first try:
    import glob
    import subprocess

    treshold = 66
    count = 0
    for input in sorted(glob.glob('*.JPG')):
        count += 1
        output = '{0:02d}.png'.format(count)
        print('Going to convert {0} to {1}'.format(input, output))
        p = subprocess.Popen(['convert', '-threshold', '{0}%'.format(treshold), input, output])
        p.wait()
        print('Going to print {0}'.format(output))
        p = subprocess.Popen(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])
        p.wait()

There are still some improvements to make: displaying the image before
printing, being able to change the threshold, ... But is this a good
start, or should I do it differently?

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof


Alain Ketterlin
Cecil Westerhof <Cecil at decebal.nl> writes:

> I help someone that has problems reading. For this I take photo's of
> text, use convert from ImageMagick to make a good contrast (original
> paper is grey) and use lpr to print it a little bigger.

>     import glob
>     import subprocess
>
>     treshold = 66
>     count = 0
>     for input in sorted(glob.glob('*.JPG')):
>         count += 1
>         output = '{0:02d}.png'.format(count)
>         print('Going to convert {0} to {1}'.format(input, output))
>         p = subprocess.Popen(['convert', '-threshold', '{0}%'.format(treshold), input, output])
>         p.wait()
>         print('Going to print {0}'.format(output))
>         p = subprocess.Popen(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])
>         p.wait()

Maybe using check_call() would be simpler, and it will, well, check the
exit code of convert/lpr (which you should do anyway). And I would
call that variable "threshold" instead of "treshold".
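A minimal sketch of the original loop rewritten with subprocess.check_call(), as suggested above. The function name `convert_and_print` and its `run` parameter are hypothetical additions so the commands can be replaced by a fake in tests; by default the calls go to the real convert and lpr, exactly as in the original script.

```python
import glob
import subprocess

THRESHOLD = 66  # percent; the value used in the original script


def convert_and_print(threshold=THRESHOLD, run=subprocess.check_call):
    """Convert every *.JPG in the current directory to a numbered PNG and print it.

    check_call() waits for the command to finish and raises
    subprocess.CalledProcessError on a non-zero exit status, so a failed
    convert stops the loop before lpr is run for that image.
    """
    outputs = []
    for count, source in enumerate(sorted(glob.glob('*.JPG')), start=1):
        output = '{0:02d}.png'.format(count)
        print('Going to convert {0} to {1}'.format(source, output))
        run(['convert', '-threshold', '{0}%'.format(threshold), source, output])
        print('Going to print {0}'.format(output))
        run(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])
        outputs.append(output)
    return outputs
```

Calling convert_and_print() with no arguments runs the real commands on the JPGs in the current directory; enumerate(..., start=1) replaces the hand-maintained counter.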

(What I don't see is the advantage you find in writing such scripts in
python instead of sh, but I guess you have your own reasons.)

-- Alain.


Cem Karan
In reply to this post by Cecil Westerhof
> I help someone that has problems reading. For this I take photo's of
> text, use convert from ImageMagick to make a good contrast (original
> paper is grey) and use lpr to print it a little bigger.
>
> Normally I would implement this in Bash, but I thought it a good idea
> to implement it in Python. This is my first try:
>    import glob
>    import subprocess
>
>    treshold = 66
>    count = 0
>    for input in sorted(glob.glob('*.JPG')):
>        count += 1
>        output = '{0:02d}.png'.format(count)
>        print('Going to convert {0} to {1}'.format(input, output))
>        p = subprocess.Popen(['convert', '-threshold', '{0}%'.format(treshold), input, output])
>        p.wait()
>        print('Going to print {0}'.format(output))
>        p = subprocess.Popen(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])
>        p.wait()
>
> There have to be some improvements: display before printing,
> possibility to change threshold, ? But is this a good start, or should
> I do it differently?


As a first try, I think it's pretty good, but to really answer your question, I think we could use a little more information.

- Are you using Python 2 or Python 3?  There are slightly easier ways to do this using concurrent.futures objects, but concurrent.futures is only in the standard library under Python 3. (See https://docs.python.org/3/library/concurrent.futures.html)

- In either case, subprocess.call(), subprocess.check_call(), or subprocess.check_output() may be easier to use.  That said, your code is perfectly fine!  The main difference is that these functions wait for the command to complete, so you don't need p.wait(); check_call() and check_output() also raise CalledProcessError if the command exits with a non-zero status.  (See https://docs.python.org/2.7/library/subprocess.html, and https://docs.python.org/3/library/subprocess.html)
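To make the difference between these helpers concrete, here is a small sketch that uses the current Python interpreter as a stand-in command, so convert and lpr need not be installed. `compare_helpers` is a hypothetical name; the sketch assumes Python 3, where check_output() returns bytes.

```python
import subprocess
import sys

# Stand-in commands run with the current interpreter instead of convert/lpr.
FAILING = [sys.executable, '-c', 'import sys; sys.exit(3)']
GREETING = [sys.executable, '-c', 'print("hello")']


def compare_helpers():
    # call(): waits for the command and just returns its exit status.
    status = subprocess.call(FAILING)
    # check_output(): waits and returns captured stdout as bytes;
    # it raises CalledProcessError on a non-zero exit status.
    output = subprocess.check_output(GREETING)
    # check_call(): waits, returns 0 on success, raises CalledProcessError otherwise.
    try:
        subprocess.check_call(FAILING)
        failed_with = None
    except subprocess.CalledProcessError as exc:
        failed_with = exc.returncode
    return status, output, failed_with
```

With call() a failure is only a number you have to remember to inspect; with check_call() it becomes an exception that stops the script.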



The following code does the conversion in parallel and submits the jobs to the printer serially.  That should ensure that the printed output is also in sorted order, but you might want to double-check before relying on it too much.  The major problem with it is that you can't display the output before printing; since everything is running in parallel, you'll have race conditions if you try.  **I DID NOT TEST THIS CODE, I JUST TYPED IT OUT IN MY MAIL CLIENT!**  Please test it carefully before relying on it!

"""
import subprocess
import concurrent.futures
import glob
import os.path

_THRESHOLD = 66

def _collect_filenames():
    files = glob.glob('*.JPG')

    # I build a set of the real paths so that if you have
    # symbolic links that all point to the same file,
    # they are automatically collapsed to a single file
    real_files = {os.path.realpath(x) for x in files}
    base_files = [os.path.splitext(x)[0] for x in real_files]
    return base_files

def _convert(base_file_name):
    """
    This code is slightly different from your code.  Instead
    of using numbers as names, I use the base name of the file and
    append '.png' to it.  You may need to adjust this to ensure
    you don't overwrite anything.
    """
    input = base_file_name + ".JPG"
    output = base_file_name + ".png"
    subprocess.call(['convert', '-threshold', '{0}%'.format(_THRESHOLD), input, output])

def _print_files_in_order(base_files):
    base_files.sort()
    for f in base_files:
        output = f + ".png"
        subprocess.call(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])

def driver():
    base_files = _collect_filenames()

    # If you use an executor as a context manager, then the
    # executor will wait until all of the submitted jobs finish
    # before it returns.  The submitted jobs will execute in
    # parallel.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for f in base_files:
            executor.submit(_convert, f)

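    _print_files_in_order(base_files)

# ProcessPoolExecutor needs this guard on platforms that start
# worker processes with 'spawn' (e.g. Windows).
if __name__ == '__main__':
    driver()
"""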
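Because each conversion already runs in its own convert child process, a thread pool is one alternative to the process pool above. The sketch below swaps in ThreadPoolExecutor and executor.map(), assuming Python 3; the function names, the `run` parameter and `workers=4` are hypothetical choices, not part of the original post. It keeps the numbered PNG names from the first script.

```python
import concurrent.futures
import functools
import glob
import subprocess

THRESHOLD = 66  # percent, the value from the original script


def convert(source, output, threshold=THRESHOLD, run=subprocess.check_call):
    # One ImageMagick call per image; check_call raises CalledProcessError
    # if convert exits with a non-zero status.
    run(['convert', '-threshold', '{0}%'.format(threshold), source, output])
    return output


def convert_in_parallel_then_print(threshold=THRESHOLD, run=subprocess.check_call,
                                   workers=4):
    sources = sorted(glob.glob('*.JPG'))
    outputs = ['{0:02d}.png'.format(n) for n in range(1, len(sources) + 1)]
    job = functools.partial(convert, threshold=threshold, run=run)
    # Threads are enough here: the real work happens in the convert child
    # processes, and each Python thread only waits for its child to finish.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        # map() returns results in input order and re-raises a worker's
        # exception when list() reaches that result.
        converted = list(executor.map(job, sources, outputs))
    # Printing stays serial, so the pages reach lpr in sorted order.
    for output in converted:
        run(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])
    return converted
```

Using map() instead of submit() means a failed conversion is not silently stored in an unread future: it surfaces as an exception before anything is printed.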

Thanks,
Cem Karan

Cecil Westerhof
In reply to this post by Alain Ketterlin
On Sunday 31 May 2015 16:02 CEST, Alain Ketterlin wrote:

> Cecil Westerhof <Cecil at decebal.nl> writes:
>
>> I help someone that has problems reading. For this I take photo's
>> of text, use convert from ImageMagick to make a good contrast
>> (original paper is grey) and use lpr to print it a little bigger.
>
>>     import glob
>>     import subprocess
>>
>>     treshold = 66
>>     count = 0
>>     for input in sorted(glob.glob('*.JPG')):
>>         count += 1
>>         output = '{0:02d}.png'.format(count)
>>         print('Going to convert {0} to {1}'.format(input, output))
>>         p = subprocess.Popen(['convert', '-threshold', '{0}%'.format(treshold), input, output])
>>         p.wait()
>>         print('Going to print {0}'.format(output))
>>         p = subprocess.Popen(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])
>>         p.wait()
>
> Maybe using check_call() would be simpler, but it will, well, check
> for the exit code of convert/lpr (which you should do anyway).

That is a lot better, yes. I did not like that I needed two calls.


> And I would call that variable "threshold" instead of "treshold".

Yep, I saw the mistake after posting. :-(


> (What I don't see is the advantage you find in writing such scripts
> in python instead of sh, but I guess you have your own reasons.)

Several.

The first is that I want to get some Python experience.

But later on I want to display the converted photo and if it is not
correct, change the threshold and convert it again with a new
threshold. 66% is good most of the time, but not always. Sometimes it
has to be bigger or smaller.

And after that I want to put a GUI around it.

So, that is why I am using Python.
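A minimal sketch of that display-and-retry loop, using only the standard library and assuming Python 3 (where input() returns a string). It assumes ImageMagick's `display` command is available as the viewer; the function name and the `run`/`ask` parameters are hypothetical, added so the loop can be exercised without a screen or keyboard.

```python
import subprocess


def convert_until_accepted(source, output, threshold=66,
                           run=subprocess.check_call, ask=input):
    """Convert source to output, show it, and retry until the user accepts.

    Returns the threshold (in percent) that produced the accepted image.
    """
    while True:
        run(['convert', '-threshold', '{0}%'.format(threshold), source, output])
        # Show the result; check_call returns when the viewer exits.
        run(['display', output])
        answer = ask('Threshold {0}%: does the image look good? [y/N] '.format(threshold))
        if answer.strip().lower().startswith('y'):
            return threshold
        # No validation here: a non-numeric answer raises ValueError.
        threshold = int(ask('New threshold value: '))
```

The accepted image is already on disk when the function returns, so the caller can pass `output` straight to lpr.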


Ethan Furman
In reply to this post by Cecil Westerhof
Just for funsies, I did a slight rewrite using my scription[1] library:

--8<------------------------------------------------------------------------------
#!/usr/bin/python3
"""
Convert scanned images: contrast is increased, size is enlarged,
format is changed from jpeg to png.
"""

from __future__ import print_function   # only needed if running in 2
from glob import glob
from scription import *

@Command(
         threshold=Spec('cutoff value for white vs black', OPTION, type=int, default=66),
         verify=Spec('confirm conversion by viewing on screen', FLAG),
         )
def image_convert(verify, threshold):
     for count, input in enumerate(sorted(glob('*.JPG')), start=1):
         output = '{0:02d}.png'.format(count)
         # only prints if --verbose on commandline
         print('Going to convert {} to {}'.format(input, output))
         while True:
             # Execute automatically parses into a list
             attempt = Execute(
                     'convert -threshold {}% {} {}'.format(threshold, input, output),
                     interactive='echo',
                     )
             if attempt.returncode:
                 raise SystemExit(attempt.returncode)
             if not verify:
                 break
             attempt = Execute(
                     'display_the_png {}'.format(output),
                     interactive='echo',
                     )
             if attempt.returncode:
                 raise SystemExit(attempt.returncode)
             if get_response('Does the image look good?'):
                 break
             threshold = get_response("New threshold value:", type=int)
         # conversion looks good, or manual verification skipped
         print('going to print {0}'.format(output))
         attempt = Execute(
                 'lpr -o fit-to-page -o media=A4 {}'.format(output),
                 interactive='echo',
                 )
         if attempt.returncode:
             raise SystemExit(attempt.returncode)

Main()
--8<------------------------------------------------------------------------------

Opinions about the usability of the above script, both as the script writer and as the user, are welcome.
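For comparison, a roughly equivalent command-line interface can be declared with the standard library's argparse module. This is a sketch, not a translation of scription's semantics: mapping OPTION to a `--threshold` option and FLAG to a `--verify` switch is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description='Convert scanned images: contrast is increased, size is '
                    'enlarged, format is changed from jpeg to png.')
    # Integer option with the same default as the original script.
    parser.add_argument('--threshold', type=int, default=66,
                        help='cutoff value for white vs black (percent)')
    # Boolean switch: False unless given on the command line.
    parser.add_argument('--verify', action='store_true',
                        help='confirm conversion by viewing on screen')
    return parser
```

In a script this would be used as `args = build_parser().parse_args()`, after which `args.threshold` and `args.verify` replace the two function parameters.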

--
~Ethan~


[1] yes, everyone apparently writes their own command-line processor -- this one is mine.  ;)

Larry Hudson
In reply to this post by Cecil Westerhof
On 05/31/2015 05:42 AM, Cecil Westerhof wrote:
> I help someone that has problems reading. For this I take photo's of
> text, use convert from ImageMagick to make a good contrast (original
> paper is grey) and use lpr to print it a little bigger.
>
I'm wondering why you bother to take a photo, which then has to be adjusted for quality.  A
screen-capture program is much easier and immediately gives you a perfect(?) starting image.

Linux Mint, which I use, has the program 'gnome-screenshot' (in the main menu under Accessories as
'Screenshot').  It gives you the options to capture the whole screen, the current active window,
or any arbitrary rectangular area of the screen.  It saves it as a .png and allows you to
specify the path/filename to save it.  I'm sure there are many other equivalent programs
available as well, and this one or others are likely available by default in other Linux
distros, or easily installed if not.

Of course, this just gives you the original 'raw' image that you can then process further as
necessary.  But this is _FAR_ easier and better quality than the roundabout method of using a
camera.

      -=- Larry -=-


Jussi Piitulainen
Larry Hudson writes:

> On 05/31/2015 05:42 AM, Cecil Westerhof wrote:
>> I help someone that has problems reading. For this I take photo's of
>> text, use convert from ImageMagick to make a good contrast (original
>> paper is grey) and use lpr to print it a little bigger.
>>
> I''m wondering why you bother to take a photo, which then has to be
> adjusted for quality.  A screen-capture program is much easier and
> immediately gives you a perfect(?) starting image.

"paper"

alister
On Mon, 01 Jun 2015 11:06:33 +0300, Jussi Piitulainen wrote:

> Larry Hudson writes:
>
>> On 05/31/2015 05:42 AM, Cecil Westerhof wrote:
>>> I help someone that has problems reading. For this I take photo's of
>>> text, use convert from ImageMagick to make a good contrast (original
>>> paper is grey) and use lpr to print it a little bigger.
>>>
>> I''m wondering why you bother to take a photo, which then has to be
>> adjusted for quality.  A screen-capture program is much easier and
>> immediately gives you a perfect(?) starting image.
>
> "paper"

Have you looked at using OCR software combined with a scanner?
I have used tesseract in the past with very good results.


--
Eloquence is logic on fire.

Cecil Westerhof
In reply to this post by Larry Hudson
On Monday 1 Jun 2015 09:56 CEST, Larry Hudson wrote:

> On 05/31/2015 05:42 AM, Cecil Westerhof wrote:
>> I help someone that has problems reading. For this I take photo's
>> of text, use convert from ImageMagick to make a good contrast
>> (original paper is grey) and use lpr to print it a little bigger.
>>
> I''m wondering why you bother to take a photo, which then has to be
> adjusted for quality.  A screen-capture program is much easier and
> immediately gives you a perfect(?) starting image.

I am amazed about the technical progress: I did not know it was
possible to take a screen-shot from a booklet.


Marko Rauhamaa
Cecil Westerhof <Cecil at decebal.nl>:

> I am amazed about the technical progress: I did not know it was
> possible to take a screen-shot from a booklet.

Well, take a look at this video that demonstrates the possibilities of
the technology:

   https://www.youtube.com/watch?v=MOXQo7nURs0


Marko

Cecil Westerhof
In reply to this post by alister
On Monday 1 Jun 2015 11:07 CEST, alister wrote:

> On Mon, 01 Jun 2015 11:06:33 +0300, Jussi Piitulainen wrote:
>
>> Larry Hudson writes:
>>
>>> On 05/31/2015 05:42 AM, Cecil Westerhof wrote:
>>>> I help someone that has problems reading. For this I take photo's
>>>> of text, use convert from ImageMagick to make a good contrast
>>>> (original paper is grey) and use lpr to print it a little bigger.
>>>>
>>> I''m wondering why you bother to take a photo, which then has to
>>> be adjusted for quality. A screen-capture program is much easier
>>> and immediately gives you a perfect(?) starting image.
>>
>> "paper"
>
> Have you looked at using OCR software combined with a scanner?
> I have used tesseract in the past with very god results.

That does not work, because it is not just straight text: there are
headings with text floating around them.

The funny thing is that I did scan it at first, but that gave several
problems. One of them was that it is a format between A4 and A5.
Taking pictures is faster and gives better results.


Grant Edwards
In reply to this post by Cecil Westerhof
On 2015-05-31, Cecil Westerhof <Cecil at decebal.nl> wrote:

> I help someone that has problems reading. For this I take photo's of
> text, use convert from ImageMagick to make a good contrast (original
> paper is grey) and use lpr to print it a little bigger.
>
> Normally I would implement this in Bash, but I thought it a good idea
> to implement it in Python.

Why?  Is it difficult to do/maintain in Bash?

If you want to write Python, then you should write Python.  You're
still writing Bash, so you should probably do it in Bash.  If you just
want to invoke a set of external programs on a set of files, then Bash
is probably the right language to use; that's pretty much what Bash is
for: manipulating files by invoking other programs.

> This is my first try:
>     import glob
>     import subprocess
>
>     treshold = 66
>     count = 0
>     for input in sorted(glob.glob('*.JPG')):
>         count += 1
>         output = '{0:02d}.png'.format(count)
>         print('Going to convert {0} to {1}'.format(input, output))
>         p = subprocess.Popen(['convert', '-threshold', '{0}%'.format(treshold), input, output])
>         p.wait()
>         print('Going to print {0}'.format(output))
>         p = subprocess.Popen(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])
>         p.wait()
>
> There have to be some improvements: display before printing,
> possibility to change threshold, ? But is this a good start, or
> should I do it differently?

If all the work is being done by making a series of calls to
subprocess, then you should think about using bash instead.

If you really do want to do this in Python, you can use a Python
binding to the ImageMagick libraries:

  https://wiki.python.org/moin/ImageMagick
  http://www.imagemagick.org/download/python/
  http://stackoverflow.com/questions/7895278/can-i-access-imagemagick-api-with-python

Or maybe you can use Pillow:

  http://pillow.readthedocs.org/
  https://github.com/python-pillow/Pillow 
  https://python-pillow.github.io/
 
--
Grant Edwards               grant.b.edwards        Yow! Of course, you
                                  at               UNDERSTAND about the PLAIDS
                              gmail.com            in the SPIN CYCLE --

Laura Creighton
In reply to this post by Cecil Westerhof
In a message of Mon, 01 Jun 2015 12:44:34 +0200, Cecil Westerhof writes:
>Also the funny thing is that I first scanned it. But it gave several
>problems. One of them was that it is a format between A4 and A5.
>Taking pictures is faster and gives better results.

My irony detector may be on the fritz today, but, well, if you run into
some weird format between A4 and A5, you probably have a USA size.
See: http://en.wikipedia.org/wiki/Paper_size#North_American_paper_sizes

Most scanning programs have a way of specifying what paper size you want,
though those made in the USA have a natural tendency to default to the
weird American sizes.

Apologies if you already know all about this ...

Laura

Cecil Westerhof
In reply to this post by Cecil Westerhof
On Monday 1 Jun 2015 17:44 CEST, Laura Creighton wrote:

> In a message of Mon, 01 Jun 2015 12:44:34 +0200, Cecil Westerhof
> writes:
>> Also the funny thing is that I first scanned it. But it gave
>> several problems. One of them was that it is a format between A4
>> and A5. Taking pictures is faster and gives better results.
>
> My irony detector may be on the fritz today, but, well, if you run
> into some weird format between A4 and A5, you probably have a USA
> size. see:
> http://en.wikipedia.org/wiki/Paper_size#North_American_paper_sizes

Nope, the dimensions are 177 mm x 227 mm.


Laura Creighton
In a message of Mon, 01 Jun 2015 18:44:01 +0200, Cecil Westerhof writes:

>On Monday 1 Jun 2015 17:44 CEST, Laura Creighton wrote:
>
>> In a message of Mon, 01 Jun 2015 12:44:34 +0200, Cecil Westerhof
>> writes:
>>> Also the funny thing is that I first scanned it. But it gave
>>> several problems. One of them was that it is a format between A4
>>> and A5. Taking pictures is faster and gives better results.
>>
>> My irony detector may be on the fritz today, but, well, if you run
>> into some weird format between A4 and A5, you probably have a USA
>> size. see:
>> http://en.wikipedia.org/wiki/Paper_size#North_American_paper_sizes
>
>Nope, the dimensions are 177 mm x 227 mm.

Truly?  That's (very close to) 7 inches by 9 inches, 177.8 mm x 228.6 mm,
and 7 by 9 is what pre-metric Britain called 'Small Post Quarto'.
I wonder if this is merely a coincidence, or does some software really
still like this size?  How very weird.
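The arithmetic behind this comparison, as a small sketch: an inch is exactly 25.4 mm, and ISO 216 defines A4 as 210 x 297 mm and A5 as 148 x 210 mm. The helper names are illustrative only.

```python
MM_PER_INCH = 25.4  # exact, by definition of the inch

# ISO 216 sizes in millimetres (width, height).
A4 = (210, 297)
A5 = (148, 210)
BOOKLET = (177, 227)  # as measured in the thread


def inches_to_mm(inches):
    return inches * MM_PER_INCH


def mm_to_inches(mm):
    return mm / MM_PER_INCH


def between(small, size, large):
    """True if size is strictly between small and large in both dimensions."""
    return all(s < x < l for s, x, l in zip(small, size, large))
```

So 177 mm x 227 mm is about 6.97 x 8.94 inches: strictly between A5 and A4, and within 0.8 mm and 1.6 mm of 7 x 9 inches.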

Laura

Cecil Westerhof
In reply to this post by Cecil Westerhof
On Monday 1 Jun 2015 20:42 CEST, Laura Creighton wrote:

> In a message of Mon, 01 Jun 2015 18:44:01 +0200, Cecil Westerhof
> writes:
>> On Monday 1 Jun 2015 17:44 CEST, Laura Creighton wrote:
>>
>>> In a message of Mon, 01 Jun 2015 12:44:34 +0200, Cecil Westerhof
>>> writes:
>>>> Also the funny thing is that I first scanned it. But it gave
>>>> several problems. One of them was that it is a format between A4
>>>> and A5. Taking pictures is faster and gives better results.
>>>
>>> My irony detector may be on the fritz today, but, well, if you run
>>> into some weird format between A4 and A5, you probably have a USA
>>> size. see:
>>> http://en.wikipedia.org/wiki/Paper_size#North_American_paper_sizes
>>
>> Nope, the dimensions are 177 mm x 227 mm.
>
> Truly?  That's (very close to) 7 inch by 9 inch, 177.8 mm x 228.6 mm
> and 7 by 9 is what pre-metric Britian called 'Small Post Quarto'.
> I wonder if this merely a coincidence, or does some software really
> still like this size?  How very weird.

Well, it is possible I did not measure correctly. ;-) It is a 32-page
booklet.


Laura Creighton
In a message of Mon, 01 Jun 2015 21:57:05 +0200, Cecil Westerhof writes:

>> Truly?  That's (very close to) 7 inch by 9 inch, 177.8 mm x 228.6 mm
>> and 7 by 9 is what pre-metric Britian called 'Small Post Quarto'.
>> I wonder if this merely a coincidence, or does some software really
>> still like this size?  How very weird.
>
>Well, it is possible I did not measure correctly. ;-) It is a 32 page
>booklet.

Ah, I have been misunderstanding the question.  It's the booklet itself
that is 7 by 9, not the rasters (or whatever) you get when you first
scan the thing.  Sorry about that.

Laura

Cecil Westerhof
In reply to this post by Cecil Westerhof
On Monday 1 Jun 2015 22:42 CEST, Laura Creighton wrote:

> In a message of Mon, 01 Jun 2015 21:57:05 +0200, Cecil Westerhof
> writes:
>>> Truly? That's (very close to) 7 inch by 9 inch, 177.8 mm x 228.6
>>> mm and 7 by 9 is what pre-metric Britian called 'Small Post
>>> Quarto'. I wonder if this merely a coincidence, or does some
>>> software really still like this size? How very weird.
>>
>> Well, it is possible I did not measure correctly. ;-) It is a 32
>> page booklet.
>
> Ah, I have been misunderstanding the question. It's the booklet
> itself that is 7 by 9, not the rasters (or whatever) you get when
> you first scan the thing. Sorry about that.

No problem.
