
HTML to printable file

Tony Cappellini-2

Have any of you used or written a tool which will convert a web page into a file that is cleanly paginated and printable?

I need to print out some documentation which was only written in HTML.
It's more convenient to have paper docs you can make notes on and use when you're away from a Wi-Fi connection.

Below is just one of many pages from http://dangerousprototypes.com/docs that I'd like to get into a printable format:


The Python Weekly archives are another site I'd like to be able to download and have available offline
(although not necessarily printable).



I'm hoping I don't need to reinvent the wheel.

Thanks


_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies

Re: HTML to printable file

Monte Davidoff
Hi Tony,

On 3/29/12 11:04 PM, Tony Cappellini wrote:
> Have any of you used or written a tool which will convert a web page
> into a file that
> is cleanly paginated and printable?

Perhaps I misunderstood the problem. The tool I've used for this task is
my web browser (Firefox, Safari). On a Mac, I use Print > Save as PDF to
get the output into a file.

Monte

Re: HTML to printable file

Ian Zimmerman-2
In reply to this post by Tony Cappellini-2

Tony> I'm hoping I don't need to reinvent the wheel.

lynx -dump | pr  ?
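Ian's one-liner renders the page as plain text with `lynx -dump` and pipes it to `pr`, which chops the text into fixed-length pages with headers. A rough Python sketch of the same idea, assuming `lynx` is on the PATH; `paginate` is a hypothetical helper that only loosely imitates `pr` (a one-line header per page and 56 text lines per page, which is `pr`'s default), not a replacement for it:

```python
import subprocess
import sys


def lynx_dump(url):
    """Render a page as plain text with lynx (assumes lynx is on the PATH)."""
    result = subprocess.run(
        ["lynx", "-dump", url], capture_output=True, text=True, check=True
    )
    return result.stdout


def paginate(text, lines_per_page=56, title=""):
    """Split text into pages of at most lines_per_page lines each.

    A loose imitation of pr: every page starts with a one-line header
    (title and page number) followed by a blank line. 56 text lines per
    page is pr's default (a 66-line page minus 5-line header and trailer).
    """
    lines = text.splitlines()
    pages = []
    for start in range(0, max(len(lines), 1), lines_per_page):
        header = f"{title}  Page {start // lines_per_page + 1}".strip()
        pages.append("\n".join([header, ""] + lines[start:start + lines_per_page]))
    return pages


if __name__ == "__main__" and len(sys.argv) > 1:
    url = sys.argv[1]
    # Form feeds between pages make most printers start each page on a new sheet.
    print("\f".join(paginate(lynx_dump(url), title=url)))
```

For real printing, the actual `pr` handles margins, headers with dates, and multi-column layouts that this sketch ignores.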

--
Ian Zimmerman
gpg public key: 1024D/C6FF61AD
fingerprint: 66DC D68F 5C1B 4D71 2EE5  BD03 8A00 786C C6FF 61AD
http://www.gravatar.com/avatar/c66875cda51109f76c6312f4d4743d1e.png
Rule 420: All persons more than eight miles high to leave the court.

Re: HTML to printable file

Tony Cappellini-2
In reply to this post by Monte Davidoff
Monte,

While that will work for the current page, I should have mentioned that I'm looking for a program that can
follow URLs several levels deep (the depth to be set by the user).

Since there is a lot of documentation, I was hoping to find a tool I can pass a top-level URL (or a series of URLs)
to and let it do all the work.

In the case of Python Weekly, many of the main URLs link to other, offsite URLs.
Saving the main URL to PDF won't get the offsite content.
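The "follow links N levels deep" requirement is a breadth-first walk that stops expanding pages once they are N links away from the starting URLs. A minimal sketch, assuming a caller-supplied `get_links(url)` function (hypothetical; fetching and link extraction are deliberately left out) that returns the links found on a page:

```python
from collections import deque


def crawl(start_urls, get_links, max_depth):
    """Breadth-first walk that stops max_depth links below the start pages.

    get_links(url) is supplied by the caller and returns the URLs linked
    from url (fetching and parsing are out of scope here). Each URL is
    visited at most once; the result is in visiting order.
    """
    seen = set()
    queue = deque()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # deep enough: keep this page but don't follow its links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

Offsite links are followed unless `get_links` filters them out, so the same function covers both the single-site docs and the Python Weekly case; the returned list could then be fed, page by page, to whatever HTML-to-PDF converter is used.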

thanks



On Thu, Mar 29, 2012 at 11:15 PM, Monte Davidoff <[hidden email]> wrote:
> Perhaps I misunderstood the problem. The tool I've used for this task is my web browser (Firefox, Safari). On a Mac, I use Print > Save as PDF to get the output into a file.


Re: HTML to printable file

Michael Pittaro-3
In reply to this post by Tony Cappellini-2
On Thu, Mar 29, 2012 at 11:04 PM, Tony Cappellini <[hidden email]> wrote:
>
> Have any of you used or written a tool which will convert a web page into a
> file that
> is cleanly paginated and printable?
>
> I need to print out some documentation which was only written in HTML.
> It's more convenient to have paper docs you can make notes on and use when
> you're away from a wifi connection.
>

Check out Print Friendly http://www.printfriendly.com/

It's a web service that converts a page / URL to a PDF. I've had
good luck with it converting doc pages to PDF for Kindle reading. It's
not perfect, but it did a reasonable job on the Bus Pirate page
(actually, it usually does better).
>
>
> I'm hoping I don't need to reinvent the wheel.
>

You might have to grease it, though. Print Friendly only handles one
page; it doesn't walk the tree of linked pages. But it's a web
service, so you can probably call it from urllib without too much
effort.

There are no terms of service or API posted on the site, but it looks
like a Bay Area project if you need to get in touch with them.
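Calling Print Friendly "from urllib" mostly means building one request URL per page. A minimal sketch, assuming the `print/v2?url=` pattern from the Print Friendly link posted later in this thread still works; since the site publishes no API, the endpoint, the response format (possibly an HTML page rather than a PDF), and the function names here are all assumptions:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: this path and the "url" query parameter are copied from the
# Print Friendly link quoted in this thread; the site documents no API, so
# they may change or stop working.
PRINT_FRIENDLY_BASE = "http://www.printfriendly.com/print/v2"


def print_friendly_url(page_url):
    """Build the Print Friendly link for one page (URL-encoded)."""
    return PRINT_FRIENDLY_BASE + "?" + urlencode({"url": page_url})


def print_friendly_urls(page_urls):
    """One link per page, since the service handles one page at a time."""
    return [print_friendly_url(u) for u in page_urls]


def fetch(url, timeout=30):
    """Download raw bytes; the response may be HTML rather than a PDF."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

The tree walking still has to happen on the caller's side; this only turns a list of page URLs into a list of Print Friendly requests.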

mike

Re: HTML to printable file

Akkana Peck
In reply to this post by Tony Cappellini-2
Tony Cappellini writes:
> While that will work for the current page, I should have mentioned I'm
> looking for a program that should be
> able to follow urls several levels (the level of urls will be determined by
> the user).

It's fairly easy to convert HTML pages to PDF with PyQt's QWebView
and QPrinter (you don't have to be running KDE or set up a GUI). I've
never found a way to do the same using python-webkit (the GTK bindings).
http://shallowsky.com/blog/programming/html-slides-to-pdf.html

I'm sure the WebKit API will let you get a list of links and follow
them, though I don't know the calls offhand. Or you could get them
with a grep of the original page, and run your print script on the
list of URLs (as I do in that post).
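The "grep the original page" step can also be done with the standard library's HTML parser, which tends to be more robust than a regular expression against quoting and attribute-order variations. A minimal sketch (`extract_links` and `LinkCollector` are hypothetical names) that turns one page's HTML into the list of absolute URLs to feed to a print script:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags, in page order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop "#section" fragments so the same
        # page isn't printed twice.
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        if url.startswith(("http://", "https://")) and url not in self.links:
            self.links.append(url)


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Writing the result one URL per line gives the kind of URL list the print script takes.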

        ...Akkana

Re: HTML to printable file

Tony Cappellini-2
In reply to this post by Michael Pittaro-3


>>Check out Print Friendly http://www.printfriendly.com/

Thanks. I'll give it a try.




Re: HTML to printable file

jalopyuser
In reply to this post by Tony Cappellini-2
Tony Cappellini <[hidden email]> wrote:

> Monte,
>
> While that will work for the current page, I should have mentioned I'm
> looking for a program that should be
> able to follow urls several levels (the level of urls will be determined by
> the user).

Sounds like you want to save a whole site, not just a Web page.

Anyway, the tools I like are "wkpdf" and its counterparts built with Qt and
GTK+. They use WebKit to render to PDF. Pretty common by now.

Try wk2pdf.py, for instance.

To walk a site, you might try the Plucker tool.  It does a pretty good
job, if you can still find it.

Bill

Re: HTML to printable file

Tony Cappellini-2


>>Sounds like you want to save a whole site, not just a Web page.
No, not a whole site. Just many pages.



Re: HTML to printable file

Chris Clark
In reply to this post by jalopyuser
On Friday 2012-03-30 12:27 (-0700), Bill Janssen <[hidden email]> wrote:
> Tony Cappellini<[hidden email]>  wrote:
>
>> While that will work for the current page, I should have mentioned I'm
>> looking for a program that should be
>> able to follow urls several levels (the level of urls will be determined by
>> the user).
> Sounds like you want to save a whole site, not just a Web page....
> To walk a site, you might try the Plucker tool.  It does a pretty good
> job, if you can still find it.

Yup, this isn't a printing issue, it's a scraping issue.
Plucker/plucker-desktop is available in Debian and supports the depth
option you want. The GUI is kinda old and clunky, but it works.

On the other hand, wget does this too, but it is less user-friendly ;-) I
would strongly encourage you NOT to write your own tools for this, as it
can get complex. There are some wget GUI wrappers knocking around (I can't
recommend any, though).


The printing piece is another (complex) problem: do you need to flatten
the pages (and the links between them) into one document, or print them
separately, and in what order? I don't have a good answer to that; it's
a navigation problem. But pulling down the pages is the first piece.

Chris


Re: HTML to printable file

Chris Clark
On Friday 2012-03-30 12:57 (-0700), Chris Clark
<[hidden email]> wrote:

>
>> Tony Cappellini<[hidden email]>  wrote:
>>
>>> While that will work for the current page, I should have mentioned I'm
>>> looking for a program that should be
>>> able to follow urls several levels (the level of urls will be
>>> determined by
>>> the user).
>>
> ..... wget does this too, but it is less user friendly ;-) I would
> strongly encourage you to NOT write tools for this as it can get
> complex. There are some wget GUI wrappers knocking around (I can't
> recommend any though).

I forgot to include an example:

Pull down recursively (up to wget's default depth of 5), convert the links
for local viewing, and stay on the web site by following only relative links:

      wget -L --recursive --convert-links .....


Also see --restrict-file-names, --no-directories, and --level (to change the depth).
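Since Tony wants the depth chosen by the user, a small Python wrapper can build the same command with `--level` set explicitly. A sketch, assuming `wget` is on the PATH; the function names are hypothetical, and it only uses the flags mentioned above (`--relative` is the long form of `-L`):

```python
import subprocess


def wget_command(url, depth=5, stay_relative=True):
    """Build a wget argument list for a recursive, locally browsable copy.

    depth is passed as --level (wget's own default is 5); stay_relative
    adds --relative (the long form of -L), which follows relative links only.
    """
    cmd = ["wget", "--recursive", f"--level={depth}", "--convert-links"]
    if stay_relative:
        cmd.append("--relative")
    cmd.append(url)
    return cmd


def mirror(url, depth=5):
    """Run wget in the current directory and return its exit status.

    wget can exit non-zero when only some links fail, so the status is
    returned rather than raised.
    """
    return subprocess.run(wget_command(url, depth)).returncode
```

Building the argument list separately from running it makes it easy to print or log the exact command before a long download.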

Chris

Re: HTML to printable file

Pedro Kroger
In reply to this post by Tony Cappellini-2
Hi Tony,

I think wget and Print Friendly are great suggestions.

I'd add htmldoc [1] and lxml [2], depending on what you want. htmldoc is a nice tool (GUI and command line) for generating PDFs from HTML, and you can use lxml to easily extract the content and remove cruft.

For instance, I don't quite like the way Print Friendly handles images for the Bus Blaster project:

http://dangerousprototypes.com/docs/Bus_Blaster

http://www.printfriendly.com/print/v2?url=http%3A%2F%2Fdangerousprototypes.com%2Fdocs%2FBus_Blaster

For this kind of thing I'd use lxml to extract only the content and remove the things I don't want, such as the table of contents:

https://gist.github.com/2277731
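The gist itself isn't reproduced here, but the general approach (parse with lxml, keep the content element, drop the parts you don't want, then hand the result to htmldoc) might look like the sketch below. The element ids are assumptions based on a typical MediaWiki page layout, not taken from the gist; the function names are hypothetical, and the htmldoc flags shown (`--webpage`, `-f`) are the only ones used:

```python
import lxml.html

# Assumption: a MediaWiki-style page with the article body in
# <div id="bodyContent"> and the table of contents in <div id="toc">.
# Adjust these XPath expressions for other sites.
CONTENT_XPATH = '//div[@id="bodyContent"]'
CRUFT_XPATHS = ['.//div[@id="toc"]', ".//script"]


def clean_page(html):
    """Return standalone HTML containing only the main content."""
    doc = lxml.html.fromstring(html)
    matches = doc.xpath(CONTENT_XPATH)
    content = matches[0] if matches else doc  # fall back to the whole page
    for xpath in CRUFT_XPATHS:
        for element in content.xpath(xpath):
            element.drop_tree()  # removes the element, keeps surrounding text
    body = lxml.html.tostring(content, encoding="unicode")
    return f"<html><body>{body}</body></html>"


def htmldoc_command(html_files, pdf_path):
    # --webpage treats the inputs as plain web pages rather than a
    # structured book; -f names the output file.
    return ["htmldoc", "--webpage", "-f", pdf_path, *html_files]
```

Each cleaned page could be saved to a file and the list of files passed to `subprocess.run(htmldoc_command(files, "docs.pdf"))`, which keeps the content selection under your control rather than the converter's.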

[1] http://www.htmldoc.org/
[2] http://lxml.de/

Cheers,

Pedro

--
http://pedrokroger.net



Re: HTML to printable file

Tony Cappellini-2
Thanks to all who replied.

On Sun, Apr 1, 2012 at 12:03 PM, Pedro Kroger <[hidden email]> wrote:
> I think wget and Print Friendly are great suggestions.