What's the most convenient library for extracting meta tag content from HTML source?



acestrong
Dive Into Python (DIP) introduces SGMLParser from sgmllib. Is there a more convenient library?
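For reference, sgmllib was deprecated in Python 2.6 and removed in Python 3. Its closest standard-library successor is `html.parser.HTMLParser`, which uses the same subclass-and-override pattern as the DIP example. A minimal sketch, assuming you want a dict mapping each meta tag's `name` (or `http-equiv`/`property`) to its `content`; the class name `MetaParser` and the sample HTML are hypothetical:

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect <meta> tags into a dict keyed by name, http-equiv or property."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes lowercased tag/attribute names and attrs as (name, value) pairs.
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("name") or attrs.get("http-equiv") or attrs.get("property")
        content = attrs.get("content")
        if key and content is not None:
            self.meta[key.lower()] = content
        # A self-closing <meta ... /> goes to handle_startendtag, whose default
        # implementation calls handle_starttag, so it is handled here too.


html_source = """<html><head>
<meta name="Keywords" content="python, html">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>demo</title>
</head><body></body></html>"""

parser = MetaParser()
parser.feed(html_source)
parser.close()
print(parser.meta)
# {'keywords': 'python, html', 'content-type': 'text/html; charset=utf-8'}
```

Keys are lowercased because HTML attribute values such as `Keywords` are conventionally case-insensitive; drop the `.lower()` if you need the original spelling.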
--
Best Regards!

Ace Strong

==================================================
Nanjing University of Aeronautics and Astronautics.
College of Civil Aviation
Tao Cheng
E-mail: [hidden email] ;acestrong@nuaa.edu.cn
Tel: 86-025-84892273
==================================================
_______________________________________________
python-chinese
Post: send [hidden email]
Subscribe: send subscribe to [hidden email]
Unsubscribe: send unsubscribe to  [hidden email]
Detail Info: http://python.cn/mailman/listinfo/python-chinese

Re: What's the most convenient library for extracting meta tag content from HTML source?

Jiahua Huang
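For scraping web pages, use Beautiful Soup.

That said, if it's only a few pieces of metadata, writing your own regex works too.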
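For the regex route, a minimal sketch; the pattern and the function name `extract_meta` are illustrative, and the pattern assumes `name` comes before `content` and that values are quoted. Those assumptions are exactly where hand-written regexes tend to break on real-world HTML, which is why a parser is the safer choice once the input varies.

```python
import re

# Illustrative pattern: <meta ... name="..." ... content="...">, name before content,
# values in single or double quotes. It does NOT handle content-before-name,
# unquoted values, or http-equiv tags.
META_RE = re.compile(
    r"""<meta\s+[^>]*?\bname\s*=\s*(["'])(?P<name>.*?)\1[^>]*?\bcontent\s*=\s*(["'])(?P<content>.*?)\3""",
    re.IGNORECASE | re.DOTALL,
)


def extract_meta(html):
    """Return {lowercased name: content} for every meta tag META_RE matches."""
    return {m.group("name").lower(): m.group("content") for m in META_RE.finditer(html)}


sample = (
    '<head><meta name="Description" content="A demo page">'
    '<meta http-equiv="refresh" content="5">'
    "<meta name='keywords' content='python, regex'></head>"
)
print(extract_meta(sample))
# {'description': 'A demo page', 'keywords': 'python, regex'}
```

The backreferences `\1` and `\3` make each value end with the same quote character it started with, so an apostrophe inside a double-quoted value does not cut it short.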

On Jan 18, 2008 7:17 PM, 刀巴虫子 <[hidden email]> wrote:
> Dive Into Python (DIP) introduces SGMLParser from sgmllib. Is there a more convenient library?
>

Re: What's the most convenient library for extracting meta tag content from HTML source?

acestrong
I'm trying out Beautiful Soup now. It's really easy to use, and I've already got the content extracted~~
Thanks~~
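What such an extraction might look like today: a minimal sketch assuming the current beautifulsoup4 package (imported as `bs4`). In 2008 the library was BeautifulSoup 3, imported with `from BeautifulSoup import BeautifulSoup`, so the import line differed. The sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

html_source = """<html><head>
<meta name="description" content="Extracting meta tags">
<meta name="keywords" content="python, beautifulsoup">
<meta charset="utf-8">
</head><body><p>Hello</p></body></html>"""

# "html.parser" selects Python's built-in parser, so no extra parser package is needed.
soup = BeautifulSoup(html_source, "html.parser")

# find_all("meta") returns every <meta> tag; Tag.get() returns None for a missing attribute.
meta = {}
for tag in soup.find_all("meta"):
    key = tag.get("name") or tag.get("http-equiv") or tag.get("property")
    content = tag.get("content")
    if key and content is not None:
        meta[key.lower()] = content
print(meta)
# {'description': 'Extracting meta tags', 'keywords': 'python, beautifulsoup'}

# Looking up a single tag by attribute value:
description = soup.find("meta", attrs={"name": "description"})
print(description["content"] if description is not None else None)
# Extracting meta tags
```

The `<meta charset="utf-8">` tag has no `name`, `http-equiv` or `property`, so the loop skips it rather than storing a `None` key.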

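On 08-1-18, Jiahua Huang <[hidden email]> wrote:
For scraping web pages, use Beautiful Soup.

That said, if it's only a few pieces of metadata, writing your own regex works too.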

On Jan 18, 2008 7:17 PM, 刀巴虫子 <[hidden email]> wrote:
> DIP里介绍的是sgmllib里的SGMLParser,有更加方便的库吗?
>
_______________________________________________
python-chinese
Post: send [hidden email]
Subscribe: send subscribe to [hidden email]
Unsubscribe: send unsubscribe to  [hidden email]
Detail Info: http://python.cn/mailman/listinfo/python-chinese



--
Best Regards!

Ace Strong

==================================================
Nanjing University of Aeronautics and Astronautics.
College of Civil Aviation
Tao Cheng
E-mail: [hidden email] ;acestrong@nuaa.edu.cn
Tel: 86-025-84892273
==================================================
_______________________________________________
python-chinese
Post: send [hidden email]
Subscribe: send subscribe to [hidden email]
Unsubscribe: send unsubscribe to  [hidden email]
Detail Info: http://python.cn/mailman/listinfo/python-chinese
Reply | Threaded
Open this post in threaded view
|

RE: What's the most convenient library for extracting meta tag content from HTML source?

beck917

Nice, nice. I'll give Beautiful Soup a try when I get the chance; I haven't used it yet. :-)

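From: [hidden email] [mailto:[hidden email]] On Behalf Of 刀巴虫子
Sent: January 18, 2008 19:59
To: [hidden email]
Subject: Re: [python-chinese] What's the most convenient library for extracting meta tag content from HTML source?

I'm trying out Beautiful Soup now. It's really easy to use, and I've already got the content extracted~~
Thanks~~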


RE: RE: What's the most convenient library for extracting meta tag content from HTML source?

tairan wang
Pardon me for asking, but what is Beautiful Soup?




From: [hidden email]
To: [hidden email]
Date: Sat, 19 Jan 2008 09:37:31 +0800
Subject: [python-chinese] RE: What's the most convenient library for extracting meta tag content from HTML source?

Nice, nice. I'll give Beautiful Soup a try when I get the chance; I haven't used it yet. :-)

RE: RE: RE: What's the most convenient library for extracting meta tag content from HTML source?

nathan.wu
beautifulsoup, google it


From: [hidden email]
To: [hidden email]
Date: Sun, 20 Jan 2008 05:52:34 +0000
Subject: [python-chinese] RE: RE: What's the most convenient library for extracting meta tag content from HTML source?

Pardon me for asking, but what is Beautiful Soup?

Re: RE: What's the most convenient library for extracting meta tag content from HTML source?

realfun
Is it this one?
http://www.crummy.com/software/BeautifulSoup/

I haven't been able to reach it for the past couple of days, though.

On 08-1-20, cunheise <[hidden email]> wrote:
beautifulsoup google it


--
http://www.2maomao.com/blog

Re: RE: What's the most convenient library for extracting meta tag content from HTML source?

HG-11
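Beautiful Soup really is delicious; I highly recommend everyone give it a taste.
Especially in the dead of winter: one mouthful and parsing HTML becomes far less of a hassle.
You can get to bed early and warm up under the covers!
:P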


2008/1/20 realfun <[hidden email]>:

> Is it this one?
> http://www.crummy.com/software/BeautifulSoup/
>
> I haven't been able to reach it for the past couple of days, though.
--
Personal blog on the LAMP platform, security, and web development: http://hackgou.itbbq.com
PGP KeyID: hackgou#Gmail.com
PGP KeyServ: subkeys.pgp.net