How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

hongqing lv
 
a=open('a.txt').read()
b= a.decode('utf-8').encode('gb2312')
print b
 
The result is:
UnicodeEncodeError: 'gb2312' codec can't encode character u'\ufeff' in position 0: illegal multibyte sequence
How should I do the conversion?

hongqing.lv
2008-01-06

_______________________________________________
python-chinese
Post: send [hidden email]
Subscribe: send subscribe to [hidden email]
Unsubscribe: send unsubscribe to  [hidden email]
Detail Info: http://python.cn/mailman/listinfo/python-chinese

Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

Anthony-106
b = a.decode('utf-8').encode('gbk')

--
---------------------------------------------------
www.douban.com/people/tutuqiang/
---------------------------------------------------

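Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

hongqing lv
gbk fails as well:
UnicodeEncodeError: 'gbk' codec can't encode character u'\ufeff' in position 0: illegal multibyte sequence
To reproduce: create a new text file in Notepad, type a few Chinese characters, and choose UTF-8 when saving.
Save it and run the test. Whatever I try, the conversion fails.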

hongqing.lv
2008-01-06


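Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

jessinio liang
I suspect the file's encoding itself is the problem. I recommend a third-party library:

chardet.feedparser.org, for detecting a file's encoding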
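
chardet is a third-party package, so the sketch below does not use it. As a much narrower, standard-library-only check (Python 3), it only looks for a byte order mark at the start of the data, using the constants in `codecs`. The function name `sniff_bom` is made up for illustration, and data without a BOM is reported as unknown rather than guessed.

```python
import codecs

# (BOM bytes, codec that consumes that BOM when decoding).
# UTF-32 is listed before UTF-16 because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return None


sample = codecs.BOM_UTF8 + "中文".encode("utf-8")
print(sniff_bom(sample))
```

For a file on disk, passing the first few bytes is enough, for example `sniff_bom(open('a.txt', 'rb').read(4))`.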

Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

Anthony-106
You can tell a document's encoding directly with Notepad or UltraEdit :)


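Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

hongqing lv
Here is what I found; you can try it on your own machines.
The problem is Windows Notepad. Yesterday I created a new file in Windows Notepad, typed a few characters, and chose UTF-8 when saving.
Converting that file always fails. A new text file created in UltraEdit and saved as "UTF-8" has the same problem, but saving it as "UTF-8 without BOM"
works.
So it turns out to be a Windows bug. Or could it count as a Python bug?
 
Here is the conversion code. This code works without problems.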
import os,sys
a=open('s.php').read()
b=a.decode('utf-8').encode('gb2312')
open('s1.txt','w').write(b)
print b
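
The failure comes from the byte order mark (bytes EF BB BF) that Notepad writes at the start of a UTF-8 file: the `utf-8` codec decodes it to U+FEFF, which gb2312 and gbk cannot encode. A minimal sketch of one way around it, written for Python 3 (the code in this thread is Python 2): decode with the standard `utf-8-sig` codec, which drops a leading BOM if there is one. The helper name `utf8_to_gb` and its `target` parameter are made up for illustration.

```python
def utf8_to_gb(data, target="gb18030"):
    """Convert UTF-8 bytes (with or without a BOM) to a GB-family encoding."""
    text = data.decode("utf-8-sig")   # strips a leading BOM if present
    return text.encode(target)


# Bytes as Notepad would save them: the BOM followed by UTF-8 text.
notepad_bytes = b"\xef\xbb\xbf" + "中文".encode("utf-8")

# With the BOM stripped, even the small gb2312 set can hold this text.
print(utf8_to_gb(notepad_bytes, "gb2312") == "中文".encode("gb2312"))
```

For a file on disk, read it with `open('a.txt', 'rb')` and write the result with `open('s1.txt', 'wb')` so that no text-mode translation gets in the way.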
 

hongqing.lv
2008-01-07


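Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

Jiahua Huang
Stop struggling with it. What that lousy Windows notepad.exe calls "UTF-8" is not really UTF-8;
even UltraEdit does better.

Or, if you insist on using that lousy Notepad, choose the encoding it calls "unicode",
and in Python use
b = open('a.txt').read().decode('utf16').encode('gb18030')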
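
For context, a Python 3 sketch of what this suggestion relies on: a file saved with Notepad's "Unicode" option is UTF-16 (little-endian, starting with the BOM FF FE), and the `utf-16` codec reads that BOM to choose the byte order and then discards it. The sample bytes are built in code here rather than read from a real file.

```python
import codecs

# What Notepad's "Unicode" option would write for the text "中文":
# the little-endian BOM followed by UTF-16-LE code units.
notepad_unicode = codecs.BOM_UTF16_LE + "中文".encode("utf-16-le")

text = notepad_unicode.decode("utf-16")   # BOM is consumed, not left in the text
converted = text.encode("gb18030")

print(text == "中文")
```

The same call only makes sense if the file really was saved as "Unicode"; applied to bytes that are actually UTF-8 it can raise an error or yield garbage.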




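Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

hongqing lv
Thank you, expert.
b = open('a.txt').read().decode('utf16').encode('gb18030')  # utf16 also raises an error; utf8 works.
On my system (Windows XP), writing it like this produces the "ASCII" (GB-encoded) output:
b = open('a.txt').read().decode('utf8').encode('gb18030')  # the target encoding must not be gb2312
I struggled with this all day yesterday, and it turns out Notepad was the culprit. Unbelievable.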
 

hongqing.lv
2008-01-07


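Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

wisyou
In Python, unless told otherwise, Chinese text is handled internally as unicode (probably UTF-16 in practice). Even if you declare your program's source encoding as UTF-8, Python still treats it as unicode internally. Transcoding is therefore simple. For example, to read a text file and convert it to another encoding:
tmps=file(filename,"rb").read()
converted=tmps.decode("utf-8").encode("gbk")

One caveat: utf-8, gb2312, gb18030, and gbk cover different ranges of Chinese characters, so a conversion cannot always succeed completely. This happens when a character has no equivalent in the target encoding.

gbk and utf-8 are currently the most widely used.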
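
A sketch of what "not converting completely" can look like in practice, in Python 3: by default `encode` raises `UnicodeEncodeError` on the first character the target encoding lacks, and the standard `errors` argument can substitute a placeholder instead. The helper name `convert_lossy` is hypothetical, and a Hangul syllable is used as the sample because gbk has no Hangul.

```python
def convert_lossy(data, source="utf-8", target="gbk"):
    """Transcode bytes, replacing characters the target cannot hold with '?'."""
    return data.decode(source).encode(target, errors="replace")


utf8_bytes = "中文 한".encode("utf-8")

try:
    utf8_bytes.decode("utf-8").encode("gbk")      # strict mode is the default
except UnicodeEncodeError as exc:
    print("strict conversion failed:", exc.reason)

print(convert_lossy(utf8_bytes).decode("gbk"))    # the Hangul syllable becomes '?'
```

Whether a lossy conversion is acceptable depends on the data; the strict default at least makes the loss visible.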



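Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

东子
In reply to this post by hongqing lv
For those using Notepad on Windows, I recommend a replacement editor: Notepad2
URL: http://www.flos-freeware.ch/notepad2.html
Chinese version: http://files.myopera.com/danei_archive/files/Notepad2_2.0.18_chs.zip (the original link cannot be downloaded directly; please use Xunlei)

The site also explains how to replace the Notepad that ships with Windows. I have been using it all along and it works well.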

--
东子 (a beginner, just getting started)

Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

limodou
In reply to this post by wisyou
On Jan 7, 2008 9:33 AM,  <[hidden email]> wrote:

> In Python, unless told otherwise, Chinese text is handled internally as unicode (probably UTF-16 in practice).
> Even if you declare your program's source encoding as UTF-8, Python still treats it as unicode internally.
>
Before 3.0, Python is actually not like Java: its strings are not uniformly unicode internally. That only becomes unified in 3.0.
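
To illustrate the 3.0 behaviour mentioned here, a small Python 3 sketch: text is always `str` (Unicode), encoded data is always `bytes`, and the two only meet through `encode` and `decode`. In Python 2, by contrast, `open(...).read()` returns a byte string, which is why the snippets in this thread call `decode` first.

```python
text = "中文"                    # str: a sequence of Unicode code points
gb_bytes = text.encode("gbk")    # bytes: the GBK-encoded form

print(type(text).__name__, len(text))          # str 2
print(type(gb_bytes).__name__, len(gb_bytes))  # bytes 4

# Mixing the two is an error in Python 3 rather than an implicit conversion.
try:
    text + gb_bytes
except TypeError:
    print("str and bytes do not mix")
```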

--
I like python!
UliPad <<The Python Editor>>: http://code.google.com/p/ulipad/
meide <<wxPython UI module>>: http://code.google.com/p/meide/
My Blog: http://www.donews.net/limodou

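Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

Jiahua Huang
In reply to this post by wisyou
In terms of character coverage, the order is
utf8 > gb18030 > gbk > gb2312

The gb2312 that people habitually use is a very small character set.
People should be told not to use gb2312 indiscriminately; if you need a GB encoding, switch to gb18030.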
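
The ordering can be spot-checked against Python's bundled codecs. A minimal Python 3 sketch, with sample characters chosen for illustration: 中 is in all of these sets, 镕 is in gbk but not in gb2312, and the Hangul syllable 한 is in neither of those two but is covered by gb18030. The helper name `can_encode` is made up.

```python
def can_encode(char, encoding):
    """Return True if the codec can represent the character."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for char in "中镕한":
    supported = [enc for enc in ("gb2312", "gbk", "gb18030", "utf-8")
                 if can_encode(char, enc)]
    print(char, supported)
```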


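Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

Jiahua Huang
In reply to this post by hongqing lv
Or you could say the problem lies with Microsoft, or with you.

GB-encoded Chinese should not use gb2312;
switch to gb18030.

gb2312 is a small character set from decades ago with only a few thousand characters.
It is just that Microsoft, contrary to the standard, maps gb2312 to gb18030, and that misled you.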


Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

Handle Huang
In reply to this post by 东子

Doesn't UE (UltraEdit) take care of all of this?

 

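Re: How do I convert a UTF-8 encoded file to ASCII, that is, gb2312?

yuting cui
In reply to this post by Jiahua Huang
First, the original poster's problem is caused by the BOM.
Second, the encoding Windows uses by default is gbk, in other words cp936.
Third, gb18030 and Unicode 3.0 cover almost the same range.
Fourth, gb18030 is a superset of gbk, so in principle anything in gbk can be opened as gb18030; likewise gbk is a superset of gb2312, so anything in gb2312 can be opened as gbk. Just choose the character set that fits your needs.
Fifth, gb18030 is very inefficient to process (unavoidable, for the sake of backward compatibility).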
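
Points two and four can be spot-checked with Python's bundled codecs. A sketch in Python 3, using a short sample string chosen for illustration; it shows the behaviour for these characters only and is not a proof of the superset relationship.

```python
import codecs

# Python treats cp936 as an alias of its gbk codec.
print(codecs.lookup("cp936").name)

sample = "中文编码"
gb2312_bytes = sample.encode("gb2312")

# Bytes written as gb2312 decode to the same text under the larger sets.
for wider in ("gbk", "gb18030"):
    print(wider, gb2312_bytes.decode(wider) == sample)
```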
