Converting a byte string to a Unicode string

Question

I have code like this:

a = "\u0432"
b = u"\u0432"
c = b"\u0432"
d = c.decode('utf8')
print(type(a), a)
print(type(b), b)
print(type(c), c)
print(type(d), d)

And the output:

<class 'str'> в
<class 'str'> в
<class 'bytes'> b'\\u0432'
<class 'str'> \u0432

Why do I see a character code in the last case instead of the character? How can I convert the byte string to a Unicode string so that the output shows the character rather than its code?

Recommended answer

In strings (or unicode objects in Python 2), \u has a special meaning: it says "here comes a Unicode character specified by its Unicode ID" (its code point). Hence u"\u0432" results in the character в.
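As a quick illustration of the point above, here is a minimal Python 3 sketch. The variable name s is only illustrative and does not come from the question.

```python
# In a str literal, \u0432 is an escape for the code point U+0432.
s = "\u0432"

print(s)              # the single character, not the escape text
print(len(s))         # the escape collapses to one character
print(hex(ord(s)))    # the code point behind that character

# The escape, chr() with the same code point, and the literal letter all agree.
print(s == chr(0x0432) == "в")
```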

The b'' prefix tells you this is a sequence of 8-bit bytes. A bytes object contains no Unicode characters, so the \u code has no special meaning there. Hence b"\u0432" is just the sequence of the six bytes \, u, 0, 4, 3 and 2.
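One way to check this is to look at the individual byte values. The sketch below assumes Python 3. It uses a raw bytes literal (rb"...") only to make the literal backslash explicit; the value is the same as b"\u0432". The name byte_values is illustrative.

```python
# \u is not an escape sequence in a bytes literal, so the backslash stays.
# The raw prefix makes that explicit and avoids an invalid-escape warning.
c = rb"\u0432"

# Each element is the numeric value of one ASCII byte:
# backslash, 'u', '0', '4', '3', '2'
byte_values = list(c)
print(len(c))
print(byte_values)

# UTF-8 decoding maps each ASCII byte to the same character,
# so the result is the six-character text \u0432, not the letter.
d = c.decode("utf8")
print(len(d), d)
```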

Essentially, you have an 8-bit string that contains not a Unicode character, but the specification of a Unicode character.

You can convert this specification by decoding it with the unicode_escape codec:

>>> c.decode('unicode_escape')
'в'
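For comparison, here is a hedged sketch (Python 3, illustrative variable names) showing the two situations side by side: bytes that hold the escape text, and bytes that hold the actual UTF-8 encoding of the character. Each case needs a different decoder.

```python
# Case 1: the bytes hold the escape text \u0432 (six ASCII bytes).
escaped = rb"\u0432"
from_escape = escaped.decode("unicode_escape")

# Case 2: the bytes hold the UTF-8 encoding of the character (two bytes).
encoded = "в".encode("utf8")
from_utf8 = encoded.decode("utf8")

print(len(escaped), len(encoded))
print(from_escape, from_utf8)
print(from_escape == from_utf8)
```

Note that unicode_escape treats any bytes that are not part of an escape as Latin-1, so it is a good fit for pure ASCII escape text like the one here, but probably not for bytes that mix escapes with UTF-8 encoded text.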