Official Concise Python Tutorial 3

From Ubuntu中文 (the Ubuntu Chinese wiki)

Revision by Yq-ysy as of 16:05, Wednesday, 18 May 2011 (3.1.2. Strings)

— 3. An Informal Introduction to Python (translation not yet proofread)

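In the following examples, input and output are distinguished by the presence or absence of prompts (>>> and ...): to repeat an example, you must type everything after the prompt, when the prompt appears; lines that do not begin with a prompt are output from the interpreter. Note that a secondary prompt on a line by itself in an example means you must type a blank line; this is used to end a multi-line command.

Many of the examples in this manual, even those entered at the interactive prompt, include comments. Comments in Python start with the hash character, #, and extend to the end of the physical line. A comment may appear at the start of a line or following whitespace or code, but not within a string literal. A hash character within a string literal is just a hash character. Since comments are there to clarify code and are not interpreted by Python, you may omit them when typing in the examples.

For example: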

# this is the first comment
SPAM = 1                 # and this is the second comment
                         # ... and now a third!
STRING = "# This is not a comment."

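— 3.1. Using Python as a Calculator

Let's try some simple Python commands. Start the interpreter and wait for the primary prompt, >>>. (It shouldn't take long.)

— 3.1.1. Numbers

The interpreter acts as a simple calculator: you type an expression and it writes the value. Expression syntax is straightforward: the operators +, -, * and / work just as in most other languages (for example, Pascal or C), and parentheses can be used for grouping. For example: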

>>> 2+2
4
>>> # This is a comment
... 2+2
4
>>> 2+2  # and a comment on the same line as code
4
>>> (50-5*6)/4
5.0
>>> 8/5 # Fractions aren't lost when dividing integers
1.6

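Note: floating point results may differ from one machine to another. Controlling how floating point output is displayed is covered in a later chapter. See also "Floating Point Arithmetic: Issues and Limitations" for a full discussion of the subtleties of floating point numbers and their representations.

To divide integers and get an integer result, discarding any fractional part, use the // operator. It returns the floor of the quotient, so a negative result is rounded toward negative infinity:

>>> # Integer division returns the floor: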
... 7//3
2
>>> 7//-3
-3
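
As a small sketch of how floor division relates to the remainder, the snippet below uses the % operator and the built-in divmod(), neither of which appears in the text above; the variable names are illustrative only.

```python
# Floor division rounds toward negative infinity, and % gives the matching
# remainder, so (a // b) * b + (a % b) == a holds for integers.
a, b = 7, -3
quotient = a // b        # floor of -2.33...
remainder = a % b        # takes the sign of the divisor
both = divmod(a, b)      # built-in returning (a // b, a % b)

print(quotient, remainder, both)
print(quotient * b + remainder == a)
```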

The equal sign ('=') assigns a value to a variable. After an assignment, no result is displayed before the next interactive prompt:

>>> width = 20
>>> height = 5*9
>>> width * height
900

A value can be assigned to several variables simultaneously:

>>> x = y = z = 0  # Zero x, y and z
>>> x
0
>>> y
0
>>> z
0

Variables must be "defined" (assigned a value) before they can be used, or an error will occur:

>>> # try to access an undefined variable
... n
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'n' is not defined

There is full support for floating point; operators with mixed-type operands convert the integer operand to floating point:

>>> 3 * 3.75 / 1.5
7.5
>>> 7.0 / 2
3.5

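Complex numbers are also supported; imaginary numbers are written with a suffix of j or J. Complex numbers with a nonzero real component are written as (real+imagj), or can be created with the complex(real, imag) function.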

>>> 1j * 1J
(-1+0j)
>>> 1j * complex(0, 1)
(-1+0j)
>>> 3+1j*3
(3+3j)
>>> (3+1j)*3
(9+3j)
>>> (1+2j)/(1+1j)
(1.5+0.5j)

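Complex numbers are always represented as two floating point numbers, the real part and the imaginary part. To extract these parts from a complex number z, use z.real and z.imag.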

>>> a=1.5+0.5j
>>> a.real
1.5
>>> a.imag
0.5

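The conversion functions to floating point and integer (float(), int()) do not work for complex numbers, because there is no single correct way to convert a complex number to a real number. Use abs(z) to get its magnitude (as a float) or z.real to get its real part: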

>>> a=3.0+4.0j
>>> float(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert complex to float; use abs(z)
>>> a.real
3.0
>>> a.imag
4.0
>>> abs(a)  # sqrt(a.real**2 + a.imag**2)
5.0
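
The relationship in the comment above can be checked in a script. This is a minimal sketch that assumes the standard math module; math.hypot() is not mentioned in the text and is used here only to compute the same magnitude by hand.

```python
import math

z = 3.0 + 4.0j
# abs() of a complex number is its magnitude, i.e. the Euclidean length
# of the (real, imag) pair.
magnitude = abs(z)
by_hand = math.hypot(z.real, z.imag)

print(magnitude, by_hand)

# float(z) raises TypeError; pick the part you actually want instead.
try:
    float(z)
except TypeError as exc:
    print("TypeError:", exc)
```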

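In interactive mode, the last printed expression is assigned to the variable _. This makes it somewhat easier to continue calculations when using Python as a desk calculator, for example: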

>>> tax = 12.5 / 100
>>> price = 100.50
>>> price * tax
12.5625
>>> price + _
113.0625
>>> round(_, 2)
113.06

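This variable should be treated as read-only by the user. Do not explicitly assign a value to it: doing so creates an independent local variable with the same name, masking the built-in variable and its magic behavior.

— 3.1.2. Strings

Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes or double quotes: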

>>> 'spam eggs'
'spam eggs'
>>> 'doesn\'t'
"doesn't"
>>> "doesn't"
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'

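The interpreter prints the result of string operations in the same way as they are typed for input: inside quotes, and with quotes and other special characters escaped by backslashes, to show the precise value. The string is enclosed in double quotes if it contains a single quote and no double quotes; otherwise it is enclosed in single quotes. The print() function produces more readable output for such strings.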
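
Outside the interactive prompt, the difference between the echoed form and the printed form can be reproduced with the built-in repr(). The following is a minimal sketch; the variable names s and t are illustrative.

```python
s = 'doesn\'t'
t = '"Yes," he said.'

# repr() is what the interactive interpreter echoes: quotes included,
# with the quote style chosen to avoid escaping where possible.
print(repr(s))
print(repr(t))

# print() writes the characters of the string itself.
print(s)
print(t)
```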

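String literals can span multiple lines in several ways. Continuation lines can be used, with a backslash as the last character on the line indicating that the next line is a logical continuation of the line: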

hello = "This is a rather long string containing\n\
several lines of text just as you would do in C.\n\
    Note that whitespace at the beginning of the line is\
 significant."

print(hello)

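Note that newlines still need to be embedded in the string using \n; the newline following the trailing backslash is discarded. This example would print the following: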

This is a rather long string containing
several lines of text just as you would do in C.
    Note that whitespace at the beginning of the line is significant.

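Alternatively, strings can be surrounded by a pair of matching triple quotes: """ or '''. Ends of lines do not need to be escaped inside triple quotes, but they are included in the string. The following example therefore uses one backslash escape to avoid an unwanted blank line at the start of the output.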

print("""\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""")

This produces the following output:

Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to

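If we make the string literal a "raw" string (by prefixing it with r), \n sequences are not converted to newlines; instead, the backslash at the end of the line and the newline character in the source are both included in the string as data. Thus, the example: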

hello = r"This is a rather long string containing\n\
several lines of text much as you would do in C."

print(hello)

would print:

This is a rather long string containing\n\
several lines of text much as you would do in C.
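
One way to see what a raw literal actually contains is to compare lengths. This is a small sketch with made-up variable names (cooked, raw):

```python
cooked = "a\nb"    # \n is an escape sequence: one newline character
raw = r"a\nb"      # raw literal: a backslash followed by the letter n

print(len(cooked), len(raw))

# The raw literal is the same string as one with an escaped backslash.
print(raw == "a\\nb")
```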

Strings can be concatenated (glued together) with the + operator, and repeated with *:

>>> word = 'Help' + 'A'
>>> word
'HelpA'
>>> '<' + word*5 + '>'
'<HelpAHelpAHelpAHelpAHelpA>'

Two string literals next to each other are automatically concatenated; the first line above could also have been written word = 'Help' 'A'. This only works with two literals, not with arbitrary string expressions:

>>> 'str' 'ing'                   #  <-  This is ok
'string'
>>> 'str'.strip() + 'ing'   #  <-  This is ok
'string'
>>> 'str'.strip() 'ing'     #  <-  This is invalid
  File "<stdin>", line 1
    'str'.strip() 'ing'
                      ^
SyntaxError: invalid syntax
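
A common use of literal concatenation is breaking a long string across source lines inside parentheses. The sketch below is illustrative; the message text and variable names are invented for the example.

```python
# Adjacent literals are joined at compile time, so parentheses let a long
# message be split across source lines without + or backslashes.
message = ('Put several strings within parentheses '
           'to have them joined together.')

# Anything that is not a literal needs an explicit + instead.
prefix = 'str'
joined = prefix.strip() + 'ing'

print(message)
print(joined)
```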

Strings can be subscripted (indexed); like in C, the first character of a string has subscript (index) 0. There is no separate character type; a character is simply a string of size one. As in the Icon programming language, substrings can be specified with the slice notation: two indices separated by a colon.

>>> word[4]
'A'
>>> word[0:2]
'He'
>>> word[2:4]
'lp'

Slice indices have useful defaults; an omitted first index defaults to zero, an omitted second index defaults to the size of the string being sliced.

>>> word[:2]    # The first two characters
'He'
>>> word[2:]    # Everything except the first two characters
'lpA'

Unlike a C string, Python strings cannot be changed. Assigning to an indexed position in the string results in an error:

>>> word[0] = 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> word[:1] = 'Splat'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

However, creating a new string with the combined content is easy and efficient:

>>> 'x' + word[1:]
'xelpA'
>>> 'Splat' + word[4]
'SplatA'
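
The same idea can be wrapped in a function. replace_at below is a hypothetical helper written for this illustration, not a built-in or a string method, and it assumes a non-negative index.

```python
def replace_at(s, index, replacement):
    """Return a copy of s with the character at index swapped out.

    Hypothetical helper for illustration; assumes index >= 0.
    """
    return s[:index] + replacement + s[index + 1:]

word = 'HelpA'
changed = replace_at(word, 0, 'x')

print(changed)   # the new string
print(word)      # the original is untouched
```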

Here’s a useful invariant of slice operations: s[:i] + s[i:] equals s.

>>> word[:2] + word[2:]
'HelpA'
>>> word[:3] + word[3:]
'HelpA'

Degenerate slice indices are handled gracefully: an index that is too large is replaced by the string size, an upper bound smaller than the lower bound returns an empty string.

>>> word[1:100]
'elpA'
>>> word[10:]
''
>>> word[2:1]
''
Indices may be negative numbers, to start counting from the right. For example:

>>> word[-1]     # The last character
'A'
>>> word[-2]     # The last-but-one character
'p'
>>> word[-2:]    # The last two characters
'pA'
>>> word[:-2]    # Everything except the last two characters
'Hel'

But note that -0 is really the same as 0, so it does not count from the right!

>>> word[-0]     # (since -0 equals 0)
'H'

Out-of-range negative slice indices are truncated, but don’t try this for single-element (non-slice) indices:

>>> word[-100:]
'HelpA'
>>> word[-10]    # error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
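
The contrast between forgiving slices and strict single indices can be sketched in a script as follows. char_at is a hypothetical helper introduced only for this example.

```python
word = 'HelpA'

# Slices clamp out-of-range bounds instead of failing.
print(repr(word[1:100]))
print(repr(word[10:]))
print(repr(word[-100:]))

# A single index is checked strictly, so guard it if it may be out of range.
def char_at(s, index, default=''):
    """Hypothetical helper: return s[index], or default if out of range."""
    try:
        return s[index]
    except IndexError:
        return default

print(repr(char_at(word, -10)))
print(repr(char_at(word, -1)))
```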

One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:

 +---+---+---+---+---+
 | H | e | l | p | A |
 +---+---+---+---+---+
 0   1   2   3   4   5
-5  -4  -3  -2  -1

The first row of numbers gives the position of the indices 0...5 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.
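
For indices that may be negative or out of bounds, the length can still be worked out without building the slice. The sketch below relies on the slice.indices() method of the built-in slice type, which the text above does not cover; slice_length is a hypothetical helper name.

```python
def slice_length(s, i, j):
    """Length of s[i:j], computed from the clamped indices alone."""
    start, stop, step = slice(i, j).indices(len(s))
    return len(range(start, stop, step))

word = 'HelpA'
for i, j in [(1, 3), (1, 100), (2, 1), (-2, None)]:
    # Cross-check against the length of the real slice.
    assert slice_length(word, i, j) == len(word[i:j])
    print((i, j), slice_length(word, i, j))
```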

The built-in function len() returns the length of a string:

>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34

See also

Sequence Types — str, bytes, bytearray, list, tuple, range

   Strings are examples of sequence types, and support the common operations supported by such types.

String Methods

   Strings support a large number of methods for basic transformations and searching.

String Formatting

   Information about string formatting with str.format() is described here.

Old String Formatting Operations

   The old formatting operations invoked when strings and Unicode strings are the left operand of the % operator are described in more detail here.

— 3.1.3. About Unicode

Starting with Python 3.0 all strings support Unicode (see http://www.unicode.org/).

Unicode has the advantage of providing one ordinal for every character in every script used in modern and ancient texts. Previously, there were only 256 possible ordinals for script characters. Texts were typically bound to a code page which mapped the ordinals to script characters. This led to a great deal of confusion, especially with respect to internationalization of software (usually written as i18n: 'i' + 18 characters + 'n'). Unicode solves these problems by defining one code page for all scripts.

If you want to include special characters in a string, you can do so by using the Python Unicode-Escape encoding. The following example shows how:

>>> 'Hello\u0020World !'
'Hello World !'

The escape sequence \u0020 inserts the Unicode character with the ordinal value 0x0020 (the space character) at the given position.

Other characters are interpreted by using their respective ordinal values directly as Unicode ordinals. If you have literal strings in the standard Latin-1 encoding that is used in many Western countries, you will find it convenient that the lower 256 characters of Unicode are the same as the 256 characters of Latin-1.

Apart from these standard encodings, Python provides a whole set of other ways of creating Unicode strings on the basis of a known encoding.

To convert a string into a sequence of bytes using a specific encoding, string objects provide an encode() method that takes one argument, the name of the encoding. Lowercase names for encodings are preferred.

>>> "Äpfel".encode('utf-8')
b'\xc3\x84pfel'
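
The reverse direction uses the decode() method of bytes objects. The following sketch round-trips the same word and, as an extra illustration not taken from the text, encodes it as Latin-1 for comparison.

```python
text = "Äpfel"
data = text.encode('utf-8')      # str -> bytes

print(data)
print(len(text), len(data))      # characters vs. bytes

# bytes.decode() reverses the conversion when given the same encoding.
print(data.decode('utf-8') == text)

# The same text in Latin-1 uses one byte per character.
print(text.encode('latin-1'))
```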

— 3.1.4. Lists

Python knows a number of compound data types, used to group together other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. List items need not all have the same type.

>>> a = ['spam', 'eggs', 100, 1234]
>>> a
['spam', 'eggs', 100, 1234]

Like string indices, list indices start at 0, and lists can be sliced, concatenated and so on:

>>> a[0]
'spam'
>>> a[3]
1234
>>> a[-2]
100
>>> a[1:-1]
['eggs', 100]
>>> a[:2] + ['bacon', 2*2]
['spam', 'eggs', 'bacon', 4]
>>> 3*a[:3] + ['Boo!']
['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boo!']

All slice operations return a new list containing the requested elements. This means that the following slice returns a shallow copy of the list a:

>>> a[:]
['spam', 'eggs', 100, 1234]
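
What "shallow" means in practice can be sketched as follows; the names b, inner, outer and clone are illustrative.

```python
a = ['spam', 'eggs', 100, 1234]
b = a[:]            # new list object with the same items

b[0] = 'ham'        # rebinding a slot in the copy...
print(a[0], b[0])   # ...does not affect the original

# "Shallow" means nested objects are shared rather than duplicated.
inner = [1, 2]
outer = [inner, 3]
clone = outer[:]
clone[0].append(99)
print(outer)
```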

Unlike strings, which are immutable, it is possible to change individual elements of a list:

>>> a
['spam', 'eggs', 100, 1234]
>>> a[2] = a[2] + 23
>>> a
['spam', 'eggs', 123, 1234]

Assignment to slices is also possible, and this can even change the size of the list or clear it entirely:

>>> # Replace some items:
... a[0:2] = [1, 12]
>>> a
[1, 12, 123, 1234]
>>> # Remove some:
... a[0:2] = []
>>> a
[123, 1234]
>>> # Insert some:
... a[1:1] = ['bletch', 'xyzzy']
>>> a
[123, 'bletch', 'xyzzy', 1234]
>>> # Insert (a copy of) itself at the beginning
... a[:0] = a
>>> a
[123, 'bletch', 'xyzzy', 1234, 123, 'bletch', 'xyzzy', 1234]
>>> # Clear the list: replace all items with an empty list
... a[:] = []
>>> a
[]

The built-in function len() also applies to lists:

>>> a = ['a', 'b', 'c', 'd']
>>> len(a)
4

It is possible to nest lists (create lists containing other lists), for example:

>>> q = [2, 3]
>>> p = [1, q, 4]
>>> len(p)
3
>>> p[1]
[2, 3]
>>> p[1][0]
2

You can add something to the end of the list:

>>> p[1].append('xtra')
>>> p
[1, [2, 3, 'xtra'], 4]
>>> q
[2, 3, 'xtra']

Note that in the last example, p[1] and q really refer to the same object! We’ll come back to object semantics later.
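
The sharing can be confirmed with the is operator, and avoided by storing a copy. This is a minimal sketch; p2 is a name invented for the example.

```python
q = [2, 3]
p = [1, q, 4]

print(p[1] is q)        # same object, not merely equal

# Storing a copy instead keeps later changes to q out of p2.
p2 = [1, list(q), 4]
q.append('xtra')

print(p)
print(p2)
```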

— 3.2. First Steps Towards Programming

Of course, we can use Python for more complicated tasks than adding two and two together. For instance, we can write an initial sub-sequence of the Fibonacci series as follows:

>>> # Fibonacci series:
... # the sum of two elements defines the next
... a, b = 0, 1
>>> while b < 10:
...     print(b)
...     a, b = b, a+b
...
1
1
2
3
5
8

This example introduces several new features.

  • The first line contains a multiple assignment: the variables a and b simultaneously get the new values 0 and 1. On the last line this is used again, demonstrating that the expressions on the right-hand side are all evaluated first before any of the assignments take place. The right-hand side expressions are evaluated from the left to the right.
  • The while loop executes as long as the condition (here: b < 10) remains true. In Python, like in C, any non-zero integer value is true; zero is false. The condition may also be a string or list value, in fact any sequence; anything with a non-zero length is true, empty sequences are false. The test used in the example is a simple comparison. The standard comparison operators are written the same as in C: < (less than), > (greater than), == (equal to), <= (less than or equal to), >= (greater than or equal to) and != (not equal to).
  • The body of the loop is indented: indentation is Python’s way of grouping statements. Python does not (yet!) provide an intelligent input line editing facility, so you have to type a tab or space(s) for each indented line. In practice you will prepare more complicated input for Python with a text editor; most text editors have an auto-indent facility. When a compound statement is entered interactively, it must be followed by a blank line to indicate completion (since the parser cannot guess when you have typed the last line). Note that each line within a basic block must be indented by the same amount.
  • The print() function writes the value of the expression(s) it is given. It differs from just writing the expression you want to write (as we did earlier in the calculator examples) in the way it handles multiple expressions, floating point quantities, and strings. Strings are printed without quotes, and a space is inserted between items, so you can format things nicely, like this:
>>> i = 256*256
>>> print('The value of i is', i)
The value of i is 65536
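
Two of the points in the list above, evaluation order in multiple assignment and truth testing, can be tried out in a short script. This is only a sketch; the sample values are arbitrary.

```python
# Right-hand sides are evaluated before any name is rebound,
# so a swap needs no temporary variable.
a, b = 1, 2
a, b = b, a
print(a, b)

# Truth testing: zero and empty sequences are false, everything else here is true.
for value in (0, 1, -1, '', 'x', [], [0]):
    print(repr(value), bool(value))
```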

The keyword argument end can be used to avoid the newline after the output, or to end the output with a different string:

>>> a, b = 0, 1
>>> while b < 1000:
...     print(b, end=',')
...     a, b = b, a+b
...
1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,
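
The loop above leaves a trailing comma. The sketch below reproduces that output in a string and shows one possible way around it with str.join(). fib_line is a hypothetical helper, and the file= argument of print() and io.StringIO are used here only to capture the output.

```python
import io

def fib_line(limit):
    """Hypothetical helper: Fibonacci numbers below limit, comma-separated."""
    buffer = io.StringIO()
    a, b = 0, 1
    while b < limit:
        # file= redirects print(); end= replaces the trailing newline.
        print(b, end=',', file=buffer)
        a, b = b, a + b
    return buffer.getvalue()

print(fib_line(10))

# str.join avoids the trailing comma left by end=','.
numbers = [1, 1, 2, 3, 5, 8]
print(','.join(str(n) for n in numbers))
```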