Part 1
Article source: sdiehl.github.io/gevent-tuto…
Written by the Gevent Community
gevent is a concurrency library based around libev. It provides a clean API for a variety of concurrency and network-related tasks.
Introduction
The structure of this tutorial assumes an intermediate-level knowledge of Python but not much else. No knowledge of concurrency is expected. The goal is to give you the tools you need to get going with gevent, help you tame your existing concurrency problems and start writing asynchronous applications today.
Core
Greenlets
The primary pattern used in gevent is the Greenlet, a lightweight coroutine provided to Python as a C extension module. Greenlets all run inside the OS process of the main program but are scheduled cooperatively. This differs from the parallelism constructs provided by the multiprocessing and threading libraries, which spin up OS processes and POSIX threads that are scheduled by the operating system.
Synchronous & Asynchronous Execution
The core idea of concurrency is that a larger task can be broken down into a collection of subtasks which do not depend on one another and can therefore be run asynchronously instead of one at a time, synchronously. A switch between two such executions is known as a context switch.
A context switch in gevent is done through yielding. In this example we have two contexts which yield to each other by invoking gevent.sleep(0).
import gevent

def foo():
    print('Running in foo')
    gevent.sleep(0)  # yield to the other greenlet
    print('Explicit context switch to foo again')

def bar():
    print('Explicit context to bar')
    gevent.sleep(0)  # yield back to foo
    print('Implicit context switch back to bar')

# Spawn both greenlets and block until they have finished.
gevent.joinall([
    gevent.spawn(foo),
    gevent.spawn(bar),
])

"""
Running in foo
Explicit context to bar
Explicit context switch to foo again
Implicit context switch back to bar
"""
It is illuminating to visualize the control flow of the program, or to walk through it with a debugger, to see the context switches as they occur.
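To make the idea of cooperative yielding concrete without gevent, here is a minimal sketch using plain Python generators. It is only an analogy and not how gevent is implemented: the scheduler `run_round_robin` and the `log` list are hypothetical names introduced for illustration, and each bare `yield` plays the role that gevent.sleep(0) plays in the example above.

```python
from collections import deque

def foo(log):
    log.append('Running in foo')
    yield  # hand control back to the scheduler, like gevent.sleep(0)
    log.append('Explicit context switch to foo again')

def bar(log):
    log.append('Explicit context to bar')
    yield  # hand control back to the scheduler
    log.append('Implicit context switch back to bar')

def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)        # run the task up to its next yield
        except StopIteration:
            continue          # task finished, do not reschedule it
        queue.append(task)    # task yielded, put it at the back of the queue

log = []
run_round_robin([foo(log), bar(log)])
for line in log:
    print(line)
```

The printed lines appear in the same order as in the gevent example, because each task runs only until it voluntarily gives up control.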
The real power of gevent comes when we use it for network- and IO-bound functions which can be cooperatively scheduled. Gevent takes care of all the details to ensure that your network libraries implicitly yield their greenlet contexts whenever possible. I cannot stress enough what a powerful idiom this is. But maybe an example will illustrate.
In this example, the select() function is normally a blocking call that polls on various file descriptors.
import time
import gevent
from gevent import select

start = time.time()
tic = lambda: 'at %1.1f seconds' % (time.time() - start)

def gr1():
    # Polls for up to two seconds, but we don't want to stick around...
    print('Started Polling:', tic())
    select.select([], [], [], 2)
    print('Ended Polling:', tic())

def gr2():
    # Polls for up to two seconds, but we don't want to stick around...
    print('Started Polling:', tic())
    select.select([], [], [], 2)
    print('Ended Polling:', tic())

def gr3():
    # Runs while gr1 and gr2 are waiting inside select().
    print('Hey lets do some stuff while the greenlets poll,', tic())
    gevent.sleep(1)

gevent.joinall([
    gevent.spawn(gr1),
    gevent.spawn(gr2),
    gevent.spawn(gr3),
])

"""
Started Polling: at 0.0 seconds
Started Polling: at 0.0 seconds
Hey lets do some stuff while the greenlets poll, at 0.0 seconds
Ended Polling: at 2.0 seconds
Ended Polling: at 2.0 seconds
"""
A somewhat synthetic example defines a task function which is non-deterministic (i.e. it is not guaranteed to behave the same way for the same inputs). In this case the side effect of running the function is that the task pauses its execution for a random interval, here up to two milliseconds.
import gevent
import random

def task(pid):
    """
    Some non-deterministic task
    """
    # Pause for 0, 1 or 2 milliseconds, chosen at random.
    gevent.sleep(random.randint(0, 2) * 0.001)
    print('Task', pid, 'done')

def synchronous():
    # Run tasks 1..9 one after another.
    for i in range(1, 10):
        task(i)

def asynchronous():
    # Spawn tasks 0..9 as greenlets and wait for all of them.
    threads = [gevent.spawn(task, i) for i in range(10)]
    gevent.joinall(threads)

print('Synchronous:')
synchronous()

print('Asynchronous:')
asynchronous()
# Output
Synchronous:
Task 1 done
Task 2 done
Task 3 done
Task 4 done
Task 5 done
Task 6 done
Task 7 done
Task 8 done
Task 9 done
Asynchronous:
Task 1 done
Task 6 done
Task 5 done
Task 0 done
Task 9 done
Task 8 done
Task 7 done
Task 4 done
Task 3 done
Task 2 done
In the synchronous case all the tasks are run sequentially, which results in the main program blocking (i.e. pausing the execution of the main program) while each task executes.
The important parts of the program are gevent.spawn, which wraps the given function inside a Greenlet thread, and gevent.joinall. The list of initialized greenlets is stored in the array threads, which is passed to the gevent.joinall function. This blocks the current program in order to run all the given greenlets. Execution steps forward only when all the greenlets have terminated.
The important fact to notice is that the order of execution in the async case is essentially random and that the total execution time in the async case is much less than in the sync case. The synchronous total is the sum of all the pauses, while the async total is roughly the longest single pause, since none of the tasks blocks the execution of the others. If, for instance, each of ten tasks paused for 2 seconds, the synchronous case would take 20 seconds for the whole queue and the async case roughly 2 seconds. The example above pauses for at most 2 milliseconds per task, so the same ratio holds at a smaller scale.
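The timing argument can be checked with a little arithmetic. The sketch below is an idealized model that ignores scheduling overhead, and `simulate` is a hypothetical helper name: it treats the synchronous runtime as the sum of the pauses and the asynchronous runtime as the longest pause.

```python
import random

def simulate(pauses):
    """Return (synchronous_total, asynchronous_total) for a list of pauses."""
    synchronous_total = sum(pauses)    # tasks run one after another
    asynchronous_total = max(pauses)   # tasks wait concurrently
    return synchronous_total, asynchronous_total

# Worst case: ten tasks that each pause for 2 time units.
worst_sync, worst_async = simulate([2] * 10)
print(worst_sync, worst_async)

# A random draw in the same range as the example's random.randint(0, 2).
pauses = [random.randint(0, 2) for _ in range(10)]
sync_total, async_total = simulate(pauses)
print(sync_total >= async_total)
```

Whatever the random draw, the sum of the pauses can never be smaller than the longest pause, which is why the asynchronous run is never slower in this model.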
In a more common use case, fetching data from a server asynchronously, the runtime of fetch() will differ between requests depending on the load on the remote server at the time of the request.
import gevent.monkey
gevent.monkey.patch_socket()  # patch sockets before they are used

import gevent
import json
from urllib.request import urlopen

def fetch(pid):
    response = urlopen('http://json-time.appspot.com/time.json')
    result = response.read()
    json_result = json.loads(result)
    datetime = json_result['datetime']
    print('Process', pid, datetime)
    return datetime

def synchronous():
    # Fetch one response at a time.
    for i in range(1, 10):
        fetch(i)

def asynchronous():
    # Start all requests, then wait for every greenlet to finish.
    threads = []
    for i in range(1, 10):
        threads.append(gevent.spawn(fetch, i))
    gevent.joinall(threads)

print('Synchronous:')
synchronous()

print('Asynchronous:')
asynchronous()
Determinism
As mentioned previously, greenlets are deterministic. Given the same inputs, they always produce the same output. For example, let’s spread a task across a multiprocessing pool and compare it to a gevent pool.
import time

def echo(i):
    time.sleep(0.001)
    return i

# Non Deterministic Process Pool
from multiprocessing.pool import Pool

p = Pool(10)
run1 = [a for a in p.imap_unordered(echo, range(10))]
run2 = [a for a in p.imap_unordered(echo, range(10))]
run3 = [a for a in p.imap_unordered(echo, range(10))]
run4 = [a for a in p.imap_unordered(echo, range(10))]

print(run1 == run2 == run3 == run4)

# Deterministic Gevent Pool
from gevent.pool import Pool

p = Pool(10)
run1 = [a for a in p.imap_unordered(echo, range(10))]
run2 = [a for a in p.imap_unordered(echo, range(10))]
run3 = [a for a in p.imap_unordered(echo, range(10))]
run4 = [a for a in p.imap_unordered(echo, range(10))]

print(run1 == run2 == run3 == run4)

"""
False
True
"""
Even though gevent is normally deterministic, sources of non-determinism can creep into your program when you begin to interact with outside services such as sockets and files. Thus even though green threads are a form of “deterministic concurrency”, they can still experience some of the same problems that POSIX threads and processes experience.
The perennial problem involved with concurrency is known as a race condition. Simply put, a race condition occurs when two concurrent threads or processes depend on some shared resource but also attempt to modify it. The resulting value of the resource then depends on the order of execution. This is a problem, and in general one should try very hard to avoid race conditions, since they result in program behavior which is globally non-deterministic.
The best approach is simply to avoid all global state at all times. Global state and import-time side effects will always come back to bite you!
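A race condition of this kind can be reproduced deterministically with plain generators. The sketch below is an illustration only and does not use gevent: `increment`, `run_interleaved` and the `shared` dictionary are hypothetical names, and the bare `yield` stands in for any point where a task gives up control between reading and writing shared state.

```python
def increment(state):
    current = state['counter']        # read the shared value
    yield                             # another task may run here
    state['counter'] = current + 1    # write back a possibly stale value

def run_interleaved(tasks):
    """Advance every task one step at a time until all are finished."""
    pending = list(tasks)
    while pending:
        still_running = []
        for task in pending:
            try:
                next(task)
                still_running.append(task)
            except StopIteration:
                pass
        pending = still_running

# Interleaved: both tasks read 0 before either one writes.
shared = {'counter': 0}
run_interleaved([increment(shared), increment(shared)])
print(shared['counter'])

# Sequential: the second task only starts after the first has written.
sequential = {'counter': 0}
for task in (increment(sequential), increment(sequential)):
    for _ in task:
        pass
print(sequential['counter'])
```

The interleaved run loses one update, while the sequential run counts both. The final value depends purely on the execution order, which is exactly what makes shared mutable state hazardous.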
Spawning Threads
gevent provides a few wrappers around Greenlet initialization. Some of the most common patterns are:
import gevent
from gevent import Greenlet

def foo(message, n):
    """
    Each thread will be passed the message, and n arguments
    in its initialization.
    """
    gevent.sleep(n)
    print(message)

# Initialize a new Greenlet instance running the named function
# foo
thread1 = Greenlet.spawn(foo, "Hello", 1)

# Wrapper for creating and running a new Greenlet from the named
# function foo, with the passed arguments
thread2 = gevent.spawn(foo, "I live!", 2)

# Lambda expressions
thread3 = gevent.spawn(lambda x: (x+1), 2)

threads = [thread1, thread2, thread3]

# Block until all threads complete.
gevent.joinall(threads)

"""
Hello
I live!
"""
In addition to using the base Greenlet class, you may also subclass Greenlet and override the _run method.
import gevent
from gevent import Greenlet

class MyGreenlet(Greenlet):

    def __init__(self, message, n):
        Greenlet.__init__(self)
        self.message = message
        self.n = n

    def _run(self):
        # Called when the greenlet is started.
        print(self.message)
        gevent.sleep(self.n)

g = MyGreenlet("Hi there!", 3)
g.start()
g.join()

"""
Hi there!
"""
Greenlet State
Like any other segment of code, Greenlets can fail in various ways. A greenlet may throw an exception, fail to halt, or consume too many system resources.
The internal state of a greenlet is generally a time-dependent parameter. There are a number of flags on greenlets which let you monitor the state of the thread:
started: Boolean, indicates whether the Greenlet has been started.
ready(): Boolean, indicates whether the Greenlet has halted.
successful(): Boolean, indicates whether the Greenlet has halted and not thrown an exception.
value: arbitrary, the value returned by the Greenlet.
exception: exception, the uncaught exception instance thrown inside the greenlet.
import gevent

def win():
    return 'You win!'

def fail():
    raise Exception('You fail at failing.')

winner = gevent.spawn(win)
loser = gevent.spawn(fail)

print(winner.started)  # True
print(loser.started)   # True

# Exceptions raised in the Greenlet, stay inside the Greenlet.
try:
    gevent.joinall([winner, loser])
except Exception as e:
    print('This will never be reached')

print(winner.value)  # 'You win!'
print(loser.value)   # None

print(winner.ready())  # True
print(loser.ready())   # True

print(winner.successful())  # True
print(loser.successful())   # False

# The exception raised in fail will not propagate outside the
# greenlet. A stack trace will be printed to stderr but it
# will not unwind the stack of the parent.
print(loser.exception)

# It is possible though to raise the exception again outside
# raise loser.exception
# or with
# loser.get()

"""
True
True
You win!
None
True
True
True
False
You fail at failing.
"""
Program Shutdown
Greenlets that fail to yield when the main program receives a SIGQUIT may hold the program’s execution longer than expected. This results in so-called “zombie processes” which need to be killed from outside of the Python interpreter.
A common pattern is to listen for SIGQUIT events in the main program and to invoke gevent.shutdown before exit.
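import gevent
import signal

def run_forever():
    gevent.sleep(1000)

if __name__ == '__main__':
    # Call gevent.shutdown when SIGQUIT arrives.
    # gevent.signal and gevent.shutdown are the names used by the gevent
    # release this tutorial was written against.
    gevent.signal(signal.SIGQUIT, gevent.shutdown)
    thread = gevent.spawn(run_forever)
    thread.join()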
Timeouts
Timeouts are a constraint on the runtime of a block of code or a Greenlet.
import gevent
from gevent import Timeout

seconds = 10

# Start a timer that raises Timeout in the current greenlet.
timeout = Timeout(seconds)
timeout.start()

def wait():
    gevent.sleep(10)

try:
    gevent.spawn(wait).join()
except Timeout:
    print('Could not complete')
Timeouts can also be used as a context manager in a with statement.
import gevent
from gevent import Timeout

time_to_wait = 5  # seconds

class TooLong(Exception):
    pass

# Raises TooLong if the block runs longer than time_to_wait seconds.
with Timeout(time_to_wait, TooLong):
    gevent.sleep(10)
In addition, gevent also provides timeout arguments for a variety of Greenlet and data structure related calls. For example:
import gevent
from gevent import Timeout

def wait():
    gevent.sleep(2)

timer = Timeout(1).start()
thread1 = gevent.spawn(wait)

try:
    thread1.join(timeout=timer)
except Timeout:
    print('Thread 1 timed out')

# --

timer = Timeout.start_new(1)
thread2 = gevent.spawn(wait)

try:
    thread2.get(timeout=timer)
except Timeout:
    print('Thread 2 timed out')

# --

try:
    gevent.with_timeout(1, wait)
except Timeout:
    print('Thread 3 timed out')

"""
Thread 1 timed out
Thread 2 timed out
Thread 3 timed out
"""
Monkeypatching
Alas, we come to the dark corners of gevent. I’ve avoided mentioning monkey patching up until now in order to motivate the powerful coroutine patterns, but the time has come to discuss the dark arts of monkey patching. You may have noticed that above we invoked the command monkey.patch_socket(). This is a purely side-effectful command that modifies the standard library’s socket library.
import socket
print(socket.socket)

# Replace the standard socket class with gevent's cooperative one.
from gevent import monkey
monkey.patch_socket()
print("After monkey patch")
print(socket.socket)

import select
print(select.select)

# Replace the standard select() with gevent's cooperative one.
monkey.patch_select()
print("After monkey patch")
print(select.select)

"""
class 'socket.socket'
After monkey patch
class 'gevent.socket.socket'
built-in function select
After monkey patch
function select at 0x1924de8
"""
Python’s runtime allows most objects to be modified at runtime, including modules, classes, and even functions. This is generally an astoundingly bad idea, since it creates an “implicit side effect” that is most often extremely difficult to debug if problems occur. Nevertheless, in extreme situations where a library needs to alter the fundamental behavior of Python itself, monkey patches can be used. In this case gevent is capable of patching most of the blocking system calls in the standard library, including those in the socket, ssl, threading and select modules, so that they behave cooperatively instead.
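The mechanism itself can be shown with the standard library alone. The sketch below is not what gevent does internally; it only demonstrates that a module attribute can be rebound at runtime and that every caller then sees the replacement. The names `fake_sleep`, `calls` and `original_sleep` are hypothetical and exist only for this illustration.

```python
import time

calls = []

def fake_sleep(seconds):
    """Hypothetical replacement that records the request instead of blocking."""
    calls.append(seconds)

original_sleep = time.sleep
print(time.sleep is original_sleep)

time.sleep = fake_sleep              # monkey patch: rebind the module attribute
try:
    time.sleep(5)                    # looks up time.sleep, so fake_sleep runs
finally:
    time.sleep = original_sleep      # always restore the original

print(calls)
print(time.sleep is original_sleep)
```

Restoring the original in a finally block keeps the side effect contained, which is the main discipline monkey patching demands. Code that had already imported the function by name (from time import sleep) before the patch would not see the replacement, which is one reason patches are usually applied as early as possible.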
For example, the Redis Python bindings normally use regular TCP sockets to communicate with the redis-server instance. Simply by invoking gevent.monkey.patch_all() we can make the Redis bindings schedule requests cooperatively and work with the rest of our gevent stack.
This lets us integrate libraries that would not normally work with gevent without ever writing a single line of code. While monkey patching is still evil, in this case it is a “useful evil”.