At work I needed to run a job over 600,000 records. My lead said there are two ways to parallelize it in Python: use the multiprocessing library, or launch multiple processes in parallel from a shell script. I wrote two small demos:
Multiprocessing
import time
import os
import multiprocessing
from multiprocessing import Pool

def run(k):
    # print(k, multiprocessing.current_process().name)  # print the current process name
    # time.sleep(1)
    for i in range(5):
        time.sleep(1)

def run_pool():
    cpu_count = os.cpu_count()  # 8 on this machine
    p = Pool(cpu_count)
    # p.map(run, range(40))  # ~6.40s with the print statement commented out
    p.map(run, range(8))  # ~5.45s with the print statement commented out
    p.close()  # no more tasks will be submitted
    p.join()   # block until all workers finish

if __name__ == '__main__':
    t0 = time.time()
    run_pool()
    t1 = time.time()
    print(t1 - t0)
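The ~5.45s figure is consistent with the code: each task sleeps for about 5 seconds, and with 8 tasks spread over 8 worker processes everything runs in a single wave, so the total is roughly 5 seconds plus the pool's startup and teardown overhead. The ~6.40s figure for range(40) presumably comes from the variant where run only sleeps once (40 tasks over 8 workers means 5 waves of about 1 second each, plus overhead).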
Notes:
- Starting and shutting down processes takes time of its own, so when calling map, try to keep the number of iterations aligned with the number of processes (see the sketch after this list)
- When running on a shared server, keep the process count below the machine's CPU core count; other people need to run their programs too
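For the real 600,000-record job, one way to follow both notes is to slice the data into one chunk per worker, so each process receives a single large task instead of many tiny ones. This is a minimal sketch, not the original job: process_chunk is a hypothetical stand-in for the real per-record work, and two cores are left free for other users.

import os
from multiprocessing import Pool

def process_chunk(chunk):
    # hypothetical stand-in for the real per-record work
    return [record for record in chunk]

if __name__ == '__main__':
    data = list(range(600000))  # stand-in for the real 600k records
    n_workers = max((os.cpu_count() or 4) - 2, 1)  # leave cores for other users
    size = (len(data) + n_workers - 1) // n_workers  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(n_workers) as p:
        results = p.map(process_chunk, chunks)  # one chunk per worker

Alternatively, Pool.map accepts a chunksize argument that batches an oversized iterable into slices automatically, which amortizes the same per-task overhead without manual slicing.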
Running programs in parallel with a shell script
test.py
import sys

receive = sys.argv[1:]  # receive arguments passed in from outside the program
start, end = int(receive[0]), int(receive[1])
for i in range(start, end):
    print("hello world" + str(i))
test.sh
#!/bin/bash
total_num=10
machine=5  # number of parallel processes
each_num=$(( total_num / machine ))  # records per process
for ((i=0; i<machine; i++))
do
    start=$(( i * each_num ))
    end=$(( (i + 1) * each_num ))
    python -u test.py $start $end &  # & sends each python process to the background
    echo $start, $end, $each_num
done
wait  # block until all background processes finish
echo "END"
Run result: the script prints the five (start, end, each_num) ranges, then the ten hello world lines interleaved across the five background processes in nondeterministic order, and finally END.