You could do something like this:
with open('file') as file:
    lines = file.readlines()

headers = lines[0:1]   # the first line is the header
rest = lines[1:]       # everything else is data
chunk_size = 4

def chunks(lst, chunk_size):
    """Yield successive slices of lst, each at most chunk_size long."""
    for i in range(0, len(lst), chunk_size):  # range works on Python 2 and 3
        yield lst[i:i + chunk_size]

def write_rows(rows, file):
    for row in rows:
        file.write('%s' % row)

part = 1
for chunk in chunks(rest, chunk_size):
    with open('part%d' % part, 'w') as file:
        write_rows(headers, file)
        write_rows(chunk, file)
    part += 1
Here's a test run:
$ cat file && python mkt.py && for p in part*; do echo ---- $p; cat $p; done
header
1
2
3
4
5
6
7
8
9
10
11
12
13
14
---- part1
header
1
2
3
4
---- part2
header
5
6
7
8
---- part3
header
9
10
11
12
---- part4
header
13
14
Obviously, adjust chunk_size and the way headers is obtained to match the number of header lines in your file.
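To make that adjustment explicit, here is a minimal sketch that takes the number of header lines as a parameter. The function name split_with_headers and its header_count parameter are my own choices for illustration, and it still holds everything in memory, like the script above.

```python
def split_with_headers(lines, header_count, chunk_size):
    """Return a list of parts; each part is the header lines followed by
    up to chunk_size data lines."""
    headers = lines[:header_count]
    rest = lines[header_count:]
    return [headers + rest[i:i + chunk_size]
            for i in range(0, len(rest), chunk_size)]


# Same data as the test run: one header line followed by the lines 1..14.
lines = ['header\n'] + ['%d\n' % n for n in range(1, 15)]
parts = split_with_headers(lines, header_count=1, chunk_size=4)
print(len(parts))   # 4
print(parts[-1])    # ['header\n', '13\n', '14\n']
```

Each element of parts can then be written to its own file with writelines.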
Edit: to do this line by line and avoid memory problems, you could do something like this:
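from itertools import islice

headers_count = 5
chunk_size = 250000

with open('file') as fin:
    # Read the header lines once; they are repeated in every part.
    headers = list(islice(fin, headers_count))
    part = 1
    while True:
        # Lazily take the next chunk_size lines without loading the whole file.
        line_iter = islice(fin, chunk_size)
        try:
            # Fetch the first line up front so that no empty part is created.
            first_line = next(line_iter)  # next() works on Python 2.6+ and 3
        except StopIteration:
            break
        with open('part%d' % part, 'w') as fout:
            for line in headers:
                fout.write(line)
            fout.write(first_line)
            for line in line_iter:
                fout.write(line)
        part += 1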
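If you want to reuse this, the same streaming idea can be wrapped in a function. This is only a sketch: the name split_file, the out_dir parameter and the zero-padded part%03d naming are my assumptions, not part of the script above. The padding is there because a shell glob such as part* sorts names as text, so part10 would come before part2.

```python
import os
import tempfile
from itertools import islice


def split_file(path, header_count, chunk_size, out_dir):
    """Stream path into part files of at most chunk_size data lines each,
    repeating the header lines at the top of every part.
    Returns the list of part file names in the order they were written."""
    written = []
    with open(path) as fin:
        headers = list(islice(fin, header_count))
        part = 1
        while True:
            chunk = islice(fin, chunk_size)
            first_line = next(chunk, None)
            if first_line is None:   # input exhausted, so no empty part
                break
            # Hypothetical naming scheme: part001, part002, ...
            name = os.path.join(out_dir, 'part%03d' % part)
            with open(name, 'w') as fout:
                fout.writelines(headers)
                fout.write(first_line)
                fout.writelines(chunk)
            written.append(name)
            part += 1
    return written


# Small demonstration in a temporary directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'file')
with open(source, 'w') as fout:
    fout.write('header\n')
    for n in range(1, 15):
        fout.write('%d\n' % n)

names = split_file(source, header_count=1, chunk_size=4, out_dir=workdir)
print([os.path.basename(n) for n in names])
# ['part001', 'part002', 'part003', 'part004']
```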
Test case (put the above in a file named mkt2.py):
Create a file containing a 5-line header followed by 1234567 lines:
with open('file', 'w') as fout:
    for i in range(5):
        fout.write(10 * ('header %d ' % i) + '\n')
    for i in range(1234567):
        fout.write(10 * ('line %d ' % i) + '\n')
Script to test with (put it in a file named rt.sh):
rm -f part*
echo ---- file
head -n7 file
tail -n2 file
python mkt2.py
for i in part*; do
    echo ---- "$i"
    head -n7 "$i"
    tail -n2 "$i"
done
Sample output:
$ sh rt.sh
---- file
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0
line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1
line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565
line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566
---- part1
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0
line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1
line 249998 line 249998 line 249998 line 249998 line 249998 line 249998 line 249998 line 249998 line 249998 line 249998
line 249999 line 249999 line 249999 line 249999 line 249999 line 249999 line 249999 line 249999 line 249999 line 249999
---- part2
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 250000 line 250000 line 250000 line 250000 line 250000 line 250000 line 250000 line 250000 line 250000 line 250000
line 250001 line 250001 line 250001 line 250001 line 250001 line 250001 line 250001 line 250001 line 250001 line 250001
line 499998 line 499998 line 499998 line 499998 line 499998 line 499998 line 499998 line 499998 line 499998 line 499998
line 499999 line 499999 line 499999 line 499999 line 499999 line 499999 line 499999 line 499999 line 499999 line 499999
---- part3
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 500000 line 500000 line 500000 line 500000 line 500000 line 500000 line 500000 line 500000 line 500000 line 500000
line 500001 line 500001 line 500001 line 500001 line 500001 line 500001 line 500001 line 500001 line 500001 line 500001
line 749998 line 749998 line 749998 line 749998 line 749998 line 749998 line 749998 line 749998 line 749998 line 749998
line 749999 line 749999 line 749999 line 749999 line 749999 line 749999 line 749999 line 749999 line 749999 line 749999
---- part4
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 750000 line 750000 line 750000 line 750000 line 750000 line 750000 line 750000 line 750000 line 750000 line 750000
line 750001 line 750001 line 750001 line 750001 line 750001 line 750001 line 750001 line 750001 line 750001 line 750001
line 999998 line 999998 line 999998 line 999998 line 999998 line 999998 line 999998 line 999998 line 999998 line 999998
line 999999 line 999999 line 999999 line 999999 line 999999 line 999999 line 999999 line 999999 line 999999 line 999999
---- part5
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000
line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001
line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565
line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566
Timings for the above:
real 0m0.935s
user 0m0.708s
sys 0m0.200s
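The head/tail spot check in rt.sh only shows the edges of each part. If you want to confirm that nothing was lost or reordered, a sketch like the following compares the parts against the source line by line. The names check_split and part_number are hypothetical helpers of mine, and it assumes part names end in their number, as in part1, part2 and so on.

```python
import glob
import os
import re
import tempfile
from itertools import islice


def part_number(name):
    """Trailing number of a name like 'part12', used to sort numerically."""
    return int(re.search(r'(\d+)$', name).group(1))


def check_split(source, part_names, header_count):
    """Return True if every part starts with the source's header lines and
    the remaining lines of the parts, taken in order, equal the source's
    data lines."""
    with open(source) as fin:
        headers = list(islice(fin, header_count))
        for name in part_names:
            with open(name) as part:
                if list(islice(part, header_count)) != headers:
                    return False
                for line in part:
                    if next(fin, None) != line:
                        return False
        # The source must be exhausted once all parts are consumed.
        return next(fin, None) is None


# Tiny self-contained demonstration in a temporary directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'file')
with open(source, 'w') as fout:
    fout.writelines(['header\n', '1\n', '2\n', '3\n'])
for name, body in [('part1', 'header\n1\n2\n'), ('part2', 'header\n3\n')]:
    with open(os.path.join(workdir, name), 'w') as fout:
        fout.write(body)

names = sorted(glob.glob(os.path.join(workdir, 'part*')), key=part_number)
print(check_split(source, names, header_count=1))   # True
```

For the test case above you would call check_split('file', sorted(glob.glob('part*'), key=part_number), header_count=5), provided the directory holds no other files matching part*.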
Hope this helps.