How do I read 10 records at a time from a CSV in Python or PySpark? - PullRequest
0 votes
/ April 26, 2020

I have a CSV file with 100,000 rows. I want to read 10 rows at a time, process each row and save it to its corresponding file, then sleep for 5 seconds before reading the next batch. I am trying islice, but it only reads the first 10 rows and stops. I want the program to run until EOF. I am using Jupyter, Python 2 and PySpark, if that helps.

from itertools import islice
from time import sleep
with open("per-vehicle-records-2020-01-31.csv") as f:
    while True:
        next_n_lines = list(islice(f, 10))
        if not next_n_lines:
            break
        else:
            print(next_n_lines)
            sleep(5)

This does not split up the individual rows. It puts the 10 lines together into one list:

['"cosit","year","month","day","hour","minute","second","millisecond","minuteofday","lane","lanename","straddlelane","straddlelanename","class","classname","length","headway","gap","speed","weight","temperature","duration","validitycode","numberofaxles","axleweights","axlespacings"\n', '"000000000997","2020","1","31","1","30","2","0","90","1","Test1","0","","5","HGV_RIG","11.4","2.88","3.24","70.0","0.0","0.0","0","0","0","",""\n', '"000000000997","2020","1","31","1","30","3","0","90","2","Test2","0","","2","CAR","5.2","3.17","2.92","71.0","0.0","0.0","0","0","0","",""\n', '"000000000997","2020","1","31","1","30","5","0","90","1","Test1","0","","2","CAR","5.1","2.85","2.51","70.0","0.0","0.0","0","0","0","",""\n', '"000000000997","2020","1","31","1","30","6","0","90","2","Test2","0","","2","CAR","5.1","3.0","2.94","69.0","0.0","0.0","0","0","0","",""\n', '"000000000997","2020","1","31","1","30","9","0","90","1","Test1","0","","5","HGV_RIG","11.5","3.45","3.74","70.0","0.0","0.0","0","0","0","",""\n', '"000000000997","2020","1","31","1","30","10","0","90","2","Test2","0","","2","CAR","5.4","3.32","3.43","71.0","0.0","0.0","0","0","0","",""\n', '"000000000997","2020","1","31","1","30","13","0","90","2","Test2","0","","2","CAR","5.3","3.19","3.23","71.0","0.0","0.0","0","0","0","",""\n', '"000000000997","2020","1","31","1","30","13","0","90","1","Test1","0","","2","CAR","5.2","3.45","3.21","70.0","0.0","0.0","0","0","0","",""\n', '"000000000997","2020","1","31","1","30","16","0","90","1","Test1","0","","5","HGV_RIG","11.0","2.9","3.13","69.0","0.0","0.0","0","0","0","",""\n']

Answers [ 2 ]

0 votes
/ April 26, 2020

islice returns an iterator, so you need to loop over its result after the assignment to get the individual lines:

from itertools import islice
from time import sleep

with open("per-vehicle-records-2020-01-31.csv") as f:
    while True:
        # An islice object is always truthy, so build a list to detect EOF
        next_n_lines = list(islice(f, 10))
        if not next_n_lines:
            break
        for line in next_n_lines:
            print(line)
        sleep(5)

You can read more here: How to read a file N lines at a time in Python
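If each line should also be split into its fields, one option is to wrap the input in csv.reader and take batches from that with islice. Below is a minimal sketch, assuming Python 3 and a comma-separated file with quoted fields like the one in the question. The helper name read_batches, the PAUSE_SECONDS constant and the in-memory sample data are made up for illustration.

```python
import csv
import io
import time
from itertools import islice


def read_batches(lines, batch_size=10):
    """Yield lists of up to batch_size parsed CSV rows until the input runs out."""
    reader = csv.reader(lines)  # parses quoted, comma-separated fields
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:  # an empty list means EOF
            return
        yield batch


# In-memory stand-in for the real file: one header line plus 25 data lines.
sample = io.StringIO(
    '"cosit","year"\n' + "".join('"{0}","2020"\n'.format(i) for i in range(25))
)

PAUSE_SECONDS = 0  # hypothetical setting; the question uses 5
batch_sizes = []
first_row = None
for batch in read_batches(sample):
    for row in batch:
        if first_row is None:
            first_row = row  # each row is a list of field strings
    batch_sizes.append(len(batch))
    time.sleep(PAUSE_SECONDS)

print(batch_sizes, first_row)
```

With a real file, pass the object returned by open(path, newline="") instead of the StringIO sample.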

0 votes
/ April 26, 2020

This should work:

import pandas as pd
import time
path_data = 'per-vehicle-records-2020-01-31.csv'

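# chunksize=10 makes read_csv return an iterator of 10-row DataFrames;
# the sample data in the question is comma-separated
reader = pd.read_csv(path_data, sep=',', chunksize=10)
for df in reader:
    # Iterate over the reader directly; calling next(reader) inside the loop would skip every other chunk
    print(df)
    time.sleep(5)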

With chunksize=10 the reader returns 10 rows at a time. The for loop walks through those chunks one by one and sleeps for 5 seconds between iterations.
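The question also mentions saving each row to its corresponding file. It does not say how a row maps to a file, so the sketch below simply assumes that the value of one column (here lanename) names the output file. It uses only the standard library and assumes Python 3. The function name write_batches, the key and pause parameters, and the sample data are hypothetical.

```python
import csv
import io
import os
import tempfile
import time
from itertools import islice


def write_batches(lines, out_dir, key="lanename", batch_size=10, pause=0.0):
    """Append each row to <out_dir>/<value of key>.csv, batch_size rows at a time."""
    reader = csv.DictReader(lines)
    batches = 0
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:  # EOF
            break
        for row in batch:
            # Assumption: the key values are safe to use as file names.
            path = os.path.join(out_dir, row[key] + ".csv")
            is_new = not os.path.exists(path)
            with open(path, "a", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                if is_new:
                    writer.writeheader()
                writer.writerow(row)
        batches += 1
        time.sleep(pause)  # the question uses 5 seconds
    return batches


# In-memory stand-in for the real file: 25 rows alternating between two lanes.
sample = io.StringIO(
    "lanename,speed\n"
    + "".join("Test{0},{1}\n".format(i % 2 + 1, 60 + i) for i in range(25))
)
out_dir = tempfile.mkdtemp()
n_batches = write_batches(sample, out_dir)
print(n_batches, sorted(os.listdir(out_dir)))
```

Reopening the output file for every row keeps the sketch short. For 100,000 rows it may be worth keeping one open writer per output file instead.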
