I have a directory full of .csv files compressed with lzma. An example of a compressed file name:
tradeopt.is-pnl.BBG_XASX_WES_S-BBG_XASX_WOW_S.csv.lzma
I can decompress the files, but I have a problem reading the .csv into a pandas DataFrame. I wonder whether I am decompressing the .csv correctly?
Below is an example of how a .csv should look after decompression, and what I would like to read into a DataFrame (df). Note the file structure (there is a separate line of information for each date):
date,BBG.XASX.WES.S_price,BBG.XASX.WES.S_pos,BBG.XASX.WES.S_item,BBG.XASX.WES.S_cost,BBG.XASX.WES.S_pnl_pre_cost,BBG.XASX.WES.S_pnl_pos_cost,BBG.XASX.WES.S_pnl_per_pre_cost,BBG.XASX.WES.S_pnl_per_post_cost,BBG.XASX.WOW.S_price,BBG.XASX.WOW.S_pos,BBG.XASX.WOW.S_item,BBG.XASX.WOW.S_cost,BBG.XASX.WOW.S_pnl_pre_cost,BBG.XASX.WOW.S_pnl_pos_cost,BBG.XASX.WOW.S_pnl_per_pre_cost,BBG.XASX.WOW.S_pnl_per_post_cost,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos,max_position,BBG.XASX.WES.S_round_lot_size,BBG.XASX.WES.S_price_acc,BBG.XASX.WOW.S_round_lot_size,BBG.XASX.WOW.S_price_acc
2017-09-18,31.2514928,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,19.387644,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,5247540.857991701,1,31.2514928,1,19.387644
2017-09-19,31.406473100000003,-167085.0,-167085.0,-708.4193253183225,0.0,-708.4193253183225,-0.0,-0.0135,19.4242966,270154.0,270154.0,-708.4194421963141,0.0,-708.4194421963141,-0.0,-0.0135,-1416.8387675146366,0.0,-1416.8387675146366,-0.0,-0.027,5247540.857991701,1,31.406473100000003,1,19.4242966
2017-09-20,31.5733096,-167085.0,0.0,-0.0,-27875.876602499746,-27875.876602499746,-0.5284099200040693,-0.5284099200040693,19.5654078,270154.0,0.0,-0.0,38121.755124798976,38121.755124798976,0.7264674902050183,0.7264674902050183,-0.0,10245.87852229923,10245.87852229923,0.19805757020094905,0.19805757020094905,5247540.857991701,1,31.5733096,1,19.5654078
2017-09-21,30.9462009,-167085.0,0.0,-0.0,104780.45713950042,104780.45713950042,2.0264480994822254,2.0264480994822254,19.3406687,270154.0,0.0,-0.0,-60714.1668213997,-60714.1668213997,-1.1486553323974191,-1.1486553323974191,-0.0,44066.29031810071,44066.29031810071,0.8777927670848062,0.8777927670848062,5247540.857991701,1,30.9462009,1,19.3406687
2017-09-22,31.238218300000003,-167085.0,0.0,-0.0,-48791.727278999984,-48791.727278999984,-0.934808116121022,-0.934808116121022,19.4537909,270154.0,0.0,-0.0,30560.414818800986,30560.414818800986,0.584892910140189,0.584892910140189,-0.0,-18231.312460199,-18231.312460199,-0.34991520598083303,-0.34991520598083303,5247540.857991701,1,31.238218300000003,1,19.4537909
2017-09-25,31.013542,-167085.0,0.0,-0.0,37540.039585500024,37540.039585500024,0.7244457920994707,0.7244457920994707,19.2197338,270154.0,0.0,-0.0,-63231.46179340035,-63231.46179340035,-1.2031439075455563,-1.2031439075455563,-0.0,-25691.422207900323,-25691.422207900323,-0.47869811544608565,-0.47869811544608565,5247540.857991701,1,31.013542,1,19.2197338
2017-09-26,30.980316,0.0,167085.0,-698.8067233461,5551.566210000776,4852.759486654676,0.10724874465450895,0.09374874465450896,19.2044031,0.0,-270154.0,-700
Now please look at the output produced by the code below, which I copied from the console for a file I decompressed. Again, note the file structure (this time all of the rows are on a single line):
b'date,BBG.XASX.WES.S_price,BBG.XASX.WES.S_pos,BBG.XASX.WES.S_item,BBG.XASX.WES.S_cost,BBG.XASX.WES.S_pnl_pre_cost,BBG.XASX.WES.S_pnl_pos_cost,BBG.XASX.WES.S_pnl_per_pre_cost,BBG.XASX.WES.S_pnl_per_post_cost,BBG.XASX.WOW.S_price,BBG.XASX.WOW.S_pos,BBG.XASX.WOW.S_item,BBG.XASX.WOW.S_cost,BBG.XASX.WOW.S_pnl_pre_cost,BBG.XASX.WOW.S_pnl_pos_cost,BBG.XASX.WOW.S_pnl_per_pre_cost,BBG.XASX.WOW.S_pnl_per_post_cost,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos,max_position,BBG.XASX.WES.S_round_lot_size,BBG.XASX.WES.S_price_acc,BBG.XASX.WOW.S_round_lot_size,BBG.XASX.WOW.S_price_acc\n2017-09-18,31.2514928,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,19.387644,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,5247540.857991701,1,31.2514928,1,19.387644\n2017-09-19,31.406473100000003,-167085.0,-167085.0,-708.4193253183225,0.0,-708.4193253183225,-0.0,-0.0135,19.4242966,270154.0,270154.0,-708.4194421963141,0.0,-708.4194421963141,-0.0,-0.0135,-1416.8387675146366,0.0,-1416.8387675146366,-0.0,-0.027,5247540.857991701,1,31.406473100000003,1,19.4242966\n2017-09-20,31.5733096,-167085.0,0.0,-0.0,-27875.876602499746,-27875.876602499746,-0.5284099200040693,-0.5284099200040693,19.5654078,270154.0,0.0,-0.0,38121.755124798976,38121.755124798976,0.7264674902050183,0.7264674902050183,-0.0,10245.87852229923,10245.87852229923,0.19805757020094905,0.19805757020094905,5247540.857991701,1,31.5733096,1,19.5654078\n2017-09-21,30.9462009,-167085.0,0.0,-0.0,104780.45713950042,104780.45713950042,2.0264480994822254,2.0264480994822254,19.3406687,270154.0,0.0,-0.0,-60714.1668213997,-60714.1668213997,-1.1486553323974191,-1.1486553323974191,-0.0,44066.29031810071,44066.29031810071,0.8777927670848062,0.8777927670848062,5247540.857991701,1,30.9462009,1,19.3406687\n2017-09-22,31.238218300000003,-167085.0,0.0,-0.0,-48791.727278999984,-48791.727278999984,-0.934808116121022,-0.934808116121022,19.4537909,270154.0,0.0,-0.0,30560.414818800986,30560.414818800986,0.584892910140189,0.584892910140189,-0.0,-18231
.312460199,-18231.312460199,-0.34991520598083303,-0.34991520598083303,5247540.857991701,1,31.238218300000003,1,19.4537909\n2017-09-25,31.013542,-167085.0,0.0,-0.0,37540.039585500024,37540.039585500024,0.7244457920994707,0.7244457920994707,19.2197338,270154.0,0.0,-0.0,-63231.46179340035,-63231.46179340035,-1.2031439075455563,-1.2031439075455563,-0.0,-25691.422207900323,-25691.422207900323,-0.47869811544608565,-0.47869811544608565,5247540.857991701,1,31.013542,1,19.2197338\n2017-09-26,30.980316,0.0,167085.
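One possible reading of this output: the leading `b'` and the literal `\n` sequences suggest the console is showing the repr of a Python `bytes` object, in which case the line breaks are still present and the decompression itself is probably fine. A minimal sketch of that idea, using a small made-up sample (the `sample` bytes and column names are hypothetical, not the real file):

```python
import lzma

# Hypothetical sample standing in for the real file contents.
sample = b"date,price\n2017-09-18,31.25\n2017-09-19,31.41\n"
compressed = lzma.compress(sample)

# Decompressing yields bytes; their repr shows newlines as literal \n.
raw = lzma.decompress(compressed)
print(repr(raw))

# Decoding and splitting recovers one row per line.
lines = raw.decode("utf-8").splitlines()
for line in lines:
    print(line)
```

If the real data behaves the same way, the single-line display is only how bytes are printed, not a sign of a damaged file.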
The code I am trying to use is as follows:
import glob
import lzma

import pandas as pd

# `filename` is set elsewhere in the script (not shown here)
for subdirname in glob.iglob('C:/Users/stacey/WorkDocs/tradeopt/'+filename+'//BBG*/tradeopt.is-pnl*.lzma', recursive=True):
    print(subdirname)
    fnamenoextenstion = subdirname.split('.lzma')[0]
    with lzma.open(subdirname) as f:
        file_content = f.read()
        df=pd.read_csv(f,engine='c',header=0,parse_dates=[0],index_col=0)
        print(df.head())
I get this error:
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas\parser.pyx", line 538, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:6171)
EmptyDataError: No columns to parse from file
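A likely explanation, offered as a guess rather than a confirmed diagnosis: `f.read()` consumes the whole decompressed stream, so when the same file object `f` is then handed to `pd.read_csv`, there is nothing left to parse. The sketch below reproduces that effect with the standard library only, using a hypothetical in-memory sample in place of a real `.csv.lzma` file:

```python
import io
import lzma

# Hypothetical in-memory stand-in for one of the .csv.lzma files.
sample = b"date,price\n2017-09-18,31.25\n"
compressed = io.BytesIO(lzma.compress(sample))

with lzma.open(compressed) as f:
    file_content = f.read()  # consumes the whole decompressed stream
    leftover = f.read()      # nothing left: what a later reader of `f` would see

# Wrapping the bytes already read gives a fresh stream positioned at the start.
buffer = io.BytesIO(file_content)
first_line = buffer.readline()
print(leftover, first_line)
```

Under that assumption, either dropping the `f.read()` call and passing `f` straight to `pd.read_csv`, or passing `io.BytesIO(file_content)` instead of `f`, should give the parser data to read.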
For information, I am using Python 3.6.0.
Any help getting the data into a DataFrame would be much appreciated.
Thanks