У меня есть сотни файлов по 10 Гбайт каждый. Мне нужно переформатировать файлы и объединить их в более удобный формат таблицы, чтобы я мог группировать, суммировать, усреднять и т. Д. Данные. Переформатирование данных с использованием Python заняло бы больше недели. Даже после того, как я переформатирую его в таблицу, я не знаю, будет ли он слишком большим для фрейма данных, но одна проблема за раз.
Кто-нибудь может предложить более быстрый способ переформатирования текстовых файлов? Я рассмотрю что-нибудь C ++, Perl и т. Д.
Пример данных:
Scenario: Modeling_5305 (0.0001)
Position: NORTHERN UTILITIES SR NT,
" ","THEO/Effective Duration","THEO/Yield","THEO/Implied Spread","THEO/Value","THEO/Price","THEO/Outstanding Balance","THEO/Effective Convexity","ID","WAL","Type","Maturity Date","Coupon Rate","POS/Position Units","POS/Portfolio","POS/User Defined 1","POS/SE Cash 1","User Defined 2","CMO WAL","Spread Over Yield",
"2017/12/31",16.0137 T,4.4194 % SEMI 30/360,0.4980 % SEMI 30/360,"6,934,452.0000 USD","6,884,052.0000 USD","7,000,000.0000 USD",371.6160 T,CachedFilterPartitions-PL_SPLITTER.2:665876C#3,29.8548 T,Fixed Rate Bond,2047/11/01,4.3200 % SEMI 30/360,"70,000.0000",All Portfolios,030421000,0.0000 USD,FRB,N/A,0.4980 % SEMI 30/360,
"2018/01/12",15.5666 T,4.8499 % SEMI 30/360,0.4980 % SEMI 30/360,"6,477,803.7492 USD","6,418,163.7492 USD","7,000,000.0000 USD",356.9428 T,CachedFilterPartitions-PL_SPLITTER.2:665876C#3,29.8219 T,Fixed Rate Bond,2047/11/01,4.3200 % SEMI 30/360,"70,000.0000",All Portfolios,030421000,0.0000 USD,FRB,N/A,0.4980 % SEMI 30/360,
Scenario: Modeling_5305 (0.0001)
Position: OLIVIA ISSUER TR SER A (A,
" ","THEO/Effective Duration","THEO/Yield","THEO/Implied Spread","THEO/Value","THEO/Price","THEO/Outstanding Balance","THEO/Effective Convexity","ID","WAL","Type","Maturity Date","Coupon Rate","POS/Position Units","POS/Portfolio","POS/User Defined 1","POS/SE Cash 1","User Defined 2","CMO WAL","Spread Over Yield",
"2017/12/31",1.3160 T,19.0762 % SEMI 30/360,0.2990 % SEMI 30/360,"3,862,500.0000 USD","3,862,500.0000 USD","5,000,000.0000 USD",2.3811 T,CachedFilterPartitions-PL_SPLITTER.2:681071AA4,1.3288 T,Interest Rate Index Linked Note,2019/05/30,0.0000 % MON 30/360,"50,000.0000",All Portfolios,010421002,0.0000 USD,IRLIN,N/A,0.2990 % SEMI 30/360,
"2018/01/12",1.2766 T,21.9196 % SEMI 30/360,0.2990 % SEMI 30/360,"3,815,391.3467 USD","3,815,391.3467 USD","5,000,000.0000 USD",2.2565 T,CachedFilterPartitions-PL_SPLITTER.2:681071AA4,1.2959 T,Interest Rate Index Linked Note,2019/05/30,0.0000 % MON 30/360,"50,000.0000",All Portfolios,010421002,0.0000 USD,IRLIN,N/A,0.2990 % SEMI 30/360,
Я хотел бы переформатировать в эту таблицу CSV, чтобы я мог импортировать в dataframe:
Position, Scenario, TimeSteps, THEO/Value
NORTHERN UTILITIES SR NT, Modeling_5305, 2018/01/12, 6477803.7492
OLIVIA ISSUER TR SER A (A, Modeling_5305, 2018/01/12, 3815391.3467