R data.table fread fails on a special character
26 January 2019

I can only give you an image of the data I'm working with, showing the character in the .csv file that causes my problems. I don't know how to extract the character itself.

[Image: pillar character is stopping fread]

This pillar character makes fread stop. Is there a way to avoid this? readr's read_csv reads through these characters without problems. I tried removing it, making it a character column, and using comment.char = "", but nothing seems to work.
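Because the character can't be copied out of a screenshot, one way to identify it is to read the file as raw bytes and list every byte that is not printable ASCII, not a newline, and not the `\r` of a `\r\n` pair. Below is a minimal base-R sketch. `find_odd_bytes` is a hypothetical helper, and the demo file is made up: a lone carriage return stands in for the unknown character, which is only an assumption. On the real data, pass the path to trades.csv instead.

```r
# Hypothetical helper (not part of data.table): list every byte in `path`
# that is not printable ASCII (0x20-0x7E), not a "\n", and not the "\r" of a
# "\r\n" pair, together with the line it sits on.
find_odd_bytes <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  codes <- as.integer(bytes)
  next_code <- c(codes[-1], 0L)
  crlf <- codes == 0x0D & next_code == 0x0A
  odd <- which(codes != 0x0A & !crlf & (codes < 0x20 | codes > 0x7E))
  # Line number = 1 + number of "\n" bytes that precede the odd byte
  line <- findInterval(odd, which(codes == 0x0A)) + 1L
  data.frame(offset = odd, line = line, hex = as.character(bytes[odd]),
             stringsAsFactors = FALSE)
}

# Made-up demo file: a lone "\r" (0x0D) inside a quoted field.
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw('id,cond\n1,"P'), as.raw(0x0D), charToRaw('"\n2,P\n')), tmp)
find_odd_bytes(tmp)
```

On the real file, the `line` column can be compared with the line number fread reports in its warning. The `hex` column gives the byte values, which can then be looked up in an ASCII or UTF-8 table.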

Here is what I'm hoping to get (this is what I get with read_csv):

# A tibble: 5 x 4
     X1 trade date       trade_condition
  <dbl> <dbl> <date>     <chr>          
1  2902  28.3 2019-01-14 -12------P---- 
2  2903  28.0 2019-01-14 P              
3  2904  28.0 2019-01-14 P              
4  2905  28.0 2019-01-14 P              
5  2906  28.1 2019-01-14 P 

I'm using data.table_1.12.0.

Here is the output with verbose = T:

omp_get_max_threads() = 8
omp_get_thread_limit() = 2147483647
DTthreads = 0
RestoreAfterFork = true
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 8 threads (omp_get_max_threads()=8, nth=8)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file C:/Users/Markku/Desktop/KONECRANES_2019.01.14/trades.csv
  File opened, size = 592KB (606768 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<,trade,date,trade_condition,sy>>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 100 lines of 9 fields using quote rule 0
  Detected 9 columns on line 1. This line is either column names or first data row. Line starts as: <<,trade,date,trade_condition,sy>>
  Quote rule picked = 0
  fill=false and the most number of columns found is 9
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 10 because (606767 bytes from row 1 to eof) / (2 * 27623 jump0size) == 10
  Type codes (jump 000)    : 57AAAA5AA  Quote rule 0
  A line with too-few fields (4/9) was found on line 4 of sample jump 7. Most likely this jump landed awkwardly so type bumps here will be skipped.
  A line with too-few fields (4/9) was found on line 13 of sample jump 9. Most likely this jump landed awkwardly so type bumps here will be skipped.
  Type codes (jump 010)    : 57AAAA5AA  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 858 sample rows
  =====
  Sampled 858 rows (handled \n inside quoted fields) at 11 jump points
  Bytes from first data row on line 2 to the end of last row: 606683
  Line length: mean=213.01 sd=86.78 min=59 max=372
  Estimated number of rows: 606683 / 213.01 = 2849
  Initial alloc = 5698 rows (2849 + 100%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
  =====
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : 57AAAA5AA
[10] Allocate memory for the datatable
  Allocating 9 column slots (9 - 0 dropped) with 5698 rows
[11] Read the data
  jumps=[0..1), chunk_size=606683, total_size=606683
  Restarting team from jump 0. nSwept==0 quoteRule==1
  jumps=[0..1), chunk_size=606683, total_size=606683
  Restarting team from jump 0. nSwept==0 quoteRule==2
  jumps=[0..1), chunk_size=606683, total_size=606683
  Restarting team from jump 0. nSwept==0 quoteRule==3
  jumps=[0..1), chunk_size=606683, total_size=606683
Read 2903 rows x 9 columns from 592KB (606768 bytes) file in 00:00.014 wall clock time
[12] Finalizing the datatable
  Type counts:
         2 : int32     '5'
         1 : float64   '7'
         6 : string    'A'
=============================
   0.003s ( 21%) Memory map 0.001GB file
   0.007s ( 50%) sep=',' ncol=9 and header detection
   0.000s (  0%) Column type detection using 858 sample rows
   0.000s (  0%) Allocation of 5698 rows x 9 cols (0.000GB) of which 2903 ( 51%) rows used
   0.004s ( 29%) Reading 1 chunks (0 swept) of 0.579MB (each chunk 2903 rows) using 1 threads
   +    0.000s (  0%) Parse to row-major thread buffers (grown 0 times)
   +    0.002s ( 14%) Transpose
   +    0.002s ( 14%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.014s        Total

Warning message:
In fread(trades_file, verbose = T) :
  Stopped early on line 2905. Expected 9 fields but found 4. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<2903,28.04,2019-01-14,"P>>
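If the offending byte turns out to be junk, one workaround is to strip it before fread sees the file. The sketch below uses base R: it writes a copy of the file without the unwanted bytes and then calls fread on the copy. `strip_bytes` is a hypothetical helper. Its default `drop = 0x0D` (a lone carriage return) is only a guess and should be replaced with the byte value actually found near line 2905. Dropping bytes also changes the field contents, so if the character carries meaning, replacing it may be preferable to deleting it.

```r
# Hypothetical helper (not part of data.table): write a copy of `path` with
# every byte listed in `drop` removed, but keep a "\r" that is directly
# followed by "\n" so Windows line endings survive. Returns the new path.
strip_bytes <- function(path, drop = as.raw(0x0D)) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  next_byte <- c(bytes[-1], as.raw(0))
  crlf <- bytes == as.raw(0x0D) & next_byte == as.raw(0x0A)
  keep <- !(bytes %in% drop) | crlf
  out <- tempfile(fileext = ".csv")
  writeBin(bytes[keep], out)
  out
}

# Made-up demo file: a lone "\r" inside a quoted field.
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw('id,cond\n1,"P'), as.raw(0x0D), charToRaw('"\n2,P\n')), tmp)
cleaned <- strip_bytes(tmp)
readLines(cleaned)

# With data.table installed, the cleaned copy is read with fread as usual.
if (requireNamespace("data.table", quietly = TRUE)) {
  DT <- data.table::fread(cleaned)
  print(DT)
}
```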