After updating R to version 3.5.1 and updating data.table to the latest version (v. 1.11.18), fread() hangs when called on some files but not on others.
> test_1<-fread("Dec_1_10.csv", verbose=TRUE)
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
Using 4 threads (omp_get_max_threads()=4, nth=4)
NAstrings = [<<NA>>]
None of the NAstrings look like numbers.
show progress = 1
0/1 column will be read as integer
[02] Opening the file
Opening file Dec_1_10.csv
File opened, size = 334.9MB (351129569 bytes).
Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
'. Final end-of-line is missing. Using cow page to write 0 to the last byte.
[05] Skipping initial rows if needed
Positioned on line 1 starting: <<ID,NAME,GENDE>>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep automatically ...
sep=',' with 1 lines of 26029650 fields using quote rule 0
sep=',' with 9 lines of 31 fields using quote rule 2
Detected 31 columns on line 2. This line is either column names or first data row. Line starts as: <<0126_V3","DSRI",>>
Quote rule picked = 2
fill=false and the most number of columns found is 31
[07] Detect column types, good nrow estimate and whether first row is column names
Number of sampling jump points = 1 because (102965936 bytes from row 1 to eof) / (2 * 91563770 jump0size) == 0
A line with too-many fields (31/31) was found on line 9 of sample jump 0.
Type codes (jump 000) : AAAA2AAA52AAAAAAAA2AA22AAAAAA2A Quote rule 2
Types in 1st data row match types in 2nd data row but previous row has 18402118 fields. Taking previous row as column names. All rows were sampled since file is small so we know nrow=8 exactly
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : AAAA2AAA52AAAAAAAA2AA22AAAAAA2A2222222222222222222222222222222222222222222222222...2222222222
[10] Allocate memory for the datatable
Allocating 18402118 column slots (18402118 - 0 dropped) with 8 rows
[11] Read the data
jumps=[0..1), chunk_size=1048576, total_size=102965936
Too few rows allocated. Allocating additional 1024 rows (now nrows=1032) and continue reading from jump 0
...and then it hangs here until I force-quit R.
Calling fread() on other .csv files seems to work fine, but none of the files I have with this particular structure/size can be parsed.
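For anyone wanting to check what sets these files apart, here is a minimal sketch using base R's count.fields() to count the comma-separated fields on each line with quoting ignored. The stand-in file and its contents are made up for illustration (the real Dec_1_10.csv is not reproduced here), and whether a malformed line is really the cause is only an assumption based on the field counts in the verbose output.

```r
# Build a tiny stand-in file; its contents are hypothetical.
path <- tempfile(fileext = ".csv")
writeLines(c("ID,NAME,GENDER",
             "1,Smith,M",
             "2,Jones,F,extra",   # one field too many
             "3,Lee,M"), path)

# Count fields per line, treating quote characters as ordinary text
# and disabling comment handling so '#' is not special.
n_fields <- count.fields(path, sep = ",", quote = "", comment.char = "")

# Lines whose field count differs from the header line.
bad <- which(n_fields != n_fields[1])

print(table(n_fields))
print(bad)
```

On the real file, replacing path with "Dec_1_10.csv" should show whether a single line carries millions of fields, as the "1 lines of 26029650 fields" message hints.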
Edit: I let the R session run for several hours instead of force-quitting it after a few minutes. It eventually stopped with:
Error: vector memory exhausted (limit reached?)
In addition: Warning messages:
1: In FUN(X[[i]], ...) :
Found and resolved improper quoting in first 100 rows. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning.
2: In FUN(X[[i]], ...) :
Detected 5471442 column names but the data has 31 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
I tried skipping the first row of data and specifying the column names. Neither seems to fix the problem.
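As a rough sketch, the attempts described above would look something like the following. The exact calls are not shown in the post, so this is a reconstruction under assumptions: the stand-in file and the names in my_names are hypothetical placeholders (the real file has 31 columns). The last call uses quote="", which is what the warning message suggests; it is untested on the real file.

```r
library(data.table)

# Small stand-in file; contents are made up for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c("ID,NAME,GENDER",
             "1,Smith,M",
             "2,Jones,F"), path)

# Hypothetical placeholder names; the real column names are not shown.
my_names <- c("id", "name", "gender")

# Skip the header line and supply the column names manually.
dt1 <- fread(path, skip = 1, header = FALSE, col.names = my_names)

# Disable quote handling, as the warning message suggests.
dt2 <- fread(path, quote = "")

print(dt1)
print(dt2)
```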