Integer64 для преобразования времени данных в выпуске R? - PullRequest
0 голосов
/ 07 марта 2019

Учитывая следующее dataframe из integer64 эпохи unix:

data_df <- structure(list(time_stamp = structure(c(0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396
), class = "integer64")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))

Я хочу преобразовать его в дату (as.POSIXct или anytime()), но получаю ошибку:

    data_df %>%
  dplyr::select(time_stamp) %>% 
  head(10) %>%
  dplyr::mutate(dt = anytime(time_stamp)) %>% dput()

Дает:

structure(list(time_stamp = structure(c(0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396
    ), class = "integer64"), dt = structure(c(0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396
    ), class = c("POSIXct", "POSIXt"), tzone = "Etc/UTC")), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -10L))

data_df %>%
  dplyr::select(time_stamp) %>% 
  head(10) %>%
  dplyr::mutate(dt = as.POSIXct(time_stamp))

Ошибка в as.POSIXct.default (time_stamp): не знаю, как преобразовать 'time_stamp' в класс “POSIXct”

Посоветуйте, пожалуйста, как справиться с integer64 эпохами.

1 Ответ

1 голос
/ 10 марта 2019

Простите за прямой язык, но ваш вопрос не имеет смысла.Взять первый элемент вашего набора данных: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396.Это просто невозможно представить в любых перечисленных вами типов данных.В том числе integer64.Полная остановка.

Теперь, так получилось, что мой пакет nanotime делает это в наилучшем доступном разрешении , что составляет наносекунды, представленные в 64-целых числах.И 64-битные целые числа позволяют наносекундные приращения начиная с эпохи, с точностью около 19 цифр.Не более 100 цифр, которые вы требовали.Переменная (маленькая память) не может.

Что касается nanotime, example() показывает некоторые варианты использования, включая синтаксический анализ:

R> library(nanotime)
R> example(nanotime)

nanotmR> x <- nanotime("1970-01-01T00:00:00.000000001+00:00")

nanotmR> print(x)
[1] "1970-01-01T00:00:00.000000001+00:00"

nanotmR> x <- x + 1

nanotmR> print(x)
[1] "1970-01-01T00:00:00.000000002+00:00"

nanotmR> format(x)
[1] "1970-01-01T00:00:00.000000002+00:00"

nanotmR> x <- x + 10

nanotmR> print(x)
[1] "1970-01-01T00:00:00.000000012+00:00"

nanotmR> format(x)
[1] "1970-01-01T00:00:00.000000012+00:00"

nanotmR> format(nanotime(Sys.time()) + 1:3)  # three elements each 1 ns apart
[1] "2019-03-10T20:06:53.534292001+00:00" "2019-03-10T20:06:53.534292002+00:00" 
[3] "2019-03-10T20:06:53.534292003+00:00"
R> 

Лучше всего, data.table поддерживаетinteger64 тип пакетов bit64, который используется здесь.Основываясь на примере:

R> library(data.table)
data.table 1.12.0  Latest news: r-datatable.com
R> dt <- data.table(ns = nanotime(Sys.time()) + 1:3)
R> dt[]
                                    ns
1: 2019-03-10T20:08:48.165136001+00:00
2: 2019-03-10T20:08:48.165136002+00:00
3: 2019-03-10T20:08:48.165136003+00:00
R> dt[, pt := as.POSIXct(ns)]
R> dt[]
                                    ns                         pt
1: 2019-03-10T20:08:48.165136001+00:00 2019-03-10 15:08:48.165136
2: 2019-03-10T20:08:48.165136002+00:00 2019-03-10 15:08:48.165136
3: 2019-03-10T20:08:48.165136003+00:00 2019-03-10 15:08:48.165136
R> 

Я использую это двойное представление гранулярности наносекунды с представлением POSIXct для использования R, включая построение графиков в течение всего дня.(Обратите внимание, что существует ошибка форматирования, которая показывает столбец nanotime / integer64 в UTC, но базовое представление является правильным и корректным, как показывает преобразование pt в POSIXct. В настоящее время это происходит только после 15:00 в моем часовом поясе.)

...