This may help. First, let's look at the arules function itemInfo():
library(arules)
data("Groceries")  # load the example data set shipped with arules
head(itemInfo(Groceries))
             labels  level2           level1
1       frankfurter sausage meat and sausage
2           sausage sausage meat and sausage
3        liver loaf sausage meat and sausage
4               ham sausage meat and sausage
5              meat sausage meat and sausage
6 finished products sausage meat and sausage
Now, as you said, Groceries has a couple of levels. Your transactions, on the other hand, only have labels:
trans4 <- as(split(dats[,"Item_ID"], dats[,"Transaction_ID"]), "transactions")
str(trans4)
itemInfo(trans4)
  labels
1   A001
2   A002
3   A003
4   A004
5   A005
Now you have to add the category level to your transactions, which you can do like this:
library(dplyr)
labels_ <- dats %>% select(Item_ID, Category_ID) %>% distinct()
itemInfo(trans4) <- data.frame(labels = labels_$Item_ID, level1 =labels_$Category_ID)
Now:
itemInfo(trans4)
  labels level1
1   A001    A01
2   A002    A01
3   A003    A02
4   A005    A03
5   A004    A03
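One caveat, as far as I can tell: `itemInfo<-` stores the data frame as given and does not match its rows to the existing items by label, while `distinct()` returns items in order of first appearance. That is why A005 comes before A004 in the output above, so the labels of items 4 and 5 end up swapped relative to the item columns. Below is a minimal sketch that reorders the lookup table to the item order of the transactions object before assigning it. It uses base `match()` and `unique()` instead of dplyr. The name `lookup` is only illustrative, and the data is the same toy data as in this answer, written out compactly.

```r
library(arules)

# Same toy data as in this answer (plain character columns)
dats <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02", "T03",
                     "T05", "T05", "T05", "T04", "T04"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002", "A005",
                     "A004", "A002", "A005", "A001", "A003"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01", "A03",
                     "A03", "A01", "A03", "A01", "A02")
)

trans4 <- as(split(dats$Item_ID, dats$Transaction_ID), "transactions")

# One row per item, in order of first appearance in dats
lookup <- unique(dats[, c("Item_ID", "Category_ID")])

# Reorder the lookup so that row i describes item i of trans4
lookup <- lookup[match(itemLabels(trans4), lookup$Item_ID), ]

itemInfo(trans4) <- data.frame(labels = lookup$Item_ID,
                               level1 = lookup$Category_ID)
itemInfo(trans4)
```

With this ordering, the labels stay attached to the same item columns they had before the assignment, and only the level1 column is new.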
And:
str(trans4)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:11] 0 1 0 1 2 4 0 2 1 3 ...
.. .. ..@ p : int [1:6] 0 2 5 6 8 11
.. .. ..@ Dim : int [1:2] 5 5
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 5 obs. of 2 variables:
.. ..$ labels: Factor w/ 5 levels "A001","A002",..: 1 2 3 5 4
.. ..$ level1: Factor w/ 3 levels "A01","A02","A03": 1 1 2 3 3 # here we go!!!
..@ itemsetInfo:'data.frame': 5 obs. of 1 variable:
.. ..$ transactionID: chr [1:5] "T01" "T02" "T03" "T04" ...
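Once level1 is in itemInfo, it can be used like the levels in Groceries. As a rough sketch, and assuming a reasonably recent arules version where `aggregate()` accepts the name of an itemInfo column in `by`, the items can be collapsed into their categories like this. The names `lookup` and `trans_cat` are only illustrative, and the setup repeats the toy data so the snippet runs on its own.

```r
library(arules)

# Same toy data as in this answer (plain character columns)
dats <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02", "T03",
                     "T05", "T05", "T05", "T04", "T04"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002", "A005",
                     "A004", "A002", "A005", "A001", "A003"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01", "A03",
                     "A03", "A01", "A03", "A01", "A02")
)

trans4 <- as(split(dats$Item_ID, dats$Transaction_ID), "transactions")

# Attach the category as level1, aligned to the item order of trans4
lookup <- unique(dats[, c("Item_ID", "Category_ID")])
lookup <- lookup[match(itemLabels(trans4), lookup$Item_ID), ]
itemInfo(trans4) <- data.frame(labels = lookup$Item_ID,
                               level1 = lookup$Category_ID)

# Collapse items into their level1 category (assumes by = "<column name>" is supported)
trans_cat <- aggregate(trans4, by = "level1")

itemLabels(trans_cat)  # the items are now the categories
inspect(trans_cat)     # one row per transaction, listing its categories
```

The aggregated object can then go into apriori() or similar if you want rules at the category level instead of the item level.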
With the data:
dats <- structure(list(Transaction_ID = structure(c(1L, 1L, 2L, 2L, 2L,
3L, 5L, 5L, 5L, 4L, 4L), .Label = c("T01", "T02", "T03", "T04",
"T05"), class = "factor"), Item_ID = structure(c(1L, 2L, 1L,
3L, 2L, 5L, 4L, 2L, 5L, 1L, 3L), .Label = c("A001", "A002", "A003",
"A004", "A005"), class = "factor"), Category_ID = structure(c(1L,
1L, 1L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 2L), .Label = c("A01", "A02",
"A03"), class = "factor")), class = "data.frame", row.names = c(NA,
-11L))