I have data like the following, on which I want to run a preliminary analysis:
selected <- c("1", "1", "1", "0", "1", "0", "0", "1", "0", "0", NA)
teammember1 <- c("M", "M", "F", "M", "M", "F", "M", "M", "M", "F", "M")
teammember2 <- c("M", "M", "M", "M", "M", "M", "M", "F", "M", "F", "F")
teammember3 <- c("M", "M", "", "", "", "", "M", "", "M", "F", "")
selection <- data.frame(teammember1, teammember2, teammember3, selected)
I would like to get a data frame showing the probability of being selected when the team includes a woman versus when it does not.
I used sqldf queries, as shown below:
library(sqldf)
selectcomp <- sqldf("SELECT *
FROM selection
WHERE selected IS NOT NULL
")
selectcomp
countnotNull <- dplyr::count(selectcomp)
withF <- sqldf("SELECT *
FROM selectcomp
WHERE (teammember1 LIKE '%F%'
OR teammember2 LIKE '%F%'
OR teammember3 LIKE '%F%')
AND selected LIKE '%1%'
")
onlyM <- sqldf("SELECT *
FROM selectcomp
WHERE (teammember1 NOT LIKE '%F%'
AND teammember2 NOT LIKE '%F%'
AND teammember3 NOT LIKE '%F%')
AND selected LIKE '%1%'
")
countwithF <- dplyr::count(withF)
countonlyM <- dplyr::count(onlyM)
probwithF <- (countwithF/countnotNull)*100
probonlyM <- (countonlyM/countnotNull)*100
comparison <- data.frame(probwithF, probonlyM)
comparison
colnames(comparison) <- c("probwithF", "probonlyM")
library(tidyr)
comparison <- comparison %>%
  tidyr::gather(type, prob)
comparison
What would this pipeline look like if it were written entirely with tidyr and the magrittr pipe (%>%)?
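
For reference, here is a minimal sketch of one way the same numbers might be produced in a single pipe. It assumes dplyr is acceptable alongside tidyr, and that tidyr is version 1.0 or later so that pivot_longer() is available (gather(type, prob) should work in its place). The helper column has_f is a name introduced here for illustration, and the exact match == "F" stands in for the LIKE '%F%' pattern, which is equivalent only because the values are "M", "F" or "". As in the sqldf version, both percentages are taken over all rows where selected is not NA.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
selected    <- c("1", "1", "1", "0", "1", "0", "0", "1", "0", "0", NA)
teammember1 <- c("M", "M", "F", "M", "M", "F", "M", "M", "M", "F", "M")
teammember2 <- c("M", "M", "M", "M", "M", "M", "M", "F", "M", "F", "F")
teammember3 <- c("M", "M", "", "", "", "", "M", "", "M", "F", "")
selection <- data.frame(teammember1, teammember2, teammember3, selected)

comparison <- selection %>%
  # drop rows with no selection outcome (WHERE selected IS NOT NULL)
  filter(!is.na(selected)) %>%
  # has_f is a hypothetical helper: TRUE when any team member is "F"
  mutate(has_f = teammember1 == "F" | teammember2 == "F" | teammember3 == "F") %>%
  # share of all non-NA rows that were selected, split by team composition
  summarise(
    probwithF = 100 * sum(has_f & selected == "1") / n(),
    probonlyM = 100 * sum(!has_f & selected == "1") / n()
  ) %>%
  # reshape to long format: one row per type
  pivot_longer(everything(), names_to = "type", values_to = "prob")

comparison
```

With the sample data this should give 20 for probwithF and 30 for probonlyM, matching the sqldf result.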