which(is.na()) returns an index position beyond the bounds of the df
0 votes
/ January 15, 2020

I apologize in advance for the length of my question, but R is returning a result I cannot understand, so I wanted to include as much of my data as possible. I have the following data frame:

str(CompleteData)
'data.frame':   7830 obs. of  65 variables:
 $ StateCD                                                 : chr  "ALABAMA 1" "ALABAMA 1" "ALABAMA 1" "ALABAMA 1" ...
 $ Year                                                    : num  2001 2002 2003 2004 2005 ...
 $ Congress                                                : Factor w/ 9 levels "107","108","109",..: 1 1 2 2 3 3 4 4 5 5 ...
 $ AGRICULTURE                                             : Factor w/ 3 levels "0","1","2": 1 1 2 2 2 2 2 2 1 1 ...
 $ APPROPRIATIONS                                          : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 2 2 2 2 ...
 $ NATIONALSECURITY                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ FINANCIALSERVICES                                       : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ BUDGET                                                  : Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 1 1 ...
 $ EDUCATIONANDTHEWORKFORCE                                : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ENERGYANDCOMMERCE                                       : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ INTERNATIONALRELATIONS                                  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ GOVERNMENTREFORMANDOVERSIGHT                            : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ HOUSEOVERSIGHT                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ JUDICIARY                                               : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ RESOURCES                                               : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ TRANSPORTATIONANDINFRASTRUCTURE                         : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ RULES                                                   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SCIENCE                                                 : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 1 1 1 1 ...
 $ SMALLBUSINESS                                           : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ STANDARDSOFOFFICIALCONDUCT                              : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 2 2 ...
 $ VETERANSAFFAIRS                                         : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ WAYSANDMEANS                                            : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ INTELLIGENCE_SELECT                                     : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SELECTCOMMITTEEONHOMELANDSECURITY                       : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ LIBRARY_JOINT                                           : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ PRINTING_JOINT                                          : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ TAXATION_JOINT                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ECONOMIC_JOINT                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MAJORITYWHIP                                            : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MAJORITYLEADER                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SPEAKER                                                 : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MINORITYLEADER                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MINORITYWHIP                                            : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SCIENCEANDTECHNOLOGY                                    : Factor w/ 3 levels "0","1","2": 1 1 2 2 1 1 2 2 1 1 ...
 $ ARMEDSERVICES                                           : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ GOVERNMENTREFORM                                        : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ HOUSEADMINISTRATION                                     : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ HOMELANDSECURITY                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ EDUCATIONANDLABOR                                       : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ FOREIGNAFFAIRS                                          : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ OVERSIGHTANDGOVERNMENTREFORM                            : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ NATURALRESOURCES                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ ENERGYINDEPENDENCEANDGLOBALWARMING_SELECT               : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ INVESTIGATETHEVOTINGIRREGULARITIESOFAUGUST2.2007_SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ EDUCATIONANDTHEWORKPLACE                                : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SCIENCE.SPACE.ANDTECHNOLOGY                             : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ ETHICS                                                  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ DEFICITREDUCTION_JOINT.SELECT                           : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ASSISTANTMINORITYLEADER                                 : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ EVENTSSURROUNDINGTHE2012TERRORISTATTACKONBENGHAZI_SELECT: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ NA                                                      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ Majority                                                : Factor w/ 7 levels "0","1","2","3",..: 2 2 4 4 4 4 1 1 1 1 ...
 $ Minority                                                : Factor w/ 7 levels "0","1","2","3",..: 1 1 1 1 1 1 5 5 3 3 ...
 $ MinorityAddition                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ MajorityReplacement                                     : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ MinorityReplacement                                     : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 2 2 1 1 ...
 $ MajorityAddition                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ OtherParty                                              : Factor w/ 2 levels "0","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ Republican                                              : Factor w/ 8 levels "0","1","2","3",..: 2 2 4 4 4 4 6 6 3 3 ...
 $ Democratic                                              : Factor w/ 8 levels "0","1","2","3",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Independent                                             : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ candidatevotes                                          : num  0 108102 0 161067 0 ...
 $ totalvotes                                              : num  0 178687 0 255164 0 ...
 $ VoteShare                                               : num  0 60.5 0 63.1 0 ...
 $ election                                                : num  0 1 0 1 0 1 0 1 0 1 ...

This data frame was created by joining two other data frames with left_join. The code is shown below:

CompleteData <- Full_Congress %>%
  mutate(Year = as.character(Year),
         Year = as.numeric(Year),
         StateCD = as.character(StateCD)) %>%
  left_join(HORElections2, by = c("StateCD", "Year" = "year")) %>%
  mutate(election = ifelse(is.na(candidatevotes), 0, 1),
    candidatevotes = ifelse(election == 1, candidatevotes, 0),
    totalvotes = ifelse(election == 1, totalvotes, 0), 
    VoteShare = ifelse(election == 1, VoteShare, 0))

The two other data frames have the following structures:

str(Full_Congress)
'data.frame':   7830 obs. of  61 variables:
 $ StateCD                                                 : Factor w/ 459 levels "ALABAMA 1","ALABAMA 2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Year                                                    : Factor w/ 18 levels "2001","2002",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Congress                                                : Factor w/ 9 levels "107","108","109",..: 1 1 2 2 3 3 4 4 5 5 ...
 $ AGRICULTURE                                             : Factor w/ 3 levels "0","1","2": 1 1 2 2 2 2 2 2 1 1 ...
 $ APPROPRIATIONS                                          : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 2 2 2 2 ...
 $ NATIONALSECURITY                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ FINANCIALSERVICES                                       : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ BUDGET                                                  : Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 1 1 ...
 $ EDUCATIONANDTHEWORKFORCE                                : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ENERGYANDCOMMERCE                                       : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ INTERNATIONALRELATIONS                                  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ GOVERNMENTREFORMANDOVERSIGHT                            : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ HOUSEOVERSIGHT                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ JUDICIARY                                               : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ RESOURCES                                               : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ TRANSPORTATIONANDINFRASTRUCTURE                         : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ RULES                                                   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SCIENCE                                                 : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 1 1 1 1 ...
 $ SMALLBUSINESS                                           : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ STANDARDSOFOFFICIALCONDUCT                              : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 2 2 ...
 $ VETERANSAFFAIRS                                         : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ WAYSANDMEANS                                            : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ INTELLIGENCE_SELECT                                     : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SELECTCOMMITTEEONHOMELANDSECURITY                       : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ LIBRARY_JOINT                                           : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ PRINTING_JOINT                                          : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ TAXATION_JOINT                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ECONOMIC_JOINT                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MAJORITYWHIP                                            : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MAJORITYLEADER                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SPEAKER                                                 : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MINORITYLEADER                                          : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MINORITYWHIP                                            : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SCIENCEANDTECHNOLOGY                                    : Factor w/ 3 levels "0","1","2": 1 1 2 2 1 1 2 2 1 1 ...
 $ ARMEDSERVICES                                           : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ GOVERNMENTREFORM                                        : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ HOUSEADMINISTRATION                                     : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ HOMELANDSECURITY                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ EDUCATIONANDLABOR                                       : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ FOREIGNAFFAIRS                                          : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ OVERSIGHTANDGOVERNMENTREFORM                            : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ NATURALRESOURCES                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ ENERGYINDEPENDENCEANDGLOBALWARMING_SELECT               : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ INVESTIGATETHEVOTINGIRREGULARITIESOFAUGUST2.2007_SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ EDUCATIONANDTHEWORKPLACE                                : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ SCIENCE.SPACE.ANDTECHNOLOGY                             : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ ETHICS                                                  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ DEFICITREDUCTION_JOINT.SELECT                           : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ASSISTANTMINORITYLEADER                                 : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ EVENTSSURROUNDINGTHE2012TERRORISTATTACKONBENGHAZI_SELECT: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ NA                                                      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ Majority                                                : Factor w/ 7 levels "0","1","2","3",..: 2 2 4 4 4 4 1 1 1 1 ...
 $ Minority                                                : Factor w/ 7 levels "0","1","2","3",..: 1 1 1 1 1 1 5 5 3 3 ...
 $ MinorityAddition                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ MajorityReplacement                                     : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ MinorityReplacement                                     : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 2 2 1 1 ...
 $ MajorityAddition                                        : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ OtherParty                                              : Factor w/ 2 levels "0","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ Republican                                              : Factor w/ 8 levels "0","1","2","3",..: 2 2 4 4 4 4 6 6 3 3 ...
 $ Democratic                                              : Factor w/ 8 levels "0","1","2","3",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Independent                                             : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...

and

str(HORElections2)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   3915 obs. of  5 variables:
 $ StateCD       : chr  "ALABAMA 1" "ALABAMA 1" "ALABAMA 1" "ALABAMA 1" ...
 $ year          : num  2002 2004 2006 2008 2010 ...
 $ candidatevotes: num  108102 161067 112944 210660 129063 ...
 $ totalvotes    : num  178687 255164 165841 214367 156281 ...
 $ VoteShare     : num  60.5 63.1 68.1 98.3 82.6 ...

I wanted to check whether the new data frame (CompleteData) has any missing (NA) values, using the following code:

which(is.na(CompleteData))
[1] 495145

However, the CompleteData data frame contains only 7,830 rows.

dim(CompleteData)
[1] 7830   65

Why does R return a row index that is far outside the range of rows in the data frame? Since 495,145 is greater than 7,830 (the number of rows in the data frame), does that mean there are no NAs in the data frame?

1 Answer

0 votes
/ January 15, 2020

To get the rows, you can do the following:

library(dplyr)  # %>% and mutate_all()
library(purrr)  # reduce()

# create a data frame that has one NA in rows 1 and 3 and two in row 4
df1 <- data.frame(a = c(1, 2, NA, NA),
                  b = c(NA, 2, 3, NA))

# now...
df1 %>%                    # take the data frame
    mutate_all(is.na) %>%  # turn every column into a logical that tells whether the value is NA
    reduce(`|`)            # then combine the columns one after another with the OR function

This gives you a logical vector that is TRUE for every row in which at least one column is NA. If you want the row indices instead, you can add which():

df1 %>%
    mutate_all(is.na) %>%
    reduce(`|`) %>%
    which()
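
As for why the original call returned 495145: as far as I know, is.na() on a data frame returns a logical matrix, and which() on a matrix reports positions counted down the columns (column-major), not row numbers. The sketch below shows this on the small df1 example and then decodes the value from the question, assuming the 7830 x 65 shape reported by dim(CompleteData). The variable names are only for illustration.

```r
# Small example data frame (same as df1 above)
df1 <- data.frame(a = c(1, 2, NA, NA),
                  b = c(NA, 2, 3, NA))

# is.na() on a data frame returns a logical matrix of the same dimensions
na_mat <- is.na(df1)

# which() on a matrix returns linear positions, counted column by column
linear_idx <- which(na_mat)              # 3 4 5 8

# arr.ind = TRUE returns row/column pairs instead of linear positions
na_pos <- which(na_mat, arr.ind = TRUE)  # rows 3, 4, 1, 4 in cols 1, 1, 2, 2

# Base R alternative for "rows with at least one NA"
na_rows <- which(rowSums(na_mat) > 0)    # 1 3 4

# Decode the linear index from the question by hand,
# assuming 7830 rows as reported by dim(CompleteData)
idx    <- 495145
n_rows <- 7830
row_i  <- (idx - 1) %% n_rows + 1        # 1855
col_j  <- (idx - 1) %/% n_rows + 1       # 64

# arrayInd() does the same conversion given the full dimensions
pos <- arrayInd(idx, .dim = c(7830, 65))
```

So the single value returned by which() means CompleteData does contain one NA: it would sit in row 1855 of column 64, which is VoteShare according to the str() output in the question. On the real data, which(is.na(CompleteData), arr.ind = TRUE) should report that row and column directly.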
...