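I apologize in advance for the length of this question, but R is returning output that I cannot make sense of, so I want to provide as much data as possible. I have the following data frame: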
str(CompleteData)
'data.frame': 7830 obs. of 65 variables:
$ StateCD : chr "ALABAMA 1" "ALABAMA 1" "ALABAMA 1" "ALABAMA 1" ...
$ Year : num 2001 2002 2003 2004 2005 ...
$ Congress : Factor w/ 9 levels "107","108","109",..: 1 1 2 2 3 3 4 4 5 5 ...
$ AGRICULTURE : Factor w/ 3 levels "0","1","2": 1 1 2 2 2 2 2 2 1 1 ...
$ APPROPRIATIONS : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 2 2 2 2 ...
$ NATIONALSECURITY : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ FINANCIALSERVICES : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ BUDGET : Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 1 1 ...
$ EDUCATIONANDTHEWORKFORCE : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ENERGYANDCOMMERCE : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ INTERNATIONALRELATIONS : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ GOVERNMENTREFORMANDOVERSIGHT : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ HOUSEOVERSIGHT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ JUDICIARY : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ RESOURCES : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ TRANSPORTATIONANDINFRASTRUCTURE : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ RULES : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SCIENCE : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 1 1 1 1 ...
$ SMALLBUSINESS : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ STANDARDSOFOFFICIALCONDUCT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 2 2 ...
$ VETERANSAFFAIRS : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ WAYSANDMEANS : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ INTELLIGENCE_SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SELECTCOMMITTEEONHOMELANDSECURITY : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ LIBRARY_JOINT : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ PRINTING_JOINT : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ TAXATION_JOINT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ECONOMIC_JOINT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ MAJORITYWHIP : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ MAJORITYLEADER : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SPEAKER : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ MINORITYLEADER : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ MINORITYWHIP : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SCIENCEANDTECHNOLOGY : Factor w/ 3 levels "0","1","2": 1 1 2 2 1 1 2 2 1 1 ...
$ ARMEDSERVICES : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ GOVERNMENTREFORM : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ HOUSEADMINISTRATION : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ HOMELANDSECURITY : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ EDUCATIONANDLABOR : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ FOREIGNAFFAIRS : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ OVERSIGHTANDGOVERNMENTREFORM : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ NATURALRESOURCES : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ ENERGYINDEPENDENCEANDGLOBALWARMING_SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ INVESTIGATETHEVOTINGIRREGULARITIESOFAUGUST2.2007_SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ EDUCATIONANDTHEWORKPLACE : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SCIENCE.SPACE.ANDTECHNOLOGY : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ ETHICS : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ DEFICITREDUCTION_JOINT.SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ASSISTANTMINORITYLEADER : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ EVENTSSURROUNDINGTHE2012TERRORISTATTACKONBENGHAZI_SELECT: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ NA : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Majority : Factor w/ 7 levels "0","1","2","3",..: 2 2 4 4 4 4 1 1 1 1 ...
$ Minority : Factor w/ 7 levels "0","1","2","3",..: 1 1 1 1 1 1 5 5 3 3 ...
$ MinorityAddition : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ MajorityReplacement : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
$ MinorityReplacement : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 2 2 1 1 ...
$ MajorityAddition : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ OtherParty : Factor w/ 2 levels "0","2": 1 1 1 1 1 1 1 1 1 1 ...
$ Republican : Factor w/ 8 levels "0","1","2","3",..: 2 2 4 4 4 4 6 6 3 3 ...
$ Democratic : Factor w/ 8 levels "0","1","2","3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Independent : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ candidatevotes : num 0 108102 0 161067 0 ...
$ totalvotes : num 0 178687 0 255164 0 ...
$ VoteShare : num 0 60.5 0 63.1 0 ...
$ election : num 0 1 0 1 0 1 0 1 0 1 ...
This data frame was created by combining two other data frames with left_join. The code is shown below:
CompleteData <- Full_Congress %>%
  mutate(Year = as.character(Year),
         Year = as.numeric(Year),
         StateCD = as.character(StateCD)) %>%
  left_join(HORElections2, by = c("StateCD", "Year" = "year")) %>%
  mutate(election = ifelse(is.na(candidatevotes), 0, 1),
         candidatevotes = ifelse(election == 1, candidatevotes, 0),
         totalvotes = ifelse(election == 1, totalvotes, 0),
         VoteShare = ifelse(election == 1, VoteShare, 0))
The other two data frames have the following structure:
str(Full_Congress)
'data.frame': 7830 obs. of 61 variables:
$ StateCD : Factor w/ 459 levels "ALABAMA 1","ALABAMA 2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Year : Factor w/ 18 levels "2001","2002",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Congress : Factor w/ 9 levels "107","108","109",..: 1 1 2 2 3 3 4 4 5 5 ...
$ AGRICULTURE : Factor w/ 3 levels "0","1","2": 1 1 2 2 2 2 2 2 1 1 ...
$ APPROPRIATIONS : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 2 2 2 2 ...
$ NATIONALSECURITY : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ FINANCIALSERVICES : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ BUDGET : Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 1 1 ...
$ EDUCATIONANDTHEWORKFORCE : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ENERGYANDCOMMERCE : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ INTERNATIONALRELATIONS : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ GOVERNMENTREFORMANDOVERSIGHT : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ HOUSEOVERSIGHT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ JUDICIARY : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ RESOURCES : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ TRANSPORTATIONANDINFRASTRUCTURE : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ RULES : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SCIENCE : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 1 1 1 1 ...
$ SMALLBUSINESS : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ STANDARDSOFOFFICIALCONDUCT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 2 2 ...
$ VETERANSAFFAIRS : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ WAYSANDMEANS : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ INTELLIGENCE_SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SELECTCOMMITTEEONHOMELANDSECURITY : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ LIBRARY_JOINT : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ PRINTING_JOINT : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ TAXATION_JOINT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ECONOMIC_JOINT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ MAJORITYWHIP : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ MAJORITYLEADER : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SPEAKER : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ MINORITYLEADER : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ MINORITYWHIP : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SCIENCEANDTECHNOLOGY : Factor w/ 3 levels "0","1","2": 1 1 2 2 1 1 2 2 1 1 ...
$ ARMEDSERVICES : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ GOVERNMENTREFORM : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ HOUSEADMINISTRATION : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ HOMELANDSECURITY : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ EDUCATIONANDLABOR : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ FOREIGNAFFAIRS : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ OVERSIGHTANDGOVERNMENTREFORM : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ NATURALRESOURCES : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ ENERGYINDEPENDENCEANDGLOBALWARMING_SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ INVESTIGATETHEVOTINGIRREGULARITIESOFAUGUST2.2007_SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ EDUCATIONANDTHEWORKPLACE : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ SCIENCE.SPACE.ANDTECHNOLOGY : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ ETHICS : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ DEFICITREDUCTION_JOINT.SELECT : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ASSISTANTMINORITYLEADER : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ EVENTSSURROUNDINGTHE2012TERRORISTATTACKONBENGHAZI_SELECT: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ NA : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Majority : Factor w/ 7 levels "0","1","2","3",..: 2 2 4 4 4 4 1 1 1 1 ...
$ Minority : Factor w/ 7 levels "0","1","2","3",..: 1 1 1 1 1 1 5 5 3 3 ...
$ MinorityAddition : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ MajorityReplacement : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
$ MinorityReplacement : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 2 2 1 1 ...
$ MajorityAddition : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ OtherParty : Factor w/ 2 levels "0","2": 1 1 1 1 1 1 1 1 1 1 ...
$ Republican : Factor w/ 8 levels "0","1","2","3",..: 2 2 4 4 4 4 6 6 3 3 ...
$ Democratic : Factor w/ 8 levels "0","1","2","3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Independent : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
and
str(HORElections2)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3915 obs. of 5 variables:
$ StateCD : chr "ALABAMA 1" "ALABAMA 1" "ALABAMA 1" "ALABAMA 1" ...
$ year : num 2002 2004 2006 2008 2010 ...
$ candidatevotes: num 108102 161067 112944 210660 129063 ...
$ totalvotes : num 178687 255164 165841 214367 156281 ...
$ VoteShare : num 60.5 63.1 68.1 98.3 82.6 ...
I want to test whether the new data frame (CompleteData) contains any missing (NA) values, using the following code:
which(is.na(CompleteData))
[1] 495145
However, the CompleteData data frame only has 7,830 rows.
dim(CompleteData)
[1] 7830 65
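Why is R returning an index that is far beyond the range of rows in the data frame? Since 495,145 is greater than 7,830 (the number of rows in the data frame), does that mean there are no NAs in the data frame?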
To get the rows, you can do the following:
library(dplyr)  # provides %>% and mutate_all()
library(purrr)  # provides reduce()
# create a data frame that has one NA in rows 1 and 3 and two in row 4
df1 <- data.frame(a = c(1, 2, NA, NA),
                  b = c(NA, 2, 3, NA))
# now...
df1 %>%                   # take the data frame
  mutate_all(is.na) %>%   # turn every column into a logical that tells whether the value is NA
  reduce(`|`)             # and then reduce one column after another using the OR function
This gives you a logical vector that is TRUE for every row in which at least one column is NA. If you need the indices, you can add which():
df1 %>%
mutate_all(is.na) %>%
reduce(`|`) %>%
which()
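As for why the index in the question is larger than the number of rows: is.na() applied to a data frame returns a logical matrix, and which() on a matrix returns linear positions counted column by column, not row numbers. So a result such as 495145 does not mean there are no NAs. It points at one NA cell. The base R sketch below reuses the toy df1 from above and then decodes the index from the question with arrayInd(), assuming the dimensions reported by dim(CompleteData). The variable names (na_matrix, linear_idx and so on) are only illustrative.

```r
# Same toy data frame as above: one NA in rows 1 and 3, two in row 4
df1 <- data.frame(a = c(1, 2, NA, NA),
                  b = c(NA, 2, 3, NA))

# is.na() on a data frame returns a logical matrix with the same dimensions
na_matrix <- is.na(df1)

# Without arr.ind, which() numbers the cells column by column,
# so positions can exceed nrow(df1)
linear_idx <- which(na_matrix)

# With arr.ind = TRUE, each NA cell is reported as a (row, col) pair
na_cells <- which(na_matrix, arr.ind = TRUE)

# Rows that contain at least one NA, using base R only
na_rows <- which(rowSums(na_matrix) > 0)

# Decode the single linear index from the question,
# assuming a 7830 x 65 data frame as shown by dim(CompleteData)
question_cell <- arrayInd(495145, .dim = c(7830, 65))
```

With those dimensions, question_cell works out to row 1855, column 64, which is VoteShare according to the str() listing in the question. Running which(is.na(CompleteData), arr.ind = TRUE) on the real data should confirm this directly.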