The key thing to understand here is that R treats an NA as a missing value
-- literally. If you subset with a logical vector that contains NA, R
doesn't know whether those rows should be kept or dropped, so it returns
them as all-NA rows by default. (And if you test an NA directly, say in
an if(), you get the error "missing value where TRUE/FALSE needed".)
You need to omit the NA values using:
nrow(na.omit(data[data$var,]))
This has the effect of omitting observations 3-5 below:
      x  y
1  TRUE  1
2  TRUE  2
3  TRUE NA
4 FALSE  3
5    NA  4
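To see both the problem and the fix in one place, here is a minimal sketch
that rebuilds the toy data frame above (the names data, x, and y are just
the ones used in this example):

```r
# Toy data frame matching the example above
data <- data.frame(x = c(TRUE, TRUE, TRUE, FALSE, NA),
                   y = c(1, 2, NA, 3, 4))

# Subsetting with a logical vector that contains NA keeps an all-NA row,
# so the naive count is inflated:
nrow(data[data$x, ])           # 4: rows 1-3 plus an all-NA row from row 5

# Dropping the NA rows first gives the intended count:
nrow(na.omit(data[data$x, ]))  # 2: rows 1 and 2 (row 3 lost to its NA in y)
```

Note that na.omit() also discards row 3, which has x TRUE but an NA in y
-- that is the caveat described below.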
That is, if an observation for which x is TRUE has an NA in any of the
*other* variables, that row will also be omitted from the count.
Also, note that data$var == TRUE is redundant in this case, since
data$var is already logical; and note that you should index by the
variable's name, not by a numeric column position. (You want to preserve
the indexing of observations against each other.)
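Two quick checks of those points, again on the toy data frame from the
example (one hedged alternative: which() on a logical vector silently
drops the NA positions, which is another way to sidestep the problem):

```r
data <- data.frame(x = c(TRUE, TRUE, TRUE, FALSE, NA),
                   y = c(1, 2, NA, 3, 4))

# For a logical column, comparing against TRUE changes nothing:
identical(data$x == TRUE, data$x)   # TRUE

# which() returns only the positions that are TRUE, dropping NAs,
# so the NA in x can no longer leak into the subset:
nrow(data[which(data$x), ])         # 3: rows 1-3
```

Note the which() count (3) differs from the na.omit() count (2), because
which() only looks at x and keeps row 3 despite its NA in y.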
Also, Allan's hack from yesterday works too...
Olivia.
On Sat, 20 Nov 2004, Andrew Eggers wrote:
I discovered an unpleasant R feature and wondered if anyone could
suggest a workaround or show me where I am going wrong.
I have been using logical statements and nrow() to extract the number
of observations that have certain characteristics.
for example, I thought the following statement
nrow(X[X[,1]==TRUE,])
would tell me how many rows of X had TRUE in the first column.
But it turns out that R considers anything that is TRUE or 1 _or_ NA
as TRUE. (The same is true when you say nrow(X[X[,1]==1,]).)
Can anyone tell me how to deal with this? Can anyone explain why R
would count NA as TRUE?
Andy
_______________________________________________
gov2001-l mailing list
gov2001-l(a)lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l