The key thing to understand here is that R treats an NA as a missing value
-- literally. If you subset with a logical vector that contains NA, R
doesn't know whether those rows should be kept or dropped, so it returns
them as all-NA rows by default. (And if you test an NA directly, say in
an if(), you get the error "missing value where TRUE/FALSE needed".)
You need to omit the NA values using:
nrow(na.omit(data[data$var,]))
This has the effect of omitting observations 3-5 below:
      x  y
1  TRUE  1
2  TRUE  2
3  TRUE NA
4 FALSE  3
5    NA  4
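To see both the problem and the fix in one place, here is a minimal sketch
that rebuilds the toy data frame above (the names data, x, and y are just
the ones used in this example):

```r
# Toy data frame matching the example above
data <- data.frame(x = c(TRUE, TRUE, TRUE, FALSE, NA),
                   y = c(1, 2, NA, 3, 4))

# Subsetting with a logical vector that contains NA keeps an all-NA row,
# so the naive count is inflated:
nrow(data[data$x, ])           # 4: rows 1-3 plus an all-NA row from row 5

# Dropping the NA rows first gives the intended count:
nrow(na.omit(data[data$x, ]))  # 2: rows 1 and 2 (row 3 lost to its NA in y)
```

Note that na.omit() also discards row 3, which has x TRUE but an NA in y
-- that is the caveat described below.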
That is, if an observation for which x is TRUE has an NA in any of the
*other* variables, that row will also be omitted from the count.
Also, note that data$var == TRUE is redundant in this case, since
data$var is already logical; and note that you should index by the
variable's name, not by a numeric column position. (You want to preserve
the indexing of observations against each other.)
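Two quick checks of those points, again on the toy data frame from the
example (one hedged alternative: which() on a logical vector silently
drops the NA positions, which is another way to sidestep the problem):

```r
data <- data.frame(x = c(TRUE, TRUE, TRUE, FALSE, NA),
                   y = c(1, 2, NA, 3, 4))

# For a logical column, comparing against TRUE changes nothing:
identical(data$x == TRUE, data$x)   # TRUE

# which() returns only the positions that are TRUE, dropping NAs,
# so the NA in x can no longer leak into the subset:
nrow(data[which(data$x), ])         # 3: rows 1-3
```

Note the which() count (3) differs from the na.omit() count (2), because
which() only looks at x and keeps row 3 despite its NA in y.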
Also, Allan's hack from yesterday works too...
Olivia.
On Sat, 20 Nov 2004, Andrew Eggers wrote:
I discovered an unpleasant R feature and wondered if anyone could
suggest a workaround or show me where I am going wrong.
I have been using logical statements and nrow() to extract the number
of observations that have certain characteristics.
for example, I thought the following statement
nrow(X[X[,1]==TRUE,])
would tell me how many rows of X had TRUE in the first column.
But it turns out that R considers anything that is TRUE or 1 _or_ NA
as TRUE. (The same is true when you say nrow(X[X[,1]==1,]).)
Can anyone tell me how to deal with this? Can anyone explain why R
would count NA as TRUE?
Andy
_______________________________________________
gov2001-l mailing list
gov2001-l(a)lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l