How many NAs can be tolerated for a median estimate?
Source: R/median-count-tolerable.R
median_count_tolerable.Rd
median_count_tolerable() returns the number of missing values that can be preserved while determining the median. The point is to retain as many data points as possible, instead of simply ignoring all NAs.
This is only based on the number of known values, not any NAs there might be.
It is used within median_table() to determine how many missing values need to be ignored.
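The counting idea can be illustrated with a minimal sketch. It is not naidem's implementation: count_tolerable_sketch() is a hypothetical helper, and the rule it uses is an assumption, namely that k unknown values are tolerable if the median lands on equal known values both when all k unknowns sort first and when they all sort last.

```r
# Hypothetical helper, not part of naidem: a sketch of the counting logic.
count_tolerable_sketch <- function(x) {
  # Keep only the known values, in sorted order.
  x <- sort(x[!is.na(x)])
  n <- length(x)
  tolerable <- 0L
  for (k in seq_len(max(n - 1L, 0L))) {
    n_total <- n + k
    # Lowest known index the median can touch: all `k` unknowns sort first.
    lo <- floor((n_total + 1) / 2) - k
    # Highest known index the median can touch: all `k` unknowns sort last.
    hi <- ceiling((n_total + 1) / 2)
    # Stop at the first `k` for which the median is no longer pinned down.
    if (lo < 1 || hi > n || x[lo] != x[hi]) {
      break
    }
    tolerable <- k
  }
  tolerable
}

count_tolerable_sketch(c(8, 8, 8, NA, NA))
count_tolerable_sketch(c(8, 9, 9, NA, NA, NA))
```

Under this assumption, the sketch depends only on the known values, so the number of NAs actually present in the input does not affect the result.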
Arguments
- x
Vector that can be ordered using sort().
- needs_prep
Logical. Ignore unless the function is used as a helper. See details.
Details
With the default needs_prep = TRUE, missing values will be removed from x, and x will be sorted. If this was already done elsewhere, setting needs_prep to FALSE is more efficient. Proceed with caution as this is not checked.
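As a rough illustration of the preparation described above, the following sketch removes missing values and sorts by hand before passing needs_prep = FALSE. The variable names are made up for this example, and the call assumes that naidem is installed and exports median_count_tolerable().

```r
x <- c(9, NA, 8, 9)

# Manual preparation: drop missing values, then sort.
x_prepped <- sort(x[!is.na(x)])

# Only call the function if naidem is available (an assumption of this sketch).
if (requireNamespace("naidem", quietly = TRUE)) {
  naidem::median_count_tolerable(x_prepped, needs_prep = FALSE)
}
```

Because the input is not checked, passing an unsorted vector or one that still contains NAs with needs_prep = FALSE may give a wrong count.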
Examples
# With two or fewer `NA`s, the median can only be `8`,
# so these `NA`s are tolerated:
median_count_tolerable(c(8, 8, 8, NA, NA))
#> [1] 2
# When adding a third `NA`, the median will be unknown.
# Compare using naidem's correct median function:
median2(c(8, 8, 8, NA, NA))
#> [1] 8
median2(c(8, 8, 8, NA, NA, NA))
#> [1] NA
# No `NA`s are tolerable here because
# a single one could change the median:
median_count_tolerable(c(8, 9, 9, NA, NA, NA))
#> [1] 0
# Here too, the median depends on the value behind `NA`,
# so the `NA` cannot be tolerated:
median_count_tolerable(c(8, 9, NA))
#> [1] 0