How many NAs can be tolerated for a median estimate? — median_count_tolerable

median_count_tolerable() returns the number of missing values that can be preserved while determining the median. The point is to retain as many data points as possible, instead of simply ignoring all NAs.

The result is based only on the known values in x, not on how many NAs x actually contains.

It is used within median_table() to determine how many missing values need to be ignored.

Usage

median_count_tolerable(x, needs_prep = TRUE)

Arguments

x: Vector that can be ordered using sort().
needs_prep: Logical. Ignore unless the function is used as a helper. See details.

Value

Integer (length 1). Never NA, never negative.

Details

With the default needs_prep = TRUE, missing values are removed from x, and x is sorted. If both steps were already done elsewhere, setting needs_prep to FALSE is more efficient. Proceed with caution: the function does not check whether x was actually prepared this way.
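As a minimal sketch of the helper use case, the preparation can be done by hand before calling the function with needs_prep = FALSE. The variable names x and x_prepped are illustrative only. The call is guarded so that the snippet also runs where naidem is not installed.

```r
x <- c(8, NA, 8, NA, 8)

# Manual preparation: `sort()` drops `NA`s by default (`na.last = NA`)
# and returns the known values in ascending order.
x_prepped <- sort(x)

# Skip the internal preparation because it was already done above.
if (requireNamespace("naidem", quietly = TRUE)) {
  naidem::median_count_tolerable(x_prepped, needs_prep = FALSE)
}
```

Going by the description above, this should give the same result as calling median_count_tolerable(x) with the default.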

Examples

# With two or fewer `NA`s, the median can only be `8`,
# so these `NA`s are tolerated:
median_count_tolerable(c(8, 8, 8, NA, NA))
#> [1] 2

# When adding a third `NA`, the median will be unknown.
# Compare using naidem's correct median function:
median2(c(8, 8, 8, NA, NA))
#> [1] 8
median2(c(8, 8, 8, NA, NA, NA))
#> [1] NA

# No `NA`s are tolerable here because
# a single one could change the median:
median_count_tolerable(c(8, 9, 9, NA, NA, NA))
#> [1] 0

# Here too, the median depends on the value behind `NA`,
# so the `NA` cannot be tolerated:
median_count_tolerable(c(8, 9, NA))
#> [1] 0
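
The reasoning behind these examples can be sketched in base R. This is an illustration under stated assumptions, not naidem's actual implementation: count_tolerable_sketch() is a hypothetical helper. It assumes that k missing values are tolerable if the median stays the same whether all k of them fall below or above every known value.

```r
# Hypothetical helper for illustration; not part of naidem.
count_tolerable_sketch <- function(x) {
  x <- sort(x)  # drops `NA`s by default and orders the known values
  n <- length(x)
  k <- 0L
  repeat {
    k_next <- k + 1L
    n_total <- n + k_next
    # Positions of the one or two central values in a vector
    # of length `n_total`:
    pos_lower <- floor((n_total + 1) / 2)
    pos_upper <- ceiling((n_total + 1) / 2)
    # Lowest index among the known values that can end up in the center
    # (all `NA`s below the known values) and highest such index
    # (all `NA`s above them):
    idx_low <- pos_lower - k_next
    idx_high <- pos_upper
    # Stop once the center can leave the known values, or once the
    # known values at the two extremes differ.
    if (idx_low < 1L || idx_high > n || x[idx_low] != x[idx_high]) {
      break
    }
    k <- k_next
  }
  k
}

count_tolerable_sketch(c(8, 8, 8, NA, NA))
#> [1] 2
count_tolerable_sketch(c(8, 9, 9, NA, NA, NA))
#> [1] 0
count_tolerable_sketch(c(8, 9, NA))
#> [1] 0
```

Because x is sorted, equal values at idx_low and idx_high mean that every known value between them is equal as well, so the central values cannot change wherever the missing values fall.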