Drop-in replacement for median()

median2() computes the sample median. By default, it works like median() from base R, with these exceptions:

If some values are missing, median2() checks if the median can still be determined. median() always returns NA in this case, but median2() only returns NA if the median is genuinely unknown.
You can choose to ignore only a certain number of missing values using the na.rm.amount and na.rm.from arguments.
Strings, factors, and all other data that can be ordered by sort() are allowed. However, non-numeric data, including dates and factors, require either even = "low" or even = "high". This avoids "computing the mean" of the two central values of an even-length sorted vector when no such operation is possible, e.g., with strings.
The return type is always double if the input vector is numeric (i.e., double or integer), for both even and odd lengths.
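
The effect of even = "low" and even = "high" on data that cannot be averaged can be sketched in base R. This is only an illustration of the idea, not the code behind median2(): the helper name central_value() is hypothetical, and the sketch assumes a non-empty input without missing values.

```r
# Hypothetical helper (not part of any package): pick one of the two
# central values of a sorted vector instead of averaging them.
central_value <- function(x, even = c("low", "high")) {
  even <- match.arg(even)
  sorted <- sort(x)
  n <- length(sorted)
  # Position of the lower central value (the only one if n is odd)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L || even == "low") {
    sorted[half]
  } else {
    sorted[half + 1L]
  }
}

letters4 <- c("d", "a", "c", "b")
central_value(letters4, even = "low")
#> [1] "b"
central_value(letters4, even = "high")
#> [1] "c"
```

Because one existing element is returned as-is, the result keeps the type of the input, which is why this approach also works for strings.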

Usage

median2(
  x,
  na.rm = FALSE,
  na.rm.amount = 0,
  na.rm.from = c("first", "last", "random"),
  even = c("mean", "low", "high"),
  ...
)

# Default S3 method
median2(
  x,
  na.rm = FALSE,
  na.rm.amount = 0,
  na.rm.from = c("first", "last", "random"),
  even = c("mean", "low", "high"),
  ...
)

Arguments

x: Vector that can be ordered using sort(). It will be searched for its median.
na.rm: Logical. If set to TRUE, missing values are removed before computation proceeds. Default is FALSE.
na.rm.amount: Numeric. Alternative to na.rm that only removes a specified number of missing values. Default is 0.
na.rm.from: String. If na.rm.amount is used, from which position in x should missing values be removed? Options are "first", "last", and "random". Default is "first".
even: String. What to return if x has an even length and contains no missing values (or they were removed). The default, "mean", averages the two central values of the sorted vector, "low" returns the lower central value, and "high" returns the higher one. Note that "mean" is only allowed if x is numeric.
...: Optional further arguments for methods. Not used in the default method.
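
As a rough illustration of what na.rm.amount and na.rm.from describe, the base R sketch below drops a limited number of missing values from a chosen end of the vector. The helper drop_some_na() is hypothetical and reflects one reading of the argument descriptions above, not the package's internals. In particular, capping the amount at the number of available NAs is an assumption.

```r
# Hypothetical helper (not part of any package): remove at most
# `amount` missing values, counted from the chosen position.
drop_some_na <- function(x, amount = 0, from = c("first", "last", "random")) {
  from <- match.arg(from)
  na_idx <- which(is.na(x))
  # Assumption: never try to remove more NAs than are present
  amount <- min(amount, length(na_idx))
  if (amount == 0) {
    return(x)
  }
  drop <- switch(from,
    first = head(na_idx, amount),
    last = tail(na_idx, amount),
    random = na_idx[sample.int(length(na_idx), amount)]
  )
  x[-drop]
}

x <- c(NA, 1, NA, 2, NA)
drop_some_na(x, amount = 2, from = "first")
#> [1]  1  2 NA
drop_some_na(x, amount = 2, from = "last")
#> [1] NA  1  2
```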

Value

Length-1 vector of type double if the input is numeric, and of the same type as x otherwise. Whether the input counts as numeric is tested by is.numeric(), so factors and dates do not count as numeric.
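
For comparison, the base R calls below show the behavior that this return-type rule is set against. They use only median() and is.numeric() from base R; the variable names are just for illustration.

```r
# Base R `median()` keeps the integer type for odd lengths
# but returns a double for even lengths:
odd_type <- typeof(median(1:3))
even_type <- typeof(median(1:4))
odd_type
#> [1] "integer"
even_type
#> [1] "double"

# Factors and dates are not numeric according to `is.numeric()`:
factor_is_numeric <- is.numeric(factor(c("a", "b")))
date_is_numeric <- is.numeric(as.Date("2020-01-01"))
factor_is_numeric
#> [1] FALSE
date_is_numeric
#> [1] FALSE
```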

Details

The main point of median2() is to handle missing values correctly. For the motivation behind the other differences from median(), see Tidy design principles.

median2() is a generic function, so new methods can be defined for it. As with stats::median() from base R, the default method described here should work for most classes for which a median is a reasonable concept (e.g., "Date").

If a new method is necessary, please make sure it deals with missing values like the default method does. See Implementing the algorithm for further details.
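
To give an idea of what dealing with missing values can involve, here is a minimal base R sketch of one way to check whether the median is still determined. It is not the package's implementation: the helper name median_if_determined() is hypothetical, and the sketch assumes numeric input. The reasoning is that the missing values could all be smaller or all be larger than the known values, and the median is only known if both extremes lead to the same central value.

```r
# Hypothetical helper (not part of any package), numeric input only.
median_if_determined <- function(x) {
  n_total <- length(x)
  known <- sort(x[!is.na(x)])
  n_na <- n_total - length(known)
  if (n_na == 0L) {
    return(as.double(stats::median(known)))
  }
  # Positions of the central value(s) in the full sorted vector;
  # both are the same position if the length is odd.
  lower_pos <- (n_total + 1L) %/% 2L
  upper_pos <- n_total %/% 2L + 1L
  # If all NAs were the smallest values, the lower central value would
  # be `known[lower_pos - n_na]`; if all were the largest values, the
  # upper central value would be `known[upper_pos]`.
  lo_index <- lower_pos - n_na
  hi_index <- upper_pos
  if (lo_index < 1L || hi_index > length(known)) {
    return(NA_real_)
  }
  if (known[lo_index] == known[hi_index]) {
    as.double(known[lo_index])
  } else {
    NA_real_
  }
}

median_if_determined(c(0, 1, 1, 1, NA))
#> [1] 1
median_if_determined(c(0, 1, 1, 1, NA, NA))
#> [1] NA
```

A method for a new class would need an equivalent check that uses whatever ordering and equality comparison make sense for that class.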

Author

Lukas Jung, R Core Team

Examples

# If no values are missing,
# it works mostly like `median()`:
median(1:4)
#> [1] 2.5
median2(1:4)
#> [1] 2.5

median(c(1:3, 100, 1000))
#> [1] 3
median2(c(1:3, 100, 1000))
#> [1] 3

# With some `NA`s, the median can
# sometimes still be determined...
median2(c(0, 1, 1, 1, NA))
#> [1] 1
median2(c(0, 0, NA, 0, 0, NA, NA))
#> [1] 0

# ...unless there are too many `NA`s...
median2(c(0, 1, 1, 1, NA, NA))
#> [1] NA

# ...or too many unique values:
median2(c(0, 1, 2, 3, NA))
#> [1] NA