median2() computes the sample median. By default, it works like median() from base R, with these exceptions:
- If some values are missing, median2() checks if the median can still be determined. median() always returns NA in this case, but median2() only returns NA if the median is genuinely unknown.
- You can opt to only ignore a certain number of missing values using the na.rm.amount and na.rm.from arguments.
- Strings, factors and all other data that can be ordered by sort() are allowed. However, non-numeric data, including dates and factors, require one of even = "low" and even = "high". This avoids "computing the mean" of the two central values of sorted vectors with an even length when no such operation is possible, e.g., with strings.
- The return type is always double if the input vector is numeric (i.e., double or integer), for both even and odd lengths.
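To see why averaging cannot apply to non-numeric data, consider a character vector of even length. The base R sketch below does not call median2(); it only picks out the two central values by hand, which is what even = "low" and even = "high" are described as returning. The variable names are illustrative.

```r
# Base R only; an illustration, not the package's implementation.
x <- sort(c("b", "d", "a", "c"))
n <- length(x)

lower_central  <- x[n / 2]      # analogous to even = "low"
higher_central <- x[n / 2 + 1]  # analogous to even = "high"

lower_central
higher_central

# Averaging the two is not possible for strings:
# mean() warns and returns NA for character input.
suppressWarnings(mean(c(lower_central, higher_central)))
```

Returning one of the two central values keeps the result a genuine element of the input, which is the only meaningful option when the data cannot be averaged.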
Arguments
- x
  Vector that can be ordered using sort(). It will be searched for its median.
- na.rm
  Logical. If set to TRUE, missing values are removed before computation proceeds. Default is FALSE.
- na.rm.amount
  Numeric. Alternative to na.rm that only removes a specified number of missing values. Default is 0.
- na.rm.from
  String. If na.rm.amount is used, from which position in x should missing values be removed? Options are "first", "last", and "random". Default is "first".
- even
  String. What to return if x has an even length and contains no missing values (or they were removed). The default, "mean", averages the two central values of the sorted vector, "low" returns the lower central value, and "high" returns the higher one. Note that "mean" is only allowed if x is numeric.
- ...
  Optional further arguments for methods. Not used in the default method.
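As a rough illustration of what na.rm.amount and na.rm.from describe, the base R sketch below removes a limited number of missing values from one end of a vector. The helper drop_some_nas() is hypothetical and may differ from what the package does internally; the "random" option is left out for brevity.

```r
# Hypothetical helper (base R only): drop up to `amount` NAs from `x`,
# counting from the start ("first") or from the end ("last").
drop_some_nas <- function(x, amount, from = c("first", "last")) {
  from <- match.arg(from)
  na_pos <- which(is.na(x))
  if (from == "last") {
    na_pos <- rev(na_pos)
  }
  # Never try to drop more NAs than there are
  drop <- na_pos[seq_len(min(amount, length(na_pos)))]
  if (length(drop) == 0L) {
    return(x)
  }
  x[-drop]
}

x <- c(NA, 4, NA, 7, NA)
drop_some_nas(x, amount = 1, from = "first")  # removes the NA at position 1
drop_some_nas(x, amount = 2, from = "last")   # removes the NAs at positions 5 and 3
```

Any missing values that remain after this step would still have to be handled when the median is computed.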
Value
Length-1 vector of type double if the input is numeric, and the same type as x otherwise. This is tested by is.numeric(), so factors and dates do not count as numeric.
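The base R calls below illustrate the background to this section: median() from base R returns an integer for odd-length integer input but a double for even-length input, and is.numeric() is FALSE for factors and dates. Nothing here calls median2().

```r
# Base R only. median() changes its return type with the length of integer input:
typeof(median(1:3))  # "integer"
typeof(median(1:4))  # "double"

# is.numeric() is TRUE for integers and doubles, but not for factors or dates:
is.numeric(1:3)
is.numeric(c(0.5, 1.5))
is.numeric(factor(c("a", "b")))
is.numeric(as.Date("2020-01-01"))
```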
Details
The main point of median2() is to handle missing values correctly. For the motivation behind the other differences from median(), see Tidy design principles.

median2() is a generic function, so new methods can be defined for it. As with stats::median() from base R, the default method described here should work for most classes for which a median is a reasonable concept (e.g., "Date").

If a new method is necessary, please make sure it deals with missing values like the default method does. See Implementing the algorithm for further details.
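One way to reason about "the median can still be determined" is to ask whether the central value(s) would be the same if all missing values were extremely small or all extremely large. The sketch below follows that idea for numeric input in base R. The function median_if_determined() is hypothetical and is not the package's implementation, although it agrees with the outputs shown in the Examples section.

```r
# Hypothetical helper (base R only): return the median of numeric `x` if it is
# the same whatever the missing values are, and NA otherwise.
median_if_determined <- function(x) {
  n <- length(x)
  known <- sort(x)          # sort() drops NAs by default
  k <- n - length(known)    # number of missing values
  if (k == 0L) {
    return(as.numeric(median(x)))
  }
  # Positions of the central value(s) in the full sorted vector
  lo <- floor((n + 1) / 2)
  hi <- ceiling((n + 1) / 2)
  # If every NA were very small, the lower central value would be known[lo - k];
  # if every NA were very large, the upper central value would be known[hi].
  if (lo - k < 1 || hi > length(known)) {
    return(NA_real_)
  }
  # The median is fixed only if these two extremes coincide.
  if (known[lo - k] == known[hi]) {
    return(as.numeric(known[hi]))
  }
  NA_real_
}

median_if_determined(c(0, 1, 1, 1, NA))      # 1
median_if_determined(c(0, 1, 1, 1, NA, NA))  # NA
median_if_determined(c(0, 1, 2, 3, NA))      # NA
```

Because the known values are sorted, equality of the two extreme candidates means every value between them is equal too, so no placement of the missing values can change the result.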
Examples
# If no values are missing,
# it works mostly like `median()`:
median(1:4)
#> [1] 2.5
median2(1:4)
#> [1] 2.5
median(c(1:3, 100, 1000))
#> [1] 3
median2(c(1:3, 100, 1000))
#> [1] 3
# With some `NA`s, the median can
# sometimes still be determined...
median2(c(0, 1, 1, 1, NA))
#> [1] 1
median2(c(0, 0, NA, 0, 0, NA, NA))
#> [1] 0
# ...unless there are too many `NA`s...
median2(c(0, 1, 1, 1, NA, NA))
#> [1] NA
# ...or too many unique values:
median2(c(0, 1, 2, 3, NA))
#> [1] NA