median_table() computes the sample median. If the median cannot be determined because of missing values, it ignores only as many of them as necessary. This yields a true median estimate of the remaining values, known and unknown, while preserving as much data as possible.
Each estimate is reported together with its lower and upper bounds, the number of missing values that had to be ignored, and related counts and rates.
The function can also take a data frame (or another list) of numeric vectors. It will then compute the median of each element.
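The idea of ignoring only as many missing values as necessary can be illustrated in base R. The following is a minimal sketch, not the package's implementation: sketch_na_ignored() is a hypothetical helper, and it assumes finite known values and the averaging of two middle values that base median() performs (which should correspond to even = "mean"). It treats all retained NAs as either smaller or larger than every known value and drops one NA at a time until both extremes give the same median.

```r
# Hypothetical helper for illustration only; not part of the package.
# Counts how many NAs must be dropped before the median is determined.
sketch_na_ignored <- function(x) {
  known <- x[!is.na(x)]
  n_na <- sum(is.na(x))
  # Without any known value, no median can be determined.
  if (length(known) == 0L) {
    return(NA_integer_)
  }
  for (k in 0:n_na) {
    n_kept <- n_na - k
    # Extreme scenarios: every retained NA is smaller (-Inf) or
    # larger (Inf) than all known values.
    lower <- median(c(rep(-Inf, n_kept), known))
    upper <- median(c(known, rep(Inf, n_kept)))
    # If both scenarios agree, the median no longer depends on the NAs.
    if (lower == upper) {
      return(k)
    }
  }
}

sketch_na_ignored(c(5, 23, 5, NA, 5, NA))  # 1
sketch_na_ignored(c(4, 4, NA, NA, NA, NA)) # 3
```

Under these assumptions, the counts agree with the na_ignored values shown for the same vectors in the Examples section.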
Usage
median_table(x, even = c("mean", "low", "high"), ...)
Value
Data frame with these columns:
term: the names of x elements.
estimate: the medians of x elements, ignoring as many NAs as necessary.
certainty: TRUE if the corresponding estimate is certain to be the true median, and FALSE if this is unclear due to missing values.
lower, upper: Bounds of the median. Equal if certainty is TRUE because in that case, the precise value is known.
na_ignored: the number of missing values that had to be ignored to arrive at the estimate.
na_total: the total number of missing values.
rate_ignored_na: the proportion of missing values that had to be ignored from among all missing values.
sum_total: the total number of values, missing or not.
rate_ignored_sum: the proportion of missing values that had to be ignored from among all values, missing or not.
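To make the lower, upper, and certainty columns more concrete, here is a rough base R sketch. It is an assumption about how such bounds can be derived, not the package's own code: sketch_bounds() is a hypothetical helper that places all NAs below or above the known values, and it reports an infinite bound as NA because the data then do not constrain the median on that side. The two rate columns are shown as plain divisions that follow the definitions above.

```r
# Hypothetical helper for illustration only; not part of the package.
sketch_bounds <- function(x) {
  known <- x[!is.na(x)]
  n_na <- sum(is.na(x))
  # Smallest possible median: all NAs lie below the known values.
  lower <- median(c(rep(-Inf, n_na), known))
  # Largest possible median: all NAs lie above the known values.
  upper <- median(c(known, rep(Inf, n_na)))
  # An infinite bound carries no information, so report it as NA.
  if (is.infinite(lower)) lower <- NA_real_
  if (is.infinite(upper)) upper <- NA_real_
  # The median is certain only if both bounds are known and equal.
  list(lower = lower, upper = upper, certainty = isTRUE(lower == upper))
}

sketch_bounds(c(96, 24, 3, NA))  # lower 13.5, upper 60, certainty FALSE
sketch_bounds(c(1, 1, NA))       # lower 1, upper 1, certainty TRUE

# Rates for c(5, 23, 5, NA, 5, NA), where one of two NAs must be ignored:
na_ignored <- 1
na_total <- 2
sum_total <- 6
rate_ignored_na <- na_ignored / na_total    # 0.5
rate_ignored_sum <- na_ignored / sum_total  # 1/6
```

Note that plain division returns NaN when na_total is 0, whereas the function's output shows a rate_ignored_na of 0 for vectors without missing values, so that case appears to be handled separately.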
See also
median_count_tolerable() for the logic of preserving as many NAs as possible.
median_bounds() for the lower and upper columns, which are the bounds of an uncertain median.
median_plot_errorbar() and median_plot_col() for follow-up visualizations.
Examples
median_table(c(5, 23, 5, NA, 5, NA))
#> # A tibble: 1 × 10
#>   term  estimate certainty lower upper na_ignored na_total rate_ignored_na
#>   <chr>    <dbl> <lgl>     <dbl> <dbl>      <int>    <int>           <dbl>
#> 1 ""           5 FALSE         5    14          1        2             0.5
#> # ℹ 2 more variables: sum_total <int>, rate_ignored_sum <dbl>
# Use a list of numeric vectors:
my_list <- list(
  a = 1:15,
  b = c(1, 1, NA),
  c = c(4, 4, NA, NA, NA, NA),
  d = c(96, 24, 3, NA)
)
median_table(my_list)
#> # A tibble: 4 × 10
#>   term  estimate certainty lower upper na_ignored na_total rate_ignored_na
#>   <chr>    <dbl> <lgl>     <dbl> <dbl>      <int>    <int>           <dbl>
#> 1 a            8 TRUE        8       8          0        0            0
#> 2 b            1 TRUE        1       1          0        1            0
#> 3 c            4 FALSE      NA      NA          3        4            0.75
#> 4 d           24 FALSE      13.5    60          1        1            1
#> # ℹ 2 more variables: sum_total <int>, rate_ignored_sum <dbl>
# Data frames are allowed:
median_table(iris[1:4])
#> # A tibble: 4 × 10
#>   term        estimate certainty lower upper na_ignored na_total rate_ignored_na
#>   <chr>          <dbl> <lgl>     <dbl> <dbl>      <int>    <int>           <dbl>
#> 1 Sepal.Leng…     5.8  TRUE       5.8   5.8           0        0               0
#> 2 Sepal.Width     3    TRUE       3     3             0        0               0
#> 3 Petal.Leng…     4.35 TRUE       4.35  4.35          0        0               0
#> 4 Petal.Width     1.3  TRUE       1.3   1.3           0        0               0
#> # ℹ 2 more variables: sum_total <int>, rate_ignored_sum <dbl>