median_table() computes the sample median. If missing values leave the median unknown, it ignores only as many of them as necessary to determine it. This yields a median estimate for the remaining values, known and unknown, while preserving as much data as possible.

Each estimate is presented along with its lower and upper bounds, the number of missing values that had to be ignored, and related counts and rates.

The function can also take a data frame (or another list) of numeric vectors. It will then compute the median of each element.
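To make the idea of lower and upper bounds concrete, here is a minimal base R sketch. It is not the package's implementation: the helper name median_bounds_sketch() is hypothetical, and the sketch assumes a numeric vector with finite known values where the two central values are averaged for even lengths. Each bound is the median that results when all missing values are treated as extremely low or extremely high. A bound that cannot be pinned down this way is returned as NA.

```r
# Hypothetical helper, for illustration only (not part of the package).
median_bounds_sketch <- function(x) {
  stopifnot(is.numeric(x))
  # Worst case 1: every missing value lies below all known values.
  lo <- x
  lo[is.na(lo)] <- -Inf
  # Worst case 2: every missing value lies above all known values.
  hi <- x
  hi[is.na(hi)] <- Inf
  bounds <- c(lower = median(lo), upper = median(hi))
  # An infinite result means the bound depends entirely on missing values.
  bounds[!is.finite(bounds)] <- NA_real_
  bounds
}

median_bounds_sketch(c(5, 23, 5, NA, 5, NA)) # lower 5, upper 14
median_bounds_sketch(c(96, 24, 3, NA))       # lower 13.5, upper 60
```

If both bounds coincide, the median is known despite the missing values, as with c(1, 1, NA).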

Usage

median_table(x, even = c("mean", "low", "high"), ...)

Arguments

x

Vector that can be ordered using sort(), or a list of such vectors; e.g., a data frame.

even

Passed on to median2().

...

Optional further arguments for median2() methods. Not used in its default method.
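The three choices for even suggest how a vector of even length is resolved. The following base R sketch illustrates one plausible reading, which is an assumption here rather than something this page states: "mean" averages the two central values, "low" takes the lower one, and "high" takes the higher one. The helper central_value_sketch() is hypothetical and only handles vectors without missing values.

```r
# Hypothetical helper, for illustration only; the meaning of each
# `even` option is an assumption, not taken from median2() itself.
central_value_sketch <- function(x, even = c("mean", "low", "high")) {
  even <- match.arg(even)
  stopifnot(length(x) > 0, !anyNA(x))
  x <- sort(x)
  n <- length(x)
  # Odd length: there is a single central value.
  if (n %% 2L == 1L) {
    return(x[(n + 1L) / 2L])
  }
  # Even length: pick from the two central values.
  pair <- x[n / 2L + 0:1]
  switch(even,
    mean = mean(pair),
    low  = pair[1],
    high = pair[2]
  )
}

central_value_sketch(c(3, 24, 96, 100))         # 60
central_value_sketch(c(3, 24, 96, 100), "low")  # 24
central_value_sketch(c(3, 24, 96, 100), "high") # 96
```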

Value

Data frame with these columns:

  • term: the names of x elements.

  • estimate: the medians of x elements, ignoring as many NAs as necessary.

  • certainty: TRUE if the corresponding estimate is certain to be the true median, and FALSE if this is unclear due to missing values.

  • lower, upper: the bounds of the median. They are equal if certainty is TRUE because the precise value is then known.

  • na_ignored: the number of missing values that had to be ignored to arrive at the estimate.

  • na_total: the total number of missing values.

  • rate_ignored_na: the proportion of all missing values that had to be ignored; i.e., na_ignored relative to na_total.

  • sum_total: the total number of values, missing or not.

  • rate_ignored_sum: the proportion of all values, missing or not, that are missing values which had to be ignored; i.e., na_ignored relative to sum_total.
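The two rate columns follow from the count columns. As a rough base R sketch, the counts below are copied from the list example on this page (elements a to d), with sum_total taken to be the length of each element. Returning 0 when there are no missing values at all is an assumption based on the printed output for element a, not a documented rule.

```r
# Counts for the list example: a = 1:15, b = c(1, 1, NA),
# c = c(4, 4, NA, NA, NA, NA), d = c(96, 24, 3, NA).
counts <- data.frame(
  term       = c("a", "b", "c", "d"),
  na_ignored = c(0L, 0L, 3L, 1L),
  na_total   = c(0L, 1L, 4L, 1L),
  sum_total  = c(15L, 3L, 6L, 4L)
)

# Share of the missing values that had to be ignored.
# Assumption: 0 rather than NaN when na_total is 0.
counts$rate_ignored_na <- ifelse(
  counts$na_total == 0L, 0, counts$na_ignored / counts$na_total
)

# Share of all values, missing or not, that had to be ignored.
counts$rate_ignored_sum <- counts$na_ignored / counts$sum_total

counts
```

For element c, this gives 3 / 4 = 0.75 and 3 / 6 = 0.5, respectively.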

Examples

median_table(c(5, 23, 5, NA, 5, NA))
#> # A tibble: 1 × 10
#>   term  estimate certainty lower upper na_ignored na_total rate_ignored_na
#>   <chr>    <dbl> <lgl>     <dbl> <dbl>      <int>    <int>           <dbl>
#> 1 ""           5 FALSE         5    14          1        2             0.5
#> # ℹ 2 more variables: sum_total <int>, rate_ignored_sum <dbl>

# Use a list of numeric vectors:
my_list <- list(
  a = 1:15,
  b = c(1, 1, NA),
  c = c(4, 4, NA, NA, NA, NA),
  d = c(96, 24, 3, NA)
)

median_table(my_list)
#> # A tibble: 4 × 10
#>   term  estimate certainty lower upper na_ignored na_total rate_ignored_na
#>   <chr>    <dbl> <lgl>     <dbl> <dbl>      <int>    <int>           <dbl>
#> 1 a            8 TRUE        8       8          0        0            0   
#> 2 b            1 TRUE        1       1          0        1            0   
#> 3 c            4 FALSE      NA      NA          3        4            0.75
#> 4 d           24 FALSE      13.5    60          1        1            1   
#> # ℹ 2 more variables: sum_total <int>, rate_ignored_sum <dbl>

# Data frames are allowed:
median_table(iris[1:4])
#> # A tibble: 4 × 10
#>   term        estimate certainty lower upper na_ignored na_total rate_ignored_na
#>   <chr>          <dbl> <lgl>     <dbl> <dbl>      <int>    <int>           <dbl>
#> 1 Sepal.Leng…     5.8  TRUE       5.8   5.8           0        0               0
#> 2 Sepal.Width     3    TRUE       3     3             0        0               0
#> 3 Petal.Leng…     4.35 TRUE       4.35  4.35          0        0               0
#> 4 Petal.Width     1.3  TRUE       1.3   1.3           0        0               0
#> # ℹ 2 more variables: sum_total <int>, rate_ignored_sum <dbl>