NOTE: This function is currently experimental and shouldn't be relied upon.
Call frequency_grid_plot() to visualize the absolute frequencies of values in a vector. Each observation is plotted separately, resulting in a hybrid of a histogram and a scatterplot:
- Boxes are known values.
- Circles with NA labels are missing values.
- Empty circles are no values at all. They signify that certain unique values would have to be more frequent for all unique values to be equally frequent.
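The "one point per observation" idea can be sketched in base R. This is only an illustration under the assumption that each known value gets a stack of positions 1, 2, ..., frequency. The data frame name grid and its columns are hypothetical and not necessarily what frequency_grid_df() returns.

```r
# Example vector (taken from the Examples section)
x <- c("a", "a", "a", "b", "b", "c", NA, NA, NA, NA, NA)

# Absolute frequencies of the known values; table() drops NA by default
known <- table(x)
counts <- as.vector(known)

# Hypothetical layout: one row per known observation, with the value on
# the x-axis and its position within the stack on the y-axis
grid <- data.frame(
  value = rep(names(known), times = counts),
  y = sequence(counts),  # 1..3 for "a", 1..2 for "b", 1 for "c"
  stringsAsFactors = FALSE
)

# Missing values are counted separately; the plot draws them as labelled circles
n_missing <- sum(is.na(x))
```

Each row of grid corresponds to one box in the plot, which is why the result looks like a histogram made of individual points.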
Usage
frequency_grid_plot(
x,
show_line_grid = FALSE,
show_line_mode = FALSE,
label_missing = "NA",
color_label_missing = "red2",
color_missing = "red2",
color_non_missing = "blue2",
alpha_missing = 1,
alpha_non_missing = 0.75,
size_label_missing = 3,
size_missing = 10,
size_non_missing = 10,
shape_missing = 1,
shape_non_missing = 15,
expand = 0.1
)
Arguments
- x
A vector with frequencies to visualize.
- show_line_grid
Logical. Should gridlines be present, crossing at each observation? Default is FALSE.
- show_line_mode
Logical. Should a dashed line separate the mode(s) among known values from any missing values that might add to these modes? Default is FALSE.
- label_missing
String. Label used for missing values. Default is "NA".
- color_label_missing, color_missing, color_non_missing
String. Colors of the missing-value labels and the data points. Defaults are "red2" for missing data points as well as their labels, and "blue2" for non-missing data points.
- alpha_missing, alpha_non_missing
Numeric. Opacity of the data points. Defaults are 1 and 0.75, respectively.
- size_label_missing, size_missing, size_non_missing
Numeric. Sizes of the missing-value labels and the data points. Defaults are 3 for the labels and 10 for both symbols.
- shape_missing, shape_non_missing
Numeric or string. Shapes of the data points. Defaults are 1 (open circle) and 15 (filled square), respectively.
- expand
Numeric. Padding between the axes and the data points. Because of the grid structure, the distance is the same on all four sides. Default is 0.1.
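To make show_line_mode more concrete, the mode(s) among known values can be found in base R as shown below. This is a sketch under an assumption: that the dashed line sits at the highest known frequency, so that any missing values stacked above it could only add to the modes. The variable names are illustrative, not part of moder.

```r
# Example vector (taken from the Examples section)
x <- c("a", "a", "a", "b", "b", "c", NA, NA, NA, NA, NA)

# Frequencies of the known values only (table() drops NA by default)
known <- table(x)

# Mode(s) among known values: every value tied for the highest count
modes_known <- names(known)[known == max(known)]

# Assumed height of the dashed line: the highest known frequency
mode_height <- max(known)
```

Here "a" is the only known mode, with a frequency of 3; the five missing values could raise other values to or above that height.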
Value
A ggplot object. To save it, call ggplot2::ggsave().
Limitations
Certain assumptions about missing values are currently hard-coded in the function. In the future, they should become optional. These assumptions are:
- Each missing value stands for one of the known values. For example, in c(1, 2, NA), the NA is either 1 or 2.
- The missing values are distributed as evenly across known values as possible. Therefore, in c(1, 2, NA, NA), one NA is a 1 and the other one is a 2. This is clearly not reasonable as a general assumption. It is derived from moder's way of determining possible extreme cases.
See also
frequency_grid_df(), which forms the basis of this function.
Examples
x <- c("a", "a", "a", "b", "b", "c", NA, NA, NA, NA, NA)
# Basic usage:
frequency_grid_plot(x)
# With "N/A" as a marker of missing values
# instead of "NA":
frequency_grid_plot(x, label_missing = "N/A")
# Black and white mode:
frequency_grid_plot(
x, color_label_missing = "black",
color_missing = "black", color_non_missing = "black"
)