Changelog
Source: NEWS.md
scrutiny 0.6.0
CRAN release: 2025-08-22
Bugfixes
- Fixed a bug in the DEBIT functions that could sometimes have resulted in consistency being FALSE when it should have been TRUE (thanks to @nrposner, #75). However, this seems to be a rare issue, and DEBIT is not widely used in any case.
- Fixed a bug that could theoretically lead grim(), grim_map(), grim_map_seq(), and grim_map_total_n() to throw a warning and possibly even return incorrect results (also @nrposner, #75). However, this is even less realistic than the previous bug.
- Compatibility with ggplot2 4.0.0 was ensured (@teunbrand, #78).
- restore_zeros() now checks width more strictly:
  - It no longer truncates decimal numbers if width is specified but some elements of x have more decimal places than that. For example, in earlier versions, restore_zeros(c(0.12, 0.123, 0.1234), width = 3) would have returned c("0.120", "0.123", "0.123"): it silently cut off the 4 from the last value. An error is now thrown in such cases.
  - Also, width is now checked to be either a single whole number or a vector of whole numbers with the same length as x.
- restore_zeros_df() has the same fixes as above.
- is_seq_dispersed() now works correctly if NA values are present.
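The stricter width check can be illustrated with a minimal sketch. It is written in Python rather than R and does not call scrutiny; restore_zeros_sketch() is a hypothetical helper, and it only covers the case of a single width applied to plain decimal inputs (no scientific notation).

```python
def restore_zeros_sketch(values, width):
    """Pad values with trailing zeros up to `width` decimal places.

    Hypothetical helper, not scrutiny's implementation. It raises an error
    instead of truncating when a value has more than `width` decimal places.
    """
    if isinstance(width, bool) or not isinstance(width, int) or width < 1:
        raise TypeError("width must be a single positive whole number")
    restored = []
    for value in values:
        text = str(value)
        # Split into the part before and after the decimal point.
        whole, _, fraction = text.partition(".")
        if len(fraction) > width:
            raise ValueError(
                f"{text} has {len(fraction)} decimal places, more than width = {width}"
            )
        # Right-pad the fractional part with zeros; never cut digits off.
        restored.append(whole + "." + fraction.ljust(width, "0"))
    return restored


print(restore_zeros_sketch([0.12, 0.123], width=3))  # ['0.120', '0.123']
```

Raising an error rather than truncating keeps the reported digits intact, which matters because consistency tests depend on the exact number of decimal places.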
scrutiny 0.5.0
CRAN release: 2024-09-22
The package is now released under the MIT license.
Breaking changes
- The ratio column in the output of grim_map() and grim_map_seq() was replaced by a probability column. This means:
  - Numerically, the only difference is that probability is zero whenever ratio was negative.
  - Conceptually, it is much easier to interpret: it is the probability that a reported mean or percentage of integer data that has a specific number of decimal places but is otherwise random is GRIM-inconsistent with the reported sample size. For example, probability is 0.6 for a mean of 1.23 and a sample size of 40. The same is true for any other mean with two decimal places. Thus, a randomly chosen mean with two decimal places, ostensibly derived from integer data, has a 0.6 probability of being GRIM-inconsistent with the reported sample size.
- In the functions around grim_ratio(), the x argument must now be a string. This is consistent with grim_map(), unround(), etc.; and it prevents erroneous results that could previously occur by omitting trailing zeros.
- The GRIMMER implementation was debugged, so that grimmer_map() etc. may now yield different results in a few cases. In particular, the items argument now works correctly, thanks to Aurélien Allard and Lukas Wallrich (#58).
- is_seq_dispersed() now correctly returns FALSE if different numbers of missing values at the start and end of x mean that x cannot be dispersed around from.
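The arithmetic behind the probability column can be sketched as follows. This is a Python illustration, not scrutiny's code: grim_probability_sketch() is a hypothetical name, and the formula (10^d - n) / 10^d, clipped at zero, is assumed from the GRIM ratio for single-item scales. The sketch also shows why x has to be a string: trailing zeros change the number of decimal places and therefore the result.

```python
def grim_probability_sketch(x: str, n: int) -> float:
    """Probability that a random mean with as many decimal places as `x`
    is GRIM-inconsistent with sample size `n` (single-item assumption).

    Hypothetical helper; the formula is an assumption based on the GRIM ratio.
    """
    if not isinstance(x, str):
        raise TypeError("x must be a string so that trailing zeros are preserved")
    # Count decimal places from the string, keeping trailing zeros.
    _, _, fraction = x.partition(".")
    granularity = 10 ** len(fraction)
    # The ratio can be negative for large n; the probability is clipped at zero.
    return max(0.0, (granularity - n) / granularity)


print(grim_probability_sketch("1.23", 40))  # 0.6
print(grim_probability_sketch("1.50", 40))  # 0.6, two decimal places
print(grim_probability_sketch("1.5", 40))   # 0.0, only one decimal place
```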
New features
- The probability column (see above) is created by a new function, grim_probability().
Lifecycle updates
Deprecated
- As a consequence of the above, the show_prob argument of grim_map() is now deprecated and will be removed in a future version. It no longer has any effect.
- grim_ratio_upper() is deprecated and will be removed in a future version. It no longer seems very interesting (and likely never was), especially now that the GRIM ratio in general has taken a backseat.
- All 15 (!) functions around is_subset_of() are deprecated and will be removed in a future version. In truth, they were always poorly written and widely out of scope for scrutiny.
Removed
All of these had been deprecated since scrutiny 0.3.0:
- audit_list() was removed.
- The sep argument in restore_zeros() and restore_zeros_df() was removed.
- The numeric_only argument in duplicate_count() and duplicate_detect() was removed.
- The na.rm argument in duplicate_count_colpair() was removed.
scrutiny 0.4.0
CRAN release: 2024-02-23
This version brings major performance improvements. Furthermore:
Bugfixes
- Fixed a bug in audit_seq(): If the dispersion argument in the preceding call to a function like grim_map_seq() was specified as something other than a linearly increasing sequence, the "diff_*" columns in the data frames returned by audit_seq() may have contained incorrect values.
- Similarly, audit_seq() and reverse_map_seq() used to reconstruct the reported values incorrectly if the dispersion default was overridden as described above. At least for now, the issue is handled by throwing an error if these functions operate on data frames that are the result of specifying dispersion as something other than a linearly increasing sequence.
- Fixed a bug that incorrectly threw an error in grim_map_seq(), other functions made by function_map_seq(), as well as seq_disperse() and seq_disperse_df() if an input value was so close to out_min or out_max that the output sequence would be shorter than implied by dispersion / .dispersion, and if track_var_change / .track_var_change (see below) was TRUE. Again, note that the bug only occurred if an error was thrown.
New features
- A new vignette lists the options for specifying the rounding argument that many scrutiny functions have: vignette("rounding-options").
- Another new vignette shows the minimal steps to implement a consistency test using scrutiny: vignette("consistency-tests-simple").
- The output of grim_map_seq(), grimmer_map_seq(), debit_map_seq() and any other function made by function_map_seq() now has a diff_var column that tracks the difference between the dispersed variable (see the var column) and the reported value. Following the diff_* columns in the output of audit_seq(), this is the number of dispersion steps, not the actual numeric difference.
- The same diff_* columns are now integer, not double.
- function_map(), function_map_seq(), and function_map_total_n() have a new .name_key_result argument that controls the name of the key result column in the output of the factory-made function. This is "consistency" by default, but other names will fit better for other kinds of tests. (The results of these tests must still be logical values.)
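The idea of measuring differences in dispersion steps rather than numeric units can be shown with a small sketch. It is written in Python and is independent of scrutiny: disperse_with_steps() is a hypothetical helper, and the signed step count (negative below the reported value, positive above) is an assumption made for illustration.

```python
from decimal import Decimal


def disperse_with_steps(x: str, steps: int):
    """Return (value, step) pairs around a reported value.

    Hypothetical helper, not scrutiny's implementation. Values are strings
    so that trailing zeros survive; `step` is an integer count of dispersion
    steps, not the numeric difference.
    """
    reported = Decimal(x)
    # Number of decimal places in the reported value, e.g. 2 for "0.50".
    decimals = -reported.as_tuple().exponent
    # One step is one unit at the last reported decimal place, e.g. 0.01.
    unit = Decimal(1).scaleb(-decimals)
    return [
        (str(reported + k * unit), k)
        for k in range(-steps, steps + 1)
        if k != 0  # the reported value itself is not a dispersed value
    ]


print(disperse_with_steps("0.50", 2))
# [('0.48', -2), ('0.49', -1), ('0.51', 1), ('0.52', 2)]
```

Counting steps as integers keeps the column comparable across variables with different numbers of decimal places.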
Minor changes
- In duplicate_count(), the count column in the output tibble was renamed to frequency. This makes for a more streamlined frequency table and removes an ambiguity with duplicate_count_colpair(), where the count output column means something different.
- In seq_disperse() and seq_disperse_df(), the track_var_change / .track_var_change argument was renamed to track_diff_var / .track_diff_var. The arguments with the old names are still present for now but will be removed in a future version. Also, the unit of these values is now dispersion steps, for consistency with grim_map_seq() etc. as well as audit_seq().
- grim_total(), grim_ratio(), and grim_ratio_upper() now require x to have length 1.
- The docs now link to functions when opened in RStudio, not just on the website.
- Accordingly, the output of write_doc_factory_map_conventions() now renders links. The function also has a new scrutiny_prefix argument for use in another package.
- The “Infrastructure” article was renamed to “Developer tools”; vignette("devtools").
- Some dependencies that used to be suggested are now imported.
scrutiny 0.3.0
CRAN release: 2023-08-08
Duplicate analysis overhaul
The duplicate_*() functions now present their output better and have overall been streamlined. Read more at vignette("duplicates").
- A new function, duplicate_tally(), marks each observation with its overall frequency. It is similar to duplicate_detect() but more informative.
- duplicate_count():
  - All values are now treated like character strings, so all can be checked. The numeric_only argument is deprecated and should no longer be used.
  - The output tibble has two new columns, locations and locations_n. These hold the names of all input columns in which a value appears and the number of these columns. Details are controlled by the new locations_type argument.
  - New ignore argument for specifying one or more values that will not be checked for duplicates.
- duplicate_count_colpair():
  - New total_x and total_y columns in the output show how many non-missing values were checked for duplicates.
  - New ignore argument as in duplicate_count().
  - The na.rm argument is deprecated. It wasn’t very useful because missing values are never checked for duplicates.
- duplicate_detect() is superseded. It is less informative than duplicate_count() and, in particular, duplicate_tally(). Still, it shares in the overhaul:
  - As in duplicate_count(), all values are now treated like character strings, so all can be checked. The numeric_only argument is deprecated and should no longer be used.
  - The duplicate status of missing values is now shown as NA.
  - New ignore argument as in duplicate_count().
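The counting logic described above (string comparison, skipped missing values, an ignore list, and a record of where each value appears) can be sketched in a few lines. This is a Python illustration, not scrutiny's implementation: duplicate_count_sketch() and the dictionary keys are hypothetical, and None stands in for missing values.

```python
from collections import defaultdict


def duplicate_count_sketch(columns, ignore=()):
    """Count how often each value occurs across all columns.

    Hypothetical helper. `columns` maps column names to lists of values;
    None marks a missing value and is never checked for duplicates.
    """
    ignored = {str(value) for value in ignore}
    frequency = defaultdict(int)
    locations = defaultdict(list)
    for name, values in columns.items():
        for value in values:
            if value is None:
                continue
            # Treat every value as a string so that all types can be compared.
            key = str(value)
            if key in ignored:
                continue
            frequency[key] += 1
            if name not in locations[key]:
                locations[key].append(name)
    rows = [
        {
            "value": key,
            "frequency": count,
            "locations": locations[key],
            "locations_n": len(locations[key]),
        }
        for key, count in frequency.items()
    ]
    # Most frequent values first; ties keep their order of first appearance.
    return sorted(rows, key=lambda row: -row["frequency"])


data = {"a": ["1.5", "2", None], "b": [1.5, "3", "1.5"]}
for row in duplicate_count_sketch(data):
    print(row)
```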
Bugfixes
- Fixed a numeric precision bug in round_up_from() and round_down_from() that occurred when rounding numbers greater than circa 2100 with a part to be truncated that was equal to 5 on that decimal level (thanks to @kaz462, #43). These functions are called within round_up() and round_down(), and indirectly by all consistency-testing functions.
- Fixed a bug in audit_seq() that displayed one “hit” found by varying a given reported value if there were no such hits. The other columns were not affected.
- Fixed a bug in function_map() that displayed the wrong calling function’s name in case of an error.
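To show the general class of problem behind the rounding bugfix, here is a minimal Python sketch of how binary floating-point numbers can break "round half up". It does not reproduce the specific scrutiny bug or its fix: both function names are hypothetical, and the decimal-based variant is a swapped-in technique chosen for illustration only.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def round_up_naive(x: float, digits: int) -> float:
    """Round half up with plain floating-point arithmetic (hypothetical)."""
    factor = 10 ** digits
    return math.floor(x * factor + 0.5) / factor


def round_up_decimal(x: str, digits: int) -> str:
    """Round half up on the decimal digits as written (hypothetical)."""
    quantum = Decimal(1).scaleb(-digits)  # e.g. Decimal("0.01") for digits = 2
    return str(Decimal(x).quantize(quantum, rounding=ROUND_HALF_UP))


# 1.005 is stored as slightly less than 1.005, so the trailing 5 is lost.
print(round_up_naive(1.005, 2))      # 1.0
print(round_up_decimal("1.005", 2))  # 1.01
```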
Minor improvements
- Documentation for grim_map_seq() and all other functions made by function_map(), function_map_seq(), or function_map_total_n() now displays meaningful defaults. Printing the factory-made functions is more meaningful, as well. Internally, they now work with rlang::new_function(), which allows for flexible metaprogramming.
- The experimental function audit_list() is deprecated. Just use audit() instead.
- audit_seq() and audit_total_n() are now documented separately from audit() and audit_list().
- The lifecycle package is now imported and used in formal deprecations such as that of sep in the restore_*() functions. The janitor package is no longer suggested.
- Adjusted to new CRAN requirements for packageVersion() usage.
- Some performance improvements.
scrutiny 0.2.4
CRAN release: 2023-01-20
- New decimal_places_df() function that takes a data frame and counts the decimal places in all numeric-like columns.
- Four new predicate functions centered around is_map_df() test whether an object is the output of a scrutiny-style mapper function for consistency tests.
- Newly exported is_numeric_like() function to test whether an object (e.g., a string vector) can be coerced to numeric.
- New grim_ratio_upper() function gives an upper bound for grim_ratio().
- Changes in split_by_parens():
  - The function now uses a cols argument instead of the dots (...). This follows tidyselect development guidelines. The default, cols = everything(), is to select all columns that contain the sep elements (by default, parentheses). Set the new check_sep argument to FALSE to select all columns regardless.
  - All other arguments were renamed: they no longer start on a dot. Furthermore, .col1 and .col2 have been renamed to end1 and end2.
  - A warning is now issued if one or more columns can’t be split (or is de-selected from splitting). This occurs if a column doesn’t contain the sep elements.
  - Internal changes for compatibility with dplyr 1.1.0.
- In restore_zeros_df() as well, the dots (...) were replaced by a cols argument, and each other argument no longer has a prefix dot. This follows the changes in split_by_parens(), but note the default selection restrictions by the new check_numeric_like argument. The optional check_decimals argument goes even further.
- Prevent false-positive warnings when printing ggplot objects (they had occurred since ggplot2 3.4.0).
scrutiny 0.2.3
CRAN release: 2022-12-11
Some new features and bugfixes:
- New audit() methods for the output of audit_seq() and audit_total_n().
- New duplicate_count_colpair() function that checks each combination of columns in a data frame for duplicates.
- New restore_zeros_df() function to easily restore trailing zeros in all numeric-like columns in a data frame.
- New seq_length() function to extend or shorten linear sequences.
- Bugfixes in the is_seq_*() functions.
- Argument evaluation is now forced in the function factories: function_map(), function_map_seq(), and function_map_total_n().
- Some possible corner case issues in split_by_parens() are now prevented.
- Internal changes for compatibility with purrr 1.0.0 and tidyselect 1.2.0.
scrutiny 0.2.2
CRAN release: 2022-08-22
This is a patch for CRAN compliance.
- The package now requires R version >= 3.4.0 and rlang version >= 1.0.2.
- Subtle changes to split_by_parens() that users generally won’t notice.
- Minor shifts in the documentation (e.g., vignette("consistency-tests") now has instructions on exporting factory-made functions).
scrutiny 0.2.1
This is a patch.
It reduces the scope of some examples for CRAN compliance.
Minor vignette changes.
scrutiny 0.2.0
This is a massive release, with many new features and improvements all over scrutiny. Most notably, the package now includes an entirely new system for implementing consistency tests.
- A new vignette lays out how to implement consistency tests using scrutiny’s infrastructure. It describes many of the features mentioned below.
- GRIMMER support was added, as explained in another new vignette. All GRIM and DEBIT functions mentioned below have GRIMMER analogues. For example, grimmer_map_seq() is analogous to grim_map_seq().
- Because of the new, stricter rules for consistency tests, the output of grim_map() no longer includes an items column by default. Instead, the numbers of items (1 by default) are factored into the output’s n column. This focuses the presentation on the essence of GRIM.
- GRIM and DEBIT functions are now somewhat less likely to flag value sets as inconsistent. That is because measures were taken to reduce spurious, computer-induced differences when comparing floating-point numbers. The same applies to the new GRIMMER functions.
- function_map() enables users to quickly create consistency test functions for data frames much like grim_map() or debit_map().
- grim_map_seq() checks if GRIM inconsistencies might be due to small errors, and the true values might be close to the reported ones. It varies the inputs up and down in a specified range, holding the respective other ones constant, and tests all those combinations. For summaries, call audit_seq() on the results.
- debit_map_seq() does the same for DEBIT.
- The above two are powered by function_map_seq(), which allows users to easily create functions just like these for any consistency test. All that’s needed is a data-frame-level consistency testing function like grim_map() or debit_map().
- grim_map_total_n() applies GRIM in cases where no group sizes are reported, only total sample sizes. It systematically matches possible group sizes (around half the total) with reported mean or proportion values, GRIM-tests them, and counts the scenarios in which both matches are consistent. For summaries, call audit_total_n() on the results.
- debit_map_total_n() does the same for DEBIT.
- The above two are powered by function_map_total_n(), which allows users to easily create new functions like grim_map_total_n() or debit_map_total_n(), provided a data-frame-level consistency testing function like grim_map() or debit_map().
- On a lower level still, disperse_total() takes a total sample size (composed of the two unknown group sizes of interest) and calls the appropriate group-level function: disperse() for even totals, disperse2() for odd ones.
- seq_disperse() and seq_disperse_df() extend scrutiny’s support for string decimal sequences with trailing zeros. They construct sequences centered around the input, a use case not directly covered by base::seq().
- Predicate functions around is_seq_linear() test whether a vector represents a certain kind of numeric sequence.
- In debit_map(), the x column is now to the left of the sd column if show_rec is FALSE, in accordance with the show_rec = TRUE default.
- debit() is now vectorized.
- The functions around is_subset_of() and is_superset_of() now have stricter variants grouped around is_proper_subset_of() and is_proper_superset_of().
- split_by_parens() now accepts any pair of separators passed to .sep as a length-2 vector.
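How a total sample size might be split into hypothetical pairs of group sizes can be sketched as follows. This is a Python illustration under stated assumptions, not scrutiny's code: group_size_pairs() is a hypothetical name, and the pairing scheme (start at half the total, then move one participant at a time between groups) is assumed from the description of even and odd totals above.

```python
def group_size_pairs(total_n: int, dispersion: int):
    """List candidate (group 1, group 2) sizes that sum to `total_n`.

    Hypothetical helper. For an even total, the first pair is two equal
    halves; for an odd total, the two sizes closest to half the total.
    """
    low = total_n // 2
    high = total_n - low
    return [
        (low - step, high + step)
        for step in range(dispersion + 1)
        if low - step >= 1  # a group needs at least one participant
    ]


print(group_size_pairs(50, 2))  # [(25, 25), (24, 26), (23, 27)]
print(group_size_pairs(51, 2))  # [(25, 26), (24, 27), (23, 28)]
```

Each pair could then be matched with the two reported means and tested, counting the scenarios in which both groups pass.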
scrutiny 0.1.1
This is a patch, mainly fixing a bug that used to affect the presentation of input data in grim_map()’s results. It needs to be emphasized that this bug only affected a convenience feature, namely the presentation of certain input data in the output, not the GRIM test itself.
- Previously, if percent was set to TRUE, the x values were converted to percentages. Because they need to be presented as strings, percentage conversion involves restoring the correct number of trailing zeros. The bug, then, was that all the x values appearing in the output (not in the internal computations!) were restored to the same “length” as the single longest one. This has now been remedied, and x values are restored to their individually appropriate number of trailing zeros.
- Another bugfix concerns versioning. Previously, the package had an incorrect version number. It has now been corrected.
- The last change was to remove an outdated and potentially misleading paragraph in the documentation of reround_to_fraction().
scrutiny 0.1.0
- This version includes an overhaul of grim_plot():
  - It extends the function to cover cases of decimals values greater than 2, using a gradient instead of a raster.
  - It enables data-free calls to grim_plot() with the new show_data argument. Resulting plots only display the background raster. This mirrors Figure 1 in Brown and Heathers’ GRIM paper. (Although grim_plot() as a whole is modeled after this figure, the default addition of empirical summary data is specific to scrutiny.) Like Brown and Heathers, users may wish to create such raster-only plots in order to demonstrate some principled points. The key parameters decimals and rounding can be controlled directly to make up for the lack of information from data.
  - The function now checks if all input means or proportions (x) have the same number of decimal places. If they don’t, it throws an error. This strict criterion can be circumvented by specifying the decimals argument. However, since each raster is specific to one number of decimal places (and hence cannot be interpreted regarding x values with a different number), the recommended solution is to plot x values separately, once for each number of decimal places.
  - The show_full_range argument was removed because I now think it is superfluous.
  - Previously, there was some space between the raster and the y-axis. It has now been removed.
  - Test result data points, shown in blue and/or red by default, are now built on top of the raster, which makes for a more distinct appearance.
- Two new functions, reround_to_fraction() and reround_to_fraction_level(), enable fractional rounding, inspired by janitor::round_to_fraction(). For example, they might round 0.4 to 0.5 for fractions of 2. What tells the new functions apart is that they come with all the flexibility of reround(). Furthermore, reround_to_fraction_level() is closer to a conventional rounding function than the other two.
- The new version also fixes a bug in row_to_colnames(), rewriting the function’s core.
- Another bug was fixed in grim() and grim_map(), concerning show_rec: For rounding strings that lead to four reconstructed numbers per x value rather than just two, it used to be the case that only the two values corresponding to the first of the two rounding procedures were displayed in the output tibble. Now, all four are displayed, bearing appropriate names.
- Another bugfix is for the threshold argument in reround(), which didn’t work properly before. This used to affect higher-level functions such as grim(), grim_map(), debit(), and debit_map(), as well. The default for threshold is now 5 in all such functions. Note that rounding up and down from 5 has been fully functional independently of it.
- Also in reround(), the rec argument has been renamed to x in accordance with general naming conventions. The decimals argument has been renamed to digits in accordance with naming conventions among rounding functions.
- In split_by_parens(), ellipsis support was added to protect the user from silent, unexpected results following named arguments in tidy evaluation. The ellipsis package has been added to the Suggests field in DESCRIPTION.
- In some high-level functions, internal checks now determine if the lengths of multiple arguments that are factored into the same internal function call are mutually congruent. That is, if two such arguments are length > 1, they need to have the same length (which will throw a warning). Otherwise, there will be an explicit and very specific error message.
- Finally, some minor refactoring and other small changes that users generally won’t notice.
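Fractional rounding, as in the 0.4 to 0.5 example above, can be sketched in a few lines. This Python illustration is not scrutiny's implementation: round_to_fraction_sketch() is a hypothetical name, it only covers rounding half up, and it works on plain floats, so the usual floating-point caveats apply.

```python
import math


def round_to_fraction_sketch(x: float, denominator: int) -> float:
    """Round `x` to the nearest multiple of 1 / denominator, half up.

    Hypothetical helper; scrutiny's functions additionally accept the
    rounding options of reround(), which this sketch leaves out.
    """
    # Scale so that the target fractions become whole numbers, round, rescale.
    return math.floor(x * denominator + 0.5) / denominator


print(round_to_fraction_sketch(0.4, 2))  # 0.5
print(round_to_fraction_sketch(0.2, 2))  # 0.0
```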
scrutiny 0.0.1
- Added vignette about other packages for error detection, called Related software.
- Exported grim_plot().
- Minor refactoring.