Title: | Tools for adjusting for rounding problems in metastudies about p-hacking and publication bias |
---|---|
Description: | Tools for adjusting for rounding problems in metastudies about p-hacking and publication bias |
Authors: | Sebastian Kranz, Peter Puetz |
Maintainer: | Sebastian Kranz <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 0.1.0 |
Built: | 2024-08-31 02:57:45 UTC |
Source: | https://github.com/skranz/RoundingMatters |
Avoids downward bias at the left hand side where abs(z)=0.
absz.density( z, at = NULL, bw = 0.1, adjust = 1, kernel = "epanechnikov", n = 1024, weights = NULL, ... )
z |
vector of z-statistics (or absolute z-statistics) |
at |
vector of points where density shall be evaluated. If NULL return a function (by calling |
bw, adjust, kernel, n, weights, ... |
arguments passed to |
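A plain kernel density estimate of abs(z) is biased downward near 0, because the kernel mass that would fall below 0 is lost. One common remedy is to reflect the data at 0. The sketch below illustrates that idea in base R; the source does not state which correction absz.density uses, and the name absz_density_sketch is hypothetical.

```r
# Minimal sketch (assumption: boundary correction by reflection at 0).
# absz_density_sketch is a hypothetical helper, not part of the package.
absz_density_sketch <- function(z, bw = 0.1, kernel = "epanechnikov", n = 1024) {
  az <- abs(z)
  # Estimate the density of the data mirrored around 0 ...
  d <- density(c(-az, az), bw = bw, kernel = kernel, n = n)
  # ... and fold it back onto [0, Inf) by doubling the non-negative part.
  keep <- d$x >= 0
  approxfun(d$x[keep], 2 * d$y[keep], rule = 2)
}

set.seed(1)
f <- absz_density_sketch(rnorm(2000))
f(c(0, 1, 2))
```

Because no value is passed for at, this sketch always returns a function, similar to what the at = NULL case describes.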
By default, adds bootstrap standard errors and confidence intervals.
absz.density.ratio( z.num, z.denom, at, bootstrap = TRUE, B = 1000, ci.level = 0.95, bw = 0.1, kernel = "epanechnikov", return.as = c("long", "wide")[1], weights.num = NULL, weights.denom = NULL, ... )
z.num |
observed z-statistics forming numerator density |
z.denom |
observed z-statistics forming denominator density |
at |
position where density shall be evaluated |
bootstrap |
if TRUE add bootstrap SE and CI for all measures |
B |
number of bootstrap repetitions |
ci.level |
Confidence level. Default 0.95. |
weights.num |
weights for z.num (optional) |
weights.denom |
weights for z.denom (optional) |
... |
arguments for absz.density |
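The following base R sketch shows how a ratio of two abs(z) densities can be equipped with bootstrap standard errors and percentile confidence intervals. It is an illustration under assumptions, not the package's code: the helper names dens_at and ratio_boot_sketch are hypothetical, the density uses reflection at 0, and the percentile method for the interval is assumed.

```r
# Hypothetical helper: reflected kernel density of abs(z), evaluated at 'at'.
dens_at <- function(z, at, bw = 0.1) {
  az <- abs(z)
  d <- density(c(-az, az), bw = bw, kernel = "epanechnikov", n = 1024)
  2 * approx(d$x, d$y, xout = at, rule = 2)$y
}

# Sketch: density ratio with bootstrap SE and percentile CI (assumed method).
ratio_boot_sketch <- function(z.num, z.denom, at, B = 200, ci.level = 0.95) {
  est <- dens_at(z.num, at) / dens_at(z.denom, at)
  boot <- replicate(B, {
    dens_at(sample(z.num, replace = TRUE), at) /
      dens_at(sample(z.denom, replace = TRUE), at)
  })
  boot <- matrix(boot, nrow = length(at))  # rows: positions, columns: draws
  alpha <- 1 - ci.level
  data.frame(
    at = at,
    ratio = est,
    se = apply(boot, 1, sd),
    ci.low = apply(boot, 1, quantile, probs = alpha / 2),
    ci.up = apply(boot, 1, quantile, probs = 1 - alpha / 2)
  )
}

set.seed(1)
res <- ratio_boot_sketch(rnorm(1000, 0.5), rnorm(1000), at = c(1, 2))
res
```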
Convert numbers like 0.421 to 42.1%
as.perc(x, digits = 1)
x |
a vector of floating point numbers |
digits |
to how many decimal digits shall the percentage be rounded? |
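A conversion of this kind might look like the sketch below. The name as_perc_sketch is hypothetical, and returning a character vector is an assumption; the package function may format its result differently.

```r
# Sketch: multiply by 100, round, and append a percent sign.
# Assumption: the result is returned as a character vector.
as_perc_sketch <- function(x, digits = 1) {
  paste0(round(100 * x, digits), "%")
}

as_perc_sketch(c(0.421, 0.05))
```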
Draw derounded z assuming missing digits of mu and sigma are uniformly distributed, but adjust for estimated density of z using rejection sampling
deround.z.density.adjust( z.pdf, mu, sigma, mu.dec = pmax(num.deci(mu), num.deci(sigma)), sigma.dec = mu.dec, max.rejection.rounds = 10000, verbose = TRUE, just.uniform = rep(FALSE, length(mu)), z.min = 0, z.max = 5 )
z.pdf |
An estimated density of the derounded z-statistics (e.g. using only observations with many significant digits) normalized such that its highest value is 1. Best use |
mu |
Reported coefficient, possibly rounded |
sigma |
Reported standard error, possibly rounded. |
mu.dec |
Number of decimal places mu is reported to. Usually, we would assume that mu and sigma are rounded to the same number of decimal places. Since trailing zeros may not be detected, we set the default |
sigma.dec |
By default equal to mu.dec. |
max.rejection.rounds |
A limit on how often the rejection sampler redraws, to avoid an infinite loop. |
verbose |
If |
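The rejection sampling idea can be sketched in base R as follows: draw a candidate z from the uniform derounding, then accept it with probability z.pdf(z), which works because z.pdf is normalized to a maximum of 1. This is a simplified illustration. The function name, the toy z.pdf and the handling of observations that are never accepted (left as NA here) are assumptions, not the package's behaviour.

```r
# Sketch of rejection sampling for derounded z (hypothetical helper).
# 'dec' is the number of reported decimal places of mu and sigma.
deround_rejection_sketch <- function(z.pdf, mu, sigma, dec, max.rounds = 10000) {
  n <- length(mu)
  half <- rep_len(0.5 * 10^(-dec), n)  # half-width of the rounding interval
  z <- rep(NA_real_, n)
  open <- seq_len(n)                   # observations still without accepted draw
  for (i in seq_len(max.rounds)) {
    cand <- abs(runif(length(open), mu[open] - half[open], mu[open] + half[open])) /
      runif(length(open), sigma[open] - half[open], sigma[open] + half[open])
    accept <- runif(length(open)) <= z.pdf(cand)
    z[open[accept]] <- cand[accept]
    open <- open[!accept]
    if (length(open) == 0) break
  }
  z
}

# Toy density with maximum 1 at z = 0 (assumption, for illustration only)
z.pdf <- function(z) exp(-z^2 / 2)

set.seed(1)
z <- deround_rejection_sketch(z.pdf, mu = c(0.02, 0.031),
                              sigma = c(0.01, 0.02), dec = c(2, 3))
z
```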
Draw derounded z assuming missing digits of mu and sigma are uniformly distributed
deround.z.uniform( mu, sigma, mu.dec = pmax(num.deci(mu), num.deci(sigma)), sigma.dec = mu.dec )
mu |
Reported coefficient, possibly rounded |
sigma |
Reported standard error, possibly rounded. |
mu.dec |
Number of decimal places mu is reported to. Usually, we would assume that mu and sigma are rounded to the same number of decimal places. Since trailing zeros may not be detected, we set the default |
sigma.dec |
By default equal to mu.dec. |
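As an illustration of uniform derounding: a value reported with d decimal places could have been any number within half a unit of the last reported digit, so the unrounded mu and sigma are drawn uniformly from those intervals and divided. The sketch below assumes this reading; deround_uniform_sketch is a hypothetical name and the number of decimals is passed explicitly instead of being detected.

```r
# Sketch: uniform derounding of z = mu / sigma (hypothetical helper).
deround_uniform_sketch <- function(mu, sigma, mu.dec, sigma.dec = mu.dec) {
  mu.half <- 0.5 * 10^(-mu.dec)
  sigma.half <- 0.5 * 10^(-sigma.dec)
  mu.draw <- runif(length(mu), mu - mu.half, mu + mu.half)
  sigma.draw <- runif(length(sigma), sigma - sigma.half, sigma + sigma.half)
  mu.draw / sigma.draw
}

set.seed(1)
# mu = 0.02 and sigma = 0.01, both reported with 2 decimals: reported z is 2,
# but the derounded z can lie anywhere between roughly 1 and 5.
z <- deround_uniform_sketch(rep(0.02, 1000), rep(0.01, 1000), mu.dec = 2)
range(z)
```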
The resulting data frame is required for the derounding by simulating rounding (dsr) approach. It contains a row for every considered combination of z, s and window half-width h in h.seq. The columns share.below and share.above indicate which share of derounded z-statistics are inside the window and fall below or above the threshold z0, respectively. Note that 1-share.above-share.below is the share of derounded z-statistics that fall outside the considered window.
dsr.ab.df( dat, h.seq = c(0.05, 0.075, 0.1, 0.2, 0.3, 0.4, 0.5), z0 = 1.96, min.n = 10000, min.rounds = 5, verbose = TRUE )
dat |
the data frame that should contain at least the columns z and num.deci (number of decimal places of mu and sigma, maximum of both) |
h.seq |
all considered window half-widths |
z0 |
the significance threshold. Can be a single number or a vector with one element per row of dat. |
min.n |
how many z values shall be minimally rounded to compute the derounded z-distribution for each observation. |
min.rounds |
how many repetitions of rounding z-values shall there be at least (even if min.n is already reached). |
verbose |
Shall some progress information be shown? (This function can take a while). |
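To make the meaning of share.below and share.above concrete, the sketch below computes both shares for a given vector of derounded z draws and several window half-widths. It only illustrates the column definitions; how the package generates the draws in the dsr approach is not reproduced, and window_shares_sketch is a hypothetical name.

```r
# Sketch: shares of derounded z draws inside the window, below or above z0.
window_shares_sketch <- function(z.draws, h.seq = c(0.05, 0.1, 0.2), z0 = 1.96) {
  do.call(rbind, lapply(h.seq, function(h) {
    below <- z.draws >= z0 - h & z.draws < z0
    above <- z.draws >= z0 & z.draws <= z0 + h
    data.frame(h = h, share.below = mean(below), share.above = mean(above))
  }))
}

set.seed(1)
z.draws <- runif(10000, 1.8, 2.1)  # toy draws for one observation (assumed)
res <- window_shares_sketch(z.draws)
res
# 1 - share.below - share.above is the share of draws outside the window
```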
Adds to dat the logical columns dsr.adjust and dsr.compute. dsr.adjust==TRUE means that the z-statistic of this observation will be adjusted by dsr. The adjustment statistics only depend on the reported z value and the significand s of sigma. We thus don't need to compute the distribution for all rows with dsr.adjust==TRUE. If dsr.compute==TRUE, we shall compute the derounded distribution for this observation.
dsr.mark.obs( dat, h.seq = c(0.05, 0.075, 0.1, 0.2, 0.3, 0.4, 0.5), z0 = 1.96, s.max = 100, no.deround = NULL )
dat |
the data frame, must have columns |
h.seq |
vector of considered window half-widths. We mark an observation for adjustment if it is at risk of misclassification, wrong inclusion, or wrong exclusion for any considered window size. |
z0 |
the significance threshold for z (default=1.96). |
s.max |
only mark observations for adjustment who have |
no.deround |
a logical vector indicating columns that shall never be derounded |
The PDF is normalized such that the point of highest density is 1
make.z.pdf( z, bw = 0.05, kernel = "gaussian", n = 512, dat, min.s = 100, z.min = 0, z.max = 5, show.hist = FALSE, ... )
z |
a vector of z statistics. Usually, you would select all values from dat whose mu and sigma have sufficiently many significant digits |
... |
other parameters passed to |
Compute minimum and maximum possible values of z given rounded mu and sigma
min.max.z( mu, sigma, mu.dec = pmax(num.deci(mu), num.deci(sigma)), sigma.dec = mu.dec )
mu |
Vector of reported estimated coefficients |
sigma |
Vector of reported standard errors |
mu.dec |
Number of reported decimal digits for mu. By default the maximum of the |
long |
if TRUE (default), return results in a long format |
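The bounds follow from the rounding intervals: the smallest possible z combines the smallest absolute coefficient with the largest standard error, and the largest possible z combines the largest coefficient with the smallest standard error. The sketch below assumes symmetric rounding to the nearest reported digit and passes the number of decimals explicitly; min_max_z_sketch is a hypothetical name.

```r
# Sketch: bounds of abs(z) implied by rounding mu and sigma to 'dec' decimals.
# Note: if sigma - half is not positive, z.max is not meaningful.
min_max_z_sketch <- function(mu, sigma, dec) {
  half <- 0.5 * 10^(-dec)
  data.frame(
    z = abs(mu) / sigma,
    z.min = pmax(abs(mu) - half, 0) / (sigma + half),
    z.max = (abs(mu) + half) / (sigma - half)
  )
}

res <- min_max_z_sketch(mu = 0.02, sigma = 0.01, dec = 2)
res
```

With mu = 0.02 and sigma = 0.01 the reported z is 2, yet the bounds are roughly 1 and 5, which shows how much coarse rounding can matter near a significance threshold.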
Get the number of significand digits of a floating point number, using R's character representation of the number
num.deci(x)
x |
a numeric vector |
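Counting decimal places through the character representation can be sketched as below. This is an assumption about the general approach, not the package's code; num_deci_sketch is a hypothetical name. Since R drops trailing zeros when printing, a value reported as 1.50 is seen as 1.5.

```r
# Sketch: number of decimal places from R's character representation.
num_deci_sketch <- function(x) {
  vapply(x, function(xi) {
    s <- format(xi, scientific = FALSE)
    # Remove everything up to and including the decimal point
    nchar(sub("^[^.]*\\.?", "", s))
  }, integer(1))
}

num_deci_sketch(c(0.012, 1.5, 3))
```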
We assume that trailing zeros left of the decimal point are significant digits, while trailing zeros right of the decimal point are not.
num.sig.digits(x)
x |
a numeric vector |
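Under the stated convention, significant digits can be counted by stripping the sign, the decimal point and leading zeros from the character representation. The sketch below follows that reading; num_sig_digits_sketch is a hypothetical name and the package may handle edge cases differently.

```r
# Sketch: count significant digits from the character representation.
# Trailing zeros left of the decimal point are kept (100 has 3 digits);
# trailing zeros right of it are already dropped by R (1.50 becomes 1.5).
num_sig_digits_sketch <- function(x) {
  vapply(x, function(xi) {
    s <- format(xi, scientific = FALSE)
    digits <- gsub("[^0-9]", "", s)   # drop sign and decimal point
    nchar(sub("^0+", "", digits))     # drop leading zeros
  }, integer(1))
}

num_sig_digits_sketch(c(100, 0.012, 1.5))
```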
Get the last significant digit(s) of a floating point number
rightmost.sig.digit(x, r1 = 1, r2 = 1)
x |
The vector of floating point numbers |
r1 |
Starting position from right |
r2 |
Ending position from right |
Compute thresholds for the significand s of the reported standard deviation such that we can rule out the errors: misclassification, wrong inclusion, wrong exclusion
rounding.risk.s.thresholds(z, z0 = z0, h = 0.2)
z |
a vector of z statistics |
z0 |
significance threshold. Can be a single number or a vector of length z |
h |
half-width of considered window around z0 |
A data frame with the columns "z", "s.misclass", "s.include", "s.exclude" specifying for each z value the corresponding thresholds.
Assess for observations with reported z-statistic z and a significand of s for the standard error whether they are at risk of the errors: misclassification, wrong inclusion, wrong exclusion
rounding.risks(z, s, z0 = 1.96, h = 0.2)
z |
a vector of z statistics |
s |
vector of corresponding significands of the standard error |
z0 |
significance threshold. Can be a single number like 1.96 or a vector of length z |
h |
half-width of considered window around z0 |
A data frame with misclassification-risk information for each observation. We illustrate the columns for the misclassification risk:
"s.misclass" is the threshold for the significand s above which we can rule out misclassification risk.
risk.misclass = s < s.misclass indicates whether the observation is at risk of misclassification.
risk.misclass.below = risk.misclass & z < z0 indicates whether the observation is at risk of misclassification and below the significance threshold.
The other columns should be self-explanatory given this information.
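The three error types can be illustrated with the interval of z values that is compatible with the rounded mu and sigma. The sketch below is an assumption about what the errors mean, not the package's threshold computation on the significand s: misclassification means the true z may lie on the other side of z0, wrong inclusion means an observation inside the window may truly lie outside, and wrong exclusion means an observation outside the window may truly lie inside. All names are hypothetical.

```r
# Sketch: rounding risks from the interval of z values compatible with rounding.
risk_sketch <- function(mu, sigma, dec, z0 = 1.96, h = 0.2) {
  half <- 0.5 * 10^(-dec)
  z <- abs(mu) / sigma
  z.lo <- pmax(abs(mu) - half, 0) / (sigma + half)
  z.hi <- (abs(mu) + half) / (sigma - half)
  in.window <- abs(z - z0) <= h
  data.frame(
    z, z.lo, z.hi,
    risk.misclass = (z < z0 & z.hi >= z0) | (z >= z0 & z.lo < z0),
    risk.include = in.window & (z.lo < z0 - h | z.hi > z0 + h),
    risk.exclude = !in.window & z.hi >= z0 - h & z.lo <= z0 + h
  )
}

# First observation: coarse relative precision close to z0.
# Second observation: many significant digits, far from the window.
res <- risk_sketch(mu = c(0.019, 2.345), sigma = c(0.010, 1.000), dec = 3)
res
```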
Summary statistics for rounding risks for different thresholds
rounding.risks.summary(rr.dat, s.thresh = 0:100, long = TRUE)
rr.dat |
A data frame returned from a call to |
long |
if TRUE (default), return results in a long format |
s.thresh |
a vector of considered s thresholds |
Sample derounded z from the uniformly derounded distribution for a given single value of mu and sigma
sample.uniform.z.deround( n, mu, sigma, mu.dec = pmax(num.deci(mu), num.deci(sigma)), sigma.dec = mu.dec )
n |
Number of sample draws |
mu |
Reported coefficient, possibly rounded |
sigma |
Reported standard error, possibly rounded. |
mu.dec |
Number of decimal places mu is reported to. Usually, we would assume that mu and sigma are rounded to the same number of decimal places. Since trailing zeros may not be detected, we set the default |
sigma.dec |
By default equal to mu.dec. |
Sets the last digit of a number x to zero
set.last.digit.zero(x)
x |
a numeric vector |
The significand is the integer formed by all significant digits, e.g. the significand of 0.012 is 12.
significand(x, num.deci = NULL)
x |
a numeric vector. |
num.deci |
If not NULL, a vector that states the number of reported decimal places for x. This can be used if we know that there were additional trailing zeros. |
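Given the number of reported decimal places, the significand can be obtained by shifting the decimal point and rounding. The sketch below assumes this reading and requires num.deci explicitly; significand_sketch is a hypothetical name.

```r
# Sketch: significand as the integer of all reported digits.
significand_sketch <- function(x, num.deci) {
  as.integer(round(abs(x) * 10^num.deci))
}

significand_sketch(0.012, num.deci = 3)  # the example from the text
# A value reported as 1.50 is stored as 1.5; passing num.deci = 2 keeps
# the trailing zero in the significand.
significand_sketch(1.5, num.deci = 2)
```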
Unlike normal [geom_density] or [stat_density], the density estimate does not artificially decrease at the left bound 0. Note that this function only works nicely if the data starts at 0 on the left. Possible atoms at z=0 should ideally be removed.
stat_abszdensity( mapping = NULL, data = NULL, geom = "line", position = "stack", ..., bw = "nrd0", adjust = 1, kernel = "epanechnikov", n = 512, trim = FALSE, na.rm = FALSE, orientation = NA, show.legend = NA, inherit.aes = TRUE )
bw |
The smoothing bandwidth to be used. If numeric, the standard deviation of the smoothing kernel. If character, a rule to choose the bandwidth, as listed in [stats::bw.nrd()]. |
adjust |
A multiplicative bandwidth adjustment. This makes it possible to adjust the bandwidth while still using a bandwidth estimator. For example, 'adjust = 1/2' means use half of the default bandwidth. |
kernel |
Kernel. See list of available kernels in [density()]. |
n |
number of equally spaced points at which the density is to be estimated, should be a power of two, see [density()] for details |
trim |
If 'FALSE', the default, each density is computed on the full range of the data. If 'TRUE', each density is computed over the range of that group: this typically means the estimated x values will not line-up, and hence you won't be able to stack density values. This parameter only matters if you are displaying multiple densities in one plot or if you are manually adjusting the scale limits. |
density estimate
density * number of points - useful for stacked density plots
density estimate, scaled to maximum of 1
alias for 'scaled', to mirror the syntax of ['stat_bin()']
This is the main function you will call if you want to perform a publication bias / p-hacking analysis with derounded z-statistics. It allows flexible combinations of how a single derounded z vector is drawn, which statistics are computed for each combination of window h and derounded z-draw and how those statistics are aggregated over multiple replications.
study.with.derounding( dat, h.seq = c(0.05, 0.075, 0.1, 0.2, 0.3, 0.4, 0.5), window.fun = window.t.ci, mode = c("reported", "uniform", "zda", "dsr")[1], alt.mode = c("uniform", "reported")[1], make.z.fun = NULL, z0 = ifelse(has.col(dat, "z0"), dat[["z0"]], 1.96), repl = 1, aggregate.fun = "median", ab.df = NULL, z.pdf = NULL, max.s = 100, common.deci = TRUE, verbose = TRUE )
dat |
a data frame containing all observations. Each observation is a test from a regression table in some article. It must have the columns |
h.seq |
All considered half-window sizes |
window.fun |
The function that computes for each draw of a derounded z vector and a window h the statistics of interest. Examples are |
mode |
Mode how a single draw of derounded z is computed: "reported", "uniform","zda","dsr" or some custom name (requires ab.df to be defined) |
alt.mode |
Either "uniform" (DEFAULT) or "reported". Some derounding modes like "zda" and "dsr" cannot be well defined (or are too time-consuming to compute) for observations with many significant digits or outlier z-statistics. |
z0 |
The significance threshold for z |
repl |
Number of replications of each derounding draw. |
aggregate.fun |
How shall multiple replications be aggregated? Not yet implemented. Currently we always take the median of each variable returned by window.fun over all replications. |
ab.df |
Required if |
z.pdf |
Required if |
max.s |
Used if |
common.deci |
Shall we assume that mu and sigma are given with the same number of decimal places? If |
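The overall flow (draw derounded z, compute a statistic for every window half-width, repeat, aggregate by the median) can be sketched in base R as follows. This is a conceptual illustration with toy data, not a call to the package: all function names are hypothetical, derounding is uniform, and the window statistic is simply the share of observations at or above z0.

```r
set.seed(1)
# Toy data (assumed): coefficients and standard errors reported with 3 decimals
mu <- round(rnorm(500, mean = 0.02, sd = 0.01), 3)
sigma <- round(runif(500, min = 0.008, max = 0.012), 3)

# One draw of uniformly derounded absolute z-statistics
draw_z <- function(mu, sigma, dec) {
  half <- 0.5 * 10^(-dec)
  abs(runif(length(mu), mu - half, mu + half)) /
    runif(length(sigma), sigma - half, sigma + half)
}

# Window statistic: share of observations in the window at or above z0
share_above <- function(z, h, z0) {
  in.window <- abs(z - z0) <= h
  mean(z[in.window] >= z0)
}

study_sketch <- function(mu, sigma, dec, h.seq = c(0.1, 0.2, 0.5),
                         z0 = 1.96, repl = 5) {
  stats <- replicate(repl, {
    z <- draw_z(mu, sigma, dec)
    vapply(h.seq, function(h) share_above(z, h, z0), numeric(1))
  })
  stats <- matrix(stats, nrow = length(h.seq))  # rows: windows, columns: draws
  # Aggregate over replications with the median
  data.frame(h = h.seq, share.above = apply(stats, 1, median))
}

res <- study_sketch(mu, sigma, dec = 3)
res
```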
Applies a one-sided binomial test with H0: z <= z0 to each window.
window.binom.test(above = z >= z0, h = NA, ci.level = 0.95, z, z0, ...)
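A window-based binomial test of this kind can be sketched with stats::binom.test. The sketch assumes that the test asks whether more than half of the observations inside the window lie at or above z0; the exact hypothesis and the returned columns of the package function are not reproduced here, and window_binom_sketch is a hypothetical name.

```r
# Sketch: one-sided binomial test inside a window around z0.
# Assumption: under H0 the share of observations at or above z0 is at most 0.5.
window_binom_sketch <- function(z, z0 = 1.96, h = 0.2) {
  in.window <- abs(z - z0) <= h
  above <- z[in.window] >= z0
  test <- binom.test(sum(above), length(above), p = 0.5, alternative = "greater")
  data.frame(h = h, obs = length(above),
             share.above = mean(above), p.value = test$p.value)
}

set.seed(1)
z <- abs(rnorm(5000, mean = 1.5))  # toy z-statistics (assumed)
res <- window_binom_sketch(z)
res
```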
Applies a two-sided binomial test with H0: z = z0 to each window.
window.binom.test.2s(above = z >= z0, h = NA, ci.level = 0.9, z, z0, ...)
Can be used as argument window.fun in study.with.derounding
window.t.ci(above = z >= z0, h = NA, ci.level = 0.95, z, z0, ...)