Fits cumulative link models (CLMs) with the
smooth-effect-on-response penalty (SERP) via a modified Newton-Raphson
algorithm. SERP enables the regularization of the parameter space between
the general and the restricted cumulative models, with a resultant shrinkage
of all subject-specific effects to global effects. The Akaike information
criterion (aic), K-fold cross-validation (cv), among other tuning
approaches, provide the means of arriving at an optimal tuning parameter in
a situation where a user-supplied tuning value is not available.
The slope argument allows for the selection of a penalized, unparallel,
parallel, or partial slope.
serp(
formula,
link = c("logit", "probit","loglog", "cloglog", "cauchit"),
slope = c("penalize", "parallel", "unparallel", "partial"),
tuneMethod = c("aic", "cv", "finite", "user"),
reverse = FALSE,
lambdaGrid = NULL,
cvMetric = c("brier", "logloss", "misclass"),
gridType = c("discrete", "fine"),
globalEff = NULL,
data,
subset,
weights = NULL,
weight.type = c("analytic", "frequency"),
na.action = NULL,
lambda = NULL,
contrasts = NULL,
control = list(),
...)
regression formula of the form: response ~ predictors. The response should be an ordered factor.
sets the link function for the cumulative link model, one of: logit, probit, log-log (loglog), complementary log-log (cloglog) and cauchit.
selects the form of coefficients used in the model, with penalize
denoting the penalized coefficients, and unparallel, parallel and partial
denoting the unpenalized non-parallel, parallel and semi-parallel
coefficients respectively.
sets the method of choosing an optimal shrinkage parameter, one of: aic,
cv, finite and user. With aic or cv, the chosen lambda is the value along
the parameter shrinkage path at which the fit's AIC or the k-fold
cross-validated test error is minimal. The finite tuning is used to obtain
the model along the parameter shrinkage path for which the log-likelihood
exists (is finite). The user tuning supports a user-supplied lambda value.
FALSE by default; when TRUE, the sign of the linear predictor is reversed.
optional user-supplied lambda grid for the aic and cv tuning methods,
when the discrete gridType is chosen. Negative values are not allowed.
A short lambda grid could increase computation time assuming a large
number of predictors and cases in the model.
sets the performance metric for the cv tuning, with the Brier score used by default.
chooses whether a discrete or a continuous lambda grid should be
used to select the optimal tuning parameter. The former is used by default
and can be adjusted as desired in serp.control. The latter is on the
range (0, maxPen). A user-supplied grid is also possible, which
automatically overrides the internal grid.
specifies variable(s) to be assigned global effects during
penalization or when slope is set to partial. Variables are
specified as a formula with an empty left-hand side, for instance,
globalEff = ~predictors.
optional data frame containing the variables used in the formula.
specifies which subset of the rows of the data should be used for the fit. All observations are used by default.
optional case weights in fitting. Negative weights are not allowed. Defaults to 1.
chooses between analytic and frequency weights with the former used by default. The latter should be used when weights are mere case counts used to compress the data set.
a function to filter missing data.
a user-supplied single numeric value for the tuning parameter
when using the user tuning method. Negative values are not allowed.
a list of contrasts to be used for some or all of the factors appearing as variables in the model formula.
a list of fit control parameters to replace default values
returned by serp.control. Values not set assume default values.
additional arguments.
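The three cvMetric options described above can be illustrated with their usual textbook definitions. This is a sketch under assumptions: the helper names (brier_score, log_loss, misclass_rate) are hypothetical, the probabilities are made up, and the scaling used inside the package may differ.

```r
# Hypothetical helpers, not part of serp.
# prob: n x k matrix of predicted category probabilities (rows sum to 1)
# y:    integer vector of observed categories, coded 1, ..., k
brier_score <- function(prob, y) {
  obs <- diag(ncol(prob))[y, , drop = FALSE]   # one-hot indicator matrix
  mean(rowSums((prob - obs)^2))
}
log_loss <- function(prob, y, eps = 1e-15) {
  p_obs <- prob[cbind(seq_along(y), y)]        # probability of the observed category
  -mean(log(pmax(p_obs, eps)))                 # eps guards against log(0)
}
misclass_rate <- function(prob, y) {
  mean(max.col(prob, ties.method = "first") != y)
}

# Made-up predictions for three cases and three categories
prob <- rbind(c(0.7, 0.2, 0.1),
              c(0.1, 0.6, 0.3),
              c(0.2, 0.5, 0.3))
y <- c(1, 2, 3)

brier_score(prob, y)
log_loss(prob, y)
misclass_rate(prob, y)
```

In k-fold cross-validation, a metric of this kind would be computed on each held-out fold and averaged for every lambda on the grid, with the lambda giving the smallest average selected.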
the Akaike information criterion, with effective degrees of freedom obtained from the trace of the generalized hat matrix depending on the tuning parameter.
the Bayesian information criterion, with effective degrees of freedom obtained from the trace of the generalized hat matrix depending on the tuning parameter.
the matched call.
a vector of coefficients of the fitted model.
a character vector of fit convergence status.
(where relevant) the contrasts used in the model.
list of control parameters from serp.control.
the performance metric used for cv tuning.
the residual deviance.
the (effective) number of degrees of freedom used by the model
the fitted probabilities.
variable(s) in model treated as global effect(s)
a column vector of gradients for the coefficients at the model convergence.
the hessian matrix for the coefficients at the model convergence.
number of iterations before convergence or non-convergence.
a user-supplied single numeric value for the user tuning method.
a numeric vector of lambda values used to determine the optimum tuning parameter.
the realized log-likelihood at the model convergence.
character vector indicating the link function of the fit.
character vector stating the type of convergence obtained
a list to hold miscellaneous fit information.
model.frame having variables from formula.
(where relevant) information on the treatment of NAs.
the number of observations.
the number of folds used in k-fold cross-validation for the cv tuning method. Defaults to k = 5.
the residual degrees of freedom
a logical vector indicating the direction of the cumulative probabilities. Defaults to P(Y<=r).
a character vector indicating the type of slope parameters
fitted. Defaults to penalize.
the terms structure describing the model.
numeric value of the cross-validated test error at which the optimal tuning parameter emerged.
a character vector specifying the method for choosing an optimal shrinkage parameter.
numeric value of AIC or logLik obtained at the optimal tuning
parameter when using the aic or finite tuning methods respectively.
the number of the response levels.
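As a worked check of how the log-likelihood, degrees of freedom and AIC relate, the figures printed by summary(f4) in the examples on this page can be reproduced by hand. The sketch assumes the standard definition AIC = -2 logLik + 2 edf, that the effective degrees of freedom of the unpenalized parallel fit equal its six parameters, and that the wine data has 72 rows and a five-level response; the residual degrees of freedom formula is inferred from the printed output, not taken from the package source.

```r
# Values copied from the summary(f4) output in the examples.
loglik <- -86.49192
edf    <- 6                     # four intercepts plus two slopes

aic <- -2 * loglik + 2 * edf    # standard AIC definition
round(aic, 4)                   # matches the printed AIC of 184.9838

# Assumption: 72 observations and k = 5 response levels in the wine data.
n <- 72
k <- 5
rdf <- n * (k - 1) - edf        # consistent with "282 degrees of freedom"
rdf
```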
The serp function fits the cumulative link model (CLM)
with smooth-effect-on-response penalty (SERP). The cumulative
model developed by McCullagh (1980) is probably the most frequently
used ordinal model. When motivated by an underlying latent
variable, a simple form of the model is expressed as follows:
$$P(Y\leq r|x) = F(\delta_{0r} + x^T\delta)$$
where \(x\) is a vector of covariates, \(\delta\) a vector of regression parameters and \(F\) a continuous distribution function. This model assumes that the effect of \(x\) does not depend on the category. However, with this assumption relaxed, one obtains the following general cumulative model:
$$P(Y\leq r|x) = F(\delta_{0r} + x^T\delta_{r}),$$
where \(r=1,\dots,k-1\). This model, however, has the stochastic ordering property, which implies that \(P(Y\leq r-1|x) < P(Y\leq r|x)\) holds for all \(x\) and all categories \(r\). Such an assumption is often problematic, resulting in unstable likelihoods with an ill-conditioned parameter space during the iterative procedure.
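The two model forms above can be illustrated with a short base R sketch. The helper names (cum_probs, cat_probs) and all numeric values are hypothetical and chosen only for illustration; they are not part of serp. The second call shows how category-specific slopes can break the ordering of the cumulative probabilities for some \(x\), which is one way to see why the general model can become unstable.

```r
# Hypothetical helpers, not part of serp.
# lin_pred: vector of k-1 linear predictors delta_0r + x' delta_r
# link_cdf: the distribution function F (logistic by default)
cum_probs <- function(lin_pred, link_cdf = plogis) link_cdf(lin_pred)
cat_probs <- function(lin_pred, link_cdf = plogis) {
  diff(c(0, cum_probs(lin_pred, link_cdf), 1))  # P(Y = r) from P(Y <= r)
}

theta <- c(-1, 0, 1)            # illustrative thresholds delta_0r
x     <- 1                      # a single covariate value

# Parallel model: one global slope, ordering is preserved for every x.
p_parallel <- cat_probs(theta + x * 0.5)

# General model: category-specific slopes may reverse the ordering.
delta_r   <- c(2, 0, -2)        # illustrative category-specific slopes
p_general <- cat_probs(theta + x * delta_r)

p_parallel                      # all entries positive, summing to 1
any(p_general < 0)              # TRUE: the ordering constraint is violated
```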
SERP offers a means of arriving at stable estimates of the general model. It provides a form of regularization that is based on maximizing the penalized log-likelihood:
$$l_{p}(\delta)=l(\delta)-J_{\lambda}(\delta)$$
where \(l(\delta)\) is the log-likelihood of the general cumulative model and \(J_{\lambda}(\delta)=\lambda J(\delta)\) the penalty function weighted by the tuning parameter \(\lambda\). Assuming an ordered categorical outcome \(Y \in \{1,\dots,k\}\), and considering that the corresponding parameters \(\delta_{1j},\dots,\delta_{k-1,j}\) vary smoothly over the categories, the following penalty (Tutz and Gertheiss, 2016),
$$J_{\lambda}(\delta)= \lambda \sum_{j=1}^{p} \sum_{r=1}^{k-2} (\delta_{r+1,j}-\delta_{rj})^{2}$$
enables the smoothing of response categories such that all category-specific effects associated with the response turn towards a common global effect. SERP could also be applied to a semi-parallel model with only the category-specific part of the model penalized. See Ugba (2021) and Ugba et al. (2021) for further details and application in empirical studies.
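The penalty can be written in a few lines of base R. This is a minimal sketch, assuming the category-specific slopes are stored as a (k-1) x p matrix with one row per cumulative split; the function name serp_penalty and the coefficient values are hypothetical and do not reflect the package internals.

```r
# Hypothetical helper, not part of serp.
# delta:  (k-1) x p matrix; row r holds the slopes delta_{r1}, ..., delta_{rp}
# lambda: non-negative tuning parameter
serp_penalty <- function(delta, lambda) {
  stopifnot(is.matrix(delta), lambda >= 0)
  # diff() on a matrix takes differences between adjacent rows,
  # i.e. delta_{r+1,j} - delta_{rj} for r = 1, ..., k-2
  lambda * sum(diff(delta)^2)
}

# Illustrative category-specific slopes for k = 5 and p = 2
delta_general <- rbind(c(2.5, 1.7),
                       c(2.1, 1.3),
                       c(2.9, 1.7),
                       c(3.0, 1.2))
# Parallel model: identical rows, so every adjacent difference is zero
delta_parallel <- matrix(c(2.5, 1.5), nrow = 4, ncol = 2, byrow = TRUE)

pen_general  <- serp_penalty(delta_general,  lambda = 10)
pen_parallel <- serp_penalty(delta_parallel, lambda = 10)
```

Because the penalty vanishes only when all rows are equal, increasing lambda pushes the fit from the general model towards the parallel one.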
An object of class serp with the components listed above,
depending on the type of slope modeled. Other summary methods include:
summary, coef, predict, vcov, anova, etc.
Ugba, E. R. (2021). serp: An R package for smoothing in ordinal regression. Journal of Open Source Software, 6(66), 3705. https://doi.org/10.21105/joss.03705
Ugba, E. R., Mörlein, D. and Gertheiss, J. (2021). Smoothing in Ordinal Regression: An Application to Sensory Data. Stats, 4, 616–633. https://doi.org/10.3390/stats4030037
Tutz, G. and Gertheiss, J. (2016). Regularized Regression for Categorical Data (With Discussion and Rejoinder). Statistical Modelling, 16, pp. 161-260. https://doi.org/10.1177/1471082X16642560
McCullagh, P. (1980). Regression Models for Ordinal Data. Journal of the Royal Statistical Society. Series B (Methodological), 42, pp. 109-142. https://doi.org/10.1111/j.2517-6161.1980.tb01109.x
require(serp)
## The unpenalized non-proportional odds model returns unbounded estimates and
## is hence not fully identifiable.
f1 <- serp(rating ~ temp + contact, slope = "unparallel",
reverse = TRUE, link = "logit", data = wine)
coef(f1)
#> (Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4 tempwarm:1
#> 1.225774 -1.032996 -3.946377 -19.184272 19.242728
#> tempwarm:2 tempwarm:3 tempwarm:4 contactyes:1 contactyes:2
#> 2.110849 2.940411 17.063644 1.659372 1.342884
#> contactyes:3 contactyes:4
#> 1.692808 1.162201
## The penalized non-proportional odds model with a user-supplied lambda gives
## a fully identified model with bounded estimates. A suitable tuning criterion
## could as well be used to select lambda (e.g., aic, cv)
f2 <- serp(rating ~ temp + contact, slope = "penalize",
link = "logit", reverse = TRUE, tuneMethod = "user",
lambda = 1e1, data = wine)
coef(f2)
#> (Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4 tempwarm:1
#> 1.342671 -1.236068 -3.513122 -5.050240 2.518993
#> tempwarm:2 tempwarm:3 tempwarm:4 contactyes:1 contactyes:2
#> 2.461986 2.554007 2.609441 1.526984 1.518964
#> contactyes:3 contactyes:4
#> 1.536791 1.465075
## A penalized partial proportional odds model with some variables set to
## global effect is also possible.
f3 <- serp(rating ~ temp + contact, slope = "penalize",
reverse = TRUE, link = "logit", tuneMethod = "user",
lambda = 2e1, globalEff = ~ temp, data = wine)
coef(f3)
#> (Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4 tempwarm
#> 1.342746 -1.236486 -3.423464 -4.993496 2.070177
#> contactyes:1 contactyes:2 contactyes:3 contactyes:4
#> 1.832944 1.857316 1.921398 1.971866
## The unpenalized proportional odds model having constrained estimates can
## as well be fit. Under extreme shrinkage, estimates in f2 equal those in
## this model.
f4 <- serp(rating ~ temp + contact, slope = "parallel",
reverse = FALSE, link = "logit", data = wine)
summary(f4)
#>
#> call:
#> serp(formula = rating ~ temp + contact, link = "logit", slope = "parallel",
#> reverse = FALSE, data = wine)
#>
#> Coefficients:
#> Estimate Std Error z value Pr(>|z|)
#> (Intercept):1 -1.3444 0.5085 -2.644 0.00820 **
#> (Intercept):2 1.2508 0.4391 2.849 0.00439 **
#> (Intercept):3 3.4669 0.5971 5.806 6.40e-09 ***
#> (Intercept):4 5.0064 0.7291 6.867 6.56e-12 ***
#> tempwarm -2.5031 0.5320 -4.705 2.54e-06 ***
#> contactyes -1.5278 0.4736 -3.226 0.00126 **
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Number of iterations: 5
#>
#> Loglik: -86.49192 on 282 degrees of freedom
#>
#> AIC: 184.9838
#>
#> Exponentiated coefficients:
#> tempwarm contactyes
#> 0.08183069 0.21701410
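## The exponentiated coefficients at the end of the summary output can be
## checked by hand from the rounded estimates. A minimal base R sketch, with
## the values copied from the coefficient table above; under the logit link
## these are commonly read as cumulative odds ratios.

```r
# Rounded estimates copied from the summary(f4) coefficient table
est <- c(tempwarm = -2.5031, contactyes = -1.5278)
odds_ratio <- exp(est)
round(odds_ratio, 4)
```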