Smooth Effects on Response Penalty for CLM

Fits cumulative link models (CLMs) with the smooth-effect-on-response penalty (SERP) via a modified Newton-Raphson algorithm. SERP enables the regularization of the parameter space between the general and the restricted cumulative models, with a resultant shrinkage of all category-specific effects to global effects. The Akaike information criterion (aic), K-fold cross-validation (cv) and other tuning approaches provide the means of arriving at an optimal tuning parameter when a user-supplied tuning value is not available. The slope argument allows for the selection of a penalized, unparallel, parallel or partial slope.

serp(
     formula,
     link = c("logit", "probit","loglog", "cloglog", "cauchit"),
     slope = c("penalize", "parallel", "unparallel", "partial"),
     tuneMethod = c("aic", "cv", "finite", "user"),
     reverse = FALSE,
     lambdaGrid = NULL,
     cvMetric = c("brier", "logloss", "misclass"),
     gridType = c("discrete", "fine"),
     globalEff = NULL,
     data,
     subset,
     weights = NULL,
     weight.type = c("analytic", "frequency"),
     na.action = NULL,
     lambda = NULL,
     contrasts = NULL,
     control = list(),
     ...)

Arguments

formula: regression formula of the form: response ~ predictors. The response should be an ordered factor.
link: sets the link function for the cumulative link model: logit, probit, log-log (loglog), complementary log-log (cloglog) or cauchit.
slope: selects the form of coefficients used in the model: penalize denotes penalized coefficients, while unparallel, parallel and partial denote unpenalized non-parallel, parallel and semi-parallel coefficients, respectively.
tuneMethod: sets the method of choosing an optimal shrinkage parameter: aic, cv, finite or user. The aic and cv methods select the lambda value along the parameter shrinkage path at which the fit's AIC or the k-fold cross-validated test error is minimal. The finite method obtains the model along the shrinkage path for which the log-likelihood exists (is finite). The user method supports a user-supplied lambda value.
reverse: FALSE by default; when TRUE, the sign of the linear predictor is reversed.
lambdaGrid: optional user-supplied lambda grid for the aic and cv tuning methods, used when the discrete gridType is chosen. Negative values are not allowed. A shorter lambda grid could reduce computation time when the model has a large number of predictors and cases.
cvMetric: sets the performance metric for cv tuning, with the Brier score used by default.
gridType: chooses whether a discrete or a continuous lambda grid should be used to select the optimal tuning parameter. The former is used by default and can be adjusted as desired in serp.control. The latter is on the range (0, maxPen). A user-supplied grid is also possible, which automatically overrides the internal grid.
globalEff: specifies variable(s) to be assigned global effects during penalization or when slope is set to partial. Variables are specified as a formula with an empty left-hand side, for instance, globalEff = ~predictors.
data: optional data frame containing the variables used in the formula.
subset: specifies which subset of the rows of the data should be used for the fit. All observations are used by default.
weights: optional case weights in fitting. Negative weights are not allowed. Defaults to 1.
weight.type: chooses between analytic and frequency weights with the former used by default. The latter should be used when weights are mere case counts used to compress the data set.
na.action: a function to filter missing data.
lambda: a user-supplied single numeric value for the tuning parameter when using the user tuning method. Negative values are not allowed.
contrasts: a list of contrasts to be used for some or all of the factors appearing as variables in the model formula.
control: A list of fit control parameters to replace default values returned by serp.control. Values not set assume default values.
...: additional arguments.
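To make the cvMetric options concrete, here is a minimal base-R sketch of the three metrics computed on held-out predictions, plus a random K-fold assignment with five folds (the default nrFold). The helper names (brier_score, log_loss, misclass_rate, make_folds) are hypothetical, and the metric definitions follow common conventions, such as a multi-category Brier score summed over categories; they are assumptions, not necessarily the formulas serp uses internally.

```r
# Hypothetical helpers illustrating the three cvMetric options on held-out data.
# 'prob' is an n x k matrix of predicted category probabilities (rows sum to 1);
# 'y' is an integer vector of observed categories in 1..k.

brier_score <- function(prob, y) {
  obs <- matrix(0, nrow(prob), ncol(prob))
  obs[cbind(seq_along(y), y)] <- 1        # one-hot encoding of the observed categories
  mean(rowSums((prob - obs)^2))           # squared error summed over categories, averaged over cases
}

log_loss <- function(prob, y) {
  -mean(log(prob[cbind(seq_along(y), y)]))  # negative mean log-probability of the observed category
}

misclass_rate <- function(prob, y) {
  mean(max.col(prob, ties.method = "first") != y)  # share of cases whose modal category is wrong
}

# Random K-fold labels: each fold in turn is held out while the model is fit on the
# other K - 1 folds; the K test errors are then averaged.
make_folds <- function(n, K = 5) sample(rep_len(seq_len(K), n))

# Hypothetical held-out predictions for two cases and k = 3 categories
prob <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)
y <- c(1, 2)
c(brier = brier_score(prob, y), logloss = log_loss(prob, y),
  misclass = misclass_rate(prob, y))
```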

Value

aic: the Akaike information criterion, with effective degrees of freedom obtained from the trace of the generalized hat matrix depending on the tuning parameter.
bic: the Bayesian information criterion, with effective degrees of freedom obtained from the trace of the generalized hat matrix depending on the tuning parameter.
call: the matched call.
coef: a vector of coefficients of the fitted model.
converged: a character vector of fit convergence status.
contrasts: (where relevant) the contrasts used in the model.
control: list of control parameters from serp.control.
cvMetric: the performance metric used for cv tuning.
deviance: the residual deviance.
edf: the (effective) number of degrees of freedom used by the model.
fitted.values: the fitted probabilities.
globalEff: variable(s) in the model treated as global effect(s).
gradient: a column vector of gradients for the coefficients at the model convergence.
Hessian: the Hessian matrix for the coefficients at the model convergence.
iter: number of iterations before convergence or non-convergence.
lambda: a user-supplied single numeric value for the user tuning method.
lambdaGrid: a numeric vector of lambda values used to determine the optimum tuning parameter.
logLik: the realized log-likelihood at the model convergence.
link: character vector indicating the link function of the fit.
message: character vector stating the type of convergence obtained.
misc: a list to hold miscellaneous fit information.
model: model.frame having variables from formula.
na.action: (where relevant) information on the treatment of NAs.
nobs: the number of observations.
nrFold: the number of folds for the k-fold cross-validation of the cv tuning method. Defaults to k = 5.
rdf: the residual degrees of freedom.
reverse: a logical vector indicating the direction of the cumulative probabilities. Defaults to P(Y<=r).
slope: a character vector indicating the type of slope parameters fitted. Defaults to penalize.
Terms: the terms structure describing the model.
testError: numeric value of the cross-validated test error at which the optimal tuning parameter emerged.
tuneMethod: a character vector specifying the method for choosing an optimal shrinkage parameter.
value: numeric value of the AIC or logLik obtained at the optimal tuning parameter when using the aic or finite tuning method, respectively.
ylev: the number of response levels.
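The aic and bic components follow from the log-likelihood and the (effective) degrees of freedom. The sketch below assumes the standard definitions AIC = -2 logLik + 2 edf and BIC = -2 logLik + log(n) edf with n = nobs (the sample size serp plugs into the BIC is an assumption here). It checks the AIC against summary(f4) in the Examples, where the unpenalized parallel fit has edf = 6 (four intercepts and two slopes) on the 72 observations and 5 rating levels of the wine data. The function name info_criteria is hypothetical.

```r
# Information criteria from a fit's log-likelihood and (effective) degrees of freedom.
# Assumes AIC = -2 logLik + 2 edf and BIC = -2 logLik + log(nobs) edf.
info_criteria <- function(logLik, edf, nobs) {
  c(aic = -2 * logLik + 2 * edf,
    bic = -2 * logLik + log(nobs) * edf)
}

# Values printed by summary(f4): logLik = -86.49192 with 4 intercepts + 2 slopes
ic <- info_criteria(logLik = -86.49192, edf = 6, nobs = 72)
ic["aic"]  # 184.9838, the AIC printed by summary(f4)

# Residual df, assuming rdf = nobs * (ylev - 1) - edf; this reproduces the
# "282 degrees of freedom" printed for f4
rdf <- 72 * (5 - 1) - 6
rdf
```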

Details

The serp function fits the cumulative link model (CLM) with the smooth-effect-on-response penalty (SERP). The cumulative model developed by McCullagh (1980) is probably the most frequently used ordinal model. When motivated by an underlying latent variable, a simple form of the model is expressed as follows:

$$P(Y\leq r|x) = F(\delta_{0r} + x^T\delta)$$

where $x$ is a vector of covariates, $\delta$ a vector of regression parameters and $F$ a continuous distribution function. This model assumes that the effect of $x$ does not depend on the category. However, with this assumption relaxed, one obtains the following general cumulative model:

$$P(Y\leq r|x) = F(\delta_{0r} + x^T\delta_{r}),$$

where $r=1,\dots,k-1$. This general model, however, is subject to the stochastic ordering restriction, which requires that $P(Y\leq r-1|x) < P(Y\leq r|x)$ hold for all $x$ and all categories $r$. This restriction is often problematic: with category-specific effects it can be violated for some $x$, resulting in unstable likelihoods and an ill-conditioned parameter space during the iterative procedure.
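A small base-R illustration of the ordering problem, assuming the logit link (plogis) and hypothetical parameters for k = 3 categories and a single covariate: with a common slope the cumulative probabilities stay ordered, while category-specific slopes make them cross for large x, producing a negative category probability. The helper names cum_prob and cat_prob are illustrative only.

```r
# Cumulative probabilities P(Y <= r | x), r = 1, ..., k - 1, of a cumulative logit
# model with one covariate x, intercepts delta0[r] and slopes delta[r]
cum_prob <- function(x, delta0, delta) plogis(delta0 + x * delta)

# Category probabilities P(Y = r | x), r = 1, ..., k, as successive differences
cat_prob <- function(x, delta0, delta) diff(c(0, cum_prob(x, delta0, delta), 1))

delta0 <- c(-1, 1)  # hypothetical intercepts for k = 3 categories

# Parallel (restricted) model: a common slope keeps the ordering for every x
cat_prob(2, delta0, delta = c(1, 1))

# General model: the curves -1 + 2x and 1 + 0.5x cross at x = 4/3,
# so beyond that point P(Y = 2 | x) turns negative
cat_prob(0, delta0, delta = c(2, 0.5))
cat_prob(2, delta0, delta = c(2, 0.5))
```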

SERP offers a means of arriving at stable estimates of the general model. It provides a form of regularization based on maximizing the penalized log-likelihood:

$$l_{p}(\delta)=l(\delta)-J_{\lambda}(\delta)$$

where $l(\delta)$ is the log-likelihood of the general cumulative model and $J_{\lambda}(\delta)=\lambda J(\delta)$ the penalty function weighted by the tuning parameter $\lambda$. Assuming an ordered categorical outcome $Y \in \{1,\dots,k\}$, and considering that the parameters $\delta_{1j},\dots,\delta_{k-1,j}$ of the $j$th predictor vary smoothly over the categories, the following penalty (Tutz and Gertheiss, 2016),

$$J_{\lambda}(\delta)= \lambda\sum_{j=1}^{p} \sum_{r=1}^{k-2} (\delta_{r+1,j}-\delta_{rj})^{2}$$

smooths the effects across the response categories, so that all category-specific effects associated with the response shrink towards a common global effect. SERP can also be applied to a semi-parallel model in which only the category-specific part of the model is penalized. See Ugba (2021) and Ugba et al. (2021) for further details and applications in empirical studies.
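The penalty takes a few lines of R. To show its effect, the sketch also replaces the negative log-likelihood of a single predictor by a toy quadratic surrogate, sum((delta - b)^2), around hypothetical unpenalized estimates b. This is an assumption for illustration only; serp itself maximizes the actual penalized log-likelihood with a modified Newton-Raphson algorithm. Under the surrogate the penalized solution has the closed form (I + lambda D'D)^{-1} b, with D the first-difference matrix. As lambda grows, the estimates move to their common mean (a single global effect), and the effective degrees of freedom, the trace of the corresponding hat matrix, fall from k - 1 to 1. All function names are hypothetical.

```r
# SERP penalty: lambda * sum_j sum_r (delta[r + 1, j] - delta[r, j])^2.
# 'delta' is a (k - 1) x p matrix: row r = cumulative category, column j = predictor.
serp_penalty <- function(delta, lambda) {
  lambda * sum(diff(as.matrix(delta))^2)  # diff() on a matrix differences successive rows
}

# First-difference matrix D of size (m - 1) x m, so that D %*% d equals diff(d)
diff_matrix <- function(m) diff(diag(m))

# Toy surrogate (assumption): maximizing -sum((delta - b)^2) - lambda * sum(diff(delta)^2)
# leads to the linear system (I + lambda D'D) delta = b
shrink <- function(b, lambda) {
  D <- diff_matrix(length(b))
  drop(solve(diag(length(b)) + lambda * crossprod(D), b))
}

# Effective degrees of freedom of the toy smoother: trace of (I + lambda D'D)^{-1}
edf <- function(m, lambda) {
  D <- diff_matrix(m)
  sum(diag(solve(diag(m) + lambda * crossprod(D))))
}

b <- c(1.2, 2.1, 2.9, 4.0)  # hypothetical estimates for k - 1 = 4 cut-points

# Estimates move towards their common mean (2.55) as lambda grows
sapply(c(0, 1, 10, 1e6), function(l) round(shrink(b, l), 3))

# edf falls from k - 1 = 4 (no penalty) towards 1 (one global effect)
sapply(c(0, 1, 10, 1e6), function(l) edf(length(b), l))
```

Because every row of D sums to zero, the surrogate solution always preserves sum(b); the penalty only redistributes the effect across categories, which is why extreme shrinkage yields the mean rather than zero.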

The fitted model is returned as an object of class serp with the components listed under Value, depending on the type of slope modeled. Other methods available for serp objects include summary, coef, predict, vcov and anova, among others.

References

Ugba, E. R. (2021). serp: An R package for smoothing in ordinal regression. Journal of Open Source Software, 6(66), 3705. https://doi.org/10.21105/joss.03705

Ugba, E. R., Mörlein, D. and Gertheiss, J. (2021). Smoothing in Ordinal Regression: An Application to Sensory Data. Stats, 4, 616–633. https://doi.org/10.3390/stats4030037

Tutz, G. and Gertheiss, J. (2016). Regularized Regression for Categorical Data (With Discussion and Rejoinder). Statistical Modelling, 16, 161–260. https://doi.org/10.1177/1471082X16642560

McCullagh, P. (1980). Regression Models for Ordinal Data. Journal of the Royal Statistical Society. Series B (Methodological), 42, 109–142. https://doi.org/10.1111/j.2517-6161.1980.tb01109.x

Examples

require(serp)

## The unpenalized non-proportional odds model returns unbounded estimates and
## is hence not fully identifiable.
f1 <- serp(rating ~ temp + contact, slope = "unparallel",
           reverse = TRUE, link = "logit", data = wine)
coef(f1)
#> (Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4    tempwarm:1 
#>      1.225774     -1.032996     -3.946377    -19.184272     19.242728 
#>    tempwarm:2    tempwarm:3    tempwarm:4  contactyes:1  contactyes:2 
#>      2.110849      2.940411     17.063644      1.659372      1.342884 
#>  contactyes:3  contactyes:4 
#>      1.692808      1.162201 

## The penalized non-proportional odds model with a user-supplied lambda gives
## a fully identified model with bounded estimates. A suitable tuning criterion
## (e.g., aic, cv) could also be used to select lambda.
f2 <- serp(rating ~ temp + contact, slope = "penalize",
           link = "logit", reverse = TRUE, tuneMethod = "user",
           lambda = 1e1, data = wine)
coef(f2)
#> (Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4    tempwarm:1 
#>      1.342671     -1.236068     -3.513122     -5.050240      2.518993 
#>    tempwarm:2    tempwarm:3    tempwarm:4  contactyes:1  contactyes:2 
#>      2.461986      2.554007      2.609441      1.526984      1.518964 
#>  contactyes:3  contactyes:4 
#>      1.536791      1.465075 

## A penalized partial proportional odds model with some variables set to
## global effects is also possible.
f3 <- serp(rating ~ temp + contact, slope = "penalize",
           reverse = TRUE, link = "logit", tuneMethod = "user",
           lambda = 2e1, globalEff = ~ temp, data = wine)
coef(f3)
#> (Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4      tempwarm 
#>      1.342746     -1.236486     -3.423464     -4.993496      2.070177 
#>  contactyes:1  contactyes:2  contactyes:3  contactyes:4 
#>      1.832944      1.857316      1.921398      1.971866 


## The unpenalized proportional odds model with constrained estimates can
## also be fit. Under extreme shrinkage, the estimates in f2 approach those of
## this model, with signs reversed since f2 uses reverse = TRUE.
f4 <- serp(rating ~ temp + contact, slope = "parallel",
            reverse = FALSE, link = "logit", data = wine)
summary(f4)
#> 
#> call:
#> serp(formula = rating ~ temp + contact, link = "logit", slope = "parallel", 
#>     reverse = FALSE, data = wine)
#> 
#> Coefficients:
#>               Estimate Std Error z value Pr(>|z|)    
#> (Intercept):1  -1.3444    0.5085  -2.644  0.00820 ** 
#> (Intercept):2   1.2508    0.4391   2.849  0.00439 ** 
#> (Intercept):3   3.4669    0.5971   5.806 6.40e-09 ***
#> (Intercept):4   5.0064    0.7291   6.867 6.56e-12 ***
#> tempwarm       -2.5031    0.5320  -4.705 2.54e-06 ***
#> contactyes     -1.5278    0.4736  -3.226  0.00126 ** 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Number of iterations: 5 
#> 
#> Loglik: -86.49192 on 282 degrees of freedom 
#> 
#> AIC: 184.9838
#> 
#> Exponentiated coefficients:
#>   tempwarm contactyes 
#> 0.08183069 0.21701410
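The exponentiated coefficients at the end of summary(f4) are exp() of the slope estimates. Under the logit link and reverse = FALSE, each is an odds ratio for P(Y <= r | x) against P(Y > r | x), common to all cut-points r in this parallel model. A quick check using the printed estimates (hard-coded here, so the values agree only to the four decimals shown):

```r
# Slope estimates printed by summary(f4) (logit link, reverse = FALSE)
beta <- c(tempwarm = -2.5031, contactyes = -1.5278)

# exp(beta) multiplies the odds of P(Y <= r) versus P(Y > r), for every r
odds_ratio <- exp(beta)
round(odds_ratio, 4)
```

A value below 1, such as about 0.08 for tempwarm, means that warm temperature lowers the odds of a rating at or below any given category, i.e. it shifts ratings upward.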