Skip to main content


Showing posts from October, 2017

Extracting variables and estimates from SAS prediction models

SAS ods output statements provide a simple alternative to advanced text manipulation techniques.

The example below extracts selected variables from proc hpgenselect and use these in proc corr, proc gampl, which outputs predicted probabilities. A final ods output statement in proc logistic extracts the concordance index, i.e. the measure for second order predictive capabilities of the model given as the probability that two different observations are correctly ordered with respect to the model based risk score.

Extracting selected variables from the hpgenselect procedure using code

Several SAS procedures generate code from which the user may extract critical information.

In the example below I extract variables selected by proc hpgenselect and input these to proc corr and proc gampl without writing intermediate results to the harddrive:

The generated sourcecode 'code' is read a line at the time. Appropriate text lines are kept and concatenated into a single string using retained variables for identification (expr) and text (text).
The variable containing variable names are stored in a macro variable 'variables' and a data file test in the work library. The list may be inspected in both the data file work.test or in the SAS log.

Modeling gender and age adjusted incidence rates

National Health Institute (NHI) provides a tool box for calculation of cancer incidence and percentage change. Their algorithm for Jointpoint Trend Analysis is well-documented but does not provide the best tool at hand for most problems. The normal approximation is not the most optimal choice for situations with a low incidence rate in which I would recommend to apply modern logistic regression algorithms which are far more versatile. In the logistic regression model, either direct or indirect incidence rates are modeled using population numbers by age, year, and gender and number of cases specific to the given year, gender, and age group. There are no missing data if the registry is complete, compared to the normal model in which we cannot use data points for years/age-gender groups with no events and perform simulation to get approximate estimates.

The difference between careful parametrization in a binomial regression model and the plug-and-play functionality of the NHI suite becom…