
Showing posts from October, 2017

Extracting variables and estimates from SAS prediction models

SAS ODS OUTPUT statements provide a simple alternative to advanced text-manipulation techniques when extracting results from procedures.

The example below extracts the variables selected by proc hpgenselect and uses them in proc corr and proc gampl. The latter outputs predicted probabilities. A final ODS OUTPUT statement in proc logistic extracts the concordance index. This is the measure of the model's second-order predictive capability: the probability that, of two observations with different outcomes, the one with the event receives the higher model-based risk score.
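For a binary outcome, the concordance index can be computed directly from its pairwise definition. The minimal Python sketch below uses invented scores and outcomes:

- Every pair with different outcomes is usable.
- A usable pair is concordant when the observation with the event has the higher score.
- Tied scores count as one half.

PROC LOGISTIC reports this quantity as c in its Association table. The sketch's result should agree with it up to how SAS treats (nearly) tied predicted probabilities.

```python
from itertools import combinations


def concordance_index(risk_scores, outcomes):
    """Share of usable pairs that the risk score orders correctly.

    A pair is usable when the outcomes differ (one event, one non-event).
    It is concordant when the event has the higher score; ties count 0.5.
    """
    concordant = 0.0
    usable = 0
    for (s1, y1), (s2, y2) in combinations(zip(risk_scores, outcomes), 2):
        if y1 == y2:
            continue  # same outcome: the pair carries no ordering information
        usable += 1
        # orient the pair as (score of the event, score of the non-event)
        s_event, s_non = (s1, s2) if y1 > y2 else (s2, s1)
        if s_event > s_non:
            concordant += 1.0
        elif s_event == s_non:
            concordant += 0.5
    return concordant / usable


# invented example: two events, three non-events
scores = [0.9, 0.4, 0.3, 0.6, 0.2]
events = [1, 1, 0, 0, 0]
print(concordance_index(scores, events))  # 0.8333333333333334 (5 of 6 usable pairs)
```

The double loop over pairs is quadratic in the number of observations. That is fine for illustration, but large data sets call for a rank-based computation.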


Extracting selected variables from the hpgenselect procedure using code

Several SAS procedures generate code from which the user may extract critical information.

In the example below, I extract the variables selected by proc hpgenselect and pass them to proc corr and proc gampl without writing intermediate results to the hard drive:



The generated source code ('code') is read one line at a time. The relevant lines are kept and concatenated into a single string using the retained variables expr (identification) and text (text).
The resulting list of variable names is stored in the macro variable 'variables' and in the data set test in the WORK library. The list can be inspected either in work.test or in the SAS log.
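The DATA step logic described above can be mirrored in a few lines of Python. It uses a retained flag that marks where the relevant statement starts and a retained string that accumulates the statement across lines.

This is a sketch under assumptions. The layout of the generated code, the marker `_linpred =`, and the rule that each selected variable appears as `<coefficient> * <name>` are all hypothetical. The markers and the regular expression would have to be adapted to the code hpgenselect actually writes. In SAS, the final string would typically be placed in the macro variable with CALL SYMPUTX.

```python
import re


def extract_selected_variables(code_lines, start="_linpred =", end=";"):
    """Collect variable names from a multi-line linear-predictor statement.

    `expr` plays the role of the retained flag (are we inside the statement?)
    and `text` accumulates the statement across lines, like retained
    variables in a SAS DATA step. The start/end markers are hypothetical.
    """
    expr = False
    text = ""
    for line in code_lines:
        stripped = line.strip()
        if stripped.startswith(start):
            expr = True  # the statement of interest starts on this line
        if expr:
            text += " " + stripped
            if end in stripped:
                break  # the statement is complete
    # assumed pattern: each selected effect appears as "<coefficient> * <name>"
    names = re.findall(r"\*\s*([A-Za-z_]\w*)", text)
    return " ".join(names)


# hypothetical excerpt of generated scoring code
generated = [
    "/* hypothetical generated scoring code */",
    "_linpred = -2.31",
    "   + 0.042 * age",
    "   + 0.87 * smoker",
    "   - 0.015 * bmi ;",
    "p_event = 1 / (1 + exp(-_linpred));",
]
variables = extract_selected_variables(generated)
print(variables)  # age smoker bmi
```

The space-separated result has the same form a SAS variable list needs, for example in a VAR statement.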

Modeling gender and age adjusted incidence rates

The US National Cancer Institute (NCI) provides a toolbox for calculating cancer incidence and percentage change. Its algorithm for Joinpoint Trend Analysis is well documented but is not the best tool for most problems. The normal approximation is not the optimal choice when the incidence rate is low. In such situations I would recommend modern logistic regression methods, which are far more versatile.


The difference between careful parametrization of a binomial regression model and the plug-and-play functionality of the NCI suite becomes obvious in an example on cancers in children. Data source: NORDCAN.
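The remark about the normal approximation at low incidence can be illustrated with a single rare-event count. The sketch below uses an invented count of 2 cases in 100,000 children. It compares the normal-approximation (Wald) interval with the Wilson score interval.

With so few cases, the Wald interval is symmetric around a tiny rate, so its lower limit falls below zero. That is impossible for an incidence rate. The Wilson interval, by contrast, stays inside [0, 1]. It is used here only as a familiar alternative: it is not what Joinpoint computes, and the recommendation above is binomial (logistic) regression.

```python
import math


def wald_interval(cases, population, z=1.96):
    """Normal-approximation (Wald) interval for a proportion."""
    p = cases / population
    half = z * math.sqrt(p * (1 - p) / population)
    return p - half, p + half


def wilson_interval(cases, population, z=1.96):
    """Wilson score interval; always stays inside [0, 1]."""
    p = cases / population
    denom = 1 + z**2 / population
    centre = (p + z**2 / (2 * population)) / denom
    half = z * math.sqrt(p * (1 - p) / population + z**2 / (4 * population**2)) / denom
    return centre - half, centre + half


# invented example: 2 cases among 100,000 children
cases, population = 2, 100_000
per_100k = 100_000
for name, interval in [("Wald", wald_interval), ("Wilson", wilson_interval)]:
    lo, hi = interval(cases, population)
    print(f"{name:6s} {lo * per_100k:6.2f} to {hi * per_100k:5.2f} per 100,000")
```

For large counts the two intervals practically coincide. The normal approximation only breaks down in the sparse-data setting that is typical for childhood cancers.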

Logistic regression models: a joinpoint model (left) using gender-specific piecewise linear regression and a polynomial model (right) using gender-specific polynomial regression.
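The two parametrizations in the figure differ only in the design matrix:

- A joinpoint model adds a hinge term max(0, year - knot) for each joinpoint, so the slope on the logit scale changes at the knot.
- A polynomial model adds powers of year.

The minimal sketch below, in pure Python with invented data, fits either design as a grouped binomial logistic regression (cases out of population) by Newton-Raphson. It makes three assumptions:

- The years are centred.
- The knot is fixed in advance. Joinpoint itself searches for the number and location of joinpoints.
- Gender-specific models come from fitting each gender separately.

All function names are hypothetical, and this is not the author's SAS program. In SAS, the same grouped model can be specified with the events/trials syntax of PROC LOGISTIC (model cases/population = ...).

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def joinpoint_design(years, knot):
    """Intercept, linear trend, and a hinge term that changes the slope at `knot`."""
    return [[1.0, float(t), max(0.0, float(t - knot))] for t in years]


def polynomial_design(years, degree):
    """Intercept plus powers of (centred) calendar year."""
    return [[float(t) ** d for d in range(degree + 1)] for t in years]


def fit_binomial_logit(X, cases, population, max_iter=50, tol=1e-10):
    """Grouped binomial maximum likelihood by Newton-Raphson.

    Assumes the first column of X is the intercept.
    """
    k = len(X[0])
    total = sum(cases)
    # start at the overall log-odds so the first Newton step is well behaved
    beta = [math.log(total / (sum(population) - total))] + [0.0] * (k - 1)
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y, n in zip(X, cases, population):
            p = expit(sum(b * xi for b, xi in zip(beta, x)))
            w = n * p * (1.0 - p)  # binomial variance weight
            for i in range(k):
                score[i] += x[i] * (y - n * p)
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# invented data for one gender: centred years, knot at 0
years = list(range(-10, 11))
true_beta = [-9.0, 0.03, -0.05]
X = joinpoint_design(years, knot=0)
population = [1_000_000] * len(years)
cases = [n * expit(sum(b * x for b, x in zip(true_beta, row))) for n, row in zip(population, X)]
beta = fit_binomial_logit(X, cases, population)
print([round(b, 4) for b in beta])  # [-9.0, 0.03, -0.05]
```

The polynomial panel would be fitted the same way, passing `polynomial_design(years, degree)` instead of the hinge design.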


Graphs with gender-specific 95% prediction limits.
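One common way to obtain pointwise 95% limits for a fitted logistic model works in two steps. First, compute a Wald interval for the linear predictor from the estimated covariance matrix of the coefficients. Then back-transform the interval with the inverse logit. Because that transformation is monotone, the limits stay inside (0, 1).

The sketch below assumes this construction. Strictly speaking, it gives confidence limits for the fitted probability, which may not be exactly the limits plotted in the post. The coefficients and covariance matrix are invented. In PROC LOGISTIC, the COVB model option displays such a covariance matrix, and the LOWER= and UPPER= options of the OUTPUT statement should produce limits of this kind.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pointwise_limits(x, beta, cov, z=1.96):
    """Wald limits on the logit scale, back-transformed to probabilities.

    x: covariate row (first entry 1 for the intercept)
    beta: estimated coefficients; cov: their estimated covariance matrix
    Returns (lower, fitted, upper) probabilities.
    """
    k = len(x)
    eta = sum(b * xi for b, xi in zip(beta, x))
    var = sum(x[i] * cov[i][j] * x[j] for i in range(k) for j in range(k))
    half = z * math.sqrt(var)
    return expit(eta - half), expit(eta), expit(eta + half)


# invented estimates for one gender: intercept and centred calendar year
beta = [-9.0, 0.03]
cov = [[0.004, 0.0001], [0.0001, 0.00004]]
for year in (-10, 0, 10):
    lo, fitted, hi = pointwise_limits([1.0, float(year)], beta, cov)
    print(year, [round(v * 1e5, 2) for v in (lo, fitted, hi)], "per 100,000")
```

Repeating the calculation over a grid of years for each gender gives the bands drawn around the fitted curves.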


R script (data extraction): NORDCAN.r
SAS programs: ExampleNORDCAN.sas, ExampleJointPoint.sas
Joint Poin…