Prevalence models in health science

I divide generic prediction models applied in health science and administration into two main groups. The first group comprises models based on general activity measures such as number of hospitalizations, length of stay (LOS), number of visits, cost, diagnosis groups, age, geography and other background information. A second, neglected group of models is based on the prevalence of specific activity measures common to a substantial part of the population in question. Prediction models in health take advantage of the RFM-I methodology from market analysis, which has previously been mentioned in posts on SAS macros on this blog. Below I discuss the simplicity of prevalence models.

Prevalence models have my special attention as a pivot for machine learning and deep learning models. Prevalence models include indicators of activity common among 1%, 5% or 10% of a population, e.g. diagnoses, operations and procedures common to 1% of the patients from a ward, with a retrospective window ranging from months to years. Background informat…
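
As a rough illustration of such indicators, here is a minimal sketch in Python (not the SAS used elsewhere on this blog). It assumes that "common among 5%" means a code recorded for at least that share of patients. The function name `prevalence_indicators`, the parameter `min_share` and the toy codes are hypothetical.

```python
from collections import Counter


def prevalence_indicators(records, min_share=0.05):
    """Build 0/1 indicators for activity codes seen in at least `min_share` of patients.

    `records` maps a patient id to the set of codes (diagnoses, operations,
    procedures) registered within the retrospective window.
    """
    n_patients = len(records)
    # Count each code once per patient, regardless of how often it was registered.
    counts = Counter(code for codes in records.values() for code in set(codes))
    selected = sorted(code for code, n in counts.items() if n / n_patients >= min_share)
    indicators = {
        pid: [int(code in codes) for code in selected]
        for pid, codes in records.items()
    }
    return selected, indicators


# Hypothetical toy data: four patients and three made-up codes.
records = {
    "p1": {"DX_A", "OP_1"},
    "p2": {"DX_A"},
    "p3": {"DX_B"},
    "p4": {"DX_A", "DX_B"},
}
selected, indicators = prevalence_indicators(records, min_share=0.5)
```

With a 50% threshold only `DX_A` and `DX_B` survive. In practice the threshold would be closer to the 1%, 5% or 10% levels mentioned above.
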
Recent posts

Crowd monitoring and performance analysis

A job for a statistician in an era of data-science-driven progress in businesses touching all aspects of life: crowd and performance analysis.

I use the Microsoft Azure API to analyze input pictures from crowd monitoring, measuring levels of general emotions such as happiness, sadness, surprise, anger, disgust, fear, contempt and neutral, as well as gender, age and measures like baldness. Input from several cameras is combined to measure levels over time and to compare them with set list, place in crowd, mood on stage and background variables like size of venue, city and socio-demographic information. One can name countless applications in a very dynamic and business-oriented context.

Crowd displaying surprise, happiness and neutral. Ages 34, 50, 51, 28, 29 and 22.

Adam Clayton on stage displaying 71% neutral and 28.9% sadness while Bono is 98% happy.

Illustrative plots and tables (SAS macro)

Basic SAS macros for basic summary statistics and illustrative plots.

Features t-tests, van der Waerden and Wilcoxon tests for continuous variables, and both chi-square and Fisher's exact tests for categorical variables.

Overall tests may be based on accumulated measures such as average integrated values, which are interpretable on the original scale:
%macro averageIntegral(values,timepoints,intlength,retval);
  /* Emits one trapezoid term +(t_i - t_(i-1))*(v_(i-1) + v_i)/2 per pair of consecutive points */
  %local count time time_old val val_old;
  %let count=0;
  %let time_old=0;
  %let val_old=0;
  /* Loop over the space-separated list until no further value is found */
  %do %while(%qscan(&values,&count+1,%str( )) ne %str());
    %let time=%scan(&timepoints,&count+1,%str( ));
    %let val=%scan(&values,&count+1,%str( ));
    /* The first point only initializes time_old and val_old */
    %if &count GT 0 %then +(&time-&time_old)*(&val_old+&val)/2;
    %let time_old=&time;
    %let val_old=&val;
    %let count=%eval(&count+1);
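
The trapezoid terms emitted by the macro excerpt can be checked with a small sketch, written here in Python rather than SAS. The excerpt is cut off before `intlength` is used, so dividing the area by the interval length is an assumption, as are the name `average_integral` and the default of using the observed time span.

```python
def average_integral(values, timepoints, int_length=None):
    """Trapezoidal area under (timepoints, values), divided by an interval length.

    Assumption: `int_length` plays the role of the macro's `intlength`
    parameter; if omitted, the observed time span is used.
    """
    if len(values) != len(timepoints):
        raise ValueError("values and timepoints must have the same length")
    if len(values) < 2:
        raise ValueError("at least two points are needed")
    area = 0.0
    for i in range(1, len(values)):
        # Same term as in the macro: (t_i - t_(i-1)) * (v_(i-1) + v_i) / 2
        area += (timepoints[i] - timepoints[i - 1]) * (values[i - 1] + values[i]) / 2
    if int_length is None:
        int_length = timepoints[-1] - timepoints[0]
    return area / int_length


# Values 2, 4, 4 at times 0, 1, 3: area = 3 + 8 = 11 over a span of 3.
avg = average_integral([2.0, 4.0, 4.0], [0.0, 1.0, 3.0])
```

Because the area is divided by a length on the time axis, the result stays on the scale of the measured values, which is what makes it interpretable.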

Table output for continuous data (contrasts between groups are evaluated using t-tests, van der Waerden and Wilcoxon tests)

Categorical data table (both chisq and Fishers…

Flag of the liberal revolution

Macro scheme for generation of RFM-I data

Recency, frequency, monetary and interaction (RFM-I) are guiding principles for extracting customer data when generating segments and prediction models. Managing and processing RFM-I data is often more time-consuming than the analysis itself.
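
To make the first three components concrete, here is a minimal sketch in Python (not the SAS macro scheme of this post). It reduces a transaction list to recency, frequency and monetary value per customer. The interaction part is left out because its definition is not given here. The function name `rfm_summary`, the tuple layout and the toy transactions are hypothetical.

```python
from datetime import date


def rfm_summary(transactions, as_of):
    """Summarize (customer, day, amount) tuples into recency, frequency, monetary."""
    summary = {}
    for customer, day, amount in transactions:
        row = summary.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        row["last"] = max(row["last"], day)   # most recent transaction date
        row["frequency"] += 1                 # number of transactions
        row["monetary"] += amount             # total amount
    return {
        customer: {
            "recency_days": (as_of - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer, row in summary.items()
    }


# Hypothetical toy transactions.
transactions = [
    ("c1", date(2024, 1, 10), 100.0),
    ("c1", date(2024, 3, 1), 50.0),
    ("c2", date(2024, 2, 15), 20.0),
]
rfm = rfm_summary(transactions, as_of=date(2024, 3, 31))
```

A single pass over the records is enough. This mirrors the record-by-record scan of the source tables.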

We often see various kinds of tables gathered from different relational databases and a need to scan those tables record by record. SAS is an appropriate choice for even huge data tables due to the principles governing the implementation of the SAS engine.

Below is a pseudo-code macro extracted from an existing code base combining several data sources. The programming language is SAS, since SAS offers great transparency and robustness. I considered proc sql, hash look-up tables, and combinations of data steps with proc means, implemented those alternatives, and concluded that dividing data into presorted historic data tables and present data tables leads to both transparency and faster execution of cod…

Extracting variables and estimates from SAS prediction models

SAS ods output statements provide a simple alternative to advanced text manipulation techniques.

The example below extracts selected variables from proc hpgenselect and uses these in proc corr and proc gampl, the latter of which outputs predicted probabilities. A final ods output statement in proc logistic extracts the concordance index, i.e. the measure of the second-order predictive capabilities of the model, given as the probability that two different observations are correctly ordered with respect to the model-based risk score.
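
The pairwise definition of the concordance index can be sketched in a few lines of Python (not the SAS ods output approach of this post). The sketch assumes a binary outcome coded 0/1 and counts ties in the risk score as half-concordant. The function name `concordance_index` and the toy scores are hypothetical.

```python
def concordance_index(scores, outcomes):
    """Share of (event, non-event) pairs in which the event has the higher score.

    Ties in the score count as one half.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes must be present")
    concordant = sum(e > n for e in events for n in non_events)
    tied = sum(e == n for e in events for n in non_events)
    return (concordant + 0.5 * tied) / (len(events) * len(non_events))


# Two events (scores 0.9, 0.4) and three non-events (0.6, 0.4, 0.2):
# 4 concordant pairs, 1 tie, 1 discordant pair out of 6.
c = concordance_index([0.9, 0.4, 0.6, 0.4, 0.2], [1, 1, 0, 0, 0])
```

A value of 0.5 corresponds to random ordering and 1 to perfect ordering. The double loop is quadratic in the number of observations, so it is meant for illustration rather than for large tables.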

Extracting selected variables from the hpgenselect procedure using code

Several SAS procedures generate code from which the user may extract critical information.

In the example below I extract variables selected by proc hpgenselect and input these to proc corr and proc gampl without writing intermediate results to the hard drive:

The generated source code 'code' is read one line at a time. Appropriate text lines are kept and concatenated into a single string using retained variables for identification (expr) and text (text).
The variable names are stored in a macro variable 'variables' and in a data file test in the work library. The list may be inspected either in the data file work.test or in the SAS log.