
Macro scheme for generation of RFM-I data

Recency, frequency, monetary value and interaction (RFM-I) are guiding principles for extracting customer data when generating segments and prediction models. Managing and processing RFM-I data is often far more time-consuming than the analysis itself.
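To pin the four quantities down, here is a minimal sketch in proc sql. It makes several assumptions:

- a hypothetical transaction table `work.transactions`, with one row per contact and the columns `cust_id`, `tdate` and `amount`;
- a hypothetical reference date in the macro variable `refdate`;
- a positive amount counts as a purchase, and `amount = 0` counts as a non-purchase interaction. This definition of interaction is an assumption made for illustration only.

```sas
/* Hypothetical transaction table: one row per customer contact.        */
/* Assumption: amount > 0 is a purchase, amount = 0 is an interaction.  */
data work.transactions;
  input cust_id tdate :date9. amount;
  format tdate date9.;
  datalines;
1 05JAN2015 100
1 20MAR2015 0
1 02JUN2015 50
2 15FEB2015 200
2 01APR2015 0
3 10MAY2015 0
;
run;

%let refdate = '30JUN2015'd;  /* hypothetical reference (index) date */

proc sql;
  create table work.rfmi as
  select cust_id,
         &refdate - max(case when amount > 0 then tdate end)
                           as recency,      /* days since last purchase */
         sum(amount > 0)   as frequency,    /* number of purchases      */
         sum(amount)       as monetary,     /* total amount spent       */
         sum(amount = 0)   as interaction   /* non-purchase contacts    */
  from work.transactions
  where tdate <= &refdate
  group by cust_id;
quit;
```

Customer 3 has no purchases, so its recency is missing. The table has one row per customer.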

We often face many kinds of tables gathered from different relational databases, together with a need to scan those tables record by record. SAS is an appropriate choice even for huge tables because of how the SAS engine is built: the data step reads one record at a time, so a table does not have to fit in memory.

The macro linked below is pseudo code extracted from an existing code base that combines several data sources. The programming language is SAS, since SAS offers great transparency and robustness. I considered proc sql, hash look-up tables, and combinations of data steps and proc means, and I implemented each of those alternatives. I concluded that dividing the data into presorted historic tables and present tables gave both transparency and faster execution. The winning combination was proc sort and a single data step with a clever application of retain statements, plus a few small helper macros.

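A minimal sketch of the presorted-historic/present pattern follows. It illustrates the technique and is not the author's macro. It assumes two hypothetical tables:

- `work.hist`, which is kept sorted by `cust_id` and `tdate`;
- `work.present`, which arrives unsorted.

It also uses a hypothetical `refdate` and the convention that a positive amount is a purchase and a zero amount is an interaction. Only the present table is sorted. A single data step then interleaves both tables with `set` and `by`, accumulates the measures in retained variables, and writes one row per person at `last.cust_id`.

```sas
%let refdate = '30JUN2015'd;  /* hypothetical reference (index) date */

/* Historic table: stored presorted by cust_id and tdate (hypothetical). */
data work.hist;
  input cust_id tdate :date9. amount;
  format tdate date9.;
  datalines;
1 05JAN2015 100
1 20MAR2015 0
2 15FEB2015 200
;
run;

/* Present table: small and unsorted, so it is the only one we sort. */
data work.present;
  input cust_id tdate :date9. amount;
  format tdate date9.;
  datalines;
3 10MAY2015 0
1 02JUN2015 50
2 01APR2015 0
;
run;

proc sort data=work.present;
  by cust_id tdate;
run;

/* Interleave both tables in BY order and accumulate RFM-I per person. */
data work.rfmi;
  set work.hist work.present;
  by cust_id tdate;
  retain last_purchase frequency monetary interaction;
  if first.cust_id then do;      /* reset accumulators for a new person */
    last_purchase = .;
    frequency = 0;
    monetary = 0;
    interaction = 0;
  end;
  if tdate <= &refdate then do;
    if amount > 0 then do;        /* purchase */
      frequency + 1;
      monetary + amount;
      last_purchase = tdate;      /* rows are date-sorted: last one wins */
    end;
    else interaction + 1;         /* non-purchase contact */
  end;
  if last.cust_id then do;
    recency = &refdate - last_purchase;   /* missing if no purchase */
    output;
  end;
  keep cust_id recency frequency monetary interaction;
run;
```

The data step reads both inputs sequentially in BY order, so neither table is loaded into memory and the historic table is never re-sorted.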
The actual application involved gathering population data, restricting those data through a set of inclusion and exclusion criteria, and then rerunning with an augmented set of tables. A lookup table thus seemed appropriate. However, I learned that a simple macro that generates a format identifying persons from the previously generated population list was the fastest and most elegant approach. Formats built with proc format and the SAS macro language are also my preferred choice for generating complex indicators from continuous or factor-coded variables, although PRXMATCH and regular expressions should be considered.
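The format-as-lookup idea can be sketched with the `cntlin=` option of proc format. The macro `%popformat`, the tables and the format names `inpop` and `spendband` are all hypothetical.

The macro turns a population list into a numeric format that maps member IDs to '1' and every other value (`hlo = 'O'`, the OTHER range) to '0'. Any table can then be restricted with `put()` in a single pass. The last step also shows a value format turning a continuous variable into an indicator; the band limits are arbitrary.

```sas
/* Hypothetical helper macro: builds numeric format &fmt mapping each */
/* &id value found in &data to '1' and any other value to '0'.        */
%macro popformat(data=, id=, fmt=);
  proc sort data=&data(keep=&id) out=work._pop nodupkey;
    by &id;                      /* duplicates would break cntlin= */
  run;

  data work._cntlin;
    set work._pop end=eof;
    length fmtname $32 type label hlo $1;
    fmtname = "&fmt";
    type = 'N';
    start = &id;
    label = '1';
    hlo = ' ';
    output;
    if eof then do;
      hlo = 'O';                 /* OTHER: everyone not in the population */
      label = '0';
      output;
    end;
    keep fmtname type start label hlo;
  run;

  proc format cntlin=work._cntlin;
  run;
%mend popformat;

/* Hypothetical data: a table to restrict and a population list. */
data work.transactions;
  input cust_id amount;
  datalines;
1 100
2 200
3 0
4 75
;
run;

data work.population;
  input cust_id;
  datalines;
1
3
3
;
run;

%popformat(data=work.population, id=cust_id, fmt=inpop)

/* Value format deriving an indicator from a continuous variable. */
proc format;
  value spendband low -< 100 = 'Low'
                  100 - high = 'High';
run;

/* Restrict and derive the indicator in one pass, with no sort or merge. */
data work.in_pop;
  set work.transactions;
  if put(cust_id, inpop.) = '1';
  high_spender = (put(amount, spendband.) = 'High');
run;
```

`work.in_pop` keeps customers 1 and 3. Only customer 1 is flagged as a high spender.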

The macro generates data suited for analysis with least-squares or logistic regression models. A similar macro scheme can be constructed for recurrent-event Cox regression models.
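As a hedged illustration of the two target layouts:

- A one-row-per-person table feeds proc logistic directly. For least squares, proc reg or proc glm would take the same table.
- A recurrent-event Cox model needs one row per at-risk interval in counting-process form.

All data and variable names are assumptions for this sketch. So is the choice of the Andersen-Gill formulation with `covs(aggregate)` and an `id` statement.

```sas
/* Hypothetical one-row-per-person analysis table, binary outcome churn. */
data work.analysis;
  input cust_id recency frequency churn;
  datalines;
1 10 1 0
2 10 5 0
3 50 1 0
4 50 5 0
5 40 3 0
6 30 3 1
7 20 0 1
8 20 6 1
9 60 0 1
10 60 6 1
;
run;

/* Logistic regression; ODS OUTPUT keeps the estimates as a data set. */
ods output ParameterEstimates=work.logit_pe;
proc logistic data=work.analysis;
  model churn(event='1') = recency frequency;
run;

/* Counting-process layout: one row per (tstart, tstop] interval,  */
/* event = 1 if the interval ends in an event (hypothetical data). */
data work.episodes;
  input cust_id tstart tstop event frequency;
  datalines;
1 0 30 1 2
1 30 90 1 2
1 90 180 0 2
2 0 60 1 1
2 60 180 0 1
3 0 180 0 0
4 0 20 1 3
4 20 50 1 3
4 50 180 0 3
;
run;

/* Andersen-Gill model with robust (sandwich) variance per person. */
proc phreg data=work.episodes covs(aggregate);
  model (tstart, tstop) * event(0) = frequency;
  id cust_id;
run;
```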

Link to SAS-program file.

