Skip to main content

Prevalence models in health science

I chose to divide generic prediction models applied in health science and administration into two main groups: Models based on general activity measures such as number of hospitalizations, LOS, number of visits, cost, diagnose groups, age, geography and other background information. A second and neglected group of models is based on prevalence of specific activity measures common for a substantial part of the population in question. Prediction models in health take advantage of RFM-I methodology from market analysis, which have previously been mentioned in posts on SAS macros on this blog, below I discuss the simplicity of prevalence models.

Prevalence models have my special attention as pivot for machine learning and deep learning models. Prevalence models include indicators on activity common among 1%, 5% or 10% of a population, e.g. diagnoses, operations and procedures common to 1% of the patients from a ward with a retroperspective ranging from months to years. Background information on age, gender, geography, total cost etc may be added, furthermore and more importantly a clinical specialist may request addition or exclusion of operations, procedures and have other demands for quantitative measures mirroring the clinical developmental program of a specialization. Prevalence models offer very flexible modelling frameworks for quality analysis and decision support tools in the clinic.

In the result below I define a population of patients visiting a ward within a particular month. Then I add information on their activity patterns from the LPR (Danish National Health Register) in 2 years retroperspective and information on whether they are hospitalized (acute) within the next month. Indicators are defined using a short dummy-variable coding function and aggregated with ML techniques. The R-function use a key-variable V_CPR, and needs to be adapted before it is applicable in other settings...

dummyl <- function(data, varname, vallevels,datevar,evaldate){
data<-data[trimws(data[[which(names(data)==varname)]]) %in% vallevels,]
for(i in 1:length(vallevels)){
A 200 line code script generates a fairy good raw prevalence model for prediction of acute hospitalization with a AUC above 0.92, the probability of aligning a pair of patients correct based on estimated risk for acute hospitalization is very high. Least squares and subsequently logistic regression makes a solid foundation for a stable and adjustable prediction model.

#Example of usage, generating indicators for 5% prevalence model used for accumalating measures in regression analysis
The data extraction and manipulation uses SQL and ML R-packages RODBC, tidyr, stringr and dplyr. Estimation requires basic R algorithms and GLM modeling. 


Popular posts from this blog

HackRF on Windows 8

This technical note is based on  an extract from thread . I have made several changes and added recommendations. I have experienced lot of latency using GnuRadio and HackRF on Pentoo Linux, so I wanted to try out GnuRadio on Windows. HackRF One is a transceiver, so besides SDR capabilities, it can also transmit signals, inkluding sweeping a given range, uniform and Gaussian signals. Pentoo Linux provides the most direct access to HackRF and toolboxes. Install Pentoo Linux on a separate drive, then you can use osmocom_siggen from a terminal to transmit signals such as near-field GSM bursts, which will only be detectable within a meter. Installation of MGWin and cmake: Download and install the following packages: - MinGW Setup (Go to the Installer directory and download setup file) - CMake (I am using CMake 3.2.2 and I installed it in C:\CMake, this path is important in the commands we must send in the MinGW shell) Download and extract the packages

Alder/korrekt århundrede udfra cpr nummer

De fleste, der arbejder med registre eller databaser, står ofte med problemstillingen, at alder er uoplyst, medens cpr-nummer er kendt. Hvordan regner man den ud? Følgende regel er gældende: Hvis syvende ciffer er 0, 1, 2 eller 3 er man født i det 20. århunderede (1900-tallet) Ligeledes, hvis syvende ciffer er 4 eller 9, og årstallet (femte og sjette ciffer) er større end eller lig 37. Endelig er man født i det 19. århundrede (1800-tallet) hvis syvende ciffer er 5, 6, 7 eller 8 og årstallet er større end eller lig 58. Nedenfor finder du eksempel i SAS kode: En lille makro, der udover fødselsdato også udregner køn samt den præcise alder givet datovariabel. Kilde: Opbygning af CPR nummeret, proc format library=work; value gender 0="Female" 1="Male" ; run; %macro agefromCPR(cpr,datevar=inddto,birthvar=birth,agevar=age); dy_temp=input(substrn(&cpr,1,2),2.); mt_temp=input(substrn(&cpr,3,2),2.); yr_temp=input(substrn(&cpr,5,2),

Comorbidity indexes in SQL

Generating Elixhauser comorbidity index from Danish National Health Register as relational database. ( ICD 10 Coding  in SAS) A lookup-table based version of Charlson comorbidity index I made in SQL. A similar approach can be applied to Elixhauser. SELECT V_CPR, MAX(EI1)+MAX(EI2)+MAX(EI3)+MAX(EI4)+MAX(EI5)+ MAX(EI6)+MAX(EI7)+MAX(EI8)+MAX(EI9)+MAX(EI10)+ MAX(EI11)+MAX(EI12)+MAX(EI13)+MAX(EI14)+MAX(EI15)+ MAX(EI16)+MAX(EI17)+MAX(EI18)+MAX(EI19)+MAX(EI20)+ MAX(EI21)+MAX(EI22)+MAX(EI23)+MAX(EI24)+MAX(EI25)+ MAX(EI26)+MAX(EI27)+MAX(EI28)+MAX(EI29)+MAX(EI30)+MAX(EI31) AS Elixhauser FROM (SELECT V_CPR, -- Congestive Heart Failure CASE WHEN DIAG LIKE 'DI099%' OR DIAG LIKE 'DI110%' OR DIAG LIKE 'DI130%' OR DIAG LIKE 'DI132%' OR DIAG LIKE 'DI255%' OR DIAG LIKE 'DI420%' OR DIAG LIKE 'DI425%' OR DIAG LIKE 'DI426%' OR DIAG LIKE 'DI427%' OR DIAG LIKE 'DI428%' OR DIAG LIKE 'DI429%' OR D