
Prevalence models in health science

I divide the generic prediction models applied in health science and administration into two main groups. The first group consists of models based on general activity measures such as number of hospitalizations, length of stay (LOS), number of visits, cost, diagnosis groups, age, geography and other background information. The second, neglected group consists of models based on the prevalence of specific activity measures that are common to a substantial part of the population in question. Prediction models in health also take advantage of RFM-I methodology from market analysis, which has been mentioned in earlier posts on SAS macros on this blog. Below I discuss the simplicity of prevalence models.

Prevalence models have my special attention as a pivot for machine learning and deep learning models. They include indicators of activity that is common among 1%, 5% or 10% of a population, e.g. diagnoses, operations and procedures shared by 1% of the patients from a ward, with a lookback period ranging from months to years. Background information on age, gender, geography, total cost etc. may be added. More importantly, a clinical specialist may request that operations or procedures be added or excluded, and may have other demands for quantitative measures that mirror the clinical development program of a specialty. Prevalence models thus offer a very flexible modelling framework for quality analysis and decision support tools in the clinic.
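To make the prevalence thresholds concrete, here is a minimal sketch in base R on simulated data. The patient ids, the codes A to D and their frequencies are made up for illustration, and the names act, pairs, prevalence and codes_at are hypothetical. It only shows how the share of patients per code could be computed and compared against the 1%, 5% and 10% limits.

```r
# Toy data: one row per registered activity; ids and codes are invented.
set.seed(1)
n_pat <- 200
act <- data.frame(
  id   = sample(seq_len(n_pat), 1000, replace = TRUE),
  code = sample(c("A", "B", "C", "D"), 1000, replace = TRUE,
                prob = c(0.90, 0.07, 0.02, 0.01))
)

# Count each code at most once per patient, then take the share of patients.
pairs <- unique(act[, c("id", "code")])
prevalence <- table(pairs$code) / n_pat

# Codes that reach each prevalence threshold.
codes_at <- lapply(c(p01 = 0.01, p05 = 0.05, p10 = 0.10),
                   function(th) names(prevalence)[prevalence >= th])
codes_at
```

A lower threshold always yields a superset of the codes selected at a higher one, so moving from a 10% to a 1% model only adds indicators.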


In the example below I define a population of patients visiting a ward within a particular month. I then add information on their activity patterns from the LPR (Danish National Health Register) over a 2-year lookback period, and on whether they are acutely hospitalized within the following month. Indicators are defined with a short dummy-variable coding function and aggregated with ML techniques. The R function uses the key variable V_CPR and must be adapted before it can be applied in other settings.


       
library(data.table)  # setorderv() comes from data.table
# Code each level in vallevels as a 0/1 indicator plus log(days from datevar to evaldate)
dummyl <- function(data, varname, vallevels, datevar, evaldate){
  # keep only the rows whose value is one of the requested levels
  data <- data[trimws(data[[varname]]) %in% vallevels, ]
  vals <- trimws(data[[varname]])
  logdays <- log(as.integer(as.Date(evaldate) - as.Date(data[[datevar]])))
  nl <- length(vallevels)
  df <- as.data.frame(matrix(0, nrow(data), 2 * nl))
  for(i in seq_len(nl)){
    hit <- vals == trimws(vallevels[i])
    df[, i] <- 1.0 * hit
    # the date measure is only defined where the indicator is 1
    df[, i + nl] <- ifelse(hit, logdays, NA)
    names(df)[c(i, i + nl)] <- c(vallevels[i], paste0(vallevels[i], "_dto"))
  }
  # sort by patient key, value and newest date first; the key V_CPR is hard-coded
  dt <- setorderv(cbind.data.frame(data, df), c("V_CPR", varname, datevar), c(1, 1, -1))
  return(dt)
}
       
 
A 200-line script generates a fairly good raw prevalence model for predicting acute hospitalization, with an AUC above 0.92. In other words, if we pick one patient who was acutely hospitalized and one who was not, the probability that the model assigns the higher estimated risk to the hospitalized patient is above 0.92. Least squares followed by logistic regression makes a solid foundation for a stable and adjustable prediction model.
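The pairwise reading of the AUC and the two-step estimation can be illustrated with a small base R sketch. This is not the 200-line script. The data are simulated, x1 and x2 merely stand in for prevalence indicators, and the helper auc() is a hypothetical name for the standard rank (Mann-Whitney) formula.

```r
set.seed(42)
n <- 2000
# Simulated patient-level data; x1 and x2 stand in for prevalence indicators.
d <- data.frame(x1 = rbinom(n, 1, 0.10), x2 = rbinom(n, 1, 0.05))
d$acute <- rbinom(n, 1, plogis(-3 + 2 * d$x1 + 2.5 * d$x2))

# Step 1: least squares (linear probability model) as a quick first screen.
fit_ls <- lm(acute ~ x1 + x2, data = d)

# Step 2: logistic regression for the risk estimates.
fit_glm <- glm(acute ~ x1 + x2, family = binomial, data = d)
risk <- predict(fit_glm, type = "response")

# AUC via the rank (Mann-Whitney) formula: the share of case/non-case pairs
# in which the case has the higher estimated risk, ties counted as one half.
auc <- function(score, outcome) {
  r  <- rank(score)
  n1 <- sum(outcome == 1)
  n0 <- sum(outcome == 0)
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc(risk, d$acute)
```

With only two simulated indicators the AUC will be far below the 0.92 reported above. The point is the mechanics, not the number.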

# Example of usage: generating indicators for a 5% prevalence model, used for accumulating measures in the regression analysis
# cprnr is not defined in this snippet; presumably it holds the unique patient keys of the population
temp <- unique(Dat[, c("V_CPR", "val")])
tbl <- table(temp$val)
lvs_5pct <- names(tbl)[tbl > 5 * length(cprnr) / 100]
lvs_5pct
dt <- dummyl(Dat, "val", lvs_5pct, "date", "2017-12-01")
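The output of dummyl has one row per activity record, so it has to be collapsed to one row per patient before it can enter a regression. The sketch below shows one possible way to do that in base R. It is an assumption about the accumulation step, not the original script. The toy table dt_toy imitates the column layout produced by dummyl for two invented codes A and B, and min_or_na, ever, recent and patient_level are hypothetical names.

```r
# Toy stand-in for the output of dummyl(): one row per activity record.
dt_toy <- data.frame(
  V_CPR = c("p1", "p1", "p2", "p3", "p3"),
  A     = c(1, 0, 1, 0, 0),
  B     = c(0, 1, 0, 1, 1),
  A_dto = c(log(30), NA, log(400), NA, NA),
  B_dto = c(NA, log(10), NA, log(90), log(200))
)

ind_cols <- c("A", "B")
dto_cols <- paste0(ind_cols, "_dto")

# Smallest non-missing value, or NA when the patient never had the code.
min_or_na <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)

# Indicator: 1 if the patient had the code at least once in the lookback window.
ever <- aggregate(dt_toy[ind_cols], by = list(V_CPR = dt_toy$V_CPR), FUN = max)
# Recency: log(days) since the most recent occurrence of the code.
recent <- aggregate(dt_toy[dto_cols], by = list(V_CPR = dt_toy$V_CPR), FUN = min_or_na)

patient_level <- merge(ever, recent, by = "V_CPR")
patient_level
```

Taking the maximum of the indicator and the minimum of the log-days keeps both whether a code occurred and how recently. Other summaries, such as counts, would fit the same pattern.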
       
 
Data extraction and manipulation use SQL and the R packages RODBC, tidyr, stringr and dplyr. Estimation requires only basic R algorithms and GLM modelling.

