Skip to main content


Showing posts from 2019

Simulation of hemophilia bleeds

A SAS simulation study describing bleeds in hemophilia type A and B patients. Based on the "proof-of-concept study" Randomized, prospective clinical trial of recombinant factor VIIa for secondary prophylaxis in hemophilia patients with inhibitors by Konkle et. al. published in Journal of Thrombosis and Haemostasis, 5: 1904-1913 in 2007. The epidemiology and design of the study are not in accordance with CONSORT and SPIRIT as of December 2019. We observe 22 included patients (11 in each group) allocated from 20 sites in 11 countries. Neither the randomization nor the analysis is stratified by center, and it is possible to question the logistics of the experiment. People have asked if a condition on the number of bleeds (above 4 for each month in the pre-prophylaxis period) introduces bias: The study is a comparison of the effect of dose-difference not the effect of the drug, although the mean of bleeds in the first period is obviously higher than in the second period. Suspic

Indicators from numeric factor variables

The macro call below will generate indicators for all factor levels of xfaktor from the data set regdata represented by at least 5 observations. I have developed the macroes to avoid class variables in SAS regression analyses. %indicator(xfaktor,regdata,5); /* Macro to generate dummy variables. */ %MACRO GETCON;   %DO I = 1 %TO &N;   %IF &&M&I NE 0 %THEN %PUT &factor&I;   %IF &&M&I = 0 %THEN %GOTO OUT;     IF &factor = &&M&I THEN &factor&I = 1;     ELSE &factor&I = 0;   %OUT: %END; %MEND GETCON; %macro indicator(factor,dataset,size); data tbl_data_; set &dataset(keep=&factor); run; proc sort data=tbl_data_; by &factor; run; data tbl_; set tbl_data_; by &factor; retain sz; if first.&factor then sz=1; else sz=sz+1; if last.&factor then output; run; PROC SORT DATA=tbl_(where=(sz GE &size)) OUT=UNIQUE NODUPKEY;   BY &factor; RUN; /* Assign the largest value o

Graphics in SAS SGPLOT illustrating ANOVA analysis results.

Output and graphics from statistical programming packages are often time-consuming to read and interpret. In peer-reviewed publications you usually provide both a written assessment, tables and graphics illustrating data and analysis results. ANOVA analysis is still a very common analysis technique and it is possible to beautify the output from analysis using PROC SGPLOT ods graphics; proc format lib=work; value timevar 12='0-12 hrs' 18='12-18 hrs' 24='18-24 hrs' 32='Cumulated 0-24 hrs' ; run; proc sql;   create table estimates     (  Treatment char(12) label='Treatment Group'      , Time      num      label='Visit number'      , Time2     num      label='Visit number'      , TimeVar   char(18) label='Visit number'      , Mark      char(8)  label='p values'      , Est       num      label='Est'      , LCL       num      label='LCL'      , UCL       num      label='UCL'  

T tests in censored normal distribution

T-tests assume normal distributed random variates. We experience designs in which data take on a particular value and are otherwise normal distributed on a half axis. It might be the case in experiments in which a regimen is only applied due to symptoms or the participant's request. In such case a censored normal distribution assumption will increase the strength of the experiment, i.e. statistical power will be increased and assumptions of an underlying normal distribution will be easier to justify. The suggested test is based on the chi-square likelihood ratio test. The example is two independent groups of identical independently distributed samples and the null hypothesis is identical distributions. The density function splits into single probability, given as the integral of a normal density from negative infinity to the lower bound and the density of the normal distribution for values larger than the lower bound. The expression involves the distribution function for a G

Free electrical energy from central heating and water supply

Anyone investigated central heating and water installations as a source for the generation of electricity? I have a 6-10 bar pressure on water coming from the public supply. A drop of 4 bar equals potential energy from a 40m waterfall like Queen Mary Falls, Australia . A lot of wasted energy released in your toilet and other places. How do we make use of all that energy? Central heating is delivered at high pressure and leaves at high pressure after a temperature drop between 30 and 50 degrees. Can we use 1 bar, 2 bar or 3 bar to slow down the hot water and generate electrical energy and extract more heat before return? Let's calculate. If 100 cubic meter of heated water falls 40 meter, how much energy could we theoretically get, although practically unreasonable? Approximately 40*10*(100*1000)J or circa 40 kWh. 40 kWh costs circa DKK 90 in Denmark, or USD 13 ... or the price of a LED light bulb. In the US typical household power consumption is about 11,700 kWh

Up-and-down dose response study design

Dose response model and biased coin up-and-down design estimating EV90. A simulation study with a boundary condition and addition of auxilliary data points. PAVA estimators and 2 parameter isotonic regression.  Experimental Spreadsheet from study with reporting sheet and random numbers sheet, latter should be blinded during the course of the experiment. The study had a minimum of 25 patients. #Upper boundary of 30 (mL), steps of 2 (mL) #Estimat for number of patients ANTAL<-NULL MU_1<-NULL MU_2<-NULL MU_3<-NULL EV90<-NULL library(Iso) library(drc) iter<-1 while(iter<=10000){ #MEV MEV<-0.9 #Start dose V<-20 #From previous study start dose is probably around MEV92 P<-0.92-(30-V)*0.014 #Defining vectors for simulation succes<-0 coins<-0 kt<-1 #Increase in dose if failure dosedeltaNeg<-2 dosedeltaPos<-2 while(V[kt]<30){ coins[kt]<-runif(1) if(coins[k

Stata Workflow optimization with GRAPHLOG

The basic workflow recommendations for Stata code and folder hierarchy given by Scott Long  is relative easy to extend with inclusion of graphics in pdf log files. In Stata using point-and-click: *Get links to template files find workflow  *Install graphlog (and updates) find graphlog  Run example: pwd mkdir "C:\WorkFlow" cd "C:\WorkFlow" preserve capture log close log using workflowtest.log, replace *// *//  gunnar 2019/03/10 version 10 clear all macro drop _all set linesize 80 * my commands start here sysuse auto summarize trunk histogram trunk, normal graph export graph.pdf, replace log close restore *Assume installed pdflatex (TeXLive, MikTeX) graphlog using workflowtest, lspacing(1) replace *exit

Set Exchange Server properties

Guided by principles in GDPR you may wish to delete mail older than 3 months from your inbox. This should be a principle for all office workers in both private and public organizations: Mail and electronic content not journalised should be deleted within 3 months. Outlook: Choose 'Account Settings' in File window. Under 'Mail' highlight account and choose 'Edit'. Set time period for storage on Exchange server to 3 months. Repeat as necessary for all accounts. Under each account in Mail window: Right click inbox and chose 'Properties' and click the option to delete all offline content. Repeat as necessary for sent mail and other accounts.

Half-whiskers plot of difference in treatment effect

Below an alternative graphical representation of difference in treatment effects. data ests; missing z; infile datalines delimiter=',' MISSOVER; input behandling time VASe VASs lowere uppere lowers uppers; datalines; 1,0,100,.z,100,100,.z,.z 1,1,78.88888889,.z,78.88888889,120,.z,.z 1,2,53.23552736,.z,53.23552736,100,.z,.z 1,3,27.7394636,.z,27.7394636,72.34042553,.z,.z 1,4,13.67521368,.z,13.67521368,78.72340426,.z,.z 1,5,7.948717949,.z,7.948717949,76.59574468,.z,.z 1,6,7.948717949,.z,7.948717949,77.65957447,.z,.z 1,7,2.948717949,.z,2.948717949,82.9787234,.z,.z 1,8,2.948717949,.z,2.948717949,81.91489362,.z,.z 2,0,.z,100,.z,.z,100,100 2,1,.z,64.40092166,.z,.z,11.11111111,64.40092166 2,2,.z,39.42105263,.z,.z,0,39.42105263 2,3,.z,22,.z,.z,0,22 2,4,.z,17.44444444,.z,.z,0,17.44444444 2,5,.z,9.545454545,.z,.z,0,9.545454545 2,6,.z,6.666666667,.z,.z,0,6.666666667 2,7,.z,5.409356725,.z,.z,0,5.409356725 2,8,.z,2.631578947,.z,.z,0,2.631578947 ;; run; ods graphics on /noborder; t

CUSUM plots for operating time

CUSUM plots have seen application within time series and quality analysis. For both numeric, binary, ordinal and polytomous outcomes we often use residuals based on regression models instead of empirical means. Below I apply a standard unpublished technique for adjusted CUSUM plots for operating times using a series based on information from 174 operations performed by the same team. Simulations are added to illustrate randomness of the phenomena, based on residual standard deviation from a regression analysis. optime<-c(375,205,255,190,195,235,255,235,305,225,225,255,255,274,255,255,195,270,235,235,255,295,195,375,259,185,205,255,205,285,175,155,215,190,195,315,185,255,405,255,235,135,255,215,255,175,215,195,205,195,345,315,195,135,233,135,315,180,155,315,255,200,315,375,255,375,270,255,210,255,185,235,195,225,145,215,230,278,245,280,230,271,270,213,290,325,375,202,214,435,165,175,325,195,435,245,225,220,215,195,285,315,230,275,355,255,175,160,255,165,145,255,225,265,2

Comorbidity indexes in SQL

Generating Elixhauser comorbidity index from Danish National Health Register as relational database. ( ICD 10 Coding  in SAS) A lookup-table based version of Charlson comorbidity index I made in SQL. A similar approach can be applied to Elixhauser. SELECT V_CPR, MAX(EI1)+MAX(EI2)+MAX(EI3)+MAX(EI4)+MAX(EI5)+ MAX(EI6)+MAX(EI7)+MAX(EI8)+MAX(EI9)+MAX(EI10)+ MAX(EI11)+MAX(EI12)+MAX(EI13)+MAX(EI14)+MAX(EI15)+ MAX(EI16)+MAX(EI17)+MAX(EI18)+MAX(EI19)+MAX(EI20)+ MAX(EI21)+MAX(EI22)+MAX(EI23)+MAX(EI24)+MAX(EI25)+ MAX(EI26)+MAX(EI27)+MAX(EI28)+MAX(EI29)+MAX(EI30)+MAX(EI31) AS Elixhauser FROM (SELECT V_CPR, -- Congestive Heart Failure CASE WHEN DIAG LIKE 'DI099%' OR DIAG LIKE 'DI110%' OR DIAG LIKE 'DI130%' OR DIAG LIKE 'DI132%' OR DIAG LIKE 'DI255%' OR DIAG LIKE 'DI420%' OR DIAG LIKE 'DI425%' OR DIAG LIKE 'DI426%' OR DIAG LIKE 'DI427%' OR DIAG LIKE 'DI428%' OR DIAG LIKE 'DI429%' OR D

Produce animation in R illustrating uniform and Gaussian density

#Number of simulations and chunk-size sN <- 265 sM <- 10 #Line from 0 to 1 will be divided into 40 bins bin <- seq(0 + (1 / 40) / 2, 1 - (1 / 40) / 2, 1 / 40) bincnt <- rep(0, 40) #Vertical distance between dots icr <- 0.01 printx <- NULL printy <- NULL for (i in 1:sN) { if (i < 15) sMI <- 1 if (i > 14) sMI <- sM sim <- runif(sMI) Sys.sleep(0.1) pointx <- rep(0, sMI) pointy <- rep(0, sMI) for (j in 1:sMI) { idx <- sim[j] %/% (1 / 40) + 1 pointx[j] <- bin[idx] pointy[j] <- bincnt[idx] + 1 pointy[j] <- max(icr, (pointy[j] * icr) %% 1) %% 1 bincnt[idx] <- bincnt[idx] + 1 } printx <- c(printx, pointx) printy <- c(printy, pointy) plot( seq(0, 1, 0.01), dunif(seq(0, 1, 0.01)), type = "l", xlab = "value", ylab = "Uniform density", ylim = c(0, 1.2), col = "red", lwd = 5 ) points(printx, printy, pc