Skip to main content


Showing posts from 2019

Graphics in SAS SGPLOT illustrating ANOVA analysis results.

Output and graphics from statistical programming packages are often time-consuming to read and interpret. In peer-reviewed publications you usually provide both a written assessment, tables and graphics illustrating data and analysis results. ANOVA analysis is still a very common analysis technique and it is possible to beautify the output from analysis using PROC SGPLOT

ods graphics;
proc format lib=work;
value timevar
12='0-12 hrs'
18='12-18 hrs'
24='18-24 hrs'
32='Cumulated 0-24 hrs'
proc sql;
  create table estimates
    (  Treatment char(12) label='Treatment Group'
     , Time      num      label='Visit number'
     , Time2     num      label='Visit number'
     , TimeVar   char(18) label='Visit number'
     , Mark      char(8)  label='p values'
     , Est       num      label='Est'
     , LCL       num      label='LCL'
     , UCL       num      label='UCL'
insert into …

T tests in censored normal distribution

T-tests assume normal distributed random variates. We experience designs in which data take on a particular value and are otherwise normal distributed on a half axis. It might be the case in experiments in which a regimen is only applied due to symptoms or the participant's request. In such case a censored normal distribution assumption will increase the strength of the experiment, i.e. statistical power will be increased and assumptions of an underlying normal distribution will be easier to justify.
The suggested test is based on the chi-square likelihood ratio test. The example is two independent groups of identical independently distributed samples and the null hypothesis is identical distributions. The density function splits into single probability, given as the integral of a normal density from negative infinity to the lower bound and the density of the normal distribution for values larger than the lower bound.
The expression involves the distribution function for a Gaussia…

Free electrical energy from central heating and water supply

Anyone investigated central heating and water installations as a source for the generation of electricity? I have a 6-10 bar pressure on water coming from the public supply.
A drop of 4 bar equals potential energy from a 40m waterfall like Queen Mary Falls, Australia. A lot of wasted energy released in your toilet and other places.
How do we make use of all that energy? Central heating is delivered at high pressure and leaves at high pressure after a temperature drop between 30 and 50 degrees.
Can we use 1 bar, 2 bar or 3 bar to slow down the hot water and generate electrical energy and extract more heat before return? Let's calculate. If 100 cubic meter of heated water falls 40 meter, how much energy could we theoretically get, although practically unreasonable?
Approximately 40*10*(100*1000)J or circa 40 kWh.
40 kWh costs circa DKK 90 in Denmark, or USD 13
... or the price of a LED light bulb.

In the US typical household power consumption is about 11,700 kWh each y…

Up-and-down dose response study design

Dose response model and biased coin up-and-down design estimating EV90. A simulation study with a boundary condition and addition of auxilliary data points. PAVA estimators and 2 parameter isotonic regression. 

Experimental Spreadsheet from study with reporting sheet and random numbers sheet, latter should be blinded during the course of the experiment. The study had a minimum of 25 patients.

#Upper boundary of 30 (mL), steps of 2 (mL) #Estimat for number of patients ANTAL<-NULL MU_1<-NULL MU_2<-NULL MU_3<-NULL EV90<-NULL library(Iso) library(drc) iter<-1 while(iter<=10000){ #MEV MEV<-0.9 #Start dose V<-20 #From previous study start dose is probably around MEV92 P<-0.92-(30-V)*0.014 #Defining vectors for simulation succes<-0 coins<-0 kt<-1 #Increase in dose if failure dosedeltaNeg<-2 dosedeltaPos<-2 while(V[kt]<30){ coins[kt]<-runif(1) if(coins[kt]>P[kt]…

Stata Workflow optimization with GRAPHLOG

The basic workflow recommendations for Stata code and folder hierarchy given by Scott Long is relative easy to extend with inclusion of graphics in pdf log files.

In Stata using point-and-click:

*Get links to template files
find workflow 

*Install graphlog (and updates)
find graphlog 

Run example:

mkdir "C:\WorkFlow"
cd "C:\WorkFlow"

capture log close
log using workflowtest.log, replace

*//  gunnar 2019/03/10

version 10
clear all
macro drop _all
set linesize 80

* my commands start here
sysuse auto
summarize trunk
histogram trunk, normal
graph export graph.pdf, replace

log close

*Assume installed pdflatex (TeXLive, MikTeX)
graphlog using workflowtest, lspacing(1) replace

Set Exchange Server properties

Guided by principles in GDPR you may wish to delete mail older than 3 months from your inbox. This should be a principle for all office workers in both private and public organizations: Mail and electronic content not journalised should be deleted within 3 months.

Outlook: Choose 'Account Settings' in File window. Under 'Mail' highlight account and choose 'Edit'. Set time period for storage on Exchange server to 3 months. Repeat as necessary for all accounts. Under each account in Mail window: Right click inbox and chose 'Properties' and click the option to delete all offline content. Repeat as necessary for sent mail and other accounts.

Half-whiskers plot of difference in treatment effect

Below an alternative graphical representation of difference in treatment effects.

data ests; missing z; infile datalines delimiter=',' MISSOVER; input behandling time VASe VASs lowere uppere lowers uppers; datalines; 1,0,100,.z,100,100,.z,.z 1,1,78.88888889,.z,78.88888889,120,.z,.z 1,2,53.23552736,.z,53.23552736,100,.z,.z 1,3,27.7394636,.z,27.7394636,72.34042553,.z,.z 1,4,13.67521368,.z,13.67521368,78.72340426,.z,.z 1,5,7.948717949,.z,7.948717949,76.59574468,.z,.z 1,6,7.948717949,.z,7.948717949,77.65957447,.z,.z 1,7,2.948717949,.z,2.948717949,82.9787234,.z,.z 1,8,2.948717949,.z,2.948717949,81.91489362,.z,.z 2,0,.z,100,.z,.z,100,100 2,1,.z,64.40092166,.z,.z,11.11111111,64.40092166 2,2,.z,39.42105263,.z,.z,0,39.42105263 2,3,.z,22,.z,.z,0,22 2,4,.z,17.44444444,.z,.z,0,17.44444444 2,5,.z,9.545454545,.z,.z,0,9.545454545 2,6,.z,6.666666667,.z,.z,0,6.666666667 2,7,.z,5.409356725,.z,.z,0,5.409356725 2,8,.z,2.631578947,.z,.z,0,2.631578947 ;; run; ods graphics on /noborder; title '…

CUSUM plots for operating time

CUSUM plots have seen application within time series and quality analysis.

For both numeric, binary, ordinal and polytomous outcomes we often use residuals based on regression models instead of empirical means.

Below I apply a standard unpublished technique for adjusted CUSUM plots for operating times using a series based on information from 174 operations performed by the same team.

Simulations are added to illustrate randomness of the phenomena, based on residual standard deviation from a regression analysis.


Comorbidity indexes in SQL

Generating Elixhauser comorbidity index from Danish National Health Register as relational database. (ICD 10 Coding in SAS)

A lookup-table based version of Charlson comorbidity index I made in SQL. A similar approach can be applied to Elixhauser.

MAX(EI26)+MAX(EI27)+MAX(EI28)+MAX(EI29)+MAX(EI30)+MAX(EI31) AS Elixhauser
-- Congestive Heart Failure

Produce animation in R illustrating uniform and Gaussian density

#Number of simulations and chunk-size sN <- 265 sM <- 10 #Line from 0 to 1 will be divided into 40 bins bin <- seq(0 + (1 / 40) / 2, 1 - (1 / 40) / 2, 1 / 40) bincnt <- rep(0, 40) #Vertical distance between dots icr <- 0.01 printx <- NULL printy <- NULL for (i in 1:sN) { if (i < 15) sMI <- 1 if (i > 14) sMI <- sM sim <- runif(sMI) Sys.sleep(0.1) pointx <- rep(0, sMI) pointy <- rep(0, sMI) for (j in 1:sMI) { idx <- sim[j] %/% (1 / 40) + 1 pointx[j] <- bin[idx] pointy[j] <- bincnt[idx] + 1 pointy[j] <- max(icr, (pointy[j] * icr) %% 1) %% 1 bincnt[idx] <- bincnt[idx] + 1 } printx <- c(printx, pointx) printy <- c(printy, pointy) plot( seq(0, 1, 0.01), dunif(seq(0, 1, 0.01)), type = "l", xlab = "value", ylab = "Uniform density", ylim = c(0, 1.2), col = "red", lwd = 5 ) points(printx, printy, pch = 20) }…