Showing posts from 2016

Real world split-plot designs

A Google Earth picture, found on a statistics blog, shows a real-world example of a split-plot design near Christchurch (NZ). The site looks completely different today, as the forest has grown considerably. Google Earth coordinate link.


Percentage of Danish news magazine stories with words 'Moon' and 'Sun'.

Percentage of Danish news magazine stories containing the words 'Moon' and 'Sun', from 1750 until 2014.


Source: http://labs.statsbiblioteket.dk/smurf/
Names of presidents and, around 1776, a British general.


Bar charts in polar coordinates using R package ggplot2

R code providing both an ordinary bar plot and a plot based on polar coordinates.

library(RColorBrewer)
library(ggplot2)

# Five regions, each repeated once per answer category
Geography = c(rep('Ukraine in general',3), rep('South-Eastern Ukraine',3),
              rep('Lower-Southern Ukraine',3), rep('Central Ukraine/Kiev',3),
              rep('Western Ukraine',3))

# Three answer categories; data.frame() recycles them across the 15 rows
Answer = c("Ukraine and Russia must unite into a single state",
           "Ukraine and Russia must be independent, but friendly states - with open borders, without visas and customs houses",
           "Relations should be the same as with other states - with closed borders, visas and customs houses")

# Percentages, three per region
numbers = c(12.5,68,14.7,25.8,72.2,2,19.4,63.8,10.5,5.4,69.7,20.9,0.7,66.7,24)

df = data.frame(Geography,Answer,numbers)

# Optional fonts for the plot; device="win" applies to Windows only
#install.packages("extrafont")
library(extrafont)

loadfonts(device="win")

ggplot(df, aes(x = Geography)) + geom_bar(aes(weight=numbers, fill = Answer), position = 'fill') + scale_y_continuous("", breaks=NULL) + s…

Age/correct century from CPR number

Most people who work with registers or databases often face the problem that age is missing while the CPR number is known. How do you calculate it?

The following rule applies: if the seventh digit is 0, 1, 2 or 3, the person was born in the 20th century (the 1900s). The same holds if the seventh digit is 4 or 9 and the year (fifth and sixth digits) is greater than or equal to 37.

Finally, the person was born in the 19th century (the 1800s) if the seventh digit is 5, 6, 7 or 8 and the year is greater than or equal to 58. In all remaining cases the person was born in the 21st century (the 2000s).

Below is an example in SAS code: a small macro that, besides the date of birth, also derives gender and the exact age on a given date variable.

Source: Opbygning af CPR nummeret (structure of the CPR number), cpr.dk
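For readers who do not use SAS, the century rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are made up for this example, the CPR numbers used are invented, and gender is taken from the parity of the last digit (odd = male), which matches the 0/1 format in the SAS code.

```python
from datetime import date


def birth_century(seventh_digit, yy):
    """Return 1800, 1900 or 2000 following the rule described in the post."""
    if seventh_digit <= 3:
        return 1900
    if seventh_digit in (4, 9):
        return 1900 if yy >= 37 else 2000
    # Remaining seventh digits are 5, 6, 7 and 8
    return 1800 if yy >= 58 else 2000


def parse_cpr(cpr):
    """Split a CPR number (DDMMYY-SSSS or DDMMYYSSSS) into birth date and gender."""
    digits = cpr.replace("-", "")
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected ten digits")
    day, month, yy = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    birth = date(birth_century(int(digits[6]), yy) + yy, month, day)
    gender = "Male" if int(digits[9]) % 2 else "Female"
    return birth, gender


def age_on(birth, on):
    """Exact age in whole years on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


# Invented CPR number, used only to exercise the code
birth, gender = parse_cpr("010170-1234")
print(birth, gender, age_on(birth, date(2016, 2, 1)))
```

The sketch does not validate check digits or dates beyond what `date()` itself rejects.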


proc format library=work;
value gender
0="Female"
1="Male"
;
run;

%macro agefromCPR(cpr,datevar=inddto,birthvar=birth,agevar=age);
dy_temp=input(substrn(&cpr,1,2),2.);
mt_temp=input(substrn(&cpr,3,2),2.);
yr_temp=input(substrn(&cpr,5,2),2.);
lr_temp=inp…

Predicting height from growth curves

My eldest daughter said: "I hope I will grow to be 170 cm (5'7'')."

Let's have a look at growth curves. Growth accelerates several times during adolescence, so we can only guess. Above is a growth curve for Danish females, and a fair prediction is that she will be disappointed (by an inch). The predicted median height is 173 cm (5'8''), calculated from her mother's height and my own as ((father's height minus 13 cm) + mother's height)/2.
More precise predictions can be made by combining several anthropometric measures.
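The parental-height formula quoted above is simple enough to write down as a small Python sketch. The function names and the parental heights in the example are made up for illustration; they are not the actual heights behind the 173 cm figure.

```python
def predicted_daughter_height_cm(father_cm, mother_cm):
    """((father's height minus 13) + mother's height) / 2, all in cm."""
    return ((father_cm - 13) + mother_cm) / 2


def cm_to_feet_inches(cm):
    """Convert centimetres to (feet, inches), rounded to the nearest inch."""
    total_inches = round(cm / 2.54)
    return divmod(total_inches, 12)


# Hypothetical parents: father 186 cm, mother 173 cm
height = predicted_daughter_height_cm(186, 173)
print(height, cm_to_feet_inches(height))
```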

Generate indicators in SAS based on factor variables

Generating indicators from a factor variable defined in a SAS data file, using a generic procedure in three steps. The code is easy to modify to suit more complicated needs.
1. Produce a table with the factor levels
2. Use a data step to generate a program
3. Append the indicators by running the generated SAS data step using %include
An example of generative programming...

proc sql; 
  create table levels_data_set_name as 
  select distinct factor_variable_name as indicator_name 
  from input_data_set_name; 
quit; 

data _NULL_; 
  file 'C:\PATH_TO_GENERATED_SAS_PROGRAM\indicators.sas';     
  put 'data somelib.indicator_enriched_data_set;'; 
  put 'set input_data_set_name;'; 
run; 

data _NULL_; 
set levels_data_set_name; 
  file 'C:\PATH_TO_GENERATED_SAS_PROGRAM\indicators.sas' MOD; 
  length char_var $256; 
  char_var='ind'||strip(indicator_name)||'='; 
  put char_var; 
  char_var='(factor_variable_name EQ "'||strip(indicator_name)||'");'; 
  put char_var; 
ru…
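The same generative idea can be sketched outside SAS. The Python function below is a hypothetical illustration, not the author's code: it takes the distinct factor levels and writes out the text of a SAS data step with one indicator per level. The function name, the data set names and the rule for cleaning level names are all assumptions made for this example.

```python
import re


def indicator_program(levels, factor_var, input_ds, output_ds):
    """Return SAS data step text that creates one 0/1 indicator per level."""
    lines = [f"data {output_ds};", f"  set {input_ds};"]
    for level in sorted(set(levels)):
        # Replace characters that are not valid in a SAS variable name
        name = "ind" + re.sub(r"\W", "_", level.strip())
        lines.append(f'  {name} = ({factor_var} EQ "{level}");')
    lines.append("run;")
    return "\n".join(lines)


# Hypothetical factor: patient type with two levels
print(indicator_program(["in", "out", "in"], "pattype", "visits", "visits_ind"))
```

The sketch assumes a character factor with short level names; it does not escape quotes inside levels or enforce the SAS limit on variable name length.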

Recency, Frequency, Monetary (RFM) variable generation using Danish National Health Register data.

Description: Below is adaptable SAS code for generating recency, frequency and monetary variables.
The algorithm can be applied in many contexts where records contain dates ('day of entry', 'day of exit', 'day of visit'), an id number and possibly revenue or cost.

Citation: Please cite this code as:
Laier, G.H. (2016) Recency, Frequency and Monetary SAS programming script  [computer software]. Denmark. Link: http://hellmund.blogspot.dk/2016/02/recency-frequency-monetary-rtf-variable.html

Thanks!
Gunnar Hellmund Laier,
PhD, MSc

Explanation: Here we build variables for analyses of Danish National Register data. The variables hold information on contacts, hospitalizations and days in hospital during the 14, 30, 91 and 180 days before a hospitalization or contact.

Key variables:
cpr (security number), pattype (patient type, in- or outpatient), inddto (day of entry), uddto (day of exit), ambdto (day of visit).
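To make the idea concrete, here is a minimal Python sketch of look-back variables of this kind. It is not a translation of the SAS program: the record layout (`id`, `entry`, `exit`), the function name and the exact definitions are assumptions chosen for illustration. Recency is taken as the number of days since the most recent earlier entry, frequency as the number of earlier entries inside the window, and days in hospital (standing in for the monetary part) as the overlap between earlier stays and the window.

```python
from datetime import date, timedelta

WINDOWS = (14, 30, 91, 180)  # look-back windows in days, as in the post


def rfm_features(records):
    """Return one feature row per record; records are dicts with id, entry, exit."""
    by_id = {}
    for rec in records:
        by_id.setdefault(rec["id"], []).append(rec)

    rows = []
    for pid, recs in by_id.items():
        for cur in recs:
            # Only records that started strictly before the current entry count
            prior = [p for p in recs if p["entry"] < cur["entry"]]
            row = {"id": pid, "entry": cur["entry"]}
            row["recency"] = min(
                ((cur["entry"] - p["entry"]).days for p in prior), default=None
            )
            for w in WINDOWS:
                start = cur["entry"] - timedelta(days=w)
                row[f"freq_{w}"] = sum(p["entry"] >= start for p in prior)
                row[f"days_{w}"] = sum(
                    max(0, (min(p["exit"], cur["entry"]) - max(p["entry"], start)).days)
                    for p in prior
                )
            rows.append(row)
    return rows


# Invented records for a single patient
stays = [
    {"id": "A", "entry": date(2016, 1, 1), "exit": date(2016, 1, 5)},
    {"id": "A", "entry": date(2016, 1, 20), "exit": date(2016, 1, 22)},
    {"id": "A", "entry": date(2016, 2, 1), "exit": date(2016, 2, 3)},
]
for row in rfm_features(stays):
    print(row)
```

Comparing every record with every other record of the same patient is quadratic per patient, which is fine for a sketch but would likely need a sorted, incremental pass on register-sized data.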

Data step program data calc.RFMdata(drop=dto_h…

Statistics site collection

General recommendations and programming examples:
http://support.sas.com/documentation/ (syntax lookup)
http://www.ats.ucla.edu/stat/ (combined best-practice/examples)
http://www.rseek.org

Emphasis on applications within psychology:
http://www.personality-project.org/ (development of R-package, documentation)
http://psych.unl.edu/psycrs/statpage/ (several small GUIs and documentation)

Structural equation modelling (both variance and covariance based):
http://www.ssicentral.com/lisrel/
http://www.pls-sem.com/ (with link to smartPLS)
https://youtu.be/Uwo2cBtT_xo (HTMT criterion)
https://www.youtube.com/user/Gaskination

Standards:

Publishing:
http://www.apastyle.org/learn/tutorials/basics-tutorial.aspx

Charlson's Comorbidity Index as SAS Macro

A Canadian version of Charlson's Comorbidity Index implemented as a SAS macro for application on Danish national register data.

%charlson_can

The macro traverses the main diagnosis and the secondary diagnoses recorded at the time of hospitalization, defines 17 indicators and calculates an unweighted comorbidity index.

A patient's CCI is calculated across records using indicators (and weights) from all previous hospitalizations. CCI is a growing index.
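The "growing index" idea can be illustrated with a short Python sketch. It is not the macro: the mapping below is a deliberately tiny, illustrative subset of ICD-10 prefixes and weights (three conditions only), the names are made up for this example, and it assumes that the index at a given record uses the diagnoses seen up to and including that record. Any real use should take the full mapping from the reference cited at the end of this post.

```python
# Illustrative subset only: condition -> (ICD-10 prefixes, weight)
TOY_MAP = {
    "myocardial_infarction": (("I21", "I22"), 1),
    "congestive_heart_failure": (("I50",), 1),
    "metastatic_solid_tumour": (("C77", "C78", "C79", "C80"), 6),
}


def indicators(diagnoses, mapping=TOY_MAP):
    """Return the set of conditions flagged by the diagnosis codes of one record."""
    found = set()
    for code in diagnoses:
        for name, (prefixes, _weight) in mapping.items():
            if code.upper().startswith(prefixes):
                found.add(name)
    return found


def growing_cci(records, mapping=TOY_MAP, weighted=True):
    """Cumulative index per record; records are diagnosis lists in time order."""
    seen = set()
    scores = []
    for diagnoses in records:
        seen |= indicators(diagnoses, mapping)  # conditions never disappear
        scores.append(sum(mapping[name][1] if weighted else 1 for name in seen))
    return scores


# Invented diagnosis history for one patient, three hospitalizations
history = [["I21"], ["J18"], ["I500", "C787"]]
print(growing_cci(history))                  # weighted
print(growing_cci(history, weighted=False))  # unweighted count of indicators
```

Because the set of flagged conditions only ever grows, the index can never decrease from one hospitalization to the next.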


***
Here is an additional and more traditional calculation of Charlson's Comorbidity Index, including 19 indicators and weights from 1 to 6. Charlson's Comorbidity Index is defined as the weighted sum of all indicators. The calculation assumes access to all diagnoses on a single record, which often includes a main diagnosis and several secondary and auxiliary diagnoses.

%charlson

Reference:
Cross-National Comparative Performance of Three Versions of the ICD-10 Charlson Index
Sundararajan et al, Medical Care, V45, N12, December 2007 (1210-1215)