Skip to main content

T tests in censored normal distribution

T-tests assume normal distributed random variates. We experience designs in which data take on a particular value and are otherwise normal distributed on a half axis. It might be the case in experiments in which a regimen is only applied due to symptoms or the participant's request. In such case a censored normal distribution assumption will increase the strength of the experiment, i.e. statistical power will be increased and assumptions of an underlying normal distribution will be easier to justify.
The suggested test is based on the chi-square likelihood ratio test. The example is two independent groups of identical independently distributed samples and the null hypothesis is identical distributions. The density function splits into single probability, given as the integral of a normal density from negative infinity to the lower bound and the density of the normal distribution for values larger than the lower bound.
The expression involves the distribution function for a Gaussian variate due to normalization. Likelihood ratio value can be calculated in SAS using PROC LIFEREG. P-value is then calculated with a lookup of a chisq quantile value.

%macro tobittest(var_name,group_name,datafile);
/*https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_lifereg_sect035.htm;*/
data analysedata;
set &datafile;
lower=&var_name.;
if missing(&var_name.) then delete;
run;
proc lifereg data=analysedata outest=OUTEST(keep=_scale_);
  class &group_name.;
      model (lower, &var_name.) = &group_name. / d=normal;
  probplot ppout npintervals=simul;
      inset;
      output out=OUT xbeta=Xbeta;
  ods output ParameterEstimates=PE;
  ods select ParameterEstimates FitStatistics;
run;
data predict;
      drop lambda _scale_ _prob_;
      set out;
      if _n_ eq 1 then set outest;
      lambda = pdf('NORMAL',Xbeta/_scale_)
               / cdf('NORMAL',Xbeta/_scale_);
      Predict = cdf('NORMAL', Xbeta/_scale_)
                * (Xbeta + _scale_*lambda);
  res=(&var_name.-predict)/_scale_;
      label Xbeta='MEAN OF UNCENSORED VARIABLE'
            Predict = 'MEAN OF CENSORED VARIABLE';
run;
title1 "&var_name.";
title2 "Tobit regression";
proc univariate data=predict(where=(NOT(missing(&var_name.))));
var res;
qqplot /normal(mu=est sigma=est);
ods select qqplot;
run;
title2 "Test for differences between groups";
proc print data=PE(where=(Parameter EQ '&group_name.'));
run;
title2 "Summary statistics across groups";
proc tabulate data=analysedata;
class &group_name.;
var &var_name.;
table &group_name. all,&var_name.*(min p25 median p75 max);
run;
data zero;
set analysedata;
zero=0;
if &var_name. EQ 0 then zero=1;
run;
title2 "Zero values (censored values)";
proc tabulate data=zero;
class &group_name. zero;
table &group_name. all,zero all;
run;
title1;
%mend;

ods pdf;
%tobittest(varnameOnFile,groupvarOnFile,datanameFile);

In R the package censReg by Arne Henningsen should perform a similar calculation, however I have found lack of convergence and thus recommend using SAS.

Comments

Popular posts from this blog

HackRF on Windows 8

This technical note is based on an extract from thread. I have made several changes and added recommendations. I have experienced lot of latency using GnuRadio and HackRF on Pentoo Linux, so I wanted to try out GnuRadio on Windows.



HackRF One is a transceiver, so besides SDR capabilities, it can also transmit signals, inkluding sweeping a given range, uniform and Gaussian signals. Pentoo Linux provides the most direct access to HackRF and toolboxes. Install Pentoo Linux on a separate drive, then you can use osmocom_siggen from a terminal to transmit signals such as near-field GSM bursts, which will only be detectable within a meter.









Installation of MGWin and cmake: Download and install the following packages:
- MinGW Setup (Go to the Installer directory and download setup file)
- CMake (I am using CMake 3.2.2 and I installed it in C:\CMake, this path is important in the commands we must send in the MinGW shell)
Download and extract the packages respectively in the path C:\MinGW\msys\…

Example: Beeswarm plot in R

library(foreign)

data <- read.dta("C:/Users/hellmund/Documents/MyStataDataFile.dta")

names(data)

install.packages('beeswarm')

library(beeswarm)

levels(data$group)

png(file="C:/Users/hellmund/Documents/il6.png", bg="transparent")

beeswarm(data$il6~data$group,data=data, method=c("swarm"),pch=16,pwcol=data$Gender,xlab='',ylab='il6',ylim=c(0,20))

legend('topright',legend=levels(data$Gender),title='Gender',pch=16,col=2:1)

boxplot(data$il6~data$group, data=data, add = T, names = c("","",""), col="#0000ff22")

dev.off()

Alder/korrekt århundrede udfra cpr nummer

De fleste, der arbejder med registre eller databaser, står ofte med problemstillingen, at alder er uoplyst, medens cpr-nummer er kendt. Hvordan regner man den ud?

Følgende regel er gældende: Hvis syvende ciffer er 0, 1, 2 eller 3 er man født i det 20. århunderede (1900-tallet) Ligeledes, hvis syvende ciffer er 4 eller 9, og årstallet (femte og sjette ciffer) er større end eller lig 37.

Endelig er man født i det 19. århundrede (1800-tallet) hvis syvende ciffer er 5, 6, 7 eller 8 og årstallet er større end eller lig 58.

Nedenfor finder du eksempel i SAS kode: En lille makro, der udover fødselsdato også udregner køn samt den præcise alder givet datovariabel.

Kilde: Opbygning af CPR nummeret, cpr.dk


proc format library=work;
value gender
0="Female"
1="Male"
;
run;

%macro agefromCPR(cpr,datevar=inddto,birthvar=birth,agevar=age);
dy_temp=input(substrn(&cpr,1,2),2.);
mt_temp=input(substrn(&cpr,3,2),2.);
yr_temp=input(substrn(&cpr,5,2),2.);
lr_temp=inp…