Preparing data. A basic example using R-base

The data editing and formatting phase is one of the most important steps in fitting a structural equation model (SEM), whether it is covariance-based as in SPSS AMOS and LISREL or a partial least squares model as in SmartPLS and WarpPLS.
Survey data often arrive with a wide range of data types: numeric, integer, ordinal, factor, indicator, and free-text comments.
One of the first steps is to decide what to include in an analysis. Should we transform data and analyze marginal distributions before running a CB-SEM, study relations between ordinal variables with LISREL, or just clean up the data and do a robust exploratory PLS-SEM?
We are going to drop unused variables, perform tasks such as imputation or calculation of indicator variables for factor levels beforehand, and split the data into several sheets or files with appropriate keys or identifiers. Then comes the formatting issue. Much survey data comes as SPSS sav files or in the SAS sas7bdat file format, and if you are analyzing clinical data you might encounter Stata's dta format. You can export to other formats from these commercial packages or use a program like Stat/Transfer.
Plain ASCII-based formats may do when importing csv files straight from Excel or OpenOffice fails. Using a simple file format also enables you to run several types of covariance-based models or partial least squares regressions. So why not use simple txt files?
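As a rough illustration of handling one of these formats directly in R, the sketch below writes a small data frame to Stata's dta format and reads it back with the foreign package. The data frame and its column names are made up for the example, and it assumes foreign is installed (it ships with standard R distributions). The same package offers read.spss() for sav files; sas7bdat files need a different package, for example haven.

```r
# Sketch only: round trip through Stata's dta format with the 'foreign' package.
library(foreign)

# Hypothetical survey data, invented for this example
survey <- data.frame(
  id    = 1:4,
  age   = c(34, 51, 27, 45),
  score = c(3.5, 4.0, 2.5, 5.0)
)

# Write to a temporary dta file so the example does not touch real data
dtaPath <- tempfile(fileext = ".dta")
write.dta(survey, dtaPath)

# Read the dta file back into a data frame and inspect its structure
imported <- read.dta(dtaPath)
str(imported)
```

For a real file, replace dtaPath with the path to your own dta file and skip the write.dta() step.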

Here is an example of base R code that converts an Excel csv file to a tab-delimited txt file, ready for analysis in a software package such as LISREL.

#Print current directory
getwd()

#Change to another directory through interactive dialog window
#(choose.dir() is available on Windows only)
dirPath <- getwd()
setwd(choose.dir(dirPath))

#Choose the CSV file
filePath <- file.choose()

#Check options for read.csv (US/European comma format etc)
CSVdata <- read.csv(filePath, sep=";")

#View data
View(CSVdata)

#Save data as a txt file next to the CSV file
#(strsplit keeps everything before the first "." in the path)
txtPath <- paste(strsplit(filePath, split=".", fixed=TRUE)[[1]][1], "txt", sep=".")
write.table(CSVdata, txtPath, sep="\t", eol="\r\n", row.names=FALSE)
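The conversion can also be tried without any dialog windows. The following is a minimal sketch under stated assumptions: the csv content is invented, it uses the European convention (";" between fields, "," as decimal mark), and the file lives in a temporary directory. It relies on read.csv2(), whose defaults match that convention, and reads the txt file back to confirm that the decimals survived. The eol argument is left at its default here.

```r
# Hypothetical csv content in European format:
# ";" separates fields and "," marks decimals
csvPath <- tempfile(fileext = ".csv")
writeLines(c("id;item1;item2",
             "1;3,5;4",
             "2;2,0;5",
             "3;4,5;3"), csvPath)

# read.csv2() defaults to sep = ";" and dec = ","
CSVdata <- read.csv2(csvPath)

# Same base name as the csv file, but with a txt extension
txtPath <- sub("\\.csv$", ".txt", csvPath)

# Write a tab-delimited file without row names
write.table(CSVdata, txtPath, sep = "\t", row.names = FALSE)

# Read the txt file back to check that the values are intact
check <- read.delim(txtPath)
str(check)
```

For a US-style file (comma-separated fields, "." as decimal mark), read.csv() with its defaults would take the place of read.csv2().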

