L4 Enterprises LLC
Open Module 1 and copy the function into the “Developer” of your own spreadsheet
Statistical Consulting
From time to time I have had clients wanting a quick assessment of the “correlation” between two Categorical variables. For example, after producing an orthogonal design for a conjoint analysis, a client might decide that he does not find the low level of one variable to be credible when combined with the high level on another and wants to know the implication of altering this one card without re-creating the design. I have known researchers to approach this by creating a correlation matrix in excel. However, using a correlation on categorical data can be troublesome. A less wrong way of approaching this would be to create a Cramer’s V matrix. However, this is a function that does not exist in Excel. I programmed a function to do just this and am putting it out on the internet. You may use this function however you would like. If you notice any errors in the function or program any improvement, I would ask you to notify L4 enterprises at: info@L4Statistics.com.
To utilize this function please download the excel file by clicking HERE. Upon opening the file and ensuring that macros are enabled you may use this sheet itself or copy the function into your own sheet by going to “Developer” and opening visual basic:
If macros are enabled on your spreadsheet, you may use =Cramv(field1,field2) Just like you would use =Correl(field1, field2).
Example 1 demonstrates the weakness of using a correlation to assess the relationship of two categorical variables. In this case a three level variable is perfectly related to a two level variable in such a way that whenever the three level variable is 1 or 3 the two level variable is 0 otherwise the two level variable is 1. If we examine this relationship using correlations, we will find no correlation which would suggest no relationship. The perfect relationship is captured in the Cramer's V Matrix.
The second example demonstrates various equivalencies between Cramer’s V and a correlation when the variables are two-level. Var1 and Var2 are correlated at 0.1. Because, they are dummy coded as "0" and "1" we would expect the same relationship in the Cramer's V. Var3 is perfectly but negatively correlated with Var2 producing a correlation of -1. The same strength is captured in Cramer's V except because the order of a categorical variable is meaningless, the negative is lost. Finally, Var4 and Var3 are correlated at -0.3. The Cramer's V relationship is of the same strength but looses the negative value.
If you distrust opening my macros, the code for the function is:
Function Cramv(field1 As Range, field2 As Range)
Dim arr1() As Variant 'First Array
Dim arr2() As Variant 'Second Array
Dim lCtr As Long, lCount As Long 'Indexs
Dim iCtr As Integer 'index
Dim col1 As New Collection 'Collection to collect unique values of array1
Dim col2 As New Collection 'Collection to collect unique values of array1
Dim observed() As Variant
Dim Rowtotal() As Variant
Dim Columntotal() As Variant
Dim nvans1 As Long
Dim nvans2 As Long
Dim exptected() As Variant
Dim Xsq As Double
Dim k As Long
'Finding length of arrays
Dim lStartPoint As Long
Dim lEndPoint As Long
arr1 = field1.Value
arr2 = field2.Value
lStartPoint = LBound(arr1)
lEndPoint = UBound(arr1)
mStartPoint = LBound(arr2)
mendpoint = UBound(arr2)
ReDim arr1(1 To lEndPoint)
ReDim arr2(1 To lEndPoint)
arr1 = field1.Value
arr2 = field2.Value
'Testing to ensure equal size ranges
If lEndPoint <> mendpoint Then
MsgBox "Please ensure there are an equal number of cells in both ranges"
Exit Function
End If
'Find the inique valurs for the first array
For lCtr = lStartPoint To lEndPoint
vItem = arr1(lCtr, 1)
sIndex = CStr(vItem)
'first element, add automatically
If lCtr = lStartPoint Then
col1.Add vItem, sIndex
ReDim vAns1(lStartPoint To lStartPoint) As Variant
vAns1(lStartPoint) = vItem
Else
On Error Resume Next
col1.Add vItem, sIndex
If Err.Number = 0 Then
lCount = UBound(vAns1) + 1
ReDim Preserve vAns1(lStartPoint To lCount)
vAns1(lCount) = vItem
End If
End If
Err.Clear
Next lCtr
'Find the inique valurs for the second array
For lCtr = mStartPoint To mendpoint
vItem = arr2(lCtr, 1)
sIndex = CStr(vItem)
'first element, add automatically
If lCtr = mStartPoint Then
col2.Add vItem, sIndex
ReDim vAns2(mStartPoint To mStartPoint) As Variant
vAns2(mStartPoint) = vItem
Else
On Error Resume Next
col2.Add vItem, sIndex
If Err.Number = 0 Then
lCount = UBound(vAns2) + 1
ReDim Preserve vAns2(mStartPoint To lCount)
vAns2(lCount) = vItem
End If
End If
Err.Clear
Next lCtr
'finding length of vAns
nvans1 = UBound(vAns1)
nvans2 = UBound(vAns2)
'finding ovserved cells of for chisq
ReDim observed(1 To nvans2, 1 To nvans1) As Variant
ReDim Expected(1 To nvans2, 1 To nvans1) As Variant
For lCount = 1 To nvans1
For iCtr = 1 To nvans2
observed(lCount, iCtr) = 0
Next iCtr
Next lCount
For lCtr = 1 To mendpoint
For lCount = 1 To nvans2 'rows
For iCtr = 1 To nvans1 'columns
If arr1(lCtr, 1) = vAns1(iCtr) Then
If arr2(lCtr, 1) = vAns2(lCount) Then
observed(lCount, iCtr) = observed(lCount, iCtr) + 1
End If
End If
Next iCtr
Next lCount
Next lCtr
'finding row and column totals
ReDim Rowtotal(nvans1) As Variant
ReDim Columntotal(nvans2) As Variant
For lCtr = 1 To nvans1
Rowtotal(lCtr) = 0
Next lCtr
For lCtr = 1 To nvans2
Columntotal(lCtr) = 0
Next lCtr
For lCtr = 1 To nvans1
For lCount = 1 To nvans2
Rowtotal(lCtr) = Rowtotal(lCtr) + observed(lCount, lCtr)
Next lCount
Rowtotal(lCtr) = Rowtotal(lCtr) / mendpoint
Next lCtr
For lCtr = 1 To nvans2
For lCount = 1 To nvans1
Columntotal(lCtr) = Columntotal(lCtr) + observed(lCtr, lCount)
Next lCount
Next lCtr
Cramv = Columntotal(1)
'finding expected values and Chisq
For lCtr = 1 To nvans2
For lCount = 1 To nvans1
Expected(lCtr, lCount) = Columntotal(lCtr) * Rowtotal(lCount)
Xsq = Xsq + (((observed(lCtr, lCount) - Expected(lCtr, lCount)) ^ 2) / Expected(lCtr, lCount))
Next lCount
Next lCtr
'Finding value of k
k = nvans1
If nvans2 < k Then k = nvans2
Cramv = (Xsq / (mendpoint * (k - 1))) ^ 0.5
End Function