Machine Learning With IBM SPSS - Clustering

Machine Learning Using IBM SPSS – Two-Step Cluster

In this part of the machine learning series using SPSS, we introduce clustering techniques. SPSS offers three types of cluster analysis: TwoStep, K-Means and Hierarchical Cluster. This blog covers the TwoStep Cluster technique. TwoStep borrows ideas from both K-Means and Hierarchical Cluster: a fast pre-clustering pass first groups the records into many small sub-clusters, and a hierarchical algorithm then merges those sub-clusters into the final clusters. One advantage of TwoStep is that large datasets, which would require many steps in purely hierarchical methods, can be handled easily, because the hierarchical step only has to work on the sub-clusters produced by the quick pre-clustering pass. This is not the first blog on machine learning using SPSS, so be sure to check the previous blogs: Linear Regression and Logistic Regression.
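To make the two stages concrete, here is a minimal Python sketch of the general idea. It is not SPSS's implementation: as far as I know, SPSS builds a cluster feature tree in the first stage and can use a log-likelihood distance that also handles categorical variables, whereas this sketch assumes continuous variables only, a simple distance threshold for pre-clustering, and Euclidean distance between centroids for merging. The function names, the threshold and the sample points are all made up for illustration.

```python
import math

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def precluster(points, threshold):
    """Stage 1 (assumed rule): a single pass over the data. Each point joins
    the nearest sub-cluster whose centroid is within `threshold`; otherwise
    it starts a new sub-cluster."""
    subclusters = []
    for p in points:
        best, best_d = None, threshold
        for sc in subclusters:
            d = math.dist(p, centroid(sc))
            if d <= best_d:
                best, best_d = sc, d
        if best is None:
            subclusters.append([p])
        else:
            best.append(p)
    return subclusters

def merge_hierarchically(subclusters, k):
    """Stage 2: repeatedly merge the two sub-clusters with the closest
    centroids until only k clusters remain."""
    clusters = [list(sc) for sc in subclusters]
    while len(clusters) > k:
        pairs = [
            (math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical two-dimensional data with three tight groups
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.8),
    (6.5, 6.5), (6.6, 6.4),
    (5.3, 5.2),
]
subclusters = precluster(points, threshold=0.5)
clusters = merge_hierarchically(subclusters, k=2)
print(len(subclusters), "sub-clusters merged into", len(clusters), "clusters")
```

The point of the design is that the expensive pairwise comparison in stage 2 runs over a handful of sub-clusters rather than over every record, which is why the approach scales to large datasets.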

We will use the position of a vehicle's engine as the evaluation variable for a few variables in the automobile dataset that we have been using throughout this series.

The TwoStep Cluster procedure is located under Analyze > Classify > TwoStep Cluster…

Two Step Cluster Analysis - SPSS

When you click TwoStep Cluster…, the dialog below should appear, where you select the variables for clustering. Two types of variables are specified: Categorical Variables and Continuous Variables. This means you must place your categorical input variables in the Categorical Variables box and your continuous input variables in the Continuous Variables box, because SPSS treats the two types differently. If you want to see how much this matters, swap them and compare your results; the difference will be obvious!

Two Step Cluster Analysis - SPSS

In this case, we put “horsepowerbinned” into the categorical box and “numofcylinders_tr” and “enginesize” into the continuous box. The goal is to evaluate how these variables relate to the position of a vehicle's engine. We leave the other options at their defaults; for instance, SPSS decides the number of clusters (you may also set your own by clicking Specify fixed in the Number of Clusters group).

Two Step Cluster Analysis - SPSS

Then go to the Output… option, where you select the evaluation variable. The evaluation variable in this case is “enginelocation_tr.” You can select more than one evaluation variable. In addition, you may ask SPSS to save the cluster membership of each sample by checking the box under Working Data File. In this case we are not checking the box. Then click Continue.

Two Step Cluster Output - SPSS

The (brief) result then appears with the Model Summary, which comprises the number of inputs (the categorical and continuous variables) and the number of clusters as decided by SPSS. In addition, the cluster quality is shown as a silhouette measure of cohesion and separation (more or less a measure of how well the model fits); here it falls in the Good region (luckily for us!). If the cluster quality is Poor, I suggest you change the number of clusters or the variables.
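For readers curious about what a silhouette-style quality score measures, here is a rough Python sketch. It uses a simplified centroid-based variant: for each record, A is the distance to its own cluster centre, B is the distance to the nearest other centre, and the score is (B - A) / max(A, B), averaged over all records. SPSS's exact computation may differ, and the 0.2 and 0.5 cutoffs for the Poor, Fair and Good bands are my assumption about where the chart draws its boundaries. The data and labels are invented.

```python
import math

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def average_silhouette(points, labels):
    """Simplified, centroid-based silhouette (an assumption, not SPSS's code).
    Requires at least two distinct cluster labels."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    centers = {lab: centroid(members) for lab, members in groups.items()}

    total = 0.0
    for p, lab in zip(points, labels):
        a = math.dist(p, centers[lab])  # cohesion: distance to own centre
        b = min(math.dist(p, c)         # separation: nearest other centre
                for other, c in centers.items() if other != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

def quality_band(score, fair_cutoff=0.2, good_cutoff=0.5):
    """Map a score in [-1, 1] to a band; the cutoffs are assumed values."""
    if score >= good_cutoff:
        return "Good"
    if score >= fair_cutoff:
        return "Fair"
    return "Poor"

# Hypothetical data: two well-separated groups
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.8), (5.3, 5.2)]
good_labels = [0, 0, 0, 1, 1, 1]   # labels that follow the groups
mixed_labels = [0, 1, 0, 1, 0, 1]  # labels that cut across the groups

good_score = average_silhouette(points, good_labels)
mixed_score = average_silhouette(points, mixed_labels)
print(quality_band(good_score), round(good_score, 3))
print(quality_band(mixed_score), round(mixed_score, 3))
```

A score near 1 means records sit close to their own centre and far from the others; a score near 0 or below means the clusters overlap, which is when changing the number of clusters or the input variables is worth trying.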

Two Step Cluster Output - SPSS

You can see more details about the clusters by double-clicking the result shown above, which opens the view shown in the Figure below. By default, the two dropdown views show Model Summary and Cluster Sizes.

Model Viewer - SPSS

We are mainly concerned with Predictor Importance and Clusters. There are many options that you can explore, and I strongly encourage you to try every one of them to learn what it does; that way you gain a much better understanding of how to use SPSS for TwoStep Cluster.

Model Viewer - SPSS

From the Figure below, “horsepowerbinned” has the highest importance, followed by “enginesize.” One interesting thing about this SPSS algorithm is that it shows you details of all the available clusters for you to select. This result depends on the maximum number of clusters. By clicking on each cluster, you can observe its distribution.

Model Viewer - SPSS

Model Viewer - SPSS

Currently, the evaluation variable is not part of the details, so we cannot yet study its distribution. To add it, click Display, and the dialog in the Figure below appears. By default, Evaluation Fields is unchecked, so you have to check it and then click OK.

Model Viewer - SPSS

When this is done, you should see the Figure below, which now contains the evaluation variable.

Model Viewer - SPSS

One feature I find very handy in this technique is Cells show relative distributions, as indicated below (the second option shows absolute distributions, which I don’t really like because it does not let me see how my model fits the original data).
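The difference between absolute and relative views can be illustrated with a small Python sketch. This is not how the Model Viewer computes its cells; it simply tabulates an evaluation variable per cluster as raw counts (absolute) and as within-cluster proportions (relative), alongside the overall proportions for comparison. The cluster ids and engine locations below are made-up values, not results from the automobile dataset.

```python
from collections import Counter, defaultdict

def distributions(cluster_ids, values):
    """Return per-cluster counts, per-cluster proportions and overall
    proportions of an evaluation variable."""
    counts = defaultdict(Counter)
    for cid, v in zip(cluster_ids, values):
        counts[cid][v] += 1

    relative = {
        cid: {v: n / sum(c.values()) for v, n in c.items()}
        for cid, c in counts.items()
    }
    overall = {v: n / len(values) for v, n in Counter(values).items()}
    return counts, relative, overall

# Hypothetical cluster memberships and engine locations for ten cars
cluster_ids = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
engine_location = ["front", "front", "front", "rear",
                   "front", "front", "front", "front", "front", "front"]

counts, relative, overall = distributions(cluster_ids, engine_location)
for cid in sorted(counts):
    print("cluster", cid, "counts:", dict(counts[cid]),
          "proportions:", relative[cid])
print("overall proportions:", overall)
```

In this toy example a single rear-engine car looks negligible as a count, but as a proportion it is a quarter of cluster 1 against a tenth of the whole sample, which is the kind of contrast a relative view makes visible.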

Model Viewer - SPSS

Conclusion

In this blog, the TwoStep Cluster procedure of IBM SPSS has been introduced to study how one or more variables relate to one or more other variables. The predictor importance and the clusters have been presented.

