Final Project - Skier Data

Exploration of the dataset

The dataset consists of 989 records with 16 data fields. The data appears to be related to the ski industry, because most of the fields are named after ski resorts. The data fields are Rating, Survey, Prize, Punishment, Aspen, Snowmass, Breckenridge, Jeystone, ABasin, Loveland, CrestedButte, Vail, Silverton, WinterPark, MaryJane, and Eldora. The Rating field is a float with values from 0 to 1. The Survey field is an integer with a value of either 1 or 20. The Prize field is an integer with a value of either 1 or 10. The Punishment field is an integer with a value of either 30 or 50. The remaining 12 fields are binary (0 or 1), one per resort.
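
The field descriptions above can be turned into a simple per-record check. The sketch below is illustrative only and is not the preprocessing actually used in this project. The field names are copied from the description, while the function names (schema_violations, _as_int) and the idea of representing each record as a dictionary are assumptions made for the example.

```python
# Field layout as described in the text. The validation rules are an
# illustrative assumption, not the project's actual cleansing steps.
RESORT_FIELDS = [
    "Aspen", "Snowmass", "Breckenridge", "Jeystone", "ABasin", "Loveland",
    "CrestedButte", "Vail", "Silverton", "WinterPark", "MaryJane", "Eldora",
]
ALLOWED_VALUES = {
    "Survey": {1, 20},
    "Prize": {1, 10},
    "Punishment": {30, 50},
}


def _as_int(value):
    """Parse a value as a whole number; return None if that is not possible."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return int(number) if number.is_integer() else None


def schema_violations(record):
    """Return the names of fields whose values fall outside the described ranges."""
    bad = []

    # Rating: float between 0 and 1 inclusive (NaN fails the comparison).
    try:
        if not 0.0 <= float(record.get("Rating")) <= 1.0:
            bad.append("Rating")
    except (TypeError, ValueError):
        bad.append("Rating")

    # Survey, Prize, Punishment: integers restricted to two values each.
    for field, allowed in ALLOWED_VALUES.items():
        if _as_int(record.get(field)) not in allowed:
            bad.append(field)

    # Resort fields: binary indicators.
    for field in RESORT_FIELDS:
        if _as_int(record.get(field)) not in {0, 1}:
            bad.append(field)

    return bad
```

A record that returns an empty list matches the description; any other result names the fields that would need a closer look. Values are parsed from strings as well as numbers, so the check can be applied to rows read directly from a text file.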

Data preprocessing (cleansing, transformation)

There were several anomalies in the data. These include:

After these anomalies were processed, 862 records remained for analysis.

Summary statistics



Data Mining

Because the data did not contain clear class labels, I chose unsupervised methods: k-means clustering, association analysis, and hierarchical clustering. The following tasks were performed: