Project 5.1: The Data Mining Process with WEKA
Objective
Explore your dataset, propose a question or general experimental theory, execute a mining strategy and discuss your results.
Specific Tasks
Now that you have your datasets and have discussed in general what might be possible, it's time to get into the details and mine the data. For this project, you are expected to explore the data, consider what you might accomplish with it, propose a mining strategy, execute that strategy, and discuss your results.
- Explore the data: compute summary statistics, look for natural clusters and anomalies, and note anything else that matters for your analysis.
- Simplify/preprocess your dataset: what did your exploration yield? Can you narrow the scope of your mining goal to make it more attainable? What preprocessing steps do you need to perform?
- State your mining goal/theory and what you hope to accomplish.
- Choose one to three strategies/algorithms that you think will help you meet your goal. Be specific: formally propose your theory and explain why you chose each algorithm.
- Run your process using Weka KnowledgeFlow or an equivalent GUI toolkit. You may take a different route (Orange widgets, RapidMiner, etc.), but I recommend KnowledgeFlow.
- Save and commit your mining schema (a Weka .kf file or equivalent), or at minimum describe your mining process in detail.
- Run your chosen strategies repeatedly, changing the algorithm parameters and recording the results. Make at least three attempts with different parameter settings, and record how the results differ.
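If you script any part of the process instead of (or alongside) a GUI toolkit, the repeated-runs requirement boils down to a loop over parameter settings that writes each run to a results log. The Python sketch below is illustrative only, not a required approach: it uses a hand-written k-means (not Weka's own clusterer) on a small hypothetical set of 2-D points. The function names `kmeans` and `sweep`, the fixed seed, and the choice of within-cluster sum of squared errors (SSE) as the recorded metric are all assumptions made for the example.

```python
import random
from statistics import fmean

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (illustrative); returns (centroids, within-cluster SSE)."""
    rng = random.Random(seed)          # fixed seed so repeated runs are comparable
    centroids = rng.sample(points, k)  # start from k input points chosen at random
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (fmean(x for x, _ in c), fmean(y for _, y in c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    sse = sum(min(dist2(p, c) for c in centroids) for p in points)
    return centroids, sse

def sweep(points, ks, seed=0):
    """Hypothetical helper: run k-means once per value of k and log each run's SSE."""
    log = []
    for k in ks:
        _, sse = kmeans(points, k, seed=seed)
        log.append({"k": k, "sse": round(sse, 3)})
    return log

# Hypothetical toy data for illustration only; substitute your own attributes.
POINTS = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
          (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]

# Three attempts with different parameter settings, one log row per attempt.
results = sweep(POINTS, [2, 3, 4])
for row in results:
    print(row)
```

Each log row corresponds to one attempt, which makes it easy to tabulate the parameter settings next to their outcomes in your paper. Keep in mind that SSE tends to fall as k grows, so a lower value on its own does not mean one setting is better than another; your discussion should explain which differences matter for your mining goal.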
The expected deliverable is a professional-quality paper describing the entire process outlined above.
Grading Criteria
You must complete all items listed above. The project is due Monday, Dec 14, at 5 PM. Be prepared to present your project during the last week of class.