Due Friday, Dec 12 by 5PM.
The purpose of this project is to familiarize you with the data mining process by using a modern programming toolkit to apply several data mining strategies.
This project uses Orange, a suite of data mining tools that you can use from C++, from Python, or through GUI widgets.
This project will require you to do five things:
Your grade is based on your documentation of this project and your code repository.
A reading list is provided here, with targeted questions about each reading. Include your answers and summaries in the final documentation.
The Orange home page.
The Orange 'screenshots' page.
From Experimental Machine Learning to Interactive Data Mining.
Orange Widgets & Visual Programming and Orange and Visual Programming.
Orange Widgets for Functional Genomics.
For each step in the tutorial below, unless otherwise noted, write a short summary of what you accomplished, complete the Python examples provided, and commit your Python code to your repository.
After you've completed the tutorial steps above, you should have a good understanding of what tools are available to you in Orange. Now it's time to try some of these approaches on a dataset of your own choosing. For this part of the project, you must:
Your data mining process doesn't have to be perfect or produce especially interesting results; the important thing is the process. So don't be afraid to try something fun, even if the outcome turns out to be unremarkable.
Some resources to help you:
And here are some suggestions for sources of data:
Important: Be sure your process attempts to follow the general outline of acquire, parse, filter/preprocess, mine, and postprocess (represent, refine & interact). Some of these steps may be trivial (like acquire & parse) and others more evident (filter, mine).
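As a rough illustration of that outline, here is a minimal sketch that keeps each stage in its own function. It is not part of the Orange tutorial and uses only the Python standard library. The function names, the inline CSV, and its values are all made up for illustration. The "mine" step is just a per-class average, standing in for whatever technique you actually choose.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical toy data standing in for a file you would download.
# The values are invented for this sketch, not taken from a real dataset.
RAW_CSV = """species,petal_length,petal_width
setosa,1.4,0.2
setosa,1.3,0.2
setosa,,0.3
versicolor,4.7,1.4
versicolor,4.5,1.5
virginica,6.0,2.5
virginica,5.1,1.9
"""


def acquire():
    """Acquire: return the raw text (here, an inline string)."""
    return RAW_CSV


def parse(text):
    """Parse: turn raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Filter/preprocess: drop rows with missing values, convert numbers."""
    clean = []
    for row in rows:
        if all(value not in (None, "") for value in row.values()):
            clean.append({
                "species": row["species"],
                "petal_length": float(row["petal_length"]),
                "petal_width": float(row["petal_width"]),
            })
    return clean


def mine(rows):
    """Mine: compute the mean petal length per species (a stand-in step)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["species"]].append(row["petal_length"])
    return {species: mean(values) for species, values in groups.items()}


def postprocess(summary):
    """Postprocess: order the results, largest mean first, for display."""
    return sorted(summary.items(), key=lambda item: item[1], reverse=True)


def run_pipeline():
    """Chain the five stages in order."""
    return postprocess(mine(preprocess(parse(acquire()))))


if __name__ == "__main__":
    for species, average in run_pipeline():
        print(f"{species}: mean petal length {average:.2f}")
```

Keeping the stages separate makes it easier to write your summary for each one, and to swap in an Orange learner or widget for the mine step without touching the rest.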