Project 6: Applied Theory & Practice I



Week 1: October 4 - 11, 2010

For the week of October 4, 2010, you should accomplish the following:

  1. Choose a dataset that you would like to mine. See the forum for some pointers to various datasets, APIs, and clearinghouses. Be sure to post links to datasets you've found. (due 10/6)
  2. Select a tool that you would like to gain familiarity with (Weka, RapidMiner, Orange, KNIME, R, or another). (due 10/7)
  3. Document your data as pragmatically as possible. Define what type of dataset it is, and describe the attribute types. (due 10/8)
  4. If you have a theory related to your dataset you wish to prove or disprove, define it clearly. (due 10/8)
  5. Explore the data through summary statistics, visualizations, etc. Declare any interesting aspects of the data you plan to explore. (due 10/11)
  6. Define and plan your next actions. (What strategies do you have in mind? What transformations or preprocessing do you think you need to perform?) (due 10/11)
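
As one possible starting point for items 3 and 5, the sketch below guesses each attribute's type and prints basic summary statistics using only the Python standard library. The sample data, the column names, and the helper names (is_number, summarize) are made up for illustration; your chosen toolkit will have its own, richer way of doing this.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample data; replace with the contents of your own file.
SAMPLE_CSV = """age,income,segment
34,52000,A
45,61000,B
29,,A
51,75000,C
"""


def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize(rows):
    """Build a per-column summary from a list of dict rows.

    A column is treated as numeric when every non-empty value parses
    as a float; otherwise it is treated as nominal. Empty strings are
    counted as missing values.
    """
    summary = {}
    if not rows:
        return summary
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        present = [v for v in values if v != ""]
        missing = len(values) - len(present)
        if present and all(is_number(v) for v in present):
            nums = [float(v) for v in present]
            summary[name] = {
                "type": "numeric",
                "missing": missing,
                "min": min(nums),
                "max": max(nums),
                "mean": statistics.mean(nums),
            }
        else:
            summary[name] = {
                "type": "nominal",
                "missing": missing,
                "counts": dict(Counter(present)),
            }
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = summarize(rows)
for name, info in report.items():
    print(name, info)
```

The type guess here is deliberately crude (numeric versus nominal only), so treat its output as a prompt for your own documentation rather than a final answer.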

Week 2: October 11 - 20, 2010

For the week of October 11, 2010, you should accomplish the following:

  1. Complete any basic data preprocessing necessary to enable your toolkit to read your dataset. (RDBMS? CSV? ARFF?) (due 10/15)
  2. Create a trivial workflow with your toolkit that reads in the data and sends the reader node's output to another node of your choice. (due 10/20)
  3. Define and plan your next actions. (What tasks do you need to get your tool to do next?) (due 10/20)
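
If your toolkit prefers ARFF (as Weka does) and your data is in CSV, the preprocessing in item 1 might look roughly like the sketch below. It is a minimal converter under stated assumptions: attribute names and nominal values contain no spaces, commas, or quotes (which ARFF would require quoting), empty cells are missing values, and a column is numeric only if every non-empty cell parses as a float. The function names and sample data are hypothetical.

```python
import csv
import io


def infer_attribute(name, values):
    """Return an ARFF @attribute line for one column.

    The column is declared numeric if every non-empty value parses as
    a float; otherwise it becomes a nominal attribute listing the
    distinct values in sorted order.
    """
    present = [v for v in values if v != ""]
    try:
        for v in present:
            float(v)
        kind = "numeric"
    except ValueError:
        kind = "{" + ",".join(sorted(set(present))) + "}"
    return f"@attribute {name} {kind}"


def csv_to_arff(csv_text, relation):
    """Convert CSV text (with a header row) into ARFF text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines

    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        lines.append(infer_attribute(name, [row[i] for row in rows]))
    lines += ["", "@data"]
    for row in rows:
        # ARFF marks a missing value with a question mark.
        lines.append(",".join(v if v != "" else "?" for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample data; replace with the contents of your own file.
SAMPLE_CSV = "age,segment\n34,A\n,B\n51,A\n"
arff = csv_to_arff(SAMPLE_CSV, "customers")
print(arff)
```

Many toolkits can also import CSV directly, so check your tool's own loader before writing a converter by hand.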

Grading Criteria

As discussed in class, we're going to use our Ore forum to communicate how we've met the above requirements. One thread for each requirement has been created -- post your responses accordingly (but feel free to start your own threads for other topics). You will be awarded one point per post for each requirement above. Each requirement is due by midnight on its respective date.

Be sure to collaborate, ask questions, and share resources and techniques you've discovered, interesting papers or articles on applied mining strategies, and so on. The fun and excitement of this project depend on your level of professional participation!