You should be working on Project 9.
Assignment: Continue Project 9 and work on your portfolio.
You should have finished Project 8.
What characterizes linearly separable classification problems?
What are some drawbacks to linear classifiers?
Midterm exam review
Project 9 introduction
Assignment: Reading 25 and Project 9.
You should have finished Reading 24, be reading the Knime workbook and working on Project 8.
Assignment: Prepare for the midterm exam and continue Project 8 and reading the Knime workbook.
You should be working on Project 8 and reading the Knime workbook.
What characterizes optimization problems?
What is "hill climbing" and what are its benefits and drawbacks?
What is "random restart hill climbing?"
What is simulated annealing? How is it better than hill climbing?
How might you apply optimization methods in data analysis?
What are genetic algorithms?
How can optimization solutions be presented as "generations" in genetic algorithms?
Assignment: Reading 24, continue Project 8 and continue reading the Knime workbook.
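For review, the hill climbing and random restart questions above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names hill_climb, random_restart, two_peaks and step_neighbors are hypothetical, and the toy score function is invented for this example rather than taken from the readings or projects.

```python
import random

def hill_climb(score, neighbors, start):
    """Greedy ascent: move to the best neighbor until none improves the score."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current  # no neighbor is strictly better: a local optimum
        current = best

def random_restart(score, neighbors, random_start, restarts=10):
    """Run hill climbing from several starting points and keep the best result."""
    results = [hill_climb(score, neighbors, random_start()) for _ in range(restarts)]
    return max(results, key=score)

# Toy problem (an assumption for illustration): integers 0..20 with a
# local peak at x = 5 and a higher, global peak at x = 15.
def two_peaks(x):
    return 3 - abs(x - 5) if x < 10 else 10 - abs(x - 15)

def step_neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 20]

if __name__ == "__main__":
    print(hill_climb(two_peaks, step_neighbors, 0))   # stuck on the local peak
    print(hill_climb(two_peaks, step_neighbors, 12))  # reaches the global peak
    print(random_restart(two_peaks, step_neighbors, lambda: random.randint(0, 20)))
```

Starting at 0, plain hill climbing stops at the lower peak, which is the drawback the restart variant tries to reduce by sampling several starting points.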
You should be working on Project 8.
What is dimensionality reduction?
What are some approaches to dimensionality reduction?
What is feature subset selection?
What are some approaches to feature subset selection?
What is feature creation?
What are discretization, binarization and variable transformation?
Assignment: Continue Project 8.
You should be working on Project 8.
What is data preprocessing?
What are aggregation and sampling?
What is dimensionality reduction?
What is feature subset selection?
What is feature creation?
What are discretization, binarization and variable transformation?
Assignment: Reading 23.
You should have completed Project 7.
How might you test an association rule learner with Weka?
What are some common "brands" of data mining tools?
How are most data mining toolkits meant to be used?
What challenges do big data introduce to the use of common mining toolkits?
Assignment: Finish Project 8.
You should have completed Reading 22 and be working on Project 7.
What is an anomaly?
What are some common generators of anomalies?
What are the classic approaches to anomaly detection?
What are some important issues to consider when detecting anomalies?
Assignment: Finish Project 7.
You should have completed Reading 21 and be working on Project 7.
What are some common approaches to evaluating association rules?
What metrics are often used when evaluating association rules?
How is the data domain important when assessing the "worth" of association rules?
Assignment: Reading 22.
You should have completed Reading 20 and be working on Project 7.
What is the Fk-1 X F1 approach to frequent itemset generation?
What is the Fk-1 X Fk-1 approach to frequent itemset generation?
How might you efficiently generate association rules given all frequent itemsets?
Assignment: Reading 21.
You should have completed Reading 19 and be working on Project 7.
What is a frequent itemset?
What are some brute-force approaches to generating frequent itemsets?
What is the Apriori approach to frequent itemset generation?
What is the classic itemset generation algorithm?
What are some efficient approaches to counting itemset support?
Assignment: Reading 20.
You should have completed Reading 18.
What is association analysis?
What do we mean by "asymmetric binary dataset?"
What is an itemset?
What is an itemset support count?
What is an association rule?
How do you calculate the support and confidence of a rule?
What are the challenges in generating association rules?
What are the two general steps in efficiently generating association rules?
Assignment: Reading 19.
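As a study aid for the support and confidence question above, here is a minimal Python sketch. It assumes each transaction is stored as a set of item names; the function names and the small market-basket dataset are made up for illustration and do not come from the readings.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent. Returns 0.0 when the antecedent never
    occurs (a convention chosen for this sketch)."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    ante = support_count(antecedent, transactions)
    return both / ante if ante else 0.0

# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

if __name__ == "__main__":
    print(support({"milk", "diapers", "beer"}, baskets))       # 2 of 5 baskets
    print(confidence({"milk", "diapers"}, {"beer"}, baskets))  # 2 of 3 baskets
```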
You should have completed Reading 17.
What is Bayes' Theorem?
How do Naive Bayesian Classifiers use probabilities to make predictions?
Assignment: Project 7 and Reading 18.
You should have completed Reading 16.
How do perceptrons work?
What problems can perceptrons not be trained properly to solve?
How do multiple layers allow us to solve non-linearly separable classification problems?
What is "hidden" about a hidden layer?
What is the importance of weights on input nodes?
What are some common activation functions?
What is backpropagation, in general?
Assignment: Reading 17.
You should have reviewed Reading 14.
What is a rule-based classifier? How does one work?
How can probability be used to create a classification model?
What is an ANN? How might one be used to create a classification model?
Assignment: Reading 15.
You should be finished with Project 6.
What is a Nearest-Neighbor classifier and how does it work?
What are some benefits and drawbacks of an NN classifier?
Project 6 summary & discussion.
Assignment: No homework (but be prepared on Monday to discuss Reading 14).
You should be working on Project 6.
How do we evaluate classifier performance?
What are some mechanisms for testing the validity of a classifier?
What is cross-validation?
Assignment: Reading 14 and finish Project 6.
You should be working on Project 6.
What is the general algorithm for decision tree induction? (review)
How do we calculate the "best split?"
Clustering implementation review
Assignment: Continue Project 6.
You should be working on Project 6.
What is the general algorithm for decision tree induction?
How do we determine the "best split?"
What is model overfitting? What are some common causes?
How do we evaluate classifier performance?
Assignment: Reading 13 and continue Project 6.
You should have completed Project 5.
What is decision tree induction?
How does a decision tree work?
How do we create decision trees? How are they used?
What is Hunt's algorithm?
How do attribute types influence attribute tests?
Assignment: Project 6.
You should be working on Project 5.
In what way is clustering an "unsupervised classifier"?
What do we mean by "classification?"
What are two general uses of classification models?
What is the general approach to solving a classification problem?
What is a training set? What is a test set? How are they used?
What is the general approach for measuring the quality of a classifier?
What is a confusion matrix?
Assignment: Reading 12 and continue Project 5.
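For the confusion matrix question above, a minimal Python sketch may help. It assumes the actual and predicted class labels are given as two parallel lists; the function names and the example labels are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Fraction of records whose predicted label matches the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical test-set labels.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]

if __name__ == "__main__":
    matrix = confusion_matrix(actual, predicted)
    for (a, p), count in sorted(matrix.items()):
        print(f"actual={a} predicted={p}: {count}")
    print("accuracy:", accuracy(actual, predicted))
```

The diagonal entries (actual equal to predicted) are the correct predictions, so accuracy is their sum divided by the number of test records.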
You should have finished Reading 11 and be working on Project 5.
Clustering quiz
How might you choose an appropriate Eps and MinPts?
How has the history of database systems, data warehouses and an explosion of data led to the "Rise of the Data Scientist?"
What is meant by data warehousing, ETL, and BI systems?
Assignment: Continue Project 5.
You should have finished Reading 10 and be working on Project 5.
What is density-based clustering?
What do we mean by density?
What are core, border and noise points?
What is DBSCAN? How does it work?
What are some strengths and weaknesses of DBSCAN?
Assignment: Reading 11 and continue Project 5.
You should have finished Reading 9 and Project 4.
What are the strengths and weaknesses of K-Means clustering? Why?
What are the strengths and weaknesses of hierarchical clustering? Why?
Assignment: Reading 10 and Project 5.
You should be working on Project 4.
What are some ways you might try to implement "groups"?
What is the K-Means clustering algorithm? How does it work?
What is the classic hierarchical clustering algorithm? How does it work?
What are some qualities and drawbacks of these algorithms?
You should have finished Reading 8 and be working on Project 4.
What is PNUTS?
What are some scalability issues inherent to the relational model when handling "big data"?
What is the difference between scaling up vs. scaling out?
What are some interesting requirements and features of PNUTS?
How does PNUTS compare to other databases?
Assignment: Continue Project 4.
You should be finished with Project 3.
Are there any flaws in our calculation of Euclidean distance from Friday?
What are some fundamental proximity metrics used repeatedly in data mining algorithms?
What is SMC? In what contexts is it often used? How do you calculate it?
What is Jaccard? In what contexts is it often used? How do you calculate it?
What is Cosine Similarity? In what contexts is it often used? How do you calculate it?
What is the Pearson Correlation Coefficient? How might it be used? How do you calculate it?
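The proximity metrics named above can be sketched in plain Python as a review aid. This is a minimal illustration, assuming equal-length numeric vectors (binary 0/1 vectors for SMC and Jaccard); the function names are hypothetical, and returning 0.0 from Jaccard when neither vector has a 1 is a convention chosen for this sketch.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def smc(x, y):
    """Simple Matching Coefficient: matching positions / all positions."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)

def jaccard(x, y):
    """Jaccard coefficient for binary vectors: ignores 0-0 matches."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0

def cosine(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the
    standard deviations (both vectors must vary)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

if __name__ == "__main__":
    p, q = [1, 0, 0, 1], [1, 1, 0, 0]
    print(smc(p, q), jaccard(p, q))
    print(euclidean([0, 0], [3, 4]))
```

SMC counts 0-0 matches while Jaccard does not, which is why Jaccard is usually preferred for sparse, asymmetric binary data.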
You should be working on Project 3.
Case Study: How might you implement movie recommendations using similarity?
You should have completed Reading 6 and be working on Project 3.
How does visualization fit in the data mining process?
Who is Edward Tufte?
What do we mean by similarity, dissimilarity and distance?
How might you represent similarity between objects with a single attribute?
How do attribute types affect how similarity is computed/represented?
How do you compute data object dissimilarity with Euclidean distance?
What are some other proximity metrics?
Assignment: Review DM 2.4 and continue Project 3.
You should have completed Reading 5 and Project 2.
What is Processing?
Why is it important for the data analyst to have a productive, flexible data visualization tool?
What are some examples of postmodern visualization? Why are they effective?
Assignment: Reading 6, Project 3 and Portfolio 1.
You should have completed Reading 4 and be working on Project 2.
What are some classic data visualization techniques?
What are some current trends in data visualization?
Why is data visualization an important part of the data mining process?
What are some historic and current pieces of literature from the visualization body of knowledge?
You should have completed Reading 3 and be working on Project 2.
What are some common data "formats" encountered in the Data Mining wilderness?
What are PEIR and YFD? How is data influencing the world around us?
Why is knowing your data important?
What are some common characteristics of datasets and data?
What are some data quality issues?
What should you first do once you have obtained a dataset?
What are some common summary statistics? What are some easy ways of generating them?
Weka summary stats demonstration
Assignment: Reading 4, continue Project 2 and post one cool visualization on Piazza.
You should have completed Reading 2 and Project 1.
What are two general categories of tasks in data mining?
What are some "families" of algorithms and tasks in data mining?
In general, what are some common examples of these algorithms?
What is an important, yet often overlooked, aspect in data mining?
Repository / project overview
You should have completed Reading 1 and be working on Project 1.
What are some common words in the Data Mining vocabulary?
What are the names of some strategies or algorithms used in data mining?
Given a particular context, how might you apply the ideas behind some of these algorithms?