Data Mining Portfolio
Artificial Neural Networks (ANN)
Definition
Artificial Neural Networks (ANNs) are a classification technique used in data mining that has its roots in the study
of biological neural systems. The human brain is made up of many interconnected neurons and "learns" by adjusting
the strength of the connections between neurons when an action is repeated. In an attempt to mimic this learning behavior
found in the brain, an ANN is composed of interconnected nodes joined by links, each of which carries a weight. By feeding
an ANN a data set of inputs paired with their desired outputs, the network can repeatedly adjust the weight of each internal
link so that the model more accurately produces the correct output for a given input. By exploiting this input/output
behavior it is possible to train an ANN on known classification data (where the inputs are the characteristics and the
output is the class of the thing being classified) and then use it to predict the classification of unclassified data.
Perceptron
A perceptron is the simplest possible classification ANN model. A perceptron consists of two types of nodes:
input nodes and output nodes. Each input node is connected to the output node by an edge with a weight attached to it.
The output of a perceptron is calculated by taking the weighted sum of the inputs (from the training data set), subtracting
a predetermined bias factor, and applying an activation function to the result. The weights are then adjusted according
to how the output compared to the true result in the training data, and the process is repeated until the performance
of the model reaches some acceptable level.
Source: http://danupranantha.wordpress.com/
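The training loop described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the portfolio: it assumes a step activation function and the classic perceptron learning rule, and the function names (predict, train_perceptron) and the choice of the AND function as training data are made up for the example.

def predict(weights, bias, inputs):
    # Weighted sum of the inputs minus the bias, passed through a step activation
    total = sum(w * x for w, x in zip(weights, inputs)) - bias
    return 1 if total > 0 else 0

def train_perceptron(data, n_inputs, rate=0.1, epochs=50):
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in data:
            # Compare the perceptron's output to the true result
            error = target - predict(weights, bias, inputs)
            # Adjust each weight in proportion to its input and the error
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias = bias - rate * error
    return weights, bias

# AND function: output is 1 only when both inputs are 1
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(data, 2)
print([predict(weights, bias, x) for x, _ in data])  # [0, 0, 0, 1]

Because AND is linearly separable, the repeated weight adjustments settle on a set of weights that classifies every training example correctly.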
Multilayer ANNs
A multilayer ANN is a more complex ANN that in principle follows many of the fundamentals laid out by the perceptron
example above. The largest difference is that whereas the perceptron has only a set of input nodes and one output node,
a multilayer ANN can have several intermediary layers between the input and the output. These intermediary layers are
called hidden layers. The nodes can be linked and interact with each other in one of two ways. In a feed-forward ANN
the nodes in one layer can only send their outputs to the nodes of the next layer. A recurrent neural network, however,
allows links from a node to other nodes in the same layer, to nodes in preceding layers, and even back to the node itself.
Training a multilayer ANN means adjusting the weights to minimize the total sum of squared errors between the network's
outputs and the true outputs, in order to find a model that more accurately classifies data.
Source: http://www.2aida.net/aida/research2.htm
# This is part of an implementation of an artificial neural network
# as guided by the Programming Collective Intelligence book by Toby Segaran.
# This method takes the network's inputs, feeds them through the network, and
# returns all the values in the final output layer. This is the first step in
# the process of making a "learning" algorithm. At this point the network has
# not changed in any way; only the output has been created.
from math import tanh

def feedforward(self):
    # The only inputs are the query words.
    # Initializes the input nodes to 1.0 (they will always remain 1.0).
    for i in range(len(self.wordids)):
        self.ai[i] = 1.0

    # Hidden activations
    # Loops over all the hidden nodes and sums the outputs from the input
    # layer multiplied by the strengths of their links. The output of each
    # node is the tanh function of the sum of all its inputs.
    for j in range(len(self.hiddenids)):
        total = 0.0
        for i in range(len(self.wordids)):
            total = total + self.ai[i] * self.wi[i][j]
        self.ah[j] = tanh(total)

    # Output activations
    # Multiplies the outputs of the hidden layer by their link strengths and
    # applies the tanh function in the same way to produce the final output.
    for k in range(len(self.urlids)):
        total = 0.0
        for j in range(len(self.hiddenids)):
            total = total + self.ah[j] * self.wo[j][k]
        self.ao[k] = tanh(total)

    return self.ao[:]
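The same feed-forward computation can be run standalone on a tiny network to see the values it produces. This sketch assumes 2 input nodes, 2 hidden nodes, and 1 output node; the weight matrices wi and wo below are made-up illustrative values, not taken from the book or the class above.

from math import tanh

ai = [1.0, 1.0]                    # input activations (set to 1.0, as above)
wi = [[0.5, -0.2], [0.3, 0.8]]     # input -> hidden link strengths
wo = [[0.6], [-0.4]]               # hidden -> output link strengths

# Hidden activations: tanh of the weighted sum of the input activations
ah = [tanh(sum(ai[i] * wi[i][j] for i in range(len(ai))))
      for j in range(len(wi[0]))]

# Output activations: tanh of the weighted sum of the hidden activations
ao = [tanh(sum(ah[j] * wo[j][k] for j in range(len(ah))))
      for k in range(len(wo[0]))]

print(ao)

Because tanh maps any sum into the range (-1, 1), every activation in the network stays bounded, which keeps the later weight-update step numerically stable.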