Data Mining Portfolio
Artificial Neural Networks (ANN)
Definition
Artificial Neural Networks (ANNs) are a classification technique used in data mining that has its roots in the study
of biological neural systems. The human brain is made up of many interconnected neurons and "learns" by adjusting
the strength of the connections between neurons when an action is repeated. In an attempt to mimic this learning behavior
found in the brain, an ANN is composed of interconnected nodes joined by links, each of which carries a weight. By feeding
an ANN a data set of inputs paired with their desired outputs, the network can repeatedly adjust the weight of each internal
link so that the model more accurately produces the correct output for a given input. By exploiting this input/output
behavior it is possible to train an ANN on known classification data (where the inputs are the characteristics and the
output is the class of the thing being classified) and then use it to predict the classification of unclassified data.
Perceptron
A perceptron is the simplest possible classification ANN model. A perceptron consists of two types of nodes:
input nodes and output nodes. Each input node is connected to the output node by an edge with a weight attached to it.
The output of a perceptron is calculated by taking the weighted sum of the inputs (from the training data set), subtracting
a predetermined bias factor, and applying an activation function to the result. The weights are then adjusted according
to how the output compared to the true result in the training data, and the process is repeated until the performance
of the model reaches some acceptable level.
Source: http://danupranantha.wordpress.com/
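The training loop described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the portfolio: it assumes a step activation function and the classic perceptron learning rule, and the function names (predict, train_perceptron) and the choice of the AND function as training data are made up for the example.

def predict(weights, bias, inputs):
    # Weighted sum of the inputs minus the bias, passed through a step activation
    total = sum(w * x for w, x in zip(weights, inputs)) - bias
    return 1 if total > 0 else 0

def train_perceptron(data, n_inputs, rate=0.1, epochs=50):
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in data:
            # Compare the perceptron's output to the true result
            error = target - predict(weights, bias, inputs)
            # Adjust each weight in proportion to its input and the error
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias = bias - rate * error
    return weights, bias

# AND function: output is 1 only when both inputs are 1
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(data, 2)
print([predict(weights, bias, x) for x, _ in data])  # [0, 0, 0, 1]

Because AND is linearly separable, the repeated weight adjustments settle on a set of weights that classifies every training example correctly.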
Multilayer ANNs
A multilayer ANN is a more complex ANN that in principle follows many of the fundamentals laid out by the perceptron
example above. The largest difference is that whereas the perceptron has only a set of input nodes and one output node,
a multilayer ANN can have several intermediary layers between the input and the output. These intermediary layers are
called hidden layers. The nodes can be linked and interact with each other in one of two ways. In a feed-forward ANN
the nodes in one layer can only send their outputs to the nodes of the next layer. A recurrent neural network, however,
allows links from a node to other nodes in the same layer, to nodes in preceding layers, and even back to the node itself.
Training a multilayer ANN means adjusting the weights to minimize the total sum of squared errors between the network's
outputs and the true outputs, in order to find a model that more accurately classifies data.
Source: http://www.2aida.net/aida/research2.htm
# This is part of an implementation of an artificial neural network
# as guided by the Programming Collective Intelligence book by Toby Segaran.
# This method takes the network's inputs, feeds them through the network, and
# returns all the values in the final output layer. This is the first step in
# the process of making a "learning" algorithm. At this point the network has
# not changed in any way; only the output has been created.
from math import tanh

def feedforward(self):
    # The only inputs are the query words.
    # Initializes the input nodes to 1.0 (they will always remain 1.0).
    for i in range(len(self.wordids)):
        self.ai[i] = 1.0

    # Hidden activations
    # Loops over all the hidden nodes and sums the outputs from the input
    # layer multiplied by the strengths of their links. The output of each
    # node is the tanh function of the sum of all its inputs.
    for j in range(len(self.hiddenids)):
        total = 0.0
        for i in range(len(self.wordids)):
            total = total + self.ai[i] * self.wi[i][j]
        self.ah[j] = tanh(total)

    # Output activations
    # Multiplies the outputs of the hidden layer by their link strengths and
    # applies the tanh function in the same way to produce the final output.
    for k in range(len(self.urlids)):
        total = 0.0
        for j in range(len(self.hiddenids)):
            total = total + self.ah[j] * self.wo[j][k]
        self.ao[k] = tanh(total)

    return self.ao[:]
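The same feed-forward computation can be run standalone on a tiny network to see the values it produces. This sketch assumes 2 input nodes, 2 hidden nodes, and 1 output node; the weight matrices wi and wo below are made-up illustrative values, not taken from the book or the class above.

from math import tanh

ai = [1.0, 1.0]                    # input activations (set to 1.0, as above)
wi = [[0.5, -0.2], [0.3, 0.8]]     # input -> hidden link strengths
wo = [[0.6], [-0.4]]               # hidden -> output link strengths

# Hidden activations: tanh of the weighted sum of the input activations
ah = [tanh(sum(ai[i] * wi[i][j] for i in range(len(ai))))
      for j in range(len(wi[0]))]

# Output activations: tanh of the weighted sum of the hidden activations
ao = [tanh(sum(ah[j] * wo[j][k] for j in range(len(ah))))
      for k in range(len(wo[0]))]

print(ao)

Because tanh maps any sum into the range (-1, 1), every activation in the network stays bounded, which keeps the later weight-update step numerically stable.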