Simple Neural Networks in Python

How-to
Python
NN
An outline for creating a simple Neural Network in Python
Author

Mark Edney

Published

March 20, 2022

Neural Networks (NN) have become incredibly popular due to their high level of accuracy. The creation of a NN can be complicated and have a high level of customization. I wanted to explore just the simplest NN that you could create. A framework as a workhorse for developing new NN.

The SciKitlearn provides the easiest solution with the Multi-Layer Perceptron series of functions. It doesn’t provide a bunch of the more advanced features of TensorFlow, like GPU support, but that is not what I’m looking for.

Initialization

For the demonstration, I decided to use a data set on faults found in steel plates from the OpenML website. The data set includes 27 features with 7 binary predictors.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

df = pd.read_csv('https://www.openml.org/data/get_csv/1592296/php9xWOpn')

predictors = ['V28', 'V29', 'V30', 'V31', 'V32', 'V33', 'Class']
df['Class'] -= 1 

Since there are multiple binary predictors, I needed to create a single class variable to represent each class. The Class variable doesn’t currently represent this, it represents all faults that don’t fit in the categories of V28 to V33. The single variable class was created with the np.argmax function which returns the index of the highest value between all the predictors.

y = np.argmax(df[predictors].values, axis =1)
X = df.drop(predictors, axis = 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

Modelling

This is the most basic model that I would like to evaluate. I’ve used the GridSearch function, so all combinations of parameters are tested. The only parameter I wanted to examine was the size of the hidden layers. Each hidden layer provided is a tuple, where each number represents the number of nodes in a singled layer. Multiple numbers represent additional layers.

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'hidden_layer_sizes':[(1),(100), (100,100), (100,100,100), 
(100,100,100,100), 
(100,100,100,100,100), 
(100,100,100,100,100,100), 
(100,100,100,100,100,100,100),
(100,100,100,100,100,100,100,100),
(100,100,100,100,100,100,100,100,100),
(100,100,100,100,100,100,100,100,100,100)]}
model = MLPClassifier(random_state = 1,max_iter = 10000, 
solver = 'adam', learning_rate = 'adaptive')

grid = GridSearchCV(estimator = model, param_grid = parameters)
grid.fit(X_train, y_train)
print(grid.best_score_)
0.4054982817869416

The performance of the best model in the grid is not impressive. It took me awhile to realize that I had forgotten to scale the features. I included this error to show the importance of scaling on model performance.

Feature Scaling

The features are simply scaled with the StandardScaler function. The same model is used on the scaled features.

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
scaler = sc.fit(X_train)
X_train_sc = scaler.transform(X_train)
X_test_sc = scaler.transform(X_test)

parameters = {'hidden_layer_sizes':[(1),(100), (100,100), (100,100,100), 
(100,100,100,100), 
(100,100,100,100,100), 
(100,100,100,100,100,100), 
(100,100,100,100,100,100,100),
(100,100,100,100,100,100,100,100),
(100,100,100,100,100,100,100,100,100),
(100,100,100,100,100,100,100,100,100,100)]}
model = MLPClassifier(random_state = 1,max_iter = 10000, 
solver = 'adam', learning_rate = 'adaptive')

grid = GridSearchCV(estimator = model, param_grid = parameters, cv=3)
grid.fit(X_train_sc, y_train)
grid.best_score_
0.7553264604810996

The performance of the scaled model is much more impressive. After the GridSearch function finds the parameters for the best model, it retrains the model on the entire dataset. This is because the function utilize cross validation, so some data was withheld for comparing the different models on test data.

Conclusion

With our model constructed, we can now test its performance on the original test set. It is important to remember to use the scaled test features, as that is what the model is expecting.

grid.score(X_test_sc, y_test)
0.7304526748971193

The results are pretty satisfactory. A decent level of accuracy without a lot of complicated code. Default values were used, whenever they were appropriate. Additional steps could be taken, but this remains a good foundation for future exploratory analysis.

Photo by Alina Grubnyak on Unsplash