Inventory products: classification with Neural Networks and Logistic Regression

Introduction

This project aims to develop a classification system for an organization's product inventory. The data consist of a CSV file with 198,917 articles and 13 attributes, as shown in figure 1.
Figure 1.
Most attributes can be understood from their names; the following are explained for clarity:
  • SKU_number: Unique identifier of the product.
  • Order: Order Order
  • SoldFlag: Boolean attribute indicating whether the product has been sold during the previous 6 months (1 = True, 0 = False).
  • MarketingType: Type of segment to which the product is directed (D or S).
  • New_Release_Flag: Products that might have a new release (if New_Release_Flag >= 1).
The attribute 'StrengthFactor' does not have a clear meaning for the organization, so it is omitted from this study. The attributes Order, SoldCount, ReleaseYear and FileType are also omitted.
Removing these attributes from the initial data yields the table shown in figure 2:

Figure 2.
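The cleaning step described above can be sketched with pandas. This is only an illustration: the column names are the ones mentioned in the text, while the toy rows and the variable names (raw, omitted, cleaned) are assumptions and not part of the original project.

```python
import pandas as pd

# Toy rows: column names follow the text, all values are made up for illustration.
raw = pd.DataFrame({
    "Order": [1, 2, 3],
    "FileType": ["a", "a", "b"],
    "SKU_number": [1001, 1002, 1003],
    "SoldFlag": [1.0, 0.0, None],
    "SoldCount": [2.0, 0.0, None],
    "MarketingType": ["D", "S", "D"],
    "ReleaseYear": [2010, 2012, 2015],
    "StrengthFactor": [0.5, 1.2, 0.8],
    "New_Release_Flag": [0, 1, 1],
})

# Attributes the study leaves out.
omitted = ["StrengthFactor", "Order", "SoldCount", "ReleaseYear", "FileType"]

# drop() returns a new DataFrame; the raw data stay untouched.
cleaned = raw.drop(columns=omitted)
print(cleaned.columns.tolist())
```

Keeping the raw frame unchanged makes it easy to bring an omitted attribute back later if it turns out to be useful.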
The objective of this system is to predict the value of the SoldFlag attribute for each item. A predicted value of 1 (True) means the item stays in the inventory, while a value of 0 (False) means the item is discarded.
There are 122,921 items to be predicted and 75,996 historical items available for training. The data used for the construction and training of the systems are 26,127, of which 2,613 are used to validate the accuracy of the classification systems. Class D contains 97,971 items in total, of which 35,119 are historical and 62,852 are to be predicted. Class S contains 100,946 items in total, of which 40,877 are historical and 60,069 are to be predicted.
Two classifiers are then developed: the first is an Artificial Neural Network and the second is a Logistic Regression.

Development and validation of the systems

Artificial Neural Network

For each type of item, classified as D or S (see attribute "MarketingType"), a classifier is trained and its accuracy is evaluated.
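One possible way to separate the items by class, and the historical items from the ones to be predicted, is sketched below. The toy data and the names items and splits are hypothetical. In particular, treating a missing "SoldFlag" as the marker of an item to be predicted is an assumption, since the text does not say how the two groups are told apart.

```python
import pandas as pd

# Toy data, made up for illustration.
items = pd.DataFrame({
    "SKU_number": [1, 2, 3, 4, 5],
    "MarketingType": ["D", "S", "D", "S", "D"],
    "SoldFlag": [1.0, 0.0, None, None, 0.0],
})

splits = {}
for marketing_type in ["D", "S"]:
    subset = items[items["MarketingType"] == marketing_type]
    # Assumption: a missing SoldFlag marks an item whose value must be predicted.
    splits[marketing_type] = {
        "historical": subset[subset["SoldFlag"].notna()],
        "to_predict": subset[subset["SoldFlag"].isna()],
    }

for marketing_type, parts in splits.items():
    print(marketing_type, len(parts["historical"]), len(parts["to_predict"]))
```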

We first import the libraries needed to build an MLP (multilayer perceptron, a type of Artificial Neural Network) classifier.
# -*- coding: utf-8 -*-
'''
Python 3.5.2
'''
import pandas as pd
from sklearn.neural_network import MLPClassifier
In this second step we create the classifier, a neural network with 6 neurons in a hidden layer. The training data are 90% of the historical data, for which the value of the attribute "SoldFlag" is known; the remaining 10% are used to evaluate the accuracy of the classifier. This is done for both class D and class S items.
The 90% used for training is taken from the following data:

  • X_train_D: the historical data of class D.
  • Y_train_D: the value of the attribute "SoldFlag" for the data in X_train_D.

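# Build and train the Artificial Neural Network classifier
# Note: hidden_layer_sizes=(6, 1) defines two hidden layers (6 neurons, then 1);
# a single hidden layer of 6 neurons would be written hidden_layer_sizes=(6,).
clf_D = MLPClassifier(activation="tanh", solver="lbfgs", hidden_layer_sizes=(6, 1))
# Train on the first 90% of the historical class D data
clf_D.fit(X_train_D[:31608], Y_train_D[:31608])
# Predict on the remaining 10%, held out for validation
prediction_D_train = clf_D.predict(X_train_D[31608:])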
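How scikit-learn interprets hidden_layer_sizes can be checked on synthetic data. The sketch below is not the project's data: the dataset from make_classification and the names one_hidden and two_hidden are assumptions for illustration. It compares the tuple (6,) with the tuple (6, 1) used in the code above.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data, unrelated to the inventory data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # the toy fit may stop before converging
    one_hidden = MLPClassifier(activation="tanh", solver="lbfgs",
                               hidden_layer_sizes=(6,), random_state=0).fit(X, y)
    two_hidden = MLPClassifier(activation="tanh", solver="lbfgs",
                               hidden_layer_sizes=(6, 1), random_state=0).fit(X, y)

# n_layers_ counts the input and output layers as well as the hidden ones.
print(one_hidden.n_layers_, [w.shape for w in one_hidden.coefs_])
print(two_hidden.n_layers_, [w.shape for w in two_hidden.coefs_])
```

Each entry of the tuple is one hidden layer, so (6, 1) adds a second hidden layer with a single neuron after the layer of 6.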
The accuracy of this classifier is computed as follows:
# Accuracy computation
# clf_D.predict returns a NumPy array, so it is compared position by position
# with the held-out "SoldFlag" values (the last 10% of the historical data).
y_valid_D = Y_train_D["SoldFlag"].iloc[31608:].to_numpy()
accuracy = (y_valid_D == prediction_D_train).mean() * 100
print("MLP accuracy: ", accuracy, " %")
# Accuracy level
MLP accuracy: 72.03076046710339 %
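As a cross-check, the split index and the accuracy calculation can be sketched in isolation. The index 31608 is consistent with rounding 90% of the 35,119 historical class D items up, although the text does not state how it was obtained. The label arrays below are made up, and accuracy_score from scikit-learn is used here as an alternative to the manual count.

```python
import math

import numpy as np
from sklearn.metrics import accuracy_score

# 90% of the historical class D items, rounded up (assumed rounding rule).
n_historical_D = 35119
split = math.ceil(0.9 * n_historical_D)
print(split)

# Made-up labels standing in for the held-out SoldFlag values and predictions.
y_true = np.array([0, 0, 1, 0, 1])
y_pred = np.array([0, 0, 0, 0, 1])

# Fraction of matching positions, expressed as a percentage.
acc = accuracy_score(y_true, y_pred) * 100
print("MLP accuracy: ", acc, " %")
```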
Now that we know the accuracy of the classifier, we apply it to the target data described below:

  • X_test_D: the class D items without a value for the attribute "SoldFlag", a total of 62,852 items to classify.

# Prediction for the non-historical items
prediction_D_test = clf_D.predict(X_test_D)
The classifier, whose validation accuracy is about 72%, assigns a value of 0 (False) in the target attribute to all 62,852 items. This implies that these items should be removed from the inventory. Figure 3 shows the final table for class D:
Figure 3.
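When a classifier assigns the same value to every item, it can help to compare its accuracy with the share of that value in the validation labels. The sketch below uses made-up labels, not the project's data, to show that a classifier that always predicts 0 scores exactly the proportion of zeros. The original analysis does not report this comparison; it is included only as a suggested check.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up validation labels: 7 items not sold (0) and 3 sold (1).
y_true = np.array([0] * 7 + [1] * 3)

# A classifier that always predicts 0.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
share_of_zeros = float(np.mean(y_true == 0))

# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(acc, share_of_zeros)
print(cm)
```

If the accuracy on real validation data is close to the share of zeros, the confusion matrix shows whether any sold item is being recognized at all.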
The steps above are repeated for the class S products with the following data:

  • X_train_S: the historical data of class S.
  • Y_train_S: the value of the attribute "SoldFlag" for the data in X_train_S.
  • X_test_S: the class S items without a value for the attribute "SoldFlag", a total of 60,069 items.

For class S, the artificial neural network with 6 neurons and a hidden layer is built as follows:
# Build and train the Artificial Neural Network classifier
clf_S = MLPClassifier(activation="tanh", solver="lbfgs", hidden_layer_sizes=(6, 1))
# Train on the first 90% of the historical class S data
clf_S.fit(X_train_S[:36790], Y_train_S[:36790])
This gives an accuracy of 90.70% on the validation data. With this classifier, only 4 inventory references would be kept. Figure 4 shows the references predicted as 1 in the attribute "SoldFlag".
Figure 4.
Overall, this Artificial Neural Network shows an average accuracy of about 81% across both classes, and it indicates that only 4 references, all of class S, should be kept.

Logistic Regression

As with the previous classifier, we first import the required library:
# -*- coding: utf-8 -*-
'''
Python 3.5.2
'''
from sklearn.linear_model import LogisticRegression
We use the same data for training and for evaluating the accuracy, and we again separate the class D items from the class S items.
# Variable holding the Logistic Regression (LR) classifier
logistic_R = LogisticRegression()
For class D we train the classifier like this:
# Training of the LR classifier
logistic_D_train = logistic_R.fit(X_train_D[:31608], Y_train_D[:31608])

This gives an accuracy of 72.97%. Finally, the prediction for the data "X_test_D" assigns a value of 1 to 1,869 items, as shown in figure 5:
Figure 5.
For class S the same process is performed: the accuracy is 90.55% and 168 items are predicted with value 1 (True). See figure 6:

Figure 6.
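The Logistic Regression workflow described above (train on the first 90%, measure accuracy on the remaining 10%, then count the items predicted as 1) can be sketched end to end on synthetic data. The dataset from make_classification, the max_iter value, and the variable names are assumptions for illustration and do not reproduce the project's figures.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, unrelated to the inventory data.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# First 90% for training, last 10% held out for validation.
split = 450
clf = LogisticRegression(max_iter=1000)
clf.fit(X[:split], y[:split])

# score() returns the fraction of correct predictions on the held-out part.
val_accuracy = clf.score(X[split:], y[split:]) * 100

# Count how many held-out items are predicted as 1.
predictions = clf.predict(X[split:])
n_positive = int((predictions == 1).sum())
print("LR accuracy: ", val_accuracy, " %")
print("Items predicted as 1: ", n_positive)
```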

Conclusions

  • For class D items, the two classifiers show an average accuracy of 72.5%.
  • For class S items, the two classifiers show an average accuracy of about 90.6%.
  • For class D, the useful data (used for training and validation) represent 17.6% of the total data, while for class S this share is 20.5%. This would explain the difference of roughly 18 percentage points in accuracy between class S and class D that appears in both classifiers.
  • Likewise, the ratio between the amount of training data and the amount of data to predict is 0.56 for class D and 0.68 for class S. In this case, a ratio closer to 1 is associated with better performance of the classifiers.
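The shares and ratios quoted in the conclusions can be recomputed from the item counts given earlier in the text. The dictionary names below are hypothetical; the class S count of items to predict is the one given in the description of X_test_S.

```python
# Counts quoted in the text.
total_items = 198_917
historical = {"D": 35_119, "S": 40_877}
to_predict = {"D": 62_852, "S": 60_069}

# Share of the whole data set that is historical (usable) data, per class, in %.
share_of_total = {k: 100 * v / total_items for k, v in historical.items()}

# Ratio of historical items to items that must be predicted, per class.
train_to_predict = {k: historical[k] / to_predict[k] for k in historical}

for k in ("D", "S"):
    print(k, share_of_total[k], train_to_predict[k])
```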

Mauricio Muñoz
Hi, I'm Mauricio, an industrial engineer. I like mathematics, efficient production, and computational intelligence. I am passionate about improving production processes, the Japanese language, and the development of clean energy.