For a long time, programmers struggled with deep learning because gradients tend to vanish or explode as they are propagated back through many layers. On top of that, there was little theory around deep networks at the time, which made it hard to understand why training behaved the way it did.
Fortunately, a plethora of materials are now available for our use.
In this post, we will look specifically at what greedy layer-wise pretraining of deep networks is and how it works, so that you can run it on your own.
We'll also touch on a related topic: how bridging in computer networks works.
Let’s get started without further ado.
What is Greedy Layer-wise Pretraining?
We all know that the amount of error information propagated back to earlier layers shrinks considerably as the number of hidden layers grows. This means that the weights of the hidden layers near the output layer are updated normally, while the ones close to the input layer barely get updated at all. This is known as the vanishing gradient problem, and it is the main reason training deep networks used to be avoided.
To make training deep neural networks practical, greedy layer-wise pretraining, often simply called "pretraining", came into the picture. Pretraining involves adding a new hidden layer to an already trained network and refitting the model, so that the newly added layer learns from the outputs of the existing hidden layer.
Moreover, while this happens, the weights of the old hidden layers are kept fixed. Because the model is trained one layer at a time, the process is called "layer-wise."
It earned the term "greedy" because the method breaks the overall problem into layer-sized pieces and solves each one on its own, like a shortcut taken one step at a time instead of optimizing the whole network at once.
Put together, that is how the method got its name: greedy layer-wise pretraining.
Types of Greedy Layer-wise Pretraining
To better understand the concept behind pretraining, it helps to know its two types:
- Supervised greedy layer-wise pretraining: Under this type, the hidden layers get added to a supervised model.
- Unsupervised greedy layer-wise pretraining: In this type, the hidden layers are added and trained as part of an unsupervised model; only the output layer added at the end is supervised.
Advantages of Pretraining
Some of the benefits of greedy layer-wise pretraining are:
- Training becomes easier
- Training deep, multi-layered networks becomes feasible in the first place
- The weights it produces serve as a useful initialization for further training
- Generalization error is often reduced
Note: Before we move on to the methods, it is worth taking a quick look at bridging in computer networks, another layer-oriented concept that programmers often run into.
What is Bridging in Computer Networks?
A bridge is a connecting device that joins several LANs (Local Area Networks) into one larger LAN. The mechanism is called bridging because the device effectively lays a bridge between the networks so they can reach each other. A bridge operates at the data link layer of the OSI model. Note that bridging, too, is defined in terms of "layers."
In the context of greedy layer-wise pretraining, bridging is worth knowing mainly as an analogy: both techniques build a larger whole by joining smaller pieces together, one layer at a time.
Greedy Layer-Wise Pretraining of Deep Networks: Key Aspects
As we know, greedy layer-wise pretraining is the algorithm of choice here because it makes developing deep, multi-layered neural networks much easier. There are two main aspects to walk through when pretraining a deep network this way.
Let us look at them briefly with examples.
Multi-Class Classification Problem
Multi-class classification means training a classifier on labelled training data and then using it to assign new examples to one of several (more than two) classes. To understand this better, let us take an example.
To create a multi-class classification problem, we can use the make_blobs() function provided by scikit-learn. make_blobs() generates a dataset with the desired number of samples, input variables, classes, and amount of variance within each class.
The problem will use two input variables, representing the x and y coordinates of points drawn with a standard deviation of 2.0 around their class centers. To get the same data points on every run, we will fix the random state. The input and output values of the resulting dataset are then what we use to model the network.
This makes the problem a useful testbed: it is simple, yet a neural network model can find many possible solutions to it, which is exactly what we want when experimenting with pretraining.
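To make this concrete, here is a minimal sketch of that dataset-creation step. The article only fixes two input variables and a standard deviation of 2.0; the sample count, the four classes (matching the four-node output layer used later), and the random_state value are assumptions chosen for illustration.

```python
# A minimal sketch of the dataset-creation step described above.
from sklearn.datasets import make_blobs

# 1,000 points, 2 input variables (x and y coordinates), 4 classes,
# standard deviation of 2.0 within each cluster, and a fixed random
# state so the same points are generated on every run.
# (Sample count, class count, and random_state are assumptions.)
X, y = make_blobs(n_samples=1000, centers=4, n_features=2,
                  cluster_std=2.0, random_state=1)

print(X.shape, y.shape)  # (1000, 2) (1000,)
```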
Supervised Greedy Layer-Wise Pretraining
In this aspect, we first create the dataset and then build a deep multilayer perceptron (MLP) model, using greedy layer-wise supervised learning to construct it. Note that pretraining is not strictly required to solve this simple problem; the MLP is built this way to demonstrate greedy layer-wise pretraining and to keep the results for later comparison.
With the dataset in place, we will train a model using the Keras Sequential API. Before that, the class labels must be one-hot encoded, which we can do with the to_categorical() function from TensorFlow's Keras utilities.
Next, we split the dataset into training and test portions (90% for training and 10% for testing, a 9:1 ratio) using the train_test_split() function from scikit-learn, which divides the data according to the chosen percentages.
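A small sketch of these two preparation steps, continuing from the make_blobs() snippet above, might look like this (the random_state passed to the split is an assumption):

```python
# One-hot encode the class labels and split the data 90/10, as described above.
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

y_onehot = to_categorical(y)  # e.g. class 2 of 4 -> [0, 0, 1, 0]

# 90% of the examples for training, 10% held back for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.10, random_state=1)
```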
Once this is done, we can build a base model and train it. The model will be an MLP with two inputs for the dataset's two input variables, one hidden layer of 10 nodes with the ReLU activation function, and an output layer of 4 nodes, one per class, using the softmax activation function to predict each class's probability. We then print the model's summary, write a function to evaluate it, and run the evaluation.
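Putting that description together, a minimal sketch of the base model could look like the following; the optimizer, the number of training epochs, and the exact evaluation routine are assumptions, since the article does not specify them.

```python
# A sketch of the base MLP described above: 2 inputs, one hidden layer
# of 10 ReLU nodes, and a 4-node softmax output layer.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(2,)))  # hidden layer
model.add(Dense(4, activation='softmax'))                  # one node per class
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.summary()

def evaluate_model(model, X_train, y_train, X_test, y_test):
    """Fit the model on the training split and report accuracy on both splits."""
    model.fit(X_train, y_train, epochs=100, verbose=0)  # epoch count is an assumption
    _, train_acc = model.evaluate(X_train, y_train, verbose=0)
    _, test_acc = model.evaluate(X_test, y_test, verbose=0)
    return train_acc, test_acc

train_acc, test_acc = evaluate_model(model, X_train, y_train, X_test, y_test)
print('train: %.3f, test: %.3f' % (train_acc, test_acc))
```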
Finally, with the base MLP in hand, we can apply greedy pretraining. To do so, we write a function that adds a new hidden layer, keeps the weights of the existing hidden layers fixed, and retrains the output layer together with the newly added hidden layer. Once a layer has been added and trained, we can repeat the process to add as many layers as needed.
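One way to sketch that layer-adding step, continuing from the snippets above, is shown below. The add_layer() helper and the number of extra layers are illustrative choices, not a fixed recipe: the essential moves are to freeze the existing hidden layers, pop the output layer off, insert a new hidden layer, re-attach the output layer, and retrain.

```python
# A hedged sketch of supervised greedy layer-wise pretraining:
# repeatedly add a hidden layer while keeping earlier layers fixed.
# add_layer() is an illustrative helper written for this post.
def add_layer(model, X_train, y_train):
    # Remember the existing output layer, then remove it from the model.
    output_layer = model.layers[-1]
    model.pop()
    # Freeze every remaining (already trained) hidden layer.
    for layer in model.layers:
        layer.trainable = False
    # Insert a new hidden layer and re-attach the output layer.
    model.add(Dense(10, activation='relu'))
    model.add(output_layer)
    # Only the new hidden layer and the output layer are updated here.
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=100, verbose=0)

# Greedily add a few extra layers and check accuracy after each one
# (the number of added layers is an assumption).
for i in range(3):
    add_layer(model, X_train, y_train)
    _, test_acc = model.evaluate(X_test, y_test, verbose=0)
    print('layers added: %d, test accuracy: %.3f' % (i + 1, test_acc))
```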
Conclusion
We hope this post has added to your programming knowledge: we covered greedy layer-wise pretraining of deep networks by looking at its aspects, types, mechanism, and advantages. Greedy approaches are versatile, and they also turn up when solving advanced DSA problems.