In the era of Artificial Intelligence, researchers are striving to build intelligent machines that replicate human decision making. With the considerable increase in computational power, deep learning networks have made huge progress. One of the biggest criticisms of these models, however, is that, unlike humans, they require a huge number of ground truth samples, i.e., data labeled with the correct answers, for the learning process.
One-shot learning is the paradigm that addresses this problem. In One-shot learning, we provide the model with a single reference point (training sample), and it learns to re-identify that instance in the testing data. One of the most common uses of One-shot learning is in facial recognition systems.
Table of Contents
- How a Classification Problem is Solved using Deep Learning?
- One-Shot Classification
- Siamese Neural Network for One-Shot Learning
- How can we Identify Forged Signatures using Siamese Network?
- Details of the Training Process of Siamese Network
- Triplet Loss
How a Classification Problem is Solved using Deep Learning?
To build a facial recognition application with the standard classification process used in deep learning, we would take the following steps:
- We will collect facial images of different people whom we want to identify.
- We will build a Convolutional Neural Network (CNN, the architecture most widely used for images), where the inputs are the collected images and the outputs are probabilities for the different people/classes/categories.
- Say we collected facial images of 10 different people; then the output will be a vector of 10 probabilities, one for each person.
- The person with the highest probability is selected as the output.
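The last two steps can be sketched in a few lines of Python. This is only an illustration: the raw scores (logits) and the labels person_0 to person_9 are made-up placeholders standing in for the output of a trained CNN, and softmax is assumed as the usual way to turn scores into probabilities.

```python
import math

def softmax(logits):
    """Convert raw network scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from the final layer, one per person (10 people).
logits = [0.2, 1.5, -0.3, 0.0, 3.1, 0.7, -1.2, 0.4, 2.0, 0.1]
people = [f"person_{i}" for i in range(10)]  # placeholder class labels

probs = softmax(logits)
predicted = people[probs.index(max(probs))]  # pick the highest probability
print(predicted)  # person_4, because it has the largest score
```

Note that the output layer has exactly 10 entries, so recognizing an 11th person would mean changing the network and training it again.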
Building an accurate classifier with deep learning comes with two demanding requirements:
- A large number of training samples: Many training samples are required for each category/class so that the model can learn generalized features and perform well on unseen data.
- Cost of re-training: If we want to classify an image into a category the model was not trained on, we have to retrain the model after adding enough samples of the new category. Thus, the cost of data collection and periodic re-training is high.
Now say you are building a facial recognition system for a company with 100 employees. The system must match an employee’s image against the image stored in the company’s database in real time.
A classic deep learning network can help us build the facial recognizer, but there are some concerns to consider. What if a new person joins the company or somebody leaves? The whole system has to be retrained with every addition or removal. Also, a large number of images would be required for each employee.
One-Shot Classification
The above problems can be solved with One-shot classification, requiring only one training sample for each category. Yes, you heard that right, just one example!
In One-shot classification, the model does not learn to classify the sample directly; instead, it learns how similar a new sample is to its reference. Therefore, we train a network to learn a similarity function that takes two samples as input and outputs how similar they are.
This approach removes the need to retrain the model or to collect many samples for applications like facial recognition. Only one sample per person is required as a reference (stored in the database), and the network measures how similar the real-time data is to it. This is why the approach is called One-shot: a single reference sample per category.
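To make the idea concrete, here is a minimal sketch of matching against one stored reference per person. It assumes that a trained network has already turned each image into a feature vector (an embedding); the names, the 3-d vectors and the 0.9 threshold are all made up for illustration, and cosine similarity is used as the similarity function.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One reference embedding per employee (hypothetical values).
reference_db = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}

def identify(embedding, db, threshold=0.9):
    """Return the best-matching name, or None if nothing is similar enough."""
    best_name, best_score = None, -1.0
    for name, reference in db.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# A new employee joins: we store one reference, with no retraining.
reference_db["carol"] = [0.2, 0.2, 0.95]

query = [0.85, 0.15, 0.35]  # embedding of a live camera image (made up)
print(identify(query, reference_db))  # alice
```

Removing an employee is just as cheap: delete the entry from the database.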
Siamese Network for One-Shot Learning
One of the architectures used for One-shot learning is the Siamese Neural Network (SNN).
An SNN is made up of two identical neural networks that are joined into a single network. In other words, it contains multiple instances of the same model, which share the same architecture and weights.
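The weight sharing can be illustrated with a toy example. In the sketch below, a single linear layer with tanh stands in for the real encoder (a CNN in practice), and the weight values are arbitrary placeholders. The point is only that both inputs pass through the very same function with the very same weights.

```python
import math

# One set of weights, used by both branches (arbitrary placeholder values).
SHARED_WEIGHTS = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
]

def encode(x, weights=SHARED_WEIGHTS):
    """Toy encoder: one linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def siamese_forward(x1, x2):
    """Run both inputs through the same encoder and return both embeddings."""
    return encode(x1), encode(x2)

emb_a, emb_b = siamese_forward([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(emb_a == emb_b)  # True: identical inputs give identical embeddings
```

Because there is only one set of weights, any update made during training changes both branches at once.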
Now let us discuss one more application where we can apply One-shot learning. Consider signature verification for the detection of fraudulent cheques in the banking sector. Signature verification requires detailed analysis, and it is sometimes difficult even for humans to tell a genuine signature from a forged one.
How can we Identify Forged Signatures using Siamese Networks?
Signatures continue to play an important role in financial, commercial and legal transactions, such as account openings, withdrawals and payments. They also play a vital role in contractual matters.
On the other hand, the threats and monetary losses continue to rise dramatically; in particular, cheque fraud has reached epidemic scale. Any process that requires a signature is a prime contender for signature identification. Individuals are less likely to object to their signature being verified than to other possible authentication systems.
Details of the Training Process of Siamese Network
1) We collect signature images, both genuine and forged. Please see the image below, where the original signatures are on the left and the forged ones are on the right.
2) Next, we divide the dataset into two batches: signature S1𝑖 (signature 𝑖 in the first batch) is a duplicate of S2𝑖 (signature 𝑖 in the second batch), so the two form a matching pair, but no other signature in the second batch is a duplicate of S1𝑖.
3) Next, we feed both batches to a Convolutional Neural Network (CNN) to obtain a 1-d feature vector for each image, and then we calculate the cosine similarity between the vectors from the two batches.
4) Now, say that we have 4 images in Batch 1 and 4 images in Batch 2. The network described in step 3 then produces a 4 × 4 matrix of cosine similarities, similar to the example below, in which entry (𝑖, 𝑗) compares S1𝑖 with S2𝑗.
5) The above steps cover forward propagation. Now, what about the loss function? How do we update the weights of our neural network?
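Before turning to the loss, here is a small sketch of the forward pass from steps 3 and 4. The 2-d vectors below are made-up stand-ins for the CNN feature vectors of 4 images per batch; the function name is our own. Normalizing each vector to unit length first means that a plain dot product gives the cosine similarity.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(batch1, batch2):
    """Entry [i][j] is the cosine similarity of batch1[i] and batch2[j]."""
    b1 = [l2_normalize(v) for v in batch1]
    b2 = [l2_normalize(v) for v in batch2]
    return [[sum(a * b for a, b in zip(u, v)) for v in b2] for u in b1]

# Hypothetical feature vectors; batch2[i] is the duplicate of batch1[i].
batch1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
batch2 = [[0.9, 0.1], [0.1, 0.9], [0.8, 1.0], [-1.0, 0.8]]

sim = similarity_matrix(batch1, batch2)
for row in sim:
    print([round(s, 2) for s in row])
```

With these values, the largest entry of every row sits on the diagonal, which is what we want the trained network to produce for matching pairs.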
Triplet Loss
We use Triplet Loss for training our Siamese Network.
The aim of the network is to differentiate between similar and dissimilar signatures.
Therefore, the cosine similarity between similar samples should be much higher than that between dissimilar ones. The triplet loss function would then be defined as follows:
diff = s(A, N) - s(A, P)
where s is the cosine similarity, A is the anchor (our actual point of interest), N is the negative/dissimilar sample and P is the positive/similar sample. The smaller (more negative) the difference, the better our network performs.
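As a quick sketch, the difference can be computed directly from the two similarities. The second function goes one step beyond the formula above: many implementations add a margin and clip the result at zero, so that triplets which are already well separated contribute nothing. That margin, its value of 0.25 and the clipping are assumptions on our part, not something this article specifies.

```python
def triplet_diff(sim_ap, sim_an):
    """diff = s(A, N) - s(A, P), exactly as in the formula above."""
    return sim_an - sim_ap

def triplet_loss(sim_ap, sim_an, margin=0.25):
    """Hinge variant (assumed): zero once s(A, P) beats s(A, N) by `margin`."""
    return max(triplet_diff(sim_ap, sim_an) + margin, 0.0)

print(triplet_loss(sim_ap=0.9, sim_an=0.2))  # well separated, so the loss is zero
print(triplet_loss(sim_ap=0.6, sim_an=0.5))  # too close, so the loss is positive
```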
Now, how do we select these triplets (A, P, N)?
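One way is to select positive and negative samples at random from the two batches. But such triplets are usually too easy: the negative is already far less similar to the anchor than the positive, so the network learns little from them. To avoid this easy learning, we select hard triplets, in which the negative is nearly as similar to the anchor as the positive is.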
We can select hard negative samples using the following two approaches: mean negative and closest negative.
In the matrix of cosine similarities between the two batches, the diagonal elements are the positive samples and the off-diagonal elements are the negatives. The total loss can be defined as follows:
Loss = (mean_neg - s(A, P)) + (closest_neg - s(A, P))
Using the above loss function, we can backpropagate and update the weights of our CNN.
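Here is a minimal sketch of this loss computed from a similarity matrix. It rests on a few assumptions that the text above does not spell out: mean_neg is taken to be the average of the off-diagonal entries in an anchor's row, closest_neg is taken to be the largest off-diagonal entry in that row (the negative most similar to the anchor), and the per-anchor losses are averaged over the batch. The 3 × 3 matrix is made up.

```python
def hard_negative_loss(sim):
    """Average over anchors of (mean_neg - s(A, P)) + (closest_neg - s(A, P))."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        positive = sim[i][i]  # diagonal entry: s(A, P)
        negatives = [sim[i][j] for j in range(n) if j != i]
        mean_neg = sum(negatives) / len(negatives)  # assumed definition
        closest_neg = max(negatives)                # assumed definition
        total += (mean_neg - positive) + (closest_neg - positive)
    return total / n

# Hypothetical similarity matrix: row i is anchor i, column j is candidate j.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.4],
    [0.0, 0.6, 0.7],
]
print(hard_negative_loss(sim))
```

In this example the third row contributes the most to the loss, because its closest negative (0.6) is almost as similar to the anchor as the positive (0.7). In a real training setup, this computation would live inside a framework with automatic differentiation so that the gradients can flow back to the CNN weights.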
In addition to applications like face recognition systems and signature authentication, Siamese networks are also used for NLP applications such as comparing the meaning of word sequences. They can also identify duplicate questions, which is a very important NLP application at the core of platforms like Stack Overflow or Quora.
To conclude, One-shot learning makes it easier to create training datasets and avoids re-training the network whenever a category is added or removed. There are, however, some limitations. In particular, during the training process, the network still requires a varied set of samples to generalize well.