DistilBERT:
DistilBERT is a compact variant of the BERT (Bidirectional Encoder Representations from Transformers) model, a deep learning model based on the transformer architecture. BERT is known for its state-of-the-art performance on a variety of natural language processing (NLP) tasks, including text classification, question answering, and language understanding. DistilBERT is produced by transferring the knowledge learned by a larger BERT model into a smaller model using a process called knowledge distillation: the smaller "student" network is trained to reproduce the output distributions of the larger "teacher" network.
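The core of the distillation process described above is a soft-target loss: the student is penalized for diverging from the teacher's temperature-softened output probabilities. The snippet below is a minimal, pure-Python sketch of that idea; the function names and the temperature value are illustrative assumptions, and the actual DistilBERT training objective additionally combines this term with a masked language modeling loss and a cosine embedding loss.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature yields a softer
    # (more uniform) probability distribution over classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy between the teacher's soft targets and the student's
    # predictions, both computed at the same temperature. Minimizing this
    # pushes the student's distribution toward the teacher's.
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Illustrative usage with made-up logits for a 3-class output:
teacher = [1.0, 2.0, 3.0]
matched = distillation_loss(teacher, [1.0, 2.0, 3.0])
mismatched = distillation_loss(teacher, [3.0, 2.0, 1.0])
# The loss is smaller when the student agrees with the teacher.
```

The loss reaches its minimum when the student's distribution exactly matches the teacher's, which is what drives the smaller model to mimic the larger one.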