Enhancing Knowledge Distillation for Text Summarization

Kotit, Mohammad Basheer

Advisor	Shaban, Khaled
Author	Kotit, Mohammad Basheer
Date available	2024-02-04T07:03:35Z
Date issued	2024-01
URI	http://hdl.handle.net/10576/51500
Abstract	In natural language processing, recent progress has been shaped largely by large pretrained Seq2Seq Transformer models, including BART, PEGASUS, and T5. These models have improved the accuracy and fluency of text generation applications such as machine translation, text summarization, and chatbot development. However, deploying them for text summarization is difficult in environments with limited computational resources. This research proposes compact student models that emulate the capabilities of their larger pretrained counterparts (teacher models) with lower computational demands and faster processing, retaining high performance at greater efficiency. Knowledge distillation, a popular model optimization technique, typically takes one of two forms: direct knowledge distillation and the use of pseudo-labels. Our research enhances direct knowledge distillation by introducing an effective behavior function. This function selectively emphasizes the more certain predictions from the teacher model, thereby addressing the exposure bias issue that arises from differences between training and testing environments. In addition, we propose a novel approach to select the most reliable predictions from the teacher model. These high-confidence predictions are then used as pseudo-summaries, optimizing the student model’s training through the pseudo-label technique. This dual approach focuses mainly on the confidence of teacher predictions and offers a comprehensive way to improve the model’s performance while maintaining computational efficiency. We evaluated our methods using BART on the CNN/DM dataset and PEGASUS on the XSum dataset. The findings of these assessments revealed that our approaches not only achieved the knowledge distillation objectives, but also significantly surpassed the performance of the teacher models.
Language	en
Subject	Knowledge Distillation; Natural Language Processing
Title	Enhancing Knowledge Distillation for Text Summarization
Type	Master Thesis
Discipline	Computer Science & Engineering
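
The abstract names two confidence-based ideas without giving their formulas: a behavior function that emphasizes the teacher's more certain predictions, and a filter that keeps only the teacher's most reliable outputs as pseudo-summaries. The sketch below is one possible reading, not the thesis's method. It assumes that "certainty" at a token position is the teacher's maximum probability and that a summary's reliability is its mean token log-probability. The function names, the weighting scheme, and the threshold are all hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-probabilities from raw scores."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum for x in scaled]


def confidence_weighted_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Average KL(teacher || student) over token positions.

    Assumption: each position is weighted by the teacher's maximum
    probability, so confident teacher predictions dominate the loss.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row, temperature)
        log_q = log_softmax(s_row, temperature)
        kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
        weight = math.exp(max(log_p))  # teacher confidence at this position
        weighted_sum += weight * kl
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0


def select_pseudo_summaries(candidates, threshold):
    """Keep teacher summaries whose mean token log-probability >= threshold.

    `candidates` is a list of (summary_text, token_logprobs) pairs; the
    scoring rule is an illustrative assumption.
    """
    kept = []
    for summary, token_logprobs in candidates:
        if not token_logprobs:
            continue  # skip empty generations
        score = sum(token_logprobs) / len(token_logprobs)
        if score >= threshold:
            kept.append(summary)
    return kept


if __name__ == "__main__":
    teacher = [[4.0, 0.5, 0.1], [0.2, 0.1, 0.0]]  # confident, then uncertain
    student = [[2.0, 1.0, 0.5], [0.0, 0.3, 0.1]]
    print(confidence_weighted_kd_loss(teacher, student))
    pool = [("summary A", [-0.1, -0.2]), ("summary B", [-2.0, -3.0])]
    print(select_pseudo_summaries(pool, threshold=-1.0))
```

In a real setup the logits and token log-probabilities would come from the teacher and student models; plain lists are used here only to keep the sketch dependency-free.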

Files in this item

Name: Mohammad Kotit _ OGS Approved ...
Size: 2.803Mb
Format: PDF


This item appears in the following collections

Computer Science & Engineering [96 items]
