1. Full name: Le Ngoc Thang 2. Sex: Male
3. Date of birth: 07/12/1980 4. Place of birth: Nam Dinh
5. Admission decision number: 17/QĐ-CNTT Dated: 18/01/2017
6. Changes in academic process:
- Decision No. 55/QĐ-VCNTT dated December 31, 2019 on extending the study period for PhD students.
- Decision No. 2/QĐ-CNTT dated January 14, 2022 on the dismissal of PhD students.
7. Official thesis title: Research and development of text summarization techniques for summarizing Vietnamese online newspapers.
8. Major: Information system management 9. Code: 9480205.01QTD
10. Supervisors:
- Associate Professor, Dr. Pham Bao Son.
- Dr. Le Quang Minh.
11. Summary of the new findings of the thesis:
The dissertation studies text summarization methods in general and Vietnamese text summarization techniques in particular. Building on this foundation, it reports the following new results relative to previous studies:
- Proposed a method to calculate sentence similarity in Vietnamese online news articles; constructed two corpora to support the task of summarizing Vietnamese online news articles: VNNEWS.100.2018 and VNNEWS.500.2024.
- Investigated and evaluated a graph-based summarization method for Vietnamese online news articles using two unsupervised algorithms (TextRank and LexRank). With LexRank, using the proposed sentence similarity measure for online news articles improved effectiveness by 2%.
- Studied the BERT model and proposed a BERT-based summarization method for Vietnamese online news articles that incorporates prior knowledge drawn from the text. The model with added knowledge outperformed the model without it by 2.5%.
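
As an illustration of the graph-based approach named above, the following is a minimal LexRank-style sketch. It is not the dissertation's implementation: the proposed sentence similarity measure is not described in this summary, so plain bag-of-words cosine similarity over whitespace tokens stands in for it, and the function names, threshold, and damping value are assumptions chosen for the example.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (stand-in measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexrank_summary(sentences, top_k=2, threshold=0.1, damping=0.85, iterations=50):
    """Return the top_k highest-scoring sentences, kept in their original order."""
    n = len(sentences)
    if n == 0:
        return []
    # Assumption: whitespace tokenization; Vietnamese text would normally
    # need a proper word segmenter before this step.
    bags = [Counter(s.lower().split()) for s in sentences]
    # Unweighted graph: connect two different sentences when their
    # similarity exceeds the threshold.
    adj = [
        [1.0 if i != j and cosine_similarity(bags[i], bags[j]) > threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    # PageRank-style power iteration over the sentence graph.
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                adj[j][i] / degree[j] * scores[j]
                for j in range(n)
                if degree[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]
```

Swapping a different similarity function into the graph construction step is the only change needed to compare similarity measures under the same ranking procedure, which is the kind of comparison the LexRank result above describes.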
12. Practical applicability, if any:
The summarization methods explored in this dissertation can be applied to build software for summarizing Vietnamese online news articles, in support of state management of information and communication.
13. Further research directions, if any:
- Expand the feature set of Vietnamese online news articles.
- Build a sufficiently large corpus of Vietnamese online news articles for the BERT model, covering the distinctive characteristics of this text genre.
- Investigate and leverage prior knowledge in Vietnamese online news articles to improve the performance and accuracy of extractive summarization with pre-trained models.
14. Thesis-related publications:
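[CT1] Thắng, L., & Minh, L. (2018). Một số đặc trưng trong tóm tắt văn bản báo mạng điện tử tiếng Việt [Some features in summarizing Vietnamese online newspaper texts]. In Kỷ yếu Hội nghị Khoa học công nghệ quốc gia lần thứ XI về Nghiên cứu cơ bản và ứng dụng công nghệ thông tin (FAIR) [Proceedings of the 11th National Conference on Fundamental and Applied Information Technology Research (FAIR)], 2018 (pp. 330-335).
[CT2] Thắng, L., Minh, L., & Sơn, P. (2020). Tóm tắt báo mạng điện tử tiếng Việt sử dụng TextRank [Summarizing Vietnamese online newspapers using TextRank]. In Kỷ yếu Hội nghị Khoa học công nghệ quốc gia lần thứ XIII về Nghiên cứu cơ bản và ứng dụng công nghệ thông tin (FAIR) [Proceedings of the 13th National Conference on Fundamental and Applied Information Technology Research (FAIR)], 2020 (pp. 623-627).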
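[CT3] Le Ngoc Thang, Le Quang Minh (2023), Vietnamese online newspapers summarization using LexRank, Сборник научных трудов по материалам Международной научно-практической конференции 28 декабря 2023 г.: Белгород [Collection of scientific papers from the International Scientific and Practical Conference, December 28, 2023: Belgorod], ISSN 2713-1513.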
[CT4] Thang Le Ngoc, Minh Le Quang (2024), “Vietnamese Online Newspaper summarization using Pre-trained model”, Актуальные исследования: МЕЖДУНАРОДНЫЙ НАУЧНЫЙ ЖУРНАЛ (CURRENT RESEARCH: INTERNATIONAL SCIENTIFIC JOURNAL). №2 (184), 09 – 16, 2024, ISSN 2713-1513.
[CT5] Ngoc-Thang Le, Minh-Tien Nguyen, Nhat-Minh Do, Chi-Thanh Nguyen and Quang-Minh Le. (2024) A method to utilize prior knowledge for extractive summarization based on pre-trained language models. Vietnam Journal of Science and Technology.