Title: Can You Read Cantonese? Candarin Can Help!
Authors: KU, SU WA (古樹樺)
Department: Department of Computer and Information Science
Faculty: Faculty of Science and Technology
Issue Date: 2021
Citation: Ku, S. W. (2021). Can You Read Cantonese? Candarin Can Help! (OAPS). Retrieved from University of Macau, Outstanding Academic Papers by Students Repository.
Abstract: Cantonese has so far seen little development in natural language processing. Meanwhile, more people are visiting Macao, and many of them have difficulty reading Cantonese text. People who do not understand Cantonese cannot read it in written form. Cantonese is a Chinese dialect with a different grammar, and word replacement can handle the grammar problems. A translation system is one of the most valuable ways to help these readers. Although Cantonese and Mandarin Chinese are similar, translating Cantonese is not simple. To build a neural translation system, we select the Transformer as our model architecture. However, Cantonese is a low-resource language, and the sparse data available cannot support the creation of neural machine translation (NMT) models, so our task runs into the low-resource problem. We discuss this problem and investigate approaches to building a corpus, generating data, and training robust models. The solutions involve bilingual dictionary NMT without parallel data, back-translation, and dual learning. We also explain the phenomenon of character-level processing in Chinese. With these approaches, we build NMT models for Cantonese and Mandarin and successfully address the low-resource problem in Cantonese, and our NMT models develop well. We then use web programming and network technology to deploy an online translation system. It has a user-friendly design and provides convenient functions for users, and it is now available on the Internet. Finally, we evaluate translation quality: our system achieves the same performance as mainstream translation systems such as Baidu AI and Microsoft Bing. Our approach demonstrates the potential of NMT for low-resource problems.
Course: Bachelor of Science in Computer Science
Instructor: Prof. Derek F. Wong, Prof. Lidia S. Chao
Programme: Bachelor of Science in Computer Science
Appears in Collections: FST OAPS 2021

Files in This Item:
File: OAPS_2021_FST_DB725742_Ku SuWa_Can You Read Cantonese Candarin Can Help!.pdf
Size: 27.87 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.