Please use this identifier to cite or link to this item: http://oaps.umac.mo/handle/10692.1/247
Title: A Study of Proximal Policy Optimization (PPO) Algorithm in Carla
Authors: WONG, IOK KEONG(黃育強)
Department: Department of Computer and Information Science
Faculty: Faculty of Science and Technology
Issue Date: 2021
Citation: Wong, I. K. (2021). A Study of Proximal Policy Optimization (PPO) Algorithm in Carla (OAPS). Retrieved from University of Macau, Outstanding Academic Papers by Students Repository.
Abstract: In this report, we study the use of recent deep reinforcement learning algorithms to train self-driving cars to drive safely. In recent years, self-driving has become an important topic for researchers, governments, and enterprises. As one of the most densely populated cities in the world, Macau faces many traffic problems, such as congestion, long travel times stemming from personal and government urban-construction issues, and a high number of car accidents. Motivated by these problems, many organizations in industry have developed autonomous driving algorithms and driven several kilometers on public roads without any accidents. However, whether driving is manual or automatic, real driving scenes are very complicated. This report therefore uses reinforcement learning algorithms to train vehicles to drive autonomously in a simulated environment, with the aim of better addressing traffic problems in the real world. In our work, we use the Carla environment and provide three types of operating models: the first is a new PPO-based algorithm that follows a fixed route from a starting point to an end point while keeping the car close to a target speed; the second is a PPO-based reinforcement learning algorithm whose training is optimized through reward design so that the vehicle can drive safely along the route; and the last model makes the vehicle avoid other vehicles on the route ahead. When training these environmental models, we designed different decision-making methods for the various environments, such as training with different reward formulas or using environments with or without checkpoints, both of which affect the learning results of the trained agent.
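All three models in the abstract build on PPO, whose defining component is the clipped surrogate objective. As a minimal sketch of that objective (not the report's actual implementation; the function name and the clipping constant eps=0.2 are assumptions, the latter following the value suggested in the original PPO paper):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective (a quantity to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: advantage estimate for each sampled action
    eps:       clipping range around 1.0
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The element-wise minimum removes any incentive for the policy
    # update to push the probability ratio outside [1 - eps, 1 + eps].
    return np.mean(np.minimum(unclipped, clipped))

# A ratio of 1.5 with positive advantage is clipped to 1.2; a ratio of
# 0.9 lies inside the clip range and passes through unchanged.
ratio = np.array([1.5, 0.9])
adv = np.array([1.0, -0.5])
print(ppo_clip_loss(ratio, adv))  # prints 0.375
```

Clipping is what lets PPO take several gradient steps on the same batch of simulator rollouts without the policy drifting too far from the one that collected them, which matters in Carla where rollouts are expensive to gather.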
Course: Bachelor of Science in Computer Science
Instructor: Prof. Leong Hou U
Programme: Bachelor of Science in Computer Science
URI: http://oaps.umac.mo/handle/10692.1/247
Appears in Collections: FST OAPS 2021



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.