Figure 1. A Visualization Example of Object Detection for Autonomous Driving.
### 1.2 What will you learn from this assignment?

- Understand the basic theories of Deep Learning, especially for the object detection task, such as the networks of [**ResNet**](https://arxiv.org/abs/1512.03385) and [**YOLO v1**](https://arxiv.org/abs/1506.02***0), the detection loss function, and important concepts of object detection (e.g. **ROC Curve and Mean AP**).
- Apply object detection algorithms to the **autonomous driving problem**, such as detecting cars and pedestrians in the traffic environment.
- Gain experience in implementing neural networks with a popular deep learning framework, [**PyTorch**](https://pytorch.org/).
- Develop a deep learning system **from scratch**, including **network design, model training, hyperparameter tuning, training visualization, model inference and performance evaluation.**
- *Spontaneously and deeply explore advanced solutions for the object detection task. (Italics indicate bonus requirements, the same below.)*

### 1.3 What should you prepare for this assignment?

- Master the basic use of Python and PyTorch.
- Be familiar with [**ResNet**](https://arxiv.org/abs/1512.03385).
- Be familiar with [**YOLO v1**](https://arxiv.org/abs/1506.02***0).

(You can learn about them step by step in the process of completing this assignment.)

## 2. Assignment Tasks

- You must fill in the blanks and submit the completed code for this assignment with the provided codebase. This assignment offers an incomplete codebase for object detection with a modified [**YOLO v1**](https://arxiv.org/abs/1506.02***0).
- You must submit the model outputs on the test data together with your completed code. **The test data has been released [here](https://drive.google.com/file/d/1-I1Rp1VrF4S5hx9_X3MUL50C10Ph1Tqe/view).** You should utilize hyperparameter tuning and other widely-used techniques to improve detection performance.
- You must submit a technical report of at most 6 pages to explain how you improve the object detection performance. Besides, some visualization results (e.g., the loss curve) are encouraged in this report.

```shell
# Download this project
git clone https://github.com/Liang-ZX/HKU-DASC7606-A1.git
cd HKU-DASC7606-A1
```

### 2.0 Dataset Preparation

The dataset is available [here](https://drive.google.com/file/d/1WhC8AsloaEUipGCQQncYir9Q-Kb9meTC/view?usp=sharing). The dataset is composed of train, val and test parts and their corresponding annotations. Please refer to [src/data/README.md](./src/data/README.md) for an explanation of the dataset structure.

For your convenience, you can download the dataset through the following commands.

```shell
cd src
chmod +x data/download2.sh
./data/download2.sh
```

### 2.1 Submit the Completed Codebase

The codebase [src](./src) is organized as follows.

```
└── src
    ├── data
    │   ├── dataset.py          # dataloader
    │   └── download.sh         # script to download dataset
    ├── model
    │   ├── block.py            # backbone
    │   ├── head.py             # detection head
    │   └── hkudetector.py      # object detection network
    ├── utils
    │   ├── loss.py
    │   └── util.py
    ├── train.py                # model training
    ├── predict.py              # model inference
    └── eval.py                 # performance evaluation
```

#### Task 1: Filling in the Backbone with ResNet

You should fill in two blanks in file [src/model/block.py](src/model/block.py) to complete the basic blocks of the backbone network, including the block design and the forward function. You should apply the ResNet design to the backbone. The ResNet block architectures are illustrated in the following figure; a minimal sketch of a basic block is given after it.

Figure 2. A Visualization Illustration of Basic Block and Bottleneck Block in ResNet.
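To illustrate the idea only, here is a minimal PyTorch sketch of a residual basic block with an identity shortcut. The class name, constructor arguments, and stride/downsample handling are assumptions for illustration; the blanks in [src/model/block.py](src/model/block.py) may expect different names and signatures.

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """A minimal ResNet-style basic block: two 3x3 convs plus a shortcut (sketch only)."""
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        # Project the shortcut when the spatial size or channel count changes.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes * self.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * self.expansion),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)   # residual connection
        return F.relu(out)
```

The key point is the residual connection: the input is added back to the output of the two convolutions before the final activation, which is what distinguishes a ResNet block from a plain convolutional stack.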
#### Task 2: Filling in the Detection Head

You should fill in one blank in file [src/model/head.py](src/model/head.py).

#### Task 3: Filling in the Network

You should fill in three blanks in file [src/model/hkudetector.py](src/model/hkudetector.py).

#### Task 4: Filling in the Object Detection Loss Function

You should fill in three blanks to complete the object detection loss in file [src/utils/loss.py](src/utils/loss.py). The object detection loss includes five terms, as follows (this is Equation (3) in [YOLO v1](https://arxiv.org/abs/1506.02***0)). You are required to complete two of them.

$$
\begin{aligned}
loss &= \lambda_{\text{coord}} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}\left[\left(x_{i}-\hat{x}_{i}\right)^{2}+\left(y_{i}-\hat{y}_{i}\right)^{2}\right]+\lambda_{\text{coord}} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right] \\
&+\sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}\left(C_{i}-\hat{C}_{i}\right)^{2}+\lambda_{\text{noobj}} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}}\left(C_{i}-\hat{C}_{i}\right)^{2} \\
&+\sum_{i=0}^{S^{2}} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}
\end{aligned}
$$

To help you better understand and implement the loss function, we can split this equation into four parts. You will find this helpful when you do Task 4.

- The first two terms form the location loss.
- The third term is the IOU loss for boxes containing objects.
- The fourth term represents the loss for boxes not containing objects.
- The last term is the classification loss.

**Hint: In implementation, the fourth term is split into two parts: (Part I) the not-response loss, for a box containing an object but not the one with the maximum IOU; (Part II) the no-object loss, for a box not containing any object.**

#### Task 5: Filling in the Training Pipeline

We have implemented the dataset preprocessing code for you. We strongly suggest that you read [src/data/dataset.py](src/data/dataset.py) carefully before completing the following tasks.

You should fill in two blanks to complete the training pipeline in file [src/train.py](src/train.py). After completing the training pipeline, you can run it with the following command.

```bash
python train.py --output_dir 'checkpoints'
```

#### Task 6: Filling in the Prediction Pipeline and Non-Maximum Suppression

You should fill in two blanks to complete the prediction pipeline and the NMS in the file [src/utils/util.py](src/utils/util.py). Non-Maximum Suppression is a computer vision method to select a single entity from many overlapping entities; a minimal sketch is given at the end of this task. For more details, refer to the [nms algorithm](https://learnopencv.com/non-maximum-suppression-theory-and-implementation-in-pytorch/).

After completing the inference pipeline, you can run the following command. The visualization result will be saved in the vis_results folder, and you should show some of your results in the final report.

```bash
python predict.py --image_path "./ass1_dataset/test/image/000001.jpg" --vis_dir "./vis_results"
```
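The following is a minimal, self-contained sketch of greedy NMS on score-sorted boxes, given only as a reference. The function name, box format `(x1, y1, x2, y2)`, and threshold handling are assumptions; the blank in [src/utils/util.py](src/utils/util.py) may expect a different tensor layout or interface.

```python
import torch

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS sketch. boxes: (N, 4) tensor as (x1, y1, x2, y2); scores: (N,)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort(descending=True)   # box indices sorted by confidence

    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)                        # keep the highest-scoring remaining box
        if order.numel() == 1:
            break
        # Intersection of the selected box with all remaining boxes.
        xx1 = torch.maximum(x1[order[1:]], x1[i])
        yy1 = torch.maximum(y1[order[1:]], y1[i])
        xx2 = torch.minimum(x2[order[1:]], x2[i])
        yy2 = torch.minimum(y2[order[1:]], y2[i])
        inter = (xx2 - xx1).clamp(min=0) * (yy2 - yy1).clamp(min=0)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Discard boxes that overlap the selected box above the threshold.
        order = order[1:][iou <= iou_threshold]
    return torch.tensor(keep, dtype=torch.long)
```

The design choice to emphasize is the greedy loop: boxes are processed in descending score order, and any lower-scoring box whose IOU with an already-kept box exceeds the threshold is suppressed.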
#### Task 7: Filling in the Evaluation Pipeline (To Understand mAP)

You should fill in one blank to complete the mAP calculation in file [src/eval.py](src/eval.py). The mAP is one of the most essential evaluation metrics in object detection. For more details, refer to [mAP calculation](https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173).

After completing the evaluation pipeline, you can run the following command. **IMPORTANT: You should submit the result.pkl generated in this part. (Submission details are illustrated in Sec 3.1.)**

```bash
python eval.py --split 'test' --output_file "./result.pkl"
```

**Hint: please pay attention to the threshold set in the evaluation pipeline. We suggest setting** *pos_threshold=0.1*, **and we will perform lower truncation on** *pos_threshold* **(that is, any** *pos_threshold<0.1* **will be truncated to** *0.1***).**

### 2.2 Submitting the Model Outputs of the Test Data

You should train your model on the **train+val** part, and then generate and submit the outputs on the test part with your trained detection model. **The test part is released [here](https://drive.google.com/file/d/1-I1Rp1VrF4S5hx9_X3MUL50C10Ph1Tqe/view)**, and you can download it using [src/data/download_test.sh](src/data/download_test.sh).

```bash
cd src
chmod +x data/download_test.sh
./data/download_test.sh
```

You can generate the final result using the following command. **Please specify the image size here if you change it to a larger scale.**

```bash
python for_submit.py --split 'test' --output_file "./result.pkl" --model_path