City-scale aerial image recording and its AI application:
A case study of identifying the bird habitats in Lancheng

Abstract
The Puli Lancheng area has been extensively cultivated with sugarcane in the past and serves as an important habitat for migratory birds.
Bird habitats play a crucial role in biodiversity and contribute to ecological balance.
This thesis uses long-term, city-scale aerial photography from an unmanned aerial vehicle (UAV) to document habitat changes in this region.
Deep learning is then applied to identify bird habitats automatically.
Specifically, semantic segmentation is used to recognize different crop types, as well as important areas such as greenhouses, buildings, and roads, in the images.
With the classification labels grouped into six major categories, training and testing on datasets from different aerial locations yield an average accuracy of 85.45%.
When the labels are subdivided into thirteen subcategories, the average accuracy is 65.54%.
Keywords:
- Drone
- Semantic segmentation
- Deep learning
- Bird ecological habitat
What is FARMid?



The research site is located in Lancheng, Puli Town, Nantou County. We provide both aerial images of the research field and the corresponding label files.
What are the categories?

There are 6 major categories in total:
- Building
- Crop
- Tree
- Farm
- Land
- Road
The classification labels are further subdivided into 13 subcategories:
- GreenHouse
- House
- Saccgurm
- PassionFruit
- Clump
- WaterBamboo
- Banana
- Unknow
- Tree
- UndevelopedFarmland
- Tarp
- GrassLand
- Road

- People
- Contact
Hong-Shiun Ye
Dr. Zhen-Chang Liu

- Semantic Labelling & Aerial Image
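If you download the dataset, please cite: 城市尺度下空拍紀錄與AI應用:以籃城鳥類生態棲息地辨識為例 (City-scale aerial image recording and its AI application: A case study of identifying the bird habitats in Lancheng)
All labels are provided with their corresponding class numbers. Image and label files are named according to the date index in the file name.
Download: FarmID_Dataset.zip (5.35 GB)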
The FARMid dataset provides images and labels.
In the thesis, we partition the dataset as follows.
Training images: folders named 221018L (276 images) and 221018Z (297 images), 573 images in total
Testing images: folder named 221018R (117 images)
Image resolution: 4056 × 3040 pixels (taken with a DJI Mavic Air)
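The partition above can be written down as a small lookup table. The following is a minimal sketch, not the loader used in the thesis: it assumes the three folders sit directly under a dataset root directory and that images use common extensions, neither of which is stated here. The names `SPLITS`, `list_split_images`, and `expected_count` are hypothetical.

```python
from pathlib import Path

# Split definition taken from the text: folder name -> stated image count.
SPLITS = {
    "train": {"221018L": 276, "221018Z": 297},
    "test": {"221018R": 117},
}


def expected_count(split):
    """Number of images the text states for a split."""
    return sum(SPLITS[split].values())


def list_split_images(root, split, extensions=(".jpg", ".jpeg", ".png")):
    """Collect image paths for one split.

    Assumes (hypothetically) that each split folder lies directly under
    `root`; adjust the path logic if the archive is laid out differently.
    """
    paths = []
    for folder in SPLITS[split]:
        folder_path = Path(root) / folder
        if not folder_path.is_dir():
            continue  # folder not present under this root
        paths.extend(
            p for p in sorted(folder_path.iterdir())
            if p.suffix.lower() in extensions
        )
    return paths
```

Comparing `len(list_split_images(root, "train"))` against `expected_count("train")` is a quick way to confirm that an extracted copy of the dataset is complete.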
Images are labeled using the MATLAB Image Labeler tool.
Each label image has pixel values ranging from 1 to 13, corresponding to the 13 fine classes. Pixel value 0 corresponds to an unlabeled pixel.
Import Label_Definition.mat for the label definitions, or refer to the following figure.
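To illustrate the pixel encoding described above, the following sketch counts how many pixels of a label image fall into each value from 0 to 13. It treats the label image as plain rows of integers, so decoding the file is left to whatever image library you prefer. The function names are hypothetical, and pure Python is used only for clarity; an array library would be much faster on full-resolution images.

```python
from collections import Counter

NUM_CLASSES = 13  # fine classes use pixel values 1..13
UNLABELED = 0     # pixel value 0 marks an unlabeled pixel


def class_pixel_counts(label_rows):
    """Return a list where index v holds the number of pixels with value v."""
    counts = Counter()
    for row in label_rows:
        counts.update(row)
    unexpected = [v for v in counts if not 0 <= v <= NUM_CLASSES]
    if unexpected:
        raise ValueError(f"unexpected label values: {sorted(unexpected)}")
    return [counts.get(v, 0) for v in range(NUM_CLASSES + 1)]


def labeled_fraction(label_rows):
    """Fraction of pixels that carry one of the 13 class labels."""
    counts = class_pixel_counts(label_rows)
    total = sum(counts)
    return (total - counts[UNLABELED]) / total if total else 0.0


# Tiny 2x3 example: one unlabeled pixel, three pixels of class 1.
toy_label = [[0, 1, 1], [13, 2, 1]]
toy_counts = class_pixel_counts(toy_label)
```

The mapping from each number to a class name is not reproduced here; it should be read from Label_Definition.mat.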
- Semantic Labelling
- Evaluation Metric
The task for the FARMid dataset is to predict a per-pixel semantic label for each aerial image.
The semantic labelling performance is assessed based on the Mean Pixel Accuracy (MPA) and Mean Intersection over Union (MIoU).
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}.$$
TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative pixels, respectively,
which are computed from the confusion matrix accumulated over all data in the test split.
The goal of this task is to achieve MPA and MIoU scores that are as high as possible.
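As a worked illustration of these metrics, the sketch below accumulates a confusion matrix and derives MIoU and MPA from it. It is not the evaluation code used in the thesis. Two points are assumptions: MPA is taken to be the mean over classes of TP / (TP + FN), a common definition that is not spelled out here, and pixels whose ground-truth value is 0 (unlabeled) are skipped. All function names are hypothetical.

```python
def confusion_matrix(true_labels, pred_labels, num_classes, ignore_label=0):
    """Rows index the true class, columns the predicted class.

    Labels are expected in 1..num_classes. Pixels whose true label equals
    `ignore_label` are skipped (assumption: unlabeled pixels are not scored).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        if t == ignore_label:
            continue
        if not (1 <= t <= num_classes and 1 <= p <= num_classes):
            raise ValueError(f"label out of range: true={t}, pred={p}")
        cm[t - 1][p - 1] += 1
    return cm


def mean_iou(cm):
    """Mean of TP / (TP + FP + FN) over classes that appear at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


def mean_pixel_accuracy(cm):
    """Mean of TP / (TP + FN) over classes with at least one true pixel."""
    accs = [row[c] / sum(row) for c, row in enumerate(cm) if sum(row) > 0]
    return sum(accs) / len(accs) if accs else 0.0


# Toy example with two classes; the last pixel is unlabeled and ignored.
toy_true = [1, 1, 2, 2, 0]
toy_pred = [1, 2, 2, 2, 1]
toy_cm = confusion_matrix(toy_true, toy_pred, num_classes=2)
```

In the toy example, class 1 has an IoU of 1/2 and class 2 has an IoU of 2/3, so the MIoU is 7/12; the per-class accuracies are 1/2 and 1, giving an MPA of 3/4.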
For the FARMid dataset, the Tree, Land, and Road classes each have a relatively large share of the pixels and consist of meaningful objects,
so each is treated as a class for both training and evaluation rather than being ignored.