University of Tasmania

Deep learning for multiple retail products image recognition

Version 2 2024-03-20, 06:36
Version 1 2023-05-27, 19:05
posted on 2024-03-20, 06:36 authored by Yuchen Wei

Taking time to find the products we want and waiting at the check-out are common experiences in retail stores. The realisation of automatic product recognition has had a considerable commercial and social impact, as it is more reliable and less time-consuming than manual operation. Product recognition from images is a challenging computer vision task that is receiving increasing attention because of its promising applications, such as automatic check-out, stock tracking, planogram compliance, and assistance for visually impaired people. In recent years, deep learning has advanced rapidly, with major achievements in image classification and object detection. However, deep learning-based computer vision methods for retail product recognition have not yet been studied extensively or in depth. This research aims to improve the performance of retail product recognition in both classification and detection tasks.

First, the multi-angle GAN (MAGAN) is proposed for data augmentation. MAGAN produces new single-product images from different perspectives to enlarge the training dataset for grocery product recognition. Experimental results on two benchmark datasets (Fruit-360 and Grocery-30) demonstrate that MAGAN outperforms existing GANs and achieves remarkable recognition results with less training data.

Next, a method that combines several single-product images into one check-out image is applied. Current image synthesis methods can easily leave a number of objects completely hidden, because they do not consider the occlusion relationships between items. A new instance placement algorithm, named OCA, is developed, which calculates the total hidden area of each item when synthesising check-out images.
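As a rough illustration of what measuring the hidden area of each item can look like, the Python sketch below computes the fraction of every item covered by items pasted on top of it. It is not the OCA algorithm from the thesis: the pixel-set representation, the function names (rect_pixels, hidden_fractions, placement_ok), and the 0.5 threshold are all assumptions made for this example.

```python
def rect_pixels(top, left, height, width):
    """Hypothetical helper: the set of (row, col) pixels covered by a rectangle."""
    return {(r, c)
            for r in range(top, top + height)
            for c in range(left, left + width)}


def hidden_fractions(items):
    """Return, for each item, the fraction of its area hidden by later items.

    `items` is a list of pixel sets in paste order, so later items lie on top.
    """
    fractions = []
    for i, pixels in enumerate(items):
        # Union of every item pasted after (and therefore above) item i.
        covered = set().union(*items[i + 1:])
        hidden = len(pixels & covered)
        fractions.append(hidden / len(pixels) if pixels else 0.0)
    return fractions


def placement_ok(items, max_hidden=0.5):
    """Accept a layout only if no item exceeds the assumed occlusion threshold."""
    return all(f <= max_hidden for f in hidden_fractions(items))


if __name__ == "__main__":
    # Two 4x4 items that overlap in a 2x2 corner; the second is pasted on top.
    layout = [rect_pixels(0, 0, 4, 4), rect_pixels(2, 2, 4, 4)]
    print(hidden_fractions(layout))
    print(placement_ok(layout))
```

A synthesis loop could call placement_ok on each candidate layout and resample positions whenever it returns False, so that no product ends up mostly hidden in a generated training image.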

The proposed image synthesis approach ensures that no single product is severely obscured in the training images. The current application of CycleGAN to retail product domain adaptation does not support one-to-many translation. This means that the same synthesised check-out image cannot be translated to different domains with corresponding lighting conditions. A novel image-to-image translation model (DL-CycleGAN) based on disentangled representation is proposed, which can map a synthetic check-out image to multiple outputs with continuously changing lighting conditions. This model helps reduce the data distribution gap between the training and test sets. Finally, the experimental results indicate that the translated dataset is more effective for training product detection networks.

This thesis reviews the critical challenges of deep learning for retail product recognition and discusses techniques that may help research on the topic. MAGAN is proposed for product classification tasks, while OCA and DL-CycleGAN are developed for multiple-product detection tasks in the automatic check-out scenario. This research provides new approaches for improving retail product recognition performance.



  • PhD Thesis


xi, xiii, 124 pages


School of Information and Communication Technology


University of Tasmania

Publication status

  • Unpublished

Rights statement

Copyright 2022 the author.


Chapter 2 appears to be the equivalent of a post-print version of an article published as: Wei, Y., Tran, S., Xu, S., Kang, B., Springer, M., 2020. Deep learning for retail product recognition: Challenges and techniques, Computational Intelligence and Neuroscience, 2020, 8875910. Copyright © 2020 Yuchen Wei et al. It is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Chapter 3 appears to be, in part, the equivalent of a pre-print version of an article published as: Wei, Y., Tran, S., Xu, S., Kang, B., 2020. Data augmentation with generative adversarial networks for grocery product image recognition, 16th International Conference on Control, Automation, Robotics and Vision, Shenzhen, China, 2020, pp. 963-968, doi: 10.1109/ICARCV50220.2020.9305421. Chapter 3 also appears to be, in part, the equivalent of a post-print version of an article published as: Wei, Y., Xu, S., Kang, B., Hoque, S., 2022. Generating training images with different angles by GAN for improving grocery product image recognition, Neurocomputing, 488, 694-705.
