Machine learning approaches for fake online reviews detection
Online reviews have a substantial impact on decision making in many areas of society, particularly in the buying and selling of goods. The truthfulness of online reviews is critical for both consumers and vendors. Genuine reviews lead to satisfied customers and success for quality businesses, whereas fake reviews can mislead customers through false descriptions and distort purchasing decisions and sales. Therefore, efficient fake review detection models and tools are needed to distinguish fraudulent from legitimate reviews, protecting e-commerce platforms, consumers, and companies from misleading fake reviews.
Although several fake review detection models have been proposed in the literature, challenges remain regarding their performance. For instance, existing studies depend heavily on linguistic features, which are not sufficient to capture the semantic meaning of reviews, a prerequisite for improving prediction performance. Furthermore, when analysing a stream of fake review text, concept drift may occur, where the relationship between the input text and the output label changes over time, degrading model performance. The concept drift problem and its impact on the performance of fake review detection models have yet to be addressed. Moreover, existing models have focused only on the review text, reviewer-based features, or both, and have used a limited number of behavioural and content features, which reduces detection accuracy. Further, these models have used fine-grained features (e.g., words) or coarse-grained features (e.g., documents, sentences, topics) separately, which also limits detection performance.
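The drift phenomenon described above can be illustrated with a minimal, self-contained sketch (not the thesis's own method): a detector that flags drift when a model's recent error rate on the stream rises well above its long-run error rate. The window size and threshold below are illustrative values, not parameters from the thesis.

```python
from collections import deque

def detect_drift(error_stream, window=50, threshold=0.15):
    """Flag concept drift when the error rate over the last `window`
    predictions exceeds the long-run error rate by more than `threshold`.
    `error_stream` is a sequence of 0/1 flags (1 = misclassified review)."""
    recent = deque(maxlen=window)
    total_errors = 0
    drift_points = []
    for i, err in enumerate(error_stream):
        recent.append(err)
        total_errors += err
        if i + 1 >= window:  # wait until the window is full
            overall_rate = total_errors / (i + 1)
            recent_rate = sum(recent) / len(recent)
            if recent_rate - overall_rate > threshold:
                drift_points.append(i)
    return drift_points

# A stream whose error rate jumps from ~5% to ~80% halfway through,
# mimicking a detector degrading as the review distribution shifts:
stream = [0] * 95 + [1] * 5 + [1, 1, 1, 0, 1] * 20
print(detect_drift(stream)[:3])  # drift is flagged shortly after index 100
```

Dedicated stream-learning libraries provide more principled detectors (e.g., ADWIN or DDM), but the windowed comparison above captures the core idea: a widening gap between recent and historical performance signals that the input-output relationship has changed.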
To address the research gaps discussed above, this research investigates and analyses the performance of different neural network models and advanced pre-trained models for fake review detection. We then propose an ensemble model that combines three transformer models to detect fake reviews on semi-real-world datasets. This research also investigates the concept drift problem by detecting the occurrence of concept drift within text data streams of fake reviews and measuring the correlation between concept drift (where it exists) and the performance of detection models over time on a real-world data stream. Furthermore, this research proposes a new interpretable ensemble of multi-view deep learning models (EEMVDLM) that can detect fake reviews based on different feature perspectives and classifiers while providing interpretability for deep learning models. This ensemble comprises three popular machine learning models: bidirectional long short-term memory (Bi-LSTM), a convolutional neural network (CNN), and a deep neural network (DNN). Additionally, the model provides interpretations to deliver trustworthy results from deep learning models, which are usually considered "black boxes". For this purpose, the Shapley additive explanations (SHAP) method and an attention mechanism are used to understand the underlying logic of a model and provide hints as to whether it is "unfair".
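One common way to combine several classifiers' outputs, sketched here purely for illustration, is soft voting: averaging each model's class-probability outputs and taking the argmax. The arrays below are hypothetical probabilities standing in for the Bi-LSTM, CNN, and DNN views; the thesis's actual combination rule and feature views are described in the full text.

```python
import numpy as np

def soft_vote(prob_lists):
    """Average class-probability outputs from several models (soft voting).
    Each element of `prob_lists` is an (n_samples, n_classes) array."""
    return np.mean(prob_lists, axis=0)

# Hypothetical per-review probabilities [P(genuine), P(fake)] for 3 reviews
# from three stand-in models (named after the thesis's ensemble members):
bilstm = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
cnn    = np.array([[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
dnn    = np.array([[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]])

avg = soft_vote([bilstm, cnn, dnn])
labels = avg.argmax(axis=1)  # 0 = genuine, 1 = fake
print(labels)  # [0 1 1]
```

Soft voting lets a confident minority model outvote two weakly confident ones, which is why it often outperforms hard majority voting when the member models are calibrated.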
In summary, this research makes the following contributions:
(1) A comprehensive survey that analyses the task of fake review detection, covering existing approaches, feature extraction techniques, challenges, and available datasets.
(2) An investigation and analysis of the performance of different neural network models and transformers to demonstrate their effect on fake review detection. Experimental results show that transformer models perform well with a small dataset for fake review detection; specifically, the robustly optimised BERT pretraining approach (RoBERTa) achieves the highest accuracy.
(3) An ensemble of three transformer models to discover the hidden patterns in a sequence of fake reviews and detect them precisely. Experimental results on two semi-real datasets show that the proposed model outperforms state-of-the-art methods.
(4) An in-depth analysis for detecting concept drift within fake review data streams using two methods: benchmark concept drift detection methods and content-based classification methods. The results demonstrate a strong negative correlation between concept drift and the performance of fake review detection/prediction models, which indicates the difficulty of building more efficient models.
(5) A new interpretable ensemble of multi-view deep learning models (EEMVDLM) that can detect fake reviews based on different feature perspectives and classifiers. Experimental results on two real-life datasets show excellent performance, outperforming state-of-the-art methods. Further, the results show that the proposed model can provide reasonable interpretations that help users understand why certain reviews are classified as fake or genuine.
To the best of our knowledge, this research provides the first study that incorporates advanced pre-trained models and investigates the concept drift problem for fake review detection, as well as the first interpretable fake review detection approach. The findings of this thesis contribute to both practical and theoretical applications.
Sub-type: PhD Thesis
Pagination: xviii, 216 pages
Department/School: School of Information and Communication Technology
Publisher: University of Tasmania
Publication status: Unpublished