Hierarchical deep learning approach using fusion layer for Source Camera Model Identification based on video taken by smartphone

Younes, Akbari; Al Maadeed, Somaya; Elharrouss, Omar; Ottakath, Najmath; Khelifi, Fouad

Date

2024-03-15

Abstract

Over the last decade, videos uploaded and shared through web-based multimedia platforms and mobile applications have proliferated worldwide, largely because cloud-based services such as iCloud, YouTube, Facebook, Twitter, and WhatsApp offer affordable and secure environments for storing and sharing video. However, this has raised new challenges for forensic analysts and investigators, since videos can be used to commit heinous crimes such as blackmail, fraud, and forgery. Source Camera Identification (SCI) has therefore become of paramount importance in image and video forensics. Camera model identification can help identify perpetrators or narrow down a search, and it can also be used to enhance SCI systems. Commonly used solutions include Photo Response Non-Uniformity (PRNU) based methods and machine learning techniques such as the support vector machine (SVM) and deep learning models. This work draws on both categories by exploring a hierarchical deep learning model for camera model identification from smartphone videos, in which PRNU features are extracted by CNN-based structures during training. Six proposed streams extract both low-level and high-level features throughout the network, and a fusion layer based on joint sparse representation, with forward and backward functions defined for it, fuses the six streams. The proposed approach was implemented and evaluated through extensive experiments, achieving an average frame-level accuracy of 69.9% on the Daxing dataset and 81.6% on the QUFVD dataset.
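The abstract does not give the fusion layer's equations. As a rough illustration only, the sketch below shows one standard form of joint sparse representation: each stream's feature vector y_s is coded over its own dictionary D_s, and an l_{1,2} penalty on the coefficient matrix (the sum of its row norms) encourages all streams to select the same dictionary atoms. It is solved with plain proximal gradient descent (ISTA) in NumPy. The function names (row_soft_threshold, joint_sparse_code, fuse), the dictionaries, the penalty weight lam, and the choice of averaging the codes into a fused vector are all hypothetical assumptions, not the authors' implementation, and the backward function used to train their layer is not reproduced here.

```python
import numpy as np


def row_soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_{1,2}: shrink each row of A toward zero."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zeros.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * A


def joint_sparse_code(features, dictionaries, lam=0.1, n_iter=200):
    """Hypothetical joint sparse coding of S stream features (ISTA).

    Minimizes 0.5 * sum_s ||y_s - D_s a_s||^2 + lam * sum_k ||A[k, :]||_2,
    so that all streams share one row-sparsity pattern.

    features:     list of S vectors, features[s] has shape (d_s,)
    dictionaries: list of S matrices, dictionaries[s] has shape (d_s, K)
    Returns A with shape (K, S); column s is the sparse code of stream s.
    """
    S = len(features)
    K = dictionaries[0].shape[1]
    A = np.zeros((K, S))
    # Step size 1/L, where L is the largest squared spectral norm of any D_s
    # (the Lipschitz constant of the smooth data-fit gradient).
    L = max(np.linalg.norm(D, 2) ** 2 for D in dictionaries)
    step = 1.0 / L
    for _ in range(n_iter):
        grad = np.zeros_like(A)
        for s, (y, D) in enumerate(zip(features, dictionaries)):
            grad[:, s] = D.T @ (D @ A[:, s] - y)
        # Gradient step on the data-fit term, then group shrinkage of each row.
        A = row_soft_threshold(A - step * grad, step * lam)
    return A


def fuse(features, dictionaries, lam=0.1, n_iter=200):
    """Hypothetical fused descriptor: average the per-stream sparse codes."""
    A = joint_sparse_code(features, dictionaries, lam, n_iter)
    return A.mean(axis=1)
```

For example, if six streams are generated from codes that share the same two active atoms, the rows of A with the largest norms should correspond to that shared support. In a trained network the dictionaries would typically be learned jointly with the streams rather than fixed in advance.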