Pareto Optimized Large Mask Approach for Efficient and Background Humanoid Shape Removal

Maskeliunas, Rytis; Damaševičius, Robertas; Vitkute-Adzgauskiene, Daiva; Misra, Sanjay

dc.contributor.author	Maskeliunas, Rytis
dc.contributor.author	Damaševičius, Robertas
dc.contributor.author	Vitkute-Adzgauskiene, Daiva
dc.contributor.author	Misra, Sanjay
dc.date.accessioned	2024-02-07T14:03:03Z
dc.date.available	2024-02-07T14:03:03Z
dc.date.created	2023-05-16T09:45:29Z
dc.date.issued	2023
dc.identifier.citation	IEEE Access. 2023, 11, 33900-33914.	en_US
dc.identifier.issn	2169-3536
dc.identifier.uri	https://hdl.handle.net/11250/3116204
dc.description.abstract	The purpose of automated video object removal is not only to detect and remove the object of interest automatically, but also to use background context to inpaint the foreground area. Video inpainting requires filling spatiotemporal gaps in a video with convincing material, which demands both temporal and spatial consistency: the inpainted part must integrate seamlessly into the background in a variety of scenes, and it must maintain a consistent appearance in subsequent frames even if its surroundings change noticeably. We introduce a deep learning-based methodology for removing unwanted human-like shapes in videos. The method uses Pareto-optimized Generative Adversarial Networks (GANs), which is a novel contribution. The system automatically selects the Region of Interest (ROI) for each humanoid shape and uses a skeleton detection module to determine which humanoid shape to retain. The semantic masks of human-like shapes are created using a semantic-aware, occlusion-robust model that has four primary components: feature extraction, and local, global, and semantic branches. The global branch encodes occlusion-aware information to make the extracted features resistant to occlusion, while the local branch retrieves fine-grained local characteristics. A modified large mask inpainting approach is employed to remove a person from the image, leveraging Fast Fourier convolutions and masks built from polygonal chains and rectangles of arbitrary aspect ratios. The inpainter network takes the input image and the mask and produces an output image without the background humanoid shapes. The generator uses an encoder-decoder structure with skip connections to recover spatial information, and dilated convolution and squeeze-and-excitation blocks to make the regions behind the humanoid shapes consistent with their surroundings. The discriminator penalizes structural dissimilarity at the patch scale, and the refiner network captures features around the boundaries of each background humanoid shape. Performance was assessed using the Learned Perceptual Image Patch Similarity (LPIPS), Fréchet Inception Distance (FID), and Structural Similarity Index Measure (SSIM) metrics and showed promising results in the fully automated background person removal task. The method is evaluated on two video object segmentation datasets (DAVIS, with LPIPS of 0.02, FID of 5.01, and SSIM of 0.79; and YouTube-VOS, with 0.03, 6.22, and 0.78 respectively) as well as a database of 66 distinct video sequences of people behind a desk in an office environment (0.02, 4.01, and 0.78 respectively).	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no	*
dc.subject	videos	en_US
dc.subject	shape measurement	en_US
dc.subject	feature extraction	en_US
dc.subject	humanoid robots	en_US
dc.subject	semantics	en_US
dc.subject	computer architecture	en_US
dc.subject	image processing	en_US
dc.subject	semantic segmentation	en_US
dc.subject	occlusion-robust network	en_US
dc.subject	human shape extraction	en_US
dc.subject	background person removal	en_US
dc.subject	image inpainting	en_US
dc.title	Pareto Optimized Large Mask Approach for Efficient and Background Humanoid Shape Removal	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550	en_US
dc.source.pagenumber	33900-33914	en_US
dc.source.volume	11	en_US
dc.source.journal	IEEE Access	en_US
dc.identifier.doi	10.1109/ACCESS.2023.3253206
dc.identifier.cristin	2147732
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1

Files in this item

Name: MisraPareto2023.pdf
Size: 1.965 MB
Format: PDF

This item appears in the following collection(s)

Institutt for informasjonsteknologi og kommunikasjon [136]
This unit contains contributions from staff at the Institutt for informasjonsteknologi og kommunikasjon

Unless otherwise stated, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International