dc.contributor.author: Maskeliunas, Rytis
dc.contributor.author: Damaševičius, Robertas
dc.contributor.author: Vitkute-Adzgauskiene, Daiva
dc.contributor.author: Misra, Sanjay
dc.date.accessioned: 2024-02-07T14:03:03Z
dc.date.available: 2024-02-07T14:03:03Z
dc.date.created: 2023-05-16T09:45:29Z
dc.date.issued: 2023
dc.identifier.citation: IEEE Access. 2023, 11, 33900-33914. [en_US]
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://hdl.handle.net/11250/3116204
dc.description.abstract: The purpose of automated video object removal is not only to detect and remove the object of interest automatically, but also to use background context to inpaint the foreground area. Video inpainting requires filling spatiotemporal gaps in a video with convincing material, which demands both temporal and spatial consistency: the inpainted part must integrate seamlessly into the background in a variety of scenes, and it must maintain a consistent appearance in subsequent frames even if its surroundings change noticeably. We introduce a deep learning-based methodology for removing unwanted human-like shapes in videos. The method uses Pareto-optimized Generative Adversarial Networks (GANs), which is a novel contribution. The system automatically selects the Region of Interest (ROI) for each humanoid shape and uses a skeleton detection module to determine which humanoid shape to retain. The semantic masks of human-like shapes are created using a semantic-aware, occlusion-robust model that has four primary components: feature extraction and local, global, and semantic branches. The global branch encodes occlusion-aware information to make the extracted features resistant to occlusion, while the local branch retrieves fine-grained local characteristics. A modified large mask inpainting approach is employed to remove a person from the image, leveraging fast Fourier convolutions and using polygonal chains and rectangles with unpredictable aspect ratios. The inpainter network takes the input image and the mask and creates an output image that excludes the background humanoid shapes. The generator uses an encoder-decoder structure with skip connections to recover spatial information, and dilated convolution and squeeze-and-excitation blocks to make the regions behind the humanoid shapes consistent with their surroundings. The discriminator discourages dissimilar structure at the patch scale, and the refiner network captures features around the boundaries of each background humanoid shape. The efficiency was assessed using the Learned Perceptual Image Patch Similarity (LPIPS), Fréchet Inception Distance (FID), and Structural Similarity Index Measure (SSIM) metrics, and showed promising results in the fully automated background person removal task. The method is evaluated on two video object segmentation datasets (DAVIS, with LPIPS of 0.02, FID of 5.01, and SSIM of 0.79; and YouTube-VOS, with 0.03, 6.22, and 0.78, respectively) as well as a database of 66 distinct video sequences of people behind a desk in an office environment (0.02, 4.01, and 0.78, respectively). [en_US]
dc.language.iso: eng [en_US]
dc.publisher: IEEE [en_US]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no [*]
dc.subject: videos [en_US]
dc.subject: shape measurement [en_US]
dc.subject: feature extraction [en_US]
dc.subject: humanoid robots [en_US]
dc.subject: semantics [en_US]
dc.subject: computer architecture [en_US]
dc.subject: image processing [en_US]
dc.subject: semantic segmentation [en_US]
dc.subject: occlusion-robust network [en_US]
dc.subject: human shape extraction [en_US]
dc.subject: background person removal [en_US]
dc.subject: image inpainting [en_US]
dc.title: Pareto Optimized Large Mask Approach for Efficient and Background Humanoid Shape Removal [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: publishedVersion [en_US]
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550 [en_US]
dc.source.pagenumber: 33900-33914 [en_US]
dc.source.volume: 11 [en_US]
dc.source.journal: IEEE Access [en_US]
dc.identifier.doi: 10.1109/ACCESS.2023.3253206
dc.identifier.cristin: 2147732
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1


Unless otherwise stated, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International.