Machine learning approach for identifying suspicious uniform resource locators (URLs) on Reddit social network

Azeez, Nureni Ayofe; Lawal, Ahmed Oladapo; Misra, Sanjay; Oluranti, Jonathan

dc.contributor.author	Azeez, Nureni Ayofe
dc.contributor.author	Lawal, Ahmed Oladapo
dc.contributor.author	Misra, Sanjay
dc.contributor.author	Oluranti, Jonathan
dc.date.accessioned	2022-01-21T11:18:22Z
dc.date.available	2022-01-21T11:18:22Z
dc.date.created	2021-10-27T13:05:24Z
dc.date.issued	2021
dc.identifier.citation	African Journal of Science, Technology, Innovation and Development. 2021.	en_US
dc.identifier.issn	2042-1338
dc.identifier.uri	https://hdl.handle.net/11250/2838670
dc.description.abstract	The applications and advantages of the Internet for real-time information sharing can never be over-emphasized. These great benefits are too numerous to mention but they are being seriously hampered and made vulnerable due to phishing that is ravaging cyberspace. This development is, undoubtedly, frustrating the efforts of the Global Cyber Alliance – an agency with a singular purpose of reducing cyber risk. Consequently, various researchers have attempted to proffer solutions to phishing. These solutions are considered inefficient and unreliable as evident in the conflicting claims by the authors. Against this backdrop, this work has attempted to find the best approach to solving the challenge of identifying suspicious uniform resource locators (URLs) on Reddit social networks. In an effort to handle this challenge, attempts have been made to address two major problems. The first is how can the suspicious URLs be identified on Reddit social networks with machine learning techniques? And the second is how can internet users be safeguarded from unreliable and fake URLs on the Reddit social network? This work adopted six machine learning algorithms – AdaBoost, Gradient Boost, Random Forest, Linear SVM, Decision Tree, and Naïve Bayes Classifier – for training using features obtained from Reddit social network and for additional processing. A total sum of 532,403 posts were analyzed. At the end of the analysis, only 87,083 posts were considered suitable for training the models. After the experimentation, the best performing algorithm was AdaBoost with an accuracy level of 95.5% and a precision of 97.57%.	en_US
dc.language.iso	eng	en_US
dc.publisher	Taylor & Francis	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no	*
dc.subject	Internet	en_US
dc.subject	machine learning algorithms	en_US
dc.subject	phishing	en_US
dc.subject	Reddit	en_US
dc.subject	uniform resource locators	en_US
dc.title	Machine learning approach for identifying suspicious uniform resource locators (URLs) on Reddit social network	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	© 2021 The Authors.	en_US
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551	en_US
dc.source.pagenumber	9	en_US
dc.source.journal	African Journal of Science, Technology, Innovation and Development.	en_US
dc.identifier.doi	10.1080/20421338.2021.1977087
dc.identifier.cristin	1948892
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1
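
The abstract above reports the best model's results as accuracy and precision. As a minimal sketch of how these two metrics are conventionally computed for a binary "suspicious vs. benign" labelling, the Python below derives both from a list of true and predicted labels. The function name `accuracy_and_precision` and the toy label lists are hypothetical and chosen only for illustration; they are not data, code, or results from the paper.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for two equal-length lists of binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")

    pairs = list(zip(y_true, y_pred))
    # True positives: predicted suspicious and actually suspicious.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted suspicious but actually benign.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # Correct predictions of either class.
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Precision is undefined when nothing is predicted positive; report 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical toy labels (1 = suspicious URL, 0 = benign); NOT data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2%} precision={prec:.2%}")
```

Accuracy counts every correct prediction, while precision looks only at the posts flagged as suspicious, so the two figures can differ when the classes are imbalanced.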

Associated file(s)

Filename: MisraMachine2021.pdf
Size: 834.5Kb
Format: PDF

This item appears in the following collection(s)

Institutt for informasjonsteknologi og kommunikasjon [136]
The unit contains contributions from staff at Institutt for informasjonsteknologi og kommunikasjon

Unless otherwise stated, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International