Please use this identifier to cite or link to this item: http://hdl.handle.net/2067/46606
Title: Topic Modeling by Community Detection Algorithms
Authors: Amati, Giambattista; Angelini, Simone; Cruciani, Antonio; Fusco, Gianmarco; Gaudino, Giancarlo; Pasquini, Daniele; Vocca, Paola
Issue Date: 2021
Abstract: 
We first estimate the number of Italian users active on Twitter in the last year by filtering the Italian flow of Twitter. We show that our filter misses about 6.86% of the Italian flow, while 86.80% of the selected tweets are in Italian. Given this accuracy of the Italian Twitter Firehose filter, we are able to assess the actual number of Italian active users (AUs) of this platform. We then introduce a massive text document clustering algorithm that is easily applicable and scalable to the Twitter social network. Instead of a topic modeling approach based on feature selection and a conventional clustering algorithm, such as LDA, we apply community detection algorithms to the weighted hashtag graph. In order to scale with the graph size, we apply two linear community detection algorithms, CoDA and Louvain. Once the hashtags have been assigned to clusters, the largest clusters and the most frequent hashtags are associated with topics of general interest, such as sports, politics, and health. In this way we are able to provide significant statistics on the topics covered on Twitter in the past year.
URI: http://hdl.handle.net/2067/46606
ISBN: 9781450386326
DOI: 10.1145/3472720.3483622
Appears in Collections: D1. Contributo in Atti di convegno
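
The pipeline outlined in the abstract (build a weighted hashtag graph, then detect communities in it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that an edge weight is the number of tweets in which two hashtags co-occur, which the abstract does not specify, and it runs only the first, local-moving phase of Louvain, without the graph-aggregation phase and without CoDA. The names build_hashtag_graph and local_moving, and the sample tweets, are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def build_hashtag_graph(tweets):
    """Return graph[a][b] = number of tweets containing both hashtags a and b.

    Assumption: co-occurrence counts are used as edge weights.
    Hashtags that never co-occur with another hashtag are left out.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for text in tweets:
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def local_moving(graph, max_passes=100):
    """Greedy modularity optimisation (first phase of Louvain only).

    Each node starts in its own community and is moved to the neighbouring
    community with the largest modularity gain until no move helps.
    """
    degree = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    m = sum(degree.values()) / 2.0          # total edge weight
    community = {n: n for n in graph}       # node -> community label
    if m == 0:
        return community
    tot = dict(degree)                      # community label -> summed degree
    for _ in range(max_passes):
        moved = False
        for node in sorted(graph):
            current = community[node]
            tot[current] -= degree[node]    # take the node out of its community
            weight_to = defaultdict(float)  # edge weight from node to each community
            for nbr, w in graph[node].items():
                weight_to[community[nbr]] += w

            def gain(com):
                # Modularity gain of joining `com`, up to a constant factor 1/m.
                return weight_to.get(com, 0.0) - tot[com] * degree[node] / (2.0 * m)

            best, best_gain = current, gain(current)
            for com in sorted(weight_to):
                g = gain(com)
                if g > best_gain:
                    best, best_gain = com, g
            tot[best] += degree[node]
            if best != current:
                community[node] = best
                moved = True
        if not moved:
            break
    return community


if __name__ == "__main__":
    sample = [
        "#seriea #juventus goal",
        "#juventus #calcio",
        "#seriea #calcio",
        "#covid #vaccino",
        "#vaccino #salute",
        "#covid #salute",
    ]
    clusters = defaultdict(list)
    for tag, label in local_moving(build_hashtag_graph(sample)).items():
        clusters[label].append(tag)
    for members in clusters.values():
        print(sorted(members))
```

A full Louvain run would repeat this step on the graph obtained by collapsing each community into a single node. At Twitter scale an established library implementation would normally be preferred over a hand-written loop like this one.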

Files in This Item:
File: HT'21.pdf
Size: 428.22 kB
Format: Adobe PDF