Latest Trends in Twitter from Arab Countries and the World
Abstract
1. Introduction: Twitter is a microblogging social media platform that carries a very wide variety of content. Open access to Twitter data through the Twitter APIs has made it an important area of research. Twitter has a useful feature called "Trends", which displays hot topics, or trending information, that differ from one location to another. This trending information emerges from the tweets shared on Twitter in a particular location. However, Twitter limits trending information to current tweets: its trend algorithm is designed to generate trends in real time rather than to summarize hot topics on a daily basis. A clear summary of recent trending information is therefore missing and much needed. Latest Twitter Trends, the application discussed in this paper, is built to aggregate hot topics on Twitter for Arab countries and the world. It is a real-time application that summarizes hot topics over time. It lets users study summarized Twitter trends by location with the help of a word cloud, and click on a particular trend to navigate to it and search it through Twitter Search, also in real time. The tool also addresses a further limitation of Twitter's trending information: trends differ across languages and locations and are often mixed. For example, if #Eid-ul-Adha is trending in Arab countries, #عيد_الأضحى is also trending. This application consolidates Arabic and English trends that have the same meaning and displays a single trending topic instead of the same topic twice in two languages. The application also estimates the kinds of Twitter users in each location by analyzing the percentage of tweets posted by male and female users.
2. Trends data gathering: The Twitter APIs give developers access to real-time data, including tweets and trends.
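Twitter's v1.1 REST API authenticated such requests with OAuth 1.0a, signing each request with the four application credentials (consumer key and secret, access token and secret). As a minimal, standard-library-only sketch of that step, and not the authors' crawling script, the code below signs a GET request with HMAC-SHA1 and decodes the JSON reply. The paths `trends/available.json` and `trends/place.json` are the v1.1 trends endpoints; the helper names (`pct`, `oauth_header`, `get_json`) are hypothetical, and these endpoints may no longer be reachable under current X API access levels.

```python
"""Sketch: OAuth 1.0a (HMAC-SHA1) signing for Twitter REST API v1.1 calls."""
import base64
import hashlib
import hmac
import json
import secrets
import time
import urllib.parse
import urllib.request

API_ROOT = "https://api.twitter.com/1.1"


def pct(value) -> str:
    """Percent-encode per RFC 3986: only A-Z a-z 0-9 - . _ ~ stay unescaped."""
    return urllib.parse.quote(str(value), safe="~")


def oauth_header(method, url, params, consumer_key, consumer_secret,
                 token, token_secret, nonce=None, timestamp=None) -> str:
    """Build the OAuth 1.0a Authorization header for one request."""
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(timestamp or int(time.time())),
        "oauth_token": token,
        "oauth_version": "1.0",
    }
    # Signature base string: METHOD & encoded URL & encoded parameter string,
    # where the parameter string holds query + OAuth params sorted after encoding.
    pairs = sorted((pct(k), pct(v)) for k, v in {**params, **oauth}.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    base = "&".join([method.upper(), pct(url), pct(param_string)])
    # Signing key: encoded consumer secret & encoded token secret.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    oauth["oauth_signature"] = base64.b64encode(digest).decode()
    return "OAuth " + ", ".join(
        f'{pct(k)}="{pct(v)}"' for k, v in sorted(oauth.items()))


def get_json(path, params, creds):
    """GET an API path (e.g. 'trends/place.json') and decode the JSON reply."""
    url = f"{API_ROOT}/{path}"
    header = oauth_header("GET", url, params, *creds)
    query = "&".join(f"{pct(k)}={pct(v)}" for k, v in params.items())
    request = urllib.request.Request(f"{url}?{query}" if query else url,
                                     headers={"Authorization": header})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the credentials in a tuple `creds = (consumer_key, consumer_secret, access_token, access_token_secret)`, `get_json("trends/available.json", {}, creds)` would return the trending places with their WOEIDs, and `get_json("trends/place.json", {"id": 1}, creds)` the trends for WOEID 1 (worldwide).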
The Latest Twitter Trends tool uses the Twitter REST API to connect to Twitter and retrieve trending data. The API authenticates the client, establishes the connection with Twitter, and returns trending data in JSON format. The data-gathering scripts are written in Python. A data crawling script connects to the Twitter API by authenticating with the credentials Twitter generates when an application is created at app.twitter.com: the Consumer Key, Consumer Secret, Access Token, and Access Token Secret. Twitter returns its data in JSON (JavaScript Object Notation) format, and the Python data crawling script processes these JSON responses to build a CSV database. This high-level data gathering works as follows. The Python data crawling script connects and authenticates with the Twitter API and retrieves the trending places data from Twitter in JSON format, which is stored in the tool's database as a CSV file. The gathered data lists every trending location or place with its WOEID (Where On Earth ID). The WOEID is then used as a key to retrieve Twitter trending topics location by location, in real time, through the Twitter REST API. The trends for each location are also returned in JSON format and are likewise converted to CSV for storage in the tool's database. This trends CSV file is appended to every time new trending data is collected from Twitter. A second CSV file in the database holds only the current information for all trending places, for later use. Natural language processing with a dictionary is then applied to the trends-by-location CSV data to consolidate Arabic and English trending topics with the same meaning into one. The results are stored in a CSV file that is used for hot topic identification.
3. Hot topic identification: After the high-level data gathering, the CSV files serve as the database for generating a word cloud with D3.js. The trending data is processed by counting the occurrences of each trending topic, which estimates how long that topic has been trending. This frequency is used as the topic's count value when the word cloud is generated. The frequency calculation is a Python script written mainly for word cloud data crawling. This word cloud data crawling script takes the trends-by-location data as input and generates a large database of trends by city as JSON files, in which each key is a trending topic and each value is the frequency of that topic's occurrence.
4. Architecture:
Figure 1: Latest Trends in Twitter Application Architecture
The Python scripts for data crawling and word cloud crawling are used to connect to Twitter and to gather, process, and store data in a database. D3.js and the Google Fusion Tables API are used to display the application's results: D3.js renders the word clouds, and the Google Fusion Tables API is used to create a map of the current trends by location, geotagged on the map. A dedicated Java program connects and authenticates with the Google API, deletes the old Fusion Table data, and imports the new, updated rows into the Google Fusion Table, which is used to visualize the trending information from Twitter. The Python script Tagcloud.py generates a JSON file of trending topics for each city from the Trends.csv file. These files form the database from which D3.js generates a word cloud, individually for every city or location.
5. Results: The data crawling script establishes a connection with Twitter, which returns data in the JSON format shown in Fig. 2.
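The JSON-to-CSV step described in Section 2 can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the v1.1 `trends/place` reply shape (a list whose object carries `trends`, `as_of` and `locations`), uses hypothetical column names in `FIELDS`, and keeps the "current trends" file up to date by replacing only the locations present in the new batch.

```python
import csv
from pathlib import Path

# Hypothetical column layout for both CSV files.
FIELDS = ["as_of", "woeid", "location", "trend", "tweet_volume"]


def trends_to_rows(response):
    """Flatten a trends/place JSON reply into one dict per trending topic."""
    rows = []
    for block in response:  # the reply is a list of location blocks
        loc = block["locations"][0]
        for trend in block["trends"]:
            rows.append({
                "as_of": block["as_of"],
                "woeid": loc["woeid"],
                "location": loc["name"],
                "trend": trend["name"],
                "tweet_volume": trend.get("tweet_volume"),  # None -> empty cell
            })
    return rows


def save_rows(rows, history_path, current_path):
    """Append rows to the history CSV; replace those locations in the snapshot."""
    history = Path(history_path)
    write_header = not history.exists()
    with history.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

    # The snapshot keeps only the latest rows per WOEID.
    current = Path(current_path)
    fresh = {str(row["woeid"]) for row in rows}
    kept = []
    if current.exists():
        with current.open(newline="", encoding="utf-8") as f:
            kept = [r for r in csv.DictReader(f) if r["woeid"] not in fresh]
    with current.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(kept + rows)
```

Appending to the history file preserves every snapshot for the later frequency count, while the snapshot file is rewritten so that it never holds stale rows for a location.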
This data is processed and saved as a CSV file in the application database for later use.
Figure 2: Trends Data Output from Twitter in JSON format
The word cloud crawling script generates key-value pairs from the processed trending data in the database: the key is the trending topic and the value is the frequency of that topic's occurrence. Fig. 3 shows the JSON dataset used to generate the word cloud.
Figure 3: JSON data of the processed trending data
The word cloud is generated with the D3.js library and displays the summarized trending data to the user. Figure 4 shows the word cloud for London.
Figure 4: Word cloud for trending data.
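The frequency count behind the word cloud, together with the Arabic/English consolidation, might look like the following minimal sketch. It assumes a trends history CSV with `as_of`, `location` and `trend` columns (hypothetical names), and the one-entry `SYNONYMS` dictionary is a hypothetical stand-in for the paper's dictionary, whose contents are not given here. A topic is counted at most once per collection snapshot, so an Arabic and an English twin trending at the same time add one occurrence, not two.

```python
import csv
import json
from collections import Counter, defaultdict
from pathlib import Path

# Hypothetical bilingual dictionary: Arabic trend -> English trend with the
# same meaning, so both variants are counted under one label.
SYNONYMS = {
    "#عيد_الأضحى": "#Eid-ul-Adha",
}


def canonical(trend):
    """Map a trend to its shared label (unchanged if not in the dictionary)."""
    key = trend.strip()
    return SYNONYMS.get(key, key)


def count_by_city(history_path):
    """Count, per city, the number of snapshots in which each topic trended."""
    counts = defaultdict(Counter)
    seen = set()
    with open(history_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            key = (row["location"], row["as_of"], canonical(row["trend"]))
            if key not in seen:  # count merged twins once per snapshot
                seen.add(key)
                counts[row["location"]][key[2]] += 1
    return counts


def write_cloud_files(counts, out_dir):
    """Write one <city>.json file mapping trend -> frequency."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for city, counter in counts.items():
        data = dict(counter.most_common())
        (out / f"{city}.json").write_text(
            json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
```

For example, `write_cloud_files(count_by_city("Trends.csv"), "cloud")` would produce one `{trend: frequency}` JSON file per city, the key-value shape the word cloud crawling script is described as producing.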
DOI/handle: http://hdl.handle.net/10576/28232