
    Effective Realtime Tweet Summarization

    qfarc.2018.ICTPD1061.pdf (179.2Kb)
    Date
    2018
    Author
    Suwaileh, Reem
    Elsayed, Tamer
    Abstract
Twitter has developed into an immense network for creating and sharing information. The content users post ranges from the world's breaking news to topics such as sports, science, religion, and even personal daily updates. A user may regularly check her Twitter timeline to stay up to date on her topics of interest, but manually tracking those topics is impractical given the challenges inherent in the Twitter timeline. These challenges include the large volume of posted tweets (about 500M tweets are posted daily), noise (e.g., spam), redundant information (e.g., tweets with similar content), and the rapid evolution of topics over time. This calls for real-time summarization (RTS) systems that automatically track a set of predefined interest profiles (representing the users' topics of interest) and summarize the stream while considering the relevance, novelty, and freshness of the selected tweets. For instance, if a user wants to follow updates on the "GCC crisis", the system should efficiently monitor the stream and capture on-topic tweets covering all aspects of the topic (e.g., official statements, interviews, and new claims against Qatar), which change over time. Accordingly, RTS systems should rely on simple and efficient methods that can scale to follow multiple interest profiles simultaneously.
In this work, we tackle this problem by proposing an RTS system that adopts a lightweight and conservative filtering strategy. Given a set of user interest profiles, the system tracks them over Twitter's continuous live stream in a scalable manner. It does so through a pipeline of multiple phases: pre-qualification, preprocessing, indexing, relevance filtering, novelty filtering, and tweet nomination.
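The Python below is a minimal sketch of the first two pipeline phases. Pre-qualification drops non-English, very short, or hashtag-heavy tweets, and preprocessing removes special characters. It does not reproduce the authors' code. The thresholds `MIN_TOKENS` and `MAX_HASHTAGS` are hypothetical. So are the ASCII-ratio check, which stands in for a real language identifier, and the URL/mention stripping. The abstract does not say how these checks are implemented.

```python
import re

# Hypothetical quality thresholds: the abstract only says that tweets which are
# "too short" or contain "many hashtags" are filtered out.
MIN_TOKENS = 5
MAX_HASHTAGS = 3


def looks_english(text: str, min_ascii_ratio: float = 0.9) -> bool:
    """Crude stand-in for language identification: share of ASCII letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for c in letters if c.isascii())
    return ascii_letters / len(letters) >= min_ascii_ratio


def prequalify(text: str) -> bool:
    """Pre-qualification: keep English tweets that are long enough and not hashtag-heavy."""
    tokens = text.split()
    hashtags = sum(1 for t in tokens if t.startswith("#"))
    return looks_english(text) and len(tokens) >= MIN_TOKENS and hashtags <= MAX_HASHTAGS


def preprocess(text: str) -> list[str]:
    """Preprocessing: lowercase, drop URLs and @mentions, strip special characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # removes '#', punctuation, etc.
    return text.split()
```

For example, `"Breaking: officials issue new statement on the #GCC crisis https://t.co/x"` passes pre-qualification. It then becomes the term list `['breaking', 'officials', 'issue', 'new', 'statement', 'on', 'the', 'gcc', 'crisis']`, which feeds the indexing and filtering phases.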
In the pre-qualification phase, the system filters out non-English and low-quality tweets (i.e., tweets that are too short or contain many hashtags). Once a tweet qualifies, the system preprocesses it through a series of steps (e.g., removing special characters) that prepare it for the relevance and novelty filters. The system adopts a vector space model in which both interest profiles and incoming tweets are represented as vectors constructed using idf-based term weighting. Each incoming tweet is scored for relevance against the interest profiles using the standard cosine similarity. If a tweet's relevance score exceeds a predefined threshold, the system adds it to the potentially relevant tweets for the corresponding profile. The system then measures the novelty of each potentially relevant tweet by computing its lexical overlap with the already-pushed tweets using a modified version of Jaccard similarity. A tweet is considered novel if the overlap does not exceed a predefined threshold. This way, the system avoids overwhelming the user with redundant notifications. Finally, the list of potentially relevant and novel tweets for each profile is periodically re-ranked by both relevance and freshness, and the top tweet is pushed to the user. This gives the user fresh updates without overwhelming her with excessive notifications. The system also supports expanding the profiles over time (by automatically adding potentially relevant terms) and dynamically changing the thresholds to adapt to changes in the topics over time. We conducted extensive experiments over multiple standard test collections that were specifically developed to evaluate RTS systems. Our live experiments tracked more than 50 topics over a large stream of tweets for 10 days, and they show both the effectiveness and the scalability of our system.
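The Python below sketches the indexing, relevance filtering, novelty filtering, and nomination steps described above. It is a minimal illustration under stated assumptions, not the authors' implementation. `IdfIndex`, `ProfileTracker`, the smoothed idf formula, and all threshold values are hypothetical. Plain Jaccard similarity stands in for the unspecified "modified" variant. Freshness is modeled as an exponential decay with a hypothetical one-hour half-life. Profile expansion and dynamic thresholds are omitted.

```python
import math
from collections import Counter
from typing import Optional


class IdfIndex:
    """Indexing phase: running document frequencies over all tweets seen so far."""

    def __init__(self) -> None:
        self.n_docs = 0
        self.df: Counter = Counter()

    def add(self, terms: list[str]) -> None:
        self.n_docs += 1
        self.df.update(set(terms))  # count each term once per tweet

    def idf(self, term: str) -> float:
        # Smoothed idf (assumption): unseen terms still get a positive weight.
        return math.log((self.n_docs + 1) / (self.df[term] + 1)) + 1.0

    def vector(self, terms: list[str]) -> dict[str, float]:
        # tf * idf weights for a tweet or an interest profile
        return {t: c * self.idf(t) for t, c in Counter(terms).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: list[str], b: list[str]) -> float:
    # Plain Jaccard overlap; the paper's modified variant is not specified.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


class ProfileTracker:
    """Hypothetical per-profile state for filtering and periodic nomination."""

    def __init__(self, profile_terms: list[str], index: IdfIndex,
                 rel_threshold: float = 0.3, nov_threshold: float = 0.6) -> None:
        self.profile_terms = profile_terms
        self.index = index
        self.rel_threshold = rel_threshold
        self.nov_threshold = nov_threshold
        # (relevance, timestamp, terms, text) for potentially relevant, novel tweets
        self.candidates: list[tuple[float, float, list[str], str]] = []
        self.pushed: list[list[str]] = []  # term lists of already-pushed tweets

    def is_novel(self, terms: list[str]) -> bool:
        # Novel if the overlap with every pushed tweet does not exceed the threshold.
        return all(jaccard(terms, p) <= self.nov_threshold for p in self.pushed)

    def offer(self, terms: list[str], text: str, timestamp: float) -> None:
        rel = cosine(self.index.vector(terms), self.index.vector(self.profile_terms))
        if rel > self.rel_threshold and self.is_novel(terms):
            self.candidates.append((rel, timestamp, terms, text))

    def nominate(self, now: float, half_life: float = 3600.0) -> Optional[str]:
        """Periodic step: re-rank by relevance * freshness decay, push the top tweet."""
        # A candidate may have become redundant after a later push, so re-check novelty.
        self.candidates = [c for c in self.candidates if self.is_novel(c[2])]
        if not self.candidates:
            return None
        best = max(self.candidates,
                   key=lambda c: c[0] * 0.5 ** ((now - c[1]) / half_life))
        self.candidates.remove(best)
        self.pushed.append(best[2])
        return best[3]
```

In use, each qualified and preprocessed tweet would first be added to the shared index and then offered to every profile's tracker. Calling `nominate` on a timer, rather than pushing on arrival, caps how many notifications a user can receive per period. It also lets a slightly less relevant but fresher tweet win over an older one.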
Indeed, our system achieved the best performance among 19 research teams from around the world in a research track organized by NIST (in the United States) last year.
    URI
    https://doi.org/10.5339/qfarc.2018.ICTPD1061
    DOI/handle
    http://hdl.handle.net/10576/31490
    Collections
    • Computer Science & Engineering [2428 items]



    Qatar University Digital Hub is a digital collection operated and maintained by the Qatar University Library and supported by the ITS department
