  •   Qatar University Digital Hub
  • Qatar University Institutional Repository
  • Academic
  • Student Thesis & Dissertations
  • College of Engineering
  • Computing

    AUTOMATING INFORMATION EXTRACTION FROM PEROVSKITE SOLAR CELLS LITERATURE USING LARGE LANGUAGE MODELS

    View/Open
    Radwa Gad_ OGS Approved Thesis.pdf (3.075Mb)
    Date
    2025-06
    Author
    GAD, RADWA ESSAM
    Abstract
    With the rapid advancement of perovskite solar cell (PSC) research, efficiently extracting structured data from scientific literature has become essential for accelerating materials discovery and development. PSC studies often report multiple device configurations within a single paper, making traditional single-device extraction approaches insufficient. In this thesis, we are the first to propose an automated information extraction pipeline that leverages large language models (LLMs) to extract structured attributes for all reported devices in PSC research papers. Our experiments use both open-source and closed-source LLMs, including GPT-4o-mini, LLaMA 3.1 70B, and Qwen 2.5 72B, to evaluate the pipeline across a range of model architectures. Additionally, we introduce the first multi-device evaluation framework, which uses an optimization-based matching algorithm. We also define a wide range of PSC-specific attributes, carefully selected to enhance the practical utility of the extracted dataset for researchers. Our experimental results demonstrate that the proposed pipeline outperforms existing approaches, achieving an F1 score of 90.06% for champion-device extraction, an F1 score of 78.70% for multi-device extraction, and an F1 score of 90.98% for the best device in multi-device extraction. These findings highlight the effectiveness of our approach in delivering a scalable, reproducible, and efficient solution for automating structured information extraction from PSC literature.
    DOI/handle
    http://hdl.handle.net/10576/66442
    Collections
    • Computing [110 items]



    Qatar University Digital Hub is a digital collection operated and maintained by the Qatar University Library and supported by the ITS department
