Legal Document Classification Using Generative AI
A Retrieval-Augmented Generation (RAG) and Gemini Approach
DOI:
https://doi.org/10.20873/uft.2675-3588.2026.v7n3.p19-24
Keywords:
Legal AI, RAG, LLMs, Text Classification, Gemini, Data Augmentation
Abstract
The Brazilian Judiciary faces a critical challenge: the massive volume of digital lawsuits makes manual screening costly and error-prone. This work investigates the use of Generative Artificial Intelligence, specifically Large Language Models (LLMs), to automate the classification of legal petitions. The research presents a methodological evolution in three stages: (i) an initial approach based on few-shot learning, which established a 56% accuracy baseline; (ii) refinement through prompt engineering with N-grams and Data Augmentation techniques to address class imbalance, which achieved 85% accuracy; and (iii) the implementation of a Retrieval-Augmented Generation (RAG) architecture that connects Google's Gemini 2.5 model to a vector knowledge base. The experiments used real datasets from the Court of Justice of Tocantins (TJTO), covering themes from the Superior Courts (STF/STJ). The final results show that the RAG approach achieved 84% accuracy in a complex scenario of 11 thematic classes, while mitigating the hallucinations and semantic ambiguities observed in the previous stages.
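As a rough illustration of the retrieval step behind stage (iii), the sketch below ranks entries of a small knowledge base against a petition and assembles a classification prompt from the best matches. It is a minimal sketch under stated assumptions, not the authors' implementation: the theme labels, the sample texts, the bag-of-words stand-in for an embedding model, and the function names `embed`, `retrieve`, and `build_prompt` are all hypothetical, and the call to the language model is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base: (theme label, short description).
# Labels and texts are illustrative only; they are not the TJTO data.
KNOWLEDGE_BASE = [
    ("Theme A", "health plan coverage refusal of medical treatment"),
    ("Theme B", "bank loan contract abusive interest rates"),
    ("Theme C", "public servant salary adjustment retroactive payment"),
]


def embed(text):
    """Stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(petition, k=2):
    """Return the k knowledge-base entries most similar to the petition."""
    query = embed(petition)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda entry: cosine(query, embed(entry[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(petition, k=2):
    """Assemble a prompt that grounds the model in the retrieved themes."""
    context = "\n".join(
        f"- {label}: {text}" for label, text in retrieve(petition, k)
    )
    return (
        "Classify the petition into one of the candidate themes.\n"
        f"Candidate themes:\n{context}\n"
        f"Petition:\n{petition}\n"
        "Answer with the theme label only."
    )


if __name__ == "__main__":
    print(build_prompt("The bank charged abusive interest rates on my loan"))
```

In a real pipeline the resulting prompt would be sent to the generative model, and the bag-of-words vectors would be replaced by dense embeddings stored in a vector database. Restricting the model to retrieved candidates is one plausible way such a design limits off-list answers.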
License
Copyright (c) 2026 Ruan Dias Santana, Marcelo Lisboa Rocha

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-NonCommercial License (CC BY-NC 4.0), which allows the work to be shared with acknowledgment of its authorship and initial publication in this journal;
- Authors are authorized to enter into separate, additional contracts for the non-exclusive distribution of the version of the work published in this journal (e.g., publishing in an institutional repository or as a book chapter), with acknowledgment of authorship and initial publication in this journal;
- Authors are allowed and encouraged to post and distribute their work online (e.g., in institutional repositories or on their personal page) at any point after the editorial process;
- In addition, the AUTHOR is informed and agrees with the journal that the paper may be incorporated by the AJCEAM into existing or future scientific information systems and databases (indexers and databases), under the conditions defined by the latter at all times, which will involve at least the possibility that the holders of these databases may perform the following actions on the paper:
- Reproduce, transmit and distribute the paper in whole or in part in any form or means of existing or future electronic transmission, including electronic transmission for research, viewing and printing purposes;
- Reproduce and distribute all or part of the article in print;
- Translate certain parts of the paper;
- Extract figures, tables, illustrations, and other graphic objects and capture metadata, captions, and related article for research, visualization, and printing purposes;
- Transmit, distribute, and reproduce the paper through agents or distributors authorized by the database owners;
- Prepare bibliographic citations, summaries, and indexes, and capture related references from selected parts of the paper;
- Scan and/or store electronic article images and text.

