Legal Document Classification Using Generative AI

A Retrieval-Augmented Generation (RAG) and Gemini Approach

Authors

DOI:

https://doi.org/10.20873/uft.2675-3588.2026.v7n3.p19-24

Keywords:

Legal AI, RAG, LLMs, Text Classification, Gemini, Data Augmentation

Abstract

The Brazilian Judiciary faces a massive volume of digital lawsuits, which makes manual screening costly and error-prone. This work investigates the use of Generative Artificial Intelligence, specifically Large Language Models (LLMs), to automate the classification of legal petitions. The research follows a methodological evolution in three stages: (i) an initial approach based on few-shot learning, which established a 56% accuracy baseline; (ii) refinement through prompt engineering with N-grams and Data Augmentation techniques to address class imbalance, which achieved 85% accuracy; and (iii) a Retrieval-Augmented Generation (RAG) architecture that connects Google's Gemini 2.5 model to a vector knowledge base. The experiments used real datasets from the Court of Justice of Tocantins (TJTO), covering themes from the Superior Courts (STF/STJ). The RAG approach achieved 84% accuracy in a complex scenario of 11 thematic classes, mitigating the hallucinations and semantic ambiguities found in the previous stages.
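To make the RAG stage concrete, the following is a minimal sketch of retrieval-augmented classification, not the authors' implementation. It assumes a toy bag-of-words embedding in place of a real embedding model, an in-memory list in place of the vector knowledge base, and a caller-supplied `llm` function standing in for the Gemini call. The names `embed`, `retrieve`, `build_prompt`, `classify`, the sample texts, and the labels are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, knowledge_base, k=2):
    """Return the k labeled examples most similar to the query."""
    q = embed(query)
    ranked = sorted(
        knowledge_base,
        key=lambda item: cosine(q, embed(item["text"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(petition, neighbors, labels):
    """Assemble a prompt that grounds the model in the retrieved examples."""
    examples = "\n".join(f"- [{n['label']}] {n['text']}" for n in neighbors)
    return (
        "Classify the petition into exactly one of these themes: "
        + ", ".join(labels)
        + "\n\nSimilar labeled petitions:\n"
        + examples
        + "\n\nPetition:\n"
        + petition
        + "\n\nAnswer with the theme name only."
    )


def classify(petition, knowledge_base, labels, llm, k=2):
    """Retrieve context, query the model, and reject answers outside the label set."""
    neighbors = retrieve(petition, knowledge_base, k=k)
    answer = llm(build_prompt(petition, neighbors, labels)).strip()
    return answer if answer in labels else None


# Hypothetical knowledge base; real entries would be labeled TJTO petitions.
KNOWLEDGE_BASE = [
    {"text": "request for health plan coverage of medical treatment", "label": "health"},
    {"text": "dispute over bank loan interest rates", "label": "banking"},
    {"text": "claim for supply of medication by the state", "label": "health"},
]
LABELS = ["health", "banking"]
```

In this sketch, restricting the accepted answer to the known label set is one simple guard against hallucinated classes; whether the published system uses a similar check is not stated in the abstract.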

Published

2026-05-02

How to Cite

[1]
Dias Santana, R. and Rocha, M.L. 2026. Legal Document Classification Using Generative AI: A Retrieval-Augmented Generation (RAG) and Gemini Approach. Academic Journal on Computing, Engineering and Applied Mathematics. 7, 3 (May 2026), 19–24. DOI:https://doi.org/10.20873/uft.2675-3588.2026.v7n3.p19-24.

Section

Research Papers
