Generative AI as Tax Attorneys: Exploring Legal Understanding Through Experiments
Purpose: The research presented in this article assesses the ability of large language models (LLMs) to understand the language of law and to perform legal reasoning in tax law. Tax law was chosen for two reasons. First, its application is universal, which provides a large and easily accessible corpus of texts. Second, in European Union member states (such as Poland) it is partially harmonised. These circumstances reduce one of the recognised barriers to applying LLMs in law: the multilingual and multicultural nature of law. The study used OpenAI's latest model, GPT o1-preview, released on 12.09.2024. It is a multilingual, general-purpose model and was not trained specifically for tax law.

Design/Methodology/Approach: The research used an experimental method in which selected GPT models simulated the responses of a tax law expert. Two OpenAI models were used: GPT-4 (available from 14.03.2023) and GPT o1-preview (available from 12.09.2024). The method extends the concept of the Turing Test (Turing, 1950), in which an AI model imitates human communication and is assessed on its ability to reason logically, be creative and understand context. Four experiments were conducted. The first assessed the LLM's understanding of the language of law (the language of legal texts). The second and third assessed its understanding of legal language (the language used by lawyers). The fourth assessed its legal reasoning skills.
Findings: The results support the following conclusions regarding Polish tax law: 1) the GPT models understand and reason about the law well enough to support the work of professional tax advisors; 2) the accuracy of the legal advice provided by the GPT o1-preview model is too low for the model to provide legal advice on its own; 3) the GPT o1-preview model can predict, with high probability, the position of the NRAIC for a given factual situation; 4) the GPT o1-preview model has legal reasoning skills comparable to those of a professional lawyer, except for the ability to analyse court decisions and PTRs; 5) in the analysis of court decisions and PTRs, a strong hallucination effect was observed, affecting 50% of the analysed cases; 6) the quality of the LLM's understanding and reasoning was significantly influenced by the size of the training set and by the number of domain-specific questions asked.

Practical Implications: The results have significant practical implications. They show where GPT models can be applied in the work of tax attorneys and identify the main barriers to their practical use. The article also shows how to improve the models' accuracy and significantly reduce the hallucination effect. Putting these results into practice may significantly affect the labour market for lawyers who deal with Polish and European tax law.

Originality/Value: The research presents an original method for assessing the quality of legal reasoning, based on the Quality of Legal Reasoning Indicator. It is also the first study of its kind concerning Polish tax law. It therefore makes an important contribution to the development of Generative AI in law, especially tax law.