Litig AI Benchmark Initiative - Transparency Statement

Legal AI Transparency Statement – [Insert Name of the Product and Provider]

Definition: Inspired by Google’s AI “model cards”, this Legal AI Transparency Statement provides standardised information about a legal AI product, including the underlying AI models used, its intended use, performance, limitations, how testing has been conducted and ethical considerations. The guidance under each section consists of example questions that may be relevant; it is not an exhaustive list.

Guidance on level of transparency: Providers of AI products are encouraged to be as transparent as possible. Only information that could reasonably be considered “commercially confidential” or linked to a competitive advantage should require a confidentiality agreement.

A.   Basic Details:

  1. Name and version of Product:

  2. Provider of Product:

  3. Date Transparency Statement Updated:

  4. Underlying / Foundation AI Vendor(s), Model Name(s) and Version(s): e.g., OpenAI’s GPT-4o

B.   Technical Details

  1.  Hosting and Sub-Processors:

    Guidance: Please explain how / where the product is hosted (e.g., hosted in Europe in a Microsoft Azure data centre). Do you offer a customer-managed encryption key solution? Please also list any key sub-processors, with brief details of the data these sub-processors have access to and the hosting / processing arrangements.

  2.  Underlying Model Orchestration

    Guidance: If multiple underlying AI models are used, please explain: (i) how and where each model is hosted and how data flows / is protected, including whether any third-party APIs (e.g., OpenAI, Anthropic, etc.) use zero-retention endpoints or equivalent privacy controls; and (ii) how the product manages the orchestration between the models. Can customers choose between different LLMs within the product?
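
    Example (illustrative only): the sketch below shows one way an orchestration layer might route tasks between underlying models. The model names, regions and zero-retention flag are hypothetical and do not represent any required design.

      # Minimal sketch of a model-orchestration layer (illustrative only).
      # Model names, regions and the zero-retention flag are hypothetical.
      from dataclasses import dataclass

      @dataclass
      class ModelRoute:
          name: str             # underlying model behind this route
          region: str           # where the endpoint is hosted
          zero_retention: bool  # endpoint retains no prompts or outputs

      ROUTES = {
          "drafting": ModelRoute("model-a", "eu-west", zero_retention=True),
          "review": ModelRoute("model-b", "eu-west", zero_retention=True),
      }

      def pick_route(task: str) -> ModelRoute:
          # A transparency statement would document this mapping and the
          # data flows behind each endpoint.
          return ROUTES.get(task, ROUTES["drafting"])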

  3. Data Retention and Administration

    Guidance: Please explain how long data (input and output) is retained. Please confirm that any input data (e.g., user prompts, chat history, file uploads, API calls) is not used to train or improve the product or any underlying model. If any input data is used for any purpose other than to provide a response to the user in the live session / prompt thread (i.e., the product learns or adapts based on user input over time), please describe how input data is used. Can the customer control retention periods? How is access to any retained data (i.e., logs of queries, etc.) controlled?

  4. APIs and MCP

    Guidance: Please state whether your Product has an open API and whether it supports the Model Context Protocol (MCP).

C.   Intended Use:

  1.  Purpose:

    Guidance: This section is intended to capture information about the problem(s) that the product is designed to solve; how the product leverages AI (specifically LLMs or other generative AI models); who the intended users of the product are; the product’s key features / functionality; and how the product fits into existing legal workflows (e.g., document management, review, approvals).

  2. Use Cases:

    Guidance: This section should explain the primary legal use cases where the product is expected to perform reliably, and any legal use cases where the product is not recommended or is not expected to perform reliably (e.g., legal research, contract analysis, document review, predictive analytics). It would also be helpful to list any core constraints or dependencies (e.g., does the product primarily support a specific language, jurisdiction or area of law? Does the product require any integrations or data in a specific or preferred format?).

D.   Data Inputs, Training Sets, System Prompts / Workflows and Agents:

  1.  Training / Grounding Dataset Description:

    Guidance: This section should describe any data that the product has been trained / fine-tuned / configured or grounded with (either in the development or the use of the product) and explain: (i) how this data supports or represents the target use cases for the product; (ii) how recent and relevant this data is to legal applications; and (iii) if relevant, the jurisdictions and areas of law of this data.

    Note: you do not need to describe the data that any underlying foundation model has been trained with unless you are using a proprietary model.

  2. Web / external source grounding:

    Guidance: It is important that any connection to external sources does not expose confidential information to the internet. If the product grounds responses based on external (e.g., web) searches / data, please explain how the product ensures that no confidential information entered into the product (i.e., prompts or contents of documents) is exposed to the internet.
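
    Example (illustrative only): one common pattern is to screen outgoing queries before they leave the product’s trust boundary. The patterns below are far too crude for production use and are shown only to make the idea concrete.

      # Sketch: screen outgoing search queries before they reach the web.
      # The patterns are illustrative, not an adequate real-world filter.
      import re

      CONFIDENTIAL_PATTERNS = [
          re.compile(r"\b[A-Z][a-z]+ v\.? [A-Z][a-z]+\b"),  # party names, e.g. "Smith v Jones"
          re.compile(r"\b\d{6,}\b"),                        # long reference numbers
      ]

      def safe_web_query(query: str) -> str:
          # Block, rather than send, a query containing flagged material.
          for pattern in CONFIDENTIAL_PATTERNS:
              if pattern.search(query):
                  raise ValueError("query blocked: possible confidential content")
          return query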

  3. Data Preparation and Maintenance of Datasets:

    Guidance: How was the data cleaned and prepared for training / use in the product? (NOTE: avoid specifics where this is proprietary information and provide a general overview, e.g., “a large dataset of legal case reports, legislation, and commentary curated by subject matter experts underwent a cleaning and preparation process.”) Include general cleaning steps, e.g., removal of duplicates, handling of missing values, text normalisation. Was any of the dataset manually reviewed or annotated by legal professionals? Was any synthetic data used? How do you ensure the data used / referenced by the product is maintained so that it remains relevant and up to date? How do you ensure you have all of the rights necessary to use the data used / referenced by the product?

  4.  System Prompts and Workflows Governance

    Guidance: How do you create, manage and control any system- or feature-level prompts, agents and workflows that are not visible to the user? What versioning and change control governance do you have in place to ensure that the risk of changes is properly managed? How do you keep customers updated on changes?
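
    Example (illustrative only): change control over hidden prompts can be made auditable by versioning each prompt and recording a content hash, as in the sketch below; the field names are hypothetical.

      # Sketch: version-controlled registry of system prompts, so that
      # every change is attributable and detectable. Fields are hypothetical.
      import hashlib
      from datetime import date

      def register_prompt(registry: dict, name: str, version: str, text: str) -> None:
          # Any later edit to the prompt text changes the hash, which is
          # what a change-control review would pick up.
          registry[(name, version)] = {
              "sha256": hashlib.sha256(text.encode()).hexdigest(),
              "released": date.today().isoformat(),
              "text": text,
          }

      registry: dict = {}
      register_prompt(registry, "contract-review", "1.2.0",
                      "You are a careful legal assistant...")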

  5. Agents and automated decision making

    Guidance: Does your product include any agentic or automated decision-making that is not transparent to the user?

  6. Customer Knowledge

    Guidance: Can customers / users of the product bring their own legal data or knowledge base to augment or fine-tune the product (RAG, embeddings, etc.)?
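
    Example (illustrative only): a minimal retrieval-augmented generation (RAG) loop over a customer’s own knowledge base might look like the sketch below. The toy hashing “embedding” stands in for a real embedding model.

      # Sketch of RAG over a customer-supplied knowledge base.
      import numpy as np

      def embed(text: str, dim: int = 64) -> np.ndarray:
          # Toy bag-of-words hashing vector; a real product would call an
          # embedding model here.
          vec = np.zeros(dim)
          for token in text.lower().split():
              vec[hash(token) % dim] += 1.0
          norm = np.linalg.norm(vec)
          return vec / norm if norm else vec

      def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
          # Return the k documents most similar to the query; in a real
          # product these passages would be inserted into the model prompt.
          scores = [float(embed(query) @ embed(d)) for d in documents]
          ranked = sorted(zip(scores, documents), reverse=True)
          return [doc for _, doc in ranked[:k]]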

  7. Input Data:

    Guidance: Please describe the type of input data the product expects (e.g., text, images, attachments) and any limits or recommendations (e.g., token limits from the context window size) on the length of prompts / length of documents / number of documents that can be used effectively in the product (including what you have tested). Does the product have an API available?
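
    Example (illustrative only): a product might run a pre-flight check that an input fits the model’s context window, as sketched below. The limits and the four-characters-per-token heuristic are rough assumptions; a real product would use the underlying model’s own tokenizer.

      # Sketch: pre-flight check that an input fits the context window.
      CONTEXT_WINDOW_TOKENS = 128_000  # hypothetical limit
      RESERVED_FOR_OUTPUT = 4_000      # head-room kept for the response

      def fits_in_context(prompt: str, documents: list[str]) -> bool:
          # Rough rule of thumb: about 4 characters per token in English.
          total_chars = len(prompt) + sum(len(d) for d in documents)
          estimated_tokens = total_chars // 4
          return estimated_tokens <= CONTEXT_WINDOW_TOKENS - RESERVED_FOR_OUTPUT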

E.   Data Outputs and Evaluation:

  1.  Output Data:

    Guidance: Briefly describe the type of output data the product generates (e.g., text responses) and how this is engineered to support legal use cases and minimise the risk of hallucinations. It might be useful to include an example output. 

  2. Testing, Metrics and Results:

    Guidance: This section explores how you have tested the accuracy of the product’s output. Please explain what methods and metrics are used to measure the product’s performance and consistency. It would be helpful to include information about the testing methodology, the datasets and use cases tested, the volume of testing, any metrics you use to evaluate performance (e.g., accuracy, precision / recall, F1 score, ROUGE for summarisation, hallucination rate) and the scoring / assessment criteria. Has the product been tested by a third party / benchmarked against any third-party standards (e.g., contract clause extraction challenges)? How do you measure changes in performance over time (e.g., as underlying models change, you introduce new reference sources, or you make engineering changes to improve responses)?
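
    Example (illustrative only): for extraction-style use cases, precision, recall and F1 can be computed against a gold-standard answer set as in the sketch below; the clause identifiers are invented.

      # Sketch: precision / recall / F1 over a labelled evaluation set,
      # e.g. extracted contract clauses versus a gold standard.
      def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
          true_positives = len(predicted & gold)
          precision = true_positives / len(predicted) if predicted else 0.0
          recall = true_positives / len(gold) if gold else 0.0
          f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
          return precision, recall, f1

      # The product found 3 clauses, 2 of which are in the gold set of 4.
      print(precision_recall_f1({"c1", "c2", "c5"}, {"c1", "c2", "c3", "c4"}))
      # -> (0.667, 0.5, 0.571), to three decimal places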

  3. Hallucinations:

    Guidance: What steps have you taken to reduce or minimise hallucinations, and to help users detect and mitigate hallucinations or other situations in which the product may produce an incorrect or misleading response? Do you have thresholds or recommendations for when and how a user should review outputs? Does the product provide confidence scores or other indicators? What types of prompts or tasks tend to reduce accuracy or reliability?
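
    Example (illustrative only): one simple hallucination signal is a citation in the output that does not correspond to any retrieved source. The [doc:ID] citation format below is hypothetical and would follow a product’s own conventions.

      # Sketch: flag answers that cite sources outside the retrieved
      # context, one simple signal of a possibly hallucinated citation.
      import re

      def uncited_references(answer: str, retrieved_ids: set[str]) -> set[str]:
          cited = set(re.findall(r"\[doc:([\w-]+)\]", answer))
          return cited - retrieved_ids

      flags = uncited_references("See [doc:a1] and [doc:z9].", {"a1", "b2"})
      # flags == {"z9"} -> surface a warning to the user for review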

  4. Ethical Considerations:

    Guidance: Please identify and discuss any potential ethical concerns arising from the product’s use, including potential bias and mitigation strategies. How do you test for and mitigate bias in product outputs? What ethical frameworks and responsible AI guidelines or principles guide your product development, evaluation and sales processes? How do you ensure responsible AI use in high-stakes legal scenarios (e.g., employment, criminal)? What mechanisms exist for users to provide feedback or flag problematic AI behaviour?

F.   Regulations & Compliance:

  1.  AI Principles:

    Guidance: Does your Product align with any published AI principles? For example, the UK government’s five principles for AI (safety, security and robustness; appropriate transparency and explainability; fairness; accountability and governance; contestability and redress), Microsoft’s Responsible AI Principles, etc.

  2. EU AI Act:

    Guidance: What assessments have you made in relation to the product against the EU AI Act? What risk category do you consider the product to be in? How do you comply with your ‘provider’ obligations? Do you have relevant end-user usage guidelines or documentation?

  3. Environmental Impact:

    Guidance: All organisations need to ensure they understand and, where they can, reduce their environmental impact. What is the estimated carbon footprint for training and running your product? Do you track and report the energy consumption of everyday use, not just training? How do you track / estimate emissions from cloud infrastructure or compute resources? Do you have a strategy for reducing emissions over time (e.g., carbon offsets, model improvements, energy sourcing)? Do you use cloud services with renewable energy commitments or carbon-neutral datacentres?

1st September 2025 • Version 1.0

All feedback and suggestions are welcomed to help enhance the Transparency Statement. You can make comments or download the document here: Transparency Statement