Model Documentation Form (GPAI Code of Practice - Transparency Chapter)

This form includes all information required to be documented under Measure 1.1 of the Transparency chapter of the GPAI Code of Practice. It allows GPAI providers to compile and maintain the required technical and compliance information in a consistent and structured manner.

1. General information

1.1. Legal name for the model provider

1.2. Model name

The unique identifier for the model (e.g. Llama 3.1-405B), including the identifier for the collection of models where applicable, and a list of the names of the publicly available versions of the concerned model covered by the Model Documentation.

1.3. Model authenticity

Evidence that establishes the provenance and authenticity of the model (e.g. a secure hash if binaries are distributed, or the URL endpoint in the case of a service), where available.
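As an illustration of the "secure hash" option, the sketch below computes a SHA-256 digest of a distributed weights file. The choice of SHA-256 and the function name `sha256_of_file` are assumptions made for this example; the form does not prescribe a particular hash algorithm.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large weight files never have
        # to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string can be recorded in this field next to the file name it refers to, so that a downstream provider can recompute and compare it.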

1.4. Release date

Date when the model was first released through any distribution channel.

1.5. Union market release date

Date when the model was placed on the Union market.

1.6. Model dependencies

If the model is the result of a modification or fine-tuning of one or more general-purpose AI models previously placed on the market, list the model name(s) (and relevant version(s) if more than one version has been placed on the market) of those model(s). Otherwise write ‘N/A’.

2. Model properties

2.1. Model architecture

A general description of the model architecture, e.g. a transformer architecture [Recommended 20 words]

2.2. Design specifications of the model

A general description of the key design specifications of the model, including rationale and assumptions made, to provide basic insight into how the model was designed. If any other please specify. [Recommended 100 words]

2.3. Input modalities

2.4. If any other, please specify

2.5. For each selected modality, please include the maximum input size, or write 'N/A' if not defined

2.6. Output modalities

2.7. If any other, please specify

2.8. For each selected modality, please include the maximum output size, or write 'N/A' if not defined

2.9. Total model size

The total number of parameters of the model, recorded with at least two significant figures, e.g. 7.3*10^10 parameters.
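Several fields in this form ask for values "recorded with at least N significant figures" in a mantissa-times-power-of-ten notation. A minimal sketch of a formatter for that notation follows; the helper name `format_sig` and its parameters are hypothetical and not part of the form.

```python
def format_sig(value: float, sig_figs: int = 2, times: str = "x") -> str:
    """Format a positive number as '<mantissa><times>10^<exponent>'.

    `sig_figs` is the number of significant figures kept in the mantissa.
    """
    if value <= 0:
        raise ValueError("value must be positive")
    if sig_figs < 1:
        raise ValueError("sig_figs must be at least 1")
    # Python's 'e' format rounds to the requested precision and
    # normalises the mantissa, e.g. 7.3e+10.
    mantissa, exponent = f"{value:.{sig_figs - 1}e}".split("e")
    return f"{mantissa}{times}10^{int(exponent)}"


print(format_sig(7.3e10, sig_figs=2, times="*"))  # 7.3*10^10
print(format_sig(3e13, sig_figs=1))               # 3x10^13
```

The same helper could be applied to the data point, compute, and energy fields later in the form, which use the same notation.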

2.10. The range within which the total number of parameters falls

3. Methods of distribution and licenses

3.1. Distribution channels

3.1.1. Distribution channels

3.2. License

3.2.1. Select the methods of distribution through which the model can be made available to downstream providers:

3.2.2. Provide a link to the model license(s) (otherwise provide a copy of the license(s) upon request from the AI Office (AIO)), or indicate that no model license exists

3.2.3. Select the type or category of license(s) under which the model can be made available to downstream providers:

3.2.4. Describe how access to the model is provided for downstream use, such as through terms of service.

3.2.5. Additional assets

4. Use

4.1. Acceptable Use Policy

Provide a link to the acceptable use policy applicable or indicate that none exists

4.2. Provide a description of the intended uses

A description of either:

(i) the uses that are intended by the provider (e.g. productivity enhancement, translation, creative content generation, data analysis, data visualisation, programming assistance, scheduling, customer support, a variety of natural language tasks, etc.) or

(ii) the uses that are restricted and/or prohibited by the provider (beyond those prohibited by EU or international law, including Article 5 AI Act), in both cases as specified in the information supplied by the provider in the instructions for use, terms and conditions, promotional or sales materials and statements, as well as in the technical documentation.

If specifying (i) or (ii) is incompatible with the nature of the license under which the model is provided, then 'N/A' can be entered. [Recommended 200 words]

4.3. Provide a description of type and nature of AI systems in which the general-purpose AI model can be integrated

A list or description of either

(i) the type and nature of AI systems into which the general-purpose AI model can be integrated or

(ii) the type and nature of AI systems into which the general-purpose AI model should not be integrated.

Examples may include autonomous systems, conversational assistants, decision support systems, creative AI systems, predictive systems, cybersecurity, surveillance, or human-AI collaboration. [Recommended up to 300 words]

4.4. Describe the technical means for model integration

A general description of the technical means (e.g. instructions for use, infrastructure, tools) required for the general-purpose AI model to be integrated into AI systems. [Recommended 100 words]

4.5. Describe the required hardware

A description of any hardware, including the version, required to use the model, where applicable. If not applicable (e.g. model offered via an API), 'N/A' should be entered. [Recommended 100 words]

4.6. Describe the required software

A description of any software, including the version, required to use the model where applicable. If not applicable, 'N/A' should be entered. [Recommended 100 words]

5. Training process

5.1. Describe the design specifications of the training process

A general description of the main steps or stages involved in the training process, including training methodologies and techniques, the key design choices, assumptions made and what the model is designed to optimise for, and the relevance of different parameters, as applicable.

For example, “the model is initialised with randomly selected weights and optimised using gradient-based optimisation via the Adam optimiser in two steps. First, the model is trained to predict the next word on a large pretraining corpus using the cross-entropy loss, passing over the data for a single epoch. Second, the model is post-trained on a dataset of human preferences for 10 epochs to align the model with human values and make it more useful in responding to user prompts”.

[Recommended 400 words]

5.2. Provide a description of the decision rationale

A description of how and why key design choices were made in model training. [Recommended 200 words]

6. Information on the data used for training, testing, and validation

6.1. Select the data types/modalities that apply:

6.2. If any other, please specify:

6.3. Select the data provenance categories that apply:

6.4. If any other, please specify:

6.5. Describe how data was obtained and selected

A description of the methods used to obtain and select training, testing, and validation data, including methods and resources used to annotate data, and models and methods used to generate synthetic data where applicable.

For data previously obtained from third parties, a description of how the provider obtained the rights to the data if not already disclosed in the public summary of training data published in accordance with Article 53(1), point (d).

[Recommended 300 words]

6.6. Provide the number of data points

The size (in number of data points) of the training, testing, and validation data respectively, together with the definition of the unit of data points (e.g. tokens or documents, images, hours of video or frames), recorded with at least one significant figure (e.g. 3x10^13 tokens).

6.7. Provide the number of data points

6.8. Describe the scope and main characteristics

A general description of the scope and main characteristics of the training, testing and validation data, such as domain (e.g. healthcare, science, law, ...), geography (e.g. global, restricted to a certain region, ...), language, modality coverage, where applicable. [Recommended 200 words]

6.9. Describe the data curation methodologies

General description of the data processing involved in transforming the acquired data into training, testing, and validation data for the model, such as cleaning (e.g. filtering out irrelevant content such as advertisements), normalisation (e.g. tokenizing), augmentation (e.g. back-translation). [Recommended 300 words]

6.10. Describe the measures to detect unsuitability of data sources

A description of any methods implemented in data acquisition or processing, if any, to detect the presence of unsuitable data sources considering the model’s intended uses, including but not limited to illegal content, child sexual abuse material (CSAM), non-consensual intimate imagery (NCII), and personal data leading to its unlawful processing. [Recommended 400 words]

6.11. Describe the measures to detect identifiable biases

A description of any methods implemented in data acquisition or processing, if any, to address the prevalence of identifiable biases in the training data. [Recommended 200 words]

7. Computational resources (during training)

7.1. Describe the training time

A description of what period is being measured along with the range that its duration falls under, within the following ranges: less than 1 month, 1 to 3 months, 3 to 6 months, more than 6 months.
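The mapping from a measured duration to one of the four ranges can be sketched as below. The form does not define the length of a month or say which range a duration of exactly 3 or 6 months belongs to, so the 30-day month and the choice to place boundary values in the lower range are assumptions of this example, as is the function name.

```python
def training_time_range(days: float, days_per_month: float = 30.0) -> str:
    """Map a training duration in days to one of the form's four ranges.

    Assumptions (not defined by the form): a month is `days_per_month`
    days, and durations of exactly 3 or 6 months fall in the lower range.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    months = days / days_per_month
    if months < 1:
        return "less than 1 month"
    if months <= 3:
        return "1 to 3 months"
    if months <= 6:
        return "3 to 6 months"
    return "more than 6 months"


print(training_time_range(90))  # 1 to 3 months
```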

7.2. Provide a description of what period is being measured

A description of what period is being measured along with the duration in wall clock days (e.g. 9x10^1 days) and in hardware days (e.g. 4x10^5 Nvidia A100 days and 2x10^5 Nvidia H100 days), both recorded with at least one significant figure.
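Hardware days can be derived from the number of devices and the wall clock duration, as in the sketch below. It assumes every device of a given type was in use for the whole measured period; the function name and the device counts in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hardware_days(allocations: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Sum device-days per hardware type.

    Each allocation is (hardware_name, device_count, wall_clock_days).
    Assumes all listed devices ran for the full number of days given.
    """
    totals: Dict[str, float] = defaultdict(float)
    for hardware_name, device_count, wall_clock_days in allocations:
        totals[hardware_name] += device_count * wall_clock_days
    return dict(totals)


# Hypothetical cluster: 4000 A100s and 2000 H100s, each used for 90 days.
print(hardware_days([("Nvidia A100", 4000, 90.0), ("Nvidia H100", 2000, 90.0)]))
```

If devices were added or removed during training, each phase can be passed as a separate allocation and the totals accumulate per hardware type.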

7.3. Measured or estimated amount of computation used for training

Reported in floating point operations and recorded up to its order of magnitude (e.g. 10^24 floating point operations).

7.4. Measured or estimated amount of computation used for training

Reported in computational operations and recorded with at least two significant figures (e.g. 2.4x10^25 floating point operations).

7.5. What is the measurement methodology?

In the absence of a delegated act adopted to detail measurement and calculation methodologies, describe the methodology used to measure or estimate the amount of computation used for training.
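Two estimation approaches that a provider might describe here are sketched below. Neither is prescribed by this form. The first uses the widely cited rule of thumb of roughly 6 floating point operations per parameter per training token for dense transformer models; the second multiplies device count, elapsed time, peak throughput, and an assumed utilisation rate. All function names and example figures are hypothetical.

```python
def flops_from_parameters(n_params: float, n_tokens: float) -> float:
    """Architecture-based estimate: about 6 FLOPs per parameter per token.

    A common approximation for dense transformers (forward plus backward
    pass); it ignores attention cost and any non-standard components.
    """
    return 6.0 * n_params * n_tokens


def flops_from_hardware(
    device_count: int,
    seconds: float,
    peak_flops_per_second: float,
    utilisation: float,
) -> float:
    """Hardware-based estimate: devices x time x peak throughput x utilisation."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    return device_count * seconds * peak_flops_per_second * utilisation


# Hypothetical model: 7.3e10 parameters trained on 3e13 tokens.
print(f"{flops_from_parameters(7.3e10, 3e13):.2e}")  # 1.31e+25
```

Whichever approach is used, the description should state its inputs and known sources of error so that the reported figure can be interpreted.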

8. Energy consumption (during training and inference)

8.1. What is the amount of energy used for training?

Measured or estimated amount of energy used for training, reported in megawatt-hours and recorded with at least two significant figures (e.g. 1.0x10^2 MWh). If the amount of energy used for training cannot be estimated due to the lack of critical information from a compute or hardware provider, enter ‘N/A’.

8.2. What is your measurement methodology?

In the absence of a delegated act adopted to detail measurement and calculation methodologies, describe the methodology used to measure or estimate the amount of energy used for training.

Where the energy consumption of the model is unknown, the energy consumption may be estimated based on information about computational resources used.

If the amount of energy used for training cannot be estimated due to a lack of critical information from a compute or hardware provider, the provider should disclose the type of information they lack.

[Recommended 100 words]
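Where energy is estimated from computational resources, one simple approach is to multiply device count, average power draw per device, and training hours, then scale by a data centre overhead factor. The sketch below illustrates this; the formula, the use of a power usage effectiveness (PUE) factor, the function name, and the example figures are assumptions for illustration and are not specified by this form.

```python
def training_energy_mwh(
    device_count: int,
    avg_power_watts: float,
    hours: float,
    pue: float = 1.0,
) -> float:
    """Estimate training energy in megawatt-hours.

    `avg_power_watts` is the assumed average draw per device and `pue`
    is an assumed data centre overhead multiplier (1.0 means none).
    """
    if pue < 1.0:
        raise ValueError("pue must be at least 1.0")
    watt_hours = device_count * avg_power_watts * hours * pue
    return watt_hours / 1e6  # 1 MWh = 1,000,000 Wh


# Hypothetical run: 1000 devices at 400 W for 90 days (2160 h), PUE 1.2.
print(training_energy_mwh(1000, 400.0, 2160.0, pue=1.2))
```

For these hypothetical inputs the estimate is about 1036.8 MWh, which would be recorded as 1.0x10^3 MWh at two significant figures.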

8.3. What is the benchmarked amount of computation used for inference?

Reported in floating point operations, recorded with at least two significant figures (e.g. 5.1x10^17 floating point operations).

This item relates to energy consumption during inference, which makes up the “energy consumption of the model” (Annex XI, 2(e), AI Act) together with energy consumption during training. Since energy consumption during inference depends on more than just the model itself, the information required for this item is limited to relevant information depending only on the model, namely computational resources used for inference.

8.4. What is the measurement methodology?

In the absence of a delegated act adopted to detail measurement and calculation methodologies, provide a description of a computational task (e.g. generating 100000 tokens) and the hardware (e.g. 64 Nvidia A100s) used to measure or estimate the amount of computation used for inference.
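For a task such as generating a fixed number of tokens, one rough way to estimate inference compute is the rule of thumb of about 2 floating point operations per parameter per generated token for a dense transformer forward pass. This approximation is an assumption of the sketch below, not a method defined by this form, and it ignores attention cost, prompt processing, and hardware effects. The function name and parameter count are hypothetical.

```python
def inference_flops(n_params: float, n_generated_tokens: int) -> float:
    """Approximate inference compute: about 2 FLOPs per parameter per token."""
    if n_params <= 0 or n_generated_tokens < 0:
        raise ValueError("n_params must be positive and token count non-negative")
    return 2.0 * n_params * n_generated_tokens


# Hypothetical 7.3e10-parameter model generating 100000 tokens.
print(f"{inference_flops(7.3e10, 100_000):.2e}")  # 1.46e+16
```

A measured figure obtained by profiling the described task on the stated hardware would normally be preferable to this kind of estimate where it is available.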

Created at: 08/16/2025

Updated on: 00/20/2026