[IMP] extract_api: document the new API version

We have introduced a second version of the Extract API that adds the IAP
account token to the `get_result` and `validate` routes.

There are also other changes in this commit:
 - The deprecated routes and fields have been removed, as they have been
   deprecated for a while and will no longer be supported.
 - The `status` table of the `get_result` route, which was missing, has
   been added.
 - The documentation regarding the `/validate` routes has been removed
   for security reasons.

closes odoo/documentation#6283

X-original-commit: a8c6272899
Signed-off-by: Antoine Vandevenne (anv) <anv@odoo.com>
Signed-off-by: Louis Baudoux (lba) <lba@odoo.com>
Louis Baudoux 2023-10-30 16:58:29 +01:00
parent 86207443b1
commit cdc510295c
2 changed files with 67 additions and 340 deletions


@ -7,7 +7,7 @@ or **resumes**.
The service scans documents using an :abbr:`OCR (Optical Character Recognition)` engine and then
uses :abbr:`AI(Artificial Intelligence)`-based algorithms to extract fields of interest such as the
total, due date, or invoice lines for *invoices*, the total, date, payment reference for *expenses*,
total, due date, or invoice lines for *invoices*, the total, date for *expenses*,
or the name, email, phone number for *resumes*.
This is a paid service: processing each document costs one credit.
@ -24,6 +24,8 @@ Overview
The extract API uses the JSON-RPC2_ protocol; its endpoint routes are located at
`https://extract.api.odoo.com`.
.. _extract_api/version:
Version
-------
@ -37,18 +39,15 @@ The latest versions are:
Flow
----
The flow is the same for the three services.
The flow is the same for each document type.
#. | Call :ref:`/parse <extract_api/parse>` to submit your invoices (one call for each
invoice). On success, you receive a `document_uuid` in the response.
#. | Call :ref:`/parse <extract_api/parse>` to submit your documents (one call for each
document). On success, you receive a `document_token` in the response.
#. | You then have to regularly poll :ref:`/get_result <extract_api/get_result>` to get the
document's parsing status.
| Alternatively, you can provide a `webhook_url` at the time of the call to
:ref:`/parse <extract_api/parse>` and you will be notified (via a POST request) when the
result is ready.
#. | Once the result is received, you can validate it by calling
:ref:`/validate <extract_api/validate>` and sending the expected values.
| This step is optional but greatly helps the system to improve.
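The polling step of this flow can be sketched as a small helper. This is a minimal sketch, not part of the API: `poll_get_result` and its parameters are hypothetical names, `fetch` is assumed to perform one `/get_result` call and return the decoded JSON-RPC response, and treating `processing` as the pending status follows the bundled `implementation.py`, which keeps polling while `status == 'processing'`.

```python
import time
from typing import Any, Callable, Dict


def poll_get_result(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call ``fetch`` until the document is no longer being processed.

    ``fetch`` is assumed to perform one /get_result call and return the
    decoded JSON-RPC response (a dict with a "result" key).
    """
    for attempt in range(max_attempts):
        response = fetch()
        # "processing" is the pending status polled on in implementation.py.
        if response["result"]["status"] != "processing":
            return response
        # Do not sleep after the last attempt; we are about to give up.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"document still processing after {max_attempts} attempts")
```

Injecting `sleep` keeps the helper testable without waiting; bounding the number of attempts avoids polling forever if a document never leaves the `processing` state.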
The HTTP POST method should be used for all of them. A Python implementation of the full flow for
invoices can be found :download:`here <extract_api/implementation.py>` and a token for integration
@ -59,20 +58,17 @@ testing is provided in the
Parse
=====
Request the processing of a document from the OCR. The route will return a `document_uuid`
(it replaces `document_id`, which is deprecated); you can use it to obtain the result of your
request.
Request the processing of a document from the OCR. The route will return a `document_token`,
which you can use to obtain the result of your request.
.. _extract_api/parse:
Routes
------
- /api/extract/invoice/1/parse
- /api/extract/expense/1/parse
- /api/extract/applicant/1/parse
- |ss| /iap/invoice_extract/parse |se| (deprecated)
- |ss| /iap/expense_extract/parse |se| (deprecated)
- /api/extract/invoice/2/parse
- /api/extract/expense/2/parse
- /api/extract/applicant/2/parse
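All version 2 routes follow the pattern `/api/extract/<document type>/2/<action>` under `https://extract.api.odoo.com`. A minimal sketch of building them, assuming only the `parse` and `get_result` actions documented here; `extract_route` is a hypothetical helper. The `2` in the path is the route version, which is separate from the `version` parameter sent in the request body.

```python
BASE_URL = "https://extract.api.odoo.com"  # no trailing slash
DOC_TYPES = ("invoice", "expense", "applicant")
ACTIONS = ("parse", "get_result")  # the version 2 actions documented here


def extract_route(doc_type: str, action: str, route_version: int = 2) -> str:
    """Build a full extract API URL (hypothetical helper)."""
    if doc_type not in DOC_TYPES:
        raise ValueError(f"unknown document type: {doc_type!r}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # The route version in the path is not the ``version`` request parameter.
    return f"{BASE_URL}/api/extract/{doc_type}/{route_version}/{action}"
```

Keeping the base URL without a trailing slash avoids a double slash, since every path starts with `/`.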
Request
-------
@ -91,10 +87,9 @@ Request
``account_token`` (required)
The token of the account from which credits will be taken. Each successful call costs one
credit.
``version`` (optional)
The version will determine the format of your requests and the format of the server
response. Some results can be unavailable in older versions. If not specified, the latest
version will be used.
``version`` (required)
The version will determine the format of your requests and the format of the server response.
You should use the :ref:`latest version available <extract_api/version>`.
``documents`` (required)
The document must be provided as a string in the ASCII encoding. The list should contain
only one string. If multiple strings are provided only the first string corresponding to a
@ -105,7 +100,7 @@ Request
Unique identifier of the Odoo database.
``webhook_url`` (optional)
A webhook URL can be provided. An empty POST request will be sent to
``webhook_url/document_uuid`` when the result is ready.
``webhook_url/document_token`` when the result is ready.
``user_infos`` (optional)
Information concerning the person sending the document to the extract service. It can be
the client or the supplier (depending on the ``perspective``). This information is not
@ -141,9 +136,11 @@ Request
"jsonrpc": "2.0",
"method": "call",
"params": {
"account_token": string (hex),
"account_token": string,
"version": int,
"documents": [string],
"dbuuid": string,
"webhook_url": string,
"user_infos": {
"user_company_vat": string,
"user_company_name": string,
@ -171,33 +168,25 @@ Response
``id``
see JSON-RPC2_
``result``
Dictionary containing the following content.
.. rst-class:: o-definition-list
``status`` (replaces ``status_code``)
The code indicating the status of the request. "success" in case of success. Other values
are detailed in the table below.
``status``
The code indicating the status of the request. See the table below.
``status_msg``
A string giving verbose details about the request status.
``document_uuid`` (replaces ``document_id``)
Only present if the request is successful.
``status_code`` (deprecated)
The code indicating the status of the request. 0 in case of success. Other values are
detailed in the table below.
``document_id`` (deprecated)
``document_token``
Only present if the request is successful.
========================== ============================================ ==========================
status status_msg status_code (deprecated)
========================== ============================================ ==========================
`success` Success 0
`error_internal` An error occurred 2
`error_no_credit` You don't have enough credit 3
`error_unsupported_format` Unsupported file format 6
`error_maintenance` Server is currently under maintenance.
Please try again later. 9
========================== ============================================ ==========================
=========================== ==============================================================
status status_msg
=========================== ==============================================================
`success` Success
`error_unsupported_version` Unsupported version
`error_internal` An error occurred
`error_no_credit` You don't have enough credit
`error_unsupported_format` Unsupported file format
`error_maintenance` Server is currently under maintenance, please try again later
=========================== ==============================================================
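A caller typically only needs the `document_token` on `success` and an error otherwise. The sketch below assumes that behaviour; `ExtractError`, `document_token_from_parse` and the decision to flag only `error_maintenance` as worth retrying later are illustrative choices, not part of the API. The fallback messages are copied from the table above.

```python
from typing import Any, Dict

# Messages from the /parse status table, used as a fallback when the
# response carries no status_msg.
PARSE_STATUS_MESSAGES = {
    "success": "Success",
    "error_unsupported_version": "Unsupported version",
    "error_internal": "An error occurred",
    "error_no_credit": "You don't have enough credit",
    "error_unsupported_format": "Unsupported file format",
    "error_maintenance": "Server is currently under maintenance, please try again later",
}


class ExtractError(Exception):
    """Hypothetical exception carrying the API-level status."""

    def __init__(self, status: str, status_msg: str) -> None:
        super().__init__(f"{status}: {status_msg}")
        self.status = status
        self.status_msg = status_msg
        # Assumption: only maintenance is worth retrying later as-is.
        self.retry_later = status == "error_maintenance"


def document_token_from_parse(result: Dict[str, Any]) -> str:
    """Return the document_token from a /parse ``result`` or raise ExtractError."""
    status = result.get("status", "")
    if status == "success":
        return result["document_token"]
    status_msg = result.get("status_msg") or PARSE_STATUS_MESSAGES.get(status, "Unknown status")
    raise ExtractError(status, status_msg)
```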
.. code-block:: js
@ -206,22 +195,11 @@ status status_msg status
"id": string,
"result": {
"status": string,
"status_code": int, // deprecated
"status_msg": string,
"document_uuid": string,
// "document_id": int, // deprecated
"document_token": string,
}
}
.. warning::
The ``document_id`` field is deprecated and will be removed in the future. Please use
``document_uuid`` instead.
Version introducing ``document_uuid``:
- invoices: 122
- expenses: 132
- applicant: 102
.. note::
The API does not actually use the JSON-RPC error scheme. Instead, the API has its own error
scheme bundled inside a successful JSON-RPC result.
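Because errors are reported inside `result`, checking the JSON-RPC `error` member alone is not enough. A minimal sketch that unwraps the envelope and leaves the API-level `status` to the caller; `unwrap_extract_response` is a hypothetical name, and the check for a standard JSON-RPC `error` member is a defensive assumption rather than documented behaviour.

```python
import json
from typing import Any, Dict


def unwrap_extract_response(body: str) -> Dict[str, Any]:
    """Return the ``result`` member of a raw extract API JSON-RPC response.

    A successful JSON-RPC response can still carry an API error, so the
    caller must inspect ``result["status"]``.
    """
    envelope = json.loads(body)
    # Defensive: a standard JSON-RPC error object, if one is ever returned.
    if envelope.get("error") is not None:
        raise RuntimeError(f"JSON-RPC error: {envelope['error']}")
    if "result" not in envelope:
        raise ValueError("response has neither 'result' nor 'error'")
    return envelope["result"]
```

Note that a response such as `error_no_credit` unwraps without raising: it is a successful JSON-RPC call whose `status` reports the failure.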
@ -234,11 +212,9 @@ Get results
Routes
------
- /api/extract/invoice/1/get_result
- /api/extract/expense/1/get_result
- /api/extract/applicant/1/get_result
- |ss| /iap/invoice_extract/get_result |se| (deprecated)
- |ss| /iap/expense_extract/get_result |se| (deprecated)
- /api/extract/invoice/2/get_result
- /api/extract/expense/2/get_result
- /api/extract/applicant/2/get_result
Request
-------
@ -252,16 +228,14 @@ Request
``id`` (required)
see JSON-RPC2_
``params``
Dictionary containing the following content.
.. rst-class:: o-definition-list
``version`` (required)
|SAME_AS_PARSE|
``documents_uuid`` (required, replaces ``documents_id``)
The ``document_uuid`` for which you want to get the current parsing status.
``documents_id`` (deprecated)
The ``document_id`` for which you want to get the current parsing status.
The version should match the version passed to the :ref:`/parse <extract_api/parse>` request.
``document_token`` (required)
The ``document_token`` for which you want to get the current parsing status.
``account_token`` (required)
The token of the account that was used to submit the document.
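A minimal sketch of the version 2 `/get_result` request body, built from the parameters above. `build_get_result_request` is a hypothetical helper, and generating a random `id` with `uuid4` is an arbitrary choice (JSON-RPC only requires the server to echo the `id` back). The type check reflects that `/parse` returns `document_token` as a string.

```python
import uuid
from typing import Any, Dict, Optional


def build_get_result_request(
    version: int,
    document_token: str,
    account_token: str,
    request_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a JSON-RPC body for a version 2 /get_result call (hypothetical helper)."""
    if not isinstance(document_token, str):
        # /parse returns document_token as a string.
        raise TypeError("document_token must be a string")
    return {
        "jsonrpc": "2.0",
        "method": "call",
        "params": {
            # Should match the version passed to /parse.
            "version": version,
            "document_token": document_token,
            "account_token": account_token,
        },
        # The server echoes this id back in its response.
        "id": request_id if request_id is not None else uuid.uuid4().hex,
    }
```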
.. code-block:: js
@ -270,8 +244,8 @@ Request
"method": "call",
"params": {
"version": int,
// "documents_id": int, // deprecated
"documents_uuid": int,
"document_token": string,
"account_token": string,
},
"id": string,
}
@ -290,16 +264,12 @@ are the name of the field and the value is the value of the field.
``id``
see JSON-RPC2_
``result``
Dictionary containing the following content.
.. rst-class:: o-definition-list
``status``
|SAME_AS_PARSE|
``status_code``
|SAME_AS_PARSE|
The code indicating the status of the request. See the table below.
``status_msg``
|SAME_AS_PARSE|
A string giving verbose details about the request status.
``results``
Only present if the request is successful.
@ -308,6 +278,21 @@ are the name of the field and the value is the value of the field.
``full_text_annotation``
Contains the unprocessed full result from the OCR for the document.
================================ =============================================================
status status_msg
================================ =============================================================
`success` Success
`error_unsupported_version` Unsupported version
`error_internal` An error occurred
`error_maintenance` Server is currently under maintenance, please try again later
`error_document_not_found` The document could not be found
`error_unsupported_size` The document has been rejected because it is too small
`error_no_page_count` Unable to get page count of the PDF file
`error_pdf_conversion_to_images` Couldn't convert the PDF to images
`error_password_protected` The PDF file is protected by a password
`error_too_many_pages` The document contains too many pages
================================ =============================================================
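When polling, these statuses fall into a few groups. A minimal sketch; the grouping is an assumption: `processing` is not listed in the table but is the pending status polled on in `implementation.py`, `error_maintenance` is treated as worth retrying later based on its message, and every other listed error is treated as final. `classify_get_result` is a hypothetical helper.

```python
# Statuses from the /get_result table other than success and maintenance.
FINAL_ERRORS = frozenset({
    "error_unsupported_version",
    "error_internal",
    "error_document_not_found",
    "error_unsupported_size",
    "error_no_page_count",
    "error_pdf_conversion_to_images",
    "error_password_protected",
    "error_too_many_pages",
})


def classify_get_result(status: str) -> str:
    """Group a /get_result status (assumed grouping, not defined by the API)."""
    if status == "success":
        return "done"
    if status == "processing":  # polled on in implementation.py
        return "pending"
    if status == "error_maintenance":  # "please try again later"
        return "retry_later"
    if status in FINAL_ERRORS:
        return "failed"
    # Anything not documented in the table.
    return "unknown"
```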
.. code-block:: js
{
@ -315,7 +300,6 @@ are the name of the field and the value is the value of the field.
"id": string,
"result": {
"status": string,
"status_code": int, // deprecated
"status_msg": string,
"results": [
{
@ -452,8 +436,6 @@ list of all the fields we can extract from an invoice.
| ``due_date`` | Same as for ``date`` |
+-------------------------+------------------------------------------------------------------------+
| ``total_tax_amount`` | ``content`` is a float |
| (previously | |
| ``global_taxes_amount``)| |
+-------------------------+------------------------------------------------------------------------+
| ``invoice_id`` | ``content`` is a string |
+-------------------------+------------------------------------------------------------------------+
@ -535,196 +517,6 @@ list of all the fields we can extract from a resume.
| ``mobile`` | ``content`` is a string |
+-------------------------+------------------------------------------------------------------------+
Validate
========
The validation step is an optional step but is strongly recommended. By telling the system if it
were right or wrong for each feature you give an important feedback. It has no direct impact but it
helps the system to greatly improve its prediction accuracy for the documents you will send in the
future.
.. _extract_api/validate:
Routes
------
- /api/extract/invoice/1/validate
- /api/extract/invoice/1/validate_batch
- /api/extract/expense/1/validate
- /api/extract/expense/1/validate_batch
- /api/extract/applicant/1/validate
- /api/extract/applicant/1/validate_batch
- |ss| /iap/invoice_extract/validate |se| (deprecated)
- |ss| /iap/expense_extract/validate |se| (deprecated)
Request
-------
.. rst-class:: o-definition-list
``jsonrpc`` (required)
see JSON-RPC2_
``method`` (required)
see JSON-RPC2_
``id`` (required)
see JSON-RPC2_
``params`` (``/validate`` route only)
dictionary containing the following fields
.. rst-class:: o-definition-list
``document_uuid`` (required, replaces ``document_id``)
|SAME_AS_PARSE|
``values``
Contains the validation for each feature. For invoices, the field ``merged_line`` indicates
if the lines were merged or not.
``document_id`` (deprecated)
|SAME_AS_PARSE|
.. rst-class:: o-definition-list
``invoice_lines`` have been merged or not.
``params`` (``/validate_batch`` route only)
dictionary containing the following fields
.. rst-class:: o-definition-list
``documents``
Contains the validation for each feature for each document, the ``document_uuid`` are the
keys and their values is the content of the ``value`` field of the ``/validate`` route.
.. code-block:: js
// for the /validate route
{
"jsonrpc": "2.0",
"method": "call",
"params": {
// "document_id": int, // deprecated
"document_uuid": string,
"values": {
"merged_lines": bool, // for invoices
"feature_name_1": validation_1,
"feature_name_2": validation_2,
...
}
},
"id": string,
}
// for the /validate_batch route
{
"jsonrpc": "2.0",
"method": "call",
"params": {
"documents": {
document_uuid_1: {
"merged_lines": bool, // for invoices
"feature_name_1": validation_1,
"feature_name_2": validation_2,
...
},
document_uuid_2: {
"merged_lines": bool, // for invoices
"feature_name_1": validation_1,
"feature_name_2": validation_2,
...
},
...
}
},
"id": string,
}
.. note::
You don't have to validate all the features in order for the validation to succeed. However
:ref:`/validate <extract_api/validate>` can't be called multiple times for a same
document. Therefore you should validate all the features you want to validate at once.
validation
~~~~~~~~~~
A **validation** for a given feature is a dictionary containing the textual representation of the
expected value for this given feature. This format apply for all the features except for
``invoice_lines`` which has a more complex validation format.
.. code-block:: js
"feature_name": {
"content": string|float
}
invoice_lines
*************
**lines** is a list of dictionaries. Each dictionary represents an invoice line. The dictionary keys
speak for themselves. Note that there is no ``content`` for this feature.
.. code-block:: js
"invoice_lines": {
"lines": [
{
"description": string,
"quantity": float,
"unit_price": float,
"product": string,
"taxes_amount": float,
"taxes": [
{
"amount": float,
"type": "fixed"|"percent",
"price_include": bool
},
...
],
"subtotal": float,
"total": float
},
...
]
}
Response
--------
.. rst-class:: o-definition-list
``jsonrpc``
see JSON-RPC2_
``id``
see JSON-RPC2_
``result``
.. rst-class:: o-definition-list
``status``
|SAME_AS_PARSE|
``status_msg``
|SAME_AS_PARSE|
``status_code`` (deprecated)
|SAME_AS_PARSE|
========================== =========================================== ===========================
`status` status_msg status_code (deprecated)
========================== =========================================== ===========================
`success` Success 0
`error_validation_format` Validation format is incorrect 12
========================== =========================================== ===========================
.. code-block:: js
{
"jsonrpc": "2.0",
"id": string,
"result": {
"status": string,
// "status_code": int, // deprecated
"status_msg": string,
}
}
.. _latestextract_api/integration_testing:
Integration Testing
@ -745,8 +537,6 @@ A python implementation of the full flow for invoices can be found
.. _JSON-RPC2: https://www.jsonrpc.org/specification
.. |SAME_AS_PARSE| replace:: Same as for :ref:`/parse <extract_api/parse>`.
.. |ss| raw:: html
<strike>


@ -6,7 +6,7 @@ import time
import requests
account_token = "integration_token" # Use your token
domain_name = "https://extract.api.odoo.com/"
domain_name = "https://extract.api.odoo.com"
path_to_pdf = "/path/to/your/pdf"
doc_type = "invoice" # invoice, expense or applicant
@ -38,16 +38,17 @@ def send_document_to_extract(doc_path: str):
'version': API_VERSION[doc_type],
'documents': [encoded_doc],
}
response = extract_jsonrpc_call(f"/api/extract/{doc_type}/1/parse", params)
response = extract_jsonrpc_call(f"/api/extract/{doc_type}/2/parse", params)
return response
def get_result_from_extract(document_uuid: str):
def get_result_from_extract(document_token: str):
params = {
'version': API_VERSION[doc_type],
'document_uuid': document_uuid,
'document_token': document_token,
'account_token': account_token,
}
endpoint = f"/api/extract/{doc_type}/1/get_result"
endpoint = f"/api/extract/{doc_type}/2/get_result"
response = extract_jsonrpc_call(endpoint, params)
while response['result']['status'] == 'processing':
print("Still processing... Retrying in 5 seconds")
@ -56,66 +57,6 @@ def get_result_from_extract(document_uuid: str):
return response
def get_result_batch_from_extract(document_uuids: list):
"""Get the results of multiple documents at once."""
params = {
'version': API_VERSION[doc_type],
'document_uuids': document_uuids,
}
endpoint = f"/api/extract/{doc_type}/1/get_result_batch"
response = extract_jsonrpc_call(endpoint, params)
for uuid in document_uuids:
while response['result'][uuid]['status'] == 'processing':
print("Still processing... Retrying in 5 seconds")
time.sleep(5)
response = extract_jsonrpc_call(endpoint, params)
yield response
def validate_results(document_uuid: str):
# This is an example of how to validate the results of the parsing
# These values should be the correct values for the document reviewed by the user
params = {
'document_id': document_uuid,
'values': {
'total': {'content': float},
'subtotal': {'content': float},
'total_tax_amount': {'content': float},
'date': {'content': str}, # YYYY-MM-DD
'due_date': {'content': str}, # YYYY-MM-DD
'invoice_id': {'content': str},
'partner': {'content': str},
'VAT_Number': {'content': str},
'currency': {'content': str},
'merged_lines': bool,
'invoice_lines': {
'lines': [
{
'description': str,
'quantity': float,
'unit_price': float,
'product': str,
'taxes_amount': float,
'taxes': [
{
"amount": float,
"type": "fixed"|"percent",
"price_include": bool
},
...
],
'subtotal': float,
'total': float,
},
...
],
}
}
}
response = extract_jsonrpc_call(f"/api/extract/{doc_type}/1/validate", params)
return response
if __name__ == '__main__':
# Parse the document
@ -125,10 +66,10 @@ if __name__ == '__main__':
if response['result']['status'] != 'success':
sys.exit(1)
document_uuid = response['result']['document_uuid']
document_token = response['result']['document_token']
# Get the results of the parsing
response = get_result_from_extract(document_uuid)
response = get_result_from_extract(document_token)
# Write the response to a file
output_file = 'response.json'
@ -147,7 +88,3 @@ if __name__ == '__main__':
print("Invoice id:", document_results['invoice_id']['selected_value']['content'])
print("Date:", document_results['date']['selected_value']['content'])
print("...\n")
# Validate the results
response = validate_results(document_uuid)
print("/validate call status: %s" % response['result']['status_msg'])
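In the updated script, each extracted field is read through `['selected_value']['content']`. A minimal sketch generalizing that access to all fields; it assumes, as the JSON example for `/get_result` suggests, that `result.results` is a list whose first element maps field names to dicts with a `selected_value` entry, and it skips entries without that shape (possibly such as `full_text_annotation`). `selected_values` is a hypothetical helper.

```python
from typing import Any, Dict


def selected_values(response: Dict[str, Any]) -> Dict[str, Any]:
    """Map each field name to its selected ``content`` (hypothetical helper).

    Assumes response["result"]["results"] is a list whose first element maps
    field names to dicts holding a "selected_value" with a "content" key.
    """
    results = response.get("result", {}).get("results") or []
    if not results:
        # No results yet (e.g. still processing) or an error status.
        return {}
    values: Dict[str, Any] = {}
    for field, data in results[0].items():
        selected = data.get("selected_value") if isinstance(data, dict) else None
        # Skip entries that do not follow the selected_value layout.
        if isinstance(selected, dict) and "content" in selected:
            values[field] = selected["content"]
    return values
```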