PDF Document

PDF document parsing & processing APIs

PDF Table Extraction

Parse tables in the PDF document

Security: api_key

Request

query Parameters

parser	string An optional parameter that refers to the PDF Table parser.

Request Body schema:
application/pdf
required

string <binary> (Binary File Request)

Binary file e.g. pdf, docx, html

Responses

200

The data was received successfully

400

Invalid request

403

The request is forbidden (Please input a valid API key)

POST /docs/parsers/pdf/table
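A minimal sketch of how a request to this endpoint could be assembled in Python, using only the standard library. The base URL and the header that carries the API key are not stated in this reference, so `BASE_URL` and `API_KEY_HEADER` below are placeholders (assumptions) that must be replaced with the real values. The function only builds the request; it does not send it.

```python
import urllib.parse
import urllib.request

# Assumptions (not given in this reference): the host and the name of the
# header that carries the API key. Replace both before use.
BASE_URL = "https://api.example.com"  # hypothetical host
API_KEY_HEADER = "x-api-key"          # hypothetical header name


def build_table_request(pdf_bytes, api_key, parser=None):
    """Build (but do not send) a POST request for PDF table extraction."""
    url = BASE_URL + "/docs/parsers/pdf/table"
    if parser is not None:
        # `parser` is the optional query parameter documented above.
        url += "?" + urllib.parse.urlencode({"parser": parser})
    return urllib.request.Request(
        url,
        data=pdf_bytes,  # raw PDF bytes as the request body
        method="POST",
        headers={
            "Content-Type": "application/pdf",
            API_KEY_HEADER: api_key,
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.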

Response samples

application/json

{
  "status": {
    "success": true,
    "code": 200
  },
  "result": {
    "model-version": "string",
    "blocks": [
      {
        "id": "string",
        "type": "CELL",
        "geometry": {
          "bbox": {
            "width": 0,
            "height": 0,
            "top": 0,
            "left": 0
          }
        },
        "relationships": [
          {
            "type": "string",
            "ids": ["string"]
          }
        ],
        "page-index": "string",
        "text": "string",
        "text-type": "string",
        "conf": 100,
        "row-index": 1,
        "col-index": 1,
        "row-span": 1,
        "col-span": 1
      }
    ]
  }
}
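The CELL blocks in this response carry `row-index`, `col-index`, `row-span` and `col-span`, which is enough to rebuild a table as rows of text. The sketch below is one way to do that. It assumes the indices are 1-based (the sample shows 1, but the reference does not say so explicitly) and it writes the text of a spanning cell only into its top-left position. The helper name `cells_to_grid` is illustrative, not part of the API.

```python
def cells_to_grid(blocks):
    """Arrange CELL blocks into a list of rows of cell text.

    Assumes row-index and col-index start at 1. A cell that spans several
    rows or columns has its text placed in its top-left position only.
    """
    cells = [b for b in blocks if b.get("type") == "CELL"]
    if not cells:
        return []
    # Grid size is the furthest row/column reached by any cell, spans included.
    n_rows = max(c["row-index"] + c.get("row-span", 1) - 1 for c in cells)
    n_cols = max(c["col-index"] + c.get("col-span", 1) - 1 for c in cells)
    grid = [[""] * n_cols for _ in range(n_rows)]
    for c in cells:
        grid[c["row-index"] - 1][c["col-index"] - 1] = c.get("text", "")
    return grid
```

Given a decoded response `data`, the call would be `cells_to_grid(data["result"]["blocks"])`.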

PDF Content Extraction

Parse PDF document

Security: api_key

Request

query Parameters

engine	string An optional parameter that refers to the PDF Table parser.
y_mul	string An optional hyper-parameter to control text clustering along the y-axis.
page_index	string The page index to parse. The index of the first page is 1.
w_mul	string An optional hyper-parameter to control text clustering along the x-axis.
y_mul_small	string An optional hyper-parameter to control small font-text clustering along the y-axis.
y_mul_space	string An optional hyper-parameter for engine=v2, to control text space clustering along the y-axis. Must be used in conjunction with y_mul.
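The parameters above can be combined into a query string as sketched below. The one documented constraint, that `y_mul_space` must be accompanied by `y_mul`, is checked before the string is built. The function name `content_query` and any values passed to it are illustrative; the reference does not give valid ranges for the hyper-parameters.

```python
from urllib.parse import urlencode


def content_query(engine=None, page_index=None, y_mul=None, w_mul=None,
                  y_mul_small=None, y_mul_space=None):
    """Build the query string for PDF content extraction.

    All parameters are optional and are sent as strings. Raises ValueError
    if y_mul_space is given without y_mul, as the parameter list requires.
    """
    if y_mul_space is not None and y_mul is None:
        raise ValueError("y_mul_space must be used in conjunction with y_mul")
    params = {
        "engine": engine,
        "page_index": page_index,  # first page is 1
        "y_mul": y_mul,
        "w_mul": w_mul,
        "y_mul_small": y_mul_small,
        "y_mul_space": y_mul_space,  # documented for engine=v2
    }
    # Drop parameters that were not supplied and stringify the rest.
    return urlencode({k: str(v) for k, v in params.items() if v is not None})
```

The result is appended after `?` to the endpoint path. The same hyper-parameters are listed for PDF-2-JPEG, so the check applies there as well.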

Request Body schema:
application/pdf
required

string <binary> (Binary File Request)

Binary file e.g. pdf, docx, html

Responses

200

The data was received successfully

400

Invalid request

403

The request is forbidden (Please input a valid API key)

POST /docs/parsers/pdf

Response samples

application/json

{
  "result": {
    "metadata": { },
    "contents": [
      {
        "text": "string",
        "hash": "string",
        "metadata": { }
      }
    ]
  }
}

PDF-2-JPEG

Converts the pages of the input PDF file into JPEG with text clusters marked with bounding boxes.

Security: api_key

Request

query Parameters

engine	string An optional parameter that refers to the PDF Table parser.
y_mul	string An optional hyper-parameter to control text clustering along the y-axis.
draw_bb	string Controls whether the returned image has the bounding boxes drawn.
y_mul_small	string An optional hyper-parameter to control small font-text clustering along the y-axis.
w_mul	string An optional hyper-parameter to control text clustering along the x-axis.
y_mul_space	string An optional hyper-parameter for engine=v2, to control text space clustering along the y-axis. Must be used in conjunction with y_mul.

Request Body schema: application/pdf
required

string <binary> (Binary File Request)

Binary file e.g. pdf, docx, html

Responses

200

The data was received successfully

400

Invalid request

403

The request is forbidden (Please input a valid API key)

POST /docs/parsers/pdf/image

Response samples

application/json

{
  "status": {
    "success": true,
    "code": 200
  },
  "result": {
    "pages": [
      {
        "index": 0,
        "dimensions": {
          "width": 0,
          "height": 0,
          "rotation": 0
        },
        "contents": [
          {
            "text": "string",
            "location": {
              "left": 0,
              "top": 0,
              "right": 0,
              "bottom": 0
            },
            "dimension": {
              "width": 0,
              "height": 0
            }
          }
        ],
        "image": {
          "type": "string",
          "base64": "string"
        }
      }
    ]
  }
}
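Each page in this response carries its rendered image as a base64 string under `image.base64`. A small sketch of how those images might be recovered as raw bytes from a decoded response follows. The helper name `extract_page_images` is illustrative, and the sketch assumes the `base64` field holds standard base64 without a data-URI prefix, which the reference does not confirm.

```python
import base64


def extract_page_images(response):
    """Return a list of (page index, image type, image bytes) tuples.

    `response` is the decoded JSON body of the PDF-2-JPEG endpoint.
    Assumes image.base64 is plain base64 with no "data:" prefix.
    """
    images = []
    for page in response["result"]["pages"]:
        image = page["image"]
        raw = base64.b64decode(image["base64"])
        images.append((page["index"], image["type"], raw))
    return images
```

Each tuple's bytes can then be written to disk in binary mode, for example to a file named after the page index.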

Accepted request body content types

PDF Table Extraction	application/pdf, application/octet-stream
PDF Content Extraction	application/pdf, multipart/form-data, application/octet-stream
PDF-2-JPEG	application/pdf