DocuProx Core Features

Master the essential features of DocuProx: create powerful templates for document processing and integrate seamlessly with your existing business systems and workflows.

Templates

A DocuProx Template is a user-defined blueprint that teaches the DocuProx AI how to recognize, locate, and extract specific pieces of information from a particular type of document.

Key aspects of a DocuProx Template:

Folders

Organize and categorize your templates for better management. Create folders to group templates by document type, department, or any custom classification system that suits your workflow.

Reference Image Mode

Choose between two extraction approaches based on your document characteristics:

With Reference Image

Use this mode when you have a standard document format and want to guide the AI with a reference image. In this mode, you can define:

  • VALUE: The value as it appears in the reference document, used to drive label-value pair extraction from actual documents at runtime
  • RECTANGLE: A bounded region (numbered for identification) marking where the target value is located, so the system knows exactly which area to extract from
  • PROMPT: Natural language instructions that tell the system how to identify and extract the target value
  • STATIC: Returns a predefined static value by default, or the value provided in the API payload

Without Reference Image

Best suited to variable document layouts and multi-page PDFs. This mode processes documents directly, with no reference image required.

  • FLEXIBLE: Adapts to varying document structures and layouts
  • MULTI-PAGE: Supports complex PDF documents with multiple pages
  • STATIC: Returns a predefined static value as default, or uses the value provided in the API payload

Agent Mode

Designed for AI agents that process documents autonomously. In this mode, an AI system supplies custom prompts and defines extraction fields dynamically via the /v1/process-agent endpoint.

  • AGENTIC: Built for AI agents to autonomously process documents
  • PROMPT: Define custom extraction prompts per field
  • DYNAMIC: Configure document type and instructions at runtime

Custom Instructions (Optional)

For specialized documents that the AI hasn't been extensively trained on, use this field to describe each aspect of your document in natural language. This helps the AI better understand the document structure and content.

JSON Structure Builder

A dynamic JSON builder that allows you to define the exact structure of your desired response format. You can:

  • Add elements and nodes (objects) to create complex JSON structures
  • Define target keys where extracted values will be populated at runtime when your system calls the backend API
  • Configure VALUE, RECTANGLE, PROMPT, and STATIC-based extraction methods within the JSON structure for precise data mapping
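
To make the idea of target keys concrete, here is a minimal Python sketch of how a nested response structure might be filled at runtime. The dictionary layout, the `method` and `default` field names, the dotted key paths, and the `fill_structure` helper are all hypothetical illustrations; they are not the actual DocuProx template format.

```python
# Hypothetical structure: nodes are nested dicts, and each leaf names the
# extraction method that would populate it. Not the real DocuProx schema.
STRUCTURE = {
    "invoice": {
        "number": {"method": "VALUE"},
        "total": {"method": "RECTANGLE"},
        "vendor": {"method": "PROMPT"},
        "source": {"method": "STATIC", "default": "email-inbox"},
    }
}


def is_leaf(node: dict) -> bool:
    """A leaf is a dict that carries a 'method' entry."""
    return "method" in node


def fill_structure(node: dict, extracted: dict, payload: dict, path: str = "") -> dict:
    """Walk the structure and return a response with the same shape."""
    result = {}
    for key, child in node.items():
        child_path = f"{path}.{key}" if path else key
        if not is_leaf(child):
            # Object node: recurse to preserve the nesting.
            result[key] = fill_structure(child, extracted, payload, child_path)
        elif child["method"] == "STATIC":
            # STATIC: a payload value wins, otherwise the predefined default.
            result[key] = payload.get(child_path, child.get("default"))
        else:
            # VALUE / RECTANGLE / PROMPT: take whatever was extracted.
            result[key] = extracted.get(child_path)
    return result


extracted = {
    "invoice.number": "INV-001",
    "invoice.total": "42.00",
    "invoice.vendor": "Acme",
}
response = fill_structure(STRUCTURE, extracted, payload={})
print(response)
```

The point of the sketch is only that the response mirrors the structure you define: object nodes keep their nesting, and each target key receives one value.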

Test

After configuring your template, use the test feature to validate that it returns the desired data in the correct format. Testing requires an API key: generate one, then upload sample documents to verify extraction accuracy before deploying to production.

Supported Formats

Images

  • JPEG
  • PNG
  • GIF, BMP, TIFF

Documents

  • PDF (single page)
  • PDF (multi-page)
  • ZIP (batch uploads)

Input Methods

  • Base64 encoded
  • File uploads
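
For the Base64 input method, the document has to be encoded before it is sent. The following is a small sketch using only the Python standard library; the helper names `encode_bytes` and `encode_file` are illustrative and are not part of any DocuProx SDK.

```python
import base64
import tempfile
from pathlib import Path


def encode_bytes(data: bytes) -> str:
    """Return the base64 encoding of raw bytes as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path) -> str:
    """Read a file from disk and return its contents base64 encoded."""
    return encode_bytes(Path(path).read_bytes())


# Demo with a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    encoded = encode_file(sample)

print(encoded)  # aGVsbG8=
```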

Integrations

DocuProx integrates seamlessly with your existing business systems and workflows through our robust API-first architecture.

Common Integration Scenarios

CRM Systems

Automatically process customer documents and populate CRM records (Salesforce, HubSpot).

Accounting Software

Extract invoice data and feed directly into QuickBooks, SAP, or ERP systems.

Document Management

Integrate with SharePoint, Google Drive, or custom document repositories.

Integration Methods

RESTful API

Direct API calls for real-time document processing and data extraction.

Webhooks

Real-time notifications and event-driven processing for automated workflows.

SDKs & Libraries

Pre-built libraries for popular programming languages (Python, JavaScript, Java).

Example Integration Workflow

Document Upload

User uploads document to your system

API Call

Your system calls DocuProx API with document

Data Extraction

DocuProx processes and extracts data

Data Storage

Structured data stored in your database
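
The four steps above can be sketched end to end in Python. This is an illustration under stated assumptions: `fake_extract` is a stand-in for the real DocuProx call and returns canned data, and the `extractions` table and its columns are invented for the example.

```python
import json
import sqlite3


def fake_extract(document: bytes) -> dict:
    """Stand-in for the DocuProx API call; returns canned data."""
    return {"invoice_number": "INV-001", "total": "42.00"}


def handle_upload(db, filename: str, document: bytes, extract) -> dict:
    """Run one uploaded document through extraction and store the result."""
    data = extract(document)  # API call + data extraction
    db.execute(
        "INSERT INTO extractions (filename, data) VALUES (?, ?)",
        (filename, json.dumps(data)),
    )
    db.commit()  # data storage
    return data


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE extractions (filename TEXT, data TEXT)")

# Document upload: here just some placeholder bytes.
handle_upload(db, "invoice.pdf", b"placeholder document bytes", fake_extract)

row = db.execute("SELECT filename, data FROM extractions").fetchone()
print(row[0], json.loads(row[1]))
```

Passing the extraction function in as an argument keeps the storage logic testable without network access; in a real integration it would be replaced by an SDK or REST call.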

Webhook Integration

Receive real-time notifications when your document processing is complete. Configure webhooks to automatically receive extracted data in your application without polling.

Key Benefits

  • Real-time result delivery via webhook
  • No need for API polling - results are pushed to you
  • Automated integration with your system
  • Fast, non-blocking job submission
  • Supports images, multi-page PDFs, and ZIP batch uploads

How It Works

1. Configure Your Webhook

Set up a webhook URL from the dashboard for your template. You can also configure custom headers for authentication.

2. Submit a Processing Job

Use the POST /process-job API with your template_id and document (image, PDF, or ZIP file).

3. Background Processing

Documents are securely stored and processed asynchronously. Credits are deducted based on document type.

4. Webhook Result Delivery

Once complete, a POST request is sent to your webhook URL with extracted data in JSON format.
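
On the receiving side, a webhook handler typically checks the custom authentication header and then parses the JSON body. The sketch below shows that logic as a plain function so it can be dropped into any web framework. The header name `X-Webhook-Token`, the token value, and the payload contents are assumptions made for illustration; the source does not specify them.

```python
import hmac
import json

# Hypothetical custom header and secret, configured alongside the webhook URL.
AUTH_HEADER = "X-Webhook-Token"
EXPECTED_TOKEN = "change-me"


def handle_webhook(headers: dict, body: bytes) -> tuple:
    """Validate and parse one webhook POST.

    Returns (http_status, payload); payload is None when rejected.
    """
    token = headers.get(AUTH_HEADER, "")
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode()):
        return 401, None
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return 400, None
    return 200, payload


status, payload = handle_webhook(
    {"X-Webhook-Token": "change-me"},
    b'{"job_id": "example", "status": "SUCCESS"}',
)
print(status, payload)
```

Returning a status code rather than raising lets the caller answer quickly with a 2xx response and do any heavy work afterwards.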

Job Status Values

Status | Description
NEW | Job has been created and is queued for processing
UNZIPPING FILE | Extracting files from uploaded ZIP archive
UNZIP FILE SUCCESS | ZIP archive extraction completed successfully
UNZIP FILE FAILED | ZIP archive extraction failed
PROCESSING IMAGE | Processing document images for data extraction
PROCESS IMAGE SUCCESS | Image processing completed successfully
PROCESS IMAGE FAILED | Image processing failed
SUCCESS | Job completed successfully, results are ready
FAILED | Job processing failed
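
Client code usually needs to know whether a job is still in flight. The sketch below models the statuses listed above as a Python enum. Two things are assumptions: the string values are copied exactly as they appear in this document (the spelling on the wire may differ, for example with underscores), and only SUCCESS and FAILED are treated as final states, which the source does not state explicitly.

```python
from enum import Enum


class JobStatus(Enum):
    NEW = "NEW"
    UNZIPPING_FILE = "UNZIPPING FILE"
    UNZIP_FILE_SUCCESS = "UNZIP FILE SUCCESS"
    UNZIP_FILE_FAILED = "UNZIP FILE FAILED"
    PROCESSING_IMAGE = "PROCESSING IMAGE"
    PROCESS_IMAGE_SUCCESS = "PROCESS IMAGE SUCCESS"
    PROCESS_IMAGE_FAILED = "PROCESS IMAGE FAILED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


# Assumption: only the overall job outcomes end the job.
TERMINAL = {JobStatus.SUCCESS, JobStatus.FAILED}


def is_terminal(status: str) -> bool:
    """Return True when the job will not change state again.

    Raises ValueError for a status string that is not in the list.
    """
    return JobStatus(status) in TERMINAL


print(is_terminal("PROCESSING IMAGE"), is_terminal("SUCCESS"))
```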

Security & Reliability

  • 🔒 API authentication required for job submission
  • 🔒 Webhook URLs and headers stored securely
  • 🔒 Automatic retry mechanism for failed webhooks
  • 🔒 Files automatically deleted after processing

Job Results API

As an alternative to webhooks, you can retrieve processed results using the POST /v1/job-results endpoint. This is useful when you prefer to poll for results or need to retrieve results at a later time.

Output Formats

  • json - Structured JSON response
  • csv - CSV format for spreadsheets

Request Parameters

  • job_id - UUID of the job
  • result_format - json or csv

⏱️ 24-Hour Retention: Job results are stored for 24 hours after processing completion. Make sure to retrieve your results within this window. After 24 hours, results are automatically deleted and cannot be recovered.
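
As a rough sketch of how a client might prepare a call to this endpoint and respect the retention window, the Python below builds (but does not send) the request. The endpoint path and the `job_id` and `result_format` parameters come from this page; the base URL, the `Authorization` header, and sending the parameters as a JSON body are assumptions.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

BASE_URL = "https://api.example.com"  # hypothetical; use your real API URL
RETENTION = timedelta(hours=24)


def build_job_results_request(job_id: str, result_format: str = "json",
                              api_key: str = "your_api_key") -> urllib.request.Request:
    """Build a POST /v1/job-results request without sending it."""
    if result_format not in ("json", "csv"):
        raise ValueError("result_format must be 'json' or 'csv'")
    body = json.dumps({"job_id": job_id, "result_format": result_format})
    return urllib.request.Request(
        f"{BASE_URL}/v1/job-results",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # header scheme is assumed
        },
        method="POST",
    )


def results_still_available(completed_at: datetime, now: datetime) -> bool:
    """True while the job is inside the 24-hour retention window."""
    return now - completed_at < RETENTION


req = build_job_results_request("00000000-0000-0000-0000-000000000000", "csv")
done = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(req.get_method(), req.full_url)
print(results_still_available(done, done + timedelta(hours=23)))
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen`; checking the retention window first avoids a pointless call for results that have already been deleted.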

Learn More: For detailed API reference, payload examples, and implementation guides, visit the Webhook Integration Documentation.

SDKs

Official SDKs for Python and Node.js. Build powerful document processing applications with our easy-to-use SDK packages that enable automated document extraction and processing.

Quick Start Examples

Python

from docuprox import Docuprox

# Initialize client
client = Docuprox(api_key="your_api_key")

# Process a file
result = client.processfile(
    "invoice.pdf",
    "your-template-uuid"
)

print(result)

Node.js

const Docuprox = require("docuprox");

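// No options are passed here; this assumes the API key is supplied
// through environment variables (see Environment Config below).
const client = new Docuprox();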

(async () => {
  const result = await client.processFile(
    "invoice.pdf",
    "your-template-uuid"
  );

  console.log(result);
})();

SDK Features

Easy Integration

Simple APIs to quickly integrate document processing into any application.

Multiple Input Types

Process local files directly or send base64 encoded data to the API.

Async Support

Node.js SDK provides promise-based async operations for non-blocking workflows.

Environment Config

Configure API URL and API keys easily using environment variables.

Batch Processing

Process multiple documents efficiently with batch processing support.

AI Agent Processing

Use AI agents to extract custom data with dynamic prompts.
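
As an illustration of the Environment Config feature, the sketch below reads an API URL and API key from environment variables in Python. The variable names `DOCUPROX_API_URL` and `DOCUPROX_API_KEY`, the default URL, and the `Settings` class are hypothetical; check the SDK documentation for the names the SDKs actually read.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_url: str
    api_key: str


def load_settings(env=None) -> Settings:
    """Load settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    api_key = env.get("DOCUPROX_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("DOCUPROX_API_KEY is not set")
    return Settings(
        api_url=env.get("DOCUPROX_API_URL", "https://api.example.com"),
        api_key=api_key,
    )


settings = load_settings({"DOCUPROX_API_KEY": "your_api_key"})
print(settings.api_url)
```

Accepting the environment as an argument keeps secrets out of source code while still letting tests inject their own values.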

Learn More: For detailed installation guides, API reference, and advanced usage, visit the SDK Documentation.