smartextract’s power lies in its flexibility to handle diverse document types through customizable AI models. While pre-built models cover common document formats like invoices and receipts, your organization may work with unique document structures requiring specialized extraction capabilities. This guide provides a detailed walkthrough of creating, refining, and optimizing custom AI models tailored to your specific document processing needs.
Consider developing a custom AI model when:
Your documents contain industry-specific fields not found in standard templates
You work with proprietary document formats
You need to extract specialized information not covered by pre-built models
You want complete control over field organization and data extraction logic
To create a custom AI model, work through the following steps:
Access the inbox creation interface
Log in to your smartextract account
Navigate to your dashboard
Click "New Inbox"
When prompted to select a model, choose "Create Your Own"
Name your model
Provide a descriptive name reflecting the document type it will process
This name will also become the default name for your inbox, though you can rename it later
Upload your first document
Select a clear, high-quality sample document that represents your typical use case
Ensure this document contains all fields you'll want to extract
Drag and drop the file, or use the "Browse Files" button to upload it
Access the extraction view
Once the document is uploaded, click its name to open it in the Extraction View
This three-panel interface (thumbnails, document preview, and extraction fields) is where you'll build your model
Initial AI analysis
smartextract will perform a preliminary analysis and attempt to identify common fields
This serves as a starting point for your customization
Access the customization interface
Click the "Customize AI Model" button in the upper right corner
The right panel will transform into the model editing interface
Create field groups
Field groups organize related information logically
Click "Add Field Group"
Provide a descriptive name (e.g., "Document Identification," "Client Information," "Financial Details")
Click "Next" to begin adding fields within this group
Define individual fields
For each piece of information you want to extract:
Click "Add New Field"
Enter a clear field name
Select the appropriate data type (more on this below)
Add an optional description to help the AI understand context
Click "Confirm" to add the field
smartextract offers several field types to ensure proper data formatting and validation:
Text fields
Best for: Names, addresses, description fields, reference numbers, IDs
Example: "Contract Number," "Vendor Name," "Property Address"
Configuration: Simply select "Text" as the type
Quantity fields
Best for: Monetary values, quantities, measurements, percentages
Example: "Total Amount," "Square Footage," "Quantity Ordered"
Configuration: Select "Quantity" as the type
Date fields
Best for: Any calendar dates in your documents
Example: "Issue Date," "Expiration Date," "Birth Date"
Configuration: Select "Date" as the type
Benefit: Standardizes various date formats into a consistent output
Multiple Choice fields
Best for: Status indicators, categories, classifications with predefined options
Example: "Payment Status," "Document Type," "Priority Level"
Configuration:
Select "Multiple Choice" as the type
Enter all possible values separated by commas
Example: "Approved, Pending, Rejected" or "Residential, Commercial, Industrial"
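To make the date standardization benefit concrete, here is a minimal Python sketch of the idea: several spellings of the same date are mapped to one ISO 8601 string (YYYY-MM-DD). smartextract performs this normalization itself; the function name `normalize_date`, the list of accepted formats, and ISO 8601 as the target format are illustrative assumptions, not smartextract's actual behavior or output format.

```python
from datetime import datetime
from typing import Optional

# Illustrative input formats (an assumption; extend for your own documents).
# Ambiguous slash formats such as 03/04/2024 are deliberately omitted.
KNOWN_FORMATS = [
    "%Y-%m-%d",   # 2024-03-12
    "%d.%m.%Y",   # 12.03.2024
    "%B %d, %Y",  # March 12, 2024
    "%d %B %Y",   # 12 March 2024
]


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as an ISO 8601 string, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    return None


for raw in ["2024-03-12", "12.03.2024", "March 12, 2024", "12 March 2024"]:
    print(raw, "->", normalize_date(raw))  # every line ends with 2024-03-12
```

Slash-separated dates such as 03/04/2024 are left out on purpose: without knowing whether a document writes the day or the month first, they cannot be normalized reliably. If your documents use such dates, stating the expected order in the field description may help.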
For each field, consider these additional customization options:
Field descriptions
Adding detailed descriptions helps the AI understand what to look for
Particularly useful for industry-specific terminology or uncommon field names
Example: For a field named "MOQ," add the description "Minimum Order Quantity required by the supplier"
Inference settings
Some information may not be stated explicitly in the document but can be inferred from other content
Example: "Document Completeness" could be set to infer "Complete" or "Incomplete" based on the presence of signatures
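Before entering a model in the customization interface, it can help to draft it outside the tool. The Python sketch below is one hypothetical way to write down field groups, field types, descriptions, and Multiple Choice options, and to catch simple mistakes early. The class names (`FieldType`, `FieldSpec`, `FieldGroup`) and the checks in `validate` are our own assumptions for illustration; they are not smartextract's API, export format, or validation rules.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FieldType(Enum):
    """The four field types described in this guide."""
    TEXT = "Text"
    QUANTITY = "Quantity"
    DATE = "Date"
    MULTIPLE_CHOICE = "Multiple Choice"


@dataclass
class FieldSpec:
    name: str
    kind: FieldType
    description: str = ""  # optional context to help the AI
    choices: List[str] = field(default_factory=list)  # Multiple Choice only


@dataclass
class FieldGroup:
    name: str
    fields: List[FieldSpec] = field(default_factory=list)


def parse_choices(raw: str) -> List[str]:
    """Split a comma-separated option list such as "Approved, Pending, Rejected"."""
    return [option.strip() for option in raw.split(",") if option.strip()]


def validate(groups: List[FieldGroup]) -> List[str]:
    """Return human-readable problems in a draft model (hypothetical checks)."""
    problems: List[str] = []
    seen = set()
    for group in groups:
        if not group.fields:
            problems.append(f"Group '{group.name}' has no fields")
        for spec in group.fields:
            if spec.name in seen:
                problems.append(f"Field name '{spec.name}' is used more than once")
            seen.add(spec.name)
            if spec.kind is FieldType.MULTIPLE_CHOICE and len(spec.choices) < 2:
                problems.append(f"'{spec.name}' needs at least two options")
            if spec.kind is not FieldType.MULTIPLE_CHOICE and spec.choices:
                problems.append(f"'{spec.name}' has options but is not Multiple Choice")
    return problems


draft = [
    FieldGroup("Document Identification", [
        FieldSpec("Purchase Order #", FieldType.TEXT),
        FieldSpec("Issue Date", FieldType.DATE),
    ]),
    FieldGroup("Financial Details", [
        FieldSpec("Total Amount", FieldType.QUANTITY),
        FieldSpec("MOQ", FieldType.QUANTITY,
                  description="Minimum Order Quantity required by the supplier"),
        FieldSpec("Payment Status", FieldType.MULTIPLE_CHOICE,
                  choices=parse_choices("Approved, Pending, Rejected")),
    ]),
]
print(validate(draft))  # []
```

The `parse_choices` helper mirrors the comma-separated input that the Multiple Choice field type expects, so the same option string can be pasted into the interface.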
Save your initial model
Click "Save AI Model" to apply your configuration
The system will reprocess your document using the new model settings
Evaluate results
Review the extracted data for accuracy
Pay attention to:
Information you need that has no field yet
Incorrectly extracted values
Defined fields that came back empty even though the information is in the document
Iterative refinement
If results aren't satisfactory, return to customization mode
Adjust field names, descriptions, or types
Add additional fields or remove unnecessary ones
Save and test again
Test with diverse samples
Include documents with different layouts, formats, and content
Upload examples from different sources or time periods
Include edge cases with unusual formatting or content placement
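One way to make this evaluation systematic is to write down the values you expect for a handful of sample documents and compare them with what the model extracted. The following Python sketch assumes you have both as plain strings keyed by field name (how you collect the extracted values, for example by copying them from the Extraction View, is not shown). The `Sample` class, the outcome labels, and the case-insensitive comparison are illustrative choices, not part of smartextract.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Sample:
    source: str                # e.g. supplier or layout variant
    expected: Dict[str, str]   # values you read off the document yourself
    extracted: Dict[str, str]  # values the model returned


def classify(expected: Optional[str], extracted: Optional[str]) -> str:
    """Label one field as correct, not_detected, or incorrect."""
    want = (expected or "").strip()
    got = (extracted or "").strip()
    if want.casefold() == got.casefold():
        return "correct"       # includes the case where both are empty
    if want and not got:
        return "not_detected"
    return "incorrect"         # wrong value, or a value where none was expected


def evaluate(samples: List[Sample]) -> Tuple[Dict[str, float], List[Tuple[str, str, str]]]:
    """Return per-field accuracy and a list of (source, field, outcome) failures."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    failures: List[Tuple[str, str, str]] = []
    for sample in samples:
        for name, want in sample.expected.items():
            outcome = classify(want, sample.extracted.get(name))
            totals[name] += 1
            if outcome == "correct":
                hits[name] += 1
            else:
                failures.append((sample.source, name, outcome))
    accuracy = {name: hits[name] / totals[name] for name in totals}
    return accuracy, failures


samples = [
    Sample("Supplier A",
           {"Invoice Date": "2024-03-12", "Total Amount": "120.00"},
           {"Invoice Date": "2024-03-12", "Total Amount": "120.00"}),
    Sample("Supplier B",
           {"Invoice Date": "2024-04-02", "Total Amount": "75.50"},
           {"Total Amount": "75.05"}),
]
accuracy, failures = evaluate(samples)
print(accuracy)  # {'Invoice Date': 0.5, 'Total Amount': 0.5}
print(failures)  # [('Supplier B', 'Invoice Date', 'not_detected'), ('Supplier B', 'Total Amount', 'incorrect')]
```

Information you expect but have not defined a field for shows up as `not_detected`, as does a defined field the model left empty, so both kinds of gap from the evaluation checklist surface in one report. Recording the source of each sample makes it easier to see whether failures cluster around one layout or supplier.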
The following techniques can further improve extraction accuracy:
Field redundancy
For critical information that might appear in different formats or locations, create multiple fields
Example: "Invoice Date" and "Issue Date" might refer to the same information but be labeled differently in different documents
Strategic field grouping
Group related fields together based on their typical proximity in documents
This can improve extraction accuracy, since the AI looks for contextual relationships between related fields
Field naming considerations
Use clear, specific names that match how the information appears in your documents
If your document uses "Purchase Order #," name your field exactly that rather than just "Order Number"
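If you follow the field redundancy technique, your results may contain two or more fields for the same information, such as "Invoice Date" and "Issue Date". A minimal Python sketch of merging them downstream follows. It assumes each document's results are available as a dictionary keyed by field name; the function name `reconcile`, the priority order (first listed field wins), and the conflict flag are our own assumptions.

```python
from typing import Dict, List, Optional, Tuple


def reconcile(values: Dict[str, Optional[str]],
              candidates: List[str]) -> Tuple[Optional[str], bool]:
    """Merge redundant fields into one value.

    Returns (value, conflict): value is the first non-empty candidate in
    priority order (an assumption), and conflict is True when the non-empty
    candidates disagree, so the document deserves a manual look.
    """
    found = [(values.get(name) or "").strip() for name in candidates]
    found = [value for value in found if value]
    if not found:
        return None, False
    return found[0], len(set(found)) > 1


DATE_FIELDS = ["Invoice Date", "Issue Date"]  # hypothetical priority order

print(reconcile({"Invoice Date": "", "Issue Date": "2024-03-12"}, DATE_FIELDS))
# ('2024-03-12', False)
print(reconcile({"Invoice Date": "2024-03-12", "Issue Date": "2024-03-11"}, DATE_FIELDS))
# ('2024-03-12', True)
```

Flagging conflicts, rather than silently keeping the first value, lets you review the documents where the redundant fields disagree.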
Spatial context
For fields with consistent positions, provide position hints in the description
Example: "Usually found in the top-right corner of the first page"
Page-specific extraction
Some fields may consistently appear on specific pages
Use the page thumbnails on the left to navigate between pages during customization
Create separate field groups for information typically found on different pages
Line item recognition
For documents with tables (like invoices with multiple line items):
Create a specific field group for line items
Define fields for each column (e.g., "Item Description," "Quantity," "Unit Price")
The AI will identify table structures and extract repeated rows
Table summaries
Create separate fields for table totals and subtotals
Example: "Total Line Items," "Subtotal," "Tax Amount"
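Once line items and table summaries are extracted, you can cross-check them against each other to catch misread numbers. The Python sketch below assumes the extracted values have already been converted to `Decimal`, and that the document's totals follow the simple pattern sum of Quantity × Unit Price = Subtotal and Subtotal + Tax Amount = Total Amount. The function name `check_totals`, the one-cent tolerance, and that arithmetic are assumptions that will not hold for documents with discounts, shipping charges, or per-line tax.

```python
from decimal import Decimal
from typing import Dict, List

ONE_CENT = Decimal("0.01")


def check_totals(line_items: List[Dict[str, Decimal]], subtotal: Decimal,
                 tax: Decimal, total: Decimal,
                 tolerance: Decimal = ONE_CENT) -> List[str]:
    """Return a list of mismatches between line items and table summaries."""
    problems: List[str] = []
    # Recompute the subtotal from the extracted rows.
    computed = sum((item["Quantity"] * item["Unit Price"] for item in line_items),
                   Decimal("0"))
    if abs(computed - subtotal) > tolerance:
        problems.append(f"Line items sum to {computed}, but Subtotal is {subtotal}")
    if abs(subtotal + tax - total) > tolerance:
        problems.append(f"Subtotal + Tax Amount is {subtotal + tax}, "
                        f"but Total Amount is {total}")
    return problems


items = [
    {"Quantity": Decimal("2"), "Unit Price": Decimal("10.00")},
    {"Quantity": Decimal("1"), "Unit Price": Decimal("5.50")},
]
print(check_totals(items, Decimal("25.50"), Decimal("4.85"), Decimal("30.35")))  # []
print(check_totals(items, Decimal("25.00"), Decimal("4.85"), Decimal("30.35")))
# ['Line items sum to 25.50, but Subtotal is 25.00',
#  'Subtotal + Tax Amount is 29.85, but Total Amount is 30.35']
```

`Decimal` is used instead of `float` so that monetary sums such as 20.00 + 5.50 are exact and print with their cents.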
If extraction results are still not satisfactory, check for these common issues:
Low recognition accuracy
Ensure your testing documents are high-quality and representative
Try renaming fields to match exactly how they appear in the document
Add more descriptive context in field descriptions
Inconsistent extraction
Check whether differences in layout or formatting between documents are causing the inconsistency
Test the model with more examples showing different variations
Consider splitting into multiple specialized models if variations are significant
Creating a custom AI model in smartextract lets you extract precisely the information you need from your own document types. While the process requires some initial investment in setup and refinement, the result is automated, consistent data extraction tailored specifically to your business documents, with efficiency and accuracy gains that can quickly repay that effort.