8.1. Document to API Call
Overview
Use AI to process incoming requests automatically: parse their content and turn it into precise API calls. A common example: tenders or RFQs arrive as emails with PDF/Word attachments; the goal is to parse the message and documents, extract product lines, map them to your catalog, and automatically create Opportunities and Line Items in systems like Salesforce.
The Phased AI Solution
- Extract Products
  → AI identifies and pulls product names, specifications, and quantities from tender documents.
  → Proves the core ability to read complex text.
- Map Products
  → Matches extracted names to the official product catalog.
  → Translates customer terminology into internal product codes/SKUs.
- Create Quotes
  → Generates an automated quote based on the mapped products.
  → Depends entirely on the success of the first two phases.
Two-step approach
Step 1: Document Processing
• Convert tender documents (PDFs/Word) and email bodies into a structured digital format (JSON).
• This creates a foundation of clean, searchable, and structured data.
Step 2: Intelligent Extraction
• Feed the JSON into a prompt template and then to an LLM.
• Use a specialized prompt with detailed instructions to guide the model.
• Extract specific product information with high accuracy.
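Step 1 can be sketched with Python's standard `email` package. The function name `email_to_json` and the output keys (`subject`, `body`, `attachments`) are assumptions made for illustration; text extraction from the PDF/Word attachments and OCR are deliberately left out.

```python
from email import message_from_bytes, policy
from typing import Any, Dict, List


def email_to_json(raw: bytes) -> Dict[str, Any]:
    """Turn a raw MIME message into a small JSON-ready dict (hypothetical shape)."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body; fall back to HTML if that is all there is
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    attachments: List[Dict[str, Any]] = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        attachments.append(
            {
                "filename": part.get_filename(),
                "content_type": part.get_content_type(),
                "size_bytes": len(payload),
            }
        )

    return {
        "subject": str(msg["subject"] or ""),
        "body": body.strip(),
        "attachments": attachments,
    }
```

A real parser would pass each attachment's bytes to a PDF/DOCX extractor and add the resulting text and tables to the same structure.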
Use case and scope
Inputs: email messages (RFC 5322 with MIME) and attachments (PDF/DOCX), in languages such as English and German.
Entities: Product mentions (name/spec/qty/unit/packaging/standards), optional price.
Outputs: Structured JSON for extracted items; mapped SKUs with confidence; API payload for CRM.
Targets: CRM objects such as Salesforce (Account/Contact, Opportunity, OpportunityLineItem, Files/Notes) or other CRMs via an adapter.
Users: Sales Ops/Bid Desk; optional human-in-the-loop for low-confidence lines.
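The human-in-the-loop step can be as simple as splitting mapping decisions by confidence. The sketch below assumes decisions are dicts with `candidate_sku` and `confidence` keys; the function name and the 0.8 threshold are illustrative assumptions, not values from a tuned system.

```python
from typing import Any, Dict, List, Tuple

REVIEW_THRESHOLD = 0.8  # assumed cut-off; tune it against a labeled golden set


def split_for_review(
    decisions: List[Dict[str, Any]], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (auto_approved, needs_review) lists."""
    auto: List[Dict[str, Any]] = []
    review: List[Dict[str, Any]] = []
    for d in decisions:
        # Lines without a SKU always go to a reviewer, whatever the score
        if d.get("candidate_sku") and d.get("confidence", 0.0) >= threshold:
            auto.append(d)
        else:
            review.append(d)
    return auto, review
```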
Architecture (POC)
Ingest: webhook/IMAP/SES → queue → worker.
Parsing: text + tables; OCR for scanned PDFs.
LLM extraction: JSON-only mode, validated by schema.
Catalog mapping: hybrid search (lexical + embeddings) + rules and unit constraints.
Review: approve/correct low-confidence items.
Integration: create CRM records via REST/Composite API with retries.
Observability: logs, metrics, costs, drift.
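For the integration step, retries are usually implemented as exponential backoff with jitter. The sketch below is transport-agnostic: `send` is a hypothetical callable that performs one HTTP request and returns `(status_code, body)`, and the set of retryable status codes and the delays are assumptions to adjust for your CRM.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

RETRYABLE = {429, 500, 502, 503, 504}  # assumed set of transient failures


def call_with_retries(
    send: Callable[[], Tuple[int, Dict[str, Any]]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Dict[str, Any]]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            break
        # Exponential backoff (0.5s, 1s, 2s, ...) plus random jitter
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay))
        status, body = send()
    return status, body
```

Injecting `sleep` keeps the function testable without real waiting. Retrying a create request is only safe if the request is idempotent, for example an upsert keyed by an external ID.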
Data flow and contracts
Input: email body and attachments.
Intermediate: structured JSON from parsing (Step 1) and refined JSON from LLM (Step 2).
Output: mapped decisions with confidences and a ready-to-send CRM payload.
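The code sketches in this section import `ExtractedItem`, `TenderExtraction`, and `MappingDecision` from a `models.schemas` module that is not shown. One plausible shape for those contracts, inferred from how the fields are used, is a set of `TypedDict` classes; treat the definitions below as an assumption rather than the actual module.

```python
from typing import List, Optional, TypedDict


class ExtractedItem(TypedDict):
    """One product line as returned by the extraction step."""
    raw_mention: str
    normalized_name: str
    qty: float
    unit: str
    packaging: Optional[str]
    standards: Optional[List[str]]
    price: Optional[float]
    context_span: str
    notes: Optional[str]


class TenderExtraction(TypedDict):
    """Top-level extraction result."""
    items: List[ExtractedItem]
    source_language: Optional[str]


class MappingDecision(TypedDict):
    """Catalog mapping outcome for a single extracted item."""
    extracted: ExtractedItem
    candidate_sku: Optional[str]
    candidate_name: Optional[str]
    confidence: float
    rationale: str
```

Because these are plain dicts at runtime, the `.get(...)` calls used in the sketches work unchanged.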
Extraction schema
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "TenderExtraction",
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "raw_mention": {"type": "string"},
          "normalized_name": {"type": "string"},
          "qty": {"type": "number", "minimum": 0},
          "unit": {"type": "string"},
          "packaging": {"type": ["string", "null"]},
          "standards": {"type": ["array", "null"], "items": {"type": "string"}},
          "price": {"type": ["number", "null"], "minimum": 0},
          "context_span": {"type": "string"},
          "notes": {"type": ["string", "null"]}
        },
        "required": ["raw_mention", "normalized_name", "qty", "unit", "context_span"]
      },
      "minItems": 1
    },
    "source_language": {"type": ["string", "null"]}
  },
  "required": ["items"]
}
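In production a JSON Schema validator library (for example the `jsonschema` package) would check LLM output against the full schema. To show the idea without a dependency, the stdlib-only sketch below checks just the required fields, their basic types, and the `qty >= 0` rule; the function name and error messages are illustrative.

```python
from typing import Any, Dict, List

# Required item fields and the Python types they should have
REQUIRED_ITEM_FIELDS = {
    "raw_mention": str,
    "normalized_name": str,
    "qty": (int, float),
    "unit": str,
    "context_span": str,
}


def validate_extraction(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the document passed."""
    items = doc.get("items")
    if not isinstance(items, list) or not items:
        return ["'items' must be a non-empty array"]

    errors: List[str] = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"items[{i}] must be an object")
            continue
        for field, expected in REQUIRED_ITEM_FIELDS.items():
            value = item.get(field)
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"items[{i}].{field} is missing or has the wrong type")
        qty = item.get("qty")
        if isinstance(qty, (int, float)) and not isinstance(qty, bool) and qty < 0:
            errors.append(f"items[{i}].qty must be >= 0")
    return errors
```

Rejected documents can be retried with the validation errors appended to the prompt, or routed to a reviewer.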
Prompt template
SYSTEM:
You are an expert tender analyst. Extract product items from text and tables.
Output JSON only. No prose. Follow the schema exactly.
SCHEMA:
{
  "items": [
    {
      "raw_mention": "string",
      "normalized_name": "string",
      "qty": 0,
      "unit": "pcs|m|kg|l|box|pack|set|...",
      "packaging": "string|null",
      "standards": ["string", "..."] | null,
      "price": 0 | null,
      "context_span": "string",
      "notes": "string|null"
    }
  ],
  "source_language": "en|de|..."
}
INSTRUCTIONS:
- Parse line items, tables, and free text.
- Prefer quantities and units from tables; normalize unit spellings.
- Keep context_span as a short quote of the source line.
- If price is not present, set it to null.
- If packaging or standards are not present, set them to null.
USER:
{{INPUT_TEXT}}
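Filling the template and reading the model's reply can be sketched as follows. Both helper names are hypothetical, and the LLM call itself is omitted because it depends on the provider. Even with a JSON-only instruction, models sometimes wrap output in a Markdown code fence, so the parser tolerates that case.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # Markdown code fence marker


def render_prompt(template: str, input_text: str) -> str:
    """Substitute the parsed document text into the {{INPUT_TEXT}} placeholder."""
    return template.replace("{{INPUT_TEXT}}", input_text)


def parse_llm_json(raw: str) -> Dict[str, Any]:
    """Parse model output that should be JSON only, tolerating a wrapping fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag) ...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ... and everything from the closing fence onward
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

The parsed dict should still be validated against the extraction schema before it is used downstream.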
Mapping logic (sketch)
from __future__ import annotations

import re
from typing import List, Tuple

from ..models.schemas import ExtractedItem, MappingDecision


def _norm(text: str) -> str:
    """Lowercase and collapse punctuation so '10060, 10mm' matches '10060 10mm'."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def candidate_generation(query: str) -> List[Tuple[str, str]]:
    """Return candidate (sku, name) pairs. Placeholder for BM25 + embeddings."""
    # In a real system, query Postgres/OpenSearch and pgvector for hybrid recall
    return [
        ("SKU-001", "Steel Rod EN 10060, 10mm"),
        ("SKU-002", "Steel Rod EN 10060, 12mm"),
    ]


def score_candidate(item: ExtractedItem, candidate: Tuple[str, str]) -> float:
    """Score based on lexical similarity, standards hints, and UoM presence."""
    _sku, name = candidate
    cand_name = _norm(name)
    score = 0.0
    # Lexical hit: the normalized item name is contained in the candidate name
    item_name = _norm(item.get("normalized_name") or "")
    if item_name and item_name in cand_name:
        score += 0.6
    # Standards hint: a standard requested in the tender appears in the candidate name
    standards = [_norm(s) for s in (item.get("standards") or [])]
    if any(s and s in cand_name for s in standards):
        score += 0.2
    # Toy heuristic for UoM presence; a real check would compare with the SKU's unit
    if item.get("unit"):
        score += 0.2
    return min(score, 1.0)


def map_items(items: List[ExtractedItem]) -> List[MappingDecision]:
    """Pick the best-scoring catalog candidate for each extracted item."""
    decisions: List[MappingDecision] = []
    for item in items:
        cands = candidate_generation(item.get("normalized_name") or item.get("raw_mention", ""))
        scored = [(cand, score_candidate(item, cand)) for cand in cands]
        scored.sort(key=lambda x: x[1], reverse=True)
        # Fall back to an empty decision when no candidates were found
        best, conf = scored[0] if scored else ((None, None), 0.0)
        decision: MappingDecision = {
            "extracted": item,
            "candidate_sku": best[0],
            "candidate_name": best[1],
            "confidence": conf,
            "rationale": "toy matcher: lexical + hints",
        }
        decisions.append(decision)
    return decisions
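The architecture calls for hybrid search (lexical + embeddings) but does not say how the two result lists are combined. One common option is reciprocal rank fusion (RRF), sketched below as a possible replacement for the placeholder candidate generation. It is a swapped-in technique rather than a method prescribed by this section; `k = 60` is a conventional default, and the input lists stand in for real BM25 and vector-search results.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked SKU lists (each ordered best-first) into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, sku in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); items ranked high in
            # several lists accumulate the largest totals
            scores[sku] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by SKU for deterministic output
    return sorted(scores, key=lambda sku: (-scores[sku], sku))


lexical_hits = ["SKU-A", "SKU-B", "SKU-C"]    # e.g. from BM25
embedding_hits = ["SKU-B", "SKU-C", "SKU-A"]  # e.g. from a vector index
fused = reciprocal_rank_fusion([lexical_hits, embedding_hits])
```

RRF uses only rank positions, so it avoids having to calibrate BM25 scores against cosine similarities.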
Salesforce payload (sketch)
from __future__ import annotations

from typing import Any, Dict, List
from urllib.parse import quote_plus

from ..models.schemas import MappingDecision

API_PREFIX = "/services/data/v59.0"


def build_sfdc_composite_payload(
    account_external_id: str,
    opportunity_name: str,
    close_date: str,
    stage_name: str,
    pricebook2_id: str,
    decisions: List[MappingDecision],
) -> Dict[str, Any]:
    """Build a Salesforce Composite API payload to create Opportunity and Line Items.

    Assumes candidate_sku maps to a PricebookEntry via an External ID.
    Replace references with your org's External IDs and fields.
    Composite requests cap the number of subrequests (and of query subrequests),
    so check the limits for your API version and batch large tenders.
    """
    ref_opp = "refOpp"
    composite: List[Dict[str, Any]] = []
    # Create Opportunity
    composite.append(
        {
            "method": "POST",
            "url": f"{API_PREFIX}/sobjects/Opportunity",
            "referenceId": ref_opp,
            "body": {
                "Name": opportunity_name,
                "StageName": stage_name,
                "CloseDate": close_date,
                "Pricebook2Id": pricebook2_id,
                "Account__c": account_external_id,  # placeholder field; replace with your lookup strategy
                "Source__c": "Tender Intake",
            },
        }
    )
    # Create line items for candidates with a mapping
    for idx, d in enumerate(decisions):
        if not d.get("candidate_sku"):
            continue
        # Resolve PricebookEntry by SKU external id. This query must precede the
        # line item that references it, because subrequests run in order.
        sku = d["candidate_sku"].replace("\\", "\\\\").replace("'", "\\'")
        soql = (
            "SELECT Id FROM PricebookEntry "
            f"WHERE Product2.ExternalId__c='{sku}' "
            f"AND Pricebook2Id='{pricebook2_id}' LIMIT 1"
        )
        composite.append(
            {
                "method": "GET",
                "url": f"{API_PREFIX}/query?q={quote_plus(soql)}",
                "referenceId": f"refPbe{idx}",
            }
        )
        composite.append(
            {
                "method": "POST",
                "url": f"{API_PREFIX}/sobjects/OpportunityLineItem",
                "referenceId": f"refLine{idx}",
                "body": {
                    "OpportunityId": f"@{{{ref_opp}.id}}",
                    # Query results are referenced through records[n], not .id
                    "PricebookEntryId": f"@{{refPbe{idx}.records[0].Id}}",
                    "Quantity": d["extracted"].get("qty", 1),
                },
            }
        )
    return {"allOrNone": True, "compositeRequest": composite}
Pipeline skeleton
from __future__ import annotations

from typing import Any, Dict

from ..models.schemas import TenderExtraction
from ..mapping.catalog_mapping import map_items
from ..integrations.crm_adapter import build_crm_payload


def parse_email_and_docs_to_json(email_body: str, attachments: bytes) -> TenderExtraction:
    """Step 1: Document Processing. Placeholder converting email/PDF into structured JSON.

    Replace with real parsing: MIME processing, PDF table extraction, OCR if needed.
    """
    # Toy output for illustration
    return {
        "items": [
            {
                "raw_mention": "Steel Rod EN 10060 10mm qty 100",
                "normalized_name": "Steel Rod EN 10060 10mm",
                "qty": 100,
                "unit": "pcs",
                "packaging": None,
                "standards": ["EN 10060"],
                "price": None,
                "context_span": "Table row 4",
                "notes": None,
            }
        ],
        "source_language": "en",
    }


def extract_with_llm(parsed_json: TenderExtraction, prompt_template: str) -> TenderExtraction:
    """Step 2: Intelligent Extraction. Placeholder that would call an LLM in production.

    Here we simply return the parsed_json to illustrate the flow.
    """
    return parsed_json


def run_pipeline(
    email_body: str,
    attachments: bytes,
    prompt: str,
    *,
    crm_provider: str = "salesforce",
) -> Dict[str, Any]:
    """End-to-end orchestration following the Phased AI Solution.

    Phase 1: Extract Products (parse + LLM)
    Phase 2: Map Products (catalog mapping)
    Phase 3: Create Quotes (quote payload) and/or CRM payload
    """
    # Phase 1
    parsed = parse_email_and_docs_to_json(email_body, attachments)
    extracted = extract_with_llm(parsed, prompt)
    # Phase 2
    decisions = map_items(extracted["items"])  # mapping results with confidence
    # Phase 3: vendor-agnostic adapter (Salesforce is one option)
    crm_payload = build_crm_payload(
        provider=crm_provider,
        decisions=decisions,
        customer_id="ACME-EXT-123",
        currency="EUR",
        sfdc_opportunity_name="Tender Intake - ACME",
        sfdc_close_date="2025-09-30",
        sfdc_stage_name="Qualification",
        sfdc_pricebook2_id="01sXXXXXXXXXXXX",
    )
    return {"crm": crm_payload}
CRM adapter (select provider)
from __future__ import annotations

from typing import Any, Dict, List, Optional

from ..models.schemas import MappingDecision
from .salesforce_payload import build_sfdc_composite_payload
from .quotes_payload import build_quote_payload


def build_crm_payload(
    provider: str,
    decisions: List[MappingDecision],
    *,
    customer_id: str,
    currency: str = "EUR",
    # Salesforce-specific (optional)
    sfdc_opportunity_name: Optional[str] = None,
    sfdc_close_date: Optional[str] = None,
    sfdc_stage_name: Optional[str] = None,
    sfdc_pricebook2_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a payload for the selected CRM provider.

    - provider='salesforce': returns a Composite API payload to create Opp + Line Items.
    - other: returns a generic quote payload suitable for custom or alternate CRMs.
    """
    if provider.lower() == "salesforce":
        if not (sfdc_opportunity_name and sfdc_close_date and sfdc_stage_name and sfdc_pricebook2_id):
            raise ValueError("Missing Salesforce parameters for CRM adapter")
        return build_sfdc_composite_payload(
            account_external_id=customer_id,
            opportunity_name=sfdc_opportunity_name,
            close_date=sfdc_close_date,
            stage_name=sfdc_stage_name,
            pricebook2_id=sfdc_pricebook2_id,
            decisions=decisions,
        )
    # Default: vendor-agnostic quote payload
    return build_quote_payload(customer_id=customer_id, currency=currency, decisions=decisions)
Quotes payload (generic)
from __future__ import annotations

from typing import Any, Dict, List

from ..models.schemas import MappingDecision


def build_quote_payload(
    customer_id: str,
    currency: str,
    decisions: List[MappingDecision],
) -> Dict[str, Any]:
    """Generic quote payload independent of any specific CRM.

    The payload contains the customer reference, currency, and quote line items
    with SKU, name, qty, unit, and price if available.
    """
    items: List[Dict[str, Any]] = []
    for d in decisions:
        # Skip lines that could not be mapped to a catalog SKU
        if not d.get("candidate_sku"):
            continue
        ext = d["extracted"]
        items.append(
            {
                "sku": d["candidate_sku"],
                "name": d.get("candidate_name"),
                "qty": ext.get("qty", 1),
                "unit": ext.get("unit", "pcs"),
                "price": ext.get("price"),  # may be None; pricing can be resolved later
                "confidence": d.get("confidence", 0.0),
            }
        )
    return {
        "customer_id": customer_id,
        "currency": currency,
        "items": items,
        "notes": "Auto-generated from tender intake",
    }
Operational notes
Guardrails: enforce JSON-only LLM output; validate against schema; cap tokens; redact PII.
Constraints: only map active SKUs with valid pricebook entries and unit compatibility.
HITL: show source snippet, extracted line, candidate SKUs with confidence; approve/correct.
Retries/backoff: handle 429/5xx from CRM APIs.
Metrics: extraction accuracy (precision/recall/F1), mapping confidence histogram, turnaround time, API errors.
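Extraction accuracy can be measured against a golden set by comparing predicted lines with labeled ones. The sketch below treats a line as correct only when name, quantity, and unit all match exactly; that matching key and the function name are assumptions, and a real evaluation might allow fuzzy name matches.

```python
from typing import Dict, Set, Tuple

Line = Tuple[str, float, str]  # (normalized_name, qty, unit)


def extraction_prf(predicted: Set[Line], gold: Set[Line]) -> Dict[str, float]:
    """Compute precision, recall, and F1 for one document's extracted lines."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Running this over the golden set after every prompt or parser change gives a simple regression signal.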
Delivery outline
MVP: Ingest → Parse → Extract → Map → Create Opportunity/Line Items (fixed pricebook); small reviewer UI; a golden set for regression.
Iteration: composite API, account match via domain/external ID, synonyms cache, telemetry dashboard.
V1+: improved OCR for tables, additional languages, canary prompts, role-based reviewer.
Portability
The examples are provider-agnostic. Swap AWS components for GCP/Azure equivalents. Replace CRM integration with your target system while keeping the contracts and steps unchanged.