8.1. Document to API Call

Overview

Use AI to process incoming requests automatically: parse the documents a customer sends and turn their content into precise API calls. A common example: tenders or RFQs arrive as emails with PDF/Word attachments; the goal is to parse the message and documents, extract product lines, map them to your catalog, and automatically create Opportunities and Line Items in systems like Salesforce.

The Phased AI Solution

  • Extract Products

    → AI identifies and pulls product names, specifications, and quantities from tender documents.

    → Proves the core ability to read complex text.

  • Map Products

    → Matches extracted names to the official product catalog.

    → Translates customer terminology into internal product codes/SKUs.

  • Create Quotes

    → Generates an automated quote based on the mapped products.

    → Depends entirely on the success of the first two phases.

Two-step approach

  • Step 1: Document Processing

    → Convert tender documents (PDF/Word) and email bodies into a structured digital format (JSON).

    → This creates a foundation of clean, searchable, structured data.

  • Step 2: Intelligent Extraction

    → Feed the JSON into a prompt template and then to an LLM.

    → Use a specialized prompt with detailed instructions to guide the model.

    → Extract the product fields defined in the extraction schema.
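
As a rough illustration of Step 1 for the email part only, the sketch below uses Python's standard `email` package to turn a raw MIME message into a JSON-serializable dict. The function name `email_to_structured` and the output keys are assumptions made for this example; PDF/Word parsing and OCR are out of scope here and would need dedicated libraries.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Any, Dict, List


def email_to_structured(raw: bytes) -> Dict[str, Any]:
    """Parse a raw MIME message into a JSON-serializable dict (hypothetical shape)."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer the plain-text body, fall back to HTML if that is all there is
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    attachments: List[Dict[str, Any]] = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        attachments.append(
            {
                "filename": part.get_filename(),
                "content_type": part.get_content_type(),
                "size_bytes": len(payload),
            }
        )
    return {
        "subject": str(msg["subject"] or ""),
        "from": str(msg["from"] or ""),
        "body": body,
        "attachments": attachments,
    }


# Usage with a synthetic message (address and file content are made up)
sample = EmailMessage()
sample["Subject"] = "RFQ 123"
sample["From"] = "buyer@example.com"
sample.set_content("Please quote 100 pcs steel rod.")
sample.add_attachment(
    b"%PDF-1.4 fake", maintype="application", subtype="pdf", filename="tender.pdf"
)
structured = email_to_structured(sample.as_bytes())
print(json.dumps(structured, indent=2))
```

Only attachment metadata is kept here; a real Step 1 would pass each attachment's bytes to a document parser and merge the extracted text and tables into the same structure.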

Use case and scope

  • Inputs: RFC 5322 (formerly RFC 822) email messages with MIME attachments (PDF/DOCX), in languages such as English and German.

  • Entities: Product mentions (name/spec/qty/unit/packaging/standards), optional price.

  • Outputs: Structured JSON for extracted items; mapped SKUs with confidence; API payload for CRM.

  • Targets: CRM objects such as Salesforce (Account/Contact, Opportunity, OpportunityLineItem, Files/Notes) or other CRMs via an adapter.

  • Users: Sales Ops/Bid Desk; optional human-in-the-loop for low-confidence lines.

Architecture (POC)

  • Ingest: webhook/IMAP/SES → queue → worker.

  • Parsing: text + tables; OCR for scanned PDFs.

  • LLM extraction: JSON-only mode, validated by schema.

  • Catalog mapping: hybrid search (lexical + embeddings) + rules and unit constraints.

  • Review: approve/correct low-confidence items.

  • Integration: create CRM records via REST/Composite API with retries.

  • Observability: logs, metrics, costs, drift.
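
The Review stage implies a routing rule between automatic processing and human review. One possible sketch is shown below; the threshold value and the function name `route_decisions` are assumptions for illustration, and the decision dicts follow the shape used by the mapping code in this section.

```python
from typing import Any, Dict, List, Tuple

REVIEW_THRESHOLD = 0.8  # assumed cut-off; tune it on a labeled sample


def route_decisions(
    decisions: List[Dict[str, Any]], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split mapping decisions into (auto_approved, needs_review)."""
    auto: List[Dict[str, Any]] = []
    review: List[Dict[str, Any]] = []
    for d in decisions:
        # Unmapped lines always go to a reviewer, whatever their confidence
        if d.get("candidate_sku") and d.get("confidence", 0.0) >= threshold:
            auto.append(d)
        else:
            review.append(d)
    return auto, review
```

Only the auto-approved list would flow straight into the CRM payload; the review list feeds the reviewer UI.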

Data flow and contracts

  • Input: email body and attachments.

  • Intermediate: structured JSON from parsing (Step 1) and refined JSON from LLM (Step 2).

  • Output: mapped decisions with confidences and a ready-to-send CRM payload.
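
The code samples in this section import `ExtractedItem`, `TenderExtraction`, and `MappingDecision` from a `models.schemas` module that is not shown. A plausible sketch of those contracts as `TypedDict`s follows; the field names mirror the extraction schema and the keys the mapping code writes, but the exact definitions are an assumption.

```python
from typing import List, Optional, TypedDict


class ExtractedItem(TypedDict):
    """One product line as produced by the extraction step."""
    raw_mention: str
    normalized_name: str
    qty: float
    unit: str
    packaging: Optional[str]
    standards: Optional[List[str]]
    price: Optional[float]
    context_span: str
    notes: Optional[str]


class TenderExtraction(TypedDict):
    """Top-level extraction result for one tender."""
    items: List[ExtractedItem]
    source_language: Optional[str]


class MappingDecision(TypedDict):
    """Result of mapping one extracted item to the catalog."""
    extracted: ExtractedItem
    candidate_sku: Optional[str]
    candidate_name: Optional[str]
    confidence: float
    rationale: str
```

Plain dicts satisfy these types at runtime, which matches how the samples use `item.get(...)` and `item["..."]`.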

Extraction schema

JSON schema for extracted items
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "TenderExtraction",
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "raw_mention": {"type": "string"},
          "normalized_name": {"type": "string"},
          "qty": {"type": "number", "minimum": 0},
          "unit": {"type": "string"},
          "packaging": {"type": ["string", "null"]},
          "standards": {"type": ["array", "null"], "items": {"type": "string"}},
          "price": {"type": ["number", "null"], "minimum": 0},
          "context_span": {"type": "string"},
          "notes": {"type": ["string", "null"]}
        },
        "required": ["raw_mention", "normalized_name", "qty", "unit", "context_span"]
      },
      "minItems": 1
    },
    "source_language": {"type": ["string", "null"]}
  },
  "required": ["items"]
}
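
In practice a JSON Schema validator library would enforce this schema. The standard-library sketch below is only a stand-in: it checks the required keys and the numeric constraints by hand, and the name `validate_extraction` is hypothetical.

```python
from typing import Any, Dict, List

REQUIRED_ITEM_KEYS = ("raw_mention", "normalized_name", "qty", "unit", "context_span")


def _is_non_negative_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool) and value >= 0


def validate_extraction(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    items = doc.get("items")
    if not isinstance(items, list) or not items:
        return ["'items' must be a non-empty array"]
    errors: List[str] = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"items[{i}] must be an object")
            continue
        for key in REQUIRED_ITEM_KEYS:
            if key not in item:
                errors.append(f"items[{i}] is missing required key '{key}'")
        if "qty" in item and not _is_non_negative_number(item["qty"]):
            errors.append(f"items[{i}].qty must be a number >= 0")
        price = item.get("price")
        if price is not None and not _is_non_negative_number(price):
            errors.append(f"items[{i}].price must be null or a number >= 0")
    return errors
```

Returning a list of messages rather than raising on the first problem makes it easier to show a reviewer everything that is wrong with one LLM response.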

Prompt template

Prompt for product extraction
SYSTEM:
You are an expert tender analyst. Extract product items from text and tables.
Output JSON only. No prose. Follow the schema exactly.

SCHEMA:
{
  "items": [
    {
      "raw_mention": "string",
      "normalized_name": "string",
      "qty": 0,
      "unit": "pcs|m|kg|l|box|pack|set|...",
      "packaging": "string|null",
      "standards": ["string", "..."] | null,
      "price": 0 | null,
      "context_span": "string",
      "notes": "string|null"
    }
  ],
  "source_language": "en|de|..."
}

INSTRUCTIONS:
- Parse line items, tables, and free text.
- Prefer quantities and units from tables; normalize unit spellings.
- Keep context_span as a short quote of the source line.
- If price is not present, set it to null.
- If packaging or standards are not present, set them to null.

USER:
{{INPUT_TEXT}}
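
A minimal sketch of how the template might be rendered and the response parsed. The `call_llm` parameter is a hypothetical callable that takes the prompt text and returns the model's raw string output; no particular LLM client is assumed.

```python
import json
from typing import Any, Callable, Dict


def render_prompt(template: str, input_text: str) -> str:
    """Fill the {{INPUT_TEXT}} placeholder of the prompt template."""
    return template.replace("{{INPUT_TEXT}}", input_text)


def extract_items(
    template: str, input_text: str, call_llm: Callable[[str], str]
) -> Dict[str, Any]:
    """Render the prompt, call the model, and parse its JSON-only answer."""
    raw = call_llm(render_prompt(template, input_text))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict) or "items" not in data:
        raise ValueError("LLM output is missing the top-level 'items' key")
    return data
```

Plain string replacement is used instead of `str.format` because the template itself contains literal braces in its SCHEMA section.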

Mapping logic (sketch)

Hybrid mapping toy example
from __future__ import annotations

from typing import List, Tuple
from ..models.schemas import ExtractedItem, MappingDecision


def candidate_generation(query: str) -> List[Tuple[str, str]]:
    """Return candidate (sku, name) pairs. Placeholder for BM25 + embeddings."""
    # In a real system, query Postgres/OpenSearch and pgvector for hybrid recall
    return [
        ("SKU-001", "Steel Rod EN 10060, 10mm"),
        ("SKU-002", "Steel Rod EN 10060, 12mm"),
    ]


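def _norm(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so "10060, 10mm" matches "10060 10mm"."""
    return " ".join("".join(c if c.isalnum() else " " for c in text.lower()).split())


def score_candidate(item: ExtractedItem, candidate: Tuple[str, str]) -> float:
    """Toy score: name match (0.6) + shared standard (0.2) + unit present (0.2)."""
    _sku, name = candidate
    cand_name = _norm(name)
    item_name = _norm(item.get("normalized_name") or "")
    score = 0.0
    if item_name and item_name in cand_name:
        score += 0.6
    # Reward a standard (e.g. "EN 10060") that the tender line and the catalog name share
    standards = [_norm(s) for s in item.get("standards") or []]
    if any(s and s in cand_name for s in standards):
        score += 0.2
    # Toy heuristic: only checks that a unit is present; a real check would compare UoM
    if item.get("unit"):
        score += 0.2
    return min(score, 1.0)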


def map_items(items: List[ExtractedItem]) -> List[MappingDecision]:
    decisions: List[MappingDecision] = []
    for item in items:
        cands = candidate_generation(item.get("normalized_name") or item.get("raw_mention", ""))
        scored = [(cand, score_candidate(item, cand)) for cand in cands]
        scored.sort(key=lambda x: x[1], reverse=True)
        best, conf = (scored[0][0], scored[0][1]) if scored else ((None, None), 0.0)
        decision: MappingDecision = {
            "extracted": item,
            "candidate_sku": best[0] if best[0] else None,
            "candidate_name": best[1] if best[1] else None,
            "confidence": conf,
            "rationale": "toy matcher: lexical + hints",
        }
        decisions.append(decision)
    return decisions
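
A substring test is brittle to word order and extra words. As one possible lexical signal for the hybrid matcher, the sketch below computes Jaccard overlap on alphanumeric tokens. It is a simple stand-in rather than BM25 or embeddings, and the helper names are hypothetical.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; 0.0 when either side has no tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


query = "Steel Rod EN 10060 10mm"
for sku, name in [
    ("SKU-001", "Steel Rod EN 10060, 10mm"),
    ("SKU-002", "Steel Rod EN 10060, 12mm"),
]:
    print(sku, round(jaccard(query, name), 2))
```

For the two toy catalog entries, the 10mm rod shares all five tokens with the query, while the 12mm rod shares four of six distinct tokens, so the first candidate ranks higher.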

Salesforce payload (sketch)

Composite API payload builder example
from __future__ import annotations

from typing import List, Dict, Any
from ..models.schemas import MappingDecision


def build_sfdc_composite_payload(
    account_external_id: str,
    opportunity_name: str,
    close_date: str,
    stage_name: str,
    pricebook2_id: str,
    decisions: List[MappingDecision],
) -> Dict[str, Any]:
    """Build a Salesforce Composite API payload to create Opportunity and Line Items.

    Assumes candidate_sku maps to a PricebookEntry via an External ID.
    Replace references with your org's External IDs and fields.
    """
    ref_opp = "refOpp"
    composite: List[Dict[str, Any]] = []

    # Create Opportunity
    composite.append(
        {
            "method": "POST",
            "url": "/services/data/v59.0/sobjects/Opportunity",
            "referenceId": ref_opp,
            "body": {
                "Name": opportunity_name,
                "StageName": stage_name,
                "CloseDate": close_date,
                "Pricebook2Id": pricebook2_id,
                "Account__c": account_external_id,  # placeholder custom field; the standard Account lookup is AccountId
                "Source__c": "Tender Intake",
            },
        }
    )

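    # Create line items for candidates with a mapping.
    # Salesforce caps the number of subrequests (and of query subrequests) per
    # composite call, so large tenders must be split across several requests.
    for idx, d in enumerate(decisions):
        if not d.get("candidate_sku"):
            continue
        # Resolve the PricebookEntry by SKU external ID first: a subrequest can only
        # reference the results of subrequests that appear earlier in the list.
        # NOTE: escape and URL-encode candidate_sku before interpolating it in production.
        composite.append(
            {
                "method": "GET",
                "url": f"/services/data/v59.0/query?q=SELECT+Id+FROM+PricebookEntry+WHERE+Product2.ExternalId__c='{d['candidate_sku']}'+AND+Pricebook2Id='{pricebook2_id}'+LIMIT+1",
                "referenceId": f"refPbe{idx}",
            }
        )
        # Depending on the org, UnitPrice or TotalPrice may also be required here.
        composite.append(
            {
                "method": "POST",
                "url": "/services/data/v59.0/sobjects/OpportunityLineItem",
                "referenceId": f"refLine{idx}",
                "body": {
                    "OpportunityId": f"@{{{ref_opp}.id}}",
                    # Query results are referenced through records[0], not .id
                    "PricebookEntryId": f"@{{refPbe{idx}.records[0].Id}}",
                    "Quantity": d["extracted"].get("qty", 1),
                },
            }
        )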

    return {"allOrNone": True, "compositeRequest": composite}
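
Interpolating a SKU straight into the query URL breaks as soon as the value contains a quote, space, or ampersand. A hedged sketch of safer query construction follows: it escapes the SOQL string literal and percent-encodes the whole query. The helper names are hypothetical, and `Product2.ExternalId__c` is the same placeholder field used in the payload builder.

```python
from urllib.parse import quote


def soql_literal(value: str) -> str:
    """Quote a value as a SOQL string literal, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def pricebook_entry_query_url(
    sku: str, pricebook2_id: str, api_version: str = "v59.0"
) -> str:
    """Build the REST query URL that resolves a PricebookEntry for one SKU."""
    soql = (
        "SELECT Id FROM PricebookEntry "
        f"WHERE Product2.ExternalId__c = {soql_literal(sku)} "
        f"AND Pricebook2Id = {soql_literal(pricebook2_id)} LIMIT 1"
    )
    # safe="" percent-encodes every reserved character, including quotes and "="
    return f"/services/data/{api_version}/query?q={quote(soql, safe='')}"


print(pricebook_entry_query_url("SKU-001", "01sXXXXXXXXXXXX"))
```

The returned string can replace the hand-built `url` value of the GET subrequest.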

Pipeline skeleton

End-to-end POC pipeline steps
from __future__ import annotations

from typing import Dict, Any

from ..models.schemas import TenderExtraction, ExtractedItem
from ..mapping.catalog_mapping import map_items
from ..integrations.crm_adapter import build_crm_payload


def parse_email_and_docs_to_json(email_body: str, attachments: bytes) -> TenderExtraction:
    """Step 1: Document Processing. Placeholder converting email/PDF into structured JSON.
    Replace with real parsing: MIME processing, PDF table extraction, OCR if needed.
    """
    # Toy output for illustration
    return {
        "items": [
            {
                "raw_mention": "Steel Rod EN 10060 10mm qty 100",
                "normalized_name": "Steel Rod EN 10060 10mm",
                "qty": 100,
                "unit": "pcs",
                "packaging": None,
                "standards": ["EN 10060"],
                "price": None,
                "context_span": "Table row 4",
                "notes": None,
            }
        ],
        "source_language": "en",
    }


def extract_with_llm(parsed_json: TenderExtraction, prompt_template: str) -> TenderExtraction:
    """Step 2: Intelligent Extraction. Placeholder that would call an LLM in production.
    Here we simply return the parsed_json to illustrate the flow.
    """
    return parsed_json



def run_pipeline(email_body: str, attachments: bytes, prompt: str, *, crm_provider: str = "salesforce") -> Dict[str, Any]:
    """End-to-end orchestration following the Phased AI Solution.

    Phase 1: Extract Products (parse + LLM)
    Phase 2: Map Products (catalog mapping)
    Phase 3: Create Quotes (quote payload) and/or CRM payload
    """
    # Phase 1
    parsed = parse_email_and_docs_to_json(email_body, attachments)
    extracted = extract_with_llm(parsed, prompt)

    # Phase 2
    decisions = map_items(extracted["items"])  # mapping results with confidence

    # Phase 3: vendor-agnostic adapter (Salesforce is one option)
    crm_payload = build_crm_payload(
        provider=crm_provider,
        decisions=decisions,
        customer_id="ACME-EXT-123",
        currency="EUR",
        sfdc_opportunity_name="Tender Intake - ACME",
        sfdc_close_date="2025-09-30",
        sfdc_stage_name="Qualification",
        sfdc_pricebook2_id="01sXXXXXXXXXXXX",
    )

    return {"crm": crm_payload}

CRM adapter (select provider)

Adapter: Salesforce or vendor-agnostic quote payload
from __future__ import annotations

from typing import Any, Dict, List, Optional
from ..models.schemas import MappingDecision
from .salesforce_payload import build_sfdc_composite_payload
from .quotes_payload import build_quote_payload


def build_crm_payload(
    provider: str,
    decisions: List[MappingDecision],
    *,
    customer_id: str,
    currency: str = "EUR",
    # Salesforce-specific (optional)
    sfdc_opportunity_name: Optional[str] = None,
    sfdc_close_date: Optional[str] = None,
    sfdc_stage_name: Optional[str] = None,
    sfdc_pricebook2_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a payload for the selected CRM provider.

    - provider='salesforce': returns a Composite API payload to create Opp + Line Items.
    - other: returns a generic quote payload suitable for custom or alternate CRMs.
    """
    if provider.lower() == "salesforce":
        if not (sfdc_opportunity_name and sfdc_close_date and sfdc_stage_name and sfdc_pricebook2_id):
            raise ValueError("Missing Salesforce parameters for CRM adapter")
        return build_sfdc_composite_payload(
            account_external_id=customer_id,
            opportunity_name=sfdc_opportunity_name,
            close_date=sfdc_close_date,
            stage_name=sfdc_stage_name,
            pricebook2_id=sfdc_pricebook2_id,
            decisions=decisions,
        )

    # Default: vendor-agnostic quote payload
    return build_quote_payload(customer_id=customer_id, currency=currency, decisions=decisions)

Quotes payload (generic)

Quote payload builder (vendor-agnostic)
from __future__ import annotations

from typing import List, Dict, Any
from ..models.schemas import MappingDecision


def build_quote_payload(
    customer_id: str,
    currency: str,
    decisions: List[MappingDecision],
) -> Dict[str, Any]:
    """Generic quote payload independent of any specific CRM.

    The payload contains the customer reference, currency, and quote line items
    with SKU, name, qty, unit, and price if available.
    """
    items: List[Dict[str, Any]] = []
    for d in decisions:
        if not d.get("candidate_sku"):
            continue
        ext = d["extracted"]
        items.append(
            {
                "sku": d["candidate_sku"],
                "name": d.get("candidate_name"),
                "qty": ext.get("qty", 1),
                "unit": ext.get("unit", "pcs"),
                "price": ext.get("price"),  # may be null; pricing can be resolved later
                "confidence": d.get("confidence", 0.0),
            }
        )

    return {
        "customer_id": customer_id,
        "currency": currency,
        "items": items,
        "notes": "Auto-generated from tender intake",
    }

Operational notes

  • Guardrails: enforce JSON-only LLM output; validate against schema; cap tokens; redact PII.

  • Constraints: only map active SKUs with valid pricebook entries and unit compatibility.

  • HITL: show source snippet, extracted line, candidate SKUs with confidence; approve/correct.

  • Retries/backoff: handle 429/5xx from CRM APIs.

  • Metrics: extraction accuracy (precision/recall/F1), mapping confidence histogram, turnaround time, API errors.
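
For the retries/backoff note, one common pattern is exponential backoff with jitter. The sketch below is transport-agnostic: `send` is a hypothetical callable that performs the HTTP request and returns `(status_code, body)`, and the default attempt count and delays are assumed values.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500, 502, 503, 504}


def send_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    status, body = send()
    attempt = 1
    while status in RETRYABLE and attempt < max_attempts:
        # Delay doubles each attempt: base, 2*base, 4*base, ... plus up to 50% jitter
        delay = base_delay * (2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
        attempt += 1
    return status, body
```

Retrying a create request after a timeout can duplicate records unless the request is idempotent, so a production version would likely pair this with an idempotency or external-ID strategy.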

Delivery outline

  • MVP: Ingest → Parse → Extract → Map → Create Opportunity/Line Items (fixed pricebook); small reviewer UI; a golden set for regression.

  • Iteration: composite API, account match via domain/external ID, synonyms cache, telemetry dashboard.

  • V1+: improved OCR for tables, additional languages, canary prompts, role-based reviewer.
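
To make the golden set usable for regression, extraction quality can be scored per tender as precision, recall, and F1. The sketch below assumes, for illustration only, that predicted and gold lines are compared as exact `(sku, qty)` pairs; a real evaluation might match more loosely.

```python
from typing import Dict, Set, Tuple

Line = Tuple[str, float]  # (sku, qty); an assumed comparison key


def prf1(predicted: Set[Line], gold: Set[Line]) -> Dict[str, float]:
    """Precision, recall, and F1 of predicted lines against the golden set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


predicted = {("SKU-001", 100), ("SKU-002", 5)}
gold = {("SKU-001", 100), ("SKU-003", 2)}
print(prf1(predicted, gold))
```

In this toy case one of two predicted lines is correct and one of two gold lines is found, so precision, recall, and F1 all equal 0.5.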

Portability

The pipeline stages and data contracts are provider-agnostic. Swap cloud components (for example, SES for ingestion) for GCP/Azure equivalents, and replace the CRM integration with your target system while keeping the contracts and steps unchanged.