Package dyntabs.ai

Class ExtractionBuilder<T>

java.lang.Object
dyntabs.ai.ExtractionBuilder<T>
Type Parameters:
T - the type to extract

public class ExtractionBuilder<T> extends Object
Builds a one-shot, typed extraction: turn unstructured text or a document into a populated Java object (record or POJO).

Analogy: this is the bridge from the "AI / unstructured" world into your normal typed-Java world. After .from(...) returns, no AI is involved any more — you hold a plain Invoice, Order, or Candidate that your existing code, JPA entities, EJBs, and PrimeFaces forms already know how to handle.

You never construct this directly — start from EasyAI.extract(Class):


 record Invoice(String vendor, String invoiceNumber, java.time.LocalDate date,
                java.math.BigDecimal total, java.util.List<LineItem> items) {}

 // From free text (an email body, a chat message, a description)
 Invoice inv = EasyAI.extract(Invoice.class).from(emailBody);

 // From a document's bytes - parses (Tika) AND extracts in one call
 Invoice fromPdf = EasyAI.extract(Invoice.class)
                         .from(DocumentSource.of("invoice.pdf", pdfBytes));

 // Then it is just data:
 em.persist(inv);
 if (inv.total().compareTo(LIMIT) > 0) approvalService.require(inv);
 

Robustness is built in: if the model returns malformed JSON, the extraction retries (see withRetries(int)); enable validate() to additionally run Jakarta Bean Validation on the result.
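Once from(...) has returned, the result is ordinary data. The following is a minimal, model-free sketch of the threshold check from the class example. The LineItem shape, the LIMIT value, and the needsApproval helper are assumptions made for illustration; they are not part of this API.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;

/** Illustration only: plain records standing in for an already-extracted result. */
final class PlainDataSketch {

    // LineItem is not defined in this page; this shape is an assumption.
    record LineItem(String description, BigDecimal amount) {}

    record Invoice(String vendor, String invoiceNumber, LocalDate date,
                   BigDecimal total, List<LineItem> items) {}

    // Hypothetical approval threshold.
    static final BigDecimal LIMIT = new BigDecimal("1000.00");

    /** Returns true when the invoice total is strictly above the limit. */
    static boolean needsApproval(Invoice inv) {
        // compareTo ignores scale, so 1000.0 and 1000.00 compare as equal.
        return inv.total().compareTo(LIMIT) > 0;
    }

    public static void main(String[] args) {
        Invoice inv = new Invoice("ACME", "INV-1", LocalDate.of(2024, 1, 15),
                new BigDecimal("1250.50"),
                List.of(new LineItem("Widgets", new BigDecimal("1250.50"))));
        System.out.println(needsApproval(inv)); // prints "true"
    }
}
```

Using compareTo rather than equals matters for BigDecimal, because equals also compares scale and would treat 1000.0 and 1000.00 as different values.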

Method Details

    • withModel

      public ExtractionBuilder<T> withModel(String modelName)
      Overrides the model name for this extraction (e.g. "gpt-4o", "llama3").
      Parameters:
      modelName - the model name
      Returns:
      this builder
    • withApiKey

      public ExtractionBuilder<T> withApiKey(String apiKey)
      Overrides the API key for this extraction.
      Parameters:
      apiKey - the API key
      Returns:
      this builder
    • withProvider

      public ExtractionBuilder<T> withProvider(String provider)
      Overrides the provider ("openai" or "ollama") for this extraction.
      Parameters:
      provider - the provider name
      Returns:
      this builder
    • withBaseUrl

      public ExtractionBuilder<T> withBaseUrl(String baseUrl)
      Overrides the API base URL (proxies, Azure OpenAI, self-hosted endpoints).
      Parameters:
      baseUrl - the base URL
      Returns:
      this builder
    • withTemperature

      public ExtractionBuilder<T> withTemperature(double temperature)
      Overrides the sampling temperature. Extraction defaults to 0.0 (deterministic); raise it only if you have a reason to.
      Parameters:
      temperature - value between 0.0 and 1.0
      Returns:
      this builder
    • withRetries

      public ExtractionBuilder<T> withRetries(int retries)
      Sets how many additional attempts to make if the model returns malformed JSON.

      The default is 2 (so up to three calls in total). Set 0 to disable retrying.

      Parameters:
      retries - number of retries on unparseable output (must be >= 0)
      Returns:
      this builder
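The retry count can be read as "one initial call plus up to `retries` more". The sketch below models that arithmetic with a stand-in model (a Supplier) and a toy parser. It is not the library's implementation: RetrySketch, Outcome, extractWithRetries, and parseTotal are hypothetical names, and the choice of exception types is an assumption.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for the retry behaviour; not the library's code. */
final class RetrySketch {

    /** The parsed value plus the number of model calls it took. */
    record Outcome<T>(T value, int calls) {}

    /**
     * Calls the model once, then up to `retries` more times while parsing fails.
     * The parser signals malformed output by throwing IllegalArgumentException.
     */
    static <T> Outcome<T> extractWithRetries(Supplier<String> model,
                                             Function<String, T> parser,
                                             int retries) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        IllegalArgumentException last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return new Outcome<>(parser.apply(model.get()), attempt + 1);
            } catch (IllegalArgumentException malformed) {
                last = malformed; // remember the failure and try again
            }
        }
        throw new IllegalStateException(
                "no parseable output after " + (retries + 1) + " calls", last);
    }

    /** Toy parser standing in for JSON-to-object mapping: accepts only an integer. */
    static Integer parseTotal(String raw) {
        // NumberFormatException is a subclass of IllegalArgumentException.
        return Integer.valueOf(raw.trim());
    }

    public static void main(String[] args) {
        // Two malformed replies, then a good one: succeeds on the third call.
        Iterator<String> replies = List.of("oops", "{broken", "42").iterator();
        Outcome<Integer> out = extractWithRetries(replies::next, RetrySketch::parseTotal, 2);
        System.out.println(out.value() + " after " + out.calls() + " calls"); // 42 after 3 calls
    }
}
```

With retries set to 0 the same loop runs exactly once, which matches the "set 0 to disable retrying" note above.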
    • validate

      public ExtractionBuilder<T> validate()
      Enables Jakarta Bean Validation on the extracted object.

      When enabled, constraints such as @NotNull, @Size, or @Min declared on the target type are checked after extraction; a violation throws ExtractionException. Requires a Bean Validation provider (e.g. Hibernate Validator) on the classpath — present by default in a Jakarta EE container.

      Returns:
      this builder
    • withChatModel

      public ExtractionBuilder<T> withChatModel(dev.langchain4j.model.chat.ChatModel model)
      Injects an externally created ChatModel, bypassing easyai.properties and EasyAI.configure(). Mainly for testing with a mock model.
      Parameters:
      model - a pre-built ChatModel instance
      Returns:
      this builder
    • withEventListener

      public ExtractionBuilder<T> withEventListener(EasyAIListener eventListener)
      Registers a listener that receives a live EasyAIEvent stream as the extraction runs: EasyAIEvent.Phase.STARTED when it begins, an EasyAIEvent.Phase.PROGRESS when the model is queried, an EasyAIEvent.Phase.RETRY for each re-attempt on malformed JSON, and a final EasyAIEvent.Phase.RESULT (or EasyAIEvent.Phase.ERROR).

      Familiar analogy: a "your form is being processed" status bar — you see it parse, stumble, retry, and finally hand you the finished, typed object.

      Parameters:
      eventListener - the listener to receive extraction events (may be null)
      Returns:
      this builder
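To make the phase sequence concrete, here is a small self-contained model of the event stream. The phase names come from the description above; everything else is an assumption: the local Phase enum and the Consumer-based listener are stand-ins for EasyAIEvent.Phase and EasyAIListener, and the exact ordering of PROGRESS relative to RETRY is a guess, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical model of the event stream; these types are NOT the library's. */
final class EventStreamSketch {

    /** Phase names taken from the method description. */
    enum Phase { STARTED, PROGRESS, RETRY, RESULT, ERROR }

    /**
     * Emits the phases for a run in which the first `failures` attempts return
     * malformed JSON and at most `retries` re-attempts are allowed.
     * Assumed ordering: PROGRESS on every model call, RETRY before each re-attempt.
     */
    static void run(int failures, int retries, Consumer<Phase> listener) {
        listener.accept(Phase.STARTED);
        for (int attempt = 0; attempt <= retries; attempt++) {
            if (attempt > 0) {
                listener.accept(Phase.RETRY);
            }
            listener.accept(Phase.PROGRESS);
            if (attempt >= failures) {
                listener.accept(Phase.RESULT); // this attempt parsed cleanly
                return;
            }
        }
        listener.accept(Phase.ERROR); // every allowed attempt failed
    }

    public static void main(String[] args) {
        List<Phase> seen = new ArrayList<>();
        run(1, 2, seen::add); // one malformed reply, then success
        System.out.println(seen); // [STARTED, PROGRESS, RETRY, PROGRESS, RESULT]
    }
}
```

A real listener would typically switch on the phase, for example to count RETRY events or to update a progress indicator.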
    • from

      public T from(String text)
      Extracts the target type from a plain text string.

      Terminal step of the EasyAI.extract(Type.class).from(...) chain. Resolves the model, then delegates to ExtractionEngine.extract(ChatModel, Class, String, int, boolean).

      Parameters:
      text - the source content (email body, message, description, etc.)
      Returns:
      a populated instance of the target type
      Throws:
      ExtractionException - if extraction (or validation, if enabled) fails
    • from

      public T from(DocumentSource source)
      Extracts the target type directly from a document's bytes.

      Parses the document (PDF, DOCX, TXT, ... via Apache Tika) into text using RagEngine.parseDocumentSources(List) and then extracts from that text — so parsing and extraction happen in a single call. Ideal for a PDF/DOCX pulled from a DMS, a database BLOB, or a user upload.

      Parameters:
      source - the document content + file name (extension drives parsing)
      Returns:
      a populated instance of the target type
      Throws:
      ExtractionException - if the document yields no text, or extraction/validation fails