Class ExtractionEngine

java.lang.Object
dyntabs.ai.extract.ExtractionEngine

public final class ExtractionEngine extends Object
The extraction assembly line: prompt the model with a JSON skeleton, read its answer, parse it into the target type, and optionally validate it. If an answer is not valid JSON, the engine re-prompts, up to the caller's retry limit.

Analogy: RagEngine is to retrieval what ExtractionEngine is to structured output — the low-level worker that ExtractionBuilder delegates to once the caller has chosen a type and options. Think of it as a factory line: SchemaDescriber stamps the blank form, the model fills it in, Gson presses it into a Java object, and a quality-control step (retry + optional Bean Validation) rejects defective parts.
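The assembly line above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the engine's actual code: the model is a hypothetical Function<String, String>, a toy parseAge method stands in for SchemaDescriber plus Gson, and a Predicate stands in for Bean Validation. Following the Throws clauses of extract, only unparseable replies are retried; a validation failure ends the run immediately.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of prompt -> parse -> validate -> retry; not the dyntabs API. */
public final class AssemblyLineSketch {

    /** Hypothetical stand-in for ExtractionException. */
    static final class SketchExtractionException extends RuntimeException {
        SketchExtractionException(String message) { super(message); }
    }

    /** Toy parser for replies like {"age": 42}; stands in for Gson. */
    static Integer parseAge(String reply) {
        String s = reply.trim();
        if (!s.startsWith("{") || !s.endsWith("}")) {
            throw new IllegalArgumentException("not a JSON object: " + s);
        }
        // NumberFormatException extends IllegalArgumentException, so a bad number also counts as malformed.
        return Integer.parseInt(s.substring(s.indexOf(':') + 1, s.length() - 1).trim());
    }

    /** Makes up to maxRetries + 1 model calls; only unparseable replies are retried. */
    static <T> T extract(Function<String, String> model, Function<String, T> parser,
                         Predicate<T> validator, String prompt, int maxRetries, boolean validate) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            T value;
            try {
                value = parser.apply(model.apply(prompt));
            } catch (IllegalArgumentException malformed) {
                continue; // malformed reply: try again while attempts remain
            }
            if (validate && !validator.test(value)) {
                throw new SketchExtractionException("validation failed: " + value); // not retried
            }
            return value;
        }
        throw new SketchExtractionException("no valid JSON after " + (maxRetries + 1) + " attempt(s)");
    }

    public static void main(String[] args) {
        // Scripted mock model: the first reply is prose, the second is JSON.
        Iterator<String> replies = List.of("Sure, happy to help!", "{\"age\": 42}").iterator();
        Integer age = extract(prompt -> replies.next(), AssemblyLineSketch::parseAge,
                              a -> a >= 0, "Extract the age as JSON.", 1, true);
        System.out.println(age); // prints 42
    }
}
```

Keeping validation failures out of the retry loop mirrors the documented contract: maxRetries counts re-attempts on unparseable JSON only.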

It is provider-agnostic: instead of relying on a provider-specific JSON response mode, it instructs the model to emit JSON and then tolerantly extracts the JSON object from the reply (stripping markdown fences or stray prose). This keeps it working across OpenAI, Groq, Ollama, and any other LangChain4J ChatModel.
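One simple way to implement the tolerant step described above is to keep only the span from the first opening brace to the last closing brace of the reply. The sketch below assumes that heuristic; the class and method names are hypothetical, and the engine may use a different or more careful approach (for example a brace-matching scan that skips string literals).

```java
import java.util.Optional;

/** Hypothetical helper; the engine's actual extraction heuristic may differ. */
public final class JsonObjectSlicer {

    private JsonObjectSlicer() {}

    /**
     * Returns the text from the first '{' to the last '}' of a model reply, which
     * drops markdown fences and any prose before or after the object.
     * Limitation: a stray brace in the surrounding prose would widen or break the slice.
     */
    public static Optional<String> sliceObject(String reply) {
        if (reply == null) {
            return Optional.empty();
        }
        int start = reply.indexOf('{');
        int end = reply.lastIndexOf('}');
        if (start < 0 || end < start) {
            return Optional.empty(); // no complete object in the reply
        }
        return Optional.of(reply.substring(start, end + 1));
    }

    public static void main(String[] args) {
        String fence = "`".repeat(3); // avoids a literal triple backtick in this listing
        String reply = "Here is the data:\n" + fence + "json\n{\"name\": \"Ada\", \"age\": 36}\n"
                + fence + "\nHope that helps!";
        System.out.println(sliceObject(reply).orElse("<no JSON object>"));
        // prints {"name": "Ada", "age": 36}
    }
}
```

Because the slice relies on no provider feature, the same code path serves any ChatModel; the cost is that it trusts the reply to contain a single top-level object.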

  • Method Details

    • extract

      public static <T> T extract(dev.langchain4j.model.chat.ChatModel model, Class<T> type, String content, int maxRetries, boolean validate)
      Runs the full extraction for one piece of content.

      Called by ExtractionBuilder.from(String) after the builder has resolved the model and options.

      Type Parameters:
      T - the target type
      Parameters:
      model - the chat model to query (real or a test mock)
      type - the class to extract (record or POJO)
      content - the source text to extract from
      maxRetries - how many additional attempts to make if the model returns unparseable JSON (0 = a single attempt)
      validate - whether to run Jakarta Bean Validation on the result
      Returns:
      a populated instance of type
      Throws:
      ExtractionException - if no valid JSON could be parsed within the retries, or if validation is enabled and the result is invalid
    • extract

      public static <T> T extract(dev.langchain4j.model.chat.ChatModel model, Class<T> type, String content, int maxRetries, boolean validate, EventEmitter emitter)
      Same as extract(ChatModel, Class, String, int, boolean), but additionally narrates its progress to the given EventEmitter.

      Called by ExtractionBuilder.from(String). Emits a STARTED event up front, a PROGRESS event when the model is queried, a RETRY event for each re-attempt on malformed JSON, and a terminal RESULT (success) or ERROR event. The emitter is a no-op when no listener was registered, so this overload adds essentially no overhead when nobody is observing.

      Type Parameters:
      T - the target type
      Parameters:
      model - the chat model to query (real or a test mock)
      type - the class to extract (record or POJO)
      content - the source text to extract from
      maxRetries - how many additional attempts on unparseable JSON (0 = a single attempt)
      validate - whether to run Jakarta Bean Validation on the result
      emitter - the live-event emitter (must not be null; pass a no-op emitter to disable events)
      Returns:
      a populated instance of type
      Throws:
      ExtractionException - if no valid JSON could be parsed within the retries, or if validation is enabled and the result is invalid
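The event contract of the EventEmitter overload, together with the maxRetries semantics (0 means a single attempt, so there are at most maxRetries + 1 model queries), can be illustrated by replaying which events fire for a scripted sequence of replies. This is a hypothetical simulation, not the real EventEmitter API: the Kind enum only mirrors the event names listed above, it assumes PROGRESS fires once per model query and RETRY fires just before each re-query, and it ignores the validation step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical simulation of the documented event order; not the real EventEmitter API. */
public final class EventSequenceSketch {

    /** Mirrors the event names in the Javadoc. */
    enum Kind { STARTED, PROGRESS, RETRY, RESULT, ERROR }

    /**
     * Replays which events would fire, given whether each model reply parses.
     * Assumptions: PROGRESS fires once per model query, RETRY fires before each re-query.
     */
    static List<Kind> simulate(List<Boolean> replyParses, int maxRetries) {
        List<Kind> events = new ArrayList<>();
        Consumer<Kind> emit = events::add; // a recording emitter (hypothetical)
        emit.accept(Kind.STARTED);
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                emit.accept(Kind.RETRY);
            }
            emit.accept(Kind.PROGRESS);
            boolean parsed = attempt < replyParses.size() && replyParses.get(attempt);
            if (parsed) {
                emit.accept(Kind.RESULT);
                return events;
            }
        }
        emit.accept(Kind.ERROR);
        return events;
    }

    public static void main(String[] args) {
        // First reply is malformed, second parses; maxRetries = 1 allows two attempts.
        System.out.println(simulate(List.of(false, true), 1));
        // prints [STARTED, PROGRESS, RETRY, PROGRESS, RESULT]
        // maxRetries = 0 means a single attempt, so one bad reply ends in ERROR.
        System.out.println(simulate(List.of(false), 0));
        // prints [STARTED, PROGRESS, ERROR]
    }
}
```

Every run starts with STARTED and ends with exactly one of RESULT or ERROR, which is what lets a listener treat those two events as the terminal signal.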