Class RagEngine

java.lang.Object
dyntabs.ai.rag.RagEngine

public final class RagEngine extends Object
RAG engine that loads documents, splits them, embeds them, and provides a ContentRetriever for use with AI assistants. Uses the LangChain4J easy-rag module, which includes Apache Tika for parsing PDF/DOCX files and a local embedding model.
  • Method Summary

    Modifier and Type
    Method
    Description
    static dev.langchain4j.rag.content.retriever.ContentRetriever
    createRetriever(EasyRAG ragAnnotation)
    Creates a ContentRetriever based on the EasyRAG annotation.
    static dev.langchain4j.rag.content.retriever.ContentRetriever
    createRetriever(String[] sources, int maxResults, double minScore)
    Creates a ContentRetriever programmatically.
    static dev.langchain4j.rag.content.retriever.ContentRetriever
    createRetriever(List<DocumentSource> documentSources, int maxResults, double minScore)
    Creates a ContentRetriever from in-memory document sources.
    static List<dev.langchain4j.data.document.Document>
    loadDocuments(String[] sources)
    Loads documents from path-based sources (classpath, file system, or relative paths) into LangChain4J Documents.
    static List<dev.langchain4j.data.document.Document>
    parseDocumentSources(List<DocumentSource> sources)
    Parses in-memory document sources (byte arrays) into LangChain4J Documents using Apache Tika.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • createRetriever

      public static dev.langchain4j.rag.content.retriever.ContentRetriever createRetriever(EasyRAG ragAnnotation)
      Creates a ContentRetriever based on the EasyRAG annotation.
    • createRetriever

      public static dev.langchain4j.rag.content.retriever.ContentRetriever createRetriever(String[] sources, int maxResults, double minScore)
      Creates a ContentRetriever programmatically.
    • createRetriever

      public static dev.langchain4j.rag.content.retriever.ContentRetriever createRetriever(List<DocumentSource> documentSources, int maxResults, double minScore)
      Creates a ContentRetriever from in-memory document sources.

      Use this when documents come from a DMS, database, REST API, or any source that provides content as byte[].

      Parameters:
      documentSources - the documents as byte arrays
      maxResults - maximum number of relevant segments to retrieve
      minScore - minimum relevance score (0.0 to 1.0)
      Returns:
      a configured ContentRetriever
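
      The effect of maxResults and minScore can be pictured with a small, library-free sketch. It assumes that segments scoring below minScore are dropped and the remainder is capped at maxResults, best first; this page does not confirm that this is how RagEngine applies them. The names RetrievalLimitsSketch, ScoredSegment, and select are hypothetical and are not part of RagEngine or LangChain4J.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: models how maxResults and minScore might constrain retrieval.
final class RetrievalLimitsSketch {

    // Hypothetical stand-in for a retrieved text segment and its relevance score.
    static final class ScoredSegment {
        final String text;
        final double score;

        ScoredSegment(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    // Keeps segments scoring at least minScore, orders them best first,
    // and returns at most maxResults of them.
    static List<ScoredSegment> select(List<ScoredSegment> candidates, int maxResults, double minScore) {
        return candidates.stream()
                .filter(s -> s.score >= minScore)
                .sorted((a, b) -> Double.compare(b.score, a.score))
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ScoredSegment> candidates = List.of(
                new ScoredSegment("refund policy", 0.91),
                new ScoredSegment("shipping times", 0.74),
                new ScoredSegment("office address", 0.42),
                new ScoredSegment("return form", 0.68));
        // maxResults = 2, minScore = 0.6: three segments pass the score filter,
        // and the two best of those ("refund policy", "shipping times") are kept.
        for (ScoredSegment s : select(candidates, 2, 0.6)) {
            System.out.println(s.text + " " + s.score);
        }
    }
}
```

      Under this assumption, raising minScore trades recall for precision, while maxResults bounds how much retrieved text is handed to the assistant.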
    • parseDocumentSources

      public static List<dev.langchain4j.data.document.Document> parseDocumentSources(List<DocumentSource> sources)
      Parses in-memory document sources (byte arrays) into LangChain4J Documents using Apache Tika.

      Shared loader: called by the in-memory RAG path here in RagEngine and by the Milvus ingestion path (EasyIndexer.index(DocumentSource...)), so byte-array documents are parsed identically whether they end up in memory or in Milvus.

      Parameters:
      sources - the documents as byte arrays (from a DMS, DB BLOB, upload, etc.)
      Returns:
      the parsed documents; any source that fails to parse is logged and skipped
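
      The "logged and skipped" contract described above can be sketched without the library. The example below is a minimal illustration, assuming a parser that may throw for a bad input; SkipOnFailureSketch, Parser, parseAll, and utf8OrFail are hypothetical names, and the real method parses with Apache Tika instead of the toy UTF-8 decoder used here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: one failing source does not abort the whole batch.
final class SkipOnFailureSketch {

    private static final Logger LOG = Logger.getLogger(SkipOnFailureSketch.class.getName());

    // Hypothetical parser contract standing in for the Tika-based parsing step.
    interface Parser {
        String parse(byte[] content) throws Exception;
    }

    // Parses every source; a source that throws is logged and left out of the result.
    static List<String> parseAll(List<byte[]> sources, Parser parser) {
        List<String> parsed = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            try {
                parsed.add(parser.parse(sources.get(i)));
            } catch (Exception e) {
                LOG.log(Level.WARNING, "Skipping source " + i + ": " + e.getMessage());
            }
        }
        return parsed;
    }

    // Toy parser: decodes UTF-8 text and rejects empty content.
    static String utf8OrFail(byte[] content) {
        if (content.length == 0) {
            throw new IllegalArgumentException("empty document");
        }
        return new String(content, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<byte[]> sources = List.of(
                "hello".getBytes(StandardCharsets.UTF_8),
                new byte[0], // fails to parse, so it is logged and skipped
                "world".getBytes(StandardCharsets.UTF_8));
        System.out.println(parseAll(sources, SkipOnFailureSketch::utf8OrFail));
    }
}
```

      A consequence of this contract is that the returned list can be shorter than the input list, so callers that need to detect missing documents should compare sizes or check the log.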
    • loadDocuments

      public static List<dev.langchain4j.data.document.Document> loadDocuments(String[] sources)
      Loads documents from path-based sources (classpath, file system, or relative paths) into LangChain4J Documents.

      Shared loader: called by the annotation/programmatic RAG paths here in RagEngine and by the Milvus ingestion path (EasyIndexer.index(String...)), so a "classpath:", "file:", or bare path string resolves the same way regardless of destination.

      Parameters:
      sources - one or more paths, each optionally prefixed with classpath: or file:
      Returns:
      the loaded documents; any source that fails to load is logged and skipped
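
      The three accepted source forms can be told apart by their prefix. The sketch below only classifies a source string and strips the prefix; it does not load anything, and it makes no claim about how RagEngine resolves a bare path (for example, against the working directory or the classpath). SourcePrefixSketch, Kind, Resolved, and resolve are hypothetical names used for illustration.

```java
// Illustrative only: classifies a source string by its optional prefix.
final class SourcePrefixSketch {

    enum Kind { CLASSPATH, FILE, BARE_PATH }

    // Hypothetical result type: the detected kind plus the location without its prefix.
    static final class Resolved {
        final Kind kind;
        final String location;

        Resolved(Kind kind, String location) {
            this.kind = kind;
            this.location = location;
        }
    }

    private static final String CLASSPATH_PREFIX = "classpath:";
    private static final String FILE_PREFIX = "file:";

    // Strips a recognized prefix; anything else is treated as a bare path.
    static Resolved resolve(String source) {
        if (source.startsWith(CLASSPATH_PREFIX)) {
            return new Resolved(Kind.CLASSPATH, source.substring(CLASSPATH_PREFIX.length()));
        }
        if (source.startsWith(FILE_PREFIX)) {
            return new Resolved(Kind.FILE, source.substring(FILE_PREFIX.length()));
        }
        return new Resolved(Kind.BARE_PATH, source);
    }

    public static void main(String[] args) {
        // Example inputs are made up for this sketch.
        String[] sources = { "classpath:docs/manual.pdf", "file:/data/policies", "notes/faq.docx" };
        for (String source : sources) {
            Resolved r = resolve(source);
            System.out.println(r.kind + " -> " + r.location);
        }
    }
}
```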