Package dyntabs.ai.rag
Class RagEngine
java.lang.Object
dyntabs.ai.rag.RagEngine
RAG engine that loads documents, splits them, embeds them, and provides a ContentRetriever for use with AI assistants.
Uses the LangChain4J easy-rag module, which includes Tika for PDF/DOCX parsing and a local embedding model.
Method Summary
Modifier and Type | Method | Description
static dev.langchain4j.rag.content.retriever.ContentRetriever | createRetriever(EasyRAG ragAnnotation) | Creates a ContentRetriever based on the EasyRAG annotation.
static dev.langchain4j.rag.content.retriever.ContentRetriever | createRetriever(String[] sources, int maxResults, double minScore) | Creates a ContentRetriever programmatically.
static dev.langchain4j.rag.content.retriever.ContentRetriever | createRetriever(List<DocumentSource> documentSources, int maxResults, double minScore) | Creates a ContentRetriever from in-memory document sources.
static List<dev.langchain4j.data.document.Document> | loadDocuments(String[] sources) | Loads documents from path-based sources (classpath, file system, or relative paths) into LangChain4J Documents.
static List<dev.langchain4j.data.document.Document> | parseDocumentSources(List<DocumentSource> sources) | Parses in-memory document sources (byte arrays) into LangChain4J Documents using Apache Tika.
Method Details
createRetriever
public static dev.langchain4j.rag.content.retriever.ContentRetriever createRetriever(EasyRAG ragAnnotation)
Creates a ContentRetriever based on the EasyRAG annotation.
createRetriever
public static dev.langchain4j.rag.content.retriever.ContentRetriever createRetriever(String[] sources, int maxResults, double minScore)
Creates a ContentRetriever programmatically.
createRetriever
public static dev.langchain4j.rag.content.retriever.ContentRetriever createRetriever(List<DocumentSource> documentSources, int maxResults, double minScore)
Creates a ContentRetriever from in-memory document sources.
Use this when documents come from a DMS, database, REST API, or any source that provides content as byte[].
Parameters:
documentSources - the documents as byte arrays
maxResults - maximum number of relevant segments to retrieve
minScore - minimum relevance score (0.0 to 1.0)
Returns:
a configured ContentRetriever
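The following is a minimal sketch of how a maxResults / minScore pair is commonly applied when selecting segments: drop everything below the score threshold, order the rest best-first, and cap the count. It uses only the JDK; RetrievalSketch, ScoredSegment, and select are hypothetical names invented for illustration and are not part of RagEngine or LangChain4J, whose internal behavior this page does not describe.

```java
import java.util.Comparator;
import java.util.List;

/** Illustrative only: hypothetical stand-in types, not dyntabs or LangChain4J classes. */
final class RetrievalSketch {

    /** A text segment paired with a relevance score in the range 0.0 to 1.0. */
    record ScoredSegment(String text, double score) {}

    /**
     * Keeps segments scoring at least minScore, orders them best-first,
     * and returns at most maxResults of them.
     */
    static List<ScoredSegment> select(List<ScoredSegment> candidates, int maxResults, double minScore) {
        return candidates.stream()
                .filter(s -> s.score() >= minScore)
                .sorted(Comparator.comparingDouble(ScoredSegment::score).reversed())
                .limit(maxResults)
                .toList();
    }

    public static void main(String[] args) {
        List<ScoredSegment> candidates = List.of(
                new ScoredSegment("refund policy", 0.82),
                new ScoredSegment("shipping times", 0.41),
                new ScoredSegment("returns window", 0.77),
                new ScoredSegment("company history", 0.65));

        // Up to two segments with a score of 0.6 or higher, best first.
        select(candidates, 2, 0.6)
                .forEach(s -> System.out.println(s.score() + " " + s.text()));
    }
}
```

Under this reading, a higher minScore trades recall for precision, while maxResults bounds how much retrieved text is handed to the assistant.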
parseDocumentSources
public static List<dev.langchain4j.data.document.Document> parseDocumentSources(List<DocumentSource> sources)
Parses in-memory document sources (byte arrays) into LangChain4J Documents using Apache Tika.
Shared loader: called by the in-memory RAG path here in RagEngine and by the Milvus ingestion path (EasyIndexer.index(DocumentSource...)), so byte-array documents are parsed identically whether they end up in memory or in Milvus.
Parameters:
sources - the documents as byte arrays (from a DMS, DB BLOB, upload, etc.)
Returns:
the parsed documents; any source that fails to parse is logged and skipped
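The "logged and skipped" contract can be pictured with a small JDK-only sketch: each source is parsed inside its own try block, so one bad document does not abort the batch. SkipOnFailureSketch, NamedBytes, parseAll, and decodeUtf8 are hypothetical names for illustration; the real method uses Apache Tika and the DocumentSource type, whose fields are not shown on this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative only: hypothetical stand-in types, not dyntabs or LangChain4J classes. */
final class SkipOnFailureSketch {

    private static final Logger LOG = Logger.getLogger(SkipOnFailureSketch.class.getName());

    /** Hypothetical stand-in for a named in-memory document. */
    record NamedBytes(String name, byte[] content) {}

    /** Parses every source; a source whose parser throws is logged and left out of the result. */
    static List<String> parseAll(List<NamedBytes> sources, Function<byte[], String> parser) {
        List<String> parsed = new ArrayList<>();
        for (NamedBytes source : sources) {
            try {
                parsed.add(parser.apply(source.content()));
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, "Skipping unparseable source: " + source.name(), e);
            }
        }
        return parsed;
    }

    /** Toy parser standing in for a real one: rejects empty input, otherwise decodes UTF-8. */
    static String decodeUtf8(byte[] bytes) {
        if (bytes.length == 0) {
            throw new IllegalArgumentException("empty document");
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<NamedBytes> sources = List.of(
                new NamedBytes("faq.txt", "How do refunds work?".getBytes(StandardCharsets.UTF_8)),
                new NamedBytes("broken.bin", new byte[0]),
                new NamedBytes("terms.txt", "Terms of service".getBytes(StandardCharsets.UTF_8)));

        // The empty source is logged as a warning; the other two are returned.
        System.out.println(parseAll(sources, SkipOnFailureSketch::decodeUtf8));
    }
}
```

A consequence of this contract is that the returned list may be shorter than the input, so callers that need every document should compare sizes or check the log.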
loadDocuments
static List<dev.langchain4j.data.document.Document> loadDocuments(String[] sources)
Loads documents from path-based sources (classpath, file system, or relative paths) into LangChain4J Documents.
Shared loader: called by the annotation/programmatic RAG paths here in RagEngine and by the Milvus ingestion path (EasyIndexer.index(String...)), so a "classpath:", "file:", or bare path string resolves the same way regardless of destination.
Parameters:
sources - one or more paths, each optionally prefixed with classpath: or file:
Returns:
the loaded documents; any source that fails to load is logged and skipped
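As a rough illustration of the prefix convention, the sketch below classifies a source string by its optional classpath: or file: prefix and strips that prefix from the location. It is an assumption about how such strings could be interpreted, not RagEngine's actual resolution logic; SourcePrefixSketch, Kind, Resolved, and resolve are hypothetical names.

```java
/** Illustrative only: hypothetical stand-in types, not dyntabs or LangChain4J classes. */
final class SourcePrefixSketch {

    private static final String CLASSPATH_PREFIX = "classpath:";
    private static final String FILE_PREFIX = "file:";

    /** Where a source string points, judged by its prefix alone. */
    enum Kind { CLASSPATH, FILE, BARE_PATH }

    /** The kind of source plus the location with any prefix removed. */
    record Resolved(Kind kind, String location) {}

    /** Classifies a source string and strips a recognized prefix. */
    static Resolved resolve(String source) {
        if (source.startsWith(CLASSPATH_PREFIX)) {
            return new Resolved(Kind.CLASSPATH, source.substring(CLASSPATH_PREFIX.length()));
        }
        if (source.startsWith(FILE_PREFIX)) {
            return new Resolved(Kind.FILE, source.substring(FILE_PREFIX.length()));
        }
        // No prefix: treated here as a bare path (an assumption of this sketch).
        return new Resolved(Kind.BARE_PATH, source);
    }

    public static void main(String[] args) {
        String[] sources = { "classpath:docs/faq.pdf", "file:/data/manual.docx", "notes/readme.txt" };
        for (String source : sources) {
            System.out.println(source + " -> " + resolve(source));
        }
    }
}
```

In a loader built this way, a CLASSPATH location would typically be opened through a ClassLoader resource lookup and the other two through the file system; how the real method treats bare paths is not specified beyond "relative paths".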