[Building various RAGs with LangChain.js] Running search and text generation a second time using a previously generated answer

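When building a RAG system, there are many ways to tune and implement it, such as how the search step works or how the text is generated. This post introduces a method that "runs RAG a second time using the result of a first RAG pass."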

Why run RAG a second time

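Conceptually, this is a variation on HyDE (Hypothetical Document Embeddings), a technique that writes a hypothetical answer and uses it for search and text generation. An answer is first generated from the question and the search results, and that answer is then used to run the search and answer generation again.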

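The approach follows from the fact that vector search works by measuring similarity. Put simply, the reasoning is: "an answer-like text is likely to retrieve more similar data than the question itself." In HyDE, the LLM is asked to produce a hypothetical answer, whereas the method introduced here actually generates an answer once from the question (through a normal RAG pass) before searching again.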
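To make that intuition concrete, here is a toy sketch. It assumes a plain bag-of-words vector in place of a real embedding model (real embeddings such as OpenAIEmbeddings work very differently), and the document, question, and answer strings are made-up examples. It scores the same document against a short question and against an answer-like text that reuses the document's vocabulary.

```typescript
// Toy "embedding": word counts. This is NOT how real embedding models work;
// it only illustrates why answer-like text can sit closer to a document.
function bagOfWords(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  text
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .forEach((token) => counts.set(token, (counts.get(token) ?? 0) + 1));
  return counts;
}

// Cosine similarity between two sparse count vectors (0 if either is empty)
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  a.forEach((count, token) => {
    dot += count * (b.get(token) ?? 0);
  });
  const norm = (v: Map<string, number>) =>
    Math.sqrt(Array.from(v.values()).reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical example texts
const doc = "to run hono on aws lambda import handle from hono/aws-lambda and export handler";
const question = "how do i use hono with aws lambda";
const draftAnswer = "import handle from hono/aws-lambda and export the handler for aws lambda";

const questionScore = cosineSimilarity(bagOfWords(question), bagOfWords(doc));
const answerScore = cosineSimilarity(bagOfWords(draftAnswer), bagOfWords(doc));
console.log(questionScore.toFixed(3), answerScore.toFixed(3)); // prints "0.294 0.753"
```

With real embeddings the gap will not look like this, but the direction of the idea is the same: the query is compared with documents, so a query written like a document may match them better.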

Generating the first answer

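Let's implement it with LangChain.js, starting with a simple RAG. Here two parts are connected with LCEL (RunnableSequence): qaChain, which generates an answer from the given context (the search results), and a vector search step that uses Cloudflare Vectorize.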

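```typescript
import { Hono } from "hono";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { CloudflareVectorizeStore } from "@langchain/cloudflare";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

// Cloudflare Workers bindings referenced via c.env
type Bindings = {
  OPENAI_API_KEY: string;
  VECTORIZE_INDEX: VectorizeIndex;
};

// Shapes of the values passed between the chain steps
type QuestionInput = { question: string };
type RetrievedInput = QuestionInput & { context: string };

const app = new Hono<{ Bindings: Bindings }>();

// "/rag" is a placeholder route path for this example
app.get("/rag", async (c) => {
  // "How to use Hono on AWS Lambda"
  const question = "HonoをAWS Lambdaで使う方法";
  const chatModel = new ChatOpenAI({
    modelName: "gpt-4",
    temperature: 0,
    openAIApiKey: c.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: c.env.OPENAI_API_KEY,
  });
  const vectorStore = new CloudflareVectorizeStore(embeddings, {
    index: c.env.VECTORIZE_INDEX,
  });

  // Answers the question using only the supplied context
  const qaChain = RunnableSequence.from([
    ChatPromptTemplate.fromMessages([
      [
        "system",
        `You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

        Context:
        {context}
        `,
      ],
      ["human", "{question}"],
    ]),
    chatModel,
    new StringOutputParser(),
  ]);

  const iterativeQAChain = RunnableSequence.from([
    // Step 1: vector search with the question (top 20 chunks)
    {
      question: (input: QuestionInput) => input.question,
      context: async (input: QuestionInput) => {
        const searchResult = await vectorStore.similaritySearch(input.question, 20);
        return searchResult.map((result) => result.pageContent).join("\n====\n");
      },
    },
    // Step 2: generate the first answer, keeping question and context for later steps.
    // The output is an object, so it is returned as-is rather than piped into the chat model.
    {
      question: (input: RetrievedInput) => input.question,
      context: (input: RetrievedInput) => input.context,
      answer: qaChain,
    },
  ]);

  // Responds with { question, context, answer }
  return c.json(await iterativeQAChain.invoke({ question }));
});

export default app;
```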

Adding search and text generation based on the first result

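In this HyDE variant, the answer produced by the chain implemented above is used to run the search again. After the step that calls qaChain, I added two steps: one that runs the vector search, and one that generates an answer from its results.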

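```typescript
import { Hono } from "hono";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { CloudflareVectorizeStore } from "@langchain/cloudflare";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

// Cloudflare Workers bindings referenced via c.env
type Bindings = {
  OPENAI_API_KEY: string;
  VECTORIZE_INDEX: VectorizeIndex;
};

// Shapes of the values passed between the chain steps
type QuestionInput = { question: string };
type RetrievedInput = QuestionInput & { context: string };
type AnsweredInput = RetrievedInput & { answer: string };

const app = new Hono<{ Bindings: Bindings }>();

app.get("/iterative-rag", async (c) => {
  // "How to use Hono on AWS Lambda"
  const question = "HonoをAWS Lambdaで使う方法";
  const chatModel = new ChatOpenAI({
    modelName: "gpt-4",
    temperature: 0,
    openAIApiKey: c.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: c.env.OPENAI_API_KEY,
  });
  const vectorStore = new CloudflareVectorizeStore(embeddings, {
    index: c.env.VECTORIZE_INDEX,
  });

  // Answers the question using only the supplied context
  const qaChain = RunnableSequence.from([
    ChatPromptTemplate.fromMessages([
      [
        "system",
        `You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

        Context:
        {context}
        `,
      ],
      ["human", "{question}"],
    ]),
    chatModel,
    new StringOutputParser(),
  ]);

  const iterativeQAChain = RunnableSequence.from([
    // 1st search: query the vector store with the question
    {
      question: (input: QuestionInput) => input.question,
      context: async (input: QuestionInput) => {
        const searchResult = await vectorStore.similaritySearch(input.question, 20);
        return searchResult.map((result) => result.pageContent).join("\n====\n");
      },
    },
    // 1st generation: draft an answer from the question and the first context
    {
      question: (input: RetrievedInput) => input.question,
      context: (input: RetrievedInput) => input.context,
      answer: qaChain,
    },
    // 2nd search: query the vector store again, this time with the draft answer
    {
      question: (input: AnsweredInput) => input.question,
      context: async (input: AnsweredInput) => {
        const searchResult = await vectorStore.similaritySearch(input.answer, 20);
        return searchResult.map((result) => result.pageContent).join("\n====\n");
      },
      answer: (input: AnsweredInput) => input.answer,
    },
    // 2nd generation: refine the draft answer with the newly retrieved context
    ChatPromptTemplate.fromMessages([
      [
        "system",
        `
        You are tasked with enhancing the provided answer by integrating the additional context information supplied. If the original answer is incomplete or lacks detail, use the additional context to fill in the gaps and provide a more comprehensive response. If you don't know the answer after considering the additional context, it's okay to say that you don't know. Aim to keep your enhanced answer concise, using no more than three sentences.

        ## Original Answer:
        {answer}

        ## Additional Context:
        {context}

        **Enhanced Answer**: Revise the original answer by incorporating the additional context information. This should result in a response that is more informative and precise, directly addressing the question with the newfound insights.
        `,
      ],
      ["human", "{question}"],
    ]),
    chatModel,
    new StringOutputParser(),
  ]);

  return c.json(await iterativeQAChain.invoke({ question }));
});

export default app;
```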

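In the second text-generation step, the prompt instructs the model to "improve the text in answer using the information in context" (paraphrased). The query for the second vector search also uses answer, the output of qaChain, instead of the original question.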
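The data flow of the chain can also be written without LangChain, which makes it easier to see which value is used as the query in each pass. The following is a minimal sketch, assuming hypothetical `search`, `answerFromContext`, and `refineAnswer` functions that stand in for `vectorStore.similaritySearch` and the two prompt-plus-model steps; it is not a LangChain API.

```typescript
// Hypothetical stand-ins for the vector store and the two LLM steps
type Search = (query: string, k: number) => Promise<string[]>;
type Generate = (input: { question: string; context: string; answer?: string }) => Promise<string>;

const SEPARATOR = "\n====\n";

async function iterativeRag(
  question: string,
  search: Search,
  answerFromContext: Generate,
  refineAnswer: Generate,
  k = 20,
): Promise<string> {
  // Pass 1: retrieve with the question, then draft an answer
  const firstContext = (await search(question, k)).join(SEPARATOR);
  const draft = await answerFromContext({ question, context: firstContext });

  // Pass 2: retrieve again using the draft answer as the query,
  // then ask the model to improve the draft with the new context
  const secondContext = (await search(draft, k)).join(SEPARATOR);
  return refineAnswer({ question, context: secondContext, answer: draft });
}
```

Note that the original question is still passed to the final step; only the second search query changes, which mirrors the chain above.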

Closing thoughts

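Subjectively, I did not see a dramatic change in the generated answers. However, this may well be because preprocessing of the source data (such as stripping HTML tags and creating chunks) was insufficient, so I don't want to conclude from a single attempt that the method is ineffective.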
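For reference, the kind of preprocessing mentioned above can be sketched as follows. This is a naive illustration under simple assumptions (a regex-based tag strip that ignores HTML entities, and character-based chunk size and overlap values chosen by the caller), not the pipeline used for this article; for real pages an HTML parser and a splitter such as LangChain's RecursiveCharacterTextSplitter are better choices.

```typescript
// Naive HTML-to-text: drops <script>/<style> blocks and all tags, collapses whitespace.
// A regex is not a real HTML parser, and entities like &nbsp; are left untouched.
function stripHtml(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Fixed-size character chunks; each chunk repeats the last `overlap`
// characters of the previous one so sentences are less likely to be cut in half.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`.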

After learning more about index generation, data preprocessing, and techniques such as hybrid search, I'd like to try this again.