Debugging OpenHands GitHub Actions starts with the `Upload output.jsonl as artifact` step

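When you use OpenHands for AI coding, an Actions workflow can complete without producing a pull request, and that is a failure case. To investigate, check the Actions run log, then format the output.jsonl file and analyze the task. In the concrete failure covered here, it helps to check the setup process and the command results, and to review the thought log, the executed commands, and their results step by step. When using the logs to pinpoint and address a problem, running OpenHands in a local IDE is also worth considering.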

AI coding with OpenHands sometimes fails in a way that looks successful on the surface. Specifically, the Actions workflow completes but no pull request is opened.

This post summarizes how I investigate when I run into this situation.

Find the OpenHands run log file from the Actions run log

The Upload output.jsonl as artifact step lists a URL for downloading the run log. Download it and examine what the agent did.

Note that the downloaded JSONL file is quite large, so format it with jq first.

% cat ~/Downloads/output.jsonl | jq . > ~/Downloads/output.json

The three fields to check are success, result_explanation, and error. In this example they contained the following.

  "success": false,
  "comment_success": null,
  "result_explanation": "The issue description asks to set up an application according to instructions in a README file, but the only action taken was running a single command to start a server on port 3002. While this command executed successfully (exit code 0), setting up an application typically requires multiple steps like:\n\n1. Repository cloning\n2. Installing dependencies\n3. Configuration setup\n4. Building the application (if required)\n5. Starting the server\n\nThe single command executed (`PORT=3002 npm start`) appears to be just the final step of starting the server, without evidence that the prerequisite setup steps were completed. Without seeing the full setup process executed according to the README instructions, we cannot consider this issue fully resolved. The successful server start alone does not indicate a complete application setup.",
  "error": "RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 50, max iteration: 50"
}

This shows that the run was aborted because the iteration count reached the configured limit of 50.
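If you would rather not scroll through the formatted file, the same three fields can be pulled out programmatically. The following is a minimal Python sketch, not part of OpenHands itself. It assumes each line of output.jsonl is one JSON object with `success`, `result_explanation`, and `error` at the top level, as in the excerpt above. The function name `summarize_result` and the shortened sample record are hypothetical.

```python
import json


def summarize_result(jsonl_text):
    """Return the success, result_explanation and error fields of each record."""
    summaries = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        summaries.append({
            "success": record.get("success"),
            "result_explanation": record.get("result_explanation"),
            "error": record.get("error"),
        })
    return summaries


# Hypothetical one-record sample shaped like the excerpt above (explanation shortened).
sample = json.dumps({
    "history": [],
    "success": False,
    "comment_success": None,
    "result_explanation": "The successful server start alone does not indicate a complete application setup.",
    "error": "RuntimeError: Agent reached maximum iteration in headless mode. "
             "Current iteration: 50, max iteration: 50",
})

for summary in summarize_result(sample):
    print(summary["success"], "|", summary["error"])
```

For a real run, you would read the downloaded file instead of `sample`, for example with `open(path, encoding="utf-8").read()`.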

Try analyzing the task with the logs

You can also look more concretely at what OpenHands was doing. This time I used jq to examine three things: the thought log, the executed commands, and the execution results.

View the OpenHands thought log

First, look back at what OpenHands was thinking. You can trace the history by looking at history[].thought.

% cat ~/Downloads/output.json | jq " .history[] | .thought" | head -n 10
null
"I'll help you set up the application according to the README.md from the specified repository. Let me first read the README content."
null
"I see that this is a simple quiz application built with React and Tailwind CSS. Let me first check if the repository is already cloned in the workspace."
null
"I see the repository is already cloned. Since this is a React application, we'll need to:\n1. Check if Node.js is installed\n2. Install dependencies\n3. Start the development server\n\nLet me first check if Node.js is installed:"
null
"Great! Node.js and npm are installed. Now let's check if there's a package.json file:"
null
"I see we need to initialize the React application. Let's create a new React application with Vite since it's a modern and fast build tool:"

You can roughly see that it read the README.md file and then tried to create a React app.
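The null entries in the output above are history items without a thought, and they make the list harder to read. As a rough sketch, the thoughts alone can be collected in Python. This assumes each history entry is an object with an optional top-level `thought` key, which is what the jq query above implies. The helper name `list_thoughts` and the sample record are hypothetical.

```python
def list_thoughts(record):
    """Return the non-empty thought strings from a record's history, in order."""
    return [
        entry["thought"]
        for entry in record.get("history", [])
        if isinstance(entry, dict) and entry.get("thought")
    ]


# Hypothetical record mirroring the alternating null/thought pattern shown above.
sample_record = {
    "history": [
        {"thought": None},
        {"thought": "Let me first read the README content."},
        {"thought": None},
        {"thought": "Let me first check if Node.js is installed:"},
    ]
}

for number, thought in enumerate(list_thoughts(sample_record), start=1):
    print(f"{number}. {thought}")
```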

List the commands

Next, let's look at which commands it ran based on those thoughts. You can check this with history[].command.

% cat ~/Downloads/output.json | jq " .history[] | .command" | head -n 10
null
null
null
"ls -la /workspace"
"ls -la /workspace"
"node --version && npm --version"
"node --version && npm --version"
"ls -la /workspace/package.json"
"ls -la /workspace/package.json"
"npm create vite@latest . -- --template react"

The same command appearing consecutively is probably where the run hit the Claude API rate limit and retried.
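To see the distinct steps without the repeats, consecutive duplicates can be collapsed and counted. This is a small Python sketch under the same assumption as the jq query above, namely that each history entry may carry a top-level `command` key. The function name `collapse_repeated_commands` is hypothetical, and the sample history simply reuses the commands listed above.

```python
from itertools import groupby


def collapse_repeated_commands(history):
    """Return (command, repeat_count) pairs for runs of identical commands.

    Entries without a command are dropped first, so repeats separated only
    by command-less entries are merged into one run as well.
    """
    commands = [
        entry["command"]
        for entry in history
        if isinstance(entry, dict) and entry.get("command")
    ]
    return [(command, len(list(run))) for command, run in groupby(commands)]


# Hypothetical history built from the command list shown above.
sample_history = [
    {"command": None},
    {"command": "ls -la /workspace"},
    {"command": "ls -la /workspace"},
    {"command": "node --version && npm --version"},
    {"command": "node --version && npm --version"},
    {"command": "npm create vite@latest . -- --template react"},
]

for command, count in collapse_repeated_commands(sample_history):
    print(f"{count}x {command}")
```

A count above 1 marks a spot worth checking for a retry. Whether the cause was really a rate limit cannot be determined from the command list alone.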

View the results of commands and other actions

Beyond the commands themselves, you can also see their results by looking at history[].content.

 % cat ~/Downloads/output.json | jq " .history[] | .content" | head -n 10
"Please fix the following issue for the repository in /workspace.\nAn environment has been set up for you to start working. You may assume all necessary tools are installed.\n\n# Problem Statement\nアプリケーションのセットアップ\n\nhttps://github.com/wpkyoto/openhands-demo-quiz-app/blob/main/README.md に記載されている内容に従って、アプリケーションをセットアップします。\n\nIssue Thread Comments:\n@openhands-agent\n\nIMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\nYou SHOULD INCLUDE PROPER INDENTATION in your edit commands.\n\nFor all changes to actual application code (e.g. in Python or Javascript), add an appropriate test to the testing directory to make sure that the issue has been fixed.\nRun the tests, and if they pass you are done!\nYou do NOT need to write new tests if there are only changes to documentation or configuration files.\n\nWhen you think you have fixed the issue through code changes, please call the finish action to end the interaction."
null
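Reading the three fields separately makes it hard to tell which result belongs to which thought. One option is to merge them into a single step-by-step timeline. The sketch below assumes, as the jq queries in this post do, that `thought`, `command`, and `content` are top-level keys of each history entry. The function name `build_timeline`, the `max_len` parameter, and the sample entries are hypothetical.

```python
def build_timeline(history, max_len=60):
    """Return one text row per non-empty thought/command/content value."""
    rows = []
    for step, entry in enumerate(history):
        if not isinstance(entry, dict):
            continue  # ignore anything that is not a history object
        for key in ("thought", "command", "content"):
            value = entry.get(key)
            if not value:
                continue
            # Collapse newlines and repeated spaces so each row stays on one line.
            text = " ".join(str(value).split())
            if len(text) > max_len:
                text = text[:max_len] + "..."
            rows.append(f"{step:03d} {key:<8} {text}")
    return rows


# Hypothetical entries loosely based on the excerpts in this post.
sample_history = [
    {"content": "Please fix the following issue for the repository in /workspace."},
    {"thought": "Let me first check if Node.js is installed:",
     "command": "node --version && npm --version"},
]

for row in build_timeline(sample_history):
    print(row)
```

Truncating long values keeps the overview scannable. Raise `max_len` when you need the full text of a particular step.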

Use the logs to try adjusting your instructions

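Being able to look back at what the agent thought and which commands it ran at each step gives the person instructing the task a chance to reflect, which I find quite helpful. That said, it requires a manual download, so until you get used to it, it may be better to start by running OpenHands in a local IDE before moving to GitHub Actions.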