To understand how this CLI Agent works, we need to peel back the layers. It's not just a script; it's a Cognitive Loop.
We are implementing the ReAct (Reasoning + Acting) pattern. Here is the blueprint:
Explanation of the blueprint above:
1. User → Input Processor
Everything begins with the user’s query.
The input processor is responsible for validating the input, cleaning it, and enriching it with system-level context such as instructions, constraints, and guardrails. This is where prompt rules and system policies are applied.
Key point:
The model never receives raw user input directly.
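To make this stage concrete, here is a minimal sketch of an input processor. The helper name `build_messages`, the `MAX_QUERY_CHARS` limit, and the `SYSTEM_RULES` text are illustrative assumptions, not part of the agent code later in this post.

```python
# Hypothetical input processor: names and limits are illustrative assumptions.
MAX_QUERY_CHARS = 2000
SYSTEM_RULES = "Follow the JSON output format. Run one step at a time."


def build_messages(raw_query: str) -> list[dict]:
    """Validate and clean the user's query, then wrap it with system context."""
    cleaned = " ".join(raw_query.split())  # collapse stray whitespace and newlines
    if not cleaned:
        raise ValueError("Query is empty")
    if len(cleaned) > MAX_QUERY_CHARS:
        raise ValueError("Query is too long")
    # The model sees the system rules first, then the cleaned query.
    return [
        {"role": "system", "content": SYSTEM_RULES},
        {"role": "user", "content": cleaned},
    ]


messages = build_messages("  list the files\n in this folder  ")
print(messages[1]["content"])  # list the files in this folder
```

The point of the sketch is the ordering: validation and enrichment happen before anything is sent to the model.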
2. Agent Controller (State)
The Agent Controller represents the system’s state and memory.
It stores:
- Conversation history
- Task goals
- Intermediate results
- Tool execution outcomes
This state enables continuity and decision-making across multiple steps.
Key point:
Without state, there is no agent — only a chatbot.
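One way to picture this state is a small container object. The sketch below is an assumption for illustration (the `AgentState` class and its field names are hypothetical); the complete listing later in this post keeps its state in a plain `message_history` list instead.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical container for what the agent remembers between steps."""
    goal: str
    history: list[dict] = field(default_factory=list)       # conversation messages
    intermediate: list[str] = field(default_factory=list)   # plans made so far
    tool_results: list[dict] = field(default_factory=list)  # tool execution outcomes

    def record_tool(self, tool: str, output: str) -> None:
        """Store the outcome of one tool call."""
        self.tool_results.append({"tool": tool, "output": output})


state = AgentState(goal="Create test.html")
state.history.append({"role": "user", "content": "Create test.html"})
state.intermediate.append("I will use the create_file tool")
state.record_tool("create_file", "File 'test.html' created successfully")
```

Because every step reads from and writes to the same object, the agent can pick up where the previous step left off.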
3. LLM Reasoning Engine
The LLM Reasoning Engine is responsible for understanding the problem and determining the next logical step.
All internal reasoning occurs within the model and remains hidden from the user. Keeping it hidden supports safety and reliability, and matches common production practice.
Key point:
Raw chain-of-thought is not exposed. Reasoning happens internally.
4. Decision Layer (PLAN)
Instead of exposing internal reasoning, the system produces a structured and explainable plan.
The plan describes:
- What the agent intends to do next
- Whether a tool is required
- Whether the task can be completed
This provides transparency without revealing internal model reasoning.
Key principle:
The plan is explainable. The reasoning is private.
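A structured plan can be as simple as a small JSON object that the system validates before acting on it. The sketch below uses only the standard library; the `Step` dataclass and `parse_step` helper are hypothetical names, and the field names mirror the step/content/tool/input format used by the agent later in this post (which validates with Pydantic instead).

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_STEPS = {"START", "PLAN", "TOOL", "OUTPUT"}


@dataclass
class Step:
    step: str
    content: Optional[str] = None
    tool: Optional[str] = None
    input: Optional[str] = None


def parse_step(raw: str) -> Step:
    """Turn the model's JSON reply into a validated Step."""
    data = json.loads(raw)
    if data.get("step") not in ALLOWED_STEPS:
        raise ValueError(f"Unknown step: {data.get('step')!r}")
    return Step(
        step=data["step"],
        content=data.get("content"),
        tool=data.get("tool"),
        input=data.get("input"),
    )


plan = parse_step('{"step": "PLAN", "content": "I will use create_file"}')
needs_tool = plan.step == "TOOL"   # is a tool required next?
is_done = plan.step == "OUTPUT"    # can the task be completed now?
```

Only the structured fields leave the model; whatever reasoning produced them stays inside it.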
5. Tool Runner (Action)
When an action is required—such as running code, calling an API, or interacting with the file system—the agent invokes a tool.
The model selects the action, but execution is handled by the system.
Key point:
The model decides; the system executes.
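The split between deciding and executing can be sketched as a lookup table of functions. The `TOOLS` registry, the `word_count` tool, and the `execute` helper below are hypothetical and exist only for illustration.

```python
from typing import Callable


def word_count(text: str) -> str:
    """A harmless example tool: count the words in a string."""
    return str(len(text.split()))


# Registry of tools the system is willing to run (hypothetical example).
TOOLS: dict[str, Callable[[str], str]] = {"word_count": word_count}


def execute(tool_name: str, tool_input: str) -> str:
    """The system, not the model, looks up and runs the requested tool."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"ERROR: Tool '{tool_name}' not available"
    try:
        return tool(tool_input)
    except Exception as exc:
        return f"ERROR executing {tool_name}: {exc}"


print(execute("word_count", "the model decides"))  # 3
print(execute("delete_everything", ""))  # ERROR: Tool 'delete_everything' not available
```

The model can only name a tool. Anything outside the registry is rejected, and failures come back as text instead of crashing the loop.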
6. Observation Layer
After a tool is executed, the system captures the result in the Observation layer.
This includes:
- Successful outputs
- Errors
- Logs and diagnostics
These observations are critical for informed decision-making in subsequent steps.
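As a rough sketch, an observation can be packaged as a message that is fed back to the model. The `make_observation` helper and its `status` field are assumptions added for illustration; the OBSERVE step with tool, input, and output fields follows the format used by the agent later in this post.

```python
import json


def make_observation(tool: str, tool_input: str, output: str) -> dict:
    """Wrap a tool result as an OBSERVE message for the model's next turn."""
    # Assumption: tools signal failure by starting their output with "error".
    status = "error" if output.upper().startswith("ERROR") else "ok"
    payload = {
        "step": "OBSERVE",
        "tool": tool,
        "input": tool_input,
        "status": status,
        "output": output,
    }
    return {"role": "developer", "content": json.dumps(payload)}


ok = make_observation("run_command", "dir", "Success: index.html")
failed = make_observation("run_command", "dri", "Error: 'dri' is not recognized")
print(json.loads(ok["content"])["status"])      # ok
print(json.loads(failed["content"])["status"])  # error
```

Capturing failures in the same shape as successes lets the model see an error and plan a different step.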
7. State Update and Loop
The observation is written back into the agent’s state.
If the task is not yet complete, the agent re-enters the decision process. This loop continues until the goal is satisfied or termination conditions are met.
Key point:
This feedback loop is what enables autonomous behavior.
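The loop and its termination conditions can be sketched without calling any API. In the example below, `fake_model` is a scripted stand-in for the LLM, `add` is a toy tool, and `max_steps` is an assumed safety limit; none of these names come from the agent code later in this post.

```python
import json


def fake_model(history: list[dict]) -> dict:
    """Stand-in for the LLM call: a scripted reply based on what it has seen."""
    seen_observation = any('"OBSERVE"' in m["content"] for m in history)
    if not seen_observation:
        return {"step": "TOOL", "tool": "add", "input": "2,3"}
    return {"step": "OUTPUT", "content": "The sum is 5"}


def add(tool_input: str) -> str:
    """Toy tool: add two comma-separated integers."""
    a, b = tool_input.split(",")
    return str(int(a) + int(b))


def run_agent(query: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": query}]
    for _ in range(max_steps):  # termination condition: step budget
        step = fake_model(history)
        history.append({"role": "assistant", "content": json.dumps(step)})
        if step["step"] == "OUTPUT":  # termination condition: goal satisfied
            return step["content"]
        if step["step"] == "TOOL":
            # Write the observation back into state before the next decision.
            observation = {"step": "OBSERVE", "output": add(step["input"])}
            history.append({"role": "developer", "content": json.dumps(observation)})
    return "Stopped: step limit reached"


print(run_agent("What is 2 + 3?"))  # The sum is 5
```

A step budget like `max_steps` is one possible termination condition; it keeps a confused model from looping forever.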
8. Final Response
Once the agent determines that the task is complete, a final response is generated and returned to the user.
This response is concise, user-facing, and free of internal system details.
An AI agent is not just a language model.
It is a structured system that combines state, reasoning, planning, actions, observation, and feedback loops to solve problems autonomously.
The Power of "Chain of Thought"
Most simple chatbots operate on System 1 thinking: fast, instinctive, and often hallucination-prone. They guess the answer immediately.
Our Agent uses System 2 thinking. By forcing the AI to output a PLAN step before acting, we trigger a "Chain of Thought."
Why is this powerful?
Suppose you ask an AI: "Write a script to calculate the 100th prime and save it to a file, then run it."
Standard AI: Might try to write the file and guess the output in one breath. It fails because it hasn't actually run the code yet.
Our CLI Agent:
PLAN: "I need to write a Python script first using create_file."
ACT: Writes primes.py.
OBSERVE: "File created successfully."
PLAN: "Now I need to run this file using run_command to get the result."
ACT: Runs python primes.py.
OUTPUT: "The 100th prime is 541."
This separation of concerns—planning vs. doing—is the secret sauce that turns a text generator into a capable engineer.
🌟 Why This Matters
By building this yourself, you gain:
- Total Control: No black-box magic from heavy libraries.
- Debuggability: You see every PLAN step and tool call the AI makes.
- Understanding: You learn that "Agents" aren't magic; they are just loops with memory.
The whole magic starts here:
main.py
from openai import OpenAI
from dotenv import load_dotenv
import os
from pydantic import BaseModel, Field
from typing import Optional
import json
load_dotenv()
client = OpenAI()
def run_command(cmd: str):
    """Execute a system command and return the output."""
    import subprocess
    try:
        result = subprocess.run(
            cmd,
            shell=True,
            capture_output=True,
            text=True,
            timeout=30,
            encoding='utf-8',
            errors='ignore'
        )
        if result.returncode == 0:
            return f"Success: {result.stdout.strip()}"
        else:
            return f"Error: {result.stderr.strip() or result.stdout.strip()}"
    except Exception as e:
        return f"ERROR: {str(e)}"
def create_file(filename: str, content: str):
    """Create a file with the given content (works on Windows)."""
    try:
        # os.makedirs('') raises, so only create the parent folder when the path has one
        parent = os.path.dirname(filename)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(content)
        return f"File '{filename}' created successfully"
    except Exception as e:
        return f"ERROR creating {filename}: {str(e)}"
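def create_file_ps(filename: str, content: str):
    """Create a file using PowerShell for Windows (best effort; prefer create_file)."""
    try:
        # Inside a PowerShell single-quoted string, a literal ' is written as ''.
        # Embedded double quotes are backslash-escaped so they survive the outer
        # -Command "..." argument. Content with newlines or shell metacharacters
        # may still break when passed through the shell.
        escaped_content = content.replace("'", "''").replace('"', '\\"')
        ps_command = f'powershell -Command "Set-Content -Path \'{filename}\' -Value \'{escaped_content}\'"'
        return run_command(ps_command)
    except Exception as e:
        return f"ERROR (PowerShell): {str(e)}"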
available_tools = {
    "run_command": run_command,
    "create_file": create_file,
    "create_file_ps": create_file_ps
}
SYSTEM_PROMPT = """You are an expert AI Assistant that resolves user queries using chain of thought.
You work in START, PLAN and OUTPUT steps.
You need to first PLAN what needs to be done. The PLAN can be multiple steps.
Once you think enough PLAN has been done, finally you can give an OUTPUT.
You can also call a tool if required from the list of available tools.
For every tool call, wait for the OBSERVE step, which is the output from the called tool.
Rules:
- Strictly follow the given JSON output format
- Only run one step at a time
- The sequence of steps is START (where the user gives an input), PLAN (which can occur multiple times) and finally OUTPUT (which is displayed to the user).
**IMPORTANT CODING RULES:**
- When providing code in OUTPUT, ALWAYS provide it in clean, readable format
- DO NOT use escape sequences like \\n or \\"
- DO NOT wrap code in triple quotes
- Provide code as it should appear in the actual file
- For multi-file projects, clearly separate each file with clear labels
**CRITICAL WINDOWS FILE CREATION RULES:**
- You are running on WINDOWS, not Linux/Mac
- DO NOT use 'echo' commands with HTML tags as they will fail on Windows
- DO NOT use 'cat' or 'heredoc' syntax as they don't work on Windows
- For creating files, ALWAYS use the 'create_file' or 'create_file_ps' tools
- When using create_file, provide filename and content as separate parameters
**TOOL USAGE FORMAT:**
For create_file: {"step":"TOOL", "tool":"create_file", "input":"{\\"filename\\": \\"path/to/file.html\\", \\"content\\": \\"<html>content</html>\\"}"}
For create_file_ps: Same format as create_file
For run_command: Use only for simple commands like 'mkdir folder' or 'dir'
Output format:
{"step":"START"| "PLAN"|"OUTPUT"| "TOOL", "content":"string", "tool":"string","input":"string"}
Available Tools:
- run_command(cmd:str): executes simple system commands (use for dir, mkdir only)
- create_file(filename:str, content:str): creates a file with content (recommended)
- create_file_ps(filename:str, content:str): creates a file using PowerShell
Example of file creation:
TOOL:{"step":"TOOL", "tool":"create_file", "input":"{\\"filename\\": \\"index.html\\", \\"content\\": \\"<!DOCTYPE html><html><body>Hello</body></html>\\"}"}
Example 1:
START:Hey can you solve 2+3*5/10
PLAN:{"step":"PLAN", "content":"User wants to solve a math expression"}
PLAN:{"step":"PLAN", "content":"Following order of operations: multiplication and division first"}
PLAN:{"step":"PLAN", "content":"3*5 = 15, then 15/10 = 1.5"}
PLAN:{"step":"PLAN", "content":"Finally 2 + 1.5 = 3.5"}
OUTPUT:{"step":"OUTPUT","content":"The answer is 3.5"}
Example 2 (Creating HTML file):
START:Create an HTML file named test.html with simple content
PLAN:{"step":"PLAN", "content":"I will use create_file tool to create the HTML file"}
TOOL:{"step":"TOOL", "tool":"create_file", "input":"{\\"filename\\": \\"test.html\\", \\"content\\": \\"<!DOCTYPE html><html><head><title>Test</title></head><body><h1>Hello World</h1></body></html>\\"}"}
PLAN:{"step":"PLAN", "content":"File should be created successfully"}
OUTPUT:{"step":"OUTPUT","content":"HTML file test.html has been created"}
"""
class MyOutputFormat(BaseModel):
    step: str = Field(..., description="The ID of the step. Example: START, PLAN, OUTPUT, TOOL")
    content: Optional[str] = Field(None, description="The optional string content for the step")
    tool: Optional[str] = Field(None, description="The ID of the tool to call")
    input: Optional[str] = Field(None, description="The input params for the tool (JSON string if needed)")

message_history = [
    {"role": "system", "content": SYSTEM_PROMPT}
]
def parse_tool_input(tool_input: str, tool_name: str):
    """Parse tool input, handling JSON or simple string formats."""
    if not tool_input:
        return None
    if tool_name in ["create_file", "create_file_ps"]:
        try:
            return json.loads(tool_input)
        except json.JSONDecodeError:
            if "," in tool_input:
                parts = tool_input.split(",", 1)
                return {"filename": parts[0].strip(), "content": parts[1].strip()}
            return None
    return tool_input
while True:
    user_query = input("🔣 ")
    if user_query.lower() in ['exit', 'quit', 'bye']:
        print("Goodbye!")
        break
    message_history.append({"role": "user", "content": user_query})
    while True:
        try:
            response = client.chat.completions.parse(
                model="gpt-4o-mini",
                response_format=MyOutputFormat,
                messages=message_history
            )
            raw_result = response.choices[0].message.content
            message_history.append({"role": "assistant", "content": raw_result})
            parsed_result = response.choices[0].message.parsed
            if parsed_result.step == "START":
                print("🔥", parsed_result.content)
                continue
            elif parsed_result.step == "TOOL":
                tool_to_call = parsed_result.tool
                tool_input = parsed_result.input
                if not tool_to_call:
                    print("❌ Error: Tool name is empty")
                    message_history.append({"role": "developer", "content": json.dumps({
                        "step": "OBSERVE",
                        "tool": None,
                        "input": tool_input,
                        "output": "ERROR: Tool name not provided"
                    })})
                    continue
                if tool_to_call not in available_tools:
                    print(f"❌ Error: Tool '{tool_to_call}' not found")
                    message_history.append({"role": "developer", "content": json.dumps({
                        "step": "OBSERVE",
                        "tool": tool_to_call,
                        "input": tool_input,
                        "output": f"ERROR: Tool '{tool_to_call}' not available"
                    })})
                    continue
                print(f"🛠️ Calling tool: {tool_to_call}")
                try:
                    parsed_input = parse_tool_input(tool_input, tool_to_call)
                    if tool_to_call in ["create_file", "create_file_ps"]:
                        if parsed_input and "filename" in parsed_input and "content" in parsed_input:
                            tool_response = available_tools[tool_to_call](
                                parsed_input["filename"],
                                parsed_input["content"]
                            )
                        else:
                            tool_response = f"ERROR: Invalid input for {tool_to_call}. Expected JSON with 'filename' and 'content'"
                    else:
                        tool_response = available_tools[tool_to_call](tool_input)
                    print(f"✅ Tool Response: {tool_response}")
                    message_history.append({"role": "developer", "content": json.dumps({
                        "step": "OBSERVE",
                        "tool": tool_to_call,
                        "input": tool_input,
                        "output": str(tool_response)
                    })})
                except Exception as e:
                    error_msg = f"ERROR executing {tool_to_call}: {str(e)}"
                    print(f"❌ {error_msg}")
                    message_history.append({"role": "developer", "content": json.dumps({
                        "step": "OBSERVE",
                        "tool": tool_to_call,
                        "input": tool_input,
                        "output": error_msg
                    })})
                continue
            elif parsed_result.step == "PLAN":
                print("🧠", parsed_result.content)
                continue
            elif parsed_result.step == "OUTPUT":
                print("🤖", parsed_result.content)
                break
        except Exception as e:
            print(f"❌ Error in main loop: {str(e)}")
            break
Building with AI feels like discovering fire. It's powerful, exciting, and sometimes a little chaotic. But remember: The best way to predict the future is to invent it.
Don't be intimidated by complex jargon. Start small. Build a loop. Give your AI a tool. Watch it succeed.
Happy Coding! ✨
