Building a Self-Improving AI: Reflection Agents with Memory

Most AI agents today are like employees with severe amnesia. They process your request, give you an answer, and then immediately forget everything that just happened. If they make a mistake today—and you correct them—they will likely make the exact same mistake tomorrow.

This limitation keeps AI from being truly autonomous. You can't trust an agent that never learns from experience.

Today, in **Day 11** of our AI Agent Challenge, we are solving this by building a **Reflection Agent** with **Long-Term Memory**.

## The Problem: Linear Thinking

Standard RAG (Retrieval-Augmented Generation) agents think in a straight line:

> **Input** → **Search** → **Answer**

If the search is slightly off, or the logic is flawed, the agent has no way to catch itself. It blurted out the first thing that came to mind.

## Solution Part 1: The Reflection Loop (The "Internal Monologue")

To fix this, we borrow a concept from human psychology called **metacognition**—thinking about thinking. In AI terms, this is often called the "Reflexion" pattern.

Instead of answering immediately, our agent follows a three-step internal loop:

1. **Draft**: The agent generates an initial attempt at the solution.

2. **Critique**: The agent pauses and reviews its own work. It plays the role of a harsh critic, looking for security vulnerabilities, logic gaps, or hallucinations.

3. **Refine**: The agent takes its own feedback and rewrites the solution.

Think of it like a writer and an editor living in the same brain. The writer generates ideas fast, and the editor polishes them before they go public.

## Solution Part 2: Long-Term Memory (The "Notebook")

Reflection is powerful, but it has a flaw: **it helps the agent fix the *current* answer, but it doesn't help it improve for *future* answers.** Once the process ends, the insight is lost.

This is where **Mem0** comes in.

We give our agent a permanent memory layer (using a Graph Database + Vector Store). Now, when the agent critiques itself and finds a mistake (e.g., *"I shouldn't use `os.system` because it's insecure"*), it doesn't just fix the code—it **writes that lesson down**.

## The New Workflow: How It Actually Works

Here is the lifecycle of our Self-Improving Agent:

### 1. The Recall (Before Acting)

Before the agent writes a single word, it pauses. It queries Mem0: *"Have I done a task like this before? What mistakes did I make?"*

If it finds a relevant lesson (e.g., *"Always validate user input for file paths"*), it loads that context into its working memory.

### 2. The Informed Draft

The agent generates its first draft. Because it reviewed its past lessons *first*, this draft is already smarter and safer than a standard agent's attempt. It avoids the pitfalls it fell into yesterday.

### 3. The Critique (Safety Check)

Even with memory, we double-check. The agent reviews its draft.

* *Is this code secure?*

* *Does it answer the user's specific constraints?*

If it finds a new issue, it flags it.

### 4. The Refinement

The agent rewrites the answer, fixing any issues found during the critique.

### 5. The Learning Event

This is the most critical step. If the agent found a mistake during the Critique phase, it sends that specific insight back to Mem0.

* **Mistake:** "I tried to use a deprecated library."

* **Lesson:** "Use the updated v2 library for this specific function."

The next time you ask for help with that library, the agent will "remember" this lesson instantly.

## Why This Changes Everything

This architecture marks the shift from **Static Tools** to **Adaptive Teammates**.

A static tool works the same way every time. An adaptive teammate gets better the more you work with them. They learn your preferences, they remember their training, and they stop making the same mistakes. By combining **Reflection** (short-term correction) with **Mem0** (long-term persistency), we are building AI that finally grows with us.

Complete code :

import os
from dotenv import load_dotenv
from mem0 import Memory
from openai import OpenAI

# Load environment variables
load_dotenv()

# Configuration (matching your existing setup)
config = {
    "version": "v1.1",
    "embedder": {
        "provider": "openai",
        "config": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": "text-embedding-3-small"
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": "gpt-4o-mini"
        }
    },
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": os.getenv("NEO_URI"),
            "username": os.getenv("NEO_USERNAME"),
            "password": os.getenv("NEO_PASSWORD")
        }
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333,
        }
    }
}

class ReflectionAgent:
    def __init__(self):
        print("Initializing Reflection Agent with Mem0...")
        self.memory = Memory.from_config(config)
        self.client = OpenAI()
        self.user_id = "reflection_agent_v1"

    def get_memories(self, task):
        """Retrieve relevant past lessons and mistakes."""
        results = self.memory.search(task, user_id=self.user_id, limit=3)
        memories = [f"- {m['memory']}" for m in results['results']] if results and 'results' in results else []
        return "\n".join(memories) if memories else "No relevant past memories found."

    def generate_draft(self, task, context):
        """Step 1: Generate an initial draft based on task and memory."""
        print(f"\n--- Generating Draft for: '{task}' ---")
        prompt = f"""
        You are an AI assistant.
        Task: {task}
        
        Past Lessons Learned (Memory):
        {context}
        
        Generate an initial solution or response.
        """
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

    def reflect(self, task, draft):
        """Step 2: Critique the draft to find errors or areas for improvement."""
        print("\n--- Reflecting on Draft ---")
        prompt = f"""
        Review the following draft for the task: '{task}'
        
        Draft:
        {draft}
        
        Identify any specific errors, security issues, or inefficiencies. 
        Be critical. If it's perfect, say "No issues found."
        """
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

    def refine(self, task, draft, critique):
        """Step 3: Improve the draft based on the critique."""
        print("\n--- Refining Output ---")
        prompt = f"""
        Task: {task}
        
        Original Draft:
        {draft}
        
        Critique/Feedback:
        {critique}
        
        Rewrite the draft to address the critique and provide a polished final version.
        """
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

    def store_lesson(self, task, critique):
        """Step 4: Save the lesson learned to Mem0 for future use."""
        print("\n--- Storing Lesson in Memory ---")
        lesson = f"Task: {task} | Lesson: {critique}"
        self.memory.add(lesson, user_id=self.user_id)
        print("Lesson saved!")

    def run(self, task):
        # 1. Retrieve Context
        context = self.get_memories(task)
        print(f"DEBUG: Retrieved Context:\n{context}")

        # 2. Generate
        draft = self.generate_draft(task, context)
        print(f"DEBUG: Draft generated (len: {len(draft)})")

        # 3. Reflect
        critique = self.reflect(task, draft)
        print(f"DEBUG: Critique: {critique}")

        # 4. Refine
        final_version = self.refine(task, draft, critique)
        
        # 5. Learn (only if there was constructive criticism)
        if "No issues found" not in critique:
            self.store_lesson(task, critique)
        
        return final_version

if __name__ == "__main__":
    agent = ReflectionAgent()
    
    # Example Task: Writing a secure Python function
    task_query = input("Enter a task for the Reflection Agent")
    
    print(f"Running Agent on Task: {task_query}")
    final_output = agent.run(task_query)
    
    print("\n================ FINAL OUTPUT ================")
    print(final_output)
    print("==============================================")