# Memory

## Introduction

Canso Memory is a memory abstraction within the Canso AI Agentic System that lets AI agents store, retrieve, update, and delete information in vector databases. It serves as long-term memory for AI agents, allowing them to maintain context across conversations and tasks. This guide will help you set up and start using Canso Memory with your Canso AI agents.

CansoMemory provides:

* A standardized interface for interacting with vector databases
* Automatic embedding generation for text content
* Support for multiple embedding types and vector database backends
* Collection management for organizing different types of memory
* Seamless integration with Canso AI agents

## Key Features

* Persistent Memory Storage: Store information that persists across agent restarts and sessions
* Semantic Search: Retrieve information based on semantic similarity rather than exact matches
* Flexible Data Organization: Organize information into collections for different use cases
* Simple API: Store and retrieve memory with just a few lines of code
* Integration with AI Agents: Seamlessly incorporate memory capabilities into your Canso AI agents

## Supported Technologies

| Component        | Currently Supported |
| ---------------- | ------------------- |
| Vector Databases | Milvus              |
| Embedding Models | OpenAI Embeddings   |

## What Is a Collection in Memory?

A collection in Milvus DB is similar to a table in traditional databases. It's a logical grouping of data entities that serves as the basic unit for data management. Collections help organize and store related information in a structured way that allows for efficient vector similarity search, which is essential for AI Agent memory.

## Why Do We Need Collections?

For now, these collections are used for Text to SQL: they help the agent convert natural language questions into SQL queries by leveraging structured knowledge. Collections allow the AI agent to:

1. **Store and organize different types of information** - Each collection is designed to hold specific types of data relevant to the agent's operations.
2. **Perform semantic search** - By storing vector embeddings alongside text data, the agent can find information based on meaning, not just keywords.
3. **Maintain context awareness** - Collections help the agent understand the database structure, domain knowledge, and query patterns needed to generate accurate responses.
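
To make the idea of semantic search concrete, here is a minimal sketch of nearest-neighbour lookup by L2 distance, the metric used by the collection indexes described later in this page. The three-dimensional vectors and the `search` helper are illustrative assumptions; in practice the embeddings come from the embedding model and the search runs inside the vector database.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, entries, top_k=1):
    """Return the top_k (text, distance) pairs closest to the query vector."""
    scored = [(text, l2_distance(query, vec)) for text, vec in entries]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real ones have far more dimensions.
entries = [
    ("Table: customers", [0.9, 0.1, 0.0]),
    ("Table: orders", [0.1, 0.9, 0.0]),
    ("Premium customers receive 10% discount", [0.7, 0.2, 0.4]),
]

query_vector = [0.8, 0.2, 0.1]  # stands in for the embedded user question
results = search(query_vector, entries, top_k=2)
for text, distance in results:
    print(f"{distance:.3f}  {text}")
```

The closest entries are returned first, regardless of whether they share any keywords with the question.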

## Supported Collections

Below is a summary of the collections we currently support, with detailed field specifications for each:

### 1. Table Metadata Collection

**Purpose**: Stores information about your database structure to help the agent understand the schema and generate accurate SQL queries.

**Collection Name**: `canso_table_metadata`

| Field Name          | Data Type      | Notes                                                                                                                                                                |
| ------------------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `table_name`        | `VARCHAR`      | Primary key, max length 200 characters, e.g., `"customers"`                                                                                                          |
| `schema`            | `VARCHAR`      | JSON representation of table schema, max length 65535 characters, e.g., `{"columns": [{"name": "customer_id", "type": "INT"}, {"name": "name", "type": "VARCHAR"}]}` |
| `schema_embeddings` | `FLOAT_VECTOR` | Vector embeddings of `table_name` and schema information; dimension depends on embedding model, e.g., `[0.1, 0.2, ..., 0.5]`                                         |

**Index**: `IVF_FLAT` with `L2` metric type on the `schema_embeddings` field

To learn more about indexes in Milvus, see the [Milvus index documentation](https://milvus.io/docs/index.md?tab=floating).
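
As an illustration of the field constraints above, the following sketch assembles the text fields of one `canso_table_metadata` record and checks them against the documented limits. The `build_table_metadata` helper is hypothetical and not part of the Canso API; the embedding field is left out because Canso Memory generates embeddings automatically.

```python
import json

MAX_TABLE_NAME_LEN = 200  # limit on `table_name` from the table above
MAX_SCHEMA_LEN = 65535    # limit on `schema` from the table above


def build_table_metadata(table_name, columns):
    """Assemble the text fields of one table-metadata record.

    Note: len() counts characters; if your data contains non-ASCII text,
    a byte-length check may be the safer assumption.
    """
    schema_json = json.dumps({"columns": columns})
    if len(table_name) > MAX_TABLE_NAME_LEN:
        raise ValueError("table_name exceeds 200 characters")
    if len(schema_json) > MAX_SCHEMA_LEN:
        raise ValueError("schema exceeds 65535 characters")
    return {"table_name": table_name, "schema": schema_json}


record = build_table_metadata(
    "customers",
    [{"name": "customer_id", "type": "INT"}, {"name": "name", "type": "VARCHAR"}],
)
print(record["schema"])
```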

### 2. Domain Knowledge Collection

**Purpose**: Contains context-specific information about your business domain to enable the agent to understand domain-specific concepts and translate them into SQL.

**Collection Name**: `canso_domain_knowledge`

| Field Name    | Data Type      | Notes                                                                                                                                                          |
| ------------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `fact`        | `VARCHAR`      | Primary key, max length 200 characters, e.g., `"Premium customers receive 10% discount"`                                                                       |
| `explanation` | `VARCHAR`      | Detailed explanation of the domain fact, max length 65535 characters, e.g., `"Our loyalty program offers a 10% discount to all customers with Premium status"` |
| `logic`       | `VARCHAR`      | Business logic related to the fact, max length 65535 characters, e.g., `"IF customer.status = 'Premium' THEN apply_discount(0.1)"`                             |
| `embeddings`  | `FLOAT_VECTOR` | Vector embeddings of the domain knowledge (fact, explanation, logic); dimension depends on embedding model, e.g., `[0.4, 0.1, 0.8, ..., 0.3]`                  |

**Index**: `IVF_FLAT` with `L2` metric type on the `embeddings` field
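
The sketch below models one domain knowledge entry and shows one plausible way to combine its three text fields into the single string that gets embedded. The `DomainFact` class and the newline-joined format are assumptions for illustration; the actual text Canso embeds may be composed differently.

```python
from dataclasses import dataclass


@dataclass
class DomainFact:
    fact: str         # primary key, up to 200 characters
    explanation: str  # up to 65535 characters
    logic: str        # up to 65535 characters

    def embedding_text(self) -> str:
        """Join the three text fields into one string (assumed format)."""
        return "\n".join([self.fact, self.explanation, self.logic])


entry = DomainFact(
    fact="Premium customers receive 10% discount",
    explanation="Our loyalty program offers a 10% discount to all customers with Premium status",
    logic="IF customer.status = 'Premium' THEN apply_discount(0.1)",
)
print(entry.embedding_text())
```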

### 3. Example Queries Collection

**Purpose**: Stores successful query patterns and examples to help the agent learn from past interactions and improve future query generation.

**Collection Name**: `canso_examples`

| Field Name    | Data Type      | Notes                                                                                                                                            |
| ------------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name`        | `VARCHAR`      | Primary key, max length 200 characters, e.g., `"monthly_sales_report"`                                                                           |
| `description` | `VARCHAR`      | Description of the example query, max length 65535 characters, e.g., `"Query to generate monthly sales report by product category"`              |
| `content`     | `VARCHAR`      | Actual query content, max length 65535 characters, e.g., `"SELECT category, SUM(amount) FROM sales GROUP BY category ORDER BY SUM(amount) DESC"` |
| `embeddings`  | `FLOAT_VECTOR` | Vector embeddings of example queries, dimension depends on embedding model, e.g., `[0.7, 0.2, 0.1, ..., 0.6]`                                    |

**Index**: `IVF_FLAT` with `L2` metric type on the `embeddings` field

### 4. Column Metadata Collection

**Purpose**: Stores metadata for the columns in your database, including the possible values of low-cardinality columns. This helps the agent generate more accurate queries when exact values must appear in them. The collection also has an optional `metadata` field for additional information such as aliases or synonyms for column values.

**Collection Name**: `canso_column_metadata`

| Field Name                   | Data Type        | Notes                                                                                                                                                              |
| ---------------------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `table_column_composite_key` | `VARCHAR`        | Primary key; a composite of `table_name` and `column_name` for identification; max length 400 characters; e.g., `"alerts\|severity"`                               |
| `table_name`                 | `VARCHAR`        | Name of the table; max length 200 characters; e.g., `"alerts"`                                                                                                     |
| `column_name`                | `VARCHAR`        | Name of the column; max length 200 characters; e.g., `"severity"`                                                                                                  |
| `candidate_values`           | `ARRAY<VARCHAR>` | Array of candidate values; each element is `VARCHAR`, `max_length=100`, `max_capacity=2048`; e.g., `["high", "low", "medium"]`                                     |
| `metadata`                   | `JSON`           | Stores additional metadata in JSON format. Since this is a JSON field, it can be used as a catchall for any additional info; e.g., `{"aliases": ["acceptable", "mid"]}` |
| `embeddings`                 | `FLOAT_VECTOR`   | Vector embeddings of a string concatenation of `table_name` and `column_name`; dimension depends on embedding model; e.g., `[0.7, 0.2, 0.1, ..., 0.6]`             |

**Index**: `IVF_FLAT` with `L2` metric type on the `embeddings` field
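
The following sketch builds one column metadata record, deriving the composite key from the table and column names and checking `candidate_values` against the limits in the table above. The `build_column_metadata` helper is hypothetical; the `|` separator is inferred from the `"alerts|severity"` example rather than from a documented rule.

```python
KEY_SEPARATOR = "|"  # inferred from the "alerts|severity" example
MAX_VALUE_LEN = 100  # per-element limit on candidate_values
MAX_VALUES = 2048    # capacity limit on candidate_values


def build_column_metadata(table_name, column_name, candidate_values, metadata=None):
    """Assemble the non-vector fields of one column-metadata record."""
    if len(candidate_values) > MAX_VALUES:
        raise ValueError("candidate_values holds more than 2048 elements")
    too_long = [v for v in candidate_values if len(v) > MAX_VALUE_LEN]
    if too_long:
        raise ValueError(f"candidate values longer than 100 characters: {too_long}")
    return {
        "table_column_composite_key": f"{table_name}{KEY_SEPARATOR}{column_name}",
        "table_name": table_name,
        "column_name": column_name,
        "candidate_values": list(candidate_values),
        "metadata": metadata or {},  # optional catchall field
    }


record = build_column_metadata(
    "alerts",
    "severity",
    ["high", "low", "medium"],
    metadata={"aliases": ["acceptable", "mid"]},
)
print(record["table_column_composite_key"])
```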

## Collection Limitations and Some Notes

1. **Field Length**: The maximum length of any VARCHAR field in Milvus is `65535` characters. Similarly, an ARRAY field in Milvus can hold at most `2048` elements.
2. **Primary Keys**: Each collection has a designated primary key field for unique identification.
3. **Vector Embeddings**: Each collection contains at least one vector field that stores the semantic representation of text data.
4. **Index Types**: Collections use the `IVF_FLAT` index type, which balances search speed and recall rate.
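
To show where the speed and recall trade-off of an inverted-file index comes from, here is a toy sketch of the general IVF idea: vectors are grouped into buckets by nearest centroid, and a query scans only the `nprobe` closest buckets instead of every vector. This is a simplified illustration, not Milvus's implementation; the centroids are fixed by hand here, whereas a real index learns them by clustering.

```python
import math


def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))
        buckets[nearest].append(vec_id)
    return buckets


def ivf_search(query, vectors, centroids, buckets, nprobe=1, top_k=1):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [vec_id for i in order[:nprobe] for vec_id in buckets[i]]
    candidates.sort(key=lambda vec_id: math.dist(query, vectors[vec_id]))
    return candidates[:top_k]


vectors = {
    "a": [0.0, 0.1],
    "b": [0.1, 0.0],
    "c": [1.0, 0.9],
    "d": [0.9, 1.0],
}
centroids = [[0.0, 0.0], [1.0, 1.0]]  # hand-picked for the example
buckets = build_ivf(vectors, centroids)
print(ivf_search([0.8, 0.9], vectors, centroids, buckets, nprobe=1, top_k=2))
```

A larger `nprobe` scans more buckets, which raises recall at the cost of speed; probing every bucket is equivalent to an exhaustive search.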

## Memory Tool Integration

CansoMemory integrates with various tools including:

* Memory Retrieval Tool: Allows agents to search and retrieve relevant memory during conversations
* Text to SQL Tool: Uses memory to generate user-specific SQL queries based on schema information and examples stored in memory

## Prerequisites

* Access to a supported vector database (currently Milvus)
* Appropriate credentials for embedding generation (OpenAI API key)

## Setting Up the Vector Database

{% hint style="warning" %}
Before using CansoMemory, you need to deploy a vector database to store the embeddings. Canso currently supports Milvus as its vector database backend. This can also be an external vector database.
{% endhint %}

1. Configure the Vector Database

Create a configuration file (e.g., `config.yaml`) with the following content:

```yaml
vector_db:
  type: milvus
  name: canso-prod-vdb-4-feb-v2
  size: 4Gi
  image_pull_secret: docker-secret-cred-agents
```

2. Deploy the Vector Database

Use the Canso CLI to deploy the vector database to your cluster:

```bash
gru component setup --cluster-name <name_of_your_cluster> --config-file config.yaml
```

This command will provision a Milvus instance in your cluster with the specified configuration.

## Basic Setup

### 1. Initialize a Memory Instance

```python
from gru.agents import CansoMemory
from langchain_openai import ChatOpenAI

# Initialize the model
model = ChatOpenAI(model="gpt-4o", temperature=0)

# Create a memory instance
memory = CansoMemory(client=model.client)
```

### 2. Connect Memory to an Agent

```python
from gru.agents import CansoLanggraphAgent

# ... agent setup code ...

# Connect memory to the agent
canso_agent = CansoLanggraphAgent(
    stateGraph=graph,
    memory=memory,
    interrupt_before=["sensitive_tools"]
)

# Run the agent
canso_agent.run()
```

### Deploying the Agent

Register and deploy the agent by following the [Deployment Documentation](https://docs.canso.ai/getting-started#register-and-deploy-agent).

## Using the CLI for Memory Management

CansoMemory can also be managed using the Canso CLI:

### Storing Memory

```bash
gru agent memory insert --agent-name my-agent --file data_file.json
```

Where `data_file.json` contains:

```json
{
  "collection": "sql_schemas",
  "data": {"text": "Table: customers, Description: Stores customer data", "type": "schema"},
  "tags": ["SQL", "database"]
}
```
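
If you prefer to generate the payload rather than write it by hand, a small script along these lines can help. It reproduces the structure of the example above; the `render_payload` helper is hypothetical, and the set of keys is taken from that example rather than from a published schema.

```python
import json


def render_payload(collection, text, entry_type, tags):
    """Return the insert payload as a JSON string in the shape shown above."""
    if not collection or not text:
        raise ValueError("collection and text must be non-empty")
    payload = {
        "collection": collection,
        "data": {"text": text, "type": entry_type},
        "tags": list(tags),
    }
    return json.dumps(payload, indent=2)


rendered = render_payload(
    collection="sql_schemas",
    text="Table: customers, Description: Stores customer data",
    entry_type="schema",
    tags=["SQL", "database"],
)
print(rendered)
```

Redirecting the output (for example `python make_payload.py > data_file.json`, where the script name is arbitrary) produces a file you can pass to `--file`.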

### Updating Memory

```bash
gru agent memory update --agent-name my-agent --file memory_file.json
```

### Deleting Memory

```bash
gru agent memory delete --agent-name my-agent --expr <delete-expr> --collection <collection-name>
```
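
The format of `<delete-expr>` is not specified here. Assuming it is passed through as a Milvus boolean filter expression (an assumption, since the backend is Milvus), a helper like the hypothetical `in_expr` below can build a correctly quoted `in` filter from a list of primary-key values.

```python
import json


def in_expr(field, values):
    """Build a Milvus-style `field in [...]` filter for string values."""
    if not values:
        raise ValueError("at least one value is required")
    # json.dumps adds double quotes and escapes embedded quotes.
    quoted = ", ".join(json.dumps(v) for v in values)
    return f"{field} in [{quoted}]"


expr = in_expr("table_name", ["customers", "orders"])
print(expr)  # table_name in ["customers", "orders"]
```

Remember to quote the whole expression in your shell when passing it to `--expr`.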

### Conversing with a Memory-Enhanced Agent

```bash
gru agent converse --agent-name my-agent
```

## Detailed Documentation

For more detailed information about CansoMemory, please refer to:

* [Memory API Reference](https://docs.canso.ai/ai-agents/api-summary/memory-api)
* [Examples](https://docs.canso.ai/ai-agents/use-cases/examples)
