In this section, we present how to integrate LangChain with MindsDB.

LangChain allows users to connect multiple large language models in a logical way using chains, leveraging the strengths of each model and expanding the capabilities of language processing systems. It also enables the usage of mechanisms like memories and output parsers, which can improve the robustness of LLM-based applications. By employing chains, memories, and output parsers, LangChain empowers users to create, among other things, so-called “agents” that emerge out of forcing LLMs into a structured loop to achieve reasoning. These agents can understand and interact with data, making communication with data more efficient and intuitive.

Read on to find out how to use LangChain with MinsdDB.

Setup

MindsDB provides the LangChain handler that enables you to use LangChain within MindsDB.

AI Engine

Before creating a model, it is required to create an AI engine based on the provided handler.

If you installed MindsDB locally, make sure to install all LangChain dependencies by running pip install .[langchain] or from the requirements.txt file.

You can create a LangChain engine using this command and providing at least one of OpenAI or Anthropic API keys:

CREATE ML_ENGINE langchain_engine
FROM langchain
USING
    openai_api_key = 'your-openai-api-key',
    anthropic_api_key = 'your-anthropic-api-key',
    serper_api_key = 'your-serper-api-key'; -- this is optional

The name of the engine (here, langchain_engine) should be used as a value for the engine parameter in the USING clause of the CREATE MODEL statement.

AI Model

The CREATE MODEL statement is used to create, train, and deploy models within MindsDB.

CREATE MODEL model_name
PREDICT column_to_be_predicted
USING
    engine = 'langchain_engine',
    model_name = 'anthropic-or-openai-model',
    -- openai_api_key = 'your-openai-api-key', -- if not provided in the ML_ENGINE
    -- anthropic_api_key = 'your-anthropic-api-key', -- if not provided in the ML_ENGINE
    -- serper_api_key = 'if-serper-dev-search-is-used', -- if not provided in the ML_ENGINE
    prompt_template = 'message to the model that may include some {{input}} columns as variables',
    ...;

The available models include OpenAI models, like gpt-3 or gpt-4, and Anthropic models, like claude-1 or claude-2. Please note that you need to provide an API key for the model you choose.

Currently, this integration supports exposing OpenAI and Anthrophic LLMs with normal text completion support. They are then wrapped in a zero shot react description agent that offers a few third party tools out of the box, with support for additional ones (like Serper) if an API key is provided. Ongoing memory is also provided.

You can provide other parameters specific for OpenAI and Anthropic models, such as temperature or max_tokens.

Agents and Tools are some of the main abstractions that LangChain offers. You can read more about them in the LangChain documentation ([1], [2]).

There are three different tools utilized by this agent:

  • MindsDB is the internal MindsDB executor.
  • Metadata fetches the metadata information for the available tables.
  • Write is able to write agent responses into a MindsDB data source.

Each tool exposes the internal MindsDB executor in a different way to perform its tasks, effectively enabling the agent model to read from (and potentially write to) data sources or models available in the active MindsDB project.

Examples

Let’s create the model that will be used by the following use cases.

CREATE MODEL tool_based_agent
PREDICT completion
USING
    engine = 'langchain',
    prompt_template = 'Answer the users input in a helpful way: {{input}}';

Here, we create the tool_based_agent model using the LangChain engine, as defined in the engine parameter. This model answers users’ questions in a helpful way, as defined in the prompt_template parameter, which specifies input as the input column when calling the model.

Describing Connected Data Sources

We can ask questions about data sources connected to MindsDB.

SELECT input, completion
FROM tool_based_agent
WHERE input = 'Could you describe the `mysql_demo_db.house_sales` table please?'
USING
    verbose = True,
    tools = [],
    max_iterations = 10;

On execution, we get:

+------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| input                                                            | completion                                                                                                                                                  |
+------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Could you describe the `mysql_demo_db.house_sales` table please? | The `mysql_demo_db.house_sales` table has four columns: `saledate` (text), `house_price_moving_average` (integer), `type` (text), and `bedrooms` (integer). |
+------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+

To get information about the mysql_demo_db.house_sales table, the agent uses the Metadata tool. Then the agent prepares the response.

The Write tool only comes into play when writing to other data sources. By default, the agent is able to write an answer back to the user (through standard output), like in this case.

Analyzing Data

We can ask questions to analyze the available data.

SELECT input, completion
FROM tool_based_agent
WHERE input = 'I want to know the average number of beds in the downtown neighborhood as per the `mysql_demo_db.home_rentals` table'
USING
    verbose = True,
    tools = [],
    max_iterations = 10;

On execution, we get:

+-----------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| input                                                                                                                 | completion                                                                                                     |
+-----------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| I want to know the average number of beds in the downtown neighborhood as per the `mysql_demo_db.home_rentals` table | The average number of beds in the downtown neighborhood as per the `mysql_demo_db.home_rentals` table is 1.6. |
+-----------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+

Here, the model uses the Metadata tool again to fetch the column information. As there is no beds column in the mysql_demo_db.home_rentals table, it uses the number_of_rooms column and writes the following query:

SELECT AVG(number_of_rooms)
FROM mysql_demo_db.home_rentals
WHERE neighborhood = 'downtown';

This query returns the value of 1.6, which is then used to write an answer.

Retrieving Data

We can ask the model to retrieve specific data.

SELECT input, completion
FROM tool_based_agent
WHERE input = 'There is a property in the south_side neighborhood with an initial price of 2543 the `mysql_demo_db.home_rentals` table. What are some other details of this listing?'
USING
    verbose = True,
    tools = [],
    max_iterations = 10;

On execution, we get:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| input                                                                                                                                                                 | completion                                                                                                                                                                              |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| There is a property in the south_side neighborhood with an initial price of 2543 the `mysql_demo_db.home_rentals` table. What are some other details of this listing? | The property in the south_side neighborhood with an initial price of 2543 has 1 bedroom, 1 bathroom, is 630 sqft, has been on the market for 11 days, and has a rental price of 2543.0. |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Here, the model uses the Metadata tool again to fetch information about the table. Then, it creates and executes the following query:

SELECT *
FROM mysql_demo_db.home_rentals
WHERE neighborhood = 'south_side'
AND initial_price = 2543;

On execution, the model gets this output:

+---------------+-------------------+----+--------+--------------+-------------+------------+------------+
|number_of_rooms|number_of_bathrooms|sqft|location|days_on_market|initial_price|neighborhood|rental_price|
+---------------+-------------------+----+--------+--------------+-------------+------------+------------+
|1              |1                  |630 |great   |11            |2543         |south_side  |2543        |
+---------------+-------------------+----+--------+--------------+-------------+------------+------------+

Consequently, it takes the query output and writes an answer.

Inserting Data

We can ask the model to insert data into a table, assuming we have sufficient privileges on that database.

SELECT input, completion
FROM tool_based_agent
WHERE input = 'I want to insert a new record into the `local_database.home_rentals` table. Its details are: number_of_rooms=4, number_of_bathrooms=2, sqft=950'
USING
    verbose = True,
    tools = [],
    max_iterations = 10;

On execution, we get:

+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| input                                                                                                                                                    | completion                                                                                                                                                   |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| I want to insert a new record into the `local_database.home_rentals_inserted` table. Its details are: number_of_rooms=4, number_of_bathrooms=2, sqft=950 | The record has been successfully inserted into the `local_database.home_rentals` table with the details: number_of_rooms=4, number_of_bathrooms=2, sqft=950. |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+

The agent uses the Write tool to INSERT INTO the local_database database.