Tinkerwell 4 is out now! Get the most popular PHP Scratchpad application. Learn more

Go back to Blog

Query your Laravel database using natural language

Marcel Pociot

Query your Laravel database using natural language

In November 2022 I started experimenting a lot with OpenAI's large language models, such as DaVinci or Codex. One of those experiments was the ability to query your database using natural language.

Back then, I built a quick proof of concept and tweeted about it, but ultimately didn't spend additional time working on wrapping this as a package, app or anything similar. This was the tweet:

The reaction to this was pretty mind-blowing. It gained more than 4,000 likes on Twitter and more than 500,000 impressions!

Months later, after spending a lot of time working on What The Diff, an AI-powered code review tool, I'm coming back to this idea and I've built it into a package that is ready to install and use!

In this blog post, I'm going to explain how you can make use of the package, as well as how it works internally.

If you are only interested in the internals, and not the general usage, you can read about it here.

Installing the package

The package is called laravel-ask-database and you can install it via Composer, using:

composer require beyondcode/laravel-ask-database

Once installed, you may publish the configuration file using

php artisan vendor:publish --tag="ask-database-config"

This will create a file called config/ask-database.php with the following contents:

return [
    /**
     * The database connection name to use. Depending on your
     * use case, you might want to limit the database user
     * to have read-only access to the database.
     */
    'connection' => env('ASK_DATABASE_DB_CONNECTION', 'mysql'),

    /**
     * Strict mode will throw an exception when the query
     * would perform a write/alter operation on the database.
     *
     * If you want to allow write operations - or if you are using a read-only
     * database user - you may disable strict mode.
     */
    'strict_mode' => env('ASK_DATABASE_STRICT_MODE', true),

    /**
     * The maximum number of tables to use before performing an additional
     * table name lookup call to OpenAI.
     * If you have a lot of database tables and columns, they might not fit
     * into a single request to OpenAI. In that case, we will perform a
     * lookup call to OpenAI to get the matching table names for the
     * provided question.
     */
    'max_tables_before_performing_lookup' => env('ASK_DATABASE_MAXIMUM_TABLES', 15),
];

If you want to get started with asking your database, you need to create an account at OpenAI and provide your API key in your .env file:

OPENAI_API_KEY=sk-...

Now you're ready to go.

Asking your database

Now you're pretty much ready to go and you can ask your database questions using natural language. The package registers a macro on Laravel's DB facade, which you can use to query the database.

$answer = DB::ask('How many users do we have?');

The answer will be provided as a natural language response as well, so you won't just get 10 as a result (as an example), but a full sentence. Something like: We have a total of 10 users..

Retrieving queries from natural language

In addition to getting a sentence as a response to a question, we can also use the askForQuery macro to use natural language to build a database query instead.

For example:

$query = DB::askForQuery('How many users do we have?');

Which would return the raw query, like: SELECT COUNT(*) FROM users.

What about security?!

I'm sure some of you read this and think to yourself: Wait a minute, Marcel! This offers a completely new approach to SQL injection! Now I can just ask the database to drop all tables, delete a user, insert new data, etc.

Because of this, there are a couple security mechanisms in place that you should make use of.

Strict Mode

This package has a configurable "strict mode", which is enabled by default. This mode checks all generated queries for the occurrences of altering/destructive SQL grammar (such as DROP, UPDATE, INSERT, ALTER, etc.). If this kind of syntax can be found in the generated SQL query, the package will throw a PotentiallyUnsafeQuery exception.

If you want to disable strict mode, you can set the ASK_DATABASE_STRICT_MODE environment key to false.

Configurable database connection

The second, and preferred, way to deal with security using this package is by using a custom, read-only, database connection. The package is going to use the Laravel database connection that you have configured in the ASK_DATABASE_DB_CONNECTION environment key (it defaults to mysql).

To ensure that you do not perform any destructive or data-altering queries, it's advised to create a custom database connection in your config/database.php file that uses a read-only database user. In this case, you can also disable the strict mode, as there's no need to check for this in PHP when the database server takes care of it.

Internals

So you might be wondering how this works internally. If you read my blog post explaining AI powered error solutions, you might already know that it comes down to finding the right prompts.

If you think about it, the process of asking a question using natural language and getting the answer in natural language actually requires some kind of chaining.

The process looks something like this:

  1. Ask for a database query for a given question
  2. Actually evaluate the query locally
  3. Turn the database result into an answer to the question

Let's see how we can do this.

Creating the query

First of all, we will need to create a database query for a given question. We can start by providing some context to our prompt:

Given an input question, first create a syntactically correct {{ $dialect }} query to run, then look at the results of the query and return the answer.
Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

This already sets up the basic idea behind our prompt. One important part is still missing though, which is: What kind of tables and columns are available?

Retrieving the database tables and columns in Laravel

To retrieve the tables and its columns using Laravel, we can make use of the Doctrine schema manager that is available to our database connections.

Using the schema manager we can retrieve all database tables using the listTables method:

DB::connection($this->connection)
    ->getDoctrineSchemaManager()
    ->listTables();

This returns an array of Table objects, which also allows us to access their individual columns.

Appending the data to our prompt

We can then take the tables array and append this information to our prompt:

Only use the following tables and columns:

@foreach($tables as $table)
"{{ $table->getName() }}" has columns: {{ collect($table->getColumns())->map(fn($column) => $column->getName() . ' ('.$column->getType()->getName().')')->implode(', ') }}
@endforeach

So for each table, we will create a new row with the table name, as well as all of its columns and data types.

This is an example of what the prompt ends up looking like, using the default Laravel migrations:

Only use the following tables and columns:

"failed_jobs" has columns: id (integer), uuid (string), connection (text), queue (text), payload (text), exception (text), failed_at (datetime)
"migrations" has columns: id (integer), migration (string), batch (integer)
"password_resets" has columns: email (string), token (string), created_at (datetime)
"users" has columns: id (integer), name (string), email (string), email_verified_at (datetime), password (string), remember_token (string), created_at (datetime), updated_at (datetime)

As you can see, we provide a list of all of our tables and columns to the prompt, so that the AI model can figure out which tables and columns to use for the query.

But this also introduces an interesting edge case.

Having too many tables

If your Laravel application has a lot of database tables and columns, your prompt will start to grow. It can grow so much, that it does not fit into the token limit of the DaVinci GPT3 prompt (reminder, a "token" is approximately 3-4 characters of text).

To prevent this from happening, we can add an additional OpenAI call. In the package, you can configure the maximum number of tables before this additional call will be made using the ASK_DATABASE_MAXIMUM_TABLES environment variable.

If your database has more tables than configured, we will perform another prompt that looks like this:

Given the below input question and list of potential tables, output a comma separated list of the table names that may be necessary to answer this question.
Question: {{ $question }}
Table Names: @foreach($tables as $table){{ $table->getName() }},@endforeach

Relevant Table Names:

The AI model is then going to return a comma separated list of potential table names, which we use to filter the tables that we send to the first prompt.

Running the query

Now that we know about which tables and columns can be used to build a query, we still need to provide the question to our prompt:

Question: "{!! $question !!}"
SQLQuery: "

With this, we have provided all the information needed to get the first part of answering the question: our prompt.

The prompt stops with SQLQuery: " as that's what we want to retrieve next.

We perform an OpenAI API call with the prompt and use a newline character as the stopword.

As this gives us the actual database query, all we need to do now is evaluate it. For this, I'm using the DB facade.

DB::select($query)

The result of this call will be JSON serialized, so that we can add it to our prompt. This leads us to the next OpenAI API call, which will be used for formulating an actual answer.

Getting the answer

At this point we know the raw SQL query, we evaluated it and we know the result of the query. We add this information to our prompt, so that it looks something like this:

Given an input question, first create a syntactically correct Sqlite query to run, then look at the results of the query and return the answer.
Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

Only use the following tables and columns:

"users" has columns: id (integer), name (string), email (string), email_verified_at (datetime), password (string), remember_token (string), created_at (datetime), updated_at (datetime)

Question: "How many users do you have?"
SQLQuery: "SELECT COUNT(*) FROM users;"
SQLResult: "{"COUNT(*)":0}"
Answer: "

Once again we send this prompt to OpenAI and retrieve our final part of the puzzle, which is the actual answer to our question.

And that's it!

If you want to check out the full source code of the package, you can find it on GitHub.

Your AI-powered code review assistant

What The Diff analyzes the changes of your pull requests and gives you and your team a summary of all the changes in plain english – no need to do this yourself.

Get started Learn more
whatthediff.ai