SQLCoder 是一个代码补全模型,在 StarCoder 上进行了微调,用于 SQL 生成任务

7b 15b

75.9K 9 个月前

自述文件

SQLCoder 是一个 150 亿参数的模型,基于 StarCoder 模型进行微调。它在 sql-eval 框架上略微优于 gpt-3.5-turbo 用于自然语言到 SQL 生成任务,并且优于流行的开源模型。它也显著优于 text-davinci-003,一个规模是其 10 倍以上的模型。

这个 150 亿参数的补全模型通常需要至少 16GB 的内存。

用法

CLI

ollama run sqlcoder 

尝试使用三引号的以下提示,以进行多行输入

注意:将 {question} 更改为您想回答的 SQL 问题。例如)“哪些产品产生了最多的销售额”。

""" 
### Instructions:
Your task is to convert a question into a SQL query, given a Postgres database schema.
Adhere to these rules:
- **Deliberately go through the question and database schema word by word** to appropriately answer the question
- **Use Table Aliases** to prevent ambiguity. For example, `SELECT table1.col1, table2.col1 FROM table1 JOIN table2 ON table1.id = table2.id`.
- When creating a ratio, always cast the numerator as float

### Input:
Generate a SQL query that answers the question `{question}`.
This query will run on a database whose schema is represented in this string:
CREATE TABLE products (
  product_id INTEGER PRIMARY KEY, -- Unique ID for each product
  name VARCHAR(50), -- Name of the product
  price DECIMAL(10,2), -- Price of each unit of the product
  quantity INTEGER  -- Current quantity in stock
);

CREATE TABLE customers (
   customer_id INTEGER PRIMARY KEY, -- Unique ID for each customer
   name VARCHAR(50), -- Name of the customer
   address VARCHAR(100) -- Mailing address of the customer
);

CREATE TABLE salespeople (
  salesperson_id INTEGER PRIMARY KEY, -- Unique ID for each salesperson
  name VARCHAR(50), -- Name of the salesperson
  region VARCHAR(50) -- Geographic sales region
);

CREATE TABLE sales (
  sale_id INTEGER PRIMARY KEY, -- Unique ID for each sale
  product_id INTEGER, -- ID of product sold
  customer_id INTEGER,  -- ID of customer who made purchase
  salesperson_id INTEGER, -- ID of salesperson who made the sale
  sale_date DATE, -- Date the sale occurred
  quantity INTEGER -- Quantity of product sold
);

CREATE TABLE product_suppliers (
  supplier_id INTEGER PRIMARY KEY, -- Unique ID for each supplier
  product_id INTEGER, -- Product ID supplied
  supply_price DECIMAL(10,2) -- Unit price charged by supplier
);

-- sales.product_id can be joined with products.product_id
-- sales.customer_id can be joined with customers.customer_id
-- sales.salesperson_id can be joined with salespeople.salesperson_id
-- product_suppliers.product_id can be joined with products.product_id

### Response:
Based on your instructions, here is the SQL query I have generated to answer the question `{question}`:
```sql
"""

示例输出

SELECT customers.customer_id,
       products.product_id,
       (SUM(sales.quantity * product_suppliers.supply_price)) / (SELECT 
SUM(quantity)
                                                                FROM sales
                                                                WHERE 
sales.salesperson_id = '{chosen_salesperson}') as profit
FROM   sales
  INNER JOIN customers on sales.customer_id = customers.customer_id
  INNER JOIN products on sales.product_id = products.product_id
  INNER JOIN product_suppliers on products.product_id = 
product_suppliers.product_id
WHERE  sales.salesperson_id = '{chosen_salesperson}'
GROUP BY customers.customer_id, products.product_id
HAVING COUNT(DISTINCT sales.sale_date) >= {min_days}
  AND (SELECT SUM(quantity * product_suppliers.supply_price)
     FROM   sales
     WHERE  sales.salesperson_id = '{chosen_salesperson}'
        and sales.customer_id = customers.customer_id
        and sales.product_id = products.product_id) >= {min_total}
ORDER BY profit DESC;
```


### Reasoning

The query is broken down into two parts: the first one fetches information
about each sale made by a specific salesman (`{chosen_salesperson}`), and 
groups it by customer ID, product ID, and calculates how much money was 
earned from each sale. The second part then filters out these results that
occurred after a certain number of days (determined by the variable 
`min_days`) and had an amount of money equal to or more than a certain 
value (determined by the variable `min_total`). Finally, the query orders 
the results in descending order based on the ratio between earned money 
and total money.

The first part relies on several tables from the database: salespeople, 
products, customers, sales, and product_suppliers. The salespeople table 
contains information about each salesperson's ID; the products table 
contains information about each product's name and price; the customers 
table contains information about each customer's address; the sales table 
contains information about each sale, including the quantity sold, the ID 
of the product sold, and the ID of the customer who made the purchase; and
the product_suppliers table contains information about the unit price 
charged by each supplier.

The variables in the query can be interpreted as follows:

- `{chosen_salesperson}` is a placeholder for the name of the salesman 
whose performance is being evaluated. This value should be replaced with 
an actual salesman's name or ID number.
- `min_days` is the minimum number of days that must have passed since the
sale in order to qualify for inclusion in the calculation. For example, if
this variable were set to 30, then only sales made 30 days after the start
date would be included in the calculation. This value should also be 
replaced with an integer representing the minimum number of days.
- `min_total` is a threshold amount that must have been earned by the 
salesman for the sale to qualify for inclusion in the calculation. For 
example, if this variable were set to 1000, then only sales worth at least
$1000 would be included in the calculation. This value should also be 
replaced with an integer representing the minimum amount of money that 
must have been earned.

The output is a table containing three columns: customer ID, product ID, 
and profit per unit price (calculated as the total money earned divided by
the quantity sold). The rows in this table are sorted in descending order 
based on the ratio between earned money and total money. Rows that don't 
meet both of these conditions will be dropped from the output.

## Example

```
Input:
min_days = 30
min_total = 1000
chosen_salesperson = '<NAME>'

Output:
  customer_id | product_id |         profit
-------------+----------------------------+-----------------
  987         |     456    | 2.1             
  1029        |    1357    | 1.91            
  89          |    1234    | 0.59            
```

## License

The database is released under the [Open Database 
License](https://opendatacommons.org/licenses/odbl/1-0/).

参考

Hugging Face