Legacy25/ValkyrieDB

语言: C++

git: https://github.com/Legacy25/ValkyrieDB

基于LLVM的SQL到本机代码编译器
A SQL-to-native-code compiler based on LLVM
README.md (中文)

ValkyrieDB

基于LLVM的SQL到本机代码编译器

ValkyrieDB是一个简单的查询处理引擎,可以采用JSON格式的任何查询计划,并将其转换为LLVM汇编代码。现在,系统支持Selection,Projection和Equi-joins,以及基于numeric数据类型的表达式。

ValkyrieDB使用名为Mushroom Cloud的Java项目将SQL转换为JSON格式的查询计划。 Mushroom Cloud基于这个GitHubber的SQL-Evaluator项目。它使用JSqlParser来解析SQL,并构建关系代数树,在该树上应用优化传递并对其进行求值。蘑菇云摆脱了评估,而只是打印出送入ValkyrieDB的优化计划。

建立

$> make

这将构建MushroomCloud和Valkyrie。二进制文件将在bin /中构建。

您可以通过两种方式运行此项目。使用shell脚本触发一次查询。

$> ./ValkyrieDB.sh <path/to/sqlfile.sql>

您还可以使用交互式python shell来输入查询。

$> ./Shell.py
valkyrie> SELECT * FROM nation;

当运行python shell时,Valkyrie除了查询结果外,还会输出两条调试信息。

  1. 收到的查询计划
  2. 生成的LLVM

数据和查询

默认情况下,项目会在测试/数据中查找数据。要针对名为foo的表运行查询,请将foo的数据作为| test / data文件夹中的文件名为foo.tbl,Valkyrie会找到它。

可以在test / sql / testcases中找到一些基于支持的TPC-H模式的示例查询。

要运行这些查询,首先使用任何tpch-dbgen生成TPC-H表,然后放入数据文件夹中的.tbl文件。或者,您也可以使用test / tpch-data-setup.py <scale-factor>脚本。比例因子1将生成1 GB数据集。此脚本还用于设置SQLite数据库test / tpch.db,该测试可用于验证Valkyrie的结果,并在性能方面进行比较。

为了让Valkyrie从其他目录中查找数据,我们必须单独运行Mushroom Cloud和Valkyrie才能访问Mushroom Cloud的标志。

$> java -jar ./bin/MushroomCloud.jar --data <path/to/data> [sqlfile1.sql sqlfile2.sql ...] > [tempfile.json]
$> ./bin/llvmruntime < [tempfile.json]

要让Mushroom Cloud评估查询计划,请使用--exec标志。

示例查询运行

让我们首先启动shell。

$> ./Shell.py

现在你得到这样的提示 -

valkyrie>

现在让我们输入一个查询 -

valkyrie> select nation.name, region.name from nation, region where nation.regionkey = region.regionkey;

这会产生这个输出 -

PRINT[]
    PROJECT[nation.name, region.name]
        JOIN[nation.regionkey = region.regionkey]
            TABLE[NATION=/home/arindam/scratch/ValkyrieDB/./test/data/nation.tbl, nationkey=int, name=char, regionkey=int, comment=varchar]
            TABLE[REGION=/home/arindam/scratch/ValkyrieDB/./test/data/region.tbl, regionkey=int, name=char, comment=varchar]

; ModuleID = 'LLVM'

@0 = private unnamed_addr constant [5 x i8] c"%lu|\00"
@1 = private unnamed_addr constant [5 x i8] c"%lf|\00"
@2 = private unnamed_addr constant [4 x i8] c"%s|\00"
@3 = private unnamed_addr constant [4 x i8] c"%s|\00"
@4 = private unnamed_addr constant [2 x i8] c"\0A\00"

declare i32 @printf(i8*, ...)

define i32 @llvmStart() {
entry:
  br label %blockloader0

blockloader0:                                     ; preds = %loopBody4, %entry
  %0 = call i64 @schemaController(i64 22214032, i64 2)
  %1 = icmp ne i64 %0, 0
  br i1 %1, label %loophead2, label %afterblockloader1

afterblockloader1:                                ; preds = %blockloader0
  %2 = call i64 @schemaController(i64 22214032, i64 3)
  br label %blockloader5

loophead2:                                        ; preds = %blockloader0
  %3 = call i64 @schemaController(i64 22214032, i64 0)
  %4 = inttoptr i64 %3 to i64**
  %loopVar3 = alloca i64
  store i64 0, i64* %loopVar3
  br label %loopBody4

loopBody4:                                        ; preds = %loopBody4, %loophead2
  %5 = load i64* %loopVar3
  %6 = getelementptr inbounds i64** %4, i64 %5
  %7 = load i64** %6
  %8 = getelementptr inbounds i64* %7, i32 0
  %9 = load i64* %8
  store i64 %9, i64* inttoptr (i64 22218864 to i64*)
  %10 = getelementptr inbounds i64* %7, i32 1
  %11 = load i64* %10
  store i64 %11, i64* getelementptr (i64* inttoptr (i64 22218864 to i64*), i32 1)
  %12 = getelementptr inbounds i64* %7, i32 2
  %13 = load i64* %12
  store i64 %13, i64* getelementptr (i64* inttoptr (i64 22218864 to i64*), i32 2)
  %14 = getelementptr inbounds i64* %7, i32 0
  %15 = load i64* %14
  store i64 %15, i64* inttoptr (i64 22219792 to i64*)
  call void @hasher(i64 22280736, i64* inttoptr (i64 22219792 to i64*), i32 1, i64* inttoptr (i64 22218864 to i64*), i32 3)
  %16 = add i64 %5, i32 1
  store i64 %16, i64* %loopVar3
  %17 = icmp slt i64 %16, %0
  br i1 %17, label %loopBody4, label %blockloader0

blockloader5:                                     ; preds = %loopBody9, %afterblockloader1
  %18 = call i64 @schemaController(i64 22281360, i64 2)
  %19 = icmp ne i64 %18, 0
  br i1 %19, label %loophead7, label %afterblockloader6

afterblockloader6:                                ; preds = %blockloader5
  %20 = call i64 @schemaController(i64 22281360, i64 3)
  %21 = call i64 @schemaController(i64 22285248, i64 0)
  %22 = inttoptr i64 %21 to i64**
  %23 = call i64 @schemaController(i64 22285248, i64 1)
  %loopVar10 = alloca i64
  store i64 0, i64* %loopVar10
  br label %condCheck11

loophead7:                                        ; preds = %blockloader5
  %24 = call i64 @schemaController(i64 22281360, i64 0)
  %25 = inttoptr i64 %24 to i64**
  %loopVar8 = alloca i64
  store i64 0, i64* %loopVar8
  br label %loopBody9

loopBody9:                                        ; preds = %loopBody9, %loophead7
  %26 = load i64* %loopVar8
  %27 = getelementptr inbounds i64** %25, i64 %26
  %28 = load i64** %27
  %29 = getelementptr inbounds i64* %28, i32 0
  %30 = load i64* %29
  store i64 %30, i64* inttoptr (i64 22247328 to i64*)
  %31 = getelementptr inbounds i64* %28, i32 1
  %32 = load i64* %31
  store i64 %32, i64* getelementptr (i64* inttoptr (i64 22247328 to i64*), i32 1)
  %33 = getelementptr inbounds i64* %28, i32 2
  %34 = load i64* %33
  store i64 %34, i64* getelementptr (i64* inttoptr (i64 22247328 to i64*), i32 2)
  %35 = getelementptr inbounds i64* %28, i32 3
  %36 = load i64* %35
  store i64 %36, i64* getelementptr (i64* inttoptr (i64 22247328 to i64*), i32 3)
  %37 = getelementptr inbounds i64* %28, i32 2
  %38 = load i64* %37
  store i64 %38, i64* inttoptr (i64 22246256 to i64*)
  call void @joiner(i64 22280736, i64* inttoptr (i64 22246256 to i64*), i32 1, i64* inttoptr (i64 22247328 to i64*), i32 4)
  %39 = add i64 %26, i32 1
  store i64 %39, i64* %loopVar8
  %40 = icmp slt i64 %39, %18
  br i1 %40, label %loopBody9, label %blockloader5

condCheck11:                                      ; preds = %loopBody12, %afterblockloader6
  %41 = load i64* %loopVar10
  %42 = icmp slt i64 %41, %23
  br i1 %42, label %loopBody12, label %afterloop13

loopBody12:                                       ; preds = %condCheck11
  %43 = load i64* %loopVar10
  %44 = getelementptr inbounds i64** %22, i64 %43
  %45 = load i64** %44
  %46 = getelementptr inbounds i64* %45, i32 4
  %47 = load i64* %46
  %48 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @2, i32 0, i32 0), i64 %47)
  %49 = getelementptr inbounds i64* %45, i32 1
  %50 = load i64* %49
  %51 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @2, i32 0, i32 0), i64 %50)
  %52 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([2 x i8]* @4, i32 0, i32 0))
  %53 = add i64 %43, 1
  store i64 %53, i64* %loopVar10
  br label %condCheck11

afterloop13:                                      ; preds = %condCheck11
  %54 = call i64 @schemaController(i64 22285248, i64 3)
  ret i32 0
}

declare void @hasher(i64, i64, i32, i64, i32)

declare void @joiner(i64, i64, i32, i64, i32)

declare i64 @schemaController(i64, i32)

Compilation successful, now executing...

ALGERIA|AFRICA|
ARGENTINA|AMERICA|
BRAZIL|AMERICA|
CANADA|AMERICA|
EGYPT|MIDDLE EAST|
ETHIOPIA|AFRICA|
FRANCE|EUROPE|
GERMANY|EUROPE|
INDIA|ASIA|
INDONESIA|ASIA|
IRAN|MIDDLE EAST|
IRAQ|MIDDLE EAST|
JAPAN|ASIA|
JORDAN|MIDDLE EAST|
KENYA|AFRICA|
MOROCCO|AFRICA|
MOZAMBIQUE|AFRICA|
PERU|AMERICA|
CHINA|ASIA|
ROMANIA|EUROPE|
SAUDI ARABIA|MIDDLE EAST|
VIETNAM|ASIA|
RUSSIA|EUROPE|
UNITED KINGDOM|EUROPE|
UNITED STATES|AMERICA|

valkyrie> 

本文使用googletrans自动翻译,仅供参考, 原文来自github.com

en_README.md

ValkyrieDB

A SQL-to-native-code compiler based on LLVM

ValkyrieDB is a simple query processing engine that can take any query plan in JSON format and turn it into LLVM assembly code. Right now the system has support for Selection, Projection and Equi-joins, and expressions based on numeric datatypes.

ValkyrieDB uses a Java project called Mushroom Cloud to convert SQL to a JSON-formatted query plan. Mushroom Cloud is based on this GitHubber's SQL-Evaluator project. It uses JSqlParser to parse the SQL, and builds a relational-algebra tree, applies optimization passes on that tree and evaluates it. Mushroom Cloud gets rid of the evaluation and instead just prints out the optimized plan that is fed into ValkyrieDB.

Build

$> make

This will build both MushroomCloud and Valkyrie. The binaries will get built in bin/.

Run

You can run this project in two ways. Use a shell script to fire off one time queries.

$> ./ValkyrieDB.sh <path/to/sqlfile.sql>

You can also use an interactive python shell where you can type in your queries.

$> ./Shell.py
valkyrie> SELECT * FROM nation;

When running the python shell, Valkyrie will in additon to the results of the query, also output two pieces of debugging information.

  1. The query plan received
  2. The generated LLVM

Data and queries

The project by default looks for data in test/data. To run queries against a table called foo drop in foo's data as a | separated file in the test/data folder with the name foo.tbl and Valkyrie will find it.

Some sample queries based on the TPC-H Schemas that are supported can be found in test/sql/testcases.

To run these queries, first generate the TPC-H tables using any tpch-dbgen and drop in the .tbl files in the data folder. Alternatively, you could also use the test/tpch-data-setup.py <scale-factor> script. Scale-factor 1 will generate a 1 GB dataset. This script is also used to setup a SQLite database test/tpch.db which can be used to validate Valkyrie's results and have a point of comparison with regards to performance.

To get Valkyrie to find data from other directories, we have to run Mushroom Cloud and Valkyrie separately to have access to Mushroom Cloud's flags.

$> java -jar ./bin/MushroomCloud.jar --data <path/to/data> [sqlfile1.sql sqlfile2.sql ...] > [tempfile.json]
$> ./bin/llvmruntime < [tempfile.json]

To get Mushroom Cloud to evaluate the query plan, use the --exec flag.

Sample query run

Let's first fire up the shell.

$> ./Shell.py

Now you get a prompt like this -

valkyrie>

Now let's type in a query -

valkyrie> select nation.name, region.name from nation, region where nation.regionkey = region.regionkey;

This generates this output -

PRINT[]
    PROJECT[nation.name, region.name]
        JOIN[nation.regionkey = region.regionkey]
            TABLE[NATION=/home/arindam/scratch/ValkyrieDB/./test/data/nation.tbl, nationkey=int, name=char, regionkey=int, comment=varchar]
            TABLE[REGION=/home/arindam/scratch/ValkyrieDB/./test/data/region.tbl, regionkey=int, name=char, comment=varchar]

; ModuleID = 'LLVM'

@0 = private unnamed_addr constant [5 x i8] c"%lu|\00"
@1 = private unnamed_addr constant [5 x i8] c"%lf|\00"
@2 = private unnamed_addr constant [4 x i8] c"%s|\00"
@3 = private unnamed_addr constant [4 x i8] c"%s|\00"
@4 = private unnamed_addr constant [2 x i8] c"\0A\00"

declare i32 @printf(i8*, ...)

define i32 @llvmStart() {
entry:
  br label %blockloader0

blockloader0:                                     ; preds = %loopBody4, %entry
  %0 = call i64 @schemaController(i64 22214032, i64 2)
  %1 = icmp ne i64 %0, 0
  br i1 %1, label %loophead2, label %afterblockloader1

afterblockloader1:                                ; preds = %blockloader0
  %2 = call i64 @schemaController(i64 22214032, i64 3)
  br label %blockloader5

loophead2:                                        ; preds = %blockloader0
  %3 = call i64 @schemaController(i64 22214032, i64 0)
  %4 = inttoptr i64 %3 to i64**
  %loopVar3 = alloca i64
  store i64 0, i64* %loopVar3
  br label %loopBody4

loopBody4:                                        ; preds = %loopBody4, %loophead2
  %5 = load i64* %loopVar3
  %6 = getelementptr inbounds i64** %4, i64 %5
  %7 = load i64** %6
  %8 = getelementptr inbounds i64* %7, i32 0
  %9 = load i64* %8
  store i64 %9, i64* inttoptr (i64 22218864 to i64*)
  %10 = getelementptr inbounds i64* %7, i32 1
  %11 = load i64* %10
  store i64 %11, i64* getelementptr (i64* inttoptr (i64 22218864 to i64*), i32 1)
  %12 = getelementptr inbounds i64* %7, i32 2
  %13 = load i64* %12
  store i64 %13, i64* getelementptr (i64* inttoptr (i64 22218864 to i64*), i32 2)
  %14 = getelementptr inbounds i64* %7, i32 0
  %15 = load i64* %14
  store i64 %15, i64* inttoptr (i64 22219792 to i64*)
  call void @hasher(i64 22280736, i64* inttoptr (i64 22219792 to i64*), i32 1, i64* inttoptr (i64 22218864 to i64*), i32 3)
  %16 = add i64 %5, i32 1
  store i64 %16, i64* %loopVar3
  %17 = icmp slt i64 %16, %0
  br i1 %17, label %loopBody4, label %blockloader0

blockloader5:                                     ; preds = %loopBody9, %afterblockloader1
  %18 = call i64 @schemaController(i64 22281360, i64 2)
  %19 = icmp ne i64 %18, 0
  br i1 %19, label %loophead7, label %afterblockloader6

afterblockloader6:                                ; preds = %blockloader5
  %20 = call i64 @schemaController(i64 22281360, i64 3)
  %21 = call i64 @schemaController(i64 22285248, i64 0)
  %22 = inttoptr i64 %21 to i64**
  %23 = call i64 @schemaController(i64 22285248, i64 1)
  %loopVar10 = alloca i64
  store i64 0, i64* %loopVar10
  br label %condCheck11

loophead7:                                        ; preds = %blockloader5
  %24 = call i64 @schemaController(i64 22281360, i64 0)
  %25 = inttoptr i64 %24 to i64**
  %loopVar8 = alloca i64
  store i64 0, i64* %loopVar8
  br label %loopBody9

loopBody9:                                        ; preds = %loopBody9, %loophead7
  %26 = load i64* %loopVar8
  %27 = getelementptr inbounds i64** %25, i64 %26
  %28 = load i64** %27
  %29 = getelementptr inbounds i64* %28, i32 0
  %30 = load i64* %29
  store i64 %30, i64* inttoptr (i64 22247328 to i64*)
  %31 = getelementptr inbounds i64* %28, i32 1
  %32 = load i64* %31
  store i64 %32, i64* getelementptr (i64* inttoptr (i64 22247328 to i64*), i32 1)
  %33 = getelementptr inbounds i64* %28, i32 2
  %34 = load i64* %33
  store i64 %34, i64* getelementptr (i64* inttoptr (i64 22247328 to i64*), i32 2)
  %35 = getelementptr inbounds i64* %28, i32 3
  %36 = load i64* %35
  store i64 %36, i64* getelementptr (i64* inttoptr (i64 22247328 to i64*), i32 3)
  %37 = getelementptr inbounds i64* %28, i32 2
  %38 = load i64* %37
  store i64 %38, i64* inttoptr (i64 22246256 to i64*)
  call void @joiner(i64 22280736, i64* inttoptr (i64 22246256 to i64*), i32 1, i64* inttoptr (i64 22247328 to i64*), i32 4)
  %39 = add i64 %26, i32 1
  store i64 %39, i64* %loopVar8
  %40 = icmp slt i64 %39, %18
  br i1 %40, label %loopBody9, label %blockloader5

condCheck11:                                      ; preds = %loopBody12, %afterblockloader6
  %41 = load i64* %loopVar10
  %42 = icmp slt i64 %41, %23
  br i1 %42, label %loopBody12, label %afterloop13

loopBody12:                                       ; preds = %condCheck11
  %43 = load i64* %loopVar10
  %44 = getelementptr inbounds i64** %22, i64 %43
  %45 = load i64** %44
  %46 = getelementptr inbounds i64* %45, i32 4
  %47 = load i64* %46
  %48 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @2, i32 0, i32 0), i64 %47)
  %49 = getelementptr inbounds i64* %45, i32 1
  %50 = load i64* %49
  %51 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @2, i32 0, i32 0), i64 %50)
  %52 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([2 x i8]* @4, i32 0, i32 0))
  %53 = add i64 %43, 1
  store i64 %53, i64* %loopVar10
  br label %condCheck11

afterloop13:                                      ; preds = %condCheck11
  %54 = call i64 @schemaController(i64 22285248, i64 3)
  ret i32 0
}

declare void @hasher(i64, i64, i32, i64, i32)

declare void @joiner(i64, i64, i32, i64, i32)

declare i64 @schemaController(i64, i32)

Compilation successful, now executing...

ALGERIA|AFRICA|
ARGENTINA|AMERICA|
BRAZIL|AMERICA|
CANADA|AMERICA|
EGYPT|MIDDLE EAST|
ETHIOPIA|AFRICA|
FRANCE|EUROPE|
GERMANY|EUROPE|
INDIA|ASIA|
INDONESIA|ASIA|
IRAN|MIDDLE EAST|
IRAQ|MIDDLE EAST|
JAPAN|ASIA|
JORDAN|MIDDLE EAST|
KENYA|AFRICA|
MOROCCO|AFRICA|
MOZAMBIQUE|AFRICA|
PERU|AMERICA|
CHINA|ASIA|
ROMANIA|EUROPE|
SAUDI ARABIA|MIDDLE EAST|
VIETNAM|ASIA|
RUSSIA|EUROPE|
UNITED KINGDOM|EUROPE|
UNITED STATES|AMERICA|

valkyrie>