This repository was archived by the owner on Nov 19, 2023. It is now read-only.

PRQL/prql-query

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 140 Commits
.github/workflows		.github/workflows
examples		examples
src		src
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md
TODO.md		TODO.md

Repository files navigation

prql-query (pq)

Query and transform data with PRQL

PRQL is a modern language for transforming data -- a simple, powerful, pipelined SQL replacement

pq allows you to use PRQL to easily query and transform your data. It is powered by Apache Arrow DataFusion and DuckDB and is written in Rust (so it's "blazingly fast" (tm))!

Licensed under Apache or MIT.

Examples

$ pq --from albums.csv "take 5" +----------+---------------------------------------+-----------+ | album_id | title | artist_id | +----------+---------------------------------------+-----------+ | 1 | For Those About To Rock We Salute You | 1 | | 2 | Balls to the Wall | 2 | | 3 | Restless and Wild | 2 | | 4 | Let There Be Rock | 1 | | 5 | Big Ones | 3 | +----------+---------------------------------------+-----------+ $ pq -f i=invoices.csv -f c=customers.csv --to invoices_with_names.parquet \ 'from i | join c [customer_id] | derive [name = f"{first_name} {last_name}"]' $ pq -f invoices_with_names.parquet --format json \ 'group name (aggregate [spend = sum total]) | sort [-spend] | take 10' {"name":"Helena Holy","spend":49.620000000000005} {"name":"Richard Cunningham","spend":47.620000000000005} {"name":"Luis Rojas","spend":46.62} {"name":"Hugh O'Reilly","spend":45.62} {"name":"Ladislav Kovacs","spend":45.62} {"name":"Julia Barnett","spend":43.620000000000005} {"name":"Fynn Zimmermann","spend":43.62} {"name":"Frank Ralston","spend":43.62} {"name":"Astrid Gruber","spend":42.62} {"name":"Victor Stevens","spend":42.62}

Installation

Download a binary from Github Releases

Binaries are built for Windows, macOS and Linux for every release and can be dowloaded from Releases (latest).

For example on linux you could download and install pq with:

VERSION=v0.0.14 wget https://github.com/prql/prql-query/releases/download/$VERSIO N/pq-x86_64-unknown-linux-gnu.tar.gz && \ tar xvzf pq-x86_64-unknown-linux-gnu.tar.gz --directory ~/.local/bin && \ rm pq-x86_64-unknown-linux-gnu.tar.gz

Run as a container image (Docker)

docker pull ghcr.io/prql/prql-query alias pq="docker run --rm -it -v $(pwd):/data -e HOME=/tmp -u $(id -u):$(id -g) ghcr.io/prql/prql-query" pq --help

Please note that if you want to build the container image yourself with Docker then you will need at least 10 GB of memory available to the Docker VM, otherwise libduckdb-sys will fail to compile.

Via Homebrew

brew tap prql/homebrew-prql-query brew install prql-query

Via Rust toolchain (Cargo)

cargo install prql-query

Usage

Generating SQL

At its simplest pq takes PRQL queries and transpiles them to SQL queries:

$ pq "from a | select b" SELECT b FROM a

Input can also come from stdin:

$ cat examples/queries/invoice_totals.prql | pq

For convenience, queries ending in ".prql" are assumed to be paths to PRQL query files and will be read in so this produces the same as above:

$ pq examples/queries/invoice_totals.prql

Both of these produce the output:

SELECT STRFTIME('%Y-%m', i.invoice_date) AS month, STRFTIME('%Y-%m-%d', i.invoice_date) AS day, COUNT(DISTINCT i.invoice_id) AS num_orders, SUM(ii.quantity) AS num_tracks, SUM(ii.unit_price * ii.quantity) AS total_price, SUM(SUM(ii.quantity)) OVER ( PARTITION BY STRFTIME('%Y-%m', i.invoice_date) ORDER BY STRFTIME('%Y-%m-%d', i.invoice_date) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total_num_tracks, LAG(SUM(ii.quantity), 7) OVER ( ORDER BY STRFTIME('%Y-%m-%d', i.invoice_date) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS num_tracks_last_week FROM invoices AS i JOIN invoice_items AS ii USING(invoice_id) GROUP BY STRFTIME('%Y-%m', i.invoice_date), STRFTIME('%Y-%m-%d', i.invoice_date) ORDER BY day

Querying data from a database (using CLI clients)

With the functionality described above, you should be able to query your favourite SQL RDBMS using your favourite CLI client and pq. For example with the psql client for PostgreSQL:

$ pq "from my_table | take 5" | psql postgresql://username:password@host:port/database

Or using the mysql client for MySQL with a PRQL query stored in a file:

$ pq my_query.prql | mysql -h myhost -d mydb -u myuser -p mypassword

Similarly for MS SQL Server and other databases.

Querying data in files (csv, parquet, json)

For querying and transforming data stored on the local filesystem, pq comes in with a number of built-in backend query processing engines. The default backend is Apache Arrow DataFusion. However DuckDB and SQLite (planned) are also supported.

When --from arguments are supplied which specify data files, the PRQL query will be applied to those files. The files can be referenced in the queries by the filenames without the extensions, e.g. customers.csv can be referenced as the table customers. For convenience, unless a query already begins with a from ... step, a from pipeline step will automatically be inserted at the beginning of the query referring to the last --from argument encountered, i.e. the following two are equivalent: $ pq --from examples/data/chinook/csv/invoices.csv "from invoices|take 5" $ pq --from examples/data/chinook/csv/invoices.csv "take 5" +------------+-------------+-------------------------------+-------------------------+--------------+---------------+-----------------+---------------------+-------+ | invoice_id | customer_id | invoice_date | billing_address | billing_city | billing_state | billing_country | billing_postal_code | total | +------------+-------------+-------------------------------+-------------------------+--------------+---------------+-----------------+---------------------+-------+ | 1 | 2 | 2009-01-01T00:00:00.000000000 | Theodor-Heuss-Strasse 34 | Stuttgart | | Germany | 70174 | 1.98 | | 2 | 4 | 2009-01-02T00:00:00.000000000 | Ullevalsveien 14 | Oslo | | Norway | 0171 | 3.96 | | 3 | 8 | 2009-01-03T00:00:00.000000000 | Gretrystraat 63 | Brussels | | Belgium | 1000 | 5.94 | | 4 | 14 | 2009-01-06T00:00:00.000000000 | 8210 111 ST NW | Edmonton | AB | Canada | T6G 2C7 | 8.91 | | 5 | 23 | 2009-01-11T00:00:00.000000000 | 69 Salem Street | Boston | MA | USA | 2113 | 13.86 | +------------+-------------+-------------------------------+-------------------------+--------------+---------------+-----------------+---------------------+-------+ You can also assign an alias for source file with the following form --from = and then refer to it by that alias in your queries. So the following is another equivalent form of the queries above: $ pq --from i=examples/data/chinook/csv/invoices.csv "from i|take 5" This works with multiple files which means that the extended example above can be run as follows: $ pq -b duckdb -f examples/data/chinook/csv/invoices.csv -f examples/data/chinook/csv/invoice_items.csv examples/queries/invoice_totals.prql Transforming data with pq and writing the output to files When a --to argument is supplied, the output will be written there in the appropriate file format instead of stdout (the "" query is equivalent to select * and is required because select * currently does not work): $ pq --from examples/data/chinook/csv/invoices.csv --to invoices.parquet "" Currently csv, parquet and json file formats are supported for both readers and writers: $ cat examples/queries/customer_totals.prql group [customer_id] ( aggregate [ customer_total = sum total, ]) $ pq -f invoices.parquet -t customer_totals.json examples/queries/customer_totals.prql $ pq -f customer_totals.json "sort [-customer_total] | take 10" +-------------+--------------------+ | customer_id | customer_total | +-------------+--------------------+ | 6 | 49.620000000000005 | | 26 | 47.620000000000005 | | 57 | 46.62 | | 46 | 45.62 | | 45 | 45.62 | | 28 | 43.620000000000005 | | 37 | 43.62 | | 24 | 43.62 | | 7 | 42.62 | | 25 | 42.62 | +-------------+--------------------+ Querying data in a DuckDB database DuckDB is natively supported and can be queried by supplying a database URI beginning with "duckdb://". $ pq --database duckdb://examples/chinook/duckdb/chinook.duckdb \ 'from albums | join artists [artist_id] | group name (aggregate [num_albums = count]) | sort [-num_albums] | take 10' Querying Sqlite databases Sqlite is currently supported through the sqlite_scanner DuckDB extension. In order to query a SQLite database, a database URI beginning with "sqlite://" needs to be supplied. $ pq --database sqlite://examples/chinook/sqlite/chinook.sqlite \ 'from albums | take 10' Querying PostgreSQL databases PostgreSQL is currently supported through the postgres-scanner DuckDB extension. (See the announcement blog post for a good introduction.) $ pq -d postgresql://username:password@host:port/database \ 'from table | take 10' One noteworthy limitation of this approach is that you can only query tables in the postgres database and not views. By default you will be connected to the "public" schema and can reference tables there within your query. You can specify a different schema to connect to using the "?currentSchema=schema" paramter. If you want to query tables from another schema outside of that then you currently have to reference these through aliased --from parameters like so: $ pq -d postgresql://username:password@host:port/database?currentSch ema=schema \ --from alias=other_schema.table 'from alias | take 10' Environment Variables If you plan to work with the same database repeatedly, then specifying the details each time quickly becomes tedious. pq allows you to supply all command line arguments from environment variables with a PQ_ prefix. So for example the same query from above could be achieved with: $ export PQ_DATABASE="postgresql://username:password@host:port/database" $ pq --from alias=schema.table 'take 10' .env files Environment variables can also be read from a .env files. Since you probably don't want to expose your database credentials at the shell, it makes sense to put these in a .env file. This also allows you to set up directories with configuration for common environments together with common queries for that environment, for example: .env $ pq 'from my_schema.my_table | take 5'">$ echo 'PQ_DATABASE="postgresql://username:password@host:port/datab ase"' > .env $ pq 'from my_schema.my_table | take 5' Or say that you have a status_query.prql that you need to run for a number of environments with .env files set up in subdirectories: $ for e in prod uat dev; do cd $e && pq ../status_query.prql; done Roadmap 0.1.0 Tests Publish to crates.io 0.2.0 Support for object stores 0.3.0 Support for other databases through connectorx About Query and transform data with PRQL Topics csv-converter json data query csv database csv-files postgresql json-files querydsl parquet parquet-files parquet-tools parquet-viewer datafusion duckdb prql Resources Readme License Apache-2.0, MIT licenses found Licenses found Apache-2.0 LICENSE-APACHE MIT LICENSE-MIT Uh oh! There was an error while loading. Please reload this page. Activity Custom properties Stars 137 stars Watchers 3 watching Forks 7 forks Report repository Releases 15 v0.0.15 Latest Mar 27, 2023 + 14 releases Packages 0 Uh oh! There was an error while loading. Please reload this page. Uh oh! There was an error while loading. Please reload this page. Contributors Uh oh! There was an error while loading. Please reload this page. Languages Rust 96.9% Dockerfile 3.1% Footer (c) 2026 GitHub, Inc. Footer navigation Terms Privacy Security Status Community Docs Contact You can't perform that action at this time.

Folders and files

Latest commit

History

Repository files navigation

prql-query (pq)

Query and transform data with PRQL

Examples

Installation

Download a binary from Github Releases

Run as a container image (Docker)

Via Homebrew

Via Rust toolchain (Cargo)

Usage

Generating SQL

Querying data from a database (using CLI clients)

Querying data in files (csv, parquet, json)

Transforming data with pq and writing the output to files

Querying data in a DuckDB database

Querying Sqlite databases

Querying PostgreSQL databases

Environment Variables

.env files

Roadmap

0.1.0

0.2.0

0.3.0

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases 15

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Transforming data with `pq` and writing the output to files

Packages