How to export data in different formats

This guide shows you which format to use and what to expect during each export.

Formats at a glance

Format	Extension	Best for
JSONL	`.jsonl`	Scripts, Python, `jq` pipelines, maximum field coverage
Compressed JSONL	`.jsonl.gz`	Storage, large transfers, feeding into data pipelines
CSV	`.csv`	Excel, Google Sheets, BI tools, human review
Parquet	`.parquet`	Analytics, DuckDB, Pandas, columnar processing

How to specify the format

Give the --download path the file extension of the format you want:

# JSONL
landbase-cli "find fintech companies in NYC" --download=results.jsonl

# CSV
landbase-cli "find fintech companies in NYC" --download=results.csv

# Compressed JSONL
landbase-cli "find fintech companies in NYC" --download=results.jsonl.gz

# Parquet
landbase-cli "find fintech companies in NYC" --download=results.parquet

For dataset downloads, use:

landbase-cli datasets download <dataset-id> ./output.csv
landbase-cli datasets download <dataset-id> ./output.parquet
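
If a script builds the output path programmatically, it can be useful to check the extension before calling the CLI. The following is a minimal sketch, not part of landbase-cli: `export_format` is a hypothetical helper, and its mapping simply mirrors the formats table above.

```python
from pathlib import Path

# Extensions copied from the formats table; the helper itself is hypothetical.
FORMATS = {
    ".jsonl.gz": "Compressed JSONL",
    ".jsonl": "JSONL",
    ".csv": "CSV",
    ".parquet": "Parquet",
}


def export_format(path: str) -> str:
    """Return the export format implied by the file name's extension."""
    name = Path(path).name.lower()
    for extension, label in FORMATS.items():
        if name.endswith(extension):
            return label
    raise ValueError(f"unsupported export extension: {path}")
```

For example, `export_format("results.jsonl.gz")` returns `"Compressed JSONL"`, and an unrecognized extension raises `ValueError` before any export is started.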

What happens during each export

JSONL and CSV

Both JSONL and CSV trigger a publish workflow before the download starts. Landbase runs the publish step asynchronously: it formats and flattens the data, then makes the file available for download. This usually takes 30–90 seconds. While the publish step runs, the CLI shows a progress indicator and polls until the file is ready.
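
The polling behaviour described above follows a common pattern that can be sketched generically. This is an illustration only, not the CLI's actual implementation: `wait_until_ready`, its `interval` and `timeout` defaults, and the `is_ready` callable are all assumptions made for the example.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    interval: float = 5.0,
    timeout: float = 300.0,
) -> bool:
    """Call is_ready() repeatedly until it returns True or the timeout elapses.

    Returns True if the resource became ready, False if the timeout was hit.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause between checks so the service is not queried in a tight loop.
        time.sleep(interval)
```

A caller would pass a function that checks whatever readiness signal is available to it, for example whether the expected output file exists yet.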

Compressed JSONL

Compressed JSONL triggers the same publish workflow as JSONL. The output is then compressed with gzip. Decompress it with:

gunzip results.jsonl.gz

Or stream it to jq without decompressing it on disk:

gunzip -c results.jsonl.gz | jq .
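
The same streaming approach works from Python using only the standard library. A minimal sketch, assuming the file holds one JSON object per line; `read_jsonl_gz` is a name chosen for this example.

```python
import gzip
import json
from typing import Iterator


def read_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a gzip-compressed JSONL file."""
    # "rt" opens the gzip stream in text mode, so lines arrive as str.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because the function is a generator, records are decoded one at a time and the full file is never held in memory: `for record in read_jsonl_gz("results.jsonl.gz"): ...`.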

Parquet

A Parquet export downloads the native dataset bytes directly, with no publish step. This makes it the fastest format for large datasets. Use it when feeding results into DuckDB, Pandas, or Spark:

# read_parquet needs a Parquet engine installed, such as pyarrow or fastparquet
import pandas as pd
df = pd.read_parquet("results.parquet")

Choosing between JSONL and CSV

Use JSONL when:

You are writing a script or piping through jq
You want every available field (CSV may truncate or flatten nested fields)
You are loading data into Python or a database

Use CSV when:

You are opening in Excel or Google Sheets
You are handing off to someone non-technical
A BI tool requires it
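
To see why nested fields fit JSONL better than CSV, consider what flattening involves. The sketch below is illustrative only: the record and its field names are made up, and the dotted-key scheme is an assumption, not necessarily how Landbase flattens data during publish.

```python
import csv
import io
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Collapse nested dicts into dotted keys; lists are stored as JSON strings."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


# Hypothetical record; these field names are for illustration only.
record = {
    "name": "Acme",
    "hq": {"city": "New York", "state": "NY"},
    "tags": ["fintech", "payments"],
}

row = flatten(record)

# Write the flattened row as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

The nested `hq` object becomes two columns and the `tags` list becomes a single string cell, so a consumer of the CSV has to undo that encoding to recover the original structure. JSONL keeps the record as it is.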

Publishing a dataset you already have

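If you already have a dataset from a prior run and want to download it in a new format, publish it in that format, look up the child dataset that the publish workflow created, and download that child:

# Publish the existing dataset as CSV and wait for the workflow to finish
landbase-cli workflow publish <dataset-id> --format=csv --wait
# Take the ID of the last child dataset produced by a publish workflow
CHILD=$(landbase-cli datasets lineage <dataset-id> --direction=children \
  | jq -r '[.[] | select(.workflow_type == "publish")] | last | .id')
# Download the published child (quoted so an empty value is not silently dropped)
landbase-cli datasets download "$CHILD" ./output.csv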
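
The jq filter above can also be written in Python, which makes the "no publish child found" case easier to handle. A minimal sketch, assuming (as the jq filter implies) that the lineage command prints a JSON array of objects with `workflow_type` and `id` fields; `latest_publish_child` is a hypothetical helper name.

```python
import json
from typing import Optional


def latest_publish_child(lineage_json: str) -> Optional[str]:
    """Return the id of the last entry whose workflow_type is "publish", or None."""
    entries = json.loads(lineage_json)
    published = [entry for entry in entries if entry.get("workflow_type") == "publish"]
    return published[-1]["id"] if published else None
```

When nothing matches, the shell version leaves the literal string `null` in `CHILD`, because `jq -r` prints `null` for an empty selection. Returning `None` here lets a script stop before attempting the download.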

Related

How to chain workflows: publish as part of a full pipeline
datasets reference: datasets download flags
workflow reference: workflow publish flags
