How to export data in different formats

This guide shows you which format to use and what to expect during each export.

Formats at a glance

Format            Extension   Best for
JSONL             .jsonl      Scripts, Python, jq pipelines, maximum field coverage
Compressed JSONL  .jsonl.gz   Storage, large transfers, feeding into data pipelines
CSV               .csv        Excel, Google Sheets, BI tools, human review
Parquet           .parquet    Analytics, DuckDB, Pandas, columnar processing

How to specify the format

Pass the desired extension to --download:
# JSONL
landbase-cli "find fintech companies in NYC" --download=results.jsonl

# CSV
landbase-cli "find fintech companies in NYC" --download=results.csv

# Compressed JSONL
landbase-cli "find fintech companies in NYC" --download=results.jsonl.gz

# Parquet
landbase-cli "find fintech companies in NYC" --download=results.parquet
For dataset downloads, use:
landbase-cli datasets download <dataset-id> ./output.csv
landbase-cli datasets download <dataset-id> ./output.parquet

What happens during each export

JSONL and CSV

Both JSONL and CSV trigger a publish workflow before downloading. Landbase runs the publish step asynchronously: it formats and flattens the data, then makes the file available for download. This usually takes 30–90 seconds. The CLI shows a progress indicator while the publish step runs and polls until the file is ready.

Compressed JSONL

Compressed JSONL triggers the same publish workflow as JSONL, and the output is compressed with gzip. Decompress it with:
gunzip results.jsonl.gz
Or read without unzipping:
zcat results.jsonl.gz | jq .
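The same "read without unzipping" approach can be sketched in Python using only the standard library. This is a minimal sketch, assuming each line of the export is one JSON document; the helper name iter_jsonl_gz is hypothetical and not part of landbase-cli.

```python
import gzip
import json
from typing import Any, Iterator


def iter_jsonl_gz(path: str) -> Iterator[Any]:
    """Yield one parsed record per non-empty line of a gzip-compressed JSONL file."""
    # Mode "rt" makes gzip.open decompress and decode to text while reading,
    # so the file is never fully unpacked on disk or in memory.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Iterating over iter_jsonl_gz("results.jsonl.gz") then yields records one at a time, which keeps memory use flat for large exports.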

Parquet

A Parquet export downloads the native dataset bytes directly, with no publish step, which makes it the fastest format for large datasets. Use it when feeding results into DuckDB, Pandas, or Spark:
import pandas as pd
df = pd.read_parquet("results.parquet")
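The export behaviour described in this section can be summarised in a small lookup. This is an illustrative sketch only: the table restates what the guide says about each extension, and describe_export is a hypothetical helper, not something the CLI provides. How the CLI treats extensions not listed here is not covered by this guide, so the helper simply rejects them.

```python
from pathlib import Path

# Restated from the guide: extension -> (format name, publish step expected?)
EXPORT_FORMATS = {
    ".jsonl": ("JSONL", True),
    ".jsonl.gz": ("Compressed JSONL", True),
    ".csv": ("CSV", True),
    ".parquet": ("Parquet", False),
}


def describe_export(path: str) -> tuple[str, bool]:
    """Return (format name, whether a publish step is expected) for an output path."""
    name = Path(path).name
    for extension, description in EXPORT_FORMATS.items():
        if name.endswith(extension):
            return description
    raise ValueError(f"Extension of {name!r} is not one of {sorted(EXPORT_FORMATS)}")
```

A script could call describe_export("results.parquet") before an export to decide whether to budget for the 30–90 second publish wait.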

Choosing between JSONL and CSV

Use JSONL when:
  • You are writing a script or piping through jq
  • You want every available field (CSV may truncate or flatten nested fields)
  • You are loading data into Python or a database
Use CSV when:
  • You are opening in Excel or Google Sheets
  • You are handing off to someone non-technical
  • A BI tool requires it
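To see why nested fields are the main difference between the two formats, here is a minimal sketch of flattening one nested record into a single CSV row. The record, the dotted column names, and the choice to serialise lists as JSON strings are all assumptions made for illustration; the guide does not specify how Landbase flattens data.

```python
import csv
import io
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into dotted column names; store lists as JSON text."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        elif isinstance(value, list):
            flat[column] = json.dumps(value)  # a CSV cell can only hold a scalar
        else:
            flat[column] = value
    return flat


# Hypothetical record, shaped like one line of a JSONL export.
record = {
    "name": "Acme Pay",
    "hq": {"city": "New York", "state": "NY"},
    "tags": ["fintech", "payments"],
}

row = flatten(record)

# Write the flattened row as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

In JSONL the record keeps its structure, so hq and tags stay addressable as objects and arrays. In the CSV row they become flat string columns that a consumer would have to parse again.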

Publishing a dataset you already have

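If you already have a dataset from a prior run and want to download it in a new format, publish it in that format, look up the resulting child dataset, and download that child:
# 1. Publish the existing dataset as CSV and wait for the workflow to finish
landbase-cli workflow publish <dataset-id> --format=csv --wait
# 2. Take the ID of the last publish child listed in the dataset's lineage
CHILD=$(landbase-cli datasets lineage <dataset-id> --direction=children \
  | jq -r '[.[] | select(.workflow_type == "publish")] | last | .id')
# 3. Download the published file
landbase-cli datasets download "$CHILD" ./output.csv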
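For scripts that prefer Python to jq, the lineage filter used here can be sketched as follows. It assumes, as the jq expression implies, that the lineage command prints a JSON array of objects with workflow_type and id fields; the function name and the sample data are hypothetical.

```python
import json
from typing import Any, Optional


def latest_publish_child(lineage: list[dict[str, Any]]) -> Optional[str]:
    """Return the id of the last lineage entry whose workflow_type is "publish".

    Mirrors: [.[] | select(.workflow_type == "publish")] | last | .id
    Returns None when there is no publish child (jq would print null).
    """
    publishes = [item for item in lineage if item.get("workflow_type") == "publish"]
    return publishes[-1].get("id") if publishes else None


# Hypothetical lineage output, for illustration only.
sample_output = """
[
  {"id": "ds_child_1", "workflow_type": "publish"},
  {"id": "ds_child_2", "workflow_type": "enrich"},
  {"id": "ds_child_3", "workflow_type": "publish"}
]
"""

child_id = latest_publish_child(json.loads(sample_output))
```

Unlike the shell version, a None result makes the "no publish child" case explicit, so a script can stop before calling the download command with an invalid ID.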