How to enrich an uploaded dataset end to end
This guide shows you how to take a raw CSV of company or contact records, upload it to Landbase, run a sequence of dataset workflows to match and enrich the records, and download the finished output. This is the right approach when you have your own data (a CRM export, a spreadsheet, a scrape) and want Landbase to match records to its database and add enrichment fields.
Prerequisites
- landbase-cli installed and authenticated
- A CSV or Excel file with at least one identifier column (e.g. company name, domain, or LinkedIn URL)
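Before uploading, you may want a quick local check that your file actually has an identifier column. The sketch below reads only the header row of a CSV and reports the columns that look like identifiers. The `IDENTIFIER_HEADERS` set and the function name are hypothetical. The exact header names Landbase recognizes are not listed here, so adjust the set to your data. Excel files would need to be exported to CSV first.

```python
import csv
import io
from typing import TextIO

# Hypothetical identifier headers; the set Landbase actually recognizes may differ.
IDENTIFIER_HEADERS = {"name", "company_name", "domain", "linkedin_url"}


def find_identifier_columns(f: TextIO) -> list[str]:
    """Return the header names in a CSV that look like identifier columns."""
    # Read only the first row; an empty file yields no headers.
    headers = next(csv.reader(f), [])
    # Compare case-insensitively and ignore surrounding whitespace.
    return [h for h in headers if h.strip().lower() in IDENTIFIER_HEADERS]


if __name__ == "__main__":
    sample = io.StringIO("name,Domain,notes\nAcme,acme.com,lead\n")
    print(find_identifier_columns(sample))  # ['name', 'Domain']
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in. An empty result means none of your headers matched the assumed identifier names.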
Step 1: Upload your file
The upload command returns a JSON object with an id field. Capturing it in $DATASET_ID saves you from copying and pasting it into every subsequent command.
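If you would rather not depend on jq, the id can be pulled out of the upload output with a few lines of Python. This is a minimal sketch that assumes only what the step above states: the output is a JSON object with a top-level id field. The sample value `ds_example` is hypothetical.

```python
import json


def extract_dataset_id(upload_output: str) -> str:
    """Return the top-level "id" field from the upload command's JSON output."""
    payload = json.loads(upload_output)
    if not isinstance(payload, dict) or "id" not in payload:
        raise ValueError("upload output has no top-level 'id' field")
    # Coerce to str so numeric ids can be used directly in shell variables.
    return str(payload["id"])


if __name__ == "__main__":
    sample = '{"id": "ds_example"}'  # hypothetical upload output
    print(extract_dataset_id(sample))  # ds_example
```

Raising an explicit error rather than returning an empty string keeps a script from silently running later workflows against a blank $DATASET_ID.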
Step 2: Run the onboard workflow
The onboard workflow normalizes your data, maps columns to Landbase’s schema, and prepares it for downstream workflows.
The --wait flag blocks until the workflow completes. For large datasets, this can take a few minutes.
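If you drive workflows from your own tooling instead of relying on --wait, you can approximate the same blocking behavior with a polling loop. This is a generic sketch, not how landbase-cli implements --wait. It assumes a hypothetical `get_status()` callable that returns a status string. The terminal states `"completed"` and `"failed"` are placeholders, not documented Landbase values. The 600-second default mirrors the --timeout=600 suggestion in Troubleshooting.

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # "completed" and "failed" are hypothetical terminal states.
        if status in ("completed", "failed"):
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"workflow still {status!r} after {timeout} s")
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))


if __name__ == "__main__":
    states = iter(["running", "running", "completed"])
    print(wait_for_completion(lambda: next(states), interval=0.01))  # completed
```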
Step 3: Run the match workflow
The match workflow compares each row against the Landbase database and adds a confidence score and a Landbase record ID to matched rows.
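Once you have the output file, you may want to keep only the matches you trust. The sketch below filters CSV rows that carry a record ID and a confidence score at or above a threshold. The column names `landbase_id` and `match_confidence` are hypothetical, and so is the 0 to 1 score scale in the example. Check the headers and score range in your own output before choosing a threshold.

```python
import csv
import io
from typing import TextIO


def confident_matches(
    f: TextIO,
    threshold: float,
    score_col: str = "match_confidence",  # hypothetical column name
    id_col: str = "landbase_id",          # hypothetical column name
) -> list[dict[str, str]]:
    """Return rows that have a record ID and a confidence score >= threshold."""
    kept = []
    for row in csv.DictReader(f):
        score = (row.get(score_col) or "").strip()
        if not (row.get(id_col) or "").strip() or not score:
            continue  # unmatched row: no record ID or no score
        if float(score) >= threshold:
            kept.append(row)
    return kept


if __name__ == "__main__":
    sample = io.StringIO(
        "name,landbase_id,match_confidence\n"
        "Acme,lb_1,0.92\n"
        "Globex,,\n"
        "Initech,lb_3,0.41\n"
    )
    print([r["name"] for r in confident_matches(sample, 0.8)])  # ['Acme']
```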
Step 4: Run the enrich workflow
The enrich workflow pulls additional fields (industry, employee count, HQ location, LinkedIn, etc.) for matched records.
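Because enrichment only applies to matched records, a quick fill-rate check on the output shows how much of your file was actually enriched. The sketch below computes, for each field, the fraction of rows with a non-empty value. The field names `industry` and `employee_count` are hypothetical stand-ins for whatever headers your output uses.

```python
import csv
import io
from typing import TextIO


def fill_rates(f: TextIO, fields: list[str]) -> dict[str, float]:
    """Return the fraction of rows with a non-empty value for each field."""
    counts = dict.fromkeys(fields, 0)
    total = 0
    for row in csv.DictReader(f):
        total += 1
        for field in fields:
            # Missing columns and blank cells both count as empty.
            if (row.get(field) or "").strip():
                counts[field] += 1
    return {field: (counts[field] / total if total else 0.0) for field in fields}


if __name__ == "__main__":
    sample = io.StringIO(
        "name,industry,employee_count\n"
        "Acme,Manufacturing,120\n"
        "Globex,,\n"
    )
    print(fill_rates(sample, ["industry", "employee_count"]))
    # {'industry': 0.5, 'employee_count': 0.5}
```

A low rate across every field usually points back to unmatched rows rather than to the enrich step itself.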
Step 5: Publish and find the output dataset
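The publish workflow generates a downloadable file from the dataset. The result appears as a child dataset of your uploaded dataset, which you can list with landbase-cli datasets lineage $DATASET_ID --direction=children.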
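To pick the published dataset out of the lineage output programmatically, you can search the child datasets by workflow type. This sketch assumes a JSON array of objects with `id` and `workflow_type` keys. That shape is a hypothetical assumption, so run the lineage command without any filter first and adjust the keys to what it really prints.

```python
import json
from typing import Optional


def find_child_by_workflow(lineage_json: str, workflow_type: str = "publish") -> Optional[str]:
    """Return the id of the first child dataset produced by workflow_type, or None."""
    # Assumed shape: [{"id": ..., "workflow_type": ...}, ...]
    for child in json.loads(lineage_json):
        if child.get("workflow_type") == workflow_type:
            return str(child["id"])
    return None


if __name__ == "__main__":
    # Hypothetical lineage output.
    sample = '[{"id": "ds_a", "workflow_type": "match"}, {"id": "ds_b", "workflow_type": "publish"}]'
    print(find_child_by_workflow(sample))  # ds_b
```

Returning `None` instead of raising lets a script print the full lineage for inspection, which matches the "No child dataset found after publish" advice in Troubleshooting.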
Step 6: Download the output
Full pipeline as a script
Save the script as enrich-pipeline.sh, make it executable with chmod +x enrich-pipeline.sh, and run it with:
Troubleshooting
Workflow fails at onboard: Check that your CSV has recognizable column headers (name, domain, email, first_name, last_name, etc.). The onboard step maps your columns to Landbase’s schema, so ambiguous headers can cause it to fail.
No child dataset found after publish: Run landbase-cli datasets lineage $DATASET_ID --direction=children without the jq filter to see all child datasets and their workflow types.
Timeout: Add --timeout=600 to any --wait command to allow up to 10 minutes. Large datasets take longer.
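If onboarding fails on ambiguous headers, normalizing them locally before re-uploading is a cheap first fix. The sketch below lowercases each header, trims it, and collapses runs of spaces or punctuation into underscores, so "First Name" becomes "first_name". Normalized headers are more likely to resemble the examples above, but that is an assumption: this does not guarantee Landbase will recognize them. The function names are hypothetical.

```python
import csv
import io
import re
from typing import TextIO


def normalize_header(header: str) -> str:
    """Lowercase, trim, and replace runs of non-alphanumeric ASCII with "_"."""
    # "First Name" -> "first_name"; non-ASCII letters are also replaced.
    cleaned = re.sub(r"[^0-9a-z]+", "_", header.strip().lower())
    return cleaned.strip("_")


def rewrite_headers(src: TextIO, dst: TextIO) -> list[str]:
    """Copy a CSV from src to dst with normalized headers; return the new headers."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    headers = [normalize_header(h) for h in next(reader, [])]
    writer.writerow(headers)
    writer.writerows(reader)  # data rows are copied unchanged
    return headers


if __name__ == "__main__":
    src = io.StringIO("First Name,LAST NAME, Domain \nAda,Lovelace,example.com\n")
    dst = io.StringIO()
    print(rewrite_headers(src, dst))  # ['first_name', 'last_name', 'domain']
```

When working with files, open both with `newline=""` as the `csv` module expects. Also check that two different headers did not normalize to the same name.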
