Version: 9.2

Base Datasets

Base datasets are a high-performance, scalable dataset type in Qrvey, designed to solve slow data loads when a dataset acts as a source for other datasets. Instead of extracting data from Elasticsearch indexes, base datasets store their output as files in a data lake (such as S3 or Blob Storage), allowing direct, fast retrieval for joins, unions, and managed dataset creation. Base datasets cannot be used directly for analytics, dashboards, or reports—they are only used as sources for managed datasets.

Note: Managed datasets, dataset views, and base datasets are only available in Qrvey Ultra.

Why Use Base Datasets?

Base datasets are pre-processed datasets optimized for:

Fast, scalable data loading from files (not indexes).
Efficient joins and unions when creating managed datasets.
Handling large data volumes and complex transformations.

Key Features

Data is stored in a lake as files, enabling direct, high-speed access.
All dataset capabilities (formatting, sync, transformation) are applied before storage.
Improved performance for managed dataset creation and syncs.
Support for large-scale data loads (for example, 1M+ records).
Robust error handling and data integrity during reloads.

Use Case

Base datasets are intended as efficient join sources for managed datasets. Their main purpose is performance optimization.

How Base Datasets Work

Traditionally, Qrvey stored dataset information in Elasticsearch indexes, which made extraction slow when using datasets as sources for other datasets. Base datasets solve this by storing the output as files in a data lake, leveraging the fastest data processing path in Qrvey.

Performance Enhancements

Data is split into files and "baskets" for parallel processing and fast loading.
Managed datasets created from base datasets can join or union multiple sources directly from the lake, eliminating slow extraction steps.
Future syncs and reloads are faster due to optimized file storage and incremental updates.
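The idea of splitting data into baskets for parallel loading can be illustrated with a minimal sketch. This is not Qrvey's implementation: the hash-based basket assignment, the function names (`assign_basket`, `split_into_baskets`, `load_basket`, `load_all`), and the sample records are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def assign_basket(key: str, basket_count: int) -> int:
    """Map a record key to a basket with a stable hash (hypothetical scheme)."""
    return crc32(key.encode("utf-8")) % basket_count


def split_into_baskets(records, key_field, basket_count):
    """Distribute records across baskets so each basket can be loaded independently."""
    baskets = [[] for _ in range(basket_count)]
    for record in records:
        index = assign_basket(str(record[key_field]), basket_count)
        baskets[index].append(record)
    return baskets


def load_basket(basket):
    """Stand-in for reading one basket's files; here it only counts rows."""
    return len(basket)


def load_all(baskets, workers=4):
    """Load every basket in parallel and return the per-basket row counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_basket, baskets))


# Hypothetical sample data
records = [{"order_id": i, "amount": i * 10} for i in range(1000)]
baskets = split_into_baskets(records, "order_id", basket_count=8)
row_counts = load_all(baskets)
```

Because each basket is independent, the work can be spread across workers without coordination, and every record still lands in exactly one basket.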

Create a Base Dataset

Go to Data > Datasets.
Select Create New Dataset > New Base Dataset.
Select a data source (existing connection or new connection).
Configure data sync and transformation options as needed.
Select Save to create the base dataset.
Load the dataset to begin processing data. The output is stored in the data lake as files.

Note: Base datasets are not available for direct analytics or reporting. They are only used as sources for managed datasets.

Share a Base Dataset

Base datasets can be shared. To share a base dataset, use the sharing options in the dataset settings. Shared base datasets can be referenced in other applications, but dataset views cannot be created from shared base datasets.

Save as Base Dataset (from Managed Dataset)

You can use the Save as Base Dataset feature to create a base dataset from any managed dataset, including those with geolocation or internationalization enabled, or those that are shared. If you save a shared managed dataset as a base dataset, the new base dataset is not shared by default.

Saving as a base dataset optimizes storage and performance for downstream managed datasets.

Note: Dataset views cannot be created from shared base datasets.

Use Base Datasets to Create Managed Datasets

Managed datasets can be created by joining or unioning one or more base datasets.
The join process is highly optimized for speed and efficiency, as data is read directly from the lake files.
Future syncs and reloads of managed datasets are faster, as the underlying base datasets are optimized for fast loading and incremental updates.
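Conceptually, building a managed dataset from base datasets amounts to reading rows from files and combining them with a union or a join. The sketch below shows that idea in plain Python. It does not use any Qrvey API: the in-memory CSV strings stand in for lake files, and the file contents, column names, and helper functions (`read_rows`, `union`, `inner_join`) are hypothetical.

```python
import csv
import io

# In-memory stand-ins for base dataset files in the lake (hypothetical data)
ORDERS_FILE = "order_id,customer_id,amount\n1,10,25.00\n2,11,40.00\n3,10,15.50\n"
ORDERS_ARCHIVE_FILE = "order_id,customer_id,amount\n4,11,99.00\n"
CUSTOMERS_FILE = "customer_id,name\n10,Acme\n11,Globex\n"


def read_rows(file_text):
    """Parse one file's rows directly, with no index lookup involved."""
    return list(csv.DictReader(io.StringIO(file_text)))


def union(*row_sets):
    """Append rows from several sources that share the same columns."""
    combined = []
    for rows in row_sets:
        combined.extend(rows)
    return combined


def inner_join(left, right, key):
    """Hash join: index the right side by key, then probe it with each left row."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


all_orders = union(read_rows(ORDERS_FILE), read_rows(ORDERS_ARCHIVE_FILE))
managed_rows = inner_join(all_orders, read_rows(CUSTOMERS_FILE), "customer_id")
```

Here `managed_rows` holds one row per order, enriched with the customer name, which mirrors a managed dataset that unions two base datasets and joins the result to a third.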

Limitations and Considerations

Base datasets are available in Qrvey v9.1 and later.
In Qrvey v9.2 and later, Join Lake Optimization improves file maintenance and reduces duplicate reads for base datasets.
This dataset type cannot be used for analytics, dashboards, or reports.
Shared base datasets cannot be used as sources for dataset views.
Saving a shared managed dataset as a base dataset does not make the base dataset shared by default.
Data lake management is required for storage, especially for full reloads and compaction.
Sync options (append/update) are supported, but maintenance of many files and duplicate reads should be considered.
Column discovery can differ when consuming data directly from files.
The Analyze tab is hidden for base datasets, but the data can still be reviewed in a table view (only the first records are shown, not the entire dataset).
The following features are not available for base datasets:
- Advanced Tab
- Visualization format
- Geolocations
- Internationalization
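The considerations about append/update syncs, file maintenance, and compaction in the list above can be pictured with a small sketch. It assumes that each sync writes a new file and that compaction keeps the newest version of each record. That behavior, the `compact` function, and the sample rows are illustrative assumptions, not a description of how Qrvey manages its lake files.

```python
def compact(files, key):
    """Merge sync files in arrival order; the newest version of each key wins."""
    latest = {}
    for rows in files:  # oldest file first
        for row in rows:
            latest[row[key]] = row  # later rows overwrite earlier versions
    return list(latest.values())


# Hypothetical files produced by an initial load and two later syncs
initial_load = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
append_sync = [{"id": 3, "status": "new"}]
update_sync = [{"id": 1, "status": "shipped"}]

files = [initial_load, append_sync, update_sync]

# Without compaction, a reader scans every file and sees record 1 twice
rows_read_before = sum(len(rows) for rows in files)

compacted = compact(files, "id")
rows_read_after = len(compacted)
```

In this sketch the reader scans four rows across three files before compaction and three rows in one file afterward, which is the kind of duplicate read that file maintenance aims to reduce.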
