Production Configuration for Azure
This page describes recommended configuration settings for running a production-grade Qrvey environment on Azure. The default installation settings are designed for getting started, but a production environment requires adjustments to ensure high availability, performance, and reliability under load.
Elasticsearch Cluster Configuration
Azure deployments use in-cluster Elasticsearch managed by the Elastic Cloud on Kubernetes (ECK) operator, configured through the es_config variable.
Qrvey recommends the following minimum Elasticsearch cluster configuration for production:
| Setting | Recommended Value |
|---|---|
| Size | large |
| Node Count | 3 |
The default installation uses count: 1, which provides no redundancy. A single-node cluster cannot survive a node failure, which makes it unsuitable for a high-availability production environment.
- Three nodes let the cluster keep a majority quorum (two of three) when one node fails, so it can still elect a primary node. This also avoids the 'split-brain' problem, in which separated groups of nodes each act as the primary.
- The large size provides sufficient JVM heap memory and pod resources for production-level data volumes and indexing throughput.
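The quorum reasoning above comes down to simple arithmetic. The following is a minimal sketch, assuming a plain majority rule over all nodes; the function names quorum and tolerated_failures are illustrative and are not part of Qrvey or Elasticsearch tooling.

```shell
#!/bin/sh
# Majority quorum for a cluster of N nodes: floor(N / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of node failures the cluster can absorb while keeping quorum.
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

# Compare the default single node with two-node and three-node clusters.
for nodes in 1 2 3; do
  echo "nodes=$nodes quorum=$(quorum "$nodes") tolerated_failures=$(tolerated_failures "$nodes")"
done
```

Under this rule, one-node and two-node clusters tolerate no failures, while three nodes tolerate one, which is why three is the smallest count that adds redundancy.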
To apply the production Elasticsearch configuration, update the es_config variable in your config.json. For the full list of size options and their resource allocations, see Azure Deployment Input Variables.
"es_config": {
  "size": "large",
  "count": 3
}
PostgreSQL Configuration
Azure Database for PostgreSQL Flexible Server supports multiple compute tiers. The Burstable tier (B-series) accumulates CPU credits during idle periods and consumes those credits under load. When data syncs run continuously, CPU credit balances are depleted and the instance is throttled to its baseline CPU performance, which can significantly impact data load throughput and query response times.
For production workloads with continuous or high-concurrency sync activity, use the General Purpose (Standard_D series) or Memory Optimized (Standard_E series) compute tier instead. These tiers provide consistent CPU performance without relying on credit accumulation.
The PostgreSQL instance tier is not configurable through config.json. After deployment, configure the compute tier directly in the Azure Portal:
- Navigate to your Azure Database for PostgreSQL Flexible Server resource in the Azure Portal.
- Select Compute + storage under Settings.
- Change the compute tier from Burstable to General Purpose or Memory Optimized.
- Select an appropriate SKU (for example, Standard_D4ds_v5) based on your expected workload.
- Save your changes.
Note: Changing the compute tier requires a server restart, which causes brief downtime. Plan this change during a maintenance window.
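If you prefer to script the tier change rather than use the Portal, the Azure CLI offers an equivalent command. The sketch below is an assumption-laden example, not part of the Qrvey installer: the run and set_pg_tier helpers and the resource names are hypothetical, and you should confirm the az postgres flexible-server update flags against your installed CLI version. By default it only prints the command (dry run); set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Print the command instead of running it unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Change the compute tier and SKU of a Flexible Server.
# Arguments: resource group, server name, tier, SKU name.
set_pg_tier() {
  run az postgres flexible-server update \
    --resource-group "$1" --name "$2" \
    --tier "$3" --sku-name "$4"
}

# Hypothetical resource names; substitute your own values.
set_pg_tier "my-resource-group" "my-postgres-server" GeneralPurpose Standard_D4ds_v5
```

The same restart and downtime considerations apply as with the Portal change, so run it with DRY_RUN=0 only during a maintenance window.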
Storage Account Replication
The default storage account replication type is LRS (Locally Redundant Storage), which stores three copies of your data within a single data center. If that data center experiences an outage, data might be unavailable.
For production environments, Qrvey recommends using ZRS (Zone-Redundant Storage), which replicates data across three availability zones within the same region. This ensures data remains accessible even if one availability zone experiences an outage.
To configure the replication type, set storage_account_replication_type in your config.json:
"storage_account_replication_type": "ZRS"
For deployments requiring cross-region redundancy, use GRS (Geo-Redundant Storage) or RAGRS (Read-Access Geo-Redundant Storage).
VNet and Subnet Sizing
Each data sync operation can spin up a new Kubernetes pod to handle that sync process. In environments with many concurrent syncs running in parallel, the number of pods can increase rapidly and exhaust the available IP addresses in your subnet.
The default subnet_address_prefixes value (10.220.0.0/20) already provides a /20 block with 4,091 usable IP addresses, which is sufficient for most production workloads. If you are customizing your network configuration, ensure your subnets are at least /20.
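The usable address count follows from the prefix length: a subnet with prefix /N contains 2^(32-N) addresses, and Azure reserves five addresses in every subnet. A minimal sketch of that arithmetic, using an illustrative helper named usable_ips:

```shell
#!/bin/sh
# Usable addresses in an Azure subnet with the given prefix length.
# Azure reserves 5 addresses per subnet (the first four and the last one).
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

# Compare a few common subnet sizes.
for prefix in 24 22 20; do
  echo "/$prefix -> $(usable_ips "$prefix") usable addresses"
done
# prints:
#   /24 -> 251 usable addresses
#   /22 -> 1019 usable addresses
#   /20 -> 4091 usable addresses
```

Each prefix bit removed doubles the address space, so a /24 leaves room for only a few hundred pods while a /20 leaves room for several thousand.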
You can use your own CIDR ranges as long as they do not conflict with any other subnets in your VNet. This is especially important if you are using VNet Peering, where overlapping address spaces between peered VNets will cause routing failures.
"network_address_space": ["10.208.0.0/12"],
"subnet_address_prefixes": ["10.220.0.0/20"]
Note: Subnet sizes can only be set at initial deployment time. Plan your network address space before deploying to production.
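Because address ranges are fixed at deployment, it can help to check candidate CIDR blocks for overlap before you deploy. The following is a minimal sketch for IPv4 only; the helper names are illustrative, and the peer ranges in the usage loop are hypothetical values chosen to show one clear case and one conflicting case.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print the first and last address (as integers) of a CIDR block.
cidr_range() {
  addr=${1%/*}
  prefix=${1#*/}
  size=$(( 1 << (32 - prefix) ))
  start=$(( $(ip_to_int "$addr") / size * size ))
  echo "$start $(( start + size - 1 ))"
}

# Succeed (exit 0) when the two CIDR blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  # $1/$2 = start/end of the first block, $3/$4 = start/end of the second.
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Check the default VNet address space against hypothetical peer ranges.
vnet="10.208.0.0/12"
for peer in "10.224.0.0/12" "10.216.0.0/13"; do
  if cidrs_overlap "$vnet" "$peer"; then
    echo "$peer overlaps $vnet"
  else
    echo "$peer is clear of $vnet"
  fi
done
```

Note that a subnet is expected to overlap its own VNet address space (10.220.0.0/20 sits inside 10.208.0.0/12); the check is meant for comparing sibling subnets or peered VNets.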
CoreDNS Scaling
In environments with a high number of concurrent sync operations, Kubernetes relies heavily on DNS resolution to route traffic between services. The default CoreDNS deployment (typically 2 replicas) can become a bottleneck under heavy load, causing slow or failed DNS lookups that affect sync performance and reliability.
If you observe DNS resolution issues or degraded performance during periods of high concurrency, scale up the number of CoreDNS replicas.
- Retrieve credentials for your AKS cluster, replacing <resource-group> and <aks-cluster-name> with your values. You can find the AKS cluster name in the Azure Portal under your resource group:

      az aks get-credentials --resource-group <resource-group> --name <aks-cluster-name>

- Scale the CoreDNS deployment to the desired number of replicas. Five replicas are a common starting point for high-concurrency environments:

      kubectl scale deployment coredns -n kube-system --replicas=5

- Verify the pods are running:

      kubectl get pods -n kube-system -l k8s-app=kube-dns

Adjust the replica count based on the volume of concurrent activity in your environment.
Additional Resources
- Configure Monitoring and Logging: Enable Prometheus, Grafana, and Loki for observability in your production environment.
- Azure Deployment Input Variables: Tune dataload_config to control autoscaling and resource limits for dataset loading microservices.
- Synapse Configuration: Configure Azure Synapse Analytics for data joining operations.