Version: 9.3

Troubleshooting the Qrvey Instance

This troubleshooting guide is intended for customers who have deployed the Qrvey Platform in their own AWS or Azure cloud accounts. The Qrvey Platform runs as a Kubernetes cluster, providing a scalable and resilient environment for analytics workloads. This document will help you connect to your Qrvey Kubernetes cluster and use basic troubleshooting commands to investigate and resolve common issues.

Connect to Your Kubernetes Cluster

To troubleshoot your Qrvey Platform, you need to connect to the Kubernetes cluster running in your cloud environment. The Qrvey Platform cluster is named qrvey-eks<prefix>, where <prefix> is a unique set of characters representing your instance ID.

AWS (Amazon Web Services)

  1. Navigate to the AWS CloudShell in your AWS account.

  2. Ensure your AWS CLI is configured with the correct region and permissions.

  3. Run the following command to update your kubeconfig for the Qrvey cluster:

    aws eks update-kubeconfig --region <region> --name qrvey-eks<prefix>

    Replace <region> with your AWS region and <prefix> with your instance ID.

  4. Test your connection:

    kubectl get nodes

For more information, see Access cluster with kubectl in the AWS EKS documentation.
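If kubectl cannot reach the cluster after updating your kubeconfig, it may be pointing at a different context. These standard kubectl commands (not Qrvey-specific) let you confirm and switch contexts; the context name is a placeholder:

```shell
# List all contexts in your kubeconfig; the active one is marked with '*'
kubectl config get-contexts

# Show only the name of the active context
kubectl config current-context

# Switch to the Qrvey cluster context if another one is active
kubectl config use-context <context-name>
```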

Azure

  1. Open the Azure Cloud Shell in your Azure portal.

  2. Log in to Azure and select the correct subscription.

  3. Get credentials for your Qrvey cluster:

    az aks get-credentials --resource-group <resource-group> --name qrvey-eks<prefix>

    Replace <resource-group> and <prefix> with your values.

  4. Test your connection with:

    kubectl get nodes

For more information, see the Azure AKS documentation.
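If `az aks get-credentials` fails, confirm that your Azure CLI session is using the subscription that contains the Qrvey resource group. A sketch using standard Azure CLI commands (the subscription ID is a placeholder):

```shell
# Show the subscription your Azure CLI session is currently using
az account show --output table

# List all subscriptions available to your account
az account list --output table

# Switch to the subscription that contains the Qrvey resource group
az account set --subscription <subscription-id>
```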

Qrvey Platform Namespaces

Namespaces in Kubernetes help organize and isolate resources within your cluster. A Qrvey deployment uses several namespaces to separate different components and workloads, making it easier to manage and troubleshoot your Qrvey Platform deployment.

Tip: You can list all namespaces in your cluster with:

kubectl get namespaces

A Qrvey deployment includes the following main namespaces:

  • qrveyapps: Primary namespace where all Qrvey microservices are deployed and running.
  • qrveyapps-cronjobs: Used to run dataset syncs and export schedules.
  • qrveyapp-jobs: Used to run export jobs, long-running queries, and garbage collection tasks.
  • elastic-system: Contains the nodes for Elasticsearch (if deployed as part of your platform).
  • rabbitmq: Hosts the RabbitMQ message broker, used for inter-service communication.
  • kong: Contains the Kong API gateway, which manages and secures API traffic.
  • redis: Hosts the Redis in-memory data store, used for caching and other purposes.
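As a quick health sweep, the namespaces above can be checked in one pass. A minimal sketch (the namespace list mirrors the one above; elastic-system may be absent if Elasticsearch is not part of your deployment):

```shell
# Print pod status for each Qrvey-related namespace in turn
for ns in qrveyapps qrveyapps-cronjobs qrveyapp-jobs elastic-system rabbitmq kong redis; do
  echo "=== Namespace: $ns ==="
  kubectl get pods -n "$ns"
done
```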

Basic Investigation Steps

When debugging issues in your Qrvey Platform deployment, check the following general areas:

Qrvey Microservice Pods

List all pods in the main Qrvey namespace and review their status:

kubectl get pods -n qrveyapps

All pods should be in the Running or Completed state. If any pods are in CrashLoopBackOff, Error, or Pending states, further investigation is needed.
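In a namespace with many pods, a field selector can narrow the output to only the problem pods, that is, pods that are neither Running nor Completed (Completed pods report the Succeeded phase):

```shell
# Show only pods whose phase is not Running and not Succeeded
kubectl get pods -n qrveyapps \
  --field-selector=status.phase!=Running,status.phase!=Succeeded
```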

Pod States

  • CrashLoopBackOff: The pod is repeatedly crashing after starting. This usually indicates a problem with the application or its configuration.
  • Error: The pod has encountered an error and cannot start or run properly. This can be due to misconfiguration, missing dependencies, or application bugs.
  • Pending: The pod is waiting to be scheduled, often due to insufficient resources or unsatisfied scheduling requirements.

Debug Pods

To investigate a pod that is not running as expected, first identify its name from the output of the previous command, then use the following commands to get more information:

  • View pod logs:

    kubectl logs -n qrveyapps <pod-name>
  • Describe the pod (for events and details):

    kubectl describe pod -n qrveyapps <pod-name>

Check the logs for error messages and the describe output for recent events, resource issues, or other clues about why the pod is not running.
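For a pod stuck in CrashLoopBackOff, the current container may have no useful logs yet. These standard kubectl options help in that case (the pod name is a placeholder):

```shell
# Logs from the previous (crashed) container instance
kubectl logs -n qrveyapps <pod-name> --previous

# Stream logs live while reproducing the issue
kubectl logs -n qrveyapps <pod-name> -f

# Recent events in the namespace, sorted oldest to newest
kubectl get events -n qrveyapps --sort-by=.lastTimestamp
```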

CronJobs

List all cronjobs in the qrveyapps namespace and ensure none are suspended:

kubectl get cronjobs -n qrveyapps

Look for the SUSPEND column. All cronjobs should show False under this column. If any are suspended, they will not run as scheduled.
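If a cronjob shows True in the SUSPEND column, it can be resumed with a patch, and any cronjob can be triggered manually to verify it runs. A sketch using standard kubectl commands (the cronjob name is a placeholder):

```shell
# Resume a suspended cronjob
kubectl patch cronjob <cronjob-name> -n qrveyapps \
  -p '{"spec":{"suspend":false}}'

# Trigger a one-off run of a cronjob without waiting for its schedule
kubectl create job --from=cronjob/<cronjob-name> <cronjob-name>-manual -n qrveyapps
```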

External Dependencies

Verify the health of external tools that Qrvey depends on. Check the pods in the following namespaces:

  • Kong API Gateway

    kubectl get pods -n kong
  • RabbitMQ

    kubectl get pods -n rabbitmq
  • Redis

    kubectl get pods -n redis
  • Elasticsearch (if deployed)

    kubectl get pods -n elastic-system

All pods in these namespaces should be running. If you see issues, check pod logs and events for more details.
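The same log and describe commands used for Qrvey microservice pods apply to these dependencies. For example, to dig into an unhealthy RabbitMQ pod (the pod name is a placeholder):

```shell
# Tail recent log lines from a RabbitMQ pod
kubectl logs -n rabbitmq <pod-name> --tail=100

# Inspect events and container status for the same pod
kubectl describe pod -n rabbitmq <pod-name>
```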

Restart Qrvey Pods (Deployments)

If you need to restart one or more pods in your Qrvey Platform, restart the corresponding Kubernetes deployment. Kubernetes will then terminate the existing pods and create replacements with the same configuration.

Step 1: List Deployments

List all deployments in the relevant namespace (for example, qrveyapps):

kubectl get deployments -n <namespace>

Identify the deployment that matches the pod you want to restart.

Step 2: Restart a Deployment

To restart a specific deployment:

kubectl rollout restart deploy <deployment-name> -n <namespace>

To restart all deployments in a namespace:

kubectl rollout restart deploy -n <namespace>

Step 3: Check Rollout Status

After restarting, monitor the rollout status to ensure the new pods are coming up successfully:

kubectl rollout status deploy <deployment-name> -n <namespace>

Wait for the rollout to complete and verify that the pods are running as expected.
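The three steps above can be combined into a short script that restarts every deployment in a namespace and waits for each rollout to finish. A minimal sketch, assuming the qrveyapps namespace and a 5-minute timeout per deployment:

```shell
NS=qrveyapps

# Restart every deployment in the namespace
kubectl rollout restart deploy -n "$NS"

# Wait for each deployment's rollout to complete
for d in $(kubectl get deploy -n "$NS" -o name); do
  kubectl rollout status "$d" -n "$NS" --timeout=300s
done
```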