Version: 9.2

Troubleshooting Guide

Overview

This troubleshooting guide is intended for customers who have deployed the Qrvey Platform in their own AWS or Azure cloud accounts. The Qrvey Platform runs as a Kubernetes cluster, providing a scalable and resilient environment for analytics workloads. This document will help you connect to your Qrvey Kubernetes cluster and use basic troubleshooting commands to investigate and resolve common issues.

Connecting to Your Kubernetes Cluster

To troubleshoot your Qrvey Platform, you first need to connect to the Kubernetes cluster running in your cloud environment. The Qrvey Platform cluster is named qrvey-eks<prefix>, where <prefix> is a unique set of characters representing your instance ID.

AWS (Amazon Web Services)

  1. Open AWS CloudShell: Navigate to the AWS CloudShell in your AWS account.
  2. Configure AWS CLI: Ensure your AWS CLI is configured with the correct region and permissions.
  3. Update kubeconfig: Run the following command to update your kubeconfig for the Qrvey cluster:
    aws eks update-kubeconfig --region <region> --name qrvey-eks<prefix>
    Replace <region> with your AWS region and <prefix> with your instance ID.
  4. Verify Connection: Test your connection with:
    kubectl get nodes
  5. For more details, see the Access cluster with kubectl or AWS EKS documentation.
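Steps 3 and 4 above can be combined into a single script. This is a sketch only: REGION and PREFIX are placeholders (the values shown are examples, not defaults), and each command is echoed rather than executed so the script is safe to run anywhere.

```shell
#!/bin/sh
# Sketch of the AWS connection steps. REGION and PREFIX are placeholders;
# replace them with your region and instance ID.
REGION="us-east-1"    # example value, not a default
PREFIX="abc123"       # example value, not a default
CLUSTER="qrvey-eks$PREFIX"

# Commands are echoed for safety; drop the `echo "+ ..."` wrappers to run them.
echo "+ aws eks update-kubeconfig --region $REGION --name $CLUSTER"
echo "+ kubectl get nodes"
```

Once REGION and PREFIX are set, remove the echoes to update your kubeconfig and verify the connection in one step.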

Azure

  1. Open Azure Cloud Shell: Go to the Azure Cloud Shell in your Azure portal.
  2. Log in to Azure: Make sure you are logged in with the correct subscription.
  3. Get Credentials: Run the following command to get credentials for your Qrvey cluster:
    az aks get-credentials --resource-group <resource-group> --name qrvey-eks<prefix>
    Replace <resource-group> and <prefix> with your values.
  4. Verify Connection: Test your connection with:
    kubectl get nodes
  5. For more details, see the Azure AKS documentation.
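The Azure steps can be scripted the same way. Again a sketch only: RESOURCE_GROUP and PREFIX are placeholders, and the commands are echoed rather than executed.

```shell
#!/bin/sh
# Sketch of the Azure connection steps. RESOURCE_GROUP and PREFIX are
# placeholders; replace them with your own values.
RESOURCE_GROUP="my-qrvey-rg"   # example value, not a default
PREFIX="abc123"                # example value, not a default
CLUSTER="qrvey-eks$PREFIX"

# Commands are echoed for safety; drop the `echo "+ ..."` wrappers to run them.
echo "+ az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER"
echo "+ kubectl get nodes"
```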

Qrvey Platform Namespaces

Namespaces in Kubernetes help organize and isolate resources within your cluster. In a Qrvey deployment, several namespaces are used for different purposes.

Tip: You can list all namespaces in your cluster with:

kubectl get namespaces

The main namespaces you will see in a Qrvey deployment are:

  • qrveyapps: This is the primary namespace where all Qrvey microservices are deployed and running.
  • qrveyapps-cronjobs: Used to run dataset syncs and export schedules.
  • qrveyapp-jobs: Used to run export jobs, long-running queries, and garbage collection tasks.
  • elastic-system: Contains the nodes for Elasticsearch (if deployed as part of your platform).
  • rabbitmq: Hosts the RabbitMQ message broker, used for inter-service communication.
  • kong: Contains the Kong API gateway, which manages and secures API traffic.
  • redis: Hosts the Redis in-memory data store, used for caching and other purposes.

These namespaces help separate different components and workloads, making it easier to manage and troubleshoot your Qrvey Platform deployment.
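A quick sanity check is to confirm that the expected namespaces actually exist in your cluster. The sketch below parses `kubectl get namespaces` output; the sample text is illustrative, and elastic-system is omitted because Elasticsearch is optional.

```shell
#!/bin/sh
# Sketch: verify the expected Qrvey namespaces exist.
# elastic-system is omitted since Elasticsearch may not be deployed.
expected_namespaces="qrveyapps qrveyapps-cronjobs qrveyapp-jobs rabbitmq kong redis"

missing_namespaces() {
  # stdin: `kubectl get namespaces` output; prints any expected name not found
  input=$(cat)
  for ns in $expected_namespaces; do
    printf '%s\n' "$input" | awk 'NR > 1 { print $1 }' | grep -qx "$ns" \
      || echo "missing: $ns"
  done
}

# Illustrative sample of `kubectl get namespaces` output:
sample="NAME                 STATUS   AGE
default              Active   30d
qrveyapps            Active   30d
qrveyapps-cronjobs   Active   30d
qrveyapp-jobs        Active   30d
rabbitmq             Active   30d
kong                 Active   30d
redis                Active   30d"

printf '%s\n' "$sample" | missing_namespaces   # prints nothing: all present
```

Against a live cluster you would pipe the real command instead of the sample: `kubectl get namespaces | missing_namespaces`.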

Basic Investigation Steps

When starting to debug or investigate issues in your Qrvey Platform deployment, begin with these basic checks:

1. Check Qrvey Microservice Pods

List all pods in the main Qrvey namespace and review their status:

kubectl get pods -n qrveyapps

All pods should be in the Running or Completed state. If any pods are in CrashLoopBackOff, Error, or Pending states, further investigation is needed.
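When a namespace has many pods, it helps to filter the output down to only the problematic ones. A minimal sketch, using made-up pod names for illustration:

```shell
#!/bin/sh
# Sketch: keep only pods that are not Running or Completed.
unhealthy_pods() {
  # STATUS is the third whitespace-separated column in default kubectl output
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Illustrative sample of `kubectl get pods -n qrveyapps` output:
sample="NAME                      READY   STATUS             RESTARTS   AGE
api-7c9f8d5b6-abcde       1/1     Running            0          2d
export-worker-55d-xyz12   0/1     CrashLoopBackOff   12         3h"

printf '%s\n' "$sample" | unhealthy_pods
# prints: export-worker-55d-xyz12 CrashLoopBackOff
```

Against a live cluster: `kubectl get pods -n qrveyapps | unhealthy_pods`.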

Understanding Pod States

  • CrashLoopBackOff: The pod is repeatedly crashing after starting. This usually indicates a problem with the application or its configuration.
  • Error: The pod has encountered an error and cannot start or run properly. This may be due to misconfiguration, missing dependencies, or application bugs.
  • Pending: The pod is waiting to be scheduled, often due to insufficient resources or unsatisfied scheduling requirements.

Investigating Problematic Pods

To investigate a pod that is not running as expected, first identify the pod name from the output of the previous command. Then, use the following commands to get more information:

  • View pod logs:
    kubectl logs -n qrveyapps <pod-name>
  • Describe the pod (for events and details):
    kubectl describe pod -n qrveyapps <pod-name>

Check the logs for error messages and the describe output for recent events, resource issues, or other clues about why the pod is not running.
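For a pod stuck in CrashLoopBackOff, the current container may have just restarted, so the crash message often lives in the previous container's logs. The sketch below uses a made-up pod name and echoes the commands rather than executing them.

```shell
#!/bin/sh
# Sketch: log commands for a crashing pod. POD is a placeholder.
POD="export-worker-55d-xyz12"   # assumption: replace with your pod name

# --previous shows logs from the last terminated container, which is
# usually where the crash message is for CrashLoopBackOff pods.
echo "+ kubectl logs -n qrveyapps $POD --previous"
# --tail limits output to the most recent lines.
echo "+ kubectl logs -n qrveyapps $POD --tail=100"
```

Drop the `echo "+ ..."` wrappers to run the commands once POD is set.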

2. Check CronJobs

List all cronjobs in the qrveyapps namespace and ensure none are suspended:

kubectl get cronjobs -n qrveyapps

Look for the SUSPEND column. All cronjobs should show False under this column. If any are suspended, they will not run as scheduled.
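Suspended cronjobs can be picked out of the output automatically. The sketch below parses sample `kubectl get cronjobs` text with made-up job names; note the field-position caveat in the comments.

```shell
#!/bin/sh
# Sketch: list cronjobs whose SUSPEND column is True.
suspended_cronjobs() {
  # A standard cron schedule is five space-separated fields, so with awk's
  # default splitting SUSPEND lands in field 7. (Recent kubectl versions add
  # a TIMEZONE column, which would shift SUSPEND to field 8.)
  awk 'NR > 1 && $7 == "True" { print $1 }'
}

# Illustrative sample of `kubectl get cronjobs -n qrveyapps` output:
cronjob_sample='NAME           SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
dataset-sync   */5 * * * *   False     0        3m              30d
export-sched   0 2 * * *     True      0        8h              30d'

printf '%s\n' "$cronjob_sample" | suspended_cronjobs
# prints: export-sched
```

Against a live cluster: `kubectl get cronjobs -n qrveyapps | suspended_cronjobs`.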

3. Check External Dependencies

Verify the health of external tools that Qrvey depends on. Check the pods in the following namespaces:

  • Kong API Gateway
    kubectl get pods -n kong
  • RabbitMQ
    kubectl get pods -n rabbitmq
  • Redis
    kubectl get pods -n redis
  • Elasticsearch (if deployed)
    kubectl get pods -n elastic-system

All pods in these namespaces should be running. If you see issues, check pod logs and events for more details.
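The four namespace checks above can be done in one pass with a small loop. Each command is echoed here so the sketch runs anywhere; omit elastic-system if Elasticsearch is not part of your deployment.

```shell
#!/bin/sh
# Sketch: check every dependency namespace in one pass.
# Drop the `echo "+ ..."` wrapper to run against your cluster.
for ns in kong rabbitmq redis elastic-system; do
  echo "+ kubectl get pods -n $ns"
done
```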

Restarting Qrvey Pods (Deployments)

If you need to restart one or more pods in your Qrvey Platform, restart the corresponding Kubernetes deployment rather than deleting pods directly. This causes Kubernetes to terminate the existing pods and create new ones in their place.

1. List Deployments

First, list all deployments in the relevant namespace (for example, qrveyapps):

kubectl get deployments -n <namespace>

Identify the deployment that matches the pod you want to restart.

2. Restart a Deployment

To restart a specific deployment:

kubectl rollout restart deploy <deployment-name> -n <namespace>

To restart all deployments in a namespace:

kubectl rollout restart deploy -n <namespace>

3. Check Rollout Status

After restarting, monitor the rollout status to ensure the new pods are coming up successfully:

kubectl rollout status deploy <deployment-name> -n <namespace>

Wait for the rollout to complete and verify that the pods are running as expected.
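Steps 2 and 3 can be combined into one restart-and-wait script. This is a sketch: NAMESPACE and DEPLOYMENT are placeholders (the deployment name below is invented, not a real Qrvey deployment), and the commands are echoed rather than executed.

```shell
#!/bin/sh
# Sketch: restart a deployment and wait for the rollout to finish.
NAMESPACE="qrveyapps"
DEPLOYMENT="example-service"   # assumption: replace with your deployment name

echo "+ kubectl rollout restart deploy $DEPLOYMENT -n $NAMESPACE"
# --timeout makes the status check fail instead of waiting indefinitely
# if the rollout stalls.
echo "+ kubectl rollout status deploy $DEPLOYMENT -n $NAMESPACE --timeout=300s"
```

Drop the `echo "+ ..."` wrappers to execute the commands once the variables are set.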