Table of contents
Overview
This Standard Operating Procedure (SOP) provides a comprehensive guide for downloading data from KoboCollect (KoBoToolbox) to Excel format using the API v2. The process involves authentication, project selection, data export, and file download.
Prerequisites
Before starting the data export process, ensure you have the following:
- Active KoboCollect account with appropriate permissions
- API token or username/password credentials
- Knowledge of your project/form ID
- Basic understanding of REST API concepts
- Tool for making API requests (cURL, Postman, Python, etc.)
Authentication setup
KoboCollect API v2 supports multiple authentication methods. The most secure method is using API tokens.
Generate API token
- Log in to your KoboCollect account
- Navigate to Account Settings
- Go to the "API Token" section
- Click "Generate New Token"
- Copy and securely store the token
Test authentication
Test your authentication using the following cURL command:
curl -H "Authorization: Token YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
https://kf.kobotoolbox.org/api/v2/assets/
Get project list
Retrieve a list of all your projects/forms to identify the specific form you want to export data from.
curl -H "Authorization: Token YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
https://kf.kobotoolbox.org/api/v2/assets/
import requests
# Configuration
API_TOKEN = "YOUR_API_TOKEN"
BASE_URL = "https://kf.kobotoolbox.org/api/v2"
headers = {
"Authorization": f"Token {API_TOKEN}",
"Content-Type": "application/json"
}
# Get projects list
response = requests.get(f"{BASE_URL}/assets/", headers=headers)
projects = response.json()
# Display projects
for project in projects['results']:
print(f"Name: {project['name']}")
print(f"UID: {project['uid']}")
print(f"Asset Type: {project['asset_type']}")
print("---")
Response parameters
| Parameter | Description | Example |
|---|---|---|
uid |
Unique identifier for the form | aBcD1234EfGh5678 |
name |
Human-readable name of the form | Household Survey 2024 |
asset_type |
Type of asset (survey, template, etc.) | survey |
deployment__submission_count |
Number of submissions | 156 |
Access form data
Once you have identified your form UID, you can access the form's submissions data. This step shows you how to preview the data before exporting.
curl -H "Authorization: Token YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
"https://kf.kobotoolbox.org/api/v2/assets/FORM_UID/data/?format=json"
import requests
import json
# Configuration
API_TOKEN = "YOUR_API_TOKEN"
BASE_URL = "https://kf.kobotoolbox.org/api/v2"
FORM_UID = "YOUR_FORM_UID"
headers = {
"Authorization": f"Token {API_TOKEN}",
"Content-Type": "application/json"
}
# Get form data
params = {
"format": "json",
"limit": 10 # Limit to first 10 records for preview
}
response = requests.get(
f"{BASE_URL}/assets/{FORM_UID}/data/",
headers=headers,
params=params
)
data = response.json()
print(f"Total submissions: {data['count']}")
print(f"Showing first {len(data['results'])} records:")
print(json.dumps(data['results'][0], indent=2)) # Pretty print first record
limit parameter to preview a small number of records before downloading the entire dataset.
Export data
Create an export job to generate the Excel file. KoboCollect uses an asynchronous export system where you first create an export request, then download the file once it's ready.
Export parameters
| Parameter | Description | Options |
|---|---|---|
type |
Export format | xls, xlsx, csv, json, xml |
source |
Data source | submissions (default) |
lang |
Language for column headers | en, fr, es, etc. |
hierarchy_in_labels |
Include group names in headers | true, false |
group_sep |
Group separator character | /, -, _, etc. |
curl -X POST \
-H "Authorization: Token YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"type": "xlsx",
"source": "submissions",
"lang": "en",
"hierarchy_in_labels": true,
"group_sep": "/"
}' \
https://kf.kobotoolbox.org/api/v2/assets/FORM_UID/exports/
import requests
import time
# Configuration
API_TOKEN = "YOUR_API_TOKEN"
BASE_URL = "https://kf.kobotoolbox.org/api/v2"
FORM_UID = "YOUR_FORM_UID"
headers = {
"Authorization": f"Token {API_TOKEN}",
"Content-Type": "application/json"
}
# Create export request
export_data = {
"type": "xlsx",
"source": "submissions",
"lang": "en",
"hierarchy_in_labels": True,
"group_sep": "/"
}
response = requests.post(
f"{BASE_URL}/assets/{FORM_UID}/exports/",
headers=headers,
json=export_data
)
export_info = response.json()
export_uid = export_info['uid']
print(f"Export created with UID: {export_uid}")
print(f"Status: {export_info['status']}")
# Check export status
def check_export_status(export_uid):
response = requests.get(
f"{BASE_URL}/assets/{FORM_UID}/exports/{export_uid}/",
headers=headers
)
return response.json()
# Wait for export to complete
while True:
status_info = check_export_status(export_uid)
print(f"Export status: {status_info['status']}")
if status_info['status'] == 'complete':
download_url = status_info['result']
print(f"Export complete! Download URL: {download_url}")
break
elif status_info['status'] == 'error':
print("Export failed!")
break
else:
print("Export in progress... waiting 5 seconds")
time.sleep(5)
Download Excel file
Once the export is complete, you can download the Excel file using the provided download URL.
curl -H "Authorization: Token YOUR_API_TOKEN" \
-o "exported_data.xlsx" \
"DOWNLOAD_URL"
import requests
from datetime import datetime
# Configuration
API_TOKEN = "YOUR_API_TOKEN"
DOWNLOAD_URL = "YOUR_DOWNLOAD_URL" # Obtained from export status
headers = {
"Authorization": f"Token {API_TOKEN}"
}
# Download the file
response = requests.get(DOWNLOAD_URL, headers=headers)
if response.status_code == 200:
# Generate filename with timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"kobocollect_export_{timestamp}.xlsx"
# Save the file
with open(filename, 'wb') as f:
f.write(response.content)
print(f"File downloaded successfully: {filename}")
print(f"File size: {len(response.content)} bytes")
else:
print(f"Download failed with status code: {response.status_code}")
print(f"Error: {response.text}")
Automation scripts
For regular data exports, you can use these complete automation scripts that handle the entire process from authentication to file download.
Complete Python script
#!/usr/bin/env python3
"""
KoboCollect Data Export Script
Automatically exports data from KoboCollect to Excel format
"""
import requests
import time
import json
import os
from datetime import datetime
import argparse
class KoboExporter:
def __init__(self, api_token, base_url="https://kf.kobotoolbox.org/api/v2"):
self.api_token = api_token
self.base_url = base_url
self.headers = {
"Authorization": f"Token {api_token}",
"Content-Type": "application/json"
}
def get_projects(self):
"""Get list of all projects/forms"""
try:
response = requests.get(f"{self.base_url}/assets/", headers=self.headers)
response.raise_for_status()
return response.json()
except requests.exceptions.RequestException as e:
print(f"Error fetching projects: {e}")
return None
def list_projects(self):
"""Display available projects"""
projects = self.get_projects()
if not projects:
return
print("\nAvailable projects:")
print("-" * 80)
for i, project in enumerate(projects['results'], 1):
submission_count = project.get('deployment__submission_count', 0)
print(f"{i:2d}. {project['name']}")
print(f" UID: {project['uid']}")
print(f" Submissions: {submission_count}")
print(f" Type: {project['asset_type']}")
print()
def create_export(self, form_uid, export_type="xlsx", **kwargs):
"""Create an export job"""
export_data = {
"type": export_type,
"source": "submissions",
"lang": kwargs.get("lang", "en"),
"hierarchy_in_labels": kwargs.get("hierarchy_in_labels", True),
"group_sep": kwargs.get("group_sep", "/")
}
try:
response = requests.post(
f"{self.base_url}/assets/{form_uid}/exports/",
headers=self.headers,
json=export_data
)
response.raise_for_status()
return response.json()
except requests.exceptions.RequestException as e:
print(f"Error creating export: {e}")
return None
def check_export_status(self, form_uid, export_uid):
"""Check the status of an export job"""
try:
response = requests.get(
f"{self.base_url}/assets/{form_uid}/exports/{export_uid}/",
headers=self.headers
)
response.raise_for_status()
return response.json()
except requests.exceptions.RequestException as e:
print(f"Error checking export status: {e}")
return None
def download_file(self, download_url, filename=None):
"""Download the exported file"""
if not filename:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"kobocollect_export_{timestamp}.xlsx"
try:
response = requests.get(download_url, headers=self.headers)
response.raise_for_status()
with open(filename, 'wb') as f:
f.write(response.content)
return filename
except requests.exceptions.RequestException as e:
print(f"Error downloading file: {e}")
return None
def export_data(self, form_uid, output_dir="./downloads", **kwargs):
"""Complete export process"""
print(f"Starting export for form: {form_uid}")
# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)
# Create export
print("Creating export job...")
export_info = self.create_export(form_uid, **kwargs)
if not export_info:
return False
export_uid = export_info['uid']
print(f"Export job created: {export_uid}")
# Wait for completion
print("⏳ Waiting for export to complete...")
max_attempts = 60 # 5 minutes maximum wait time
attempt = 0
while attempt < max_attempts:
status_info = self.check_export_status(form_uid, export_uid)
if not status_info:
return False
status = status_info['status']
print(f" Status: {status}")
if status == 'complete':
download_url = status_info['result']
break
elif status == 'error':
print("Export failed!")
return False
else:
time.sleep(5)
attempt += 1
if attempt >= max_attempts:
print("⏰ Export timeout!")
return False
# Download file
print("Downloading file...")
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = os.path.join(output_dir, f"export_{form_uid}_{timestamp}.xlsx")
downloaded_file = self.download_file(download_url, filename)
if downloaded_file:
file_size = os.path.getsize(downloaded_file)
print(f"Download complete: {downloaded_file}")
print(f"File size: {file_size:,} bytes")
return True
return False
def main():
parser = argparse.ArgumentParser(description="Export data from KoboCollect")
parser.add_argument("--token", required=True, help="API Token")
parser.add_argument("--form-uid", help="Form UID to export")
parser.add_argument("--list-projects", action="store_true", help="List available projects")
parser.add_argument("--output-dir", default="./downloads", help="Output directory")
parser.add_argument("--format", default="xlsx", choices=["xlsx", "xls", "csv", "json"], help="Export format")
args = parser.parse_args()
# Initialize exporter
exporter = KoboExporter(args.token)
if args.list_projects:
exporter.list_projects()
return
if not args.form_uid:
print("Please provide --form-uid or use --list-projects to see available forms")
return
# Export data
success = exporter.export_data(
args.form_uid,
output_dir=args.output_dir,
export_type=args.format
)
if success:
print("Export completed successfully!")
else:
print("Export failed!")
if __name__ == "__main__":
main()
# Usage examples:
# python kobo_export.py --token YOUR_TOKEN --list-projects
# python kobo_export.py --token YOUR_TOKEN --form-uid YOUR_FORM_UID
# python kobo_export.py --token YOUR_TOKEN --form-uid YOUR_FORM_UID --format csv
Batch export script
#!/usr/bin/env python3
"""
Batch Export Script for Multiple Forms
"""
import json
from kobo_export import KoboExporter
import time
def batch_export(api_token, form_uids, output_dir="./batch_downloads"):
"""Export multiple forms in batch"""
exporter = KoboExporter(api_token)
results = {
"successful": [],
"failed": []
}
print(f"Starting batch export of {len(form_uids)} forms...")
for i, form_uid in enumerate(form_uids, 1):
print(f"\nProcessing form {i}/{len(form_uids)}: {form_uid}")
success = exporter.export_data(form_uid, output_dir=output_dir)
if success:
results["successful"].append(form_uid)
else:
results["failed"].append(form_uid)
# Add delay between exports to avoid rate limiting
if i < len(form_uids):
print("⏳ Waiting 10 seconds before next export...")
time.sleep(10)
# Summary
print("\n" + "="*50)
print("BATCH EXPORT SUMMARY")
print("="*50)
print(f"Successful: {len(results['successful'])}")
print(f"Failed: {len(results['failed'])}")
if results['failed']:
print("\nFailed exports:")
for form_uid in results['failed']:
print(f" - {form_uid}")
return results
# Example usage
if __name__ == "__main__":
API_TOKEN = "YOUR_API_TOKEN"
FORM_UIDS = [
"form_uid_1",
"form_uid_2",
"form_uid_3"
]
batch_export(API_TOKEN, FORM_UIDS)
Troubleshooting
Common issues and their solutions when working with KoboCollect API v2:
Authentication issues
Solutions:
- Verify your API token is correct and active
- Check if the token has expired
- Ensure proper Authorization header format:
Token YOUR_API_TOKEN - Try regenerating your API token
Form access issues
Solutions:
- Verify the form UID is correct
- Check if you have permission to access the form
- Ensure the form is deployed and has submissions
- Contact the form owner for access permissions
Export issues
Solutions:
- Wait longer - large datasets can take 10-30 minutes
- Check server status on KoboToolbox status page
- Try exporting smaller date ranges
- Contact KoboToolbox support if persistent
Data quality issues
Solutions:
- Check export parameters (language, format settings)
- Verify data exists in the web interface
- Try different export formats (CSV vs Excel)
- Check if filters are applied in the export request
Network issues
# Robust error handling example
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def create_robust_session():
session = requests.Session()
# Define retry strategy
retry_strategy = Retry(
total=3,
backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504],
)
# Mount adapter with retry strategy
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("http://", adapter)
session.mount("https://", adapter)
return session
# Usage
session = create_robust_session()
try:
response = session.get(url, headers=headers, timeout=30)
response.raise_for_status()
except requests.exceptions.RequestException as e:
print(f"Request failed: {e}")
Common error codes
| Error Code | Description | Solution |
|---|---|---|
| 400 | Bad Request | Check request parameters and JSON format |
| 401 | Unauthorized | Verify API token and permissions |
| 403 | Forbidden | Check form access permissions |
| 404 | Not Found | Verify form UID and endpoint URL |
| 429 | Rate Limited | Reduce request frequency, add delays |
| 500 | Server Error | Try again later, contact support if persistent |