VOT Data Analysis User Guide

1. 📊 Overview

This guide will help you understand and use the VOT (Victim of Trafficking) data analysis script. The script is designed to:

Clean and standardize trafficking victim screening data
Analyze missing values to identify data quality issues
Transform data from numeric codes to human-readable labels
Export clean datasets for further analysis

This script processes sensitive data about trafficking victims. Always ensure proper data protection and privacy protocols are followed.

2. 🔧 Prerequisites

Required Software

Python 3.7+ - Programming language
Jupyter Notebook - Interactive development environment
Excel - For viewing output files

Required Python Libraries

pip install pandas numpy seaborn matplotlib plotly openpyxl xlsxwriter scikit-learn

Data Requirements

VOT screening data in Excel format (.xlsx)
Data should contain the standard VOT survey fields
Proper file permissions to read input and write output files

3. ⚙️ Initial Setup

Create Folder Structure

The script expects a specific folder structure:

_VoT/
├── data/          # Input data files
├── config/        # Configuration files
├── output/        # Processed data files
└── html/          # Interactive visualizations
                

Configure File Paths

Set up your working directories:

# Define working folders
data_dir = '_VoT/data'       # Input folder for raw data
cfg_dir  = '_VoT/config'     # Configuration files
out_dir  = '_VoT/output'     # Output folder for processed data
html_dir = '_VoT/html'       # Interactive visualizations

# Set country and month for your analysis
which_country = 'MMR'        # Country code
which_month = 'June'         # Month of data collection
                

Import Required Libraries

import pandas as pd
import numpy as np
import seaborn as sbn
import matplotlib.pyplot as plt
import os
from pathlib import Path
                

4. 📥 Data Loading

Loading Your Dataset

The script loads data from an Excel file and performs initial preprocessing:

# Load the Excel file
in_df = pd.read_excel(infile)

# Convert column names to lowercase for consistency
in_df.columns = [col.lower() for col in in_df.columns]

# Display basic information about the dataset
print('Number of data points = %d' % in_df.shape[0])
        

Make sure your Excel file is named correctly and placed in the data folder. The file should be named: Raw_VoT_[Month]_screening_Analysis.xlsx

What This Step Does

Reads Excel data into a pandas DataFrame
Standardizes column names by converting to lowercase
Reports dataset size for verification

5. 🧹 Data Cleaning

Column Renaming

The script uses a comprehensive dictionary to rename columns to more manageable names:

# Example of column renaming dictionary
dictionary = {
    'survery start time': 'survey_start_time',
    'full name of the interviewer/organisation': 'interviewer_organization',
    'type of referring entity': 'referring_entity_type',
    # ... many more mappings
}

# Apply the renaming
in_df.rename(columns=dictionary, inplace=True)
        

Removing Unnecessary Columns

System metadata and administrative columns are removed:

# Drop system columns that aren't needed for analysis
columns_to_drop = [
    'survey_start_time', 'survey_end_time', 'record_id', 
    'record_uuid', 'submission_time', 'validation_status',
    'system_notes', 'record_status', 'submitted_by',
    'form_version', 'record_tags', 'record_index'
]

vot_df = in_df.drop(columns=columns_to_drop, errors='ignore')
        

The errors='ignore' parameter ensures the script continues even if some columns don't exist in your dataset.

6. 🔍 Missing Values Analysis

Why Analyze Missing Values?

Missing values in VOT data can indicate:

Sensitive information victims are reluctant to share
Data collection gaps that need attention
Fields that may not be applicable to all cases

Creating Missing Values Report

def missing_zero_values_table(df):
    # Calculate missing values and percentages
    zero_val = (df == 0.00).astype(int).sum(axis=0)
    mis_val = df.isnull().sum()
    mis_val_percent = 100 * df.isnull().sum() / len(df)
    
    # Create comprehensive table
    mz_table = pd.concat([zero_val, mis_val, mis_val_percent], axis=1)
    mz_table = mz_table.rename(columns={
        0: 'Zero Values', 
        1: 'Missing Values', 
        2: '% of Total Values'
    })
    
    return mz_table

# Generate the missing values report
missing_table = missing_zero_values_table(results)
        

Understanding the Output

Column	Description
Zero Values	Count of fields with value 0
Missing Values	Count of completely empty fields
% of Total Values	Percentage of missing data
Data Type	Type of data (text, number, etc.)

7. 🗺️ Data Mapping

Converting Codes to Labels

VOT survey data often uses numeric codes (0/1) that need to be converted to meaningful labels (No/Yes):

def apply_individual_mappings_notnull(df):
    # List of columns that need Yes/No mapping
    yes_no_columns = [
        'referring_entity_ngo',
        'referring_entity_embassy',
        'physical_abuse',
        'psychological_abuse',
        # ... many more fields
    ]
    
    # Mapping dictionary
    yes_no_mapping = {
        '1': 'Yes', '0': 'No',
        '1.0': 'Yes', '0.0': 'No',
        'yes': 'Yes', 'no': 'No'
    }
    
    # Apply mappings while preserving null values
    for column in yes_no_columns:
        if column in df.columns:
            mask = df[column].notna()
            df.loc[mask, column] = df.loc[mask, column].map(yes_no_mapping)
    
    return df

# Apply the mappings
df_mapped = apply_individual_mappings_notnull(in_df)
        

What Gets Mapped

Consent fields - Data sharing permissions
Referring entity types - How victim was referred
Abuse indicators - Types of exploitation experienced
Activity fields - What work victims were forced to do
Support fields - Willingness to accept assistance

The mapping process preserves missing/null values, ensuring data integrity while making the dataset more readable.

8. 💾 Exporting Results

Multiple Output Files

The script generates several output files for different purposes:

# Export missing values analysis
with pd.ExcelWriter("missing_values.xlsx", engine='openpyxl') as writer:  
    missing_table.to_excel(writer, sheet_name='missing_values', index=True)
    missing_data.to_excel(writer, sheet_name='mv_percentage', index=True)

# Export clean dataset with all data
with pd.ExcelWriter(outfile, engine='openpyxl') as dataset:  
    df_mapped.to_excel(dataset, sheet_name='vot_cleaned_data', 
                      header=True, index=True, index_label='index')

# Export dataset with mapped values (nulls preserved)
with pd.ExcelWriter(outfileNotNull, engine='openpyxl') as dataset1:  
    df.to_excel(dataset1, sheet_name='vot_cleaned_data', 
               header=True, index=True, index_label='index')
        

Output Files Explained

File	Purpose	Content
missing_values.xlsx	Data quality assessment	Missing values analysis and percentages
Cleaned_VoT_[Month]_all.xlsx	Complete clean dataset	All data with readable labels
Cleaned_VoT_[Month]_NotNull_Mapped.xlsx	Mapped dataset	Data with Yes/No labels, nulls preserved

9. 🔧 Troubleshooting

Common Issues and Solutions

File Not Found Error:
• Check that your Excel file is in the correct folder
• Verify the filename matches the expected pattern
• Ensure you have read permissions for the file

Excel Export Errors:
• Install openpyxl: pip install openpyxl
• Close any open Excel files with the same name
• Check write permissions in the output folder

Column Not Found Errors:
• Your dataset may have different column names
• Check the dictionary mapping matches your data
• Some columns may not exist in all datasets

Debugging Tips

# Check your dataset structure
print("Dataset shape:", in_df.shape)
print("Column names:", in_df.columns.tolist())

# Check for missing columns
expected_columns = ['consent_data_sharing', 'victim_of_trafficking']
missing_cols = [col for col in expected_columns if col not in in_df.columns]
print("Missing columns:", missing_cols)

# Check data types
print("Data types:")
print(in_df.dtypes)
        

10. 📚 Additional Resources

🔗 Helpful Links

Python & Pandas Learning:

Pandas User Guide - Official documentation
Kaggle Pandas Course - Free interactive course
Jupyter Notebook Documentation - Getting started with Jupyter

Data Analysis Resources:

Python Installation - Download Python
OpenPyXL Documentation - Excel file handling
Matplotlib Tutorials - Data visualization

Data Privacy & Ethics:

UNODC Ethics Guidelines - Data ethics in trafficking research
GDPR Compliance - Data protection regulations

Best Practices

Data Security: Always work with anonymized data when possible
Version Control: Keep backup copies of your original data
Documentation: Document any changes or assumptions you make
Validation: Cross-check results with domain experts
Regular Updates: Keep your Python libraries updated

Remember: This script is a tool to help process and understand VOT data. Always validate results with subject matter experts and follow your organization's data handling protocols.

📞 Support

If you encounter issues not covered in this guide:

Review the error messages carefully - they often contain helpful information
Check that all required libraries are installed
Verify your data format matches the expected structure
Consider reaching out to your technical support team

This guide covers the basic usage of the VOT data analysis script. As you become more comfortable with the tool, you can explore advanced features and customizations.