
CI/CD for Localization: Integrating Translations into Development Pipelines

Localization is too often a separate process from development. Learn how to build CI/CD pipelines that treat translations as first-class citizens alongside code.

January 12, 2024
7 min read
#localization #ci-cd #devops #automation #engineering #tools

The typical localization workflow looks like this: Code is complete. Engineering hands off strings to a localization vendor. Two weeks later, translations come back. Someone manually drops them into the codebase. A build is triggered. Maybe it works. Maybe there are encoding issues. Maybe placeholders don't match.

This is medieval.

Modern development uses CI/CD pipelines to automate testing, builds, and deployments. Those same principles apply to localization—but most teams haven't made the connection.

The Problem: Localization as a Silo

When localization happens outside the development pipeline, critical issues emerge:

  • Integration breaks — Developers don't know strings have changed
  • Format mismatches — Placeholders in translations don't match source code
  • Encoding problems — Files get corrupted during manual transfers
  • Version conflicts — Which translation version matches this code version?
  • No validation — Broken translations aren't caught until production
  • Delayed releases — Waiting for translations blocks the entire release cycle

Meanwhile, your developers have built sophisticated CI/CD systems for everything else. Your code is tested. Your containers are built. Your infrastructure is automated. But strings? Still a manual copy-paste operation.

What CI/CD for Localization Looks Like

A proper CI/CD pipeline for localization integrates four stages:

Stage 1: Source String Extraction (Automated)

When code is committed, the pipeline runs these steps automatically:

# GitHub Actions workflow example
name: Extract Localization Strings

on: [push, pull_request]

jobs:
  extract-strings:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Extract strings from source code
        run: |
          python scripts/extract_strings.py \
            --source src/ \
            --output localization/source.json \
            --format json \
            --validate
      
      - name: Compare with existing strings
        run: |
          python scripts/check_string_changes.py \
            --old localization/source.json.previous \
            --new localization/source.json \
            --report changes.txt
      
      - name: Upload changes for translation
        run: |
          if [ -s changes.txt ]; then
            curl -X POST https://api.localizationvendor.com/push \
              -H "Authorization: Bearer ${{ secrets.VENDOR_API_KEY }}" \
              -F "file=@localization/source.json"
          fi

What happens:

  • Every code commit triggers string extraction
  • New or modified strings are identified automatically
  • Changes are sent directly to your translation vendor's API
  • Translation work can begin immediately — not weeks later
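
The `check_string_changes.py` step in the workflow isn't shown in this post; a minimal sketch of the diff logic it might contain (the function name and report shape are assumptions, not the actual script):

```python
# Hypothetical sketch of check_string_changes.py: diff the freshly
# extracted source strings against the previous extraction and report
# keys that were added or modified.

def diff_source_strings(old: dict, new: dict) -> dict:
    """Return source keys that were added or changed between extractions."""
    return {
        "added": {k: v for k, v in new.items() if k not in old},
        "changed": {k: v for k, v in new.items() if k in old and old[k] != v},
    }

# Only "welcome" (new) and "greeting" (edited) need retranslation:
report = diff_source_strings(
    {"greeting": "Hello", "bye": "Goodbye"},
    {"greeting": "Hello there", "bye": "Goodbye", "welcome": "Welcome"},
)
```

Only the added and changed keys go to the vendor, which keeps translation costs proportional to what actually changed.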

Stage 2: Translation Management (Vendor Integration)

Rather than manual file exchanges, integrate with your translation vendor's API:

# Python script for pulling translations
import requests
import json
from datetime import datetime

def fetch_translations(vendor_api_key):
    """Pull completed translations from vendor"""
    
    headers = {"Authorization": f"Bearer {vendor_api_key}"}
    
    # Get translation status
    status = requests.get(
        "https://api.vendor.com/projects/localization-portal/status",
        headers=headers
    ).json()
    
    # Fetch completed languages only
    translations = {}
    for language in status['languages']:
        if language['completion'] >= 100:
            response = requests.get(
                f"https://api.vendor.com/projects/localization-portal/export/{language['code']}",
                headers=headers
            )
            translations[language['code']] = response.json()
    
    # Save with timestamp; strftime avoids the ':' characters isoformat() emits
    timestamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    with open(f"localization/translations_{timestamp}.json", 'w') as f:
        json.dump(translations, f, ensure_ascii=False, indent=2)
    
    return translations

if __name__ == "__main__":
    fetch_translations("your-vendor-key")

What happens:

  • The CI/CD pipeline periodically checks for completed translations
  • When a language reaches the completion threshold (100% in the script above; some teams lower it for staged releases), it's pulled automatically
  • Translations are stored with version control (every pull is tracked)
  • No manual intervention needed
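
In GitHub Actions, the periodic check can be a scheduled workflow. A hedged sketch, assuming the pull script from Stage 2 is saved as `localization/scripts/vendor-sync.py` (the schedule, bot identity, and commit message are illustrative):

```yaml
# Hypothetical scheduled workflow that polls the vendor API
name: Pull Completed Translations

on:
  schedule:
    - cron: "0 */6 * * *"   # every six hours
  workflow_dispatch:         # allow manual runs

jobs:
  pull-translations:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Pull translations from vendor
        run: python localization/scripts/vendor-sync.py
        env:
          VENDOR_API_KEY: ${{ secrets.VENDOR_API_KEY }}
      - name: Commit pulled translations
        run: |
          git config user.name "localization-bot"
          git config user.email "bot@example.com"
          git add localization/
          git diff --cached --quiet || git commit -m "chore: pull translations"
          git push
```

Committing each pull is what gives you the version-controlled audit trail: every batch of translations is tied to a commit you can inspect or revert.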

Stage 3: Validation (Automated Testing)

This is critical. Before translations ever make it into a build, they're validated:

# Comprehensive translation validation
import json, re
from pathlib import Path

def validate_translations(translation_file, source_file):
    """Validate translations against source and rules"""
    
    with open(translation_file) as f:
        translations = json.load(f)
    
    with open(source_file) as f:
        source = json.load(f)
    
    errors = []
    warnings = []
    
    for key, source_text in source.items():
        # Check if translation exists
        if key not in translations:
            errors.append(f"Missing translation: {key}")
            continue
        
        translation = translations[key]
        
        # Placeholder validation
        source_placeholders = re.findall(r'\{\{(\w+)\}\}', source_text)
        trans_placeholders = re.findall(r'\{\{(\w+)\}\}', translation)
        
        if set(source_placeholders) != set(trans_placeholders):
            errors.append(
                f"Placeholder mismatch in {key}. "
                f"Expected: {source_placeholders}, Got: {trans_placeholders}"
            )
        
        # Length check (catch obvious mistakes)
        if len(translation) > len(source_text) * 1.5:
            warnings.append(f"Translation unusually long: {key}")
        
        # Encoding check
        try:
            translation.encode('utf-8')
        except UnicodeEncodeError:
            errors.append(f"Encoding error in translation: {key}")
        
        # Terminology check
        if check_terminology_consistency(translation, key):
            warnings.append(f"Possible terminology issue: {key}")
    
    return errors, warnings

def check_terminology_consistency(translation, key):
    """Validate against approved terminology.
    Stub: query your terminology database here and
    return True if an issue is found."""
    return False

What happens:

  • Every translation is tested for format, encoding, placeholders, length, and terminology
  • Translations with critical errors are rejected and don't progress further
  • Warnings are flagged for human review
  • Tests prevent broken translations from entering production
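
The placeholder rule is the workhorse of the validator above. Isolated into a self-contained sketch, it behaves like this:

```python
import re

def placeholder_mismatch(source_text: str, translation: str) -> bool:
    """True if the {{placeholder}} sets differ between source and translation."""
    src = set(re.findall(r'\{\{(\w+)\}\}', source_text))
    trg = set(re.findall(r'\{\{(\w+)\}\}', translation))
    return src != trg

# A translation that dropped the {{name}} placeholder is flagged:
print(placeholder_mismatch("Hello, {{name}}!", "¡Hola!"))            # True
print(placeholder_mismatch("Hello, {{name}}!", "¡Hola, {{name}}!"))  # False
```

Because the comparison is on sets, reordered placeholders (common in languages with different word order) pass cleanly; only missing or invented placeholders fail.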

Stage 4: Build and Deployment (With Translations)

Only after validation passes are translations built into the application:

# Build stage that includes translations
name: Build with Localization

on: 
  workflow_run:
    workflows: ["Validate Translations"]
    types: [completed]

jobs:
  build:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v3
      
      - name: Download validated translations
        uses: actions/download-artifact@v3
        with:
          name: validated-translations
          path: src/localization/
      
      - name: Build application
        run: npm run build:all
      
      - name: Test localization
        run: npm run test:localization
      
      - name: Deploy
        run: |
          for language in $(ls src/localization/); do
            npm run deploy:version -- --language "${language%.json}"
          done

What happens:

  • Build only proceeds if all translations passed validation
  • Translations are bundled with the code version they match
  • Each language gets a separate deployment (or deployment variant)
  • Rollback is version-aware — you can always go back to a translation that worked
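
One way to make rollbacks version-aware is to write a small manifest next to the bundled translations. A minimal sketch (the manifest format and function name are assumptions):

```python
import json
from pathlib import Path

def write_translation_manifest(translation_dir: str, commit_sha: str, out_file: str) -> dict:
    """Record which code commit a set of translation files was bundled
    with, so a rollback can restore the matching translations."""
    manifest = {
        "commit": commit_sha,
        "languages": sorted(p.stem for p in Path(translation_dir).glob("*.json")),
    }
    Path(out_file).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

In a GitHub Actions job, `commit_sha` would come from `${{ github.sha }}`, tying each deployed translation bundle to the exact code it was validated against.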

Benefits of CI/CD for Localization

  • Automated extraction — Strings are ready for translation the moment code changes
  • Continuous translation — Translations start immediately, not after the release is locked
  • Validation before production — Broken translations are caught before they reach users
  • Version matching — You know exactly which translations go with which code
  • Audit trail — Every translation change is logged and traceable
  • Reduced release delays — Localization doesn't block releases; it runs in parallel
  • Fewer production issues — Validation catches encoding, placeholder, and format issues early
  • Scaled localization — The same pipeline works whether you support 2 languages or 50

Real-World Implementation Example

Here's a minimal setup for a React/Next.js application:

# Project structure
localization/
  source.json          # Master source strings
  translations/
    es.json           # Spanish
    fr.json           # French
    de.json           # German
  scripts/
    extract.py        # Extract strings from code
    validate.py       # Validate translations
    vendor-sync.py    # Sync with translation vendor API
  ci/
    extract.yml       # GitHub Actions workflow
    validate.yml
    build.yml

Your extraction script reads source code:

# Extract from React components
import re
from pathlib import Path

def extract_from_react(directory):
    strings = {}
    
    for filepath in Path(directory).rglob("*.tsx"):
        content = filepath.read_text()
        
        # Find t() function calls
        matches = re.findall(r"t\(['\"]([^'\"]+)['\"]\)", content)
        for match in matches:
            strings[match] = match
    
    return strings
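
The `t()` regex can be sanity-checked against a sample component:

```python
import re

# Sample .tsx content with two t() calls in different quote styles
sample = """
export function Header() {
  return <h1>{t('header.title')}</h1>;
}
const label = t("buttons.save");
"""

matches = re.findall(r"t\(['\"]([^'\"]+)['\"]\)", sample)
print(matches)  # ['header.title', 'buttons.save']
```

Note that this simple pattern misses dynamic keys like `t(variable)`; production extractors usually parse the AST instead, but the regex covers the common literal-key case.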

Your validation script ensures quality:

# Validate before build
errors, warnings = validate_translations("localization/es.json", "localization/source.json")

if errors:
    print(f"Validation failed: {len(errors)} errors")
    exit(1)

if warnings:
    print(f"{len(warnings)} warnings flagged for human review")

print("All translations valid")
exit(0)

Your GitHub Actions workflow ties it together:

on: [push, pull_request]

jobs:
  extract-and-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: python localization/scripts/extract.py
      - run: python localization/scripts/validate.py
      - run: npm run build

Common Objections (And Solutions)

"Our translations aren't done until release day" → CI/CD enables continuous translation. Translations can start immediately when code changes are committed, not when releases are locked.

"We can't validate translations programmatically" → Start simple: format, encoding, placeholder matching. Add terminology checking. You don't need perfect validation to catch 80% of issues.

"Our vendor doesn't have an API" → Set up SFTP/file polling as a fallback. Or use a translation management platform (TMS) that does have APIs.

"We only localize for specific markets at specific times" → CI/CD pipelines are flexible. You can filter by market, language, or region. Only validated translations for your target markets move forward.

The Shift in Mindset

Traditional localization: Strings → Localization Vendor → Manual Integration → Pray it works

CI/CD localization: Code Changes → Automatic Extraction → Continuous Translation → Validation → Automated Build

The second approach treats localization as a technical discipline, not a manual process. It's the same shift that happened in development when teams moved from nightly builds and manual testing to CI/CD and automated testing.

Your development pipeline should be continuously building, testing, and validating. Your localization pipeline should be doing the same — validating translations, catching errors, and ensuring quality at every stage.

If you're still manually managing translation files, you're leaving massive gains on the table: faster releases, fewer bugs, better quality, and a team that isn't doing repetitive manual work.

The tools exist. The techniques are proven. The question is: are you ready to apply development's best practices to localization?