Architecture Documentation
Overview
mypy-boto3-builder is a code generator that creates type annotations for AWS SDK libraries (boto3, aioboto3, aiobotocore). It introspects AWS service definitions and generates Python type stubs, enabling static type checking for AWS SDK usage.
High-Level Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   AWS Service   │    │     Parser      │    │    Generator    │
│   Definitions   │───▶│    (Extract)    │───▶│   (Transform)   │
│   (botocore)    │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Generated    │    │    Templates    │    │   Structures    │
│   Type Stubs    │◀───│    (Jinja2)     │◀───│  (Data Model)   │
│  (.pyi files)   │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Core Components
1. Parsers (/parsers/)
Purpose: Extract and interpret AWS service definitions from botocore
Key Classes:
- ShapeParser - Converts botocore shapes to internal type representations
- ClientParser - Extracts client methods and their signatures
- ResourceParser - Handles boto3 resource definitions
- ServicePackageParser - Orchestrates parsing for a complete service
Flow:
# Example parsing flow (simplified)
import botocore.session
service_model = botocore.session.get_session().get_service_model('s3')
parser = ServicePackageParser(service_model)
service_package = parser.parse() # Returns ServicePackage structure
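The ShapeParser step can be illustrated with a minimal sketch that maps botocore shape types to Python annotation strings. It assumes shapes are plain dicts in the layout of the `shapes` section of a botocore service model (`type`, `member`, `key`, `value`). The function name `shape_to_annotation`, the primitive mapping table, and the `TypeDef` suffix for structures are illustrative assumptions, not the builder's actual API.

```python
from typing import Any

# Hypothetical mapping from botocore primitive shape types to Python annotations.
# The builder's real mapping (see type_maps/) may differ.
PRIMITIVE_TYPES: dict[str, str] = {
    "string": "str",
    "integer": "int",
    "long": "int",
    "boolean": "bool",
    "float": "float",
    "double": "float",
    "timestamp": "datetime",
    "blob": "bytes",
}


def shape_to_annotation(shape_name: str, shapes: dict[str, dict[str, Any]]) -> str:
    """Recursively convert a named shape into a Python annotation string."""
    shape = shapes[shape_name]
    shape_type = shape["type"]
    if shape_type in PRIMITIVE_TYPES:
        return PRIMITIVE_TYPES[shape_type]
    if shape_type == "list":
        member = shape_to_annotation(shape["member"]["shape"], shapes)
        return f"list[{member}]"
    if shape_type == "map":
        key = shape_to_annotation(shape["key"]["shape"], shapes)
        value = shape_to_annotation(shape["value"]["shape"], shapes)
        return f"dict[{key}, {value}]"
    if shape_type == "structure":
        # Structures are referenced by a TypedDict name (the suffix is an assumption).
        return f"{shape_name}TypeDef"
    raise ValueError(f"Unsupported shape type: {shape_type}")


shapes = {
    "TagKey": {"type": "string"},
    "Tag": {"type": "structure", "members": {"Key": {"shape": "TagKey"}}},
    "TagList": {"type": "list", "member": {"shape": "Tag"}},
    "TagMap": {"type": "map", "key": {"shape": "TagKey"}, "value": {"shape": "TagKey"}},
}
print(shape_to_annotation("TagList", shapes))  # list[TagTypeDef]
print(shape_to_annotation("TagMap", shapes))   # dict[str, str]
```

Structures are returned by name rather than expanded in place. This also prevents self-referencing shapes from recursing forever.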
2. Structures (/structures/)
Purpose: Internal data model representing AWS services and their components
Key Classes:
- ServicePackage - Complete service definition (client, resources, paginators, etc.)
- Client - Service client with methods and attributes
- Method - Individual service operation with arguments and return types
- TypeAnnotation - Type system for generating Python type hints
Design Pattern: Data Transfer Objects (DTOs) with behavior
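"DTOs with behavior" can be read this way: the structures mainly hold parsed data, but they also carry small helpers that templates call. Below is a minimal sketch with simplified stand-ins for Method, Client, and ServicePackage. The fields, the Argument class, and the get_signature and method_names helpers are hypothetical and much smaller than the real structures.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    name: str
    annotation: str
    required: bool = True


@dataclass
class Method:
    name: str
    arguments: list[Argument] = field(default_factory=list)
    return_type: str = "None"

    def get_signature(self) -> str:
        """Render a stub signature with keyword-only arguments."""
        parts = ["self"]
        if self.arguments:
            parts.append("*")  # boto3 client methods accept keyword arguments only
        for arg in self.arguments:
            default = "" if arg.required else " = ..."
            parts.append(f"{arg.name}: {arg.annotation}{default}")
        return f"def {self.name}({', '.join(parts)}) -> {self.return_type}: ..."


@dataclass
class Client:
    name: str
    methods: list[Method] = field(default_factory=list)


@dataclass
class ServicePackage:
    service_name: str
    client: Client

    def method_names(self) -> list[str]:
        """Sorted method names for templates to iterate over."""
        return sorted(method.name for method in self.client.methods)


get_object = Method(
    "get_object",
    [Argument("Bucket", "str"), Argument("Key", "str"), Argument("VersionId", "str", required=False)],
    "GetObjectOutputTypeDef",
)
package = ServicePackage("s3", Client("S3Client", [get_object, Method("list_buckets")]))
print(get_object.get_signature())
# def get_object(self, *, Bucket: str, Key: str, VersionId: str = ...) -> GetObjectOutputTypeDef: ...
```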
3. Generators (/generators/)
Purpose: Convert internal structures to target library formats
Key Classes:
- BaseGenerator - Common generation logic and template management
- TypesBoto3Generator - Generates types-boto3 packages
- Boto3Generator - Generates the obsolete boto3-stubs packages
- AioBoto3Generator - Generates types-aioboto3 packages
- AioBotocoreGenerator - Generates types-aiobotocore packages
Strategy Pattern: Each generator implements the same interface but produces different outputs
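The arrangement can be sketched as follows. The caller treats generators as interchangeable strategies behind one interface. Each subclass supplies only the details that differ, while the shared steps live in the base class. The class names match the list above, but the hook methods (package_prefix, client_base) and the returned dict are hypothetical simplifications; the real generators do far more.

```python
from abc import ABC, abstractmethod


class BaseGenerator(ABC):
    """Shared generation steps; subclasses fill in product details."""

    @abstractmethod
    def package_prefix(self) -> str:
        """Prefix for per-service package names."""

    @abstractmethod
    def client_base(self) -> str:
        """Dotted path of the class that generated client stubs inherit from."""

    def generate(self, service_name: str) -> dict[str, str]:
        # The same steps run for every product; only the hooks differ.
        return {
            "package": f"{self.package_prefix()}-{service_name}",
            "client_base": self.client_base(),
        }


class TypesBoto3Generator(BaseGenerator):
    def package_prefix(self) -> str:
        return "types-boto3"

    def client_base(self) -> str:
        return "botocore.client.BaseClient"


class AioBotocoreGenerator(BaseGenerator):
    def package_prefix(self) -> str:
        return "types-aiobotocore"

    def client_base(self) -> str:
        return "aiobotocore.client.AioBaseClient"


def build_all(generator: BaseGenerator, services: list[str]) -> list[str]:
    # The caller only knows the BaseGenerator interface, so any strategy fits.
    return [generator.generate(name)["package"] for name in services]


print(build_all(TypesBoto3Generator(), ["s3", "ec2"]))  # ['types-boto3-s3', 'types-boto3-ec2']
print(build_all(AioBotocoreGenerator(), ["s3"]))        # ['types-aiobotocore-s3']
```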
4. Templates (/templates/)
Purpose: Jinja2 templates for generating Python code
Organization:
templates/
├── common/ # Shared templates
│ ├── service/ # Individual service templates
│ └── wrapper/ # Multi-service package templates
├── boto3/ # boto3-specific templates
├── types-boto3/ # types-boto3 stub templates
└── ...
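One plausible way to combine the shared common/ templates with product-specific directories is a lookup that prefers the product directory and falls back to common/. This is an assumption about how such a layout might be consumed, not a description of the builder's loader. The function name and the template file names are hypothetical. With Jinja2 itself, a ChoiceLoader over two FileSystemLoader instances gives similar first-match-wins behavior.

```python
import tempfile
from pathlib import Path


def resolve_template(templates_root: Path, product: str, relative: str) -> Path:
    """Prefer templates/<product>/<relative>, else fall back to templates/common/."""
    for directory in (templates_root / product, templates_root / "common"):
        candidate = directory / relative
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No template {relative!r} for product {product!r}")


# Build a throwaway templates/ tree to exercise the lookup.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for directory in ("common/service", "types-boto3/service"):
        (root / directory).mkdir(parents=True)
    (root / "common/service/client.pyi.jinja2").write_text("shared client stub")
    (root / "common/service/README.md.jinja2").write_text("shared readme")
    (root / "types-boto3/service/README.md.jinja2").write_text("types-boto3 readme")

    readme = resolve_template(root, "types-boto3", "service/README.md.jinja2").read_text()
    client = resolve_template(root, "types-boto3", "service/client.pyi.jinja2").read_text()

print(readme)  # types-boto3 readme
print(client)  # shared client stub
```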
5. Type Annotations (/type_annotations/)
Purpose: Abstract type system for generating Python type hints
Key Classes:
- TypeAnnotation - Base type representation
- TypeSubscript - Generic types (List[str], Dict[str, Any])
- TypeUnion - Union types (str | int)
- TypeTypedDict - Structured dictionary types
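Below is a minimal sketch of such a type system. Each annotation object knows how to render itself as source text, and composite types render their children recursively. The class names follow the list above. TypeName (a plain named type) and the render/render_definition methods are assumptions made for this illustration.

```python
from dataclasses import dataclass


class TypeAnnotation:
    """Base class: every annotation renders itself as Python source text."""

    def render(self) -> str:
        raise NotImplementedError


@dataclass(frozen=True)
class TypeName(TypeAnnotation):
    name: str

    def render(self) -> str:
        return self.name


@dataclass(frozen=True)
class TypeSubscript(TypeAnnotation):
    parent: TypeAnnotation
    children: tuple[TypeAnnotation, ...]

    def render(self) -> str:
        inner = ", ".join(child.render() for child in self.children)
        return f"{self.parent.render()}[{inner}]"


@dataclass(frozen=True)
class TypeUnion(TypeAnnotation):
    children: tuple[TypeAnnotation, ...]

    def render(self) -> str:
        return " | ".join(child.render() for child in self.children)


@dataclass(frozen=True)
class TypeTypedDict(TypeAnnotation):
    name: str
    fields: tuple[tuple[str, TypeAnnotation], ...]

    def render(self) -> str:
        # In an annotation position a TypedDict is referenced by name.
        return self.name

    def render_definition(self) -> str:
        lines = [f"class {self.name}(TypedDict):"]
        lines += [f"    {key}: {value.render()}" for key, value in self.fields]
        return "\n".join(lines)


str_type = TypeName("str")
tags = TypeSubscript(TypeName("dict"), (str_type, str_type))
bucket = TypeTypedDict("BucketTypeDef", (("Name", str_type), ("Tags", tags)))
print(TypeUnion((str_type, TypeName("int"))).render())  # str | int
print(bucket.render_definition())
```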
Data Flow
1. Service Discovery
# Discover available AWS services
service_names = get_available_service_names()
# Returns: ['s3', 'ec2', 'lambda', ...]
2. Parsing Phase
# For each service:
service_model = session.get_service_model(service_name)
parser = ServicePackageParser(service_model, service_name)
service_package = parser.parse()
3. Generation Phase
# Transform to target format:
generator = Boto3Generator()
package = generator.generate_service_package(service_package)
4. Template Rendering
# Render to Python code:
writer = PackageWriter(output_path)
writer.write_service_package(package, templates_path)
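The toy pipeline below shows how the four phases chain together. Every function body is a stand-in: no botocore or Jinja2 is involved. The names discover_services, parse, generate, render, and build are hypothetical. The point is only the order of hand-offs: service names, then parsed structures, then a generation plan, then rendered stub text.

```python
from dataclasses import dataclass


@dataclass
class ServicePackage:
    """Stand-in for the parsed data model."""
    service_name: str
    methods: list[str]


def discover_services() -> list[str]:
    # 1. Service discovery (the builder asks botocore; hard-coded here)
    return ["s3", "sqs"]


def parse(service_name: str) -> ServicePackage:
    # 2. Parsing (a real parser reads the botocore service model)
    return ServicePackage(service_name, ["close"])


def generate(package: ServicePackage, product: str) -> dict[str, str]:
    # 3. Generation: product-specific decisions such as naming and file layout
    return {"package_name": f"{product}-{package.service_name}", "module": "client.pyi"}


def render(package: ServicePackage) -> str:
    # 4. Rendering (Jinja2 in the builder; plain f-strings here)
    lines = [f"class {package.service_name.upper()}Client:"]
    lines += [f"    def {name}(self) -> None: ..." for name in package.methods]
    return "\n".join(lines) + "\n"


def build(product: str = "types-boto3") -> dict[str, str]:
    """Return {relative output path: stub text} instead of writing files."""
    outputs: dict[str, str] = {}
    for service_name in discover_services():
        package = parse(service_name)
        plan = generate(package, product)
        outputs[f"{plan['package_name']}/{plan['module']}"] = render(package)
    return outputs


outputs = build()
print(sorted(outputs))  # ['types-boto3-s3/client.pyi', 'types-boto3-sqs/client.pyi']
```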
Configuration & Extensibility
Adding New AWS Services
- Service is automatically discovered via botocore
- Parser extracts service definition
- Generator applies library-specific transformations
- Templates render the final Python code
Adding New Target Libraries
- Create a new generator inheriting from BaseGenerator
- Implement library-specific logic
- Create corresponding Jinja2 templates
- Register in CLI options
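A registry could back the "register in CLI options" step, so that defining a generator subclass automatically makes its product name a valid CLI choice. The register decorator, the GENERATORS dict, the product attribute, and the types-myawslib product are all hypothetical. The builder may wire generators to the CLI differently.

```python
from collections.abc import Callable


class BaseGenerator:
    product = "base"

    def generate(self, service_name: str) -> str:
        # Shared logic every registered generator inherits.
        return f"{self.product}-{service_name}"


GENERATORS: dict[str, type[BaseGenerator]] = {}


def register(product: str) -> Callable[[type[BaseGenerator]], type[BaseGenerator]]:
    """Class decorator that records a generator under its product name."""

    def decorator(cls: type[BaseGenerator]) -> type[BaseGenerator]:
        if product in GENERATORS:
            raise ValueError(f"product {product!r} is already registered")
        cls.product = product
        GENERATORS[product] = cls
        return cls

    return decorator


@register("types-boto3")
class TypesBoto3Generator(BaseGenerator):
    pass


@register("types-myawslib")  # hypothetical new target library
class MyAwsLibGenerator(BaseGenerator):
    pass


def product_choices() -> list[str]:
    """Values a CLI could pass as argparse choices for --product."""
    return sorted(GENERATORS)


print(product_choices())  # ['types-boto3', 'types-myawslib']
```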
Customizing Type Generation
Type mappings are configured in /type_maps/:
- shape_type_map.py - Maps AWS shapes to Python types
- method_type_map.py - Override method signatures
- literal_type_map.py - Define literal value types
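An override map of this kind can be modeled as a lookup from a (service, method, argument) key to an annotation, with wildcards so that one entry covers many services. The sketch below is a hypothetical layout for something like method_type_map.py, not its actual contents. The keys, the "*" wildcard, and the most-specific-first precedence are assumptions.

```python
from __future__ import annotations

# Hypothetical override table: (service, method, argument) -> annotation.
# "*" matches any service or method.
METHOD_TYPE_MAP: dict[tuple[str, str, str], str] = {
    ("*", "*", "Tags"): "Sequence[TagTypeDef]",
    ("s3", "put_object", "Body"): "BlobTypeDef",
}


def lookup_override(service: str, method: str, argument: str) -> str | None:
    """Return the most specific override for an argument, or None."""
    candidates = (
        (service, method, argument),
        (service, "*", argument),
        ("*", method, argument),
        ("*", "*", argument),
    )
    for key in candidates:
        if key in METHOD_TYPE_MAP:
            return METHOD_TYPE_MAP[key]
    return None  # no override: keep the type derived from the shape


print(lookup_override("s3", "put_object", "Body"))      # BlobTypeDef
print(lookup_override("ec2", "create_tags", "Tags"))    # Sequence[TagTypeDef]
print(lookup_override("ec2", "create_tags", "DryRun"))  # None
```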
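Performance Considerations
Template Caching
Templates are cached per process so that each Jinja2 template is loaded and compiled only once:
from functools import lru_cache
from pathlib import Path
from jinja2 import Template  # jinja_env is the module's shared jinja2.Environment
@lru_cache(maxsize=None)
def get_template(template_path: Path) -> Template:
    return jinja_env.get_template(template_path.as_posix())  # Jinja2 template names use "/"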
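Parallel Processing
Products, and the services within them, can be generated independently of one another:
from concurrent.futures import ThreadPoolExecutor
# Current: sequential
for product in products:
    generate_product(product)
# Potential: parallel
with ThreadPoolExecutor() as executor:
    futures = [executor.submit(generate_product, p) for p in products]
    for future in futures:
        future.result()  # re-raise any exception from a worker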
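If one failing product should not abort the others, the parallel loop can collect failures per product and report them at the end. In this sketch, generate_product is a stand-in that fails for one input, and generate_all is a hypothetical helper. Generation time is said to be dominated by template rendering, which is CPU-bound Python code, so threads may give limited speedup under the GIL. ProcessPoolExecutor is the usual alternative when the work and its arguments are picklable.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def generate_product(product: str) -> str:
    """Stand-in for real per-product generation; fails for one input."""
    if product == "broken-product":
        raise RuntimeError("template rendering failed")
    return f"{product}: ok"


def generate_all(products: list[str], max_workers: int = 4) -> tuple[list[str], dict[str, str]]:
    """Run every product, collecting results and per-product failures."""
    results: list[str] = []
    failures: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_product = {executor.submit(generate_product, p): p for p in products}
        for future in as_completed(future_to_product):
            product = future_to_product[future]
            try:
                results.append(future.result())
            except Exception as exc:  # keep going; report failures at the end
                failures[product] = str(exc)
    return sorted(results), failures


results, failures = generate_all(["types-boto3", "broken-product", "types-aiobotocore"])
print(results)   # ['types-aiobotocore: ok', 'types-boto3: ok']
print(failures)  # {'broken-product': 'template rendering failed'}
```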
Memory Management
Large service definitions are processed incrementally to manage memory usage.
Testing Strategy
Unit Tests
- Parsers: Test shape conversion and type mapping
- Generators: Test output format correctness
- Structures: Test data model consistency
Integration Tests
- End-to-end: Generate complete packages and verify compilability
- Type checking: Ensure generated stubs pass mypy/pyright validation
- Runtime: Test generated code works with actual AWS services
Test Data
- Mock AWS service definitions for consistent testing
- Real service samples for integration validation
Error Handling
Parsing Errors
- Graceful degradation when service definitions are incomplete
- Detailed error messages with service/shape context
- Fallback to basic types when complex shapes fail
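The parsing fallback described above can be sketched as a strict converter wrapped by a lenient one. The lenient wrapper logs the service and shape name and degrades to Any. The function names, the small primitive table, and the choice of Any as the fallback annotation are assumptions for illustration.

```python
import logging

logger = logging.getLogger("shape_fallback_sketch")

PRIMITIVES = {"string": "str", "integer": "int", "boolean": "bool"}


def convert_shape(shape: dict[str, str]) -> str:
    """Strict conversion: raises on anything it does not understand."""
    shape_type = shape["type"]  # KeyError if the definition is incomplete
    if shape_type in PRIMITIVES:
        return PRIMITIVES[shape_type]
    raise ValueError(f"unsupported shape type {shape_type!r}")


def safe_convert(service: str, shape_name: str, shape: dict[str, str]) -> str:
    """Degrade to Any and log service/shape context instead of aborting."""
    try:
        return convert_shape(shape)
    except (KeyError, ValueError) as exc:
        logger.warning("%s.%s: %r; falling back to Any", service, shape_name, exc)
        return "Any"


print(safe_convert("s3", "BucketName", {"type": "string"}))  # str
print(safe_convert("s3", "Incomplete", {}))                  # Any (plus a logged warning)
```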
Generation Errors
- Template rendering failures with clear error context
- Type resolution errors with suggested fixes
- Validation of generated code before output
CLI Architecture
Command Structure
mypy-boto3-builder
├── --product <product_name> --services <AWS SDK services> # Generate type stubs
└── chat # Interactive configuration
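The command tree above could be expressed with argparse roughly as follows. The sketch mirrors only the two entry points shown: a --product/--services build and a chat subcommand. The builder's actual flags, short options, positional arguments, and defaults are not reproduced here and may differ.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="mypy-boto3-builder")
    parser.add_argument("--product", help="Product to generate, e.g. types-boto3")
    parser.add_argument("--services", nargs="+", metavar="SERVICE", help="AWS service names")
    subparsers = parser.add_subparsers(dest="command")  # optional subcommand
    subparsers.add_parser("chat", help="Interactive configuration")
    return parser.parse_args(argv)


build_args = parse_args(["--product", "types-boto3", "--services", "s3", "ec2"])
chat_args = parse_args(["chat"])
print(build_args.product, build_args.services, build_args.command)
# types-boto3 ['s3', 'ec2'] None
```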
Interactive Mode
Uses questionary for complex configuration:
- Service selection
- Output format choice
- Custom type mappings
Future Considerations
Potential Improvements
- Async processing for faster generation
- Incremental builds to avoid regenerating unchanged services
- Caching layer for parsed service definitions
- Botocore support for usage without boto3
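Incremental builds could key each service on a digest of its service definition and regenerate only the services whose digest changed. The helpers below are a hypothetical sketch. In practice the cache key would also need to cover the builder version and the templates, because changing either one changes the output even when the service definition does not.

```python
import hashlib
import json


def definition_digest(service_model: dict) -> str:
    """SHA-256 of the model serialized as canonical JSON (sorted keys)."""
    payload = json.dumps(service_model, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def services_to_rebuild(models: dict[str, dict], previous: dict[str, str]) -> list[str]:
    """Names whose digest differs from the previous build (or that are new)."""
    return sorted(
        name for name, model in models.items() if previous.get(name) != definition_digest(model)
    )


models = {"s3": {"version": "2.0", "operations": {}}, "sqs": {"version": "2.0"}}
# Digest recorded by an earlier build; key order differs but the hash matches.
previous = {"s3": definition_digest({"operations": {}, "version": "2.0"})}
print(services_to_rebuild(models, previous))  # ['sqs']
```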
Scalability
- Current architecture handles 400+ AWS services
- Memory usage scales linearly with service count
- Generation time is dominated by template rendering
Contributing Guidelines
Adding Features
- Start with unit tests for new functionality
- Update relevant parsers/generators/structures
- Add templates if new output format is needed
- Update integration tests
- Document changes in architecture docs
Code Style
- Follow existing patterns and naming conventions
- Use type hints for all public APIs
- Add docstrings for complex logic
- Keep functions focused and testable
This architecture supports generating type annotations for 400+ AWS services across multiple Python AWS SDK libraries, enabling static type checking for millions of lines of AWS SDK usage.