Contributing to frame-check¶
Welcome! This section covers how to extend and contribute to frame-check.
Architecture Overview¶
The frame-check-core package is built around these key components:
frame-check-core/
├── checker.py          # Main AST visitor (entry point)
├── tracker.py          # Column dependency tracking
├── refs.py             # Type guards and ColumnRef dataclass
├── handlers/           # Operation handlers (what columns are CREATED/MODIFIED)
│   ├── models.py       # PD/DF registries for operation handlers
│   ├── pandas.py       # pd.* function handlers
│   └── dataframe.py    # df.* method handlers
├── extractors/         # Column extractors (what columns are READ)
│   ├── registry.py     # Extractor registry
│   ├── column.py       # df['col'] patterns
│   └── binop.py        # df['A'] + df['B'] patterns
├── diagnostic/         # Error message generation
└── config/             # Configuration management
Handlers vs Extractors¶
These two modules serve complementary purposes:
| Module | Purpose | What they handle | Example |
|---|---|---|---|
| Handlers | Track column state changes | Pandas API calls that create/modify/delete columns | pd.DataFrame({'A': [1]}) creates 'A' |
| Extractors | Identify column references | Operations between columns and expressions | df['A'] + df['B'] reads 'A' and 'B' |
Handlers (Pandas core methods) answer: "What columns exist after this operation?"
- Creating DataFrames: pd.DataFrame(), pd.read_csv()
- Adding columns: df.assign(B=1), df.insert()
- Removing columns: df.drop('A')
- Resizing/reshaping frames: df.melt(), df.pivot()
Extractors (Operations between columns) answer: "What columns are being accessed here?"
- Column access: df['A'], df[['A', 'B']]
- Binary operations: df['A'] + df['B'], df['A'] * 2
- Comparisons: df['A'] > df['B']
- Assignments: df['C'] = df['A'] + df['B'] (RHS is extracted)
How they work together:
The checker uses handlers to build a model of what columns should exist, then uses extractors to validate that accessed columns are actually present. For example, in df['C'] = df['A'] + df['B'], the extractor identifies that columns 'A' and 'B' are being read, while the handler knows that column 'C' is being created.
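To make the division of labour concrete, here is a minimal, self-contained sketch built on Python's standard `ast` module. It is not frame-check's implementation: the function names `created_columns`, `read_columns`, and `missing_columns` are hypothetical, and the sketch only understands `pd.DataFrame({...})` with a dict literal and `df['col']` with a string literal (Python 3.9+).

```python
import ast


def read_columns(node: ast.AST) -> list[str]:
    """Extractor role (hypothetical): column names read via x['col'] inside `node`."""
    return [
        sub.slice.value
        for sub in ast.walk(node)
        if isinstance(sub, ast.Subscript)
        and isinstance(sub.ctx, ast.Load)
        and isinstance(sub.slice, ast.Constant)
        and isinstance(sub.slice.value, str)
    ]


def created_columns(node: ast.AST) -> set[str]:
    """Handler role (hypothetical): columns created by <name>.DataFrame({...})."""
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "DataFrame"
        and node.args
        and isinstance(node.args[0], ast.Dict)
    ):
        return {
            key.value
            for key in node.args[0].keys
            if isinstance(key, ast.Constant) and isinstance(key.value, str)
        }
    return set()


def missing_columns(source: str) -> list[str]:
    """Walk top-level assignments in order and report reads of unknown columns."""
    known: set[str] = set()
    missing: list[str] = []
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue
        # Validate reads on the right-hand side against the current model.
        missing += [col for col in read_columns(stmt.value) if col not in known]
        # Update the model: constructor columns, then df['X'] = ... targets.
        known |= created_columns(stmt.value)
        for target in stmt.targets:
            if (
                isinstance(target, ast.Subscript)
                and isinstance(target.slice, ast.Constant)
                and isinstance(target.slice.value, str)
            ):
                known.add(target.slice.value)
    return missing


SOURCE = """
import pandas as pd
df = pd.DataFrame({'A': [1], 'B': [2]})
df['C'] = df['A'] + df['B']
df['D'] = df['C'] * df['Z']
"""

print(missing_columns(SOURCE))  # ['Z']
```

The source string is only parsed, never executed, so pandas does not need to be installed to run the sketch.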
Extension Points¶
frame-check is designed to be extensible. There are three main ways to add features:
| Extension Type | Registration | Use Case | Difficulty |
|---|---|---|---|
| Pandas Function | @PD.register() decorator | Add support for pd.read_excel(), pd.concat(), etc. | ⭐ Easy |
| DataFrame Method | @DF.register() decorator | Add support for df.drop(), df.rename(), etc. | ⭐ Easy |
| Extractor | Add to EXTRACTORS list | Handle new column reference patterns | ⭐ Easy |
Registry Patterns¶
Pandas Functions and DataFrame Methods¶
These use decorator-based registration:
# Pandas functions
@PD.register("read_excel")
def pd_read_excel(args, keywords) -> PDFuncResult:
    ...

# DataFrame methods
@DF.register("drop")
def df_drop(columns, args, keywords) -> DFFuncResult:
    ...
This means:
- No manual registration: decorators handle it automatically
- Automatic discovery: just import the module
- Easy testing: registries can be cleared/modified in tests
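For readers unfamiliar with decorator-based registries, the following sketch shows one way such a registry could work. The `Registry` class, its `get` and `clear` methods, and the placeholder return value are assumptions made for illustration; the real PD and DF registries live in handlers/models.py and may differ.

```python
from typing import Callable, Optional


class Registry:
    """Hypothetical stand-in for the PD/DF registries: maps names to handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable] = {}

    def register(self, name: str) -> Callable[[Callable], Callable]:
        """Return a decorator that stores the function under `name`."""
        def decorator(func: Callable) -> Callable:
            self._handlers[name] = func
            return func  # the decorated function itself is left unchanged
        return decorator

    def get(self, name: str) -> Optional[Callable]:
        """Look up a handler; None means the operation is not supported."""
        return self._handlers.get(name)

    def clear(self) -> None:
        """Remove all handlers (useful for isolating tests)."""
        self._handlers.clear()


PD = Registry()  # illustrative instance, not the real frame-check registry


@PD.register("read_excel")
def pd_read_excel(args, keywords):
    # Placeholder result: a real handler returns a PDFuncResult,
    # whose fields are not shown in this guide.
    return set()


print(PD.get("read_excel") is pd_read_excel)  # True
print(PD.get("read_json"))                    # None
```

Registration happens as a side effect of importing the module that defines the handler, which is why importing the module is enough for the handler to be discovered.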
Extractors¶
Extractors use explicit list-based registration in registry.py:
# In registry.py
EXTRACTORS: list[ExtractorFunc] = [
    extract_column_ref,              # df['col'] - most common
    extract_column_refs_from_binop,  # df['A'] + df['B']
    extract_method_call,             # Add your extractor here
]
This means:
- All in one place: see all extractors and their order
- Clear ordering: earlier in the list = tried first
- No sorting overhead: list is already in order
- Simple to add: just import and add to the list
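The ordering rule can be illustrated with a small dispatch loop. This is a sketch under assumptions: the extractor bodies, the `extract` dispatcher, the `parse_expr` helper, and the `ExtractorFunc` signature below are simplified, hypothetical versions written for this example (Python 3.9+), not the code in registry.py.

```python
import ast
from typing import Callable, Optional

# Assumed signature: take an expression node, return column names or None.
ExtractorFunc = Callable[[ast.expr], Optional[list[str]]]


def extract_column_ref(node: ast.expr) -> Optional[list[str]]:
    """df['col'] -> ['col']; None for anything else."""
    if (
        isinstance(node, ast.Subscript)
        and isinstance(node.slice, ast.Constant)
        and isinstance(node.slice.value, str)
    ):
        return [node.slice.value]
    return None


def extract_column_refs_from_binop(node: ast.expr) -> Optional[list[str]]:
    """df['A'] + df['B'] -> ['A', 'B']; None if no column is found."""
    if not isinstance(node, ast.BinOp):
        return None
    refs: list[str] = []
    for side in (node.left, node.right):
        refs.extend(extract(side) or [])
    return refs or None


EXTRACTORS: list[ExtractorFunc] = [
    extract_column_ref,              # tried first
    extract_column_refs_from_binop,  # tried second
]


def extract(node: ast.expr) -> Optional[list[str]]:
    """Return the result of the first extractor that recognizes `node`."""
    for extractor in EXTRACTORS:
        result = extractor(node)
        if result is not None:
            return result
    return None  # no extractor matched


def parse_expr(source: str) -> ast.expr:
    """Helper for the demo: parse a single expression."""
    return ast.parse(source, mode="eval").body


print(extract(parse_expr("df['A'] + df['B']")))  # ['A', 'B']
print(extract(parse_expr("df['A'] * 2")))        # ['A']
print(extract(parse_expr("df.head()")))          # None
```

Because the list is scanned front to back, placing the most common pattern first keeps the typical lookup short.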
Quick Start¶
- Clone the repository
- Set up development environment
- Run tests
- Make your changes following the guides above
- Add tests for your new feature (see Test Structure below)
- Submit a PR 🎉
Test Structure¶
Tests are organized to mirror the source structure:
frame-check-core/tests/
├── conftest.py                                 # Pytest configuration and fixtures
├── test_checker.py                             # Core checker tests
├── config/                                     # Tests for config module
│   ├── test_config.py
│   └── test_paths.py
├── diagnostic/                                 # Tests for diagnostic module
│   ├── test_diagnostics.py
│   └── test_output.py
├── extractors/                                 # Tests for extractors module
│   ├── test_binop.py
│   ├── test_column.py
│   └── test_registry.py
├── features/                                   # Feature/API completeness tests
│   ├── test_column_assignment_methods.py       # CAM-* features
│   └── test_dataframe_creation_methods.py      # DCMS-* features
└── util/                                       # Tests for utility module
    └── test_similarity.py
Where to Add Tests¶
| Test Type | Location | Example |
|---|---|---|
| Core checker functionality | tests/test_checker.py | Import detection, DataFrame tracking |
| Extractor unit tests | tests/extractors/test_*.py | AST pattern matching |
| Config tests | tests/config/test_*.py | Config loading, path handling |
| Diagnostic tests | tests/diagnostic/test_*.py | Error messages, formatting |
| Feature completeness | tests/features/test_*.py | Tests with @pytest.mark.support |
Feature Tests¶
Tests in tests/features/ track API completeness and are organized by categories from scripts/features.toml:
- test_dataframe_creation_methods.py: DCMS-* (DataFrame creation)
- test_column_assignment_methods.py: CAM-* (column assignment)
Use the @pytest.mark.support(code="#DCMS-1") marker to link tests to features.
Design Principles¶
When contributing, keep these principles in mind:
- Fail gracefully: Return None when a pattern isn't recognized rather than crashing
- Be conservative: Only report errors when you're confident something is wrong
- Compose existing tools: Reuse extractors and utilities where possible
- Use the registries: Don't hardcode; use @PD.register(), @DF.register(), or add to the EXTRACTORS list
- Test thoroughly: Each feature should have corresponding tests
- Document clearly: Add docstrings and update relevant documentation
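As an illustration of the first two principles, the sketch below stays silent whenever the column state is uncertain and reports a problem only when the model is known to be complete. The `FrameModel` dataclass, its `complete` flag, and `check_read` are hypothetical names invented for this example; they are not part of frame-check's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FrameModel:
    """Hypothetical model of one DataFrame's columns."""
    columns: set[str] = field(default_factory=set)
    # Set to False once an operation the checker cannot model has been seen.
    complete: bool = True


def check_read(model: Optional[FrameModel], column: str) -> Optional[str]:
    """Return an error message, or None when there is nothing certain to report."""
    if model is None or not model.complete:
        return None  # unknown state: fail gracefully, report nothing
    if column in model.columns:
        return None  # column exists: nothing to report
    return f"Column '{column}' does not exist"


known = FrameModel(columns={"A", "B"})
unsure = FrameModel(columns={"A"}, complete=False)

print(check_read(known, "A"))   # None
print(check_read(known, "Z"))   # Column 'Z' does not exist
print(check_read(unsure, "Z"))  # None
print(check_read(None, "Z"))    # None
```

Returning None instead of raising lets a caller treat "not recognized" and "nothing wrong" uniformly, at the cost of missing some real errors; that trade-off favours fewer false positives.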
What to Contribute¶
High Impact, Easy to Add¶
- Pandas functions: pd.read_excel, pd.read_json, pd.read_parquet, pd.concat
- DataFrame methods: df.drop, df.rename, df.copy, df.reset_index
- Extractors: Method calls (df['A'].fillna(df['B'])), comparisons (df['A'] > df['B'])
Medium Effort¶
- Method chaining support (df.assign(A=1).drop('B'))
- from pandas import DataFrame imports
- Groupby result column inference
Advanced¶
- Control flow analysis (if/else branches)
- Function boundary analysis (parameters and returns)
- Polars support