Fix connection retrieval in DagProcessorManager for bundle initialization #57459
kaxil merged 1 commit into apache:main
The dag_processor was unable to retrieve connections from the database,
causing GitHook (and other hooks) to fail with:
AirflowNotFoundException: The conn_id isn't defined
Root cause: DagProcessorManager was running in FALLBACK context, which only
loads EnvironmentVariablesBackend, not MetastoreBackend. This meant
connections stored in the database were inaccessible.
The `DagProcessorManager` (parent process) needs database access for connection
retrieval during bundle initialization (e.g., in `GitDagBundle.__init__`, where `GitHook`
needs git credentials). Child `DagFileProcessorProcess` instances run user code
and should remain isolated from direct database access.
This ensures the correct secrets backend chains (when no external secrets backend is configured):
- Manager (parent): `EnvironmentVariablesBackend` -> `MetastoreBackend` (database access)
- Parser (child): `EnvironmentVariablesBackend`
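The two chains above can be modeled in a few lines of plain Python. This is a hypothetical sketch, not Airflow's secrets API: `EnvVarsBackend`, `FakeMetastoreBackend`, `ConnNotFound`, and `get_connection` are illustrative names, and a dict stands in for the connection table. Only the `AIRFLOW_CONN_<CONN_ID>` environment-variable naming mirrors Airflow's convention. The sketch shows why a connection stored only in the database resolves in the manager but raises in a parser whose chain has no metastore backend.

```python
from __future__ import annotations

import os


class ConnNotFound(Exception):
    """Hypothetical stand-in for AirflowNotFoundException."""


class EnvVarsBackend:
    """Models EnvironmentVariablesBackend: looks up AIRFLOW_CONN_<CONN_ID>."""

    def get_conn_value(self, conn_id: str) -> str | None:
        return os.environ.get(f"AIRFLOW_CONN_{conn_id.upper()}")


class FakeMetastoreBackend:
    """Models MetastoreBackend; a dict stands in for the connection table."""

    def __init__(self, table: dict[str, str]) -> None:
        self.table = table

    def get_conn_value(self, conn_id: str) -> str | None:
        return self.table.get(conn_id)


def get_connection(conn_id: str, chain: list) -> str:
    """Walk the chain in order; the first backend that knows conn_id wins."""
    for backend in chain:
        value = backend.get_conn_value(conn_id)
        if value is not None:
            return value
    raise ConnNotFound(f"The conn_id `{conn_id}` isn't defined")


if __name__ == "__main__":
    # Hypothetical connection that exists only in the database.
    os.environ.pop("AIRFLOW_CONN_GIT_DEFAULT", None)
    db = {"git_default": "https://example.invalid/org/dags.git"}

    manager_chain = [EnvVarsBackend(), FakeMetastoreBackend(db)]
    parser_chain = [EnvVarsBackend()]

    print(get_connection("git_default", manager_chain))  # found via the "DB"
    try:
        get_connection("git_default", parser_chain)
    except ConnNotFound as exc:
        print(exc)  # The conn_id `git_default` isn't defined
```

Because the environment-variable backend comes first in both chains, an `AIRFLOW_CONN_GIT_DEFAULT` variable would win over the database entry in the manager, and would be the only way for the parser to see the connection.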
Note: This change is temporary until AIP-92 removes direct DB access from DagProcessorManager.
Long-term, the manager should use the Execution API instead of direct database access.
Affects: DAG bundle processing with GitHook and any other hooks that rely on
database-stored connections during bundle initialization in the manager process.
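Environment variables are inherited by child processes, so a manager that opts into database access through its environment has to keep that setting away from the parser children that run user code. The sketch below shows one way to do that with `subprocess`. It is a hypothetical illustration only: `child_env` and `run_parser_child` are made-up names, using `_AIRFLOW_PROCESS_CONTEXT` as the flag is an assumption, and Airflow's real child startup (which may fork rather than exec a new interpreter) is not modeled here.

```python
import os
import subprocess
import sys

CONTEXT_VAR = "_AIRFLOW_PROCESS_CONTEXT"  # assumed flag name


def child_env(parent_env) -> dict:
    """Copy the parent's environment, dropping the server-context flag."""
    env = dict(parent_env)
    env.pop(CONTEXT_VAR, None)
    return env


def run_parser_child() -> str:
    """Start a child interpreter and report the context value it sees."""
    code = f"import os; print(os.environ.get({CONTEXT_VAR!r}))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env(os.environ),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    os.environ[CONTEXT_VAR] = "server"  # manager opts into the DB-backed chain
    print(run_parser_child())  # child prints None: no server context
```

Passing `env=os.environ` unchanged instead would make the child print `server`, i.e. the parser would end up with the same database-backed chain as the manager.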
cc: @kaxil This is a critical bug, and we should aim for 3.1.2.
The dag_processor was unable to retrieve connections from the database,
causing GitHook (and other hooks) to fail with:
AirflowNotFoundException: The conn_id `
Root cause: dag_processor was running in FALLBACK context, which only
loads EnvironmentVariablesBackend, not MetastoreBackend. This meant
connections stored in the database were inaccessible.
The dag_processor is a server-side component that needs database access
for connection retrieval, similar to the scheduler and API server.
Fix: Set _AIRFLOW_PROCESS_CONTEXT=server in DagProcessorJobRunner._execute()
to enable MetastoreBackend, matching the pattern already used in:
- SchedulerJobRunner (scheduler_job_runner.py:1064)
- API FastAPI server (api_fastapi/main.py:24)
This ensures the dag_processor uses the correct secrets backend chain:
EnvironmentVariablesBackend -> MetastoreBackend (database access)
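In the approach described above, which chain a process gets depends on its process context. The following is a simplified, hypothetical model of that selection: `backend_chain_for` is an illustrative name, only the `server` and fallback cases named in this description are modeled, and configured external secrets backends and any other contexts are ignored.

```python
import os

CONTEXT_VAR = "_AIRFLOW_PROCESS_CONTEXT"


def backend_chain_for(env) -> list:
    """Hypothetical model: server context adds MetastoreBackend; fallback does not."""
    chain = ["EnvironmentVariablesBackend"]
    if env.get(CONTEXT_VAR) == "server":
        chain.append("MetastoreBackend")
    return chain


if __name__ == "__main__":
    os.environ.pop(CONTEXT_VAR, None)
    print(backend_chain_for(os.environ))  # fallback: env vars only
    os.environ[CONTEXT_VAR] = "server"  # the value this fix sets in _execute()
    print(backend_chain_for(os.environ))  # env vars, then the database
```

In this model the flag must be set before the chain is built. Because it lives in the process environment, it would also be inherited by any child processes started afterwards.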
Affects: DAG bundle processing with GitHook and any other hooks that
rely on database-stored connections during DAG parsing.
Signed-off-by: Kaxil Naik
Force-pushed 218f394 to 85dec24
Fix connection retrieval in DagProcessorManager for bundle initialization
(cherry picked from commit ae2a4fd)
Thanks @kaxil!