Fix connection retrieval in DagProcessorManager for bundle initialization #57459

Merged
kaxil merged 1 commit into apache:main from dheerajturaga:bugfix/connections-missing-in-dag-processor
Oct 30, 2025

Member

dheerajturaga commented Oct 29, 2025
edited by kaxil

The dag_processor was unable to retrieve connections from the database,
causing GitHook (and other hooks) to fail with:
AirflowNotFoundException: The conn_id isn't defined

Root cause: DagProcessorManager was running in FALLBACK context, which only
loads EnvironmentVariablesBackend, not MetastoreBackend. This meant
connections stored in the database were inaccessible.
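
The failure mode can be reproduced with a minimal, self-contained sketch of an ordered secrets-backend lookup. Everything in it is hypothetical: `FakeEnvBackend`, `FakeMetastoreBackend`, `get_connection`, and `ConnNotFoundError` are illustrative stand-ins, not Airflow's classes, and an in-memory dict plays the role of the metadata database. The sketch assumes only what the description states: backends are consulted in order, and the lookup fails when no backend in the chain knows the `conn_id`. The environment-variable stand-in follows Airflow's `AIRFLOW_CONN_<CONN_ID>` naming convention.

```python
from typing import Mapping, Optional, Sequence


class ConnNotFoundError(Exception):
    """Hypothetical stand-in for AirflowNotFoundException."""


class FakeEnvBackend:
    """Hypothetical: resolves conn_ids from AIRFLOW_CONN_<CONN_ID> variables."""

    def __init__(self, environ: Mapping[str, str]):
        self.environ = environ

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return self.environ.get(f"AIRFLOW_CONN_{conn_id.upper()}")


class FakeMetastoreBackend:
    """Hypothetical: resolves conn_ids from a dict standing in for the database."""

    def __init__(self, table: Mapping[str, str]):
        self.table = table

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return self.table.get(conn_id)


def get_connection(conn_id: str, backends: Sequence) -> str:
    # Walk the chain in order; the first backend that knows conn_id wins.
    for backend in backends:
        uri = backend.get_conn_uri(conn_id)
        if uri is not None:
            return uri
    raise ConnNotFoundError(f"The conn_id `{conn_id}` isn't defined")


if __name__ == "__main__":
    database = {"my_git": "https://example.com/repo.git"}
    fallback_chain = [FakeEnvBackend({})]  # environment variables only
    server_chain = [FakeEnvBackend({}), FakeMetastoreBackend(database)]

    try:
        get_connection("my_git", fallback_chain)
    except ConnNotFoundError as exc:
        print("fallback:", exc)
    print("server:", get_connection("my_git", server_chain))
```

With only the environment-variable stand-in in the chain, a connection that exists solely in the "database" is never found, which mirrors the FALLBACK-context situation described above.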

The DagProcessorManager (parent process) needs database access for connection
retrieval during bundle initialization (e.g., GitDagBundle.__init__ - GitHook
needs git credentials). Child DagFileProcessorProcess instances run user code
and should remain isolated from direct database access.

This ensures correct secrets backend chains (when no external secrets backend is configured):

  • Manager (parent): EnvironmentVariablesBackend → MetastoreBackend (database access)
  • Parser (child): EnvironmentVariablesBackend
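
The per-process chains above can be captured in a single selection function. This is a sketch under stated assumptions: `ProcessRole` and `default_backend_chain` are hypothetical names, backends are represented by their class names as plain strings, and only the case with no external secrets backend configured is modeled.

```python
from enum import Enum
from typing import List


class ProcessRole(Enum):
    """Hypothetical labels for the two processes described above."""

    MANAGER = "manager"  # DagProcessorManager (parent process)
    PARSER = "parser"  # DagFileProcessorProcess (child process, runs user code)


def default_backend_chain(role: ProcessRole) -> List[str]:
    """Return the ordered secrets backends a process consults (sketch only).

    Models only the case where no external secrets backend is configured.
    """
    chain = ["EnvironmentVariablesBackend"]
    if role is ProcessRole.MANAGER:
        # Only the parent gets direct database access, because bundle
        # initialization (e.g. GitDagBundle needing git credentials) runs there.
        chain.append("MetastoreBackend")
    return chain


if __name__ == "__main__":
    for role in ProcessRole:
        print(role.value, "->", " -> ".join(default_backend_chain(role)))
```

Keeping the rule in one place makes the isolation property easy to check: the parser chain should never contain a database-backed entry.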

Note: This is temporary until AIP-92 removes direct DB access from DagProcessorManager.
Long-term, the manager should use the Execution API instead of direct database access.

Affects: DAG bundle processing with GitHook and any other hooks that rely on
database-stored connections during bundle initialization in the manager process.

dheerajturaga requested review from XD-DENG and ashb as code owners October 29, 2025 00:10
boring-cyborg bot added the area:DAG-processing label Oct 29, 2025
Member Author

dheerajturaga commented Oct 29, 2025

cc: @kaxil, this is a critical bug; we should aim for 3.1.2.

kaxil reviewed Oct 29, 2025
kaxil added this to the Airflow 3.1.2 milestone Oct 30, 2025
...ontext

The dag_processor was unable to retrieve connections from the database,
causing GitHook (and other hooks) to fail with:
AirflowNotFoundException: The conn_id `` isn't defined

Root cause: dag_processor was running in FALLBACK context, which only
loads EnvironmentVariablesBackend, not MetastoreBackend. This meant
connections stored in the database were inaccessible.

The dag_processor is a server-side component that needs database access
for connection retrieval, similar to the scheduler and API server.

Fix: Set _AIRFLOW_PROCESS_CONTEXT=server in DagProcessorJobRunner._execute()
to enable MetastoreBackend, matching the pattern already used in:
- SchedulerJobRunner (scheduler_job_runner.py:1064)
- API FastAPI server (api_fastapi/main.py:24)

This ensures the dag_processor uses the correct secrets backend chain:
EnvironmentVariablesBackend → MetastoreBackend (database access)
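
The approach in this earlier commit message (marking the process as server-side before any secrets lookup) can be sketched as follows. Only the variable name `_AIRFLOW_PROCESS_CONTEXT` and the value `server` come from the commit message; `resolve_backend_chain` and `start_dag_processor` are hypothetical helpers, and the sketch takes an explicit `environ` mapping instead of mutating `os.environ` so that it stays side-effect free. The merged PR later narrowed the fix to `DagProcessorManager`, as its final description explains.

```python
from typing import Dict, List, Mapping, MutableMapping

PROCESS_CONTEXT_VAR = "_AIRFLOW_PROCESS_CONTEXT"  # name taken from the commit message


def resolve_backend_chain(environ: Mapping[str, str]) -> List[str]:
    """Hypothetical resolver: choose a backend chain from the process context."""
    if environ.get(PROCESS_CONTEXT_VAR) == "server":
        return ["EnvironmentVariablesBackend", "MetastoreBackend"]
    # Any other value (or no value) is treated as the FALLBACK context here.
    return ["EnvironmentVariablesBackend"]


def start_dag_processor(environ: MutableMapping[str, str]) -> List[str]:
    """Hypothetical entry point mirroring the change to DagProcessorJobRunner._execute()."""
    # Mark the process as server-side *before* resolving any secrets,
    # so the chain that gets built includes the metastore.
    environ[PROCESS_CONTEXT_VAR] = "server"
    return resolve_backend_chain(environ)


if __name__ == "__main__":
    env: Dict[str, str] = {}
    print("before:", resolve_backend_chain(env))
    print("after: ", start_dag_processor(env))
```

Passing `os.environ` instead would also affect any child processes started afterwards, since they inherit the parent's environment by default.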

Affects: DAG bundle processing with GitHook and any other hooks that
rely on database-stored connections during DAG parsing.

Signed-off-by: Kaxil Naik
kaxil force-pushed the bugfix/connections-missing-in-dag-processor branch from 218f394 to 85dec24 October 30, 2025 21:46
kaxil requested review from ephraimbuddy and jedcunningham as code owners October 30, 2025 21:46
kaxil changed the title from "Fix GitHook connection retrieval in dag_processor by setting server context" to "Fix connection retrieval in DagProcessorManager for bundle initialization" Oct 30, 2025
kaxil approved these changes Oct 30, 2025
kaxil merged commit ae2a4fd into apache:main Oct 30, 2025
62 checks passed
kaxil pushed a commit that referenced this pull request Oct 30, 2025
...zation (#57459)

(cherry picked from commit ae2a4fd)
dheerajturaga deleted the bugfix/connections-missing-in-dag-processor branch October 30, 2025 23:19
Member Author

dheerajturaga commented Oct 30, 2025

Thanks @kaxil!

ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Nov 10, 2025
Copilot AI pushed a commit to jason810496/airflow that referenced this pull request Dec 5, 2025
...zation (apache#57459)


Reviewers

kaxil approved these changes

Awaiting requested review from ashb

Awaiting requested review from XD-DENG

Awaiting requested review from jedcunningham (code owner)

Awaiting requested review from ephraimbuddy (code owner)

Assignees

No one assigned

Labels

area:DAG-processing, type:bug-fix

Projects

None yet

Milestone

Airflow 3.1.2


3 participants