Dark Mode

Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add tracking of triggering user to Dag runs#51738

Merged
potiuk merged 8 commits intoapache:mainfrom
jscheffl:feature/track-user-who-triggered-dag-runs
Jul 1, 2025
Merged

Add tracking of triggering user to Dag runs#51738
potiuk merged 8 commits intoapache:mainfrom
jscheffl:feature/track-user-who-triggered-dag-runs

Conversation

Copy link
Contributor

jscheffl commented Jun 14, 2025 *
edited
Loading

This PR adds a feature to Airflow that trigering user is tracked on the Dag run level. So far if you have usages where users trigger manual you needed to find-out which user it was by looking into audit log.

The user is tracked with unix user name when using CLI or airflowctl, in web UI or REST cases the authenticated user is used.
In case a backfill is started via UI the user who started the backfill propagates the runs of Dags.

FYI @clellmann @wolfdn @AutomationDev85

rawwar and r-richmond reacted with thumbs up emoji r-richmond and dheerajturaga reacted with hooray emoji
Copy link
Contributor Author

jscheffl commented Jun 14, 2025

Note: Marking as DRAFT as I assume some pytests fail and need to be adjusted.

jscheffl marked this pull request as draft June 14, 2025 18:40
jscheffl modified the milestones: Airflow 3.1+, Airflow 3.1.0 Jun 14, 2025
jscheffl force-pushed the feature/track-user-who-triggered-dag-runs branch 2 times, most recently from 21f2921 to 53e8254 Compare June 14, 2025 21:18
Copy link
Member

potiuk commented Jun 15, 2025

nice! Highly requested feature.

rawwar reviewed Jun 15, 2025
jscheffl force-pushed the feature/track-user-who-triggered-dag-runs branch from 3ebb1ad to 2390c87 Compare June 28, 2025 10:52
Copy link
Member

dheerajturaga commented Jun 28, 2025 *
edited by jason810496
Loading

Just for reference, in Airflow 3, retrieving the triggering user from event logs requires the following approach. I've encapsulated the access_token handling within the get_airflow_client_configuration method for clarity.

That said, there may be scenarios where the logical date does not align as expected, which introduces additional complexity--particularly in our unit tests where we need to mock API responses. Given that over 50% of our DAGs rely on this functionality, its absence could significantly delay our adoption of Airflow 3.

I hope this concern is understandable and that accommodating this request is feasible. I truly appreciate your consideration.

get_event_logs: %s\n" % e)">def _find_owner_v3(dag_run=None) -> str | None:
"""
This is only for Airflow3, use the Airflow Client API
to fetch the event logs
"""
# Only run for manual runs
if dag_run.run_type.name.upper() != 'MANUAL':
logger.error(f"Not manually triggered. run_type: {dag_run.run_type}")
return

# Cant co-relate if logical date is missing for dag run
if not dag_run.logical_date:
logger.error(f"No logical date available for this run, cant find owner")
return

logical_date = dag_run.logical_date.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

import airflow_client.client
from airflow_client.client.rest import ApiException
from custom.api.utils import get_airflow_client_configuration

configuration = get_airflow_client_configuration()

# Enter a context with an instance of the API client
with airflow_client.client.ApiClient(configuration) as api_client:
# Create an instance of the API class
api_instance = airflow_client.client.EventLogApi(api_client)
event = 'trigger_dag_run'
try:
logger.info("#################################################")
logger.info(f"DAG ID: {dag_run.dag_id}")
logger.info(f"RUN ID: {dag_run.run_id}")
logger.info(f"Logical Date: {logical_date}")
logger.info("#################################################")

# Get Event Logs
api_response = api_instance.get_event_logs(dag_id=dag_run.dag_id, event=event)

if not api_response:
logger.error("No trigger events found!")
return

for event in reversed(api_response.event_logs):
logger.info(event)
if event.extra:
event_info = {}
event_info = json.loads(event.extra)
if "logical_date" in event_info:
if logical_date == event_info["logical_date"]:
logger.info(f"Matching Event: {event}")
logger.info(f"Dag triggered by: {event.owner}")
return event.owner
except Exception as e:
raise AirflowException("Exception when calling EventLogApi->get_event_logs: %s\n" % e)

jscheffl requested review from amoghrajesh and jason810496 June 28, 2025 19:49
jason810496 approved these changes Jun 29, 2025
potiuk approved these changes Jun 29, 2025
dheerajturaga approved these changes Jun 29, 2025
pierrejeambrun approved these changes Jun 30, 2025
Copy link
Member

pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits, questions, but looks good to me.

jscheffl force-pushed the feature/track-user-who-triggered-dag-runs branch from 2390c87 to 1ad52d8 Compare June 30, 2025 21:07
Copy link
Member

ashb commented Jul 1, 2025

actually just because adding one field to the schema which consumes a few bytes... many more fields on task instance which also would be candidated to optimize DB. We never had such a long discussion in other PRs adding e.g. "triggered_by" which then is falling into the same category.

That's because it's not about adding a column/it's not about what the code itself but how we do the feature, and it's about what is the most sustainable way to develop Airflow for the long term; is this short term approach right (which we will likely have in place for years), or should we spend more time to build a longer term and more generic approach.

Copy link
Member

potiuk commented Jul 1, 2025

actually just because adding one field to the schema which consumes a few bytes... many more fields on task instance which also would be candidated to optimize DB. We never had such a long discussion in other PRs adding e.g. "triggered_by" which then is falling into the same category.

That's because it's not about adding a column/it's not about what the code itself but how we do the feature, and it's about what is the most sustainable way to develop Airflow for the long term; is this short term approach right (which we will likely have in place for years), or should we spend more time to build a longer term and more generic approach.

I think we are all well aware of that. And we have different views what would be more generic way and apparently slightly different design assumptions. Nobody who wants to add user here does it because they think it's "short term gain" - the arguments here are that this is a good design decision.

There are clearly voices that audit log is a differen thing than data model of the app and they should not be mixed. That's one of the design views here. I think we should jointly make a dcieison based on those different design assumptions.

IMHO design where 'triggering user" is part of the task instance model is a good design decision.

Copy link
Member

ashb commented Jul 1, 2025

I think we are all well aware of that.

@potiuk Are you though? I don't really think you are, else you wouldn't have insulted my intentions by saying I was bike-shedding and having a pointless discussion. My point was not about what color to paint a bike shed, but if we should build a bike shed or a train station.

potiuk merged commit 09438ff into apache:main Jul 1, 2025
101 checks passed
Copy link
Member

potiuk commented Jul 1, 2025

Indeed too many discussions - 4 people approving. several having doubts, and complaints about bikesheddint. Let's just merge and move on.

jscheffl reacted with thumbs up emoji

Copy link
Contributor Author

jscheffl commented Jul 1, 2025

Thanks for merging.

Please calm down the argumentation. I was waiting before merge if there are real objections. I think the current PR was not a workaround and yes there could be cleaner and nicer solutions. I am open for somebody to change and clean-up but the additional field actually is not rocket science. So welcoming other PRs to make a real cool solution but we should focus our energy on the real critical items, I think there are sufficient other places to improve as well.

dheerajturaga and potiuk reacted with thumbs up emoji

dheerajturaga mentioned this pull request Jul 18, 2025
jscheffl deleted the feature/track-user-who-triggered-dag-runs branch October 5, 2025 07:44
kaxil added a commit to apache/airflow-client-python that referenced this pull request Oct 22, 2025
(from https://github.com/apache/airflow/tree/python-client/3.1.0rc1)

## New Features:

- Add `map_index` filter to TaskInstance API queries ([#55614](apache/airflow#55614))
- Add `has_import_errors` filter to Core API GET /dags endpoint ([#54563](apache/airflow#54563))
- Add `dag_version` filter to get_dag_runs endpoint ([#54882](apache/airflow#54882))
- Implement pattern search for event log endpoint ([#55114](apache/airflow#55114))
- Add asset-based filtering support to DAG API endpoint ([#54263](apache/airflow#54263))
- Add Greater Than and Less Than range filters to DagRuns and Task Instance list ([#54302](apache/airflow#54302))
- Add `try_number` as filter to task instances ([#54695](apache/airflow#54695))
- Add filters to Browse XComs endpoint ([#54049](apache/airflow#54049))
- Add Filtering by DAG Bundle Name and Version to API routes ([#54004](apache/airflow#54004))
- Add search filter for DAG runs by triggering user name ([#53652](apache/airflow#53652))
- Enable multi sorting (AIP-84) ([#53408](apache/airflow#53408))
- Add `run_on_latest_version` support for backfill and clear operations ([#52177](apache/airflow#52177))
- Add `run_id_pattern` search for Dag Run API ([#52437](apache/airflow#52437))
- Add tracking of triggering user to Dag runs ([#51738](apache/airflow#51738))
- Expose DAG parsing duration in the API ([#54752](apache/airflow#54752))

## New API Endpoints:

- Add Human-in-the-Loop (HITL) endpoints for approval workflows ([#52868](apache/airflow#52868), [#53373](apache/airflow#53373), [#53376](apache/airflow#53376), [#53885](apache/airflow#53885), [#53923](apache/airflow#53923), [#54308](apache/airflow#54308), [#54310](apache/airflow#54310), [#54723](apache/airflow#54723), [#54773](apache/airflow#54773), [#55019](apache/airflow#55019), [#55463](apache/airflow#55463), [#55525](apache/airflow#55525), [#55535](apache/airflow#55535), [#55603](apache/airflow#55603), [#55776](apache/airflow#55776))
- Add endpoint to watch dag run until finish ([#51920](apache/airflow#51920))
- Add TI bulk actions endpoint ([#50443](apache/airflow#50443))
- Add Keycloak Refresh Token Endpoint ([#51657](apache/airflow#51657))

## Deprecations:

- Mark `DagDetailsResponse.concurrency` as deprecated ([#55150](apache/airflow#55150))

## Bug Fixes:

- Fix dag import error modal pagination ([#55719](apache/airflow#55719))
kaxil added a commit to apache/airflow-client-python that referenced this pull request Oct 23, 2025
(from https://github.com/apache/airflow/tree/python-client/3.1.0rc1)

## New Features:

- Add `map_index` filter to TaskInstance API queries ([#55614](apache/airflow#55614))
- Add `has_import_errors` filter to Core API GET /dags endpoint ([#54563](apache/airflow#54563))
- Add `dag_version` filter to get_dag_runs endpoint ([#54882](apache/airflow#54882))
- Implement pattern search for event log endpoint ([#55114](apache/airflow#55114))
- Add asset-based filtering support to DAG API endpoint ([#54263](apache/airflow#54263))
- Add Greater Than and Less Than range filters to DagRuns and Task Instance list ([#54302](apache/airflow#54302))
- Add `try_number` as filter to task instances ([#54695](apache/airflow#54695))
- Add filters to Browse XComs endpoint ([#54049](apache/airflow#54049))
- Add Filtering by DAG Bundle Name and Version to API routes ([#54004](apache/airflow#54004))
- Add search filter for DAG runs by triggering user name ([#53652](apache/airflow#53652))
- Enable multi sorting (AIP-84) ([#53408](apache/airflow#53408))
- Add `run_on_latest_version` support for backfill and clear operations ([#52177](apache/airflow#52177))
- Add `run_id_pattern` search for Dag Run API ([#52437](apache/airflow#52437))
- Add tracking of triggering user to Dag runs ([#51738](apache/airflow#51738))
- Expose DAG parsing duration in the API ([#54752](apache/airflow#54752))

## New API Endpoints:

- Add Human-in-the-Loop (HITL) endpoints for approval workflows ([#52868](apache/airflow#52868), [#53373](apache/airflow#53373), [#53376](apache/airflow#53376), [#53885](apache/airflow#53885), [#53923](apache/airflow#53923), [#54308](apache/airflow#54308), [#54310](apache/airflow#54310), [#54723](apache/airflow#54723), [#54773](apache/airflow#54773), [#55019](apache/airflow#55019), [#55463](apache/airflow#55463), [#55525](apache/airflow#55525), [#55535](apache/airflow#55535), [#55603](apache/airflow#55603), [#55776](apache/airflow#55776))
- Add endpoint to watch dag run until finish ([#51920](apache/airflow#51920))
- Add TI bulk actions endpoint ([#50443](apache/airflow#50443))
- Add Keycloak Refresh Token Endpoint ([#51657](apache/airflow#51657))

## Deprecations:

- Mark `DagDetailsResponse.concurrency` as deprecated ([#55150](apache/airflow#55150))

## Bug Fixes:

- Fix dag import error modal pagination ([#55719](apache/airflow#55719))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

ashb ashb left review comments

potiuk potiuk approved these changes

pierrejeambrun pierrejeambrun approved these changes

jason810496 jason810496 approved these changes

dheerajturaga dheerajturaga approved these changes

ephraimbuddy Awaiting requested review from ephraimbuddy ephraimbuddy is a code owner

bbovenzi Awaiting requested review from bbovenzi bbovenzi is a code owner

ryanahamilton Awaiting requested review from ryanahamilton ryanahamilton is a code owner

shubhamraj-git Awaiting requested review from shubhamraj-git shubhamraj-git is a code owner

bugraoz93 Awaiting requested review from bugraoz93 bugraoz93 is a code owner

kaxil Awaiting requested review from kaxil kaxil is a code owner

XD-DENG Awaiting requested review from XD-DENG XD-DENG is a code owner

rawwar Awaiting requested review from rawwar rawwar is a code owner

amoghrajesh Awaiting requested review from amoghrajesh amoghrajesh is a code owner

+1 more reviewer

molcay molcay left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Projects

None yet

Milestone

Airflow 3.1.0

Development

Successfully merging this pull request may close these issues.

10 participants