[ENH] Remove yfinance as a dependency and implement data_loader #721

Open

Shuvam586 wants to merge 3 commits into PyPortfolio:main from Shuvam586:remove-yfinance


Conversation

Shuvam586 commented Mar 3, 2026

Closes #716

Replaces yfinance usage in notebooks with bundled static example data (stock_prices.csv and market_caps.csv).
Adds pypfopt.data loaders for stock prices and market caps.
Removes yfinance dependency entirely.


 [ENH] Remove yfinance as a dependancy and implement data_loader

bbc653a

Author

Shuvam586 commented Mar 3, 2026

@fkiraly

Instead of editing the tickers mentioned in the cookbook notebooks, I added two CSV files to pypfopt/data: stock_prices.csv and market_caps.csv, containing only data from 2023 onwards. stock_prices.csv is around 450 KB.

I also removed yfinance-related statements and functions from all notebooks.

fkiraly changed the title ~~[ENH] Remove yfinance as a dependancy and implement data_loader~~ [ENH] Remove yfinance as a dependency and implement data_loader

Mar 3, 2026

fkiraly requested changes

Mar 3, 2026

Collaborator

fkiraly left a comment

Nice, thanks!

May I request that we not use data downloaded via yfinance from Yahoo services at all? This is due to the terms of use: we should not distribute data from Yahoo services in the repository or package.

Could you instead use similar data? Either completely randomly generated (Brownian motion random walk or similar, with same column names and time index), or taking some inspiration from the actual data in how you randomize - but it cannot be the exact values.
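The request above (random data with the same column names and time index) could be sketched roughly as follows. This is a minimal illustration, not the code used in the PR: the function name `simulate_prices`, the placeholder tickers, the start date, and the drift and volatility values are all assumptions.

```python
import numpy as np
import pandas as pd


def simulate_prices(tickers, start="2023-01-02", periods=252, seed=0):
    """Return synthetic daily prices, one column per ticker (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Business-day index, named "date" to mirror a typical price CSV layout.
    index = pd.bdate_range(start=start, periods=periods, name="date")
    # Daily log-returns: small drift plus Gaussian noise (arbitrary values).
    log_returns = rng.normal(loc=0.0003, scale=0.015, size=(periods, len(tickers)))
    log_returns[0] = 0.0  # the first row is the starting price itself
    start_prices = rng.uniform(50.0, 300.0, size=len(tickers))
    # Exponentiating the cumulative random walk keeps all prices positive.
    prices = start_prices * np.exp(np.cumsum(log_returns, axis=0))
    return pd.DataFrame(prices, index=index, columns=list(tickers))


# Placeholder tickers; real column names would come from the notebooks.
prices = simulate_prices(["AAA", "BBB", "CCC"])
```

Fixing the seed makes the generated file reproducible, which would help if the data ever needs to be regenerated.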

fkiraly added the documentation label Mar 3, 2026


 replaced yfinance data with synthetic data

Author

Shuvam586 commented Mar 5, 2026

@fkiraly

market_caps.csv and stock_prices.csv now contain data produced synthetically with Geometric Brownian Motion.

Collaborator

fkiraly commented Mar 5, 2026

Thanks! Could you kindly post here the plots in the notebooks before/after, just to check if they look similar?

Collaborator

fkiraly commented Mar 5, 2026

also, code formatting tests are failing, please look at pre-commit


 pre-commit fixes

67a54db

Author

Shuvam586 commented Mar 7, 2026

i have run pre-commit and pushed the changes.

Author

Shuvam586 commented Mar 7, 2026

plots you asked for:

previous data plot with data from yfinance:

synthetic data plot from brownian motion:

Collaborator

fkiraly commented Mar 7, 2026

Thanks!

I suspect the actual data would be closer to exponential Brownian motion - that should be achieved by simply taking np.exp of the Brownian motion.

fkiraly reviewed

Mar 7, 2026

pypfopt/data/data_loader.py

		return pd.read_csv(f, **read_csv_kwargs)


		def load_stockdata(tickers: list = None, start: str = None, end: str = None):

Collaborator

fkiraly Mar 7, 2026

please add docstrings (numpydoc format)
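A numpydoc docstring for `load_stockdata` might look like the sketch below. The signature is taken from the diff under review; the parameter semantics, the `_example_frame` stand-in for the bundled CSV, and the filtering logic are assumptions for illustration, not the PR's implementation.

```python
import pandas as pd


def _example_frame():
    # Hypothetical stand-in for reading stock_prices.csv; values are made up.
    index = pd.date_range("2023-01-02", periods=5, freq="D", name="date")
    return pd.DataFrame(
        {"AAA": [10.0, 10.1, 10.3, 10.2, 10.4], "BBB": [20.0, 19.8, 19.9, 20.1, 20.3]},
        index=index,
    )


def load_stockdata(tickers: list = None, start: str = None, end: str = None):
    """Load the bundled example stock prices.

    Parameters
    ----------
    tickers : list of str, optional
        Columns to return. If None, all tickers are returned.
    start : str, optional
        First date to include (inclusive), e.g. ``"2023-01-03"``.
    end : str, optional
        Last date to include (inclusive).

    Returns
    -------
    pandas.DataFrame
        Prices indexed by date, one column per ticker.
    """
    df = _example_frame()
    if tickers is not None:
        df = df[list(tickers)]
    # Label-based slicing on a DatetimeIndex is inclusive at both ends.
    return df.loc[start:end]
```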

fkiraly reviewed

Mar 7, 2026

pypfopt/data/data_loader.py



		def available_tickers():
			df = _load_raw_data("stock_prices.csv", parse_dates=["date"])

Collaborator

fkiraly Mar 7, 2026

Is this not known in advance? Instead of loading the whole CSV, you could simply load the header, or return the known list.

Author

Shuvam586 Mar 12, 2026

okay, i will rewrite it to return only the list of the tickers in the csv file.
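One way to read only the header, as suggested in this thread, is `pd.read_csv(..., nrows=0)`, which parses the column names without loading any data rows. The sketch below uses an in-memory CSV with placeholder tickers as a stand-in for stock_prices.csv, and assumes the first column is named `date` (as the `parse_dates=["date"]` call in the diff suggests).

```python
import io

import pandas as pd

# Hypothetical stand-in for the bundled stock_prices.csv.
csv_text = (
    "date,AAA,BBB,CCC\n"
    "2023-01-02,10.0,20.0,30.0\n"
    "2023-01-03,10.1,19.9,30.2\n"
)


def available_tickers(source):
    """Return ticker names from the CSV header without parsing data rows."""
    header = pd.read_csv(source, nrows=0)  # empty frame, columns only
    return [col for col in header.columns if col != "date"]


tickers = available_tickers(io.StringIO(csv_text))
```

Returning a hard-coded list would be even cheaper, at the cost of keeping it in sync with the CSV by hand.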

fkiraly requested changes

Mar 7, 2026

Collaborator

fkiraly left a comment

Thanks, looks good!

- can you re-execute the notebooks after a clean reset?
- I think the simulated data should be exponential Brownian motion to resemble the actual data
- please add numpydoc docstrings to the data loaders
- further comments above

fkiraly mentioned this pull request Mar 7, 2026

[MNT] homogenize CI workflows with GC.OS repositories #702

Merged

Collaborator

fkiraly commented Mar 8, 2026

non-blocking - would it be possible to include the simulation code somewhere in the utils module?

Author

Shuvam586 commented Mar 12, 2026

> I suspect the actual data would be closer to exponential Brownian motion - that should be achieved by simply taking np.exp of the Brownian motion.

i am confused. the code i used to generate the synthetic data uses exponential brownian motion.

    for t in range(1, n_days):
        z = np.random.normal(size=len(tickers))
        prices[t] = prices[t-1] * np.exp(
            (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
        )
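The two descriptions in this thread appear to be the same process: multiplying by `np.exp(increment)` at each step is equivalent to taking `np.exp` of the cumulative sum of those increments, i.e. of a Brownian motion with drift. The check below is a sketch under assumed parameter values (`mu`, `sigma`, `dt`, the starting price of 100, and the array shapes are illustrative, not taken from the PR).

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_assets = 250, 3
dt = 1 / 252
mu = np.array([0.05, 0.08, 0.10])     # assumed annual drifts
sigma = np.array([0.15, 0.20, 0.30])  # assumed annual volatilities
z = rng.normal(size=(n_days, n_assets))  # row 0 is never used

# Step-by-step recursion, mirroring the loop quoted in the comment.
prices_loop = np.empty((n_days, n_assets))
prices_loop[0] = 100.0
for t in range(1, n_days):
    prices_loop[t] = prices_loop[t - 1] * np.exp(
        (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z[t]
    )

# Closed form: np.exp of the cumulative sum of the same log-increments.
increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
increments[0] = 0.0  # no move on the first day
prices_exp = 100.0 * np.exp(np.cumsum(increments, axis=0))

same = np.allclose(prices_loop, prices_exp)
```

If the plotted synthetic series still look unlike the real ones, the difference would then more likely come from the choice of `mu` and `sigma` than from the form of the process.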

Author

Shuvam586 commented Mar 12, 2026

> can you re-execute the notebooks after a clean reset?

yes. i re-executed the cookbook notebooks before making the first commit.

Labels

documentation

2 participants