# Financial Data Sources for Data Science

> Reliable data sources for crypto and stock market data, organized by use case with tradeoffs noted.

## Stocks / Equities

### Free / low-cost (good for prototyping & research)

- **Yahoo Finance** (via `yfinance` Python lib) — free OHLCV, fundamentals, dividends, splits. Easy to use, but unofficial/scraped, so it breaks occasionally and isn't reliable for production. Great for exploration.
- **Stooq** — free historical EOD data, decent for backtesting.
- **Alpha Vantage** — free tier (rate-limited to ~25 req/day), intraday + daily, technical indicators, some fundamentals. Good API but tight limits.
- **Tiingo** — generous free tier, high-quality EOD + fundamentals, clean data. Popular with quant hobbyists.
- **SEC EDGAR** — official source for fundamentals (10-K/10-Q filings). Free and authoritative, but the filings are XBRL, so expect parsing work. The SEC's JSON APIs at data.sec.gov and libraries such as `sec-edgar-api` / `edgartools` help.
- **FRED** (Federal Reserve) — macro/economic series (rates, CPI, etc.), free and authoritative.
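
To illustrate the SEC EDGAR route, here is a minimal sketch that pulls one reported concept out of the SEC's company-facts JSON. The endpoint URL, the `User-Agent` requirement, and the JSON layout (`facts`, then taxonomy, then concept, then `units`) are assumptions based on the SEC's public API and should be checked against the current SEC developer documentation. `fetch_company_facts` and `extract_facts` are illustrative names, not part of any library.

```python
import json
import urllib.request

# Assumed endpoint; verify against the SEC developer documentation.
COMPANY_FACTS_URL = "https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"


def fetch_company_facts(cik: int, user_agent: str) -> dict:
    """Download the company-facts JSON for one CIK (network call)."""
    request = urllib.request.Request(
        COMPANY_FACTS_URL.format(cik=cik),
        # Assumption: the SEC expects a descriptive User-Agent with contact info.
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def extract_facts(payload: dict, concept: str, unit: str = "USD",
                  taxonomy: str = "us-gaap") -> list:
    """Return one record per reported value of `concept`, sorted by filing date."""
    entries = (
        payload.get("facts", {})
        .get(taxonomy, {})
        .get(concept, {})
        .get("units", {})
        .get(unit, [])
    )
    rows = [
        {
            "period_end": entry["end"],   # end of the reporting period
            "filed": entry["filed"],      # date the filing became public
            "value": entry["val"],
            "form": entry.get("form"),    # e.g. "10-K" or "10-Q"
        }
        for entry in entries
    ]
    # ISO date strings sort correctly as plain strings.
    return sorted(rows, key=lambda row: row["filed"])
```

A call such as `extract_facts(fetch_company_facts(320193, "Your Name you@example.com"), "Revenues")` would be the typical usage; the CIK and concept name here are placeholders. Keeping the `filed` date next to each value is what makes the look-ahead check in the cautions section possible.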

### Paid / production-grade

- **Polygon.io** — tick-level, real-time + historical, clean, developer-friendly. Strong choice for serious projects.
- **Nasdaq Data Link** (formerly Quandl) — curated datasets, some free, many premium.
- **Finnhub** — real-time + fundamentals, reasonable pricing. (**IEX Cloud**, formerly a common pick in this tier, was shut down in August 2024, so avoid it for new projects.)
- **Refinitiv (LSEG), Bloomberg, FactSet** — institutional-grade, expensive, gold standard for accuracy and survivorship-bias-free data.
- **Norgate Data** — excellent for backtesting (delisted stocks, point-in-time, adjusted), popular in the systematic-trading community.

## Crypto

### Free / API-based

- **CoinGecko** — free tier, broad coverage (prices, market cap, volume, metadata across thousands of coins). Excellent for research; paid tier for higher limits.
- **CoinMarketCap** — similar, free + paid tiers.
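- **Exchange APIs directly** — Binance, Coinbase, Kraken, Bybit all offer free historical OHLCV and live order-book data (historical order-book depth usually has to be recorded yourself or bought). Most accurate for that specific venue's trades. The **CCXT** library unifies 100+ exchange APIs into one interface and is the standard tool here.
- **CryptoCompare** — aggregated, normalized data across exchanges. (**Kaiko** offers similar aggregation but is a paid product; see the institutional list below.)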
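
As a sketch of the CCXT route mentioned above, the snippet below fetches daily candles from one venue and converts them into labeled records. It assumes `ccxt` is installed and that the chosen exchange id and symbol (for example `"kraken"` and `"BTC/USDT"`) are valid for that venue. `fetch_daily_candles` and `candles_to_records` are illustrative helper names, not CCXT functions.

```python
from datetime import datetime, timezone


def fetch_daily_candles(exchange_id: str, symbol: str, limit: int = 100) -> list:
    """Fetch raw daily candles through CCXT (network call, needs `pip install ccxt`)."""
    import ccxt  # imported lazily so candles_to_records works without the package

    exchange_class = getattr(ccxt, exchange_id)
    exchange = exchange_class({"enableRateLimit": True})
    return exchange.fetch_ohlcv(symbol, timeframe="1d", limit=limit)


def candles_to_records(candles: list) -> list:
    """Convert rows of [ms_timestamp, open, high, low, close, volume] into dicts."""
    records = []
    for ts_ms, open_, high, low, close, volume in candles:
        records.append(
            {
                # CCXT timestamps are milliseconds since the Unix epoch, in UTC.
                "time": datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc),
                "open": open_,
                "high": high,
                "low": low,
                "close": close,
                "volume": volume,
            }
        )
    return records
```

Typical usage would be `candles_to_records(fetch_daily_candles("kraken", "BTC/USDT"))`. The result reflects that one venue only, which is the fragmentation caveat discussed under Data-Quality Cautions.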

### Paid / institutional

- **Kaiko**, **Amberdata**, **CoinAPI** — professional aggregated market + on-chain data.
- **Glassnode**, **Nansen**, **Dune Analytics** — on-chain analytics (wallet flows, network metrics), valuable for crypto-specific factors. Dune also has a free tier for SQL queries over on-chain data.

## Practical Recommendations

| Need | Best starting point |
|------|--------------------|
| Quick prototyping (stocks) | `yfinance` or Tiingo |
| Fundamentals / financials | SEC EDGAR (authoritative) |
| Backtesting equities | Norgate (delisted stocks, point-in-time, adjusted) or Polygon |
| Crypto OHLCV / order books | CCXT + exchange APIs |
| Crypto metadata / market overview | CoinGecko |
| On-chain crypto | Glassnode / Dune |
| Macro context | FRED |

## Data-Quality Cautions

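- **Survivorship bias** — free stock sources usually omit delisted companies, which inflates backtest returns because only the survivors remain in the sample. Use datasets that keep delisted securities and point-in-time index membership (Norgate, CRSP) for rigorous work.
- **Corporate actions** — check whether the source returns raw or split/dividend-adjusted prices. Use adjusted prices for return calculations and raw prices when the actual traded level matters; mixing the two produces artificial jumps at split dates.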
- **Crypto fragmentation** — prices differ across exchanges; decide whether you want a single venue or a volume-weighted aggregate, and beware wash-traded volume on smaller exchanges.
- **Survivorship in crypto** — many tokens die, and aggregators often drop them, so a universe built from today's listings overstates historical performance.
- **Look-ahead bias** — fundamentals should be timestamped by *filing/availability* date, not the reporting period.
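
To make the corporate-actions caution concrete, here is a minimal sketch of back-adjusting raw closes for stock splits. It assumes closes are sorted oldest to newest, that each split is keyed by its ex-date, and that a ratio of `2.0` means a 2-for-1 split. Dividends are ignored here, and `back_adjust_for_splits` is an illustrative name.

```python
def back_adjust_for_splits(closes: list, splits: dict) -> list:
    """Divide every close before a split's ex-date by the split ratio.

    closes: list of (date, raw_close) tuples, oldest first.
    splits: mapping of ex-date -> ratio (2.0 for a 2-for-1 split).
    """
    factor = 1.0
    adjusted = []
    # Walk backwards so the cumulative factor covers all later splits.
    for day, close in reversed(closes):
        adjusted.append((day, close / factor))
        if day in splits:
            # The ex-date itself already trades post-split; only earlier
            # days need the extra adjustment.
            factor *= splits[day]
    adjusted.reverse()
    return adjusted
```

With a 2-for-1 split, a raw series of 100, 102, 51, 52 (ex-date on the third day) becomes 50, 51, 51, 52, which removes the artificial 50% drop from the return series.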
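
For the crypto-fragmentation point, one option is a volume-weighted price across venues. The sketch below assumes quotes arrive as `(venue, price, volume)` tuples for the same instant and asset; the `min_volume` filter is a crude, hypothetical guard and does not detect wash trading by itself.

```python
def volume_weighted_price(quotes: list, min_volume: float = 0.0) -> float:
    """Combine per-venue prices into one volume-weighted price.

    quotes: list of (venue, price, volume) tuples.
    min_volume: venues reporting less volume than this are ignored.
    """
    kept = [
        (price, volume)
        for _venue, price, volume in quotes
        if volume > 0 and volume >= min_volume
    ]
    total_volume = sum(volume for _price, volume in kept)
    if total_volume == 0:
        raise ValueError("no venue passed the volume filter")
    return sum(price * volume for price, volume in kept) / total_volume
```

For example, venues quoting 100.0 on volume 3.0 and 102.0 on volume 1.0 combine to 100.5. Whether reported volume can be trusted for a given venue is a separate judgment that the weighting cannot make for you.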
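
The look-ahead rule can be sketched as an "as-of" lookup: on any backtest date, use only the facts whose filing date has already passed. The record layout (`period_end`, `filed`, `value`) and the function name `as_of` are assumptions for illustration. Treating a filing as usable on its filing date is also an assumption; filings released after the close may call for a one-day lag.

```python
from bisect import bisect_right
from datetime import date


def as_of(facts: list, on: date):
    """Return the most recently filed fact available on `on`, or None."""
    ordered = sorted(facts, key=lambda fact: fact["filed"])
    filed_dates = [fact["filed"] for fact in ordered]
    # Index of the first filing dated strictly after `on`.
    index = bisect_right(filed_dates, on)
    return ordered[index - 1] if index else None


facts = [
    {"period_end": date(2023, 9, 30), "filed": date(2023, 11, 1), "value": 50},
    {"period_end": date(2023, 12, 31), "filed": date(2024, 2, 15), "value": 200},
]

# On 10 January 2024 the Q4 period has ended but its filing is not public yet,
# so the lookup still returns the Q3 figure.
known = as_of(facts, date(2024, 1, 10))
```

Joining on `period_end` instead would hand the backtest the Q4 number about five weeks before anyone could have traded on it.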
