Data¶
Imports required by the code examples below
>>> import numpy as np
>>> import pandas as pd
>>> from numba import njit
>>> import vectorbtpro as vbt
Feature-oriented data¶
- The main limitation of vectorbtpro's data class was that it could only store data in a symbol-oriented format, which meant that features such as OHLC had to be combined into a single DataFrame beforehand. This is somewhat counterproductive because vectorbtpro mostly works on these features separately. For example, when we call data.close, vectorbtpro searches for "close" columns across all symbols, extracts them, and concatenates them into another DataFrame. The data class has therefore been redesigned to natively store feature-oriented data as well!
Create a feature-oriented data instance from various portfolio time series
>>> data = vbt.YFData.pull(["AAPL", "MSFT", "GOOG"])
>>> pf = data.run("from_random_signals", n=[10, 20, 30])
>>> pf_data = vbt.Data.from_data(
...     vbt.feature_dict({
...         "cash": pf.cash,
...         "assets": pf.assets,
...         "asset_value": pf.asset_value,
...         "value": pf.value
...     })
... )
>>> pf_data.get(feature="cash", symbol=(10, "AAPL"))
Date
1980-12-12 05:00:00+00:00 100.000000
1980-12-15 05:00:00+00:00 100.000000
1980-12-16 05:00:00+00:00 100.000000
1980-12-17 05:00:00+00:00 100.000000
1980-12-18 05:00:00+00:00 100.000000
...
2023-08-25 04:00:00+00:00 81193.079771
2023-08-28 04:00:00+00:00 81193.079771
2023-08-29 04:00:00+00:00 81193.079771
2023-08-30 04:00:00+00:00 81193.079771
2023-08-31 04:00:00+00:00 81193.079771
Name: (10, AAPL), Length: 10770, dtype: float64
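The difference between the symbol-oriented and feature-oriented layouts can be illustrated without vectorbtpro at all. The following is a minimal sketch in plain Python, not the library's implementation; the nested dictionaries and the helper name to_feature_oriented are assumptions made purely for illustration.

```python
# Symbol-oriented layout: one record per symbol, features nested inside.
symbol_oriented = {
    "AAPL": {"open": [1.0, 2.0], "close": [1.5, 2.5]},
    "MSFT": {"open": [3.0, 4.0], "close": [3.5, 4.5]},
}


def to_feature_oriented(data):
    """Swap the nesting: {symbol: {feature: values}} -> {feature: {symbol: values}}."""
    out = {}
    for symbol, features in data.items():
        for feature, values in features.items():
            out.setdefault(feature, {})[symbol] = values
    return out


feature_oriented = to_feature_oriented(symbol_oriented)

# Reading one feature across all symbols is now a single lookup
# instead of a search through every symbol's record.
print(feature_oriented["close"])
```

In the feature-oriented layout, the per-feature lookup that data.close performs no longer requires collecting columns from every symbol first.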
Parallel data¶
- Data fetching and updating can be easily parallelized.
Benchmark fetching multiple symbols serially and concurrently
>>> symbols = ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"]
>>> with vbt.Timer() as timer:
...     data = vbt.YFData.pull(symbols)
>>> print(timer.elapsed())
4.52 seconds
>>> with vbt.Timer() as timer:
...     data = vbt.YFData.pull(symbols, execute_kwargs=dict(engine="threadpool"))
>>> print(timer.elapsed())
918.54 milliseconds
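Why a thread pool helps here can be shown with the standard library alone. The sketch below is an illustration under assumptions, not vectorbtpro's execution engine: fetch_symbol is a hypothetical stand-in that only sleeps to simulate network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_symbol(symbol):
    """Hypothetical stand-in for a network request: wait, then return a payload."""
    time.sleep(0.05)  # simulated network latency
    return symbol, f"data for {symbol}"


symbols = ["SPY", "TLT", "XLF", "XLE"]

# Serial: the total time is roughly the sum of all waits.
start = time.perf_counter()
serial = dict(fetch_symbol(s) for s in symbols)
serial_elapsed = time.perf_counter() - start

# Threaded: the waits overlap because each thread blocks on I/O independently.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(symbols)) as pool:
    threaded = dict(pool.map(fetch_symbol, symbols))  # map preserves input order
threaded_elapsed = time.perf_counter() - start

print(f"serial: {serial_elapsed:.3f}s, threaded: {threaded_elapsed:.3f}s")
```

Threads suit this workload because fetching is I/O-bound; the results are identical either way, only the waiting is overlapped.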
TradingView¶
- Welcome a new class specialized in pulling data from TradingView!
>>> data = vbt.TVData.pull(
...     "NASDAQ:AAPL",
...     timeframe="1 minute",
...     tz="US/Eastern"
... )
>>> data.get()
Open High Low Close Volume
datetime
2022-12-05 09:30:00-05:00 147.75 148.31 147.50 148.28 37769.0
2022-12-05 09:31:00-05:00 148.28 148.67 148.28 148.49 10525.0
2022-12-05 09:32:00-05:00 148.50 148.73 148.30 148.30 4860.0
2022-12-05 09:33:00-05:00 148.25 148.73 148.25 148.64 5306.0
2022-12-05 09:34:00-05:00 148.62 148.97 148.52 148.97 5808.0
... ... ... ... ... ...
2023-01-17 15:55:00-05:00 135.80 135.91 135.80 135.86 37573.0
2023-01-17 15:56:00-05:00 135.85 135.88 135.80 135.88 18796.0
2023-01-17 15:57:00-05:00 135.88 135.93 135.85 135.91 21019.0
2023-01-17 15:58:00-05:00 135.90 135.97 135.89 135.95 20934.0
2023-01-17 15:59:00-05:00 135.94 136.00 135.84 135.94 86696.0
[11310 rows x 5 columns]
Symbol search¶
- Most data classes can retrieve the full list of symbols available on an exchange and optionally filter it using either a glob pattern or a regular expression. This works for local data classes as well!
Get all XRP pairs listed on Binance
>>> vbt.BinanceData.list_symbols("XRP*")
{'XRPAUD',
'XRPBEARBUSD',
'XRPBEARUSDT',
'XRPBIDR',
'XRPBKRW',
'XRPBNB',
'XRPBRL',
'XRPBTC',
'XRPBULLBUSD',
'XRPBULLUSDT',
'XRPBUSD',
'XRPDOWNUSDT',
'XRPETH',
'XRPEUR',
'XRPGBP',
'XRPNGN',
'XRPPAX',
'XRPRUB',
'XRPTRY',
'XRPTUSD',
'XRPUPUSDT',
'XRPUSDC',
'XRPUSDT'}
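The two filtering styles mentioned above, glob patterns and regular expressions, can be sketched with the standard library. This is only an illustration of the matching semantics, not how list_symbols is implemented; the symbol list is hypothetical.

```python
import fnmatch
import re

# Hypothetical symbol list; a real exchange would return many more.
all_symbols = ["BTCUSDT", "ETHUSDT", "XRPBTC", "XRPUSDT", "XRPEUR", "ADAXRP"]

# Glob pattern: "*" matches any run of characters.
glob_matches = set(fnmatch.filter(all_symbols, "XRP*"))

# Equivalent regular expression, required to match the whole symbol.
pattern = re.compile(r"XRP.*")
regex_matches = {s for s in all_symbols if pattern.fullmatch(s)}

print(sorted(glob_matches))
```

Note that "ADAXRP" is excluded by both filters because the pattern must match from the start of the symbol.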
Symbol classes¶
- Because vectorbtpro takes advantage of multi-indexes in Pandas, you can associate each symbol with one or more classes, such as sectors. This allows you to analyze the performance of a trading strategy relative to each class.
Compare equal-weighted portfolios for three sectors
>>> classes = vbt.symbol_dict({
...     "MSFT": dict(sector="Technology"),
...     "GOOGL": dict(sector="Technology"),
...     "META": dict(sector="Technology"),
...     "JPM": dict(sector="Finance"),
...     "BAC": dict(sector="Finance"),
...     "WFC": dict(sector="Finance"),
...     "AMZN": dict(sector="Retail"),
...     "WMT": dict(sector="Retail"),
...     "BABA": dict(sector="Retail"),
... })
>>> data = vbt.YFData.pull(
...     list(classes.keys()),
...     classes=classes,
...     missing_index="drop"
... )
>>> pf = vbt.PF.from_orders(
...     data,
...     size=vbt.index_dict({0: 1 / 3}),  # (1)!
...     size_type="targetpercent",
...     group_by="sector",
...     cash_sharing=True
... )
>>> pf.value.vbt.plot().show()
- (1) Each group contains three assets, so allocate 33.3% to each asset at the first bar
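The grouping logic behind the example, where symbols are collected by class and each group is split equally, can be sketched in plain Python. This is an illustration only; the dictionary mirrors the one above, and the weights are computed by hand rather than by vectorbtpro.

```python
from collections import defaultdict

# Same shape as the class mapping above, written as plain dictionaries.
classes = {
    "MSFT": {"sector": "Technology"},
    "GOOGL": {"sector": "Technology"},
    "META": {"sector": "Technology"},
    "JPM": {"sector": "Finance"},
    "BAC": {"sector": "Finance"},
    "WFC": {"sector": "Finance"},
    "AMZN": {"sector": "Retail"},
    "WMT": {"sector": "Retail"},
    "BABA": {"sector": "Retail"},
}

# Collect the symbols that belong to each sector.
groups = defaultdict(list)
for symbol, attrs in classes.items():
    groups[attrs["sector"]].append(symbol)

# Equal weight within each group: every member gets 1 / (group size).
weights = {
    symbol: 1 / len(members)
    for members in groups.values()
    for symbol in members
}

for sector, members in groups.items():
    print(sector, members, sum(weights[s] for s in members))
```

With three members per sector, each weight comes out to 1/3, which is the value passed as size in the example above.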
Runnable data¶
- Tired of figuring out which arguments an indicator requires? Data instances can now recognize the arguments of an indicator, or of any function in general, map them to column names, and run the function by passing the required columns. You can also change the mapping, override indicator parameters, and query indicators by name: the data instance searches all integrated indicator packages and returns the first (and best) match found!
Run Stochastic RSI by data
>>> data = vbt.YFData.pull("BTC-USD")
>>> stochrsi = data.run("stochrsi")
>>> stochrsi.fastd
Date
2014-09-17 00:00:00+00:00 NaN
2014-09-18 00:00:00+00:00 NaN
2014-09-19 00:00:00+00:00 NaN
2014-09-20 00:00:00+00:00 NaN
2014-09-21 00:00:00+00:00 NaN
...
2023-01-15 00:00:00+00:00 96.168788
2023-01-16 00:00:00+00:00 91.733393
2023-01-17 00:00:00+00:00 78.295255
2023-01-18 00:00:00+00:00 48.793133
2023-01-20 00:00:00+00:00 26.242474
Name: Close, Length: 3047, dtype: float64
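The idea of matching a function's argument names to column names can be sketched with inspect.signature. This is a simplified illustration, not vectorbtpro's implementation; run_on_columns and typical_price are hypothetical names, and columns are plain lists rather than Series.

```python
import inspect


def typical_price(high, low, close):
    """Example function whose argument names match column names."""
    return [(h + l + c) / 3 for h, l, c in zip(high, low, close)]


def run_on_columns(func, columns, **overrides):
    """Call func with the columns whose names match its parameters.

    Explicit overrides win over columns; parameters with defaults may be omitted.
    """
    kwargs = {}
    for name, param in inspect.signature(func).parameters.items():
        if name in overrides:
            kwargs[name] = overrides[name]
        elif name in columns:
            kwargs[name] = columns[name]
        elif param.default is inspect.Parameter.empty:
            raise TypeError(f"no column or override for required argument {name!r}")
    return func(**kwargs)


columns = {
    "open": [10.0, 11.0],
    "high": [12.0, 13.0],
    "low": [9.0, 10.0],
    "close": [11.0, 12.0],
}

# Only high, low, and close are passed; open is ignored because
# typical_price does not ask for it.
result = run_on_columns(typical_price, columns)
print(result)
```

The overrides argument plays the role of changing the mapping: passing close=columns["open"], for example, would feed the open prices in place of the close prices.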
Data transformation¶
- You've fetched some data, but how do you change it? A new method puts all symbols into a single DataFrame and passes it to a user-defined function (UDF) for transformation.
Remove weekends
>>> data = vbt.YFData.pull(["BTC-USD", "ETH-USD"], start="2020-01", end="2020-01-14")
>>> new_data = data.transform(lambda df: df[~df.index.weekday.isin([5, 6])])
>>> new_data.close
symbol BTC-USD ETH-USD
Date
2020-01-01 00:00:00+00:00 7200.174316 130.802002
2020-01-02 00:00:00+00:00 6985.470215 127.410179
2020-01-03 00:00:00+00:00 7344.884277 134.171707
2020-01-06 00:00:00+00:00 7769.219238 144.304153
2020-01-07 00:00:00+00:00 8163.692383 143.543991
2020-01-08 00:00:00+00:00 8079.862793 141.258133
2020-01-09 00:00:00+00:00 7879.071289 138.979202
2020-01-10 00:00:00+00:00 8166.554199 143.963776
2020-01-13 00:00:00+00:00 8144.194336 144.226593
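The weekday filter used above can be checked with the standard library. The sketch below rebuilds the same date range by hand (with an exclusive end of 2020-01-14, which is an assumption about how the end date is interpreted) and applies the same rule that drops weekday numbers 5 and 6.

```python
from datetime import date, timedelta

start, end = date(2020, 1, 1), date(2020, 1, 14)  # end treated as exclusive
days = [start + timedelta(days=i) for i in range((end - start).days)]

# date.weekday() returns 0 for Monday through 6 for Sunday,
# so 5 and 6 are Saturday and Sunday.
weekdays_only = [d for d in days if d.weekday() not in (5, 6)]

print([d.isoformat() for d in weekdays_only])
```

The surviving dates match the index of the transformed data shown above.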
Synthetic OHLC¶
- There are new basic models for synthetic OHLC data generation - especially useful for leakage detection.
Generate 3 months of synthetic data using Geometric Brownian Motion
>>> data = vbt.GBMOHLCData.pull("R", start="2022-01", end="2022-04")
>>> data.plot().show()
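For intuition, one common way to build synthetic OHLC bars from Geometric Brownian Motion is to simulate a finer-grained price path and aggregate it into bars. The sketch below follows that approach with the standard library; whether GBMOHLCData works this way internally is an assumption, and the function names and parameters are hypothetical.

```python
import math
import random


def gbm_ticks(n, s0=100.0, mu=0.0, sigma=0.01, seed=42):
    """Simulate n prices with S[t+1] = S[t] * exp(mu - sigma^2 / 2 + sigma * Z)."""
    rng = random.Random(seed)
    prices = [s0]
    for _ in range(n - 1):
        z = rng.gauss(0.0, 1.0)  # standard normal shock
        prices.append(prices[-1] * math.exp(mu - 0.5 * sigma**2 + sigma * z))
    return prices


def to_ohlc(prices, ticks_per_bar):
    """Group consecutive ticks into bars of (open, high, low, close)."""
    bars = []
    for i in range(0, len(prices), ticks_per_bar):
        chunk = prices[i:i + ticks_per_bar]
        bars.append((chunk[0], max(chunk), min(chunk), chunk[-1]))
    return bars


bars = to_ohlc(gbm_ticks(500), ticks_per_bar=50)
print(len(bars), bars[0])
```

Because the price path is random by construction, a strategy that appears to predict it consistently is a hint that future information is leaking into the signal.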
Data saver¶
- Imagine a script that periodically pulls the latest data from an exchange and saves it to disk, all without your intervention. vectorbtpro implements two classes that do just that: one saves to CSV and the other to HDF.
BTCUSDT_1m_saver.py
import vectorbtpro as vbt
import logging
logging.basicConfig(level=logging.INFO)
if __name__ == "__main__":
    if vbt.CSVDataSaver.file_exists():
        # Resume from the saver that was persisted by a previous run
        csv_saver = vbt.CSVDataSaver.load()
        csv_saver.update()
        init_save = False
    else:
        # First run: pull the initial data and create a new saver
        data = vbt.BinanceData.pull(
            "BTCUSDT",
            start="10 minutes ago UTC",
            timeframe="1 minute"
        )
        csv_saver = vbt.CSVDataSaver(data)
        init_save = True
    csv_saver.update_every(1, "minute", init_save=init_save)
    csv_saver.save()  # (1)!
- (1) The CSV data saver stores only the latest data update, which serves as the starting point for the next update. Save the saver so that it can be reused in the next run
Run in console and then interrupt
$ python BTCUSDT_1m_saver.py
2023-02-01 12:26:36.744000+00:00 - 2023-02-01 12:36:00+00:00: : 1it [00:01, 1.22s/it]
INFO:vectorbtpro.data.saver:Saved initial 10 rows from 2023-02-01 12:27:00+00:00 to 2023-02-01 12:36:00+00:00
INFO:vectorbtpro.utils.schedule_:Starting schedule manager with jobs [Every 1 minute do update(save_kwargs=None) (last run: [never], next run: 2023-02-01 13:37:38)]
INFO:vectorbtpro.data.saver:Saved 2 rows from 2023-02-01 12:36:00+00:00 to 2023-02-01 12:37:00+00:00
INFO:vectorbtpro.data.saver:Saved 2 rows from 2023-02-01 12:37:00+00:00 to 2023-02-01 12:38:00+00:00
INFO:vectorbtpro.utils.schedule_:Stopping schedule manager
Run in console again to continue
$ python BTCUSDT_1m_saver.py
INFO:vectorbtpro.utils.schedule_:Starting schedule manager with jobs [Every 1 minute do update(save_kwargs=None) (last run: [never], next run: 2023-02-01 13:42:08)]
INFO:vectorbtpro.data.saver:Saved 5 rows from 2023-02-01 12:38:00+00:00 to 2023-02-01 12:42:00+00:00
INFO:vectorbtpro.utils.schedule_:Stopping schedule manager
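In the log above, each update saves rows starting at the last timestamp of the previous save, which suggests that the most recent bar is re-fetched and overwritten in case it was still forming. The sketch below illustrates that overlap-and-overwrite idea with the standard library; it is a reading of the log rather than the CSVDataSaver implementation, and merge_update, the file name, and the row values are hypothetical.

```python
import csv
import os
import tempfile


def merge_update(path, new_rows):
    """Merge new_rows of (timestamp, close) into the CSV file at path.

    Rows with an already saved timestamp overwrite the old values, so a bar
    that was still forming during the previous update gets corrected.
    """
    rows = {}
    if os.path.exists(path):
        with open(path, newline="") as f:
            rows = {ts: close for ts, close in csv.reader(f)}
    rows.update({ts: str(close) for ts, close in new_rows})
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(sorted(rows.items()))
    return len(rows)


path = os.path.join(tempfile.mkdtemp(), "BTCUSDT.csv")

# Initial save, then an update whose first row repeats the last saved timestamp.
merge_update(path, [("12:35", 100.0), ("12:36", 101.0)])
total = merge_update(path, [("12:36", 101.5), ("12:37", 102.0)])
print(total)
```

For simplicity this sketch rewrites the whole file on every update; a production saver would more likely append and only rewrite the overlapping tail.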
Polygon.io¶
- Welcome a new class specialized in pulling data from Polygon.io!
Get one month of 30-minute AAPL data from Polygon.io
>>> vbt.PolygonData.set_custom_settings(
...     client_config=dict(
...         api_key="YOUR_API_KEY"
...     )
... )
>>> data = vbt.PolygonData.pull(
...     "AAPL",
...     start="2022-12-01",  # (1)!
...     end="2023-01-01",
...     timeframe="30 minutes",
...     tz="US/Eastern"
... )
>>> data.get()
Open High Low Close Volume \
Open time
2022-12-01 04:00:00-05:00 148.08 148.08 147.04 147.3700 50886.0
2022-12-01 04:30:00-05:00 147.37 147.37 147.12 147.2600 16575.0
2022-12-01 05:00:00-05:00 147.31 147.51 147.20 147.3800 20753.0
2022-12-01 05:30:00-05:00 147.43 147.56 147.38 147.3800 7388.0
2022-12-01 06:00:00-05:00 147.30 147.38 147.24 147.2400 7416.0
... ... ... ... ... ...
2022-12-30 17:30:00-05:00 129.94 130.05 129.91 129.9487 35694.0
2022-12-30 18:00:00-05:00 129.95 130.00 129.94 129.9500 15595.0
2022-12-30 18:30:00-05:00 129.94 130.05 129.94 130.0100 20287.0
2022-12-30 19:00:00-05:00 129.99 130.04 129.99 130.0000 12490.0
2022-12-30 19:30:00-05:00 130.00 130.04 129.97 129.9700 28271.0
Trade count VWAP
Open time
2022-12-01 04:00:00-05:00 1024 147.2632
2022-12-01 04:30:00-05:00 412 147.2304
2022-12-01 05:00:00-05:00 306 147.3466
2022-12-01 05:30:00-05:00 201 147.4818
2022-12-01 06:00:00-05:00 221 147.2938
... ... ...
2022-12-30 17:30:00-05:00 350 129.9672
2022-12-30 18:00:00-05:00 277 129.9572
2022-12-30 18:30:00-05:00 312 130.0034
2022-12-30 19:00:00-05:00 176 130.0140
2022-12-30 19:30:00-05:00 366 129.9941
[672 rows x 7 columns]
- (1) In the timezone provided via tz
Alpha Vantage¶
- Welcome a new class specialized in pulling data from Alpha Vantage!
Get Stochastic RSI of IBM from Alpha Vantage
>>> data = vbt.AVData.pull(
...     "IBM",
...     category="technical-indicators",
...     function="STOCHRSI",
...     params=dict(fastkperiod=14)
... )
>>> data.get()
FastD FastK
1999-12-07 00:00:00+00:00 100.0000 100.0000
1999-12-08 00:00:00+00:00 100.0000 100.0000
1999-12-09 00:00:00+00:00 77.0255 31.0765
1999-12-10 00:00:00+00:00 43.6922 0.0000
1999-12-13 00:00:00+00:00 12.0197 4.9826
... ... ...
2023-01-26 00:00:00+00:00 11.7960 0.0000
2023-01-27 00:00:00+00:00 3.7773 0.0000
2023-01-30 00:00:00+00:00 4.4824 13.4471
2023-01-31 00:00:00+00:00 7.8258 10.0302
2023-02-01 16:00:01+00:00 13.0966 15.8126
[5826 rows x 2 columns]
Nasdaq Data Link¶
- Welcome a new class specialized in pulling data from Nasdaq Data Link!
Get Index of Consumer Sentiment from Nasdaq Data Link
>>> data = vbt.NDLData.pull("UMICH/SOC1")
>>> data.get()
Index
Date
1952-11-30 00:00:00+00:00 86.2
1953-02-28 00:00:00+00:00 90.7
1953-08-31 00:00:00+00:00 80.8
1953-11-30 00:00:00+00:00 80.7
1954-02-28 00:00:00+00:00 82.0
... ...
2022-08-31 00:00:00+00:00 58.2
2022-09-30 00:00:00+00:00 58.6
2022-10-31 00:00:00+00:00 59.9
2022-11-30 00:00:00+00:00 56.8
2022-12-31 00:00:00+00:00 59.7
[632 rows x 1 columns]
Data merging¶
- Often, there's a need to backtest symbols from different exchanges by putting them into the same basket. For this, vectorbtpro provides a class method that merges multiple data instances into a single one. Not only can you combine multiple symbols, but you can also merge datasets that correspond to a single symbol, all done automatically!
Pull BTC datasets from various exchanges and plot them relative to their mean
>>> binance_data = vbt.CCXTData.pull("BTCUSDT", exchange="binance")
>>> bybit_data = vbt.CCXTData.pull("BTCUSDT", exchange="bybit")
>>> bitfinex_data = vbt.CCXTData.pull("BTC/USDT", exchange="bitfinex")
>>> kucoin_data = vbt.CCXTData.pull("BTC-USDT", exchange="kucoin")
>>> data = vbt.Data.merge([
...     binance_data.rename({"BTCUSDT": "Binance"}),
...     bybit_data.rename({"BTCUSDT": "Bybit"}),
...     bitfinex_data.rename({"BTC/USDT": "Bitfinex"}),
...     kucoin_data.rename({"BTC-USDT": "KuCoin"}),
... ], missing_index="drop", silence_warnings=True)
>>> @njit
... def rescale_nb(x):
...     return (x - x.mean()) / x.mean()
>>> rescaled_close = data.close.vbt.row_apply(rescale_nb)
>>> rescaled_close = rescaled_close.vbt.rolling_mean(30)
>>> rescaled_close.loc["2020":"2020"].vbt.plot().show()
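The two steps in this example, aligning the datasets on a shared index and rescaling each row relative to its mean, can be sketched in plain Python. The prices below are hypothetical, and treating missing_index="drop" as keeping only the timestamps present in every dataset is an assumption about its behavior.

```python
# Hypothetical close prices per exchange, keyed by date.
binance = {"2020-01-01": 7200.0, "2020-01-02": 6960.0, "2020-01-03": 7340.0}
bybit = {"2020-01-02": 6970.0, "2020-01-03": 7350.0, "2020-01-04": 7360.0}
sources = {"Binance": binance, "Bybit": bybit}

# Keep only the timestamps present in every source (assumed "drop" behavior).
common_index = sorted(set.intersection(*(set(s) for s in sources.values())))

# Rescale each row relative to its cross-exchange mean: (x - mean) / mean.
rescaled = {}
for ts in common_index:
    row = {name: series[ts] for name, series in sources.items()}
    mean = sum(row.values()) / len(row)
    rescaled[ts] = {name: (x - mean) / mean for name, x in row.items()}

for ts, row in rescaled.items():
    print(ts, row)
```

By construction, the rescaled values in each row sum to zero, so the plot shows how far each exchange deviates from the cross-exchange average at each point in time.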
Alpaca¶
- Welcome a new class specialized in pulling data from Alpaca!
Get one week of adjusted 1-minute AAPL data from Alpaca
>>> vbt.AlpacaData.set_custom_settings(
...     client_config=dict(
...         api_key="YOUR_API_KEY",
...         secret_key="YOUR_API_SECRET"
...     )
... )
>>> data = vbt.AlpacaData.pull(
...     "AAPL",
...     start="one week ago 00:00",  # (1)!
...     end="15 minutes ago",  # (2)!
...     timeframe="1 minute",
...     adjustment="all",
...     tz="US/Eastern"
... )
>>> data.get()
Open High Low Close Volume \
Open time
2023-01-30 04:00:00-05:00 145.5400 145.5400 144.0100 144.0200 5452.0
2023-01-30 04:01:00-05:00 144.0800 144.0800 144.0000 144.0500 3616.0
2023-01-30 04:02:00-05:00 144.0300 144.0400 144.0100 144.0100 1671.0
2023-01-30 04:03:00-05:00 144.0100 144.0300 144.0000 144.0300 4721.0
2023-01-30 04:04:00-05:00 144.0200 144.0200 144.0200 144.0200 1343.0
... ... ... ... ... ...
2023-02-03 19:54:00-05:00 154.3301 154.3301 154.3301 154.3301 347.0
2023-02-03 19:55:00-05:00 154.3300 154.3400 154.3200 154.3400 1438.0
2023-02-03 19:56:00-05:00 154.3400 154.3400 154.3300 154.3300 588.0
2023-02-03 19:58:00-05:00 154.3500 154.3500 154.3500 154.3500 555.0
2023-02-03 19:59:00-05:00 154.3400 154.3900 154.3300 154.3900 3835.0
Trade count VWAP
Open time
2023-01-30 04:00:00-05:00 165 144.376126
2023-01-30 04:01:00-05:00 81 144.036336
2023-01-30 04:02:00-05:00 52 144.035314
2023-01-30 04:03:00-05:00 56 144.012680
2023-01-30 04:04:00-05:00 40 144.021854
... ... ...
2023-02-03 19:54:00-05:00 21 154.331340
2023-02-03 19:55:00-05:00 38 154.331756
2023-02-03 19:56:00-05:00 17 154.338971
2023-02-03 19:58:00-05:00 27 154.343090
2023-02-03 19:59:00-05:00 58 154.357219
[4224 rows x 7 columns]
- (1) In the timezone provided via tz
- (2) Remove if you have a paid plan
Local data¶
- Once remote data has been fetched, you will most likely want to persist it on disk. There are two new options for this: either serialize the entire data instance, or save the actual data to CSV or HDF5. Each dataset can be stored in a single flat file, which makes it easier to work with than a database. After saving, the data can be loaded back either by deserializing it or by using data classes that specialize in loading data from CSV and HDF5 files. These classes support a variety of features, including filtering by row and datetime ranges, updating, chunking, and even a smart dataset search that can traverse sub-directories recursively and return the datasets that match a specific glob pattern or regular expression.
Fetch and save symbols separately, then load them back jointly
>>> btc_data = vbt.BinanceData.pull("BTCUSDT")
>>> eth_data = vbt.BinanceData.pull("ETHUSDT")
>>> btc_data.to_hdf()
>>> eth_data.to_hdf()
>>> data = vbt.BinanceData.from_hdf(start="2020", end="2021")
>>> data.close
symbol BTCUSDT ETHUSDT
Open time
2020-01-01 00:00:00+00:00 7200.85 130.77
2020-01-02 00:00:00+00:00 6965.71 127.19
2020-01-03 00:00:00+00:00 7344.96 134.35
2020-01-04 00:00:00+00:00 7354.11 134.20
2020-01-05 00:00:00+00:00 7358.75 135.37
... ... ...
2020-12-27 00:00:00+00:00 26281.66 685.11
2020-12-28 00:00:00+00:00 27079.41 730.41
2020-12-29 00:00:00+00:00 27385.00 732.00
2020-12-30 00:00:00+00:00 28875.54 752.17
2020-12-31 00:00:00+00:00 28923.63 736.42
[366 rows x 2 columns]
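The recursive dataset search mentioned above can be approximated with pathlib. The sketch below creates a throwaway directory tree and finds the files matching a glob pattern in any sub-directory; it illustrates the idea only, and the directory layout and file names are hypothetical.

```python
import tempfile
from pathlib import Path

# Build a small temporary directory tree with empty placeholder datasets.
root = Path(tempfile.mkdtemp())
for rel in ["BTCUSDT.csv", "crypto/ETHUSDT.csv", "crypto/old/ETHBTC.csv", "stocks/AAPL.csv"]:
    file = root / rel
    file.parent.mkdir(parents=True, exist_ok=True)
    file.write_text("Open time,Close\n")

# rglob walks sub-directories recursively and applies the glob to file names.
matches = sorted(p.stem for p in root.rglob("ETH*.csv"))
print(matches)
```

Using the file stem as the symbol name is a convention assumed here; it is what allows separately saved files to be loaded back jointly as one dataset per symbol.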