Data

Imports required by the code examples below
>>> import numpy as np
>>> import pandas as pd
>>> from numba import njit
>>> import vectorbtpro as vbt

Feature-oriented data

  • The main limitation of vectorbtpro's data class was that it could only store data in a symbol-oriented format, which meant that features such as OHLC had to be put into a single DataFrame per symbol beforehand. This is somewhat counterproductive, since in vectorbtpro we mainly work with these features separately. For example, when we call data.close, vectorbtpro actually searches for "close" columns across all symbols, extracts them, and concatenates them into another DataFrame. Thus, the data class has been redesigned to natively store feature-oriented data as well!
Create a feature-oriented data instance from various portfolio time series
>>> data = vbt.YFData.pull(["AAPL", "MSFT", "GOOG"])
>>> pf = data.run("from_random_signals", n=[10, 20, 30])

>>> pf_data = vbt.Data.from_data(
...     vbt.feature_dict({
...         "cash": pf.cash,
...         "assets": pf.assets,
...         "asset_value": pf.asset_value,
...         "value": pf.value
...     })
... )
>>> pf_data.get(feature="cash", symbol=(10, "AAPL"))
Date
1980-12-12 05:00:00+00:00      100.000000
1980-12-15 05:00:00+00:00      100.000000
1980-12-16 05:00:00+00:00      100.000000
1980-12-17 05:00:00+00:00      100.000000
1980-12-18 05:00:00+00:00      100.000000
                                      ...
2023-08-25 04:00:00+00:00    81193.079771
2023-08-28 04:00:00+00:00    81193.079771
2023-08-29 04:00:00+00:00    81193.079771
2023-08-30 04:00:00+00:00    81193.079771
2023-08-31 04:00:00+00:00    81193.079771
Name: (10, AAPL), Length: 10770, dtype: float64
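
The same mechanism works for any feature-keyed collection of DataFrames whose columns act as symbols. Below is a minimal sketch with made-up toy data (the frames and names are purely illustrative) that builds a feature-oriented instance from plain DataFrames and queries one feature for one symbol.
Build a feature-oriented data instance from plain DataFrames
>>> index = pd.date_range("2020-01-01", periods=3, tz="UTC")
>>> close_df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=index)
>>> volume_df = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60]}, index=index)
>>> toy_data = vbt.Data.from_data(
...     vbt.feature_dict({
...         "close": close_df,
...         "volume": volume_df
...     })
... )
>>> toy_data.get(feature="close", symbol="A")  # per-symbol Series of a single feature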

Parallel data

  • Data fetching and updating can be easily parallelized.
Benchmark fetching multiple symbols serially and concurrently
>>> symbols = ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"]

>>> with vbt.Timer() as timer:
...     data = vbt.YFData.pull(symbols)
>>> print(timer.elapsed())
4.52 seconds

>>> with vbt.Timer() as timer:
...     data = vbt.YFData.pull(symbols, execute_kwargs=dict(engine="threadpool"))
>>> print(timer.elapsed())
918.54 milliseconds
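
Updating works the same way: each symbol is refreshed independently, so the work can be distributed over threads too. A minimal sketch, assuming update() forwards execute_kwargs to the executor the same way pull() does.
Update the pulled symbols concurrently
>>> data = data.update(execute_kwargs=dict(engine="threadpool"))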

TradingView

  • Welcome a new class specialized in pulling data from TradingView!
>>> data = vbt.TVData.pull(
...     "NASDAQ:AAPL",
...     timeframe="1 minute",
...     tz="US/Eastern"
... )
>>> data.get()
                             Open    High     Low   Close   Volume
datetime                                                          
2022-12-05 09:30:00-05:00  147.75  148.31  147.50  148.28  37769.0
2022-12-05 09:31:00-05:00  148.28  148.67  148.28  148.49  10525.0
2022-12-05 09:32:00-05:00  148.50  148.73  148.30  148.30   4860.0
2022-12-05 09:33:00-05:00  148.25  148.73  148.25  148.64   5306.0
2022-12-05 09:34:00-05:00  148.62  148.97  148.52  148.97   5808.0
...                           ...     ...     ...     ...      ...
2023-01-17 15:55:00-05:00  135.80  135.91  135.80  135.86  37573.0
2023-01-17 15:56:00-05:00  135.85  135.88  135.80  135.88  18796.0
2023-01-17 15:57:00-05:00  135.88  135.93  135.85  135.91  21019.0
2023-01-17 15:58:00-05:00  135.90  135.97  135.89  135.95  20934.0
2023-01-17 15:59:00-05:00  135.94  136.00  135.84  135.94  86696.0

[11310 rows x 5 columns]

  • Most data classes can retrieve the full list of symbols available at an exchange and optionally filter that list using either a glob or a regular expression pattern. This works for local data classes as well!
Get all XRP pairs listed on Binance
>>> vbt.BinanceData.list_symbols("XRP*")
{'XRPAUD',
 'XRPBEARBUSD',
 'XRPBEARUSDT',
 'XRPBIDR',
 'XRPBKRW',
 'XRPBNB',
 'XRPBRL',
 'XRPBTC',
 'XRPBULLBUSD',
 'XRPBULLUSDT',
 'XRPBUSD',
 'XRPDOWNUSDT',
 'XRPETH',
 'XRPEUR',
 'XRPGBP',
 'XRPNGN',
 'XRPPAX',
 'XRPRUB',
 'XRPTRY',
 'XRPTUSD',
 'XRPUPUSDT',
 'XRPUSDC',
 'XRPUSDT'}
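
Since the result is just a set of symbol names, it can be fed straight back into pull. A small sketch that narrows the pattern to USDT-quoted pairs and pulls them all at once.
Pull all XRP pairs quoted in USDT
>>> xrp_usdt = vbt.BinanceData.list_symbols("XRP*USDT")
>>> data = vbt.BinanceData.pull(list(xrp_usdt))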

Symbol classes

  • Thanks to vectorbtpro taking advantage of multi-indexes in Pandas, you can associate each symbol with one or more classes, such as sectors. This lets you analyze the performance of a trading strategy relative to each class, as sketched below.
Compare equal-weighted portfolios for three sectors
>>> classes = vbt.symbol_dict({
...     "MSFT": dict(sector="Technology"),
...     "GOOGL": dict(sector="Technology"),
...     "META": dict(sector="Technology"),
...     "JPM": dict(sector="Finance"),
...     "BAC": dict(sector="Finance"),
...     "WFC": dict(sector="Finance"),
...     "AMZN": dict(sector="Retail"),
...     "WMT": dict(sector="Retail"),
...     "BABA": dict(sector="Retail"),
... })
>>> data = vbt.YFData.pull(
...     list(classes.keys()), 
...     classes=classes,
...     missing_index="drop"
... )
>>> pf = vbt.PF.from_orders(
...     data, 
...     size=vbt.index_dict({0: 1 / 3}),  # (1)!
...     size_type="targetpercent",
...     group_by="sector", 
...     cash_sharing=True
... )
>>> pf.value.vbt.plot().show()
  1. There are three assets in each group - allocate 33.3% to each asset at the first bar
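
Because the portfolio above is grouped by sector, any grouped metric comes back with one value per class rather than per symbol. A quick sketch building on the same pf object:
Compare key metrics across sectors
>>> pf.total_return   # one value per sector
>>> pf.sharpe_ratio   # one value per sector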

Runnable data

  • Tired of figuring out which arguments are required by an indicator? Data instances can now recognize the arguments of an indicator (or of just about any function), map them to column names, and run the function by passing in the required columns. You can also change the mapping, override indicator parameters (see the sketch further below), and query indicators by name - the data instance will search for the indicator across all integrated indicator packages and return the first (and best) one found!
Run Stochastic RSI by data
>>> data = vbt.YFData.pull("BTC-USD")
>>> stochrsi = data.run("stochrsi")
>>> stochrsi.fastd
Date
2014-09-17 00:00:00+00:00          NaN
2014-09-18 00:00:00+00:00          NaN
2014-09-19 00:00:00+00:00          NaN
2014-09-20 00:00:00+00:00          NaN
2014-09-21 00:00:00+00:00          NaN
                                   ...
2023-01-15 00:00:00+00:00    96.168788
2023-01-16 00:00:00+00:00    91.733393
2023-01-17 00:00:00+00:00    78.295255
2023-01-18 00:00:00+00:00    48.793133
2023-01-20 00:00:00+00:00    26.242474
Name: Close, Length: 3047, dtype: float64
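
Overriding parameters works the same way - extra keyword arguments are forwarded to the resolved indicator. A minimal sketch, assuming that the name "rsi" resolves to vectorbtpro's own RSI indicator, which accepts a window parameter.
Run RSI with a custom window by data
>>> rsi = data.run("rsi", window=21)
>>> rsi.rsi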

Data transformation

  • You've fetched some data - but how do you change it? There's a new method that puts all symbols into a single DataFrame and passes this DataFrame to a UDF for transformation.
Remove weekends
>>> data = vbt.YFData.pull(["BTC-USD", "ETH-USD"], start="2020-01", end="2020-01-14")
>>> new_data = data.transform(lambda df: df[~df.index.weekday.isin([5, 6])])
>>> new_data.close
symbol                         BTC-USD     ETH-USD
Date                                              
2020-01-01 00:00:00+00:00  7200.174316  130.802002
2020-01-02 00:00:00+00:00  6985.470215  127.410179
2020-01-03 00:00:00+00:00  7344.884277  134.171707
2020-01-06 00:00:00+00:00  7769.219238  144.304153
2020-01-07 00:00:00+00:00  8163.692383  143.543991
2020-01-08 00:00:00+00:00  8079.862793  141.258133
2020-01-09 00:00:00+00:00  7879.071289  138.979202
2020-01-10 00:00:00+00:00  8166.554199  143.963776
2020-01-13 00:00:00+00:00  8144.194336  144.226593
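
Any transformation that returns a compatible DataFrame works the same way - for instance, forward-filling gaps across all symbols at once (a quick sketch).
Forward-fill missing values across all symbols
>>> filled_data = data.transform(lambda df: df.ffill())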

Synthetic OHLC

  • There are new basic models for synthetic OHLC data generation - especially useful for leakage detection.
Generate 3 months of synthetic data using Geometric Brownian Motion
>>> data = vbt.GBMOHLCData.pull("R", start="2022-01", end="2022-04")
>>> data.plot().show()

Data saver

  • Imagine a script that periodically pulls the latest data from an exchange and saves it to disk, all without your intervention. vectorbtpro implements two classes that can do just this: one that saves to CSV and another one that saves to HDF (a sketch of the HDF variant follows the console output below).
BTCUSDT_1m_saver.py
import vectorbtpro as vbt

import logging
logging.basicConfig(level=logging.INFO)

if __name__ == "__main__":
    if vbt.CSVDataSaver.file_exists():
        csv_saver = vbt.CSVDataSaver.load()
        csv_saver.update()
        init_save = False
    else:
        data = vbt.BinanceData.pull(
            "BTCUSDT", 
            start="10 minutes ago UTC",
            timeframe="1 minute"
        )
        csv_saver = vbt.CSVDataSaver(data)
        init_save = True
    csv_saver.update_every(1, "minute", init_save=init_save)
    csv_saver.save()  # (1)!
  1. The CSV data saver stores only the latest data update, which acts as the starting point of the next update - save it so it can be re-used in the next runtime
Run in console and then interrupt
$ python BTCUSDT_1m_saver.py
2023-02-01 12:26:36.744000+00:00 - 2023-02-01 12:36:00+00:00: : 1it [00:01,  1.22s/it]
INFO:vectorbtpro.data.saver:Saved initial 10 rows from 2023-02-01 12:27:00+00:00 to 2023-02-01 12:36:00+00:00
INFO:vectorbtpro.utils.schedule_:Starting schedule manager with jobs [Every 1 minute do update(save_kwargs=None) (last run: [never], next run: 2023-02-01 13:37:38)]
INFO:vectorbtpro.data.saver:Saved 2 rows from 2023-02-01 12:36:00+00:00 to 2023-02-01 12:37:00+00:00
INFO:vectorbtpro.data.saver:Saved 2 rows from 2023-02-01 12:37:00+00:00 to 2023-02-01 12:38:00+00:00
INFO:vectorbtpro.utils.schedule_:Stopping schedule manager
Run in console again to continue
$ python BTCUSDT_1m_saver.py
INFO:vectorbtpro.utils.schedule_:Starting schedule manager with jobs [Every 1 minute do update(save_kwargs=None) (last run: [never], next run: 2023-02-01 13:42:08)]
INFO:vectorbtpro.data.saver:Saved 5 rows from 2023-02-01 12:38:00+00:00 to 2023-02-01 12:42:00+00:00
INFO:vectorbtpro.utils.schedule_:Stopping schedule manager
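
The HDF variant follows the same script structure - only the saver class changes. A sketch of the relevant lines, assuming vbt.HDFDataSaver mirrors the CSV saver's interface, with data pulled exactly as above.
Same saver script using HDF instead of CSV
hdf_saver = vbt.HDFDataSaver(data)  # data pulled as in the CSV script
hdf_saver.update_every(1, "minute", init_save=True)
hdf_saver.save()  # persist the latest update for the next runtime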

Polygon.io

  • Welcome a new class specialized in pulling data from Polygon.io!
Get one month of 30-minute AAPL data from Polygon.io
>>> vbt.PolygonData.set_custom_settings(
...     client_config=dict(
...         api_key="YOUR_API_KEY"
...     )
... )
>>> data = vbt.PolygonData.pull(
...     "AAPL",
...     start="2022-12-01",  # (1)!
...     end="2023-01-01",
...     timeframe="30 minutes",
...     tz="US/Eastern"
... )
>>> data.get()
                             Open    High     Low     Close   Volume  \
Open time                                                              
2022-12-01 04:00:00-05:00  148.08  148.08  147.04  147.3700  50886.0   
2022-12-01 04:30:00-05:00  147.37  147.37  147.12  147.2600  16575.0   
2022-12-01 05:00:00-05:00  147.31  147.51  147.20  147.3800  20753.0   
2022-12-01 05:30:00-05:00  147.43  147.56  147.38  147.3800   7388.0   
2022-12-01 06:00:00-05:00  147.30  147.38  147.24  147.2400   7416.0   
...                           ...     ...     ...       ...      ...   
2022-12-30 17:30:00-05:00  129.94  130.05  129.91  129.9487  35694.0   
2022-12-30 18:00:00-05:00  129.95  130.00  129.94  129.9500  15595.0   
2022-12-30 18:30:00-05:00  129.94  130.05  129.94  130.0100  20287.0   
2022-12-30 19:00:00-05:00  129.99  130.04  129.99  130.0000  12490.0   
2022-12-30 19:30:00-05:00  130.00  130.04  129.97  129.9700  28271.0   

                           Trade count      VWAP  
Open time                                         
2022-12-01 04:00:00-05:00         1024  147.2632  
2022-12-01 04:30:00-05:00          412  147.2304  
2022-12-01 05:00:00-05:00          306  147.3466  
2022-12-01 05:30:00-05:00          201  147.4818  
2022-12-01 06:00:00-05:00          221  147.2938  
...                                ...       ...  
2022-12-30 17:30:00-05:00          350  129.9672  
2022-12-30 18:00:00-05:00          277  129.9572  
2022-12-30 18:30:00-05:00          312  130.0034  
2022-12-30 19:00:00-05:00          176  130.0140  
2022-12-30 19:30:00-05:00          366  129.9941  

[672 rows x 7 columns]
  1. In the timezone provided via tz

Alpha Vantage

  • Welcome a new class specialized in pulling data from Alpha Vantage!
Get Stochastic RSI of IBM from Alpha Vantage
>>> data = vbt.AVData.pull(
...     "IBM",
...     category="technical-indicators",
...     function="STOCHRSI",
...     params=dict(fastkperiod=14)
... )
>>> data.get()
                              FastD     FastK
1999-12-07 00:00:00+00:00  100.0000  100.0000
1999-12-08 00:00:00+00:00  100.0000  100.0000
1999-12-09 00:00:00+00:00   77.0255   31.0765
1999-12-10 00:00:00+00:00   43.6922    0.0000
1999-12-13 00:00:00+00:00   12.0197    4.9826
...                             ...       ...
2023-01-26 00:00:00+00:00   11.7960    0.0000
2023-01-27 00:00:00+00:00    3.7773    0.0000
2023-01-30 00:00:00+00:00    4.4824   13.4471
2023-01-31 00:00:00+00:00    7.8258   10.0302
2023-02-01 16:00:01+00:00   13.0966   15.8126

[5826 rows x 2 columns]

Nasdaq Data Link

  • Welcome a new class specialized in pulling data from Nasdaq Data Link!
Get Index of Consumer Sentiment from Nasdaq Data Link
>>> data = vbt.NDLData.pull("UMICH/SOC1")
>>> data.get()
                           Index
Date                            
1952-11-30 00:00:00+00:00   86.2
1953-02-28 00:00:00+00:00   90.7
1953-08-31 00:00:00+00:00   80.8
1953-11-30 00:00:00+00:00   80.7
1954-02-28 00:00:00+00:00   82.0
...                          ...
2022-08-31 00:00:00+00:00   58.2
2022-09-30 00:00:00+00:00   58.6
2022-10-31 00:00:00+00:00   59.9
2022-11-30 00:00:00+00:00   56.8
2022-12-31 00:00:00+00:00   59.7

[632 rows x 1 columns]

Data merging

  • Often there's a need to backtest symbols coming from different exchanges by putting them into the same basket. For this, vectorbtpro provides a class method that can merge multiple data instances into a single one. Not only can you combine multiple symbols, but you can also merge datasets that correspond to the same symbol (see the sketch further below) - all done automatically!
Pull BTC datasets from various exchanges and plot them relative to their mean
>>> binance_data = vbt.CCXTData.pull("BTCUSDT", exchange="binance")
>>> bybit_data = vbt.CCXTData.pull("BTCUSDT", exchange="bybit")
>>> bitfinex_data = vbt.CCXTData.pull("BTC/USDT", exchange="bitfinex")
>>> kucoin_data = vbt.CCXTData.pull("BTC-USDT", exchange="kucoin")

>>> data = vbt.Data.merge([
...     binance_data.rename({"BTCUSDT": "Binance"}),
...     bybit_data.rename({"BTCUSDT": "Bybit"}),
...     bitfinex_data.rename({"BTC/USDT": "Bitfinex"}),
...     kucoin_data.rename({"BTC-USDT": "KuCoin"}),
... ], missing_index="drop", silence_warnings=True)

>>> @njit
... def rescale_nb(x):
...     return (x - x.mean()) / x.mean()

>>> rescaled_close = data.close.vbt.row_apply(rescale_nb)
>>> rescaled_close = rescaled_close.vbt.rolling_mean(30)
>>> rescaled_close.loc["2020":"2020"].vbt.plot().show()
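
Merging also stitches together datasets of the same symbol, which is handy for combining historical chunks that were pulled separately. A minimal sketch, assuming the two ranges are reconciled automatically as described above (the date ranges are illustrative).
Merge two date ranges of the same symbol into one dataset
>>> jan_data = vbt.BinanceData.pull("BTCUSDT", start="2021-01-01", end="2021-02-01")
>>> feb_data = vbt.BinanceData.pull("BTCUSDT", start="2021-02-01", end="2021-03-01")
>>> full_data = vbt.Data.merge([jan_data, feb_data])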

Alpaca

  • Welcome a new class specialized in pulling data from Alpaca!
Get one week of adjusted 1-minute AAPL data from Alpaca
>>> vbt.AlpacaData.set_custom_settings(
...     client_config=dict(
...         api_key="YOUR_API_KEY",
...         secret_key="YOUR_API_SECRET"
...     )
... )
>>> data = vbt.AlpacaData.pull(
...     "AAPL",
...     start="one week ago 00:00",  # (1)!
...     end="15 minutes ago",  # (2)!
...     timeframe="1 minute",
...     adjustment="all",
...     tz="US/Eastern"
... )
>>> data.get()
                               Open      High       Low     Close  Volume  \
Open time                                                                   
2023-01-30 04:00:00-05:00  145.5400  145.5400  144.0100  144.0200  5452.0   
2023-01-30 04:01:00-05:00  144.0800  144.0800  144.0000  144.0500  3616.0   
2023-01-30 04:02:00-05:00  144.0300  144.0400  144.0100  144.0100  1671.0   
2023-01-30 04:03:00-05:00  144.0100  144.0300  144.0000  144.0300  4721.0   
2023-01-30 04:04:00-05:00  144.0200  144.0200  144.0200  144.0200  1343.0   
...                             ...       ...       ...       ...     ...   
2023-02-03 19:54:00-05:00  154.3301  154.3301  154.3301  154.3301   347.0   
2023-02-03 19:55:00-05:00  154.3300  154.3400  154.3200  154.3400  1438.0   
2023-02-03 19:56:00-05:00  154.3400  154.3400  154.3300  154.3300   588.0   
2023-02-03 19:58:00-05:00  154.3500  154.3500  154.3500  154.3500   555.0   
2023-02-03 19:59:00-05:00  154.3400  154.3900  154.3300  154.3900  3835.0   

                           Trade count        VWAP  
Open time                                           
2023-01-30 04:00:00-05:00          165  144.376126  
2023-01-30 04:01:00-05:00           81  144.036336  
2023-01-30 04:02:00-05:00           52  144.035314  
2023-01-30 04:03:00-05:00           56  144.012680  
2023-01-30 04:04:00-05:00           40  144.021854  
...                                ...         ...  
2023-02-03 19:54:00-05:00           21  154.331340  
2023-02-03 19:55:00-05:00           38  154.331756  
2023-02-03 19:56:00-05:00           17  154.338971  
2023-02-03 19:58:00-05:00           27  154.343090  
2023-02-03 19:59:00-05:00           58  154.357219  

[4224 rows x 7 columns]
  1. In the timezone provided via tz
  2. Remove if you have a paid plan

Local data

Once remote data has been fetched, you most likely want to persist it on disk. There are two new options for this: either serialize the entire data class, or save the actual data to CSV or HDF5. Each dataset can be stored in a single flat file, which makes it easier to work with than a database. Upon saving, the data can be effortlessly loaded back either by deserializing, or by using data classes that specialize in loading data from CSV and HDF5 files. These classes support a variety of features, including filtering by row and datetime ranges, updating, chunking, and even a smart dataset search that can traverse sub-directories recursively and return datasets that match a specific glob pattern or regular expression 🧲

Fetch and save symbols separately, then load them back jointly
>>> btc_data = vbt.BinanceData.pull("BTCUSDT")
>>> eth_data = vbt.BinanceData.pull("ETHUSDT")

>>> btc_data.to_hdf()
>>> eth_data.to_hdf()

>>> data = vbt.BinanceData.from_hdf(start="2020", end="2021")

>>> data.close
symbol                      BTCUSDT  ETHUSDT
Open time                                   
2020-01-01 00:00:00+00:00   7200.85   130.77
2020-01-02 00:00:00+00:00   6965.71   127.19
2020-01-03 00:00:00+00:00   7344.96   134.35
2020-01-04 00:00:00+00:00   7354.11   134.20
2020-01-05 00:00:00+00:00   7358.75   135.37
...                             ...      ...
2020-12-27 00:00:00+00:00  26281.66   685.11
2020-12-28 00:00:00+00:00  27079.41   730.41
2020-12-29 00:00:00+00:00  27385.00   732.00
2020-12-30 00:00:00+00:00  28875.54   752.17
2020-12-31 00:00:00+00:00  28923.63   736.42

[366 rows x 2 columns]
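
The dataset search mentioned above means the file-based classes can also discover what to load on their own. A sketch, assuming the HDF files were written by to_hdf() with their default paths as in the example above; the glob pattern is illustrative.
Discover and load all locally stored HDF datasets at once
>>> data = vbt.HDFData.pull("*.h5", start="2020", end="2021")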