Data¶
FinDataPy¶
- Welcome a new class built on top of findatapy, specialized in pulling data from Bloomberg, Eikon, Quandl, Dukascopy, and other data sources!
Discover tickers on Dukascopy and pull one day of tick data
>>> vbt.FinPyData.list_symbols(data_source="dukascopy")
['fx.dukascopy.tick.NYC.AUDCAD',
'fx.dukascopy.tick.NYC.AUDCHF',
'fx.dukascopy.tick.NYC.AUDJPY',
...
'fx.dukascopy.tick.NYC.USDTRY',
'fx.dukascopy.tick.NYC.USDZAR',
'fx.dukascopy.tick.NYC.ZARJPY']
>>> data = vbt.FinPyData.pull( # (1)!
... "fx.dukascopy.tick.NYC.EURUSD",
... start="14 Jun 2016",
... end="15 Jun 2016"
... )
>>> data.get()
close
Date
2016-06-14 00:00:00.844000+00:00 1.128795
2016-06-14 00:00:01.591000+00:00 1.128790
2016-06-14 00:00:01.743000+00:00 1.128775
2016-06-14 00:00:02.464000+00:00 1.128770
2016-06-14 00:00:02.971000+00:00 1.128760
... ...
2016-06-14 23:59:57.733000+00:00 1.121020
2016-06-14 23:59:58.239000+00:00 1.121030
2016-06-14 23:59:58.953000+00:00 1.121035
2016-06-14 23:59:59.004000+00:00 1.121050
2016-06-14 23:59:59.934000+00:00 1.121055
[82484 rows x 1 columns]
>>> data = vbt.FinPyData.pull( # (2)!
... "EURUSD",
... start="14 Jun 2016",
... end="15 Jun 2016",
... timeframe="tick",
... category="fx",
... data_source="dukascopy",
... fields=["bid", "ask", "bidv", "askv"]
... )
>>> data.get()
bid ask bidv askv
Date
2016-06-14 00:00:00.844000+00:00 1.12877 1.12882 1.00 10.12
2016-06-14 00:00:01.591000+00:00 1.12877 1.12881 1.00 1.00
2016-06-14 00:00:01.743000+00:00 1.12875 1.12880 3.11 3.00
2016-06-14 00:00:02.464000+00:00 1.12875 1.12879 2.21 1.00
2016-06-14 00:00:02.971000+00:00 1.12875 1.12877 2.21 1.00
... ... ... ... ...
2016-06-14 23:59:57.733000+00:00 1.12100 1.12104 1.24 1.50
2016-06-14 23:59:58.239000+00:00 1.12101 1.12105 9.82 1.12
2016-06-14 23:59:58.953000+00:00 1.12102 1.12105 1.50 1.12
2016-06-14 23:59:59.004000+00:00 1.12103 1.12107 1.50 1.12
2016-06-14 23:59:59.934000+00:00 1.12103 1.12108 1.87 2.25
[82484 rows x 4 columns]
- String format
- Keyword format
Databento¶
- Welcome a new class specialized in pulling data from Databento!
Get best bid and offer (BBO) data from Databento
>>> vbt.BentoData.set_custom_settings(
... client_config=dict(
... key="YOUR_KEY"
... )
... )
>>> params = dict(
... symbols="ESH3",
... dataset="GLBX.MDP3",
... start=vbt.timestamp("2022-10-28 20:30:00"),
... end=vbt.timestamp("2022-10-28 21:00:00"),
... schema="tbbo"
... )
>>> vbt.BentoData.get_cost(**params)
1.2002885341644287e-05
>>> data = vbt.BentoData.pull(**params)
>>> data.get()
ts_event \
ts_recv
2022-10-28 20:30:59.047138053+00:00 2022-10-28 20:30:59.046914657+00:00
2022-10-28 20:37:53.112494436+00:00 2022-10-28 20:37:53.112246421+00:00
...
2022-10-28 20:59:15.075191111+00:00 2022-10-28 20:59:15.074953895+00:00
2022-10-28 20:59:34.607239899+00:00 2022-10-28 20:59:34.606984277+00:00
rtype publisher_id instrument_id \
ts_recv
2022-10-28 20:30:59.047138053+00:00 1 1 206299
2022-10-28 20:37:53.112494436+00:00 1 1 206299
...
2022-10-28 20:59:15.075191111+00:00 1 1 206299
2022-10-28 20:59:34.607239899+00:00 1 1 206299
action side depth price size flags \
ts_recv
2022-10-28 20:30:59.047138053+00:00 T B 0 3955.25 1 0
2022-10-28 20:37:53.112494436+00:00 T A 0 3955.00 1 0
...
2022-10-28 20:59:15.075191111+00:00 T A 0 3953.75 1 0
2022-10-28 20:59:34.607239899+00:00 T A 0 3954.50 2 0
ts_in_delta sequence bid_px_00 \
ts_recv
2022-10-28 20:30:59.047138053+00:00 18553 73918214 3954.75
2022-10-28 20:37:53.112494436+00:00 18334 73926240 3955.00
...
2022-10-28 20:59:15.075191111+00:00 19294 73945515 3953.75
2022-10-28 20:59:34.607239899+00:00 18701 73945932 3954.50
ask_px_00 bid_sz_00 ask_sz_00 \
ts_recv
2022-10-28 20:30:59.047138053+00:00 3955.25 1 1
2022-10-28 20:37:53.112494436+00:00 3955.75 1 1
...
2022-10-28 20:59:15.075191111+00:00 3956.00 1 1
2022-10-28 20:59:34.607239899+00:00 3956.00 4 1
bid_ct_00 ask_ct_00 symbol
ts_recv
2022-10-28 20:30:59.047138053+00:00 1 1 ESH3
2022-10-28 20:37:53.112494436+00:00 1 1 ESH3
...
2022-10-28 20:59:15.075191111+00:00 1 1 ESH3
2022-10-28 20:59:34.607239899+00:00 3 1 ESH3
SQL queries¶
- Thanks to DuckDB, we can now run SQL queries directly on data instances!
Run a rolling average of 14 days on minute data using SQL
>>> data = vbt.TVData.pull(
... "AAPL",
... exchange="NASDAQ",
... timeframe="1 minute",
... tz="America/New_York"
... )
>>> data.sql("""
... SELECT datetime, AVG(Close) OVER(
... ORDER BY "datetime" ASC
... RANGE BETWEEN INTERVAL 14 DAYS PRECEDING AND CURRENT ROW
... ) AS "Moving Average"
... FROM "AAPL";
... """)
datetime
2023-09-11 09:30:00-04:00 180.080000
2023-09-11 09:31:00-04:00 179.965000
2023-09-11 09:32:00-04:00 180.000000
2023-09-11 09:33:00-04:00 180.022500
2023-09-11 09:34:00-04:00 179.984000
...
2023-10-20 15:55:00-04:00 177.786669
2023-10-20 15:56:00-04:00 177.785492
2023-10-20 15:57:00-04:00 177.784322
2023-10-20 15:58:00-04:00 177.783166
2023-10-20 15:59:00-04:00 177.781986
Name: Moving Average, Length: 11700, dtype: float64
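For comparison, the same statistic can be expressed in plain pandas with a time-based rolling window (a minimal sketch, assuming `data.close` holds the minute-level close series pulled above):
>>> data.close.rolling("14D").mean()  # 14 calendar days up to and including the current row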
DuckDB¶
- DuckDB is a high-performance analytical database system that provides a rich SQL dialect for interacting with a variety of data stores. Not only can it run analytical queries on local data, even when the data doesn't fit into memory and without requiring a distributed framework, but it can also directly query CSV, Parquet, and JSON files (see the sketch after the example below).
Save minute data to a DuckDB database, and read one day
>>> data = vbt.TVData.pull(
... "AAPL",
... exchange="NASDAQ",
... timeframe="1 minute",
... tz="America/New_York"
... )
>>> URL = "database.duckdb"
>>> data.to_duckdb(connection=URL)
>>> day_data = vbt.DuckDBData.pull(
... "AAPL",
... start="2023-10-02 09:30:00",
... end="2023-10-02 16:00:00",
... tz="America/New_York",
... connection=URL
... )
>>> day_data.get()
Open High Low Close Volume
datetime
2023-10-02 09:30:00-04:00 171.260 171.34 170.93 171.10 61654.0
2023-10-02 09:31:00-04:00 171.130 172.37 171.13 172.30 53481.0
2023-10-02 09:32:00-04:00 172.310 172.64 172.16 172.64 44750.0
2023-10-02 09:33:00-04:00 172.640 172.97 172.54 172.78 53195.0
2023-10-02 09:34:00-04:00 172.780 173.07 172.75 173.00 47416.0
... ... ... ... ... ...
2023-10-02 15:55:00-04:00 173.300 173.51 173.26 173.51 61619.0
2023-10-02 15:56:00-04:00 173.525 173.59 173.42 173.43 45066.0
2023-10-02 15:57:00-04:00 173.430 173.55 173.42 173.50 45220.0
2023-10-02 15:58:00-04:00 173.510 173.60 173.46 173.56 47371.0
2023-10-02 15:59:00-04:00 173.560 173.78 173.56 173.75 161253.0
[390 rows x 5 columns]
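The direct-file-query capability mentioned above is easy to try with the duckdb package itself (a minimal sketch; the data.csv file name is a hypothetical placeholder):
>>> import duckdb
>>> duckdb.sql("SELECT * FROM 'data.csv' LIMIT 5").df()  # scans the CSV in place, no import step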
SQLAlchemy¶
- SQLAlchemy provides a standard interface that allows developers to create database-agnostic code to communicate with a wide variety of SQL database engines. With its help, we can now effortlessly store data in SQL databases as well as read from them.
Save minute data to a PostgreSQL database, and read one day
>>> data = vbt.TVData.pull(
... "AAPL",
... exchange="NASDAQ",
... timeframe="1 minute",
... tz="America/New_York"
... )
>>> URL = "postgresql://postgres:postgres@localhost:5432"
>>> data.to_sql(engine=URL)
>>> day_data = vbt.SQLData.pull(
... "AAPL",
... start="2023-10-02 09:30:00",
... end="2023-10-02 16:00:00",
... tz="America/New_York",
... engine=URL
... )
>>> day_data.get()
Open High Low Close Volume
datetime
2023-10-02 09:30:00-04:00 171.260 171.34 170.93 171.10 61654.0
2023-10-02 09:31:00-04:00 171.130 172.37 171.13 172.30 53481.0
2023-10-02 09:32:00-04:00 172.310 172.64 172.16 172.64 44750.0
2023-10-02 09:33:00-04:00 172.640 172.97 172.54 172.78 53195.0
2023-10-02 09:34:00-04:00 172.780 173.07 172.75 173.00 47416.0
... ... ... ... ... ...
2023-10-02 15:55:00-04:00 173.300 173.51 173.26 173.51 61619.0
2023-10-02 15:56:00-04:00 173.525 173.59 173.42 173.43 45066.0
2023-10-02 15:57:00-04:00 173.430 173.55 173.42 173.50 45220.0
2023-10-02 15:58:00-04:00 173.510 173.60 173.46 173.56 47371.0
2023-10-02 15:59:00-04:00 173.560 173.78 173.56 173.75 161253.0
[390 rows x 5 columns]
PyArrow & FastParquet¶
- Data can be written to Feather using PyArrow and to Parquet using PyArrow or FastParquet. Parquet excels in write-once, read-many scenarios, offering highly efficient data compression and decompression, which makes it a great choice for storing time series data.
Save minute data to a Parquet dataset partitioned by day, and read one day
>>> data = vbt.TVData.pull(
... "AAPL",
... exchange="NASDAQ",
... timeframe="1 minute",
... tz="America/New_York"
... )
>>> data.to_parquet(partition_by="day") # (1)!
>>> day_data = vbt.ParquetData.pull("AAPL", filters=[("group", "==", "2023-10-02")])
>>> day_data.get()
Open High Low Close Volume
datetime
2023-10-02 09:30:00-04:00 171.260 171.34 170.93 171.10 61654.0
2023-10-02 09:31:00-04:00 171.130 172.37 171.13 172.30 53481.0
2023-10-02 09:32:00-04:00 172.310 172.64 172.16 172.64 44750.0
2023-10-02 09:33:00-04:00 172.640 172.97 172.54 172.78 53195.0
2023-10-02 09:34:00-04:00 172.780 173.07 172.75 173.00 47416.0
... ... ... ... ... ...
2023-10-02 15:55:00-04:00 173.300 173.51 173.26 173.51 61619.0
2023-10-02 15:56:00-04:00 173.525 173.59 173.42 173.43 45066.0
2023-10-02 15:57:00-04:00 173.430 173.55 173.42 173.50 45220.0
2023-10-02 15:58:00-04:00 173.510 173.60 173.46 173.56 47371.0
2023-10-02 15:59:00-04:00 173.560 173.78 173.56 173.75 161253.0
[390 rows x 5 columns]
- Without `partition_by`, the data will be saved to a single Parquet file. You will still be able to filter rows by any column in newer versions of PyArrow and Pandas, as shown below.
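For instance, filtering rows of a non-partitioned Parquet file by value might look like this in plain pandas (a minimal sketch; the file name and the Close threshold are hypothetical):
>>> import pandas as pd
>>> pd.read_parquet(
...     "AAPL.parquet",
...     filters=[("Close", ">", 170)]  # pushed down to PyArrow's dataset reader
... )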
Feature-oriented data¶
- The main limitation of VBT's data class was that it could only store data in a symbol-oriented format, which means features such as OHLC had to be put into a single DataFrame beforehand. This is somewhat counterproductive, since in VBT we mainly work with these features separately. For example, when we call `data.close`, VBT actually searches for "close" columns across all symbols, extracts them, and then concatenates them into another DataFrame. Thus, the data class has been redesigned to natively store feature-oriented data as well!
Create a feature-oriented data instance from various portfolio time series
>>> data = vbt.YFData.pull(["AAPL", "MSFT", "GOOG"])
>>> pf = data.run("from_random_signals", n=[10, 20, 30])
>>> pf_data = vbt.Data.from_data(
... vbt.feature_dict({
... "cash": pf.cash,
... "assets": pf.assets,
... "asset_value": pf.asset_value,
... "value": pf.value
... })
... )
>>> pf_data.get(feature="cash", symbol=(10, "AAPL"))
Date
1980-12-12 05:00:00+00:00 100.000000
1980-12-15 05:00:00+00:00 100.000000
1980-12-16 05:00:00+00:00 100.000000
1980-12-17 05:00:00+00:00 100.000000
1980-12-18 05:00:00+00:00 100.000000
...
2023-08-25 04:00:00+00:00 81193.079771
2023-08-28 04:00:00+00:00 81193.079771
2023-08-29 04:00:00+00:00 81193.079771
2023-08-30 04:00:00+00:00 81193.079771
2023-08-31 04:00:00+00:00 81193.079771
Name: (10, AAPL), Length: 10770, dtype: float64
Parallel data¶
- Data fetching and updating can be easily parallelized.
Benchmark fetching multiple symbols serially and concurrently
>>> symbols = ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"]
>>> with vbt.Timer() as timer:
... data = vbt.YFData.pull(symbols)
>>> print(timer.elapsed())
4.52 seconds
>>> with vbt.Timer() as timer:
... data = vbt.YFData.pull(symbols, execute_kwargs=dict(engine="threadpool"))
>>> print(timer.elapsed())
918.54 milliseconds
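Updating should be parallelizable the same way (a sketch under the assumption that update forwards execute_kwargs to the per-symbol fetcher just like pull):
>>> data = data.update(execute_kwargs=dict(engine="threadpool"))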
TradingView¶
- Welcome a new class specialized in pulling data from TradingView!
Pull 1-minute AAPL data
>>> data = vbt.TVData.pull(
... "NASDAQ:AAPL",
... timeframe="1 minute",
... tz="US/Eastern"
... )
>>> data.get()
Open High Low Close Volume
datetime
2022-12-05 09:30:00-05:00 147.75 148.31 147.50 148.28 37769.0
2022-12-05 09:31:00-05:00 148.28 148.67 148.28 148.49 10525.0
2022-12-05 09:32:00-05:00 148.50 148.73 148.30 148.30 4860.0
2022-12-05 09:33:00-05:00 148.25 148.73 148.25 148.64 5306.0
2022-12-05 09:34:00-05:00 148.62 148.97 148.52 148.97 5808.0
... ... ... ... ... ...
2023-01-17 15:55:00-05:00 135.80 135.91 135.80 135.86 37573.0
2023-01-17 15:56:00-05:00 135.85 135.88 135.80 135.88 18796.0
2023-01-17 15:57:00-05:00 135.88 135.93 135.85 135.91 21019.0
2023-01-17 15:58:00-05:00 135.90 135.97 135.89 135.95 20934.0
2023-01-17 15:59:00-05:00 135.94 136.00 135.84 135.94 86696.0
[11310 rows x 5 columns]
Symbol search¶
- Most data classes can retrieve the full list of symbols available at an exchange and optionally filter that list using either a globbing or a regular expression pattern (see the sketch after the example below). This works for local data classes as well!
Get all XRP pairs listed on Binance
>>> vbt.BinanceData.list_symbols("XRP*")
{'XRPAUD',
'XRPBEARBUSD',
'XRPBEARUSDT',
'XRPBIDR',
'XRPBKRW',
'XRPBNB',
'XRPBRL',
'XRPBTC',
'XRPBULLBUSD',
'XRPBULLUSDT',
'XRPBUSD',
'XRPDOWNUSDT',
'XRPETH',
'XRPEUR',
'XRPGBP',
'XRPNGN',
'XRPPAX',
'XRPRUB',
'XRPTRY',
'XRPTUSD',
'XRPUPUSDT',
'XRPUSDC',
'XRPUSDT'}
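To illustrate the two pattern styles, here's how globbing and regular expressions match the same symbols, using Python's standard fnmatch and re modules (a conceptual sketch; how exactly list_symbols switches into regex mode may vary by data class):
>>> import fnmatch, re
>>> symbols = ["XRPUSDT", "XRPBTC", "BTCUSDT"]
>>> [s for s in symbols if fnmatch.fnmatch(s, "XRP*")]  # globbing
['XRPUSDT', 'XRPBTC']
>>> [s for s in symbols if re.fullmatch(r"XRP.*", s)]  # regular expression
['XRPUSDT', 'XRPBTC']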
Symbol classes¶
- Thanks to VBT taking advantage of multi-indexes in Pandas, you can associate each symbol with one or more classes, such as sectors. This allows you to analyze the performance of a trading strategy relative to each class.
Compare equal-weighted portfolios for three sectors
>>> classes = vbt.symbol_dict({
... "MSFT": dict(sector="Technology"),
... "GOOGL": dict(sector="Technology"),
... "META": dict(sector="Technology"),
... "JPM": dict(sector="Finance"),
... "BAC": dict(sector="Finance"),
... "WFC": dict(sector="Finance"),
... "AMZN": dict(sector="Retail"),
... "WMT": dict(sector="Retail"),
... "BABA": dict(sector="Retail"),
... })
>>> data = vbt.YFData.pull(
... list(classes.keys()),
... classes=classes,
... missing_index="drop"
... )
>>> pf = vbt.PF.from_orders(
... data,
... size=vbt.index_dict({0: 1 / 3}), # (1)!
... size_type="targetpercent",
... group_by="sector",
... cash_sharing=True
... )
>>> pf.value.vbt.plot().show()
- There are three assets in each group, so allocate 33.3% to each asset at the first bar
Runnable data¶
- Tired of figuring out which arguments are required by an indicator? Data instances can now recognize the arguments of an indicator (or of any function in general), map them to column names, and run the function by passing in the required columns. You can also change the mapping, override indicator parameters, and query indicators by name: the data instance will search all integrated indicator packages and return the first (and best) match! A sketch of running a plain function follows the example below.
Run Stochastic RSI by data
>>> data = vbt.YFData.pull("BTC-USD")
>>> stochrsi = data.run("stochrsi")
>>> stochrsi.fastd
Date
2014-09-17 00:00:00+00:00 NaN
2014-09-18 00:00:00+00:00 NaN
2014-09-19 00:00:00+00:00 NaN
2014-09-20 00:00:00+00:00 NaN
2014-09-21 00:00:00+00:00 NaN
...
2023-01-15 00:00:00+00:00 96.168788
2023-01-16 00:00:00+00:00 91.733393
2023-01-17 00:00:00+00:00 78.295255
2023-01-18 00:00:00+00:00 48.793133
2023-01-20 00:00:00+00:00 26.242474
Name: Close, Length: 3047, dtype: float64
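And here's a sketch of running a plain function whose parameters are matched to column names, as described above (daily_range is a hypothetical helper, not part of VBT):
>>> def daily_range(high, low, close):  # arguments resolved to the High, Low, Close columns
...     return (high - low) / close
>>> data.run(daily_range)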
Data transformation¶
- You've fetched some data, so how do you transform it? There's a new method that concatenates all symbols into a single DataFrame and passes that DataFrame to a UDF for transformation.
Remove weekends
>>> data = vbt.YFData.pull(["BTC-USD", "ETH-USD"], start="2020-01-01", end="2020-01-14")
>>> new_data = data.transform(lambda df: df[~df.index.weekday.isin([5, 6])])
>>> new_data.close
symbol BTC-USD ETH-USD
Date
2020-01-01 00:00:00+00:00 7200.174316 130.802002
2020-01-02 00:00:00+00:00 6985.470215 127.410179
2020-01-03 00:00:00+00:00 7344.884277 134.171707
2020-01-06 00:00:00+00:00 7769.219238 144.304153
2020-01-07 00:00:00+00:00 8163.692383 143.543991
2020-01-08 00:00:00+00:00 8079.862793 141.258133
2020-01-09 00:00:00+00:00 7879.071289 138.979202
2020-01-10 00:00:00+00:00 8166.554199 143.963776
2020-01-13 00:00:00+00:00 8144.194336 144.226593
Synthetic OHLC¶
- There are new basic models for synthetic OHLC data generation - especially useful for leakage detection.
Generate 3 months of synthetic data using Geometric Brownian Motion
>>> data = vbt.GBMOHLCData.pull("R", start="2022-01", end="2022-04")
>>> data.plot().show()
Data saver¶
- Imagine a script that periodically pulls the latest data from an exchange and saves it to disk, all without your intervention. VBT implements two classes that do just this: one saves to CSV and the other to HDF.
BTCUSDT_1m_saver.py
from vectorbtpro import *
import logging

logging.basicConfig(level=logging.INFO)

if __name__ == "__main__":
    if vbt.CSVDataSaver.file_exists():
        csv_saver = vbt.CSVDataSaver.load()
        csv_saver.update()
        init_save = False
    else:
        data = vbt.BinanceData.pull(
            "BTCUSDT",
            start="10 minutes ago UTC",
            timeframe="1 minute"
        )
        csv_saver = vbt.CSVDataSaver(data)
        init_save = True
    csv_saver.update_every(1, "minute", init_save=init_save)
    csv_saver.save()  # (1)!
- The CSV data saver stores only the latest data update, which acts as the starting point of the next update; thus, save it and re-use it in the next runtime
Run in console and then interrupt
$ python BTCUSDT_1m_saver.py
2023-02-01 12:26:36.744000+00:00 - 2023-02-01 12:36:00+00:00: : 1it [00:01, 1.22s/it]
INFO:vectorbtpro.data.saver:Saved initial 10 rows from 2023-02-01 12:27:00+00:00 to 2023-02-01 12:36:00+00:00
INFO:vectorbtpro.utils.schedule_:Starting schedule manager with jobs [Every 1 minute do update(save_kwargs=None) (last run: [never], next run: 2023-02-01 13:37:38)]
INFO:vectorbtpro.data.saver:Saved 2 rows from 2023-02-01 12:36:00+00:00 to 2023-02-01 12:37:00+00:00
INFO:vectorbtpro.data.saver:Saved 2 rows from 2023-02-01 12:37:00+00:00 to 2023-02-01 12:38:00+00:00
INFO:vectorbtpro.utils.schedule_:Stopping schedule manager
Run in console again to continue
$ python BTCUSDT_1m_saver.py
INFO:vectorbtpro.utils.schedule_:Starting schedule manager with jobs [Every 1 minute do update(save_kwargs=None) (last run: [never], next run: 2023-02-01 13:42:08)]
INFO:vectorbtpro.data.saver:Saved 5 rows from 2023-02-01 12:38:00+00:00 to 2023-02-01 12:42:00+00:00
INFO:vectorbtpro.utils.schedule_:Stopping schedule manager
Polygon.io¶
- Welcome a new class specialized in pulling data from Polygon.io!
Get one month of 30-minute AAPL data from Polygon.io
>>> vbt.PolygonData.set_custom_settings(
... client_config=dict(
... api_key="YOUR_API_KEY"
... )
... )
>>> data = vbt.PolygonData.pull(
... "AAPL",
... start="2022-12-01", # (1)!
... end="2023-01-01",
... timeframe="30 minutes",
... tz="US/Eastern"
... )
>>> data.get()
Open High Low Close Volume \
Open time
2022-12-01 04:00:00-05:00 148.08 148.08 147.04 147.3700 50886.0
2022-12-01 04:30:00-05:00 147.37 147.37 147.12 147.2600 16575.0
2022-12-01 05:00:00-05:00 147.31 147.51 147.20 147.3800 20753.0
2022-12-01 05:30:00-05:00 147.43 147.56 147.38 147.3800 7388.0
2022-12-01 06:00:00-05:00 147.30 147.38 147.24 147.2400 7416.0
... ... ... ... ... ...
2022-12-30 17:30:00-05:00 129.94 130.05 129.91 129.9487 35694.0
2022-12-30 18:00:00-05:00 129.95 130.00 129.94 129.9500 15595.0
2022-12-30 18:30:00-05:00 129.94 130.05 129.94 130.0100 20287.0
2022-12-30 19:00:00-05:00 129.99 130.04 129.99 130.0000 12490.0
2022-12-30 19:30:00-05:00 130.00 130.04 129.97 129.9700 28271.0
Trade count VWAP
Open time
2022-12-01 04:00:00-05:00 1024 147.2632
2022-12-01 04:30:00-05:00 412 147.2304
2022-12-01 05:00:00-05:00 306 147.3466
2022-12-01 05:30:00-05:00 201 147.4818
2022-12-01 06:00:00-05:00 221 147.2938
... ... ...
2022-12-30 17:30:00-05:00 350 129.9672
2022-12-30 18:00:00-05:00 277 129.9572
2022-12-30 18:30:00-05:00 312 130.0034
2022-12-30 19:00:00-05:00 176 130.0140
2022-12-30 19:30:00-05:00 366 129.9941
[672 rows x 7 columns]
- In the timezone provided via `tz`
Alpha Vantage¶
- Welcome a new class specialized in pulling data from Alpha Vantage!
Get Stochastic RSI of IBM from Alpha Vantage
>>> data = vbt.AVData.pull(
... "IBM",
... category="technical-indicators",
... function="STOCHRSI",
... params=dict(fastkperiod=14)
... )
>>> data.get()
FastD FastK
1999-12-07 00:00:00+00:00 100.0000 100.0000
1999-12-08 00:00:00+00:00 100.0000 100.0000
1999-12-09 00:00:00+00:00 77.0255 31.0765
1999-12-10 00:00:00+00:00 43.6922 0.0000
1999-12-13 00:00:00+00:00 12.0197 4.9826
... ... ...
2023-01-26 00:00:00+00:00 11.7960 0.0000
2023-01-27 00:00:00+00:00 3.7773 0.0000
2023-01-30 00:00:00+00:00 4.4824 13.4471
2023-01-31 00:00:00+00:00 7.8258 10.0302
2023-02-01 16:00:01+00:00 13.0966 15.8126
[5826 rows x 2 columns]
Nasdaq Data Link¶
- Welcome a new class specialized in pulling data from Nasdaq Data Link!
Get the Index of Consumer Sentiment from Nasdaq Data Link
>>> data = vbt.NDLData.pull("UMICH/SOC1")
>>> data.get()
Index
Date
1952-11-30 00:00:00+00:00 86.2
1953-02-28 00:00:00+00:00 90.7
1953-08-31 00:00:00+00:00 80.8
1953-11-30 00:00:00+00:00 80.7
1954-02-28 00:00:00+00:00 82.0
... ...
2022-08-31 00:00:00+00:00 58.2
2022-09-30 00:00:00+00:00 58.6
2022-10-31 00:00:00+00:00 59.9
2022-11-30 00:00:00+00:00 56.8
2022-12-31 00:00:00+00:00 59.7
[632 rows x 1 columns]
Data merging¶
- Often, there's a need to backtest symbols coming from different exchanges by putting them into the same basket. For this, VBT provides a class method that can merge multiple data instances into a single one. Not only can you combine multiple symbols, but you can also merge datasets that correspond to a single symbol - all done automatically!
Pull BTC datasets from various exchanges and plot them relative to their mean
>>> binance_data = vbt.CCXTData.pull("BTCUSDT", exchange="binance")
>>> bybit_data = vbt.CCXTData.pull("BTCUSDT", exchange="bybit")
>>> bitfinex_data = vbt.CCXTData.pull("BTC/USDT", exchange="bitfinex")
>>> kucoin_data = vbt.CCXTData.pull("BTC-USDT", exchange="kucoin")
>>> data = vbt.Data.merge([
... binance_data.rename({"BTCUSDT": "Binance"}),
... bybit_data.rename({"BTCUSDT": "Bybit"}),
... bitfinex_data.rename({"BTC/USDT": "Bitfinex"}),
... kucoin_data.rename({"BTC-USDT": "KuCoin"}),
... ], missing_index="drop", silence_warnings=True)
>>> @njit
... def rescale_nb(x):
... return (x - x.mean()) / x.mean()
>>> rescaled_close = data.close.vbt.row_apply(rescale_nb)
>>> rescaled_close = rescaled_close.vbt.rolling_mean(30)
>>> rescaled_close.loc["2023":"2023"].vbt.plot().show()
Alpaca¶
- Welcome a new class specialized in pulling data from Alpaca!
Get one week of adjusted 1-minute AAPL data from Alpaca
>>> vbt.AlpacaData.set_custom_settings(
... client_config=dict(
... api_key="YOUR_API_KEY",
... secret_key="YOUR_API_SECRET"
... )
... )
>>> data = vbt.AlpacaData.pull(
... "AAPL",
... start="one week ago 00:00", # (1)!
... end="15 minutes ago", # (2)!
... timeframe="1 minute",
... adjustment="all",
... tz="US/Eastern"
... )
>>> data.get()
Open High Low Close Volume \
Open time
2023-01-30 04:00:00-05:00 145.5400 145.5400 144.0100 144.0200 5452.0
2023-01-30 04:01:00-05:00 144.0800 144.0800 144.0000 144.0500 3616.0
2023-01-30 04:02:00-05:00 144.0300 144.0400 144.0100 144.0100 1671.0
2023-01-30 04:03:00-05:00 144.0100 144.0300 144.0000 144.0300 4721.0
2023-01-30 04:04:00-05:00 144.0200 144.0200 144.0200 144.0200 1343.0
... ... ... ... ... ...
2023-02-03 19:54:00-05:00 154.3301 154.3301 154.3301 154.3301 347.0
2023-02-03 19:55:00-05:00 154.3300 154.3400 154.3200 154.3400 1438.0
2023-02-03 19:56:00-05:00 154.3400 154.3400 154.3300 154.3300 588.0
2023-02-03 19:58:00-05:00 154.3500 154.3500 154.3500 154.3500 555.0
2023-02-03 19:59:00-05:00 154.3400 154.3900 154.3300 154.3900 3835.0
Trade count VWAP
Open time
2023-01-30 04:00:00-05:00 165 144.376126
2023-01-30 04:01:00-05:00 81 144.036336
2023-01-30 04:02:00-05:00 52 144.035314
2023-01-30 04:03:00-05:00 56 144.012680
2023-01-30 04:04:00-05:00 40 144.021854
... ... ...
2023-02-03 19:54:00-05:00 21 154.331340
2023-02-03 19:55:00-05:00 38 154.331756
2023-02-03 19:56:00-05:00 17 154.338971
2023-02-03 19:58:00-05:00 27 154.343090
2023-02-03 19:59:00-05:00 58 154.357219
[4224 rows x 7 columns]
- In the timezone provided via `tz`
- Remove if you have a paid plan
Local data¶
Once remote data has been fetched, you most likely want to persist it on disk. There are two new options for this: either serialize the entire data instance, or save the actual data to CSV or HDF5. Each dataset can be stored in a single flat file, which makes it easier to work with than a database. Upon saving, the data can be effortlessly loaded back, either by deserializing (see the sketch after the example below) or by using data classes that specialize in loading data from CSV and HDF5 files. These classes support a variety of features, including filtering by row and datetime ranges, updating, chunking, and even a smart dataset search that can traverse sub-directories recursively and return datasets matching a specific glob pattern or regular expression.
Fetch and save symbols separately, then load them back jointly
>>> btc_data = vbt.BinanceData.pull("BTCUSDT")
>>> eth_data = vbt.BinanceData.pull("ETHUSDT")
>>> btc_data.to_hdf()
>>> eth_data.to_hdf()
>>> data = vbt.BinanceData.from_hdf(start="2020", end="2021")
>>> data.close
symbol BTCUSDT ETHUSDT
Open time
2020-01-01 00:00:00+00:00 7200.85 130.77
2020-01-02 00:00:00+00:00 6965.71 127.19
2020-01-03 00:00:00+00:00 7344.96 134.35
2020-01-04 00:00:00+00:00 7354.11 134.20
2020-01-05 00:00:00+00:00 7358.75 135.37
... ... ...
2020-12-27 00:00:00+00:00 26281.66 685.11
2020-12-28 00:00:00+00:00 27079.41 730.41
2020-12-29 00:00:00+00:00 27385.00 732.00
2020-12-30 00:00:00+00:00 28875.54 752.17
2020-12-31 00:00:00+00:00 28923.63 736.42
[366 rows x 2 columns]
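The other persistence route mentioned above, serializing the entire data instance, might look like this (a minimal sketch; the file name is a hypothetical placeholder):
>>> btc_data.save("btc_data")  # pickle the whole instance to disk
>>> btc_data = vbt.BinanceData.load("btc_data")  # restore it, ready for analysis or updating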