Optimization¶
Purged CV¶
- Added support for walk-forward cross-validation (CV) with purging and combinatorial CV with purging and embargoing, inspired by Marcos López de Prado's Advances in Financial Machine Learning. A walk-forward sketch follows the example below.
Create and plot a combinatorial splitter with purging and embargoing
>>> splitter = vbt.Splitter.from_purged_kfold(
... vbt.date_range("2024", "2025"),
... n_folds=10,
... n_test_folds=2,
... purge_td="3 days",
... embargo_td="3 days"
... )
>>> splitter.plots().show()
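The same purging logic can also be applied in walk-forward fashion. A minimal sketch, assuming the walk-forward counterpart (from_purged_walkforward) accepts the same index and purging arguments as its k-fold sibling:
Create and plot a walk-forward splitter with purging
>>> wf_splitter = vbt.Splitter.from_purged_walkforward(
...     vbt.date_range("2024", "2025"),
...     n_folds=10,
...     purge_td="3 days"  # assumed: same meaning as in from_purged_kfold
... )
>>> wf_splitter.plots().show()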
Paramables¶
- Each analyzable VBT object (such as data, indicator, or portfolio) can now be split into items: multiple objects of the same type, each holding only one column/group. This enables using VBT objects as standalone parameters and processing only a subset of information at a time, such as a single symbol in a data instance or a single parameter combination in an indicator (see the sketch after the example below).
Combine outputs of an SMA indicator combinatorially
>>> @vbt.parameterized(merge_func="column_stack")
... def get_signals(fast_sma, slow_sma): # (1)!
... entries = fast_sma.crossed_above(slow_sma)
... exits = fast_sma.crossed_below(slow_sma)
... return entries, exits
>>> data = vbt.YFData.pull(["BTC-USD", "ETH-USD"])
>>> sma = data.run("talib:sma", timeperiod=range(20, 50, 2)) # (2)!
>>> fast_sma = sma.rename_levels({"sma_timeperiod": "fast"}) # (3)!
>>> slow_sma = sma.rename_levels({"sma_timeperiod": "slow"})
>>> entries, exits = get_signals(
... vbt.Param(fast_sma, condition="__fast__ < __slow__"), # (4)!
... vbt.Param(slow_sma)
... )
>>> entries.columns
MultiIndex([(20, 22, 'BTC-USD'),
            (20, 22, 'ETH-USD'),
            (20, 24, 'BTC-USD'),
            (20, 24, 'ETH-USD'),
            (20, 26, 'BTC-USD'),
            (20, 26, 'ETH-USD'),
            ...
            (44, 46, 'BTC-USD'),
            (44, 46, 'ETH-USD'),
            (44, 48, 'BTC-USD'),
            (44, 48, 'ETH-USD'),
            (46, 48, 'BTC-USD'),
            (46, 48, 'ETH-USD')],
           names=['fast', 'slow', 'symbol'], length=210)
- Regular function that takes two indicators and returns signals
- Run an SMA indicator once on all time periods
- Copy the indicator and rename the parameter level to get a distinct indicator instance
- Pass both indicator instances as parameters. This will split each instance into smaller instances with only one column. Also, remove all columns where fast window >= slow window.
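To see the splitting behavior in isolation: a minimal sketch, assuming that wrapping a multi-symbol data instance in vbt.Param alone yields one single-symbol instance per parameter configuration (symbol_total_return is a hypothetical helper):
Process one symbol at a time
>>> @vbt.parameterized(merge_func="concat")
... def symbol_total_return(data):  # receives a single-symbol data instance per call
...     return data.returns.vbt.returns.total()
>>> symbol_total_return(vbt.Param(data))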
Lazy parameter grids¶
- The parameterized decorator no longer materializes the entire parameter grid when you're interested in only a subset of all parameter combinations. Consequently, generating random parameter combinations is almost instant, regardless of the total number of possible parameter combinations.
Test a random subset of a huge number of parameter combinations
>>> @vbt.parameterized(merge_func="concat")
... def test_combination(data, n, sl_stop, tsl_stop, tp_stop):
... return data.run(
... "from_random_signals",
... n=n,
... sl_stop=sl_stop,
... tsl_stop=tsl_stop,
... tp_stop=tp_stop,
... ).total_return
>>> n = np.arange(10, 100)
>>> sl_stop = np.arange(1, 1000) / 1000
>>> tsl_stop = np.arange(1, 1000) / 1000
>>> tp_stop = np.arange(1, 1000) / 1000
>>> len(n) * len(sl_stop) * len(tsl_stop) * len(tp_stop)
89730269910
>>> test_combination(
... vbt.YFData.pull("BTC-USD"),
... n=vbt.Param(n),
... sl_stop=vbt.Param(sl_stop),
... tsl_stop=vbt.Param(tsl_stop),
... tp_stop=vbt.Param(tp_stop),
... _random_subset=10
... )
n   sl_stop  tsl_stop  tp_stop
34  0.188    0.916     0.749       6.869901
44  0.176    0.734     0.550       6.186478
50  0.421    0.245     0.253       0.540188
51  0.033    0.951     0.344       6.514647
    0.915    0.461     0.322       2.915987
73  0.057    0.690     0.008      -0.204080
74  0.368    0.360     0.935      14.207262
76  0.771    0.342     0.187      -0.278499
83  0.796    0.788     0.730       6.450076
96  0.873    0.429     0.815      18.670965
dtype: float64
Mono-chunks¶
- The parameterized decorator has been extended to split parameter combinations into so-called "mono-chunks", merge the parameter values within each chunk into a single value, and execute the entire chunk with a single function call. This way, you are no longer limited to processing only one parameter combination at a time. Just note that you must adapt the function to take multiple parameter values and change the merging function as appropriate.
Test 100 combinations of SL and TP values per thread
>>> @vbt.parameterized(
... merge_func="concat",
... mono_chunk_len=100, # (1)!
... chunk_len="auto", # (2)!
... engine="threadpool", # (3)!
... warmup=True # (4)!
... )
... @njit(nogil=True)
... def test_stops_nb(close, entries, exits, sl_stop, tp_stop):
... sim_out = vbt.pf_nb.from_signals_nb(
... target_shape=(close.shape[0], sl_stop.shape[1]),
... group_lens=np.full(sl_stop.shape[1], 1),
... close=close,
... long_entries=entries,
... short_entries=exits,
... sl_stop=sl_stop,
... tp_stop=tp_stop,
... save_returns=True
... )
... return vbt.ret_nb.total_return_nb(sim_out.in_outputs.returns)
>>> data = vbt.YFData.pull("BTC-USD", start="2020") # (5)!
>>> entries, exits = data.run("randnx", n=10, hide_params=True, unpack=True) # (6)!
>>> total_returns = test_stops_nb(
... vbt.to_2d_array(data.close),
... vbt.to_2d_array(entries),
... vbt.to_2d_array(exits),
... sl_stop=vbt.Param(np.arange(0.01, 1.0, 0.01), mono_merge_func=np.column_stack), # (7)!
... tp_stop=vbt.Param(np.arange(0.01, 1.0, 0.01), mono_merge_func=np.column_stack)
... )
>>> total_returns.vbt.heatmap().show()
- 100 values will make up one array (= 1 mono-chunk)
- Execute N mono-chunks in parallel, where N is the number of cores
- Use multithreading
- Execute one mono-chunk to compile the function before distribution
- The function above works with only one symbol
- Pick 10 entries and exits randomly
- For each mono-chunk, stack all values into a two-dimensional array
CV decorator¶
- Most cross-validation tasks involve testing a grid of parameter combinations on the training data, selecting the best parameter combination, and validating it on the test data. This procedure needs to be repeated on each split. The cross-validation decorator combines the parameterized and split decorators to automate such a task.
Tutorial
Learn more in the Cross-validation tutorial.
Cross-validate an SMA crossover using random search
>>> @vbt.cv_split(
... splitter="from_rolling",
... splitter_kwargs=dict(length=365, split=0.5, set_labels=["train", "test"]),
... takeable_args=["data"],
... parameterized_kwargs=dict(random_subset=100),
... merge_func="concat"
... )
... def sma_crossover_cv(data, fast_period, slow_period, metric):
... fast_sma = data.run("sma", fast_period, hide_params=True)
... slow_sma = data.run("sma", slow_period, hide_params=True)
... entries = fast_sma.real_crossed_above(slow_sma)
... exits = fast_sma.real_crossed_below(slow_sma)
... pf = vbt.PF.from_signals(data, entries, exits, direction="both")
... return pf.deep_getattr(metric)
>>> sma_crossover_cv(
... vbt.YFData.pull("BTC-USD", start="4 years ago"),
... vbt.Param(np.arange(20, 50), condition="x < slow_period"),
... vbt.Param(np.arange(20, 50)),
... "trades.expectancy"
... )
split  set    fast_period  slow_period
0      train  20           25              8.015725
       test   20           23              0.573465
1      train  40           48             -4.356317
       test   39           40              5.666271
2      train  24           45             18.253340
       test   22           36            111.202831
3      train  20           31             54.626024
       test   20           25             -1.596945
4      train  25           48             41.328588
       test   25           30              6.620254
5      train  26           32              7.178085
       test   24           29              4.087456
6      train  22           23             -0.581255
       test   22           31             -2.494519
dtype: float64
Split decorator¶
- Normally, to run a function on each split, you need to build a splitter specifically targeted at the input data passed to the function; each time the input data changes, the splitter must be rebuilt. This process is automated by the split decorator, which wraps a function and thus gets access to all the arguments the function receives in order to make various splitting decisions. Basically, it can "infect" any Python function with splitting functionality.
Tutorial
Learn more in the Cross-validation tutorial.
Get total return from holding in each quarter
>>> @vbt.split(
... splitter="from_grouper",
... splitter_kwargs=dict(by="Q"),
... takeable_args=["data"],
... merge_func="concat"
... )
... def get_quarter_return(data):
... return data.returns.vbt.returns.total()
>>> data = vbt.YFData.pull("BTC-USD")
>>> get_quarter_return(data.loc["2021"])
Date
2021Q1 1.005805
2021Q2 -0.407050
2021Q3 0.304383
2021Q4 -0.037627
Freq: Q-DEC, dtype: float64
>>> get_quarter_return(data.loc["2022"])
Date
2022Q1 -0.045047
2022Q2 -0.572515
2022Q3 0.008429
2022Q4 -0.143154
Freq: Q-DEC, dtype: float64
Conditional parameters¶
- Parameters can depend on each other. For instance, when testing a crossover of moving averages, it makes no sense to test a fast window that is longer than the slow window. By filtering such cases out, you need to evaluate only half as many parameter combinations.
Test only combinations where the slow window exceeds the fast window by at least 5
>>> @vbt.parameterized(merge_func="column_stack")
... def ma_crossover_signals(data, fast_window, slow_window):
... fast_sma = data.run("sma", fast_window, short_name="fast_sma")
... slow_sma = data.run("sma", slow_window, short_name="slow_sma")
... entries = fast_sma.real_crossed_above(slow_sma.real)
... exits = fast_sma.real_crossed_below(slow_sma.real)
... return entries, exits
>>> entries, exits = ma_crossover_signals(
... vbt.YFData.pull("BTC-USD", start="one year ago UTC"),
... vbt.Param(np.arange(5, 50), condition="slow_window - fast_window >= 5"),
... vbt.Param(np.arange(5, 50))
... )
>>> entries.columns
MultiIndex([( 5, 10),
            ( 5, 11),
            ( 5, 12),
            ( 5, 13),
            ( 5, 14),
            ...
            (42, 48),
            (42, 49),
            (43, 48),
            (43, 49),
            (44, 49)],
           names=['fast_window', 'slow_window'], length=820)
Splitter¶
- Splitters in scikit-learn are a poor fit for validating ML-based and rule-based trading strategies; VBT has a juggernaut class that supports various splitting schemes safe for backtesting, including rolling windows, expanding windows, time-anchored windows, random windows for block bootstraps, and even Pandas-native groupby and resample instructions such as "M" for monthly frequency (see the sketch after the example below). As the cherry on the cake, the produced splits can be easily analyzed and visualized too! For example, you can detect any split or set overlaps, convert all the splits into a single boolean mask for custom analysis, group splits and sets, and analyze their distribution relative to each other. The class has more lines of code than the entire backtesting.py package; don't underestimate the new king in town!
Tutorial
Learn more in the Cross-validation tutorial.
Roll a 360-day window and split it equally into train and test sets
>>> data = vbt.YFData.pull("BTC-USD", start="4 years ago")
>>> splitter = vbt.Splitter.from_rolling(
... data.index,
... length="360 days",
... split=0.5,
... set_labels=["train", "test"],
... freq="daily"
... )
>>> splitter.plots().show()
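Pandas-native grouping works through the same class. A minimal sketch, assuming from_grouper accepts the same by values as in the split decorator example above (e.g., "M" for monthly):
Split the index by month
>>> m_splitter = vbt.Splitter.from_grouper(data.index, by="M")
>>> m_splitter.plots().show()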
Random search¶
- While grid search evaluates every possible combination of hyperparameters, random search selects and tests only a random subset of them. This is especially useful when the number of parameter combinations is huge. Moreover, random search has been shown to find equal or better values than grid search within fewer function evaluations. The indicator factory, the parameterized decorator, and any method that does broadcasting now support random search out of the box.
Test a random subset of SL, TSL, and TP combinations
>>> data = vbt.YFData.pull("BTC-USD", start="2020")
>>> stop_values = np.arange(1, 100) / 100 # (1)!
>>> pf = vbt.PF.from_random_signals(
... data,
... n=100,
... sl_stop=vbt.Param(stop_values),
... tsl_stop=vbt.Param(stop_values),
... tp_stop=vbt.Param(stop_values),
... broadcast_kwargs=dict(random_subset=1000) # (2)!
... )
>>> pf.total_return.sort_values(ascending=False)
sl_stop  tsl_stop  tp_stop
0.06     0.85      0.43       2.291260
         0.74      0.40       2.222212
         0.97      0.22       2.149849
0.40     0.10      0.23       2.082935
0.47     0.09      0.25       2.030105
                                    ...
0.51     0.36      0.01      -0.618805
0.53     0.37      0.01      -0.624761
0.35     0.60      0.02      -0.662992
0.29     0.13      0.02      -0.671376
0.46     0.72      0.02      -0.720024
Name: total_return, Length: 1000, dtype: float64
- 99 values of each parameter = 99 ^ 3 = 970,299 combinations
- Indicator factory and parameterized decorator take this argument directly
Parameterized decorator¶
- There is a special decorator that can make any Python function accept multiple parameter combinations, even if the function itself can handle only one! The decorator wraps the function, thus getting access to its arguments; it then identifies all the arguments that act as parameters, builds a grid of them, and calls the underlying function on each parameter combination from that grid. The execution part can be easily parallelized, as sketched after the first example below. After all the outputs are ready, it merges them into a single object. Use cases are endless: from running indicators that cannot be wrapped with the indicator factory, to parameterizing entire pipelines!
Example 1: Parameterize a simple SMA indicator without Indicator Factory
>>> @vbt.parameterized(merge_func="column_stack") # (1)!
... def sma(close, window):
... return close.rolling(window).mean()
>>> data = vbt.YFData.pull("BTC-USD")
>>> sma(data.close, vbt.Param(range(20, 50)))
- Use column_stack for merging time series in the form of DataFrames and complex vectorbt objects such as portfolios
window                               20            21            22  \
Date
2014-09-17 00:00:00+00:00           NaN           NaN           NaN
2014-09-18 00:00:00+00:00           NaN           NaN           NaN
2014-09-19 00:00:00+00:00           NaN           NaN           NaN
...                                 ...           ...           ...
2024-03-07 00:00:00+00:00  57657.135156  57395.376488  57147.339134
2024-03-08 00:00:00+00:00  58488.990039  58163.942708  57891.045455
2024-03-09 00:00:00+00:00  59297.836523  58956.156064  58624.648793
...
window                               48            49
Date
2014-09-17 00:00:00+00:00           NaN           NaN
2014-09-18 00:00:00+00:00           NaN           NaN
2014-09-19 00:00:00+00:00           NaN           NaN
...                                 ...           ...
2024-03-07 00:00:00+00:00  49928.186686  49758.599330
2024-03-08 00:00:00+00:00  50483.072266  50303.123565
2024-03-09 00:00:00+00:00  51040.440837  50846.672353
[3462 rows x 30 columns]
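The execution step can be parallelized as well. A hedged sketch that reuses the engine-related keywords from the mono-chunks example above and assumes they apply here unchanged; note that threads pay off mainly for functions that release the GIL (such as Numba functions compiled with nogil=True), less so for plain Pandas code like this:
Run parameter combinations in parallel
>>> @vbt.parameterized(
...     merge_func="column_stack",
...     chunk_len="auto",  # one chunk of combinations per core
...     engine="threadpool"  # distribute chunks with multithreading
... )
... def sma(close, window):
...     return close.rolling(window).mean()
>>> sma(data.close, vbt.Param(range(20, 50)))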
Example 2: Parameterize an entire Bollinger Bands pipeline
>>> @vbt.parameterized(merge_func="concat") # (1)!
... def bbands_sharpe(data, timeperiod=14, nbdevup=2, nbdevdn=2, thup=0.3, thdn=0.1):
... bb = data.run(
... "talib_bbands",
... timeperiod=timeperiod,
... nbdevup=nbdevup,
... nbdevdn=nbdevdn
... )
... bandwidth = (bb.upperband - bb.lowerband) / bb.middleband
... cond1 = data.low < bb.lowerband
... cond2 = bandwidth > thup
... cond3 = data.high > bb.upperband
... cond4 = bandwidth < thdn
... entries = (cond1 & cond2) | (cond3 & cond4)
... exits = (cond1 & cond4) | (cond3 & cond2)
... pf = vbt.PF.from_signals(data, entries, exits)
... return pf.sharpe_ratio
>>> bbands_sharpe(
... vbt.YFData.pull("BTC-USD"),
... nbdevup=vbt.Param([1, 2]), # (2)!
... nbdevdn=vbt.Param([1, 2]),
... thup=vbt.Param([0.4, 0.5]),
... thdn=vbt.Param([0.1, 0.2])
... )
- Use concat for merging metrics in the form of scalars and Series
- Build the Cartesian product of 4 parameters
nbdevup  nbdevdn  thup  thdn
1        1        0.4   0.1     1.681532
                        0.2     1.617400
                  0.5   0.1     1.424175
                        0.2     1.563520
         2        0.4   0.1     1.218554
                        0.2     1.520852
                  0.5   0.1     1.242523
                        0.2     1.317883
2        1        0.4   0.1     1.174562
                        0.2     1.469828
                  0.5   0.1     1.427940
                        0.2     1.460635
         2        0.4   0.1     1.000210
                        0.2     1.378108
                  0.5   0.1     1.196087
                        0.2     1.782502
dtype: float64
Riskfolio-Lib¶
- Riskfolio-Lib is another increasingly popular library for portfolio optimization that has been integrated into VBT. The integration was done by automating typical workflows inside Riskfolio-Lib and putting them into a single function, such that many portfolio optimization problems can be expressed using a single set of keyword arguments and thus parameterized easily.
Tutorial
Learn more in the Portfolio optimization tutorial.
Run a hierarchical clustering (HC) portfolio optimization on a monthly basis
>>> data = vbt.YFData.pull(
... ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"],
... start="2020",
... end="2023",
... missing_index="drop"
... )
>>> pfo = vbt.PFO.from_riskfolio(
... returns=data.close.vbt.to_returns(),
... port_cls="hc",
... every="M"
... )
>>> pfo.plot().show()
Array-like parameters¶
- The broadcasting mechanism has been completely refactored and now supports parameters. Many parameters in VBT, such as SL and TP, are array-like and can be provided per row, per column, and even per element. Internally, even a scalar is treated like a regular time series and broadcast along other time series. Previously, to test multiple parameter combinations, you had to tile the other time series so that all shapes matched perfectly. This tiling procedure is now performed automatically! See also the sketch after the example below.
Write a steep slope indicator without indicator factory
>>> def steep_slope(close, up_th):
... r = vbt.broadcast(dict(close=close, up_th=up_th))
... return r["close"].pct_change() >= r["up_th"]
>>> data = vbt.YFData.pull("BTC-USD", start="2020", end="2022")
>>> fig = data.plot(plot_volume=False)
>>> sma = vbt.talib("SMA").run(data.close, timeperiod=50).real
>>> sma.rename("SMA").vbt.plot(fig=fig)
>>> mask = steep_slope(sma, vbt.Param([0.005, 0.01, 0.015])) # (1)!
>>> def plot_mask_ranges(column, color):
... mask.vbt.ranges.plot_shapes(
... column=column,
... plot_close=False,
... shape_kwargs=dict(fillcolor=color),
... fig=fig
... )
>>> plot_mask_ranges(0.005, "orangered")
>>> plot_mask_ranges(0.010, "orange")
>>> plot_mask_ranges(0.015, "yellow")
>>> fig.update_xaxes(showgrid=False)
>>> fig.update_yaxes(showgrid=False)
>>> fig.show()
- Test three parameters and generate a mask with three columns - one per parameter
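Because array-like parameters broadcast like time series, a single one-dimensional array is interpreted per row. A minimal sketch with a purely illustrative, linearly decaying SL schedule:
Provide one SL value per row
>>> pf = vbt.PF.from_random_signals(
...     data,
...     n=100,
...     sl_stop=np.linspace(0.1, 0.01, len(data.index))  # one SL value per row
... )
>>> pf.total_return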
Parameters¶
- There is a new module dedicated to working with parameters (see also the sketch after the example below).
Generate 10,000 random parameter combinations for MACD
>>> from itertools import combinations
>>> window_space = np.arange(100)
>>> fastk_windows, slowk_windows = list(zip(*combinations(window_space, 2))) # (1)!
>>> window_type_space = list(vbt.enums.WType)
>>> param_product = vbt.combine_params(
... dict(
... fast_window=vbt.Param(fastk_windows, level=0), # (2)!
... slow_window=vbt.Param(slowk_windows, level=0),
... signal_window=vbt.Param(window_space, level=1),
... macd_wtype=vbt.Param(window_type_space, level=2), # (3)!
... signal_wtype=vbt.Param(window_type_space, level=2),
... ),
... random_subset=10_000,
... build_index=False
... )
>>> pd.DataFrame(param_product)
      fast_window  slow_window  signal_window  macd_wtype  signal_wtype
0               0            1             47           3             3
1               0            2             21           2             2
2               0            2             33           1             1
3               0            2             42           1             1
4               0            3             52           1             1
...           ...          ...            ...         ...           ...
9995           97           99             19           1             1
9996           97           99             92           4             4
9997           98           99              2           2             2
9998           98           99             12           1             1
9999           98           99             81           2             2
[10000 rows x 5 columns]
- Fast windows should be shorter than slow windows
- We already combined fast and slow windows, thus make them share the same product level
- We don't want to combine window types, thus make them share the same product level
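With the default build_index=True, combine_params is assumed to also return a ready-made column index for the selected combinations. A minimal sketch:
Build both the parameter product and its index
>>> param_product, param_index = vbt.combine_params(
...     dict(
...         fast_window=vbt.Param(fastk_windows, level=0),
...         slow_window=vbt.Param(slowk_windows, level=0),
...         signal_window=vbt.Param(window_space, level=1)
...     ),
...     random_subset=10_000
... )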
Portfolio optimization¶
- Portfolio optimization is the process of creating a portfolio of assets for which your investment has the maximum return and minimum risk. Usually, this process is performed periodically and involves generating new weights to rebalance an existing portfolio. As with most things in VBT, the weight generation step is implemented as a callback by the user, while the optimizer calls that callback periodically. The final result is a collection of the returned weight allocations that can be analyzed, visualized, and used in an actual simulation, as sketched after the example below.
Tutorial
Learn more in the Portfolio optimization tutorial.
Allocate assets inversely to their total return in the last month
>>> def regime_change_optimize_func(data):
... returns = data.returns
... total_return = returns.vbt.returns.total()
... weights = data.symbol_wrapper.fill_reduced(0)
... pos_mask = total_return > 0
... if pos_mask.any():
... weights[pos_mask] = total_return[pos_mask] / total_return.abs().sum()
... neg_mask = total_return < 0
... if neg_mask.any():
... weights[neg_mask] = total_return[neg_mask] / total_return.abs().sum()
... return -1 * weights
>>> data = vbt.YFData.pull(
... ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"],
... start="2020",
... end="2023",
... missing_index="drop"
... )
>>> pfo = vbt.PFO.from_optimize_func(
... data.symbol_wrapper,
... regime_change_optimize_func,
... vbt.RepEval("data[index_slice]", context=dict(data=data)),
... every="M"
... )
>>> pfo.plot().show()
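To turn the generated allocations into an actual backtest, the optimizer is assumed to expose a simulate method (a hedged sketch; see the Portfolio optimization tutorial for the exact API):
Simulate the allocations
>>> pf = pfo.simulate(data)  # assumed: rebalances at each generated allocation
>>> pf.total_return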
PyPortfolioOpt¶
- PyPortfolioOpt is a popular financial portfolio optimization package that includes classical methods (Markowitz 1952 and Black-Litterman) and suggested best practices (e.g., covariance shrinkage), along with many recent developments and novel features such as L2 regularisation and hierarchical risk parity.
Tutorial
Learn more in the Portfolio optimization tutorial.
Run Hierarchical Risk Parity (HRP) optimization on a monthly basis
>>> data = vbt.YFData.pull(
... ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"],
... start="2020",
... end="2023",
... missing_index="drop"
... )
>>> pfo = vbt.PFO.from_pypfopt(
... returns=data.returns,
... optimizer="hrp",
... target="optimize",
... every="M"
... )
>>> pfo.plot().show()
Universal Portfolios¶
- Universal Portfolios is a package putting together different Online Portfolio Selection (OLPS) algorithms.
Tutorial
Learn more in the Portfolio optimization tutorial.
Simulate an online minimum-variance portfolio on a weekly time frame
>>> data = vbt.YFData.pull(
... ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"],
... start="2020",
... end="2023",
... missing_index="drop"
... )
>>> pfo = vbt.PFO.from_universal_algo(
... "MPT",
... data.resample("W").close,
... window=52,
... min_history=4,
... mu_estimator='historical',
... cov_estimator='empirical',
... method='mpt',
... q=0
... )
>>> pfo.plot().show()