Optimization¶
Purged CV¶
- Added support for walk-forward cross-validation (CV) with purging, as well as combinatorial CV with purging and embargoing, based on Marcos Lopez de Prado's Advances in Financial Machine Learning.
Create and plot a combinatorial splitter with purging and embargoing
>>> splitter = vbt.Splitter.from_purged_kfold(
... vbt.date_range("2024", "2025"),
... n_folds=10,
... n_test_folds=2,
... purge_td="3 days",
... embargo_td="3 days"
... )
>>> splitter.plots().show()
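The same workflow applies to the walk-forward variant. A minimal sketch, assuming the from_purged_walkforward class method accepts analogous arguments:
>>> splitter = vbt.Splitter.from_purged_walkforward(  # assumed method name per the note above
...     vbt.date_range("2024", "2025"),
...     n_folds=10,
...     n_test_folds=1,
...     purge_td="3 days"
... )
>>> splitter.plots().show()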
Paramables¶
- Each analyzable VBT object (such as data, indicator, or portfolio) can now be split into items, which are multiple objects of the same type, each containing only one column or group. This makes it possible to use VBT objects as standalone parameters and process only a subset of information at a time, such as a symbol in a data instance or a parameter combination in an indicator.
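For instance, a multi-symbol data instance can be processed one symbol at a time. A minimal sketch, assuming items() yields one (key, sub-object) pair per column or group:
>>> data = vbt.YFData.pull(["BTC-USD", "ETH-USD"])
>>> for symbol, symbol_data in data.items():  # assumed: one single-symbol instance per item
...     print(symbol, symbol_data.get("Close").iloc[-1])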
Combine outputs of a SMA indicator combinatorially
>>> @vbt.parameterized(merge_func="column_stack")
... def get_signals(fast_sma, slow_sma): # (1)!
... entries = fast_sma.crossed_above(slow_sma)
... exits = fast_sma.crossed_below(slow_sma)
... return entries, exits
>>> data = vbt.YFData.pull(["BTC-USD", "ETH-USD"])
>>> sma = data.run("talib:sma", timeperiod=range(20, 50, 2)) # (2)!
>>> fast_sma = sma.rename_levels({"sma_timeperiod": "fast"}) # (3)!
>>> slow_sma = sma.rename_levels({"sma_timeperiod": "slow"})
>>> entries, exits = get_signals(
... vbt.Param(fast_sma, condition="__fast__ < __slow__"), # (4)!
... vbt.Param(slow_sma)
... )
>>> entries.columns
MultiIndex([(20, 22, 'BTC-USD'),
            (20, 22, 'ETH-USD'),
            (20, 24, 'BTC-USD'),
            (20, 24, 'ETH-USD'),
            (20, 26, 'BTC-USD'),
            (20, 26, 'ETH-USD'),
            ...
            (44, 46, 'BTC-USD'),
            (44, 46, 'ETH-USD'),
            (44, 48, 'BTC-USD'),
            (44, 48, 'ETH-USD'),
            (46, 48, 'BTC-USD'),
            (46, 48, 'ETH-USD')],
           names=['fast', 'slow', 'symbol'], length=210)
- Regular function that takes two indicators and returns signals.
- Run an SMA indicator once on all time periods.
- Copy the indicator and rename the parameter level to get a distinct indicator instance.
- Pass both indicator instances as parameters. This splits each instance into smaller instances with only one column. Also, remove all columns where the fast window is greater than or equal to the slow window.
Lazy parameter grids¶
- The parameterized decorator no longer needs to materialize parameter grids if you are only interested in a subset of all parameter combinations. This change enables the generation of random parameter combinations almost instantly, no matter how large the total number of possible combinations is.
Test a random subset of a huge number of parameter combinations
>>> @vbt.parameterized(merge_func="concat")
... def test_combination(data, n, sl_stop, tsl_stop, tp_stop):
... return data.run(
... "from_random_signals",
... n=n,
... sl_stop=sl_stop,
... tsl_stop=tsl_stop,
... tp_stop=tp_stop,
... ).total_return
>>> n = np.arange(10, 100)
>>> sl_stop = np.arange(1, 1000) / 1000
>>> tsl_stop = np.arange(1, 1000) / 1000
>>> tp_stop = np.arange(1, 1000) / 1000
>>> len(n) * len(sl_stop) * len(tsl_stop) * len(tp_stop)
89730269910
>>> test_combination(
... vbt.YFData.pull("BTC-USD"),
... n=vbt.Param(n),
... sl_stop=vbt.Param(sl_stop),
... tsl_stop=vbt.Param(tsl_stop),
... tp_stop=vbt.Param(tp_stop),
... _random_subset=10
... )
n   sl_stop  tsl_stop  tp_stop
34  0.188    0.916     0.749       6.869901
44  0.176    0.734     0.550       6.186478
50  0.421    0.245     0.253       0.540188
51  0.033    0.951     0.344       6.514647
    0.915    0.461     0.322       2.915987
73  0.057    0.690     0.008      -0.204080
74  0.368    0.360     0.935      14.207262
76  0.771    0.342     0.187      -0.278499
83  0.796    0.788     0.730       6.450076
96  0.873    0.429     0.815      18.670965
dtype: float64
Mono-chunks¶
- The parameterized decorator now supports splitting parameter combinations into "mono-chunks," merging the parameter values within each chunk into a single value, and running the entire chunk with a single function call. This means you are no longer limited to processing only one parameter combination at a time.
Keep in mind that your function must be adapted to handle multiple parameter values, and you should modify the merging function as needed.
Test 100 combinations of SL and TP values per thread
>>> @vbt.parameterized(
... merge_func="concat",
... mono_chunk_len=100, # (1)!
... chunk_len="auto", # (2)!
... engine="threadpool", # (3)!
... warmup=True # (4)!
... )
... @njit(nogil=True)
... def test_stops_nb(close, entries, exits, sl_stop, tp_stop):
... sim_out = vbt.pf_nb.from_signals_nb(
... target_shape=(close.shape[0], sl_stop.shape[1]),
... group_lens=np.full(sl_stop.shape[1], 1),
... close=close,
... long_entries=entries,
... short_entries=exits,
... sl_stop=sl_stop,
... tp_stop=tp_stop,
... save_returns=True
... )
... return vbt.ret_nb.total_return_nb(sim_out.in_outputs.returns)
>>> data = vbt.YFData.pull("BTC-USD", start="2020") # (5)!
>>> entries, exits = data.run("randnx", n=10, hide_params=True, unpack=True) # (6)!
>>> total_returns = test_stops_nb(
... vbt.to_2d_array(data.close),
... vbt.to_2d_array(entries),
... vbt.to_2d_array(exits),
... sl_stop=vbt.Param(np.arange(0.01, 1.0, 0.01), mono_merge_func=np.column_stack), # (7)!
... tp_stop=vbt.Param(np.arange(0.01, 1.0, 0.01), mono_merge_func=np.column_stack)
... )
>>> total_returns.vbt.heatmap().show()
- 100 values are combined into one array, forming a single mono-chunk.
- Execute N mono-chunks in parallel, where N is the number of cores.
- Use multithreading.
- Execute one mono-chunk to compile the function before distributing other chunks.
- The function above operates with only one symbol.
- Pick 10 entries and exits randomly.
- For each mono-chunk, stack all values into a two-dimensional array.
CV decorator¶
- Most cross-validation tasks involve testing a grid of parameter combinations on the training data, selecting the best parameter combination, and validating it on the test data. This process must be repeated for each split. The cross-validation decorator combines the parameterized and split decorators to automate this task.
Tutorial
Learn more in the Cross-validation tutorial.
Cross-validate an SMA crossover using random search
>>> @vbt.cv_split(
... splitter="from_rolling",
... splitter_kwargs=dict(length=365, split=0.5, set_labels=["train", "test"]),
... takeable_args=["data"],
... parameterized_kwargs=dict(random_subset=100),
... merge_func="concat"
... )
... def sma_crossover_cv(data, fast_period, slow_period, metric):
... fast_sma = data.run("sma", fast_period, hide_params=True)
... slow_sma = data.run("sma", slow_period, hide_params=True)
... entries = fast_sma.real_crossed_above(slow_sma)
... exits = fast_sma.real_crossed_below(slow_sma)
... pf = vbt.PF.from_signals(data, entries, exits, direction="both")
... return pf.deep_getattr(metric)
>>> sma_crossover_cv(
... vbt.YFData.pull("BTC-USD", start="4 years ago"),
... vbt.Param(np.arange(20, 50), condition="x < slow_period"),
... vbt.Param(np.arange(20, 50)),
... "trades.expectancy"
... )
split  set    fast_period  slow_period
0      train  20           25              8.015725
       test   20           23              0.573465
1      train  40           48             -4.356317
       test   39           40              5.666271
2      train  24           45             18.253340
       test   22           36            111.202831
3      train  20           31             54.626024
       test   20           25             -1.596945
4      train  25           48             41.328588
       test   25           30              6.620254
5      train  26           32              7.178085
       test   24           29              4.087456
6      train  22           23             -0.581255
       test   22           31             -2.494519
dtype: float64
Split decorator¶
- Normally, to run a function on each split, you need to build a splitter specifically targeted at the input data provided to the function. This means that each time the input data changes, you must recreate the splitter. The split decorator automates this process by wrapping the function, giving it access to all arguments so it can make splitting decisions as needed. Essentially, it can "infect" any Python function with splitting functionality.
Tutorial
Learn more in the Cross-validation tutorial.
Get total return from holding in each quarter
>>> @vbt.split(
... splitter="from_grouper",
... splitter_kwargs=dict(by="Q"),
... takeable_args=["data"],
... merge_func="concat"
... )
... def get_quarter_return(data):
... return data.returns.vbt.returns.total()
>>> data = vbt.YFData.pull("BTC-USD")
>>> get_quarter_return(data.loc["2021"])
Date
2021Q1 1.005805
2021Q2 -0.407050
2021Q3 0.304383
2021Q4 -0.037627
Freq: Q-DEC, dtype: float64
>>> get_quarter_return(data.loc["2022"])
Date
2022Q1 -0.045047
2022Q2 -0.572515
2022Q3 0.008429
2022Q4 -0.143154
Freq: Q-DEC, dtype: float64
Conditional parameters¶
- Parameters can depend on each other. For example, when testing a crossover of moving averages, it makes no sense to test a fast window that is longer than the slow window. By filtering out such cases, you only need to evaluate about half as many parameter combinations.
Test slow windows being longer than fast windows by at least 5
>>> @vbt.parameterized(merge_func="column_stack")
... def ma_crossover_signals(data, fast_window, slow_window):
... fast_sma = data.run("sma", fast_window, short_name="fast_sma")
... slow_sma = data.run("sma", slow_window, short_name="slow_sma")
... entries = fast_sma.real_crossed_above(slow_sma.real)
... exits = fast_sma.real_crossed_below(slow_sma.real)
... return entries, exits
>>> entries, exits = ma_crossover_signals(
... vbt.YFData.pull("BTC-USD", start="one year ago UTC"),
... vbt.Param(np.arange(5, 50), condition="slow_window - fast_window >= 5"),
... vbt.Param(np.arange(5, 50))
... )
>>> entries.columns
MultiIndex([( 5, 10),
            ( 5, 11),
            ( 5, 12),
            ( 5, 13),
            ( 5, 14),
            ...
            (42, 48),
            (42, 49),
            (43, 48),
            (43, 49),
            (44, 49)],
           names=['fast_window', 'slow_window'], length=820)
Splitter¶
- Splitters in scikit-learn are not ideal for validating ML-based and rule-based trading strategies. VBT provides a juggernaut class that supports many splitting schemes that are safe for backtesting, including rolling windows, expanding windows, time-anchored windows, random windows for block bootstraps, and even Pandas-native groupby and resample instructions such as "M" for monthly frequency. As a bonus, the produced splits can be easily analyzed and visualized! For example, you can detect any split or set overlaps, convert all splits into a single boolean mask for custom analysis, group splits and sets, and analyze their distribution relative to each other. This class contains more lines of code than the entire backtesting.py package, so do not underestimate the new king in town!
Tutorial
Learn more in the Cross-validation tutorial.
Roll a 360-day window and split it equally into train and test sets
>>> data = vbt.YFData.pull("BTC-USD", start="4 years ago")
>>> splitter = vbt.Splitter.from_rolling(
... data.index,
... length="360 days",
... split=0.5,
... set_labels=["train", "test"],
... freq="daily"
... )
>>> splitter.plots().show()
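Once built, the splitter can also slice other array-like objects into per-split, per-set chunks. A minimal sketch, assuming a take method that returns one sliced object per split and set:
>>> data_slices = splitter.take(data)  # assumed: result keyed by (split, set)
>>> data_slices[(0, "train")].get("Close").head()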
Random search¶
- While grid search tests every possible combination of hyperparameters, random search selects and tests random combinations of hyperparameters. This is especially useful when there is a huge number of parameter combinations. Random search has also been shown to find equal or better values than grid search with fewer function evaluations. The indicator factory, parameterized decorator, and any method that performs broadcasting now support random search out of the box.
Test a random subset of SL, TSL, and TP combinations
>>> data = vbt.YFData.pull("BTC-USD", start="2020")
>>> stop_values = np.arange(1, 100) / 100 # (1)!
>>> pf = vbt.PF.from_random_signals(
... data,
... n=100,
... sl_stop=vbt.Param(stop_values),
... tsl_stop=vbt.Param(stop_values),
... tp_stop=vbt.Param(stop_values),
... broadcast_kwargs=dict(random_subset=1000) # (2)!
... )
>>> pf.total_return.sort_values(ascending=False)
sl_stop  tsl_stop  tp_stop
0.06     0.85      0.43       2.291260
         0.74      0.40       2.222212
         0.97      0.22       2.149849
0.40     0.10      0.23       2.082935
0.47     0.09      0.25       2.030105
                                    ...
0.51     0.36      0.01      -0.618805
0.53     0.37      0.01      -0.624761
0.35     0.60      0.02      -0.662992
0.29     0.13      0.02      -0.671376
0.46     0.72      0.02      -0.720024
Name: total_return, Length: 1000, dtype: float64
- 99 values per parameter = 99 ^ 3 = 970,299 combinations.
- The indicator factory and parameterized decorator accept this argument directly.
Parameterized decorator¶
- There is a special decorator that allows any Python function to accept multiple parameter combinations, even if the function itself supports only one. The decorator wraps the function, gains access to its arguments, identifies all arguments acting as parameters, builds a grid from them, and calls the underlying function on each parameter combination from that grid. The execution can be easily parallelized. Once all outputs are ready, it merges them into a single object. Use cases are endless: from running indicators that cannot be wrapped with the indicator factory, to parameterizing entire pipelines!
Example 1: Parameterize a simple SMA indicator without Indicator Factory
>>> @vbt.parameterized(merge_func="column_stack") # (1)!
... def sma(close, window):
... return close.rolling(window).mean()
>>> data = vbt.YFData.pull("BTC-USD")
>>> sma(data.close, vbt.Param(range(20, 50)))
- Use column_stack to merge time series in the form of DataFrames and complex VBT objects such as portfolios.
window                               20            21            22  \
Date
2014-09-17 00:00:00+00:00           NaN           NaN           NaN
2014-09-18 00:00:00+00:00           NaN           NaN           NaN
2014-09-19 00:00:00+00:00           NaN           NaN           NaN
...                                 ...           ...           ...
2024-03-07 00:00:00+00:00  57657.135156  57395.376488  57147.339134
2024-03-08 00:00:00+00:00  58488.990039  58163.942708  57891.045455
2024-03-09 00:00:00+00:00  59297.836523  58956.156064  58624.648793

...

window                               48            49
Date
2014-09-17 00:00:00+00:00           NaN           NaN
2014-09-18 00:00:00+00:00           NaN           NaN
2014-09-19 00:00:00+00:00           NaN           NaN
...                                 ...           ...
2024-03-07 00:00:00+00:00  49928.186686  49758.599330
2024-03-08 00:00:00+00:00  50483.072266  50303.123565
2024-03-09 00:00:00+00:00  51040.440837  50846.672353

[3462 rows x 30 columns]
Example 2: Parameterize an entire Bollinger Bands pipeline
>>> @vbt.parameterized(merge_func="concat") # (1)!
... def bbands_sharpe(data, timeperiod=14, nbdevup=2, nbdevdn=2, thup=0.3, thdn=0.1):
... bb = data.run(
... "talib_bbands",
... timeperiod=timeperiod,
... nbdevup=nbdevup,
... nbdevdn=nbdevdn
... )
... bandwidth = (bb.upperband - bb.lowerband) / bb.middleband
... cond1 = data.low < bb.lowerband
... cond2 = bandwidth > thup
... cond3 = data.high > bb.upperband
... cond4 = bandwidth < thdn
... entries = (cond1 & cond2) | (cond3 & cond4)
... exits = (cond1 & cond4) | (cond3 & cond2)
... pf = vbt.PF.from_signals(data, entries, exits)
... return pf.sharpe_ratio
>>> bbands_sharpe(
... vbt.YFData.pull("BTC-USD"),
... nbdevup=vbt.Param([1, 2]), # (2)!
... nbdevdn=vbt.Param([1, 2]),
... thup=vbt.Param([0.4, 0.5]),
... thdn=vbt.Param([0.1, 0.2])
... )
- Use concat to merge metrics in the form of scalars and Series.
- Builds the Cartesian product of 4 parameters.
nbdevup  nbdevdn  thup  thdn
1        1        0.4   0.1     1.681532
                        0.2     1.617400
                  0.5   0.1     1.424175
                        0.2     1.563520
         2        0.4   0.1     1.218554
                        0.2     1.520852
                  0.5   0.1     1.242523
                        0.2     1.317883
2        1        0.4   0.1     1.174562
                        0.2     1.469828
                  0.5   0.1     1.427940
                        0.2     1.460635
         2        0.4   0.1     1.000210
                        0.2     1.378108
                  0.5   0.1     1.196087
                        0.2     1.782502
dtype: float64
Riskfolio-Lib¶
- Riskfolio-Lib is another increasingly popular library for portfolio optimization that has been integrated into VBT. Integration was done by automating typical workflows inside Riskfolio-Lib and putting them into a single function, so many portfolio optimization problems can be expressed using a single set of keyword arguments and easily parameterized.
Tutorial
Learn more in the Portfolio optimization tutorial.
Run Nested Clustered Optimization (NCO) on a monthly basis
>>> data = vbt.YFData.pull(
... ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"],
... start="2020",
... end="2023",
... missing_index="drop"
... )
>>> pfo = vbt.PFO.from_riskfolio(
... returns=data.close.vbt.to_returns(),
... port_cls="hc",
... every="M"
... )
>>> pfo.plot().show()
Array-like parameters¶
- The broadcasting mechanism has been completely refactored and now supports parameters. Many parameters in VBT, such as SL and TP, are array-like and can be provided per row, per column, or even per element. Internally, even a scalar is treated as a regular time series and is broadcast along with other proper time series. Previously, to test multiple parameter combinations, you had to tile other time series so that all shapes matched perfectly. With this feature, the tiling procedure is performed automatically!
Write a steep slope indicator without indicator factory
>>> def steep_slope(close, up_th):
...     r = vbt.broadcast(dict(close=close, up_th=up_th))  # aligns shapes, expands parameters into columns
...     return r["close"].pct_change() >= r["up_th"]
>>> data = vbt.YFData.pull("BTC-USD", start="2020", end="2022")
>>> fig = data.plot(plot_volume=False)
>>> sma = vbt.talib("SMA").run(data.close, timeperiod=50).real
>>> sma.rename("SMA").vbt.plot(fig=fig)
>>> mask = steep_slope(sma, vbt.Param([0.005, 0.01, 0.015])) # (1)!
>>> def plot_mask_ranges(column, color):
... mask.vbt.ranges.plot_shapes(
... column=column,
... plot_close=False,
... shape_kwargs=dict(fillcolor=color),
... fig=fig
... )
>>> plot_mask_ranges(0.005, "orangered")
>>> plot_mask_ranges(0.010, "orange")
>>> plot_mask_ranges(0.015, "yellow")
>>> fig.update_xaxes(showgrid=False)
>>> fig.update_yaxes(showgrid=False)
>>> fig.show()
- Tests three parameters and generates a mask with three columns, one for each parameter.
Parameters¶
- There is a new module for working with parameters.
Generate 10,000 random parameter combinations for MACD
>>> from itertools import combinations
>>> window_space = np.arange(100)
>>> fastk_windows, slowk_windows = list(zip(*combinations(window_space, 2))) # (1)!
>>> window_type_space = list(vbt.enums.WType)
>>> param_product = vbt.combine_params(
... dict(
... fast_window=vbt.Param(fastk_windows, level=0), # (2)!
... slow_window=vbt.Param(slowk_windows, level=0),
... signal_window=vbt.Param(window_space, level=1),
... macd_wtype=vbt.Param(window_type_space, level=2), # (3)!
... signal_wtype=vbt.Param(window_type_space, level=2),
... ),
... random_subset=10_000,
... build_index=False
... )
>>> pd.DataFrame(param_product)
      fast_window  slow_window  signal_window  macd_wtype  signal_wtype
0               0            1             47           3             3
1               0            2             21           2             2
2               0            2             33           1             1
3               0            2             42           1             1
4               0            3             52           1             1
...           ...          ...            ...         ...           ...
9995           97           99             19           1             1
9996           97           99             92           4             4
9997           98           99              2           2             2
9998           98           99             12           1             1
9999           98           99             81           2             2

[10000 rows x 5 columns]
- Fast windows should be shorter than slow windows.
- Fast and slow windows were already combined, so they share the same product level.
- Window types do not need to be combined, so they share the same product level.
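With build_index enabled (assumed here to be the default), combine_params should also return a ready-made Pandas index alongside the product. A minimal sketch:
>>> param_product, param_index = vbt.combine_params(  # assumed two-object return
...     dict(
...         fast_window=vbt.Param([10, 20]),
...         slow_window=vbt.Param([30, 40])
...     )
... )
>>> param_index  # one tuple per combination
MultiIndex([(10, 30),
            (10, 40),
            (20, 30),
            (20, 40)],
           names=['fast_window', 'slow_window'])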
Portfolio optimization¶
- Portfolio optimization is the process of creating a portfolio of assets that aims to maximize return and minimize risk. Usually, this process is performed periodically and involves generating new weights to rebalance an existing portfolio. As with most things in VBT, the weight generation step is implemented as a callback by the user, while the optimizer calls that callback periodically. The final result is a collection of returned weight allocations that can be analyzed, visualized, and used in actual simulations.
Tutorial
Learn more in the Portfolio optimization tutorial.
Allocate assets inversely to their total return in the last month
>>> def regime_change_optimize_func(data):
...     returns = data.returns
...     total_return = returns.vbt.returns.total()
...     weights = data.symbol_wrapper.fill_reduced(0)  # start with zero weight per symbol
...     pos_mask = total_return > 0
...     if pos_mask.any():
...         weights[pos_mask] = total_return[pos_mask] / total_return.abs().sum()
...     neg_mask = total_return < 0
...     if neg_mask.any():
...         weights[neg_mask] = total_return[neg_mask] / total_return.abs().sum()
...     return -1 * weights  # flip the sign to allocate against last month's winners
>>> data = vbt.YFData.pull(
... ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"],
... start="2020",
... end="2023",
... missing_index="drop"
... )
>>> pfo = vbt.PFO.from_optimize_func(
... data.symbol_wrapper,
... regime_change_optimize_func,
... vbt.RepEval("data[index_slice]", context=dict(data=data)),
... every="M"
... )
>>> pfo.plot().show()
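The generated allocations can then drive an actual backtest. A minimal sketch, assuming a simulate method that translates allocations into rebalancing orders:
>>> pf = pfo.simulate(data)  # assumed helper: returns a regular Portfolio
>>> pf.total_return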
PyPortfolioOpt¶
- PyPortfolioOpt is a popular financial portfolio optimization package that includes classical methods (Markowitz 1952 and Black-Litterman), suggested best practices (such as covariance shrinkage), and recent developments and novel features (such as L2 regularization and hierarchical risk parity).
Tutorial
Learn more in the Portfolio optimization tutorial.
Run Hierarchical Risk Parity (HRP) optimization on a monthly basis
>>> data = vbt.YFData.pull(
... ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"],
... start="2020",
... end="2023",
... missing_index="drop"
... )
>>> pfo = vbt.PFO.from_pypfopt(
... returns=data.returns,
... optimizer="hrp",
... target="optimize",
... every="M"
... )
>>> pfo.plot().show()
Universal Portfolios¶
- Universal Portfolios is a package that brings together various Online Portfolio Selection (OLPS) algorithms.
Tutorial
Learn more in the Portfolio optimization tutorial.
Simulate an online minimum-variance portfolio on a weekly time frame
>>> data = vbt.YFData.pull(
... ["SPY", "TLT", "XLF", "XLE", "XLU", "XLK", "XLB", "XLP", "XLY", "XLI", "XLV"],
... start="2020",
... end="2023",
... missing_index="drop"
... )
>>> pfo = vbt.PFO.from_universal_algo(
... "MPT",
... data.resample("W").close,
... window=52,
... min_history=4,
...     mu_estimator="historical",
...     cov_estimator="empirical",
...     method="mpt",
... q=0
... )
>>> pfo.plot().show()
And many more...¶
- Look forward to more killer features being added every week!