Abdul Malik | Finance & Technology

What I Learned After Running an LLM Portfolio for Five Years

2026-06-10T00:00:00+02:00

When I started this research, I expected the AI portfolio to win.

A large language model can process earnings calls, Federal Reserve communications, market indicators, and macroeconomic data all at once. That is roughly what experienced investors do when they sit down to make allocation decisions. The bet was that doing it at scale, across thousands of documents, would improve the result.

It did not.

Over the five-year out-of-sample period, the LLM-enhanced portfolio generated a Sharpe Ratio of 0.767. The key quantitative benchmark, Signal-Enhanced Minimum Variance, came in at 0.856. Looking at those numbers on their own, the conclusion seems simple: the AI did not work.

But the numbers were hiding something.

The Results

Here is how the main strategies compared across the full evaluation period:

Strategy	Sharpe Ratio
Minimum Variance	0.879
Signal-Enhanced Minimum Variance	0.856
LLM-Enhanced Portfolio	0.767

The LLM underperformed both quantitative benchmarks, including the Signal-Enhanced portfolio that already incorporated momentum signals, volatility regime adjustments, and yield curve data. On a pure risk-adjusted basis, there was no case for the AI approach.

One number did not fit neatly into that picture. The LLM-enhanced portfolio generated a statistically significant Fama-French alpha, meaning the model was capturing information that standard risk factors do not explain. It was not producing random noise. Something real was being picked up; it just was not making its way into better returns. Figuring out why that gap existed became the most instructive part of the whole project.

Why the LLM Lost

The Correlation Problem

The portfolio universe was ten U.S. equity ETFs, and they are not independent assets.

Technology, growth, and large-cap equities move together across most market environments. Energy and financials have their own dynamics, but the universe was still heavily correlated on average. When assets move together like that, rotating between them does not produce much in the way of active returns. There was a structural ceiling on what any tactical allocation model could achieve here, and the LLM was up against it from the start.

The Noise Problem

To reduce randomness in the outputs, I ran Llama 3.1 8B five times each month and averaged the results. That helped, but language models are stochastic: the same inputs do not always produce the same outputs. Each monthly run could return meaningfully different portfolio weights. Averaged across five runs the signal-to-noise ratio improved, but it was never clean. Over five years of monthly decisions, that noise accumulates and it drags on performance.
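The averaging step can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the function name `average_runs`, the three-asset toy universe, and the dispersion measure (mean per-asset standard deviation across runs) are all assumptions made for the example.

```python
import numpy as np

def average_runs(run_weights):
    """Average portfolio weights across runs and report how much the runs disagree."""
    w = np.asarray(run_weights, dtype=float)       # shape: (n_runs, n_assets)
    mean_w = w.mean(axis=0)
    mean_w = mean_w / mean_w.sum()                 # renormalise so weights sum to 1
    dispersion = w.std(axis=0, ddof=1).mean()      # average per-asset std across runs
    return mean_w, dispersion

# Five hypothetical runs for a three-asset toy universe (illustrative numbers only)
runs = [
    [0.50, 0.30, 0.20],
    [0.40, 0.35, 0.25],
    [0.55, 0.25, 0.20],
    [0.45, 0.30, 0.25],
    [0.50, 0.30, 0.20],
]
weights, dispersion = average_runs(runs)
```

Tracking the dispersion alongside the averaged weights gives a rough month-by-month view of how noisy the model was, which the average alone hides.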

The Ridge Regression Test

At some point I started asking a question I probably should have asked earlier: was the value coming from the LLM, or from the features it was given?

The model was working with earnings sentiment features, Fed sentiment features, momentum signals, volatility data, and yield curve information. Those inputs might carry predictive power on their own, regardless of how they are processed. To check, I trained a Ridge Regression model on exactly the same features.

Strategy	Sharpe Ratio
Ridge Regression	0.870
LLM-Enhanced Portfolio	0.767

The simple regularised model beat the LLM by a substantial margin.

That result reframed the picture. The features contained genuine predictive information. The LLM was not using them better than a well-tuned linear model. If anything, it was using them worse. The value was in the data, not in the reasoning layer on top.
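For readers who want to try the same kind of check, here is a minimal sketch of fitting a Ridge model on a feature matrix with scikit-learn. The data is synthetic, and the feature count, the `alpha` value, and the 60-month split are assumptions for illustration; none of them are the settings from the thesis.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-in for the monthly feature matrix (sentiment, momentum, and so on)
n_months, n_features = 120, 5
X = rng.normal(size=(n_months, n_features))
true_beta = np.array([0.5, -0.3, 0.2, 0.0, 0.1])          # hypothetical coefficients
y = X @ true_beta + rng.normal(scale=0.5, size=n_months)  # next-month return proxy

# Simple time-ordered split: fit on the first 60 months, predict the rest
split = 60
model = Ridge(alpha=1.0)                                   # alpha is an assumed value
model.fit(X[:split], y[:split])
predictions = model.predict(X[split:])

# Out-of-sample correlation between predicted and realised values
oos_corr = np.corrcoef(predictions, y[split:])[0, 1]
```

The point of the comparison is that the linear model sees exactly the same inputs, so any gap in performance is attributable to how the features are processed rather than to the features themselves.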

The Discovery That Changed My Conclusion

The overall Sharpe numbers pointed one way: quantitative models win, close the notebook.

Breaking the results down by time period told a different story:

Period	LLM Sharpe Ratio
2021 to 2022	0.603
2023 to 2025	0.968

Same model, same architecture, same features, but completely different performance depending on what the market was doing. Averaged across five years the LLM looked mediocre. Split into two periods it looked like two different models. The aggregate number was not wrong; it was just answering the wrong question.
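A sub-period breakdown like this is simple to compute. The sketch below assumes monthly excess returns and the common square-root-of-12 annualisation; the return series is synthetic and the function name `annualised_sharpe` is illustrative, so the numbers it produces are unrelated to the results in the table.

```python
import numpy as np

def annualised_sharpe(monthly_excess_returns):
    """Annualised Sharpe ratio from monthly excess returns (sqrt(12) scaling)."""
    r = np.asarray(monthly_excess_returns, dtype=float)
    return np.sqrt(12) * r.mean() / r.std(ddof=1)

# Synthetic monthly excess returns: 24 months in one regime, 36 in another
rng = np.random.default_rng(42)
first = rng.normal(loc=0.002, scale=0.04, size=24)
second = rng.normal(loc=0.012, scale=0.04, size=36)
full = np.concatenate([first, second])

sharpe_full = annualised_sharpe(full)      # one number for the whole period
sharpe_first = annualised_sharpe(first)    # first sub-period only
sharpe_second = annualised_sharpe(second)  # second sub-period only
```

Computing the ratio per regime as well as in aggregate is what surfaces the kind of split shown above.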

When the AI Actually Worked

November 2023 is the clearest example of the model doing what it was built for.

The Fed had been hiking aggressively throughout 2022 and into 2023. By late 2023, inflation was coming down and the language coming out of the FOMC was shifting. The pivot from tightening to pausing was becoming visible in the text before it showed up clearly in the data.

The model saw it. Earnings calls across technology and consumer sectors were turning more optimistic. Fed communications were softening on inflation. Importantly, the model’s five runs that month came back with similar answers, which is what a high confidence score reflects in this system. It rotated toward growth with some conviction.

That call worked. Growth equities ran through the first half of 2024, and the portfolio was already there.

When the AI Failed

March 2025 looked completely different.

Tariff announcements were creating sharp uncertainty. Earnings calls were contradictory: companies were reporting decent recent results while simultaneously pulling forward guidance and flagging supply chain risk. The macro picture was moving faster than the corporate fundamentals. Earnings sentiment came in cautiously positive, Fed communications were hawkish, volatility was elevated, and yield curve signals were ambiguous. The signals pointed everywhere at once.

The model had no way to make sense of that, but it did not say so. Across five runs, it produced high confidence scores. Reading through the outputs, I found something specific: the model was referencing an inverted yield curve in its reasoning when the yield curve was not actually inverted. It had filled in context that was not there, and it had done it with apparent certainty.

That is a different problem from getting the allocation wrong. A bad call can come from bad luck or an unusual environment. Generating false context while appearing certain is a structural issue with how the confidence mechanism was working.

The Real Contribution

The key lesson was not the headline result. It was understanding when the model should and should not be trusted.

The confidence mechanism counted how many signals were firing above a threshold. More signals meant higher confidence. That works when markets are coherent. It breaks down when markets are full of contradictory information, because activity and agreement are not the same thing.

November 2023 worked because everything pointed in the same direction. Earnings sentiment, Fed tone, and macro signals were all telling a consistent story, and the five model runs agreed with each other. Confidence was high because agreement was high.

In March 2025, signals were active and contradictory. The confidence score was high because a lot was happening, not because any clear picture was forming. The mechanism measured volume when it should have been measuring coherence. A better system would distinguish between those two states: not just how many signals are firing, but whether they are saying the same thing.
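The difference between volume and coherence can be made concrete with a small sketch. It assumes signals are signed scores between -1 and 1 and that a signal is "firing" when its magnitude exceeds a threshold; both function names, the threshold, and the coherence formula are hypothetical illustrations of the idea, not the mechanism used in the thesis.

```python
import numpy as np

def volume_confidence(signals, threshold=0.5):
    """Share of signals whose magnitude exceeds the threshold (activity only)."""
    s = np.asarray(signals, dtype=float)
    return float(np.mean(np.abs(s) > threshold))

def coherence_confidence(signals, threshold=0.5):
    """Net directional agreement among the active signals, from 0 to 1."""
    s = np.asarray(signals, dtype=float)
    active = s[np.abs(s) > threshold]
    if active.size == 0:
        return 0.0
    return float(abs(np.sign(active).sum()) / active.size)

aligned = [0.8, 0.7, 0.9, 0.6]        # everything points the same way
conflicted = [0.8, -0.7, 0.9, -0.6]   # just as active, but contradictory

volume_aligned = volume_confidence(aligned)            # 1.0
volume_conflicted = volume_confidence(conflicted)      # 1.0
coherence_aligned = coherence_confidence(aligned)      # 1.0
coherence_conflicted = coherence_confidence(conflicted)  # 0.0
```

A count-based score cannot tell the two cases apart, while an agreement-based score separates them cleanly.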

Final Thoughts

The thesis did not establish that LLMs outperform quantitative models. The numbers are clear on that.

What it did show is that qualitative information is not irrelevant to portfolio decisions. The Fama-French alpha was real. The November 2023 call was not an accident. The signal was there; the architecture was not yet good enough to use it reliably under all conditions.

Small open-source models run locally are probably better understood as feature processors right now than as autonomous portfolio decision makers. They can translate text into signals. Synthesising those signals under ambiguous market conditions, without hallucinating context or overclaiming certainty, is still a hard problem.

The regime results leave room for optimism. More capable models, a confidence framework built around signal coherence rather than signal volume, and better access to real-time information could change what is achievable here. That is worth building toward.

Language carries information that moves markets. Extracting it reliably, without the noise eating the signal, is the part that still needs work.

Building an AI Portfolio Manager: Can LLMs Beat Quant Investing?

2026-06-04T00:00:00+02:00

Numbers run portfolio management. That sounds obvious, but it shapes everything: how risk is measured, how capital is allocated, which decisions get made and which get ignored.

Investors build models around returns, volatility, correlations, yield spreads, valuation metrics, and factor exposures. From Markowitz to modern factor investing, the toolkit has stayed largely quantitative.

And yet every quarter, thousands of companies tell investors exactly how they see the future.

Executives discuss demand trends, competitive pressures, supply chain issues, hiring plans, capital expenditure programs, and growth expectations during earnings calls. The Federal Reserve communicates its read on inflation, growth, labour markets, and rate policy through statements, minutes, and press conferences. Professional investors spend enormous amounts of time working through all of it.

So the question I kept coming back to was whether an AI could do the same: could a Large Language Model improve portfolio performance by combining qualitative signals from earnings calls and Fed communications with traditional quantitative data?

That question became the foundation of my master’s thesis in Portfolio Management at Frankfurt School of Finance & Management. Twelve thousand earnings call transcripts, hundreds of Federal Reserve documents, and thousands of AI-generated portfolio decisions later, I had an answer I did not expect.

The Traditional Approach to Portfolio Construction

Most portfolio models are fundamentally backward-looking.

They use historical returns to estimate risk and expected performance, identifying patterns such as momentum, volatility regimes, or factor exposures and allocating capital accordingly. These approaches work because markets often exhibit persistent behaviour. Assets that have performed well recently may continue to do so. Volatility regimes shape investor risk appetite. Certain sectors tend to outperform in specific economic environments.

What traditional models routinely ignore is language.

A technology company reports earnings: revenue beats expectations, margins improve, management raises guidance. Separately, the Federal Reserve signals that inflation risks are fading and rate cuts are becoming more likely. Both events carry information that may move prices. But converting thousands of pages of financial text into investable signals has historically been hard to do well, and most quantitative frameworks do not try.

That started to change with the development of large language models.

The Rise of Financial Language Models

Natural language processing looked very different ten years ago.

Early approaches relied on dictionaries that mapped words to positive or negative scores. These worked poorly in finance, where “liability” or “debt” sound alarming in everyday language but carry neutral or routine meanings in a corporate filing.

Transformer models changed things. BERT, FinBERT, GPT, Llama, and Mistral can process language in context rather than scoring words in isolation. FinBERT, trained specifically on financial text, became one of the more widely used tools for financial sentiment analysis.

Large language models added something beyond that: the ability to hold multiple signals in mind at once. An LLM can take in positive earnings sentiment alongside a deteriorating macroeconomic outlook, an inverted yield curve, and rising volatility, and try to arrive at a coherent view. Whether doing that actually improves portfolio decisions was a question that had not really been tested at the portfolio level.

Designing the Experiment

Rather than studying individual stock predictions, I wanted to test something more practical: whether an LLM could improve an entire portfolio allocation process.

I built a universe of ten major U.S. equity ETFs covering different sectors, styles, and market-cap segments: broad S&P 500 exposure, technology, financials, healthcare, energy, growth, value, and mid-to-small cap equities. The portfolio was rebalanced monthly, and each month followed the same process.

Step 1: Build a Quantitative Baseline

A minimum-variance portfolio was constructed using an exponentially weighted covariance matrix, creating a quantitative benchmark based purely on market data.
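As a rough illustration of this step, the sketch below builds an exponentially weighted covariance matrix and solves for fully invested minimum-variance weights in closed form. The decay factor of 0.94, the absence of a long-only constraint, and the synthetic three-asset returns are all assumptions for the example; the thesis settings are not stated here.

```python
import numpy as np

def ewma_covariance(returns, decay=0.94):
    """Exponentially weighted covariance; the most recent row gets the largest weight."""
    r = np.asarray(returns, dtype=float)
    n = r.shape[0]
    weights = decay ** np.arange(n - 1, -1, -1)    # oldest row gets decay**(n-1)
    weights = weights / weights.sum()
    demeaned = r - weights @ r                     # subtract the weighted mean
    return (demeaned * weights[:, None]).T @ demeaned

def min_variance_weights(cov):
    """Fully invested minimum-variance weights (shorting allowed in this sketch)."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)               # proportional to inverse(cov) @ 1
    return raw / raw.sum()

# Synthetic returns for three assets with different volatilities
rng = np.random.default_rng(1)
returns = rng.normal(scale=[0.01, 0.02, 0.03], size=(250, 3))

cov = ewma_covariance(returns)
weights = min_variance_weights(cov)
```

A production version would typically add a long-only constraint and position limits, which require a numerical optimiser rather than the closed-form solution.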

Step 2: Extract Information from Text

I collected 12,364 earnings call transcripts and 347 Federal Reserve documents, including FOMC statements, meeting minutes, and press conference transcripts. All were processed through FinBERT.

The resulting sentiment features captured overall earnings tone, forward guidance sentiment, management confidence, Federal Reserve communication style, inflation concerns, economic outlook signals, and labour market assessments.

Step 3: Let the LLM Make Portfolio Decisions

Llama 3.1 8B was deployed locally using Ollama. Each month, the model received the baseline portfolio weights alongside momentum signals, volatility data, yield curve information, macroeconomic indicators, and the earnings and Fed sentiment features, then proposed revised weights.

To reduce randomness, I ran the model five times each month and averaged the results. Across the full study, that produced 960 independent LLM portfolio decisions.

The Benchmark Problem

A lot of AI investment research compares against weak baselines. A model beats an equal-weight portfolio and claims it works. That is not a meaningful test.

The real question is whether an LLM can outperform a strategy that already incorporates well-known predictive signals. So I compared the LLM portfolio against five alternatives: Minimum Variance, Risk Parity, Equal Weight, Value Tilt, and Signal-Enhanced Minimum Variance.

The Signal-Enhanced portfolio was the one that mattered most. It already included momentum signals, volatility regime adjustments, and yield curve information, representing what a strong quantitative model could do with market data alone. The LLM needed to beat that to show that reading earnings calls and Fed minutes was actually adding something beyond what the numbers already said.

What I Expected

I thought the LLM would find relationships that traditional quantitative models missed: strong earnings optimism combined with a dovish Fed shift, sector-specific divergence in management confidence, narratives that were changing before the data caught up. These are exactly the kinds of signals that experienced analysts look for, and they are difficult to capture with linear models.

The reasoning seemed sound. Professional investors spend significant time interpreting these sources. A model that could process them at scale should, in theory, pick up something useful.

It was more complicated than that.

After processing over twelve thousand earnings calls and hundreds of Federal Reserve documents, the results pushed back on a lot of what I had assumed going in. What worked, what failed, and why it failed were not the story I had written in my head before the data came in.

That is what the next article covers.

Europe’s Balancing Act with the Digital Euro

2026-03-20T00:00:00+01:00

Central banks around the world are building digital currencies. But they are not all solving the same problem.

China’s digital yuan is partly about scale and state-backed payment infrastructure. The Bahamas’ Sand Dollar was built around access and inclusion. Other CBDC projects focus on payments modernisation or reducing dependence on cash.

The Digital Euro is different. It is Europe’s attempt to update public money for a digital economy without blowing up the role of private banks.

That tension matters more than the technology behind it.

What Europe Is Actually Building

The ECB’s own framing is careful. A digital euro would be a digital form of central bank money for everyday payments, available across the euro area, designed to complement cash rather than replace it. The current model does not imagine citizens opening accounts directly with the ECB. Distribution would run through supervised intermediaries, banks and payment service providers, which would remain the customer-facing layer.

There is also a sovereignty argument running through this. Cash use is falling across the euro area, and the payment rails that have filled the gap are mostly foreign: American card networks, US-headquartered tech platforms. The ECB has been candid that preserving a European form of public money, one not controlled by a private company or a foreign government, is part of what this project is about.

The ECB completed its investigation phase in 2023 and has since been working through the technical design, legislation, and coordination with commercial banks. No final decision to issue has been made yet.

Why Banks Sit at the Centre of the Debate

For users, a Digital Euro payment could feel almost uneventful. One scenario the ECB has designed for: you pay at a market stall with no internet connection, the transaction settles offline between devices, and the merchant’s balance updates when connectivity returns. That is something cards and apps currently cannot do, and it is one of the more concrete use cases being built around.

For banks, none of this is uneventful.

Retail deposits are one of the foundations of commercial banking. They help fund lending, support liquidity, and anchor the customer relationship. A retail CBDC introduces a new form of money that could compete with those deposits, particularly if consumers see central bank money as safer in times of stress.

That is why the Digital Euro is being designed with guardrails. The ECB has discussed holding limits of around 3,000 euros per person, keeping it useful as a payment tool while making it unattractive as a savings vehicle. Europe wants people to use the Digital Euro for paying, not for moving large sums out of bank deposits.

Banks and payment firms would still handle onboarding, wallets, customer support, fraud monitoring, compliance, and integration into existing apps. The ECB issues the money; the private sector shapes the experience. How intermediaries get compensated for that work, given the Digital Euro pays no interest, is a question the ECB has not fully resolved publicly.

How It Compares With Other Models

China’s e-CNY

China’s digital yuan is further along in pilot deployment. It uses authorised operators to distribute it and offers what official descriptions call “managed anonymity,” meaning transaction data can be accessed by the state under certain conditions but is not routinely visible to operators. That makes it a serious retail payments instrument, but one built around very different assumptions about government access than Europe would accept.

The Bahamas’ Sand Dollar

The Sand Dollar addressed financial inclusion across a dispersed island geography where traditional banking was uneven. Europe does not face that problem at scale. Its challenge is not expanding access to money, but preserving the role of public money in a market already crowded by banks, cards, large payment platforms, and private stablecoins now regulated under MiCA.

Why Privacy Matters So Much in Europe

The Digital Euro will not be judged by architecture diagrams. It will be judged by a simpler question: who can see my payments?

The ECB says it would not identify people from their payment data and has designed offline functionality partly to offer privacy levels closer to cash, where only payer and payee know a transaction happened. But the project must also operate within EU anti-money laundering rules, which require some transaction visibility. Those two things are in direct tension, and the legislation working through Brussels has not settled it cleanly.

That unresolved tension is what makes the Digital Euro genuinely interesting to watch.

Not because it is “digital,” since most money already is, but because Europe is trying to redesign money without breaking the banking model underneath it.

If it succeeds, the result may feel almost uneventful to the average user. That would be a job well done.

Reading the Tape on Twitter

2026-03-13T00:00:00+01:00

Financial markets move on information, and increasingly, that information flows through social media before it reaches anywhere else. A tweet from an analyst, a headline reposted by a trader, a single word like “downgrade” are all signals that can move prices. The question this project explores is simple: Can a machine be trained to reliably tell a bullish post from a bearish one?

The Data

The dataset contains 9,543 financial tweets, each labelled as Bearish, Bullish, or Neutral. Since the goal is binary classification, Neutral tweets were removed, leaving 3,365 tweets split roughly 57% Bullish and 43% Bearish.

The class imbalance is modest but worth noting. There are more Bullish tweets than Bearish ones in the data, which subtly influences how models behave. Interestingly, Bearish and Bullish tweets are nearly identical in length, so tweet length alone tells us nothing useful about sentiment.

Turning Words Into Numbers

Computers cannot make sense of raw text directly. They work with numbers. The core challenge in any text classification problem is finding the right way to convert language into something a model can learn from.

Two approaches were used here:

TF-IDF (Term Frequency–Inverse Document Frequency) treats each word (and two-word phrase) as a feature and scores it by how often it appears in a tweet relative to how common it is across all tweets. A word like “downgrade” that appears in only a small share of tweets gets a high score in the tweets that contain it. A word like “the” that appears everywhere gets a low score.

Before applying TF-IDF, tweets were cleaned:

URLs and special characters removed
Stock tickers normalised (e.g. $BYND → tickerbynd)
Words reduced to their root form via stemming (e.g. “raising”, “raised”, “raises” → “rais”)
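The cleaning steps above can be sketched with the standard library alone. This is an illustrative version: the function name `clean_tweet`, the exact regular expressions, and the lowercasing step are assumptions, and stemming is left out because it needs an external stemmer (NLTK's `PorterStemmer` is one option).

```python
import re

def clean_tweet(text):
    """Lowercase, strip URLs, normalise tickers, and drop special characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)         # remove URLs
    text = re.sub(r"\$([a-z]+)", r"ticker\1", text)   # $bynd -> tickerbynd
    text = re.sub(r"[^a-z0-9\s]", " ", text)          # remove special characters
    return re.sub(r"\s+", " ", text).strip()           # collapse whitespace

example = clean_tweet("$BYND downgraded by analysts! https://t.co/abc123")
# example == "tickerbynd downgraded by analysts"
```

Normalising tickers before stripping special characters matters: if the `$` were removed first, the ticker would be indistinguishable from an ordinary word.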

FinBERT is an entirely different approach. Rather than counting words, it uses a large language model pre-trained specifically on financial text to understand the meaning of a tweet in context. More on this later.

The Models

Five traditional machine learning models were trained on the TF-IDF features, then FinBERT was tested separately as a more advanced alternative.

A 75/25 train/test split was used throughout, with stratification to preserve the class balance in both sets.
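A stratified split of this kind is available through scikit-learn's `train_test_split`. The sketch below uses a toy dataset of 100 placeholder tweets with the same approximate class shares; the data and the `random_state` are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

# Toy stand-in for the tweet data: 57 Bullish (1) and 43 Bearish (0) labels
texts = [f"tweet {i}" for i in range(100)]
labels = [1] * 57 + [0] * 43

# stratify=labels keeps the class shares roughly equal in both sets
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

train_bullish_share = sum(y_train) / len(y_train)
test_bullish_share = sum(y_test) / len(y_test)
```

Without stratification, a small test set can end up with a noticeably different class mix, which makes per-class recall figures harder to compare across models.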

Results: What Worked — and What Didn’t

Model	Test Accuracy	ROC-AUC
Logistic Regression	74.9%	0.859
Naive Bayes	78.7%	0.928
Support Vector Classifier	84.3%	—
Neural Network	86.3%	0.935
Random Forest	81.2%	0.908
FinBERT + Logistic Regression	87.7%	0.943

ROC-AUC measures how well a model separates the two classes across all decision thresholds: a score of 1.0 is perfect, and 0.5 is no better than a coin flip. All models performed well above chance.

The standout story: Naive Bayes

Naive Bayes achieved a reasonable overall accuracy of 78.7%, but its confusion matrix reveals a serious flaw: it classified 97.3% of Bullish tweets correctly while getting only 54% of Bearish tweets right. In practice, this model has a strong bias toward predicting Bullish, which would be dangerous in any real application. Missing a bearish signal in a portfolio context is exactly the kind of error that costs money.
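The arithmetic behind that point is worth spelling out: overall accuracy is the class-share-weighted average of per-class recall, so a majority class with very high recall can mask a weak minority class. The sketch below uses the recall figures from the text and assumes the test set keeps the roughly 57/43 split; the function name is illustrative.

```python
def accuracy_from_recalls(recall_by_class, share_by_class):
    """Overall accuracy as the class-share-weighted average of per-class recall."""
    return sum(recall_by_class[c] * share_by_class[c] for c in recall_by_class)

share = {"Bullish": 0.57, "Bearish": 0.43}     # approximate class split (assumed for the test set)
recall = {"Bullish": 0.973, "Bearish": 0.54}   # Naive Bayes figures quoted in the text

accuracy = accuracy_from_recalls(recall, share)  # roughly 0.787
```

The result lands on the reported headline accuracy even though nearly half of the Bearish tweets are misclassified.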

Logistic Regression: balanced but limited

Logistic Regression was the most balanced model: 84% recall on Bearish tweets and 68% on Bullish. It was also the most interpretable: by examining the model’s coefficients, we can see directly which words drove each prediction.

Top words driving Bullish predictions: up, beat, raise, rise, higher, buy, upgrade, gain, jump

Top words driving Bearish predictions: down, miss, lower, downgrade, cut, fall, warn, slide, drop, loss

These align precisely with how a finance professional would intuitively read a headline. The model has, in effect, learned a basic financial vocabulary from the data alone.

The Neural Network leads the TF-IDF pack

The Neural Network (a small two-layer architecture) achieved the best balance of accuracy and class recall among the traditional models — 86.3% overall, with 80% recall on Bearish and 91% on Bullish. It also generalised well, suggesting the architecture was not simply memorising the training data.

Where the Models Struggled: Error Analysis

Examining the Neural Network’s 115 misclassified tweets reveals why this problem is genuinely hard, even for humans.

Predicted Bullish, actually Bearish:

“More stockmarket volatility, less buying the dip, and slower earnings per share growth ahead, Goldman Sachs says”

“EIA forecasts U.S. shale oil output to climb by 49,000 barrels a day in December”

The first contains words like “buying” that typically signal Bullish sentiment. The second includes “climb”, a positive word, despite describing an output increase that could be bearish for prices. Context and domain knowledge matter enormously here.

Predicted Bearish, actually Bullish:

“Fed’s ‘bazooka’ soothes dollar funding squeeze”

“Jobless Americans to see extra payments as soon as this week”

“Squeeze” and “Jobless” look negative in isolation, but both tweets carry a broadly positive market implication. A word-counting model has no way of knowing this.

FinBERT: Context Changes Everything

FinBERT is a version of BERT, one of the most significant advances in natural language AI, fine-tuned specifically on financial documents. Rather than treating words as isolated signals, it reads each tweet as a sequence and understands relationships between words.

The approach here was to use FinBERT purely as a feature extractor: each tweet was passed through the model to produce a rich numerical representation capturing its financial meaning, then a simple Logistic Regression was trained on those representations.

The result: 87.7% accuracy and 0.943 ROC-AUC, the best of all models tested, with strong recall on both Bearish (87.3%) and Bullish (87.9%) classes.

Crucially, FinBERT is the only model that could plausibly handle tweets like the “bazooka” example above, because it was trained on enough financial text to understand that a central bank intervention, however colourfully described, is typically market-positive.

Key Takeaways

1. Word-counting methods work, but have clear limits. TF-IDF models learned a credible financial vocabulary and performed well on straightforward tweets. They struggle the moment sentiment depends on context rather than individual words.

2. High accuracy can hide dangerous blind spots. Naive Bayes looked reasonable on paper but was nearly blind to Bearish signals. In any real-world application, such as risk management, trade signal generation, or news monitoring, recall on each class matters as much as overall accuracy.

3. Financial language is genuinely ambiguous. Many misclassified tweets were ambiguous even by human standards. Words like “climb”, “squeeze”, and “bazooka” carry very different meanings depending on what surrounds them. This is precisely the problem FinBERT was built to address.

4. Pre-trained financial models meaningfully outperform bag-of-words. FinBERT’s improvement over the best TF-IDF model was modest in percentage terms but consistent across both classes, and the gap would likely widen on more complex or nuanced financial text.

Built using Python, scikit-learn, and HuggingFace Transformers. Dataset: Twitter Financial News Sentiment (Zeroshot / HuggingFace).

MoneyWise: Gamified Financial Literacy

2026-03-12T00:00:00+01:00

Overview

MoneyWise is an interactive browser game built to make financial literacy accessible and engaging. Players navigate a series of real-life financial scenarios like managing credit, spending decisions, and savings while tracking their Credit Power and Savings in real time.

The core idea: learning personal finance should feel like playing a game, not reading a textbook.

Play a live Demo →

Motivation

Financial literacy is one of the most underdeveloped life skills for young adults. Most people learn about credit scores, interest rates, and budgeting only after making costly mistakes. MoneyWise flips that. It puts players in scenarios where the consequences of poor financial decisions are felt in a safe, simulated environment.

The game format naturally drives engagement: every choice has a visible impact on your Credit Power score and Savings balance, creating an immediate feedback loop that textbooks simply can’t replicate.

Technical Stack

Layer	Technology
UI Framework	React
Routing	React Router v6
State Management	MobX
Animations	CSS Transitions
Analytics	Google Analytics
Deployment	Netlify

Core Features

Credit Power tracking — every decision affects your score, just like real life
Savings management — balance short-term spending against long-term goals
Sequential scenario gameplay — progress through increasingly complex financial situations
Interactive decision trees — multiple-choice paths with distinct financial outcomes
Progress tracking — visual feedback on how each choice moves your financial health

Gameplay Loop

Player is presented with a real-life financial scenario (e.g. “You need a new phone: buy outright, finance it, or wait?”)
Choose from 2–3 options, each with different financial implications
Credit Power and Savings update immediately based on the choice
A brief explanation shows why the outcome occurred
Move to the next scenario

This loop reinforces cause-and-effect thinking around financial decisions in a way that sticks.

Key Takeaways

Gamification is a genuinely effective delivery mechanism for financial education: engagement stays high when the stakes feel real. This is the Duolingo playbook.
MobX’s observable state makes it easy to keep UI perfectly in sync with game state without boilerplate.
Designing scenarios requires as much domain knowledge (personal finance) as technical skill. The game is only as good as the quality of its dilemmas.

Links

Full source on GitHub · Live demo at moneywisedemo.netlify.app

Yield Curve PCA & P&L Attribution

2026-03-11T00:00:00+01:00

Overview

Using FRED data, this project decomposes US Treasury yield curve movements into their principal components, calibrates historical shocks, prices a fixed income portfolio, and attributes P&L under each scenario.

The three dominant PCA factors — level, slope, and curvature — explain over 95% of historical yield curve variance, making them powerful tools for scenario analysis and risk attribution.

Motivation

Fixed income portfolio managers constantly face the question: why did my portfolio P&L change today? Breaking down P&L into interpretable factors (parallel shift, steepening/flattening, twist) is far more useful than a single unexplained number.

Methodology

1. Data Collection

Historical US Treasury yields pulled from FRED for maturities: 3M, 6M, 1Y, 2Y, 3Y, 5Y, 7Y, 10Y, 20Y, 30Y.

2. PCA Decomposition

from sklearn.decomposition import PCA
import pandas as pd
import numpy as np

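# `yields` is assumed to be a DataFrame of Treasury yields from FRED
# (rows: dates, columns: the ten maturities listed above)
# Compute daily yield changes
yield_changes = yields.diff().dropna()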

# Fit PCA
pca = PCA(n_components=3)
pca.fit(yield_changes)

# Explained variance
print(f"PC1 (Level): {pca.explained_variance_ratio_[0]:.1%}")
print(f"PC2 (Slope): {pca.explained_variance_ratio_[1]:.1%}")
print(f"PC3 (Curvature): {pca.explained_variance_ratio_[2]:.1%}")

3. Portfolio Pricing

Each bond in the portfolio is priced under each PCA shock scenario using modified duration and convexity adjustments.
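The duration and convexity adjustment is the standard second-order approximation, ΔP ≈ P·(−D·Δy + ½·C·Δy²). A minimal sketch follows; the function name and the bond parameters are hypothetical and chosen only to show the arithmetic.

```python
def price_change(price, modified_duration, convexity, yield_shift):
    """Second-order approximation of the price change for a yield shift (in decimals)."""
    duration_effect = -modified_duration * yield_shift
    convexity_effect = 0.5 * convexity * yield_shift ** 2
    return price * (duration_effect + convexity_effect)

# Hypothetical bond: price 100, modified duration 7, convexity 60, +25bps shift
delta = price_change(100.0, 7.0, 60.0, 0.0025)   # about -1.73 per 100 of price
```

For a shock of this size the convexity term is small relative to the duration term, but it grows with the square of the shift, so it matters more in larger scenarios.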

4. P&L Attribution

Total P&L is decomposed as:

ΔP&L = β₁·PC1_shock + β₂·PC2_shock + β₃·PC3_shock + residual
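One way to read the decomposition: each β is the portfolio's sensitivity to a one-unit move in that principal component, and each shock is the day's score on that component. The sketch below is a first-order illustration under loud assumptions: the yield changes are synthetic, the per-tenor sensitivities (`dv01`) are hypothetical, and convexity is ignored.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_days, n_tenors = 500, 10

# Synthetic daily yield changes (in bps): a common level move plus tenor-specific noise
level = rng.normal(scale=5.0, size=(n_days, 1))
yield_changes = level + rng.normal(scale=1.0, size=(n_days, n_tenors))

pca = PCA(n_components=3).fit(yield_changes)

# Hypothetical sensitivities: P&L in EUR per +1bp move at each tenor
dv01 = -np.linspace(100.0, 1000.0, n_tenors)

# Attribute the most recent day's move
move = yield_changes[-1]
scores = pca.components_ @ (move - pca.mean_)   # PC shocks for the day
betas = pca.components_ @ dv01                  # P&L per unit of each PC
factor_pnl = betas * scores                     # contribution of each factor
total_pnl = move @ dv01                         # first-order P&L of the full move
residual = total_pnl - factor_pnl.sum()         # what the three factors leave unexplained
```

By construction the three factor contributions plus the residual add back to the total, which is the property the attribution equation relies on.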

Results

Factor	Explained Variance	P&L Contribution
Level (PC1)	82.3%	-€45,200
Slope (PC2)	11.1%	+€12,800
Curvature (PC3)	4.2%	-€3,100
Residual	2.4%	+€1,500

Key Takeaways

The level factor dominates — a 25bps parallel shift accounts for the majority of P&L swings
Slope risk (steepening/flattening) is the second most important driver
Curvature effects are small but material for barbell/butterfly strategies

Code & Repo

Full implementation available on GitHub and on display at WebSite.