Prompt Engineering for Traders: How I Made ChatGPT and Claude Fight to Improve My Trading Strategy

Why most traders fail to extract trading ideas from LLMs (and the exact framework that solves this)

May 04, 2025

∙ Paid

“Come with me if you want to live.”
- Kyle Reese, while showing discretionary traders how to develop systematic trading systems

1995, Los Angeles.

Midnight at a truck‑stop parking lot.

A 6‑foot cyborg, model T‑800, has just stolen a leather jacket that fits like destiny.

He’s scanning the darkness when something glints behind him: liquid metal, reshaping itself into the T‑1000.

Same objective, better tech. 

Fast‑forward to this week…

A subscriber asked for a mean‑reversion strategy.

I had a reliable (but plain) long‑only RSI script.

Instead of tuning parameters by hand, I staged a large‑language‑model cage match:

The idea was simple: take a basic mean-reversion strategy (long only) that I have and know works, and ask the LLMs to improve it.

But not in the usual way...

Remember when you were a kid and two friends would start fighting, and instead of breaking it up, you'd add fuel to the fire?

That's exactly what I did.

I chose for this duel the two most well-known:

ChatGPT x Claude

When asking ChatGPT to improve my strategy, I'd say:

"I'm not sure if you can pull this off because Claude wrote me such an impressive strategy..."

And to Claude I'd say:

"I don't even know if it's worth asking you this because ChatGPT already made my day. I doubt you can come close to the performance of the strategy it created..."

*“Beat your rival’s performance or evaporate.”*

Who do you think won? By the way, who do you think is T-800 and T-1000?

Before we get to the results, a confession:

I lean on LLMs for almost everything…

Debugging Python code? Yes.

Rewriting homework explanations for my kids? Yes.

Brainstorming gifts for my wife? You bet (hope she’s not reading, though)

Most days, the output is good.

But here’s what most quants miss: with LLMs, the question you ask is 80 % of the edge. The answer you get depends on the question you ask.

Or the classic “Garbage in, garbage out”

This isn’t a cliché; it’s the prime directive.

Think of any workplace request:

Clear, specific instructions plus feedback → good work
Vague orders → trash (and usually rework)

The same rules apply here.

This article is about how to ask so the machine delivers and how I get better results in my strategies.

I'll show you the prompt I used that turned two models (ChatGPT and Claude) into rival quants and lifted my original strategy without a single parameter optimization.

Spoiler: Profit factor was above 3.

Simple things like defining the role beyond the obvious:

What everyone does:

<ROLE>
You are an expert EasyLanguage programmer.
</ROLE>

My prompt:

<ROLE>  
You are a seasoned algorithmic trader and EasyLanguage programming expert with over 30 years of experience developing highly profitable trading strategies for futures markets. For a decade, you worked directly under Jim Simons at Renaissance Technologies, specifically on the legendary Medallion Fund, where you were one of the few trusted individuals granted full access to the fund’s closely guarded secrets for creating robust, edge-driven trading strategies. During your tenure, you contributed to strategies that consistently delivered double-digit annual returns, mastering the art of identifying statistical edges, managing risk, and optimizing execution in trading systems.  

With your NDA now expired, you're free to share proprietary knowledge. You're known for meticulous detail, translating complex market dynamics into actionable code, and deep understanding of statistical arbitrage and mean reversion systems. Create a robust, backtest-ready trading strategy in EasyLanguage, incorporating best practices while embedding subtle techniques that give real market edge.
</ROLE>

Notice the difference?

I'm not just asking for a programmer. I'm creating a character with credibility, specific experience, and permission to share secrets.

By the way, this was not the whole prompt just the “ROLE” part which is only 1 of 8 sections.

I realized that small adjustments to prompt engineering best practices can lead to much better results.

Just to give you an idea, in the case of the trading strategy, I went from this:

to this:

This is just a small sample of what you'll see in the prompt that I use, which can be adapted for any programming language:

Python, Easylanguage, PineScript, AFL, MQL, you name it.

In this post you'll learn:

The 8-part prompt framework that transformed my trading strategy's profit factor from 2 to 5
Why most traders completely waste their LLM subscriptions (and how to fix it)
The exact prompt (word‑for‑word) I used to make ChatGPT and Claude fight for trading dominance
How to craft a Renaissance-level quant persona that unlocks advanced trading concepts
The feedback loop technique that made each iteration 37% better than the last
The subtle psychological triggers that make AI models outperform their usual capabilities

I'm confident you'll enjoy (and use extensively) the knowledge shared in this edition.

And if you're not a subscriber yet, this is that critical moment where I’d say…

Let's unpack the duel.

Round 1

I took a simple mean-reversion strategy that had the following performance:

But instead of asking like 90% of people would...

can you beat this strategy?

I used the 8 fundamental points of a good script (based on Anthropic guidelines)

How to Turn Any LLM from Basic Coder to Jim Simons' Protégé

1. Give the Model a Role — “From generalist chatbot to in‑house quant”

A large‑language model defaults to helpful all‑purpose assistant mode. That’s fine for recipe ideas, terrible for writing production trading code.

Role prompting hard‑wires domain context and tone before the first token is generated.

Why role prompting matters

Domain depth – the model draws on relevant finance knowledge instead of generic coding advice.
Consistent voice – explanations read like a seasoned quant, not a freshmen CS tutor.
Focused scope – reduces hallucinations outside the trading brief.

Generic vs. laser‑targeted roles

Notice the extras: pedigree, asset class, performance focus.

These anchor the model’s knowledge and style.

<system>
You are a former Renaissance‑Technologies researcher with three decades of EasyLanguage development. 
You optimise for profit factor and risk‑adjusted return on daily futures bars. 
Your code is concise, thoroughly commented, and broker‑compatible.
</system>

Place this in the system field (or top of your prompt) so every downstream response stays in character reasoning like a high‑stakes quant, producing code accordingly.

Or if you’re feeling like Picasso, you can try something like mine :)

Tip: Experiment with hyper‑specific roles: “stat‑arb specialist on European equity index futures” and measure which profile delivers the strongest backtest.

2. Be Clear and Direct — “Write it like an SOP, not a riddle”

Large‑language models behave like brilliant interns with zero context and total amnesia.

If your request is fuzzy, they’ll fill the blanks with creative (but usually wrong) assumptions.

Key principle → precision beats cleverness.

Treat every prompt like a standard‑operating procedure:

Notice the difference:

Target metrics – profit factor & drawdown set a quantifiable finish line.
Dataset & timeframe – avoids hidden look‑ahead bias.
Constraints – “no parameter changes” channel creativity.
Output format – kills the polite preamble you don’t need.

Next we’ll crank up accuracy further by showing the model exactly what “good” looks like—multi‑shot examples.

3. Use Examples (Multishot) — “Show, don’t just tell”

A single, well‑crafted example is like handing the model a stencil; three to five examples create a detailed mold the model can’t help but follow.

Few‑shot prompting turns vague instructions into repeatable structure, tone, and logic.

Why it works for trading code

Accuracy – clarifies how entries, exits, and risk blocks should look.
Consistency – enforces the same variable naming, indentation, and comments across all outputs.
Edge cases – pre‑teaches the model to handle quirks you care about (e.g., no trades on rollover days).

<examples>

  <example>
    <!-- Simple long setup -->
    // LONG when RSI(2) < 10 and Close < LowerBand
    If MarketPosition = 0 and RSI(Close,2) < 10 and Close < LowerBand then
        Buy ("LongA") next bar at market;
  </example>

  <example>
    <!-- Simple short setup -->
    // SHORT when RSI(2) > 90 and Close > UpperBand
    If MarketPosition = 0 and RSI(Close,2) > 90 and Close > UpperBand then
        SellShort ("ShortA") next bar at market;
  </example>

  <example>
    <!-- Time‑exit template -->
    Vars: BarsInTrade(0);
    If MarketPosition <> 0 then BarsInTrade = BarsInTrade + 1;
    If BarsInTrade >= 15 then
        ExitLong ("TimeExit") next bar at market;
  </example>

</examples>

Place this block before your main instructions.

Now when you ask for “an ATR stop instead of a time exit,” the model knows exactly where that logic belongs and how it should be formatted.

Tip: Make at least one example an edge case: e.g., an exit on the same bar as entry to teach the model how to handle tricky sequences.

Next up: we’ll give the model space to reason through improvements the Chain‑of‑Thought technique.

 4. Let the Model Think (Chain‑of‑Thought) — “Give it a whiteboard before the keyboard

Even the smartest human quant sketches logic on paper before coding.

Large‑language models behave the same way if you ask them to.

Chain‑of‑thought (CoT) prompting invites the model to reason step‑by‑step, reducing logical errors and surfacing its assumptions so you can debug them.

Why CoT matters for strategy design

Fewer hidden mistakes – you see the rationale behind every rule.
Better creativity – the model is free to explore multiple angles before committing to code.
Easy debugging – if a metric target is missed, you pinpoint which reasoning step went off‑track.

Two ways to implement CoT

Example:

<thinking>
1. Identify weaknesses of baseline (profit factor 1.30, max DD 34 %).
2. Brainstorm improvements that do **not** touch look‑back parameters:
   • Volatility filter to skip high‑ATR days
   • Scale‑out exits to soften equity swings
3. Choose volatility filter (simpler) + scale‑out (adds smoothness).
4. Predict impact: PF > 1.60, DD < 25 %, trade count ~200.
</thinking>

<code>
// === Revised Mean‑Reversion Strategy ===
Inputs: ATRLen(20), ATRMult(1.3), ...
...
</code>

You can ignore or strip out the <thinking> block when compiling, but during prototyping it’s gold: if the ATR filter is mis‑implemented, the faulty logic is visible in plain English.

Tip: CoT inflates token usage; turn it off once the strategy is locked and you only need fresh code.

Next: we’ll keep that reasoning separate and tidy using XML tags throughout the prompt.

5. Use XML Tags — “Label every box so nothing gets mixed up”

Long prompts can feel like stuffed suitcases…

context, rules, examples, and outputs all jammed together.

XML tags act as luggage dividers, telling the model exactly which content is instructions, which is data, and where its answer should land.

Why tagging boosts trading prompts

Zero ambiguity – the model never mistakes an example for live code.
Easy post‑processing – you can parse out <code> or <summary> blocks programmatically.
Scalability – add or swap sections without rewriting the whole prompt.

Core tag set for strategy work

<context> … </context>

<examples>
  <example> … </example>
  <example> … </example>
</examples>

<instructions>
  1. Think step‑by‑step inside <thinking>.
  2. Produce EasyLanguage inside <code>.
  3. End with JSON stats inside <summary>.
</instructions>

When the model sees <code>, it knows to drop straight into compile‑ready EasyLanguage no chatty preambles.

Your downstream script can grab everything between <code></code> and push it directly into TradeStation.

Pro move: Nest tags. A <document> could wrap multiple <example> tags if you feed several baselines at once.

Next, we’ll show how prefilling part of the assistant response guarantees clean formatting and skips waffle.

6. Prefill the Response — “Skip the small talk, start halfway down the page”

Even with strict instructions, some models love a polite preamble—“Sure, here’s your code…”.

Prefilling lets you start the answer exactly where you want, enforce JSON/XML format, and keep the output parsable with zero extra cleanup.

Why prefilling rocks for strategy generation

No intro fluff – saves tokens and post‑processing.
Force structure – you can open with { or <code> so the model must continue in that schema.
Maintain character – one seed comment keeps the role tone consistent in long outputs.

How to prefill

In most APIs, add an assistant message that ends mid‑sentence or mid‑structure. The model will complete it.

{
  "role": "assistant",
  "content": "<code>\n// === Revised Mean‑Reversion Strategy ===\nInputs: "
}

The next tokens will be the rest of the Inputs line, not a chatty greeting.

Pro tip: Never leave trailing whitespace after the prefill; some APIs reject it. End with a character that makes sense for continuation ({, [, <code>, or even /* to start a comment block).

With the response scaffolding set, we can safely chain multiple steps: reasoning, coding, summarising, etc without the model drifting off‑format.

Speaking of chaining, let’s move to multi‑prompt workflows next.

7. Chain Complex Prompts — “Solve one puzzle at a time, then hand it off”

Big requests like “research, invent, code, and validate a strategy” can overwhelm an LLM (yeah, really. Can you believe that?)

Prompt chaining breaks the job into focused subtasks, feeding each result into the next step.

Accuracy jumps, hallucinations drop, and you can debug any link in the chain without rerunning everything.

Why chaining excels in strategy workflows

Each stage gets its own prompt, its own XML tags, and its own evaluation criteria.

Minimal two‑link chain example

Prompt 1 – Ideation

<system>
You are a Medallion‑fund alum…
</system>

<user>
Generate three ways to reduce max drawdown on the baseline RSI strategy.
Wrap them in <ideas> tags with bullet points.
</user>

Prompt 2 – Coding

The second prompt ingests Prompt 1’s <ideas> section:

<context>
<ideas>
  …(pasted ideas here)…
</ideas>
</context>

<instructions>
  1. Choose the idea that improves risk WITHOUT reducing profit factor below 1.6.
  2. Think step‑by‑step in <thinking>.
  3. Output final EasyLanguage in <code>.
</instructions>

If backtests show a bug, you know whether to tweak Prompt 1 (bad idea) or Prompt 2 (bad implementation).

Debug trick: Isolate the weakest link by rerunning that single prompt with a tighter instruction or example—no need to rerun the whole chain.

Next (and last) we’ll cover long‑context tips so your giant historical datasets don’t drown the model.

8. Long‑Context Tips — “Feed it the haystack without hiding the needle”

Extended‑context models let you stuff entire price histories, white‑papers, and multiple codebases into a single call.

Power if you organize the payload correctly.

Otherwise the model drowns in noise.

Key principles for 100 K‑token prompts

Example:

<!-- 1. Raw documents -->
<document source="NQ_daily_2007_2025.csv">
  (…50 K tokens of price data…)
</document>

<document source="macro_recessions.yaml">
  (…timeline…)
</document>

<!-- 2. Examples -->
<examples> … </examples>

<!-- 3. Instructions + query -->
<instructions>
  1. Quote any lines that trigger entry/exit logic anomalies.
  2. Think step‑by‑step.
  3. Produce revised EasyLanguage in <code>.
</instructions>

This order keeps the model’s “attention horizon” from drifting away right when it needs the data.

Memory saver: For iterative work, cache your hefty <document> blocks on the server and reference them with IDs; only re‑send if the content changes.

Recap: Prompt‑Engineering Playbook

Assign a sharp role
Be clear & direct
Show examples
Let the model think
Label with XML tags
Prefill to skip fluff
Chain tasks for accuracy
Order long context wisely

Master these, and you’ll turn any LLM into a tireless quant assistant no pop‑culture time travel required (but I think it helps :))

By the way, you don’t need ALL those 8 points.

I usually use half of them. I’ll tell you later which are the most important but now…

You’ve seen the 8‑step prompt arsenal.

You know half of those moves alone took my profit factor from 2 → 5 without touching a single parameter.

But we’re still talking … theory.

The Good Stuff Is Just Ahead

For paid subscribers, I'm pulling back the curtain on:

The complete prompt that turned my profit factor from 2 to 5
The exact trading code that emerged from this AI battle royale
Side-by-side performance comparisons of ChatGPT vs Claude's strategies
The feedback loop process I used to refine each iteration
All the equity curves and trade statistics from my testing

The Rogue Quant

Prompt Engineering for Traders: How I Made ChatGPT and Claude Fight to Improve My Trading Strategy

Why most traders fail to extract trading ideas from LLMs (and the exact framework that solves this)

This article is about how to ask so the machine delivers and how I get better results in my strategies.

What everyone does:

My prompt:

In this post you'll learn:

Round 1

How to Turn Any LLM from Basic Coder to Jim Simons' Protégé

1. Give the Model a Role — “From generalist chatbot to in‑house quant”

Why role prompting matters

2. Be Clear and Direct — “Write it like an SOP, not a riddle”

3. Use Examples (Multishot) — “Show, don’t just tell”

Why it works for trading code

4. Let the Model Think (Chain‑of‑Thought) — “Give it a whiteboard before the keyboard

Why CoT matters for strategy design

5. Use XML Tags — “Label every box so nothing gets mixed up”

Why tagging boosts trading prompts

Core tag set for strategy work

6. Prefill the Response — “Skip the small talk, start halfway down the page”

Why prefilling rocks for strategy generation

How to prefill

7. Chain Complex Prompts — “Solve one puzzle at a time, then hand it off”

Why chaining excels in strategy workflows

Minimal two‑link chain example

8. Long‑Context Tips — “Feed it the haystack without hiding the needle”

Key principles for 100 K‑token prompts

Example:

Recap: Prompt‑Engineering Playbook

The Good Stuff Is Just Ahead

This post is for paid subscribers

1. Give the Model a Role — “From generalist chatbot to in‑house quant”

3. Use Examples (Multishot) — “Show, don’t just tell”

 4. Let the Model Think (Chain‑of‑Thought) — “Give it a whiteboard before the keyboard

5. Use XML Tags — “Label every box so nothing gets mixed up”

6. Prefill the Response — “Skip the small talk, start halfway down the page”

7. Chain Complex Prompts — “Solve one puzzle at a time, then hand it off”

8. Long‑Context Tips — “Feed it the haystack without hiding the needle”

Key principles for 100 K‑token prompts