ASK-Y is an AI-native analytics platform that uses context engineering to deliver accurate and explainable insights.

ASK-Y is designed for analysts, product managers, and business teams who want to leverage AI without losing analytical accuracy.


Strategic Prompting

Want to know how to make an LLM remember precise details from a long conversation?
Analytics workflows with LLMs naturally produce very long context windows.
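Why do conversations get long so quickly? On every turn, the model re-reads the whole history plus the new question, so even short follow-ups sit on top of a growing pile of tokens. The sketch below is a toy model under that assumption; the function name `context_sizes` and all token counts are hypothetical and are not taken from any real tokenizer or from Ask-Y.

```python
def context_sizes(turns):
    """Given (question_tokens, answer_tokens) pairs, return how many tokens
    the model must process when answering each question (toy model)."""
    history = 0
    sizes = []
    for question_tokens, answer_tokens in turns:
        # The new question is appended to everything said so far.
        prompt_tokens = history + question_tokens
        sizes.append(prompt_tokens)
        # After answering, both the question and the answer join the history.
        history = prompt_tokens + answer_tokens
    return sizes

# Hypothetical token counts: short questions, recipe-length answers.
turns = [(8, 300), (10, 300), (9, 320), (7, 310)]
for turn, size in enumerate(context_sizes(turns), start=1):
    print(f"turn {turn}: model processes {size} tokens")
```

Under these made-up numbers, a 7-token question on turn four still requires processing 954 tokens, which is the "short questions don't mean short context" point from the video.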

What does the new video illustrate?
In the new video, The Floofies tackle a Thanksgiving casserole crisis to explain what most people get wrong about LLM memory: the difference between context windows and attention mechanics.

What’s the real problem with “LLM memory”?
It isn’t a lack of memory – modern context windows can hold huge amounts of information. The real issue is attention dilution.

What is attention dilution?
When context gets long, attention is spread across every token, so important details buried in the middle receive less of it and can effectively get “lost.”
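One way to see dilution is through softmax, which turns attention scores into weights that sum to 1 and gives every token a positive share (the video's "every token needs some attention"). The sketch below is a deliberately simplified single-softmax toy, assuming one key token with a fixed higher score competing against identical lower-scored tokens; real transformers use many heads and layers with varying scores, and the name `key_detail_weight` and the score values are hypothetical.

```python
import math

def key_detail_weight(n_tokens, key_score=3.0, other_score=0.0):
    """Softmax attention weight on one key token that competes with
    n_tokens - 1 other tokens sharing a lower score (toy, single head)."""
    if n_tokens < 1:
        raise ValueError("need at least one token")
    key = math.exp(key_score)
    # Softmax gives every other token a positive share of the attention.
    others = (n_tokens - 1) * math.exp(other_score)
    return key / (key + others)

for n in (10, 300, 75_000):
    print(f"{n:>6} tokens -> weight on the key detail: {key_detail_weight(n):.4f}")
```

With the same score advantage, the weight on the key detail falls from roughly 0.69 at 10 tokens to well under 0.001 at 75,000 tokens, simply because more tokens share the total.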

Why do vague questions make it worse?
Vague follow-ups don’t give the model clues about what to focus on, so attention spreads thinly instead of locking onto the key details.
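The effect of clues can be sketched with a toy model in which each recipe version from the conversation carries a few descriptive tags and each matching clue raises that recipe's score before a softmax. This is an analogy, not how a transformer actually scores tokens: the `RECIPES` tags, the `focus` helper, and the `sharpness` value are all hypothetical.

```python
import math

# Hypothetical recipe versions from the conversation, each tagged with words
# that describe it. A real model sees tokens, not tidy keyword sets.
RECIPES = {
    "recipe 1: original": {"cinnamon", "sugar"},
    "recipe 2: low sugar": {"cinnamon", "less sugar"},
    "recipe 3: butter-rich": {"cinnamon", "butter-rich", "extra butter", "recipe three"},
    "recipe 4: vegan": {"cinnamon", "vegan", "instant pot"},
}

def focus(clues, sharpness=2.0):
    """Toy softmax 'attention' over recipes: each matching clue raises a
    recipe's score, and softmax turns scores into weights that sum to 1."""
    scores = {name: sharpness * len(tags & clues) for name, tags in RECIPES.items()}
    total = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / total for name, s in scores.items()}

vague = focus({"cinnamon"})
specific = focus({"cinnamon", "recipe three", "butter-rich", "extra butter"})
for name in RECIPES:
    print(f"{name:<24} vague={vague[name]:.2f} specific={specific[name]:.2f}")
```

A vague "cinnamon" question matches every version equally, so the weight splits evenly; adding distinct clues concentrates almost all of it on recipe three.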

What’s the fix?
Strategic prompting: structure questions with “search” and “list,” provide multiple clues, and restate critical information at the end, where attention is naturally strongest. Restating moves key details into a high-attention zone, and multiple clues create backup retrieval paths through the transformer.
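The three moves can be captured in a small prompt template. This is a minimal sketch assuming a plain-text chat interface; `strategic_followup` is a hypothetical helper that only formats a string and calls no model or Ask-Y API. The wording mirrors the improved question from the video.

```python
def strategic_followup(topic, clues, question):
    """Build a follow-up prompt in the 'search, list, clues, restate' shape.
    Purely string formatting; no model is called."""
    clue_text = ", ".join(clues)
    return (
        # "Search" and "list" ask the model to scan the whole conversation.
        f"Search our conversation and list all the {topic} versions we discussed. "
        # Several distinct clues give multiple ways to find the same item.
        f"Focus on the one matching these clues: {clue_text}. "
        # Restate the critical details last, where attention tends to be strongest.
        f"{question} (Target: {clue_text}.)"
    )

prompt = strategic_followup(
    "sweet potato casserole",
    ["recipe three", "butter-rich", "extra butter"],
    "What's the exact cinnamon amount in that version?",
)
print(prompt)
```

Keeping the clues in one list means the same identifiers appear both mid-prompt and at the very end, which is the restating step in template form.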

What should AI analysts take from this?
Long context windows are powerful, but attention mechanics determine what the model actually pulls out. Analytics workflows naturally produce long contexts, so you need increasingly strategic framing for follow-up questions as information accumulates.

How does Prism Prompting fit in?
If you’re using a generic LLM, you can learn the Prism Prompting Framework in the guide, or drop the article into your project files and prompt the model to use Prism – it’s structured for that purpose.
Try Prism, where specialized agents handle attention management for you.


Episode Transcript

(0:02) Attention, please! (0:04) Marcus has one job this Thanksgiving. (0:07) Bring the sweet potato casserole. (0:09) Easy, right? (0:10) Sure, if you are Martha Stewart. (0:13) So Marcus needs some help (0:15) and turns to the Floofies. (0:17) "Give me a sweet potato casserole recipe." (0:20) The Floofies' ears twitched. (0:22) The Floofies' ears can hear and remember (0:24) mountains of words. (0:26) When talking to the Floofies, (0:27) short questions don't mean short context. (0:30) And while context windows are long, (0:32) attention is limited. (0:34) This is attention dilution, (0:36) and it's why Ask-Y Prism manages context (0:38) automatically for digital analysts. (0:43) The portal converted Marcus' question (0:46) into small groups of letters called tokens (0:48) and sent them through to the Floofies. (0:50) Answer generation began. (0:52) All tokens cycled through the transformation layers, (0:56) out popped one new token, (0:57) then all tokens, including the new one, (0:59) cycled again, over and over until, boom, (1:03) a full recipe materialized. (1:06) Sweet potato casserole, 300 tokens. (1:10) But as Marcus read through the recipe (1:12) and decided it was too calorie-rich, (1:14) he asked for one with less sugar. (1:17) The Floofies' ears perked up, (1:19) hearing everything: the original 300 tokens (1:21) plus Marcus' new question, 10 tokens. (1:24) All 310 tokens cycled through the layers together (1:28) to generate the revised answer, (1:31) compiling recipes with less sugar. (1:34) Marcus got a good recipe option with less sugar, (1:37) but thought it might be too dry, (1:39) so he revised his search: (1:41) "Actually, sweet potato casserole using butter." (1:45) Now the first recipe, the low-sugar recipe, (1:49) and the new question all queued up, (1:52) traveling through the layers (1:54) to generate the butter-rich version. (1:56) Marcus ping-ponged through possibilities (1:58) like a caffeinated squirrel, (2:00) and each recipe sparked a new idea.
(2:03) More cinnamon. (2:05) Big marshmallows. (2:07) Small marshmallows. (2:09) Vegan Instant Pot sweet potato casserole. (2:11) Individual ramekins. (2:12) Mashed. Cubed. (2:14) Wait, are sweet potato fries at Thanksgiving a thing? (2:17) The Floofies were on a wild goose chase (2:20) and overflowing with possibilities. (2:23) Then Marcus realized the third recipe (2:25) was the one he would go with, (2:27) but wondered if he needed to buy more cinnamon. (2:31) "How much cinnamon is in the third recipe?" (2:34) The Floofies' ears heard Marcus' four little words perfectly (2:38) and looked at the whole casserole novel, (2:40) all 75,000 tokens. (2:44) But their eyes had a massive problem. (2:47) For the Floofies to work at maximum efficiency, (2:50) they need to focus on only the relevant tokens (2:53) to figure out which information matters. (2:56) That's hard because of the golden Floofy rule: (2:59) every token needs some attention. (3:02) So the Floofies were still looking at all 75,000 things (3:06) and were unable to focus. (3:08) This is called attention dilution. (3:11) "But I only had a short question." (3:14) And Marcus' short question? (3:15) It's not helping the Floofies' eyes focus one bit. (3:20) The Floofies squinted. (3:22) They spun. (3:23) Their eyes darted frantically across 75,000 tokens, (3:27) like someone searching for their keys (3:28) in a stadium full of identical bags. (3:32) "Um, we're seeing cinnamon mentions everywhere." (3:35) Each token is a small group of letters. (3:38) Think of them as leaves. (3:40) Marcus' short questions created 75,000 leaves (3:43) that we have to examine (3:44) to find the cinnamon amounts in the third recipe. (3:47) Marcus would get a better answer by asking: (3:50) "Search our conversation (3:51) and list all the sweet potato casserole versions we discussed, (3:53) specifically recipe three, the butter-rich one. (3:56) What's the exact cinnamon amount in that version?"
(3:58) This helps the Floofies organize the chaos before answering. (4:03) Search and list make their eyes scan comprehensively (4:07) through everything they heard. (4:09) Multiple clues. (4:10) Recipe three, butter-rich. (4:12) Extra butter. (4:13) They give the Floofies' eyes several different ways (4:15) to spot the same recipe among 75,000 competing tokens. (4:20) When the Floofies generate the summary, (4:23) those fresh tokens land at the end of the context window, (4:26) exactly where their eyes naturally focus strongest. (4:31) During the great training, (4:32) Floofies learned to focus on beginnings and endings, (4:35) where important information usually lives. (4:39) Recipe three's cinnamon measurement was lost in the middle, (4:43) where their eyes barely look. (4:46) Now it's repositioned to that high-attention end zone. (4:50) This repositioning solves another problem. (4:53) As tokens pass through the Floofies' internal processing layers (4:56) during generation, (4:57) weak attention in the middle lets one tablespoon (5:00) blur into vague spice. (5:02) Strong attention at the end (5:03) keeps one tablespoon precise through every layer. (5:06) The multiple clues create backup pathways, (5:10) so even if one detail blurs through the layers, (5:13) others stay sharp. (5:15) Marcus learned his lesson. (5:17) He restructured his question. (5:19) The Floofies' eyes lit up. (5:22) Recipe three, sweet potato casserole. (5:24) Butter-rich version. (5:26) Cinnamon. (5:27) One tablespoon. (5:28) Crystal clear. (5:30) Marcus made the casserole. (5:32) It was perfect. (5:33) Thanksgiving was saved. (5:35) The analysts' anti-casserole-disaster cheat sheet. (5:40) When your context window fills up faster (5:43) than your uncle's plate at the buffet, remember: (5:46) Floofy ears can hold everything, (5:48) but Floofy eyes need help focusing. (5:50) Quick fixes. (5:51) Structure your questions. (5:53) Use search, list, summarize. (5:56) Provide multiple clues.
(5:58) Give those eyes several ways to find what you need. (6:01) Restate critical info. (6:02) Move it to the end, where eyes naturally focus. (6:06) Or just use Ask-Y's PRISM platform, (6:09) where specially trained Floofies (6:11) handle all this attention chaos for you, (6:14) like having a Thanksgiving coordinator (6:16) who remembers everything without you (6:18) repeating yourself 75,000 times. (6:22) Just use Ask-Y, (6:23) because your analytics should be easier (6:25) than Thanksgiving dinner prep.

Katrin Ribant
November 20, 2025