mirror of
https://github.com/LukeHagar/website.git
synced 2025-12-06 04:22:07 +00:00
remove context window statement due to unclear official info
@@ -134,15 +134,13 @@ Notice that in the `div` with class `controls`, we have a `select` element for m
 | Phi-3.5-mini | 3.8 billion | ~2,400 MB | ~3,700 MB |
 | Llama-3.1-8B | 8.03 billion | ~4,900 MB | ~5,000 MB |
-All models run with a 4,096-token context window when used in WebLLM, due to browser implementation constraints.
-
 When you're deciding which of these models to use in a browser environment with WebLLM, think first about what kind of work you want it to handle.
 
 **SmolLM2-360M** is the smallest by a wide margin, which means it loads quickly and puts the least strain on your device. If you're writing short notes, rewriting text, or making quick coding helpers that run in a browser, this might be all you need.
 
 **Phi-3.5-mini** brings more parameters and more capacity for reasoning, even though it still runs entirely in your browser. It's good for handling multi-step explanations, short document summarisation, or answering questions about moderately long prompts. If you're looking for a balance between size and capability, Phi-3.5-mini occupies a comfortable middle ground.
 
-**Llama-3.1-8B** is the largest of the three and carries more of the general knowledge and pattern recognition that bigger models can offer. It's more reliable if you're dealing with open-ended dialogue, creative writing, or complex coding tasks. But you'll need more memory, and it still caps out at the same 4,096-token context window as the others when you're using it in WebLLM.
+**Llama-3.1-8B** is the largest of the three and carries more of the general knowledge and pattern recognition that bigger models can offer. It's more reliable if you're dealing with open-ended dialogue, creative writing, or complex coding tasks. But you'll need more memory.
 
 Each of these models trades off size, memory use, and output quality in different ways, so choosing the right one depends on what your hardware can handle and what kind of prompts you plan to work with. All can run directly in modern browsers with WebGPU support.
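
The selection logic described above can be sketched as a small helper that maps a rough memory budget to one of the three models, using the peak-memory figures from the table. This is a minimal illustration, not part of the post's code: the `pickModel` function is hypothetical, and the exact WebLLM model ID strings vary by release, so check the current prebuilt model list before using them.

```javascript
// Hypothetical helper: pick a model from the comparison table based on a
// rough memory budget in MB (thresholds taken from the table's peak figures).
// The ID strings follow WebLLM's MLC naming convention but may differ per release.
function pickModel(availableMB) {
  if (availableMB >= 5000) return "Llama-3.1-8B-Instruct-q4f16_1-MLC";
  if (availableMB >= 3700) return "Phi-3.5-mini-instruct-q4f16_1-MLC";
  return "SmolLM2-360M-Instruct-q4f16_1-MLC";
}

// In a real page, the chosen ID would then be handed to WebLLM, e.g.:
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine(pickModel(4000));
```

In practice you would drive the budget from what you know about the target device rather than a hard-coded number, but the cutoffs themselves come straight from the memory column of the table.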