
> Sure, but isn't that moving the goalposts? Why shouldn't we use LLMs + tools if it works?

Personally I do not see it like that at all: one is referring to LLMs specifically, while the other is referring to LLMs plus a bunch of other stuff around them.

It is like person A claiming that GIF files can be used to play Doom deathmatches; person B responding that, no, a GIF file cannot start a Doom deathmatch, it is fundamentally impossible to do so; and person A retorting that since the GIF format has a provision for advancing a frame on user input, a GIF viewer could interpret that input as the user wanting to launch Doom in deathmatch mode - ergo, GIF files can be used to play Doom deathmatches.

At the end of the day, LLM + tools is asking the LLM to create a story with very specific points where "tool calls" are parts of the story and "tool results" are like characters that provide context. The fact that they can output stories like that, with enough accuracy to make it worthwhile, is, IMO, proof that they can "do" whatever we say they can do. They can "do" math by creating a story where a character takes natural language and invokes a calculator, and another character provides the actual computation. Cool. It's still the LLM driving the interaction. It's still the LLM creating the story.

I think you have that last part backwards: it is not the LLM driving the interaction, it is the program that uses the LLM to generate the instructions that does the actual driving - that is the bit that makes the LLM start doing things. Though that is just splitting hairs.
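For concreteness, here is a minimal sketch (in Python) of what such a loop looks like. The TOOL_CALL/TOOL_RESULT markers and the stubbed llm_generate() are made up for illustration, not any particular framework's protocol; a real setup would call an actual model instead of returning canned responses.

    import json
    import re

    def llm_generate(transcript: str) -> str:
        # Stand-in for whatever produces the model's next chunk of text
        # (llama.cpp, an API call, ...); stubbed so the control flow is visible.
        if "TOOL_RESULT" not in transcript:
            return 'TOOL_CALL {"name": "calculator", "expression": "37 * 41"}'
        return "The answer is 1517."

    def calculator(expression: str) -> str:
        # The "tool": ordinary non-LLM code doing the actual arithmetic.
        return str(eval(expression, {"__builtins__": {}}))  # toy example only

    transcript = "User: what is 37 * 41?\n"
    while True:
        output = llm_generate(transcript)
        match = re.search(r'TOOL_CALL (\{.*\})', output)
        if not match:
            print(output)  # the final answer reaches the user
            break
        call = json.loads(match.group(1))
        result = calculator(call["expression"])
        # The surrounding program, not the model, runs the tool and feeds the
        # result back as more text for the model to continue its "story" from.
        transcript += output + "\nTOOL_RESULT " + result + "\n"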

The original point was about the capabilities of LLMs themselves, since the context was the technology itself, not what you can do by making them part of a larger system that combines LLMs (perhaps more than one) with other tools.

Depending on the use case and context this distinction may or may not matter; e.g. if you are trying to sell the entire system, how the individual parts of it work is probably no more important than which libraries you used to write the software.

However it can be important in other contexts, like evaluating the abilities of LLMs themselves.

For example, I have written a script on my PC that my window manager calls to grab whatever text I have selected in whatever application I'm running and pass it to a program I've written with llama.cpp, which loads Mistral Small with a prompt that makes it check for spelling and grammar mistakes and in turn produces some script-readable output that another script displays in a window.

This, in a way, is an entire system. This system helps me find grammar and spelling mistakes in the text I have selected when I'm writing documents where I care about finding such mistakes. However, it is not Mistral Small that has the functionality of finding grammar and spelling mistakes in my selected text; it only provides the text-checking part, and the rest is done by other, external, non-LLM pieces. An LLM cannot intercept keystrokes on my computer, it cannot grab my selected text, nor can it create a window on my desktop; it does not even understand these concepts. In a way this can be thought of as a limitation from the perspective of the end result I want, but I work around it with the other software I have attached to it.
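In script form, a pipeline like that looks roughly as follows. This is a sketch rather than my actual scripts: it assumes X11 with xclip available for reading the primary selection, a llama.cpp server already running locally on port 8080 exposing its OpenAI-compatible chat endpoint, and zenity for the result window; the prompt and output handling are simplified for illustration.

    import json
    import subprocess
    import urllib.request

    # 1. Grab whatever text is currently selected (a window manager hotkey
    #    would invoke this script).
    selected = subprocess.run(
        ["xclip", "-o", "-selection", "primary"],
        capture_output=True, text=True, check=True,
    ).stdout

    # 2. Ask the locally served model to check it; the model only ever sees
    #    text in, text out.
    payload = {
        "messages": [
            {"role": "system", "content":
             "List any spelling or grammar mistakes in the user's text, one per line."},
            {"role": "user", "content": selected},
        ]
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["choices"][0]["message"]["content"]

    # 3. Everything around the model call (reading the selection, showing the
    #    window) is plain non-LLM plumbing.
    subprocess.run(["zenity", "--info", "--no-markup", "--text", answer])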



