Watching the recorded session on the use of LLMs from the personal knowledge management course I am following this fall raised an interesting question.
Fellow participant H asked different models questions about a paper he had uploaded (and had also written, so he knows what’s in it). One question asked for a summary, the other was a highly targeted question about a specific fact in the paper.
He did so first in GPT4All, with both local and online models (ChatGPT etc.). The local models were Llama and Phi.
Here the local models summarised the paper ok but failed the specific question. The online models, in contrast, did succeed at the targeted question.
He then did the same in LM Studio, and with the same local models got a different result. Both local models now performed well on both the summary and the targeted question.
So same LLM, same uploaded paper, but a marked difference in output between GPT4All and LM Studio. What would make the difference? The tokenizer that processed the uploaded paper? Other reasons?
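One thing I can imagine making the difference is not the model itself but the step before it: both tools, I assume, chunk the uploaded paper and hand only the best-matching excerpts to the model, each with its own default settings. I haven’t verified that this is actually how GPT4All and LM Studio handle an uploaded document, but a minimal sketch of such a ‘retrieve then answer’ step shows how the same model can end up seeing different parts of the same paper. Everything below is illustrative: the function names, parameters, and the crude word-overlap scoring (real tools would use embeddings) are my own stand-ins, not the actual workings of either application.

```python
# Illustrative sketch of a retrieve-then-answer pipeline over an uploaded paper.
# All names, parameters, and the overlap scoring are hypothetical; the point is
# only that two apps wrapping the *same* local model can feed it different
# excerpts of the same paper, and so get different answers.

from collections import Counter


def chunk(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split the paper into fixed-size word windows with some overlap."""
    words = text.split()
    step = max(chunk_size - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def score(question: str, chunk_text: str) -> int:
    """Crude relevance score: how many question words appear in the chunk.
    Real tools would use embedding similarity; this merely stands in for it."""
    q_words = Counter(w.lower() for w in question.split())
    c_words = set(w.lower() for w in chunk_text.split())
    return sum(n for w, n in q_words.items() if w in c_words)


def build_prompt(paper: str, question: str,
                 chunk_size: int, overlap: int, top_k: int) -> str:
    """Retrieve the top_k best-scoring chunks and build the prompt the local
    model actually sees. If the chunk holding the specific fact doesn't make
    the cut, the model can't answer the targeted question, whichever LLM it is."""
    chunks = chunk(paper, chunk_size, overlap)
    best = sorted(chunks, key=lambda c: score(question, c), reverse=True)[:top_k]
    context = "\n---\n".join(best)
    return f"Answer using only the excerpts below.\n\n{context}\n\nQuestion: {question}"


# Two hypothetical configurations, as two desktop apps might ship by default:
# prompt_a = build_prompt(paper_text, "In which year was dataset X collected?",
#                         chunk_size=128, overlap=0, top_k=2)
# prompt_b = build_prompt(paper_text, "In which year was dataset X collected?",
#                         chunk_size=512, overlap=64, top_k=6)
```

If something like this is going on, then with settings like the first configuration the passage containing the specific fact may simply never reach the model, while larger chunks and more retrieved excerpts make it more likely that it does. A summary question is more forgiving, as almost any set of excerpts supports a passable summary, which would fit H’s pattern of results. Whether that, the tokenizer, or something else entirely explains it, I don’t know.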