LLMs in research activities (papers, grants, reports)
The genie is out of the bottle; let's get our wishes straightened out!
Looking back with the wisdom of the first quarter of another year, I am convinced that we will have to learn to live with LLMs (Large Language Models) and AI. While they are still problematic to use in an unsupervised fashion, most people derive sufficient value from them and will be using them increasingly to automate tasks, including scientific tasks. Given the luxury of executing repetitive tasks fast, and the illusion that the #LLMs did these tasks correctly, asking individuals to go back to 2019 is simply NOT going to happen. In this context, I feel it makes perfect sense to maximize transparency, and perhaps leverage the opportunity to up our scientific education game, by not only asking humans to declare the use of LLMs in their scientific activities but also to show HOW these products were used in a particular project. In a sense, LLMs have become (or will become) the "materials and methods" of years past, and we should treat them and their output as such.
From a practical perspective, declaring the use of LLMs could take the form of setting up repositories (#github / #zenodo, etc.) that store the prompts used and the output received from the chatbot (including the chatbot version and the gateway used to converse with it). For example, if one were to use a chatbot to develop the plan for a scientific report and/or its first draft, the prompts and the output should be made public as research methods & supplementary material. Having access to this information, a third party could deploy standard differencing tools, either automatically (e.g. as part of the review process) or manually (in the post-review phase), to show how the final product changed relative to the LLM output that was first received. Since most exchanges with AI chatbots are iterative, one could go one step further and include all iterations of prompting/response, as a form of "versioning" of the interaction. Even if one did not want to go down the fancy route, just contrasting the LLM output with the final product would go a long way toward allowing reviewers, and the public at large, to weigh the intellectual contribution of the human over the AI slop/hallucinations (or even AGI awesomeness😜, if you are part of this "cult").

Watching how the product evolved may also provide some educational opportunities for our learners, many of whom never went through their formative training without an LLM and are thus addicted to simply using AI slop as the final product of assignments/reports or internal drafts (*). Learners may benefit from seeing how these tools are best used in practice to automate the production of "research boilerplate" and maximize productivity, but without taking the human expert out of the equation. In case you are still not convinced by my thesis, here is the non-nuanced version: LLMs are non-linear statistical analyzers and pattern generators; treating them as yet another piece of software that can increase productivity, while maintaining transparency about the software's contribution to the final product, is just the 2025 version of github repositories and preprint archives.
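To make this concrete, here is a minimal sketch (in Python, purely illustrative) of what such a record could look like: each prompt/response pair is appended to a JSONL log together with the model version and gateway, and a standard differencing tool (difflib here) contrasts the last LLM response with the human-edited final draft. The file name, the model identifier, and the JSONL layout are my own assumptions, not any standard.

```python
import json
import difflib
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical file name; any repository layout would do.
LOG_FILE = Path("llm_interactions.jsonl")


def log_interaction(prompt: str, response: str, model: str, gateway: str) -> None:
    """Append one prompt/response pair, plus provenance metadata, to a JSONL log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,      # e.g. the chatbot version reported by the provider
        "gateway": gateway,  # e.g. web UI, API, or an institutional portal
        "prompt": prompt,
        "response": response,
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def diff_against_final(final_text: str) -> str:
    """Show how the human-edited final draft differs from the last logged LLM response."""
    with LOG_FILE.open(encoding="utf-8") as f:
        last = json.loads(f.readlines()[-1])
    return "\n".join(
        difflib.unified_diff(
            last["response"].splitlines(),
            final_text.splitlines(),
            fromfile="llm_draft",
            tofile="final_draft",
            lineterm="",
        )
    )


if __name__ == "__main__":
    log_interaction(
        prompt="Draft an outline for a report on X.",
        response="1. Background\n2. Methods\n3. Results",
        model="example-model-2025-01",
        gateway="example web UI",
    )
    print(diff_against_final(
        "1. Background and motivation\n2. Methods\n3. Results\n4. Limitations"
    ))
```

Checked into a #github or #zenodo repository alongside the manuscript, a log like this becomes the "materials and methods" of the interaction, and the diff is what a reviewer (or a learner) would actually read.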
Here is a nuanced hope: by recording our iterative interactions with LLMs, and sharing these interactions *publicly* so that the human insights may themselves be ingested, we may actually help solve the problem of models being trained on AI slop. Since we cannot fight the use of LLMs, let's at least try to maximize their potential to improve our work in science, in a manner that is compatible with the field's traditions of transparency and openness.
Disclaimer: I did not use any #LLMs to write this. I own the foolishness, naivete, and grammatical and syntactic errors.
(*) This is actually a major problem at my daughter’s middle school.