
Is ChatGPT more akin to a human librarian than to Google?

An expert in search engine technology discusses the potential benefits and drawbacks of entrusting ChatGPT and similar models with the task of conducting web searches on your behalf.

Before search engines became ubiquitous, people typically accessed and retrieved information through librarians or subject matter experts, who could offer personalized, transparent, and authoritative assistance. Today, that role has largely been taken over by search engines, which return results ranked by opaque algorithms in response to a few keywords, an arrangement that is far from ideal for most people.


The traditional query-in, results-out model of search engines is being disrupted by a new generation of AI-powered information access systems, such as Microsoft's ChatGPT-powered Bing, Google's Bard, and Meta's LLaMA. These systems can take full sentences and even paragraphs as input and return personalized responses in natural language.

Although the prospect of receiving customized and personable responses alongside the vast expanse of information on the internet may seem appealing, as a researcher who specializes in studying search and recommendation systems, I believe the reality is far more nuanced.

Artificial intelligence systems such as ChatGPT and Bard rely on large language models, which use machine learning to identify patterns in vast amounts of available text, such as Wikipedia and PubMed articles. From those patterns, the models can generate sentences, paragraphs, and even entire pages that correspond to a user's query. On March 14, 2023, OpenAI announced its latest iteration of this technology, GPT-4, which can process both text and image inputs, and Microsoft revealed that its conversational Bing is built on GPT-4.


Thanks to training on massive text corpora, fine-tuning, and other machine learning techniques, this method of information retrieval works remarkably well. Large language model-based systems generate personalized responses tailored to individual information queries. The results have been so impressive that ChatGPT amassed 100 million users in a third of the time it took TikTok to reach the same milestone. People have used these systems not only to find answers to questions but also to generate medical diagnoses, create personalized diet plans, and make investment recommendations.

ChatGPT’s opacity and AI ‘hallucinations’

Despite their effectiveness, large language model-based systems have several downsides. First and foremost, their output reflects statistical associations among words rather than genuine comprehension. What they produce is, in essence, sophisticated mimicry: sentences assembled from the patterns of words the AI has learned to associate with a given context, which can sound intelligent without being grounded in real understanding.

Because they generate text from learned patterns rather than verified facts, large language model systems are prone to "hallucinating", that is, producing plausible-sounding but false information. They also lack the ability to recognize and correct a flawed premise in a question. Asked which U.S. president's portrait appears on the $100 bill, for example, ChatGPT answers Benjamin Franklin, failing to notice both that Franklin was never president and that the premise itself is wrong, since the $100 bill does not carry a president's portrait at all.

The problem is that even if a large language model system is wrong only 10% of the time, you do not know which 10%. Nor do people have any quick way to verify the systems' answers, because these systems are not transparent: they do not disclose what data they were trained on, which sources they drew their information from, or how they generate their responses.

For example, you can ask ChatGPT to write a technical report with citations, but it often fabricates those citations, "hallucinating" both the titles of scholarly papers and the names of their authors. The systems also do not validate the accuracy of their own responses. Validation is therefore left to the user, who may lack the motivation, expertise, or even the awareness to check the system's outputs. And because ChatGPT does not actually know any facts, it cannot tell when a question is nonsensical.

AI stealing content – and traffic

The absence of transparency in large language model systems can be detrimental not only to users but also to the authors, artists, and other creators of the original content from which the systems learned. These systems do not disclose their sources or provide proper attribution, which is unfair to those creators. In many instances, creators are neither acknowledged nor compensated, nor even asked for their consent.

There is also an economic aspect to consider. In a typical search setting, results are presented along with links to their sources, which allows users to verify the answers, gives attribution to the sources, and drives traffic to those sites. Many of those sources rely on that traffic for revenue. Because large language model systems provide direct answers without disclosing the sources they drew on, those sites are likely to see their revenue streams shrink.

Large language models can take away learning and serendipity

Lastly, this novel approach to accessing information may have the unintended consequence of disempowering individuals and depriving them of opportunities to learn. In a typical search process, users can explore a range of possibilities for their information needs, which often leads them to refine their search criteria. This process enables users to gain a better understanding of what is available and how various pieces of information are interconnected to achieve their objectives. Moreover, it facilitates accidental discoveries and serendipitous learning experiences.

These are crucial aspects of the search process, but when a system provides results without revealing their sources or guiding the user through the search process, it denies them these valuable opportunities.

Large language models represent a significant advancement in information accessibility, granting individuals the ability to engage in natural language-based interactions, receive tailored responses, and uncover insights and patterns that may otherwise be difficult for the average user to identify. However, these models possess significant limitations due to the way they learn and generate responses, which can result in inaccurate, toxic, or biased answers.

Although other information access systems may also experience these issues, large language model AI systems are particularly problematic due to their lack of transparency. Moreover, their natural language responses can create a misleading sense of trust and authority that may pose a significant risk to uninformed users.


Chirag Shah, Professor of Information Science, University of Washington

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Source: Gizmodo
