
First-Year Foundations

This guide will take you through your FFC 100 information literacy session.

AI Literacy and Data Literacy

What is Artificial Intelligence? 

Whether you know it or not, artificial intelligence (AI) is already a large part of your daily life. Much of it is invisible to you, but it's working behind the scenes personalizing your video and music recommendations, customizing your social media feeds, analyzing your spending habits to detect fraud, and touching up your posts with photo filters. 

So, what is artificial intelligence? 

Broadly speaking, artificial intelligence refers to the development of computer systems that can perform tasks that would typically require human intelligence. These systems are designed to learn, reason, and make decisions based on large amounts of data. 

While there isn't a single definitive definition of AI, NASA uses the following set of definitions: 

  • Any artificial system that performs tasks under varying and unpredictable circumstances without significant human oversight, or that can learn from experience and improve performance when exposed to data sets. 
  • An artificial system developed in computer software, physical hardware, or other context that solves tasks requiring human-like perception, cognition, planning, learning, communication, or physical action. 
  • An artificial system designed to think or act like a human, including cognitive architectures and neural networks. 
  • A set of techniques, including machine learning, that is designed to approximate a cognitive task. 
  • An artificial system designed to act rationally, including an intelligent software agent or embodied robot that achieves goals using perception, planning, reasoning, learning, communicating, decision-making, and acting. 

Source: What is Artificial Intelligence? (n.d.). NASA. Retrieved July 21, 2025, from https://www.nasa.gov/what-is-artificial-intelligence/

One of the most important things to understand about AI is that the decisions made by AI are based on probability and statistics. So, no matter how advanced the system is or how much data was used to train a particular AI program, the decisions made by AI are not based on the same kind of nuanced and creative reasoning that a human can accomplish. Probability-based decisions are often correct, but not always--so there will be times when AI tools make errors or even spread misinformation. 


What is AI Literacy? 

Some fundamental abilities that are useful to all students in today's information environment include being able to: 

    1. Understand the basics of how AI works
    2. Use AI effectively and ethically
    3. Make informed decisions about using AI technologies

Source:  Hennig, Nicole. “AI Literacy: May 17 Webinar.” Nicole Hennig, March 16, 2023. https://nicolehennig.com/ai-literacy-may-17-webinar/.

So, why are we talking about AI literacy?

In the last couple of years, there have been significant advancements in AI that have impacted the daily lives of students--in particular, the release of tools that fall under the category of generative AI. Generative AI refers to a specific class of AI tools that are able to create new content; these include ChatGPT, Copilot, Gemini, Claude, and Midjourney, just to name a few. With such powerful new technology at our fingertips, it's no surprise that generative AI continues to transform how we find, use, and create information--in other words, our information literacy habits and abilities. 

Right now is a time when educators and students alike are figuring out how generative AI fits into and affects their learning. Although it is challenging, we are all figuring it out, just as those who came before wrestled with the invention of tools like Google, Wikipedia, and the Internet that similarly changed information behaviors when they were introduced. 

Our goal for AI literacy, then, is to provide guidance to students to help develop useful and ethical information behaviors, including how to contextualize information from AI and think critically about the use of AI in their work. Not only is this important now, as you're starting college, but it will remain important after you graduate, since AI skills are increasingly expected in the workplace.

 

What is a Large Language Model? 

The Generative AI chatbots that we are all familiar with--ChatGPT, Claude, Copilot, Gemini, and such--are powered by Large Language Models (LLMs). An LLM is a software system that has been designed to be capable of understanding and generating human language. Historically, when a software system has been developed so that it can imitate or replicate activities that we associate with human intelligence, activities like understanding, writing, and speaking a language, we label that system as an example of artificial intelligence.  

LLMs have been exposed--in a process called training--to vast amounts of textual data in order to understand and generate human-like responses. These advanced language models enable chatbots to carry on coherent and contextually relevant conversations.

LLMs are built using deep learning techniques and trained on enormous textual datasets so that they can excel in various language processing and understanding tasks. The data is processed through a neural network, specifically, a "transformer" architecture (note, the "T" in GPT stands for "transformer"). Here's a reasonably simple explanation of how it works: 

Specifically, a transformer can read vast amounts of text, spot patterns in how words and phrases relate to each other, and then make predictions about what words should come next. You may have heard LLMs being compared to supercharged autocorrect engines, and that's actually not too far off the mark: ChatGPT and Bard don't really “know” anything, but they are very good at figuring out which word follows another, which starts to look like real thought and creativity when it gets to an advanced enough stage.

Source: David Nield. “How ChatGPT and Other LLMs Work—and Where They Could Go Next.” Wired, April 30, 2023. https://www.wired.com/story/how-chatgpt-works-large-language-model/
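
To make that "supercharged autocomplete" idea concrete, here is a minimal, hypothetical Python sketch. It is not how a transformer actually works internally--a real LLM weighs enormous amounts of context using billions of learned parameters--but it shows the basic idea of predicting the next word from probabilities observed in training text.

```python
# Toy illustration (not a real LLM): count which words follow which in a tiny
# sample text, then "predict" the most likely next word from those counts.
from collections import Counter, defaultdict

sample_text = "the cat sat on the mat the cat ate the fish"
words = sample_text.split()

# Count how often each word follows each other word.
next_word_counts = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    next_word_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the sample text."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))   # -> 'cat' ("cat" follows "the" most often here)
```

A real LLM does something analogous at a vastly larger scale, considering the whole conversation so far rather than just the previous word.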

Generative AI Tools

ChatGPT, a generative AI-powered chatbot from OpenAI, is currently the most popular tool based on an LLM; many estimates put its share of the chatbot market at around 60%. We mentioned some other chatbots above, and there are other generative AI tools that go beyond conversational capabilities and focus more on search and research, like Perplexity, Consensus, and Elicit.

If you have ever used ChatGPT, or one of the other AI chatbots, you probably know that it has a relatively simple interface. You enter a question in natural language (known as a prompt), and it generates text responses that approximate human language. There are all kinds of applications for this technology, from creative endeavors to research assistance (you'll read more about the ethical use of generative AI tools on the next page of this module). 

So, are Generative AI chatbots search engines? 

Just two years ago, it was easy to state that ChatGPT was not a search engine. Traditionally, a search engine like Google would seek out existing texts (in the form of webpages) and return them as a set of results, while LLM-enabled chatbots, like Claude and ChatGPT, would generate brand-new text based on the probability of the most plausible response. Increasingly, however, companies like Microsoft and Google are expending enormous effort to combine generative AI LLMs with search engines. You can already see this in the results that a Google search provides. Chapman University makes Microsoft's Copilot available to all students; Copilot adds the capabilities of an LLM (currently OpenAI's GPT-4o and GPT-4.1 models) to Microsoft's existing search engine, Bing. Another example of integrating search capabilities with an LLM is the research-oriented tool Perplexity.

Now that you have some foundation for understanding what LLMs and generative AI are capable of, you're probably wondering how you can use these tools. Read on to the next page. 

 

What are some ethical ways college students can use generative AI tools, including ChatGPT, Claude, Copilot, Gemini, or Perplexity?

  • Generate ideas to get started with research or creative pursuits
  • Ask for keywords to simplify a search to get better search results
  • Ask for ways to expand a topic into new research directions
  • Ask for suggestions to improve your writing--such as grammar or tone--or to find weak spots in your research

All of these methods are useful in developing your research topic using a chat-enabled Large Language Model (LLM) or other generative AI tools. Notice that these tools can be ethically used in ways that support the entire research process. 

What are ways to use LLM-enabled chatbots or search engines that should be avoided? 

  • Asking the bot to write all or part of an essay
    • Not only is this unethical, but it probably won't be very good. The chatbot hasn't been in your class and doesn't know the context of what you're supposed to have learned throughout the semester.
  • Asking the bot to search for your sources
    • This isn't necessarily unethical, but it's not a recommended method. LLMs will often make up sources that look real but are actually non-existent. Increasingly, LLM chatbots--such as Copilot, Claude, or Perplexity--will point you to internet resources, but often they fail to return articles that are academic or scholarly. In many cases, their returned results are superficial or NOT as good as what you could find through human evaluation or through the use of library resources. Doing the research yourself will allow you to search creatively and will produce better results. Remember, LLMs do not explicitly contain facts. They are statistical models of how human languages work. Library research databases will lead you to facts.

Gray areas in the ethical use of generative AI: 

  • Simplify the language of a text in order to understand it better
    • While this is a powerful use of the technology to aid in learning, there may be copyright concerns if the bot incorporates copyrighted text into its knowledge base. There are some tools that do not include your uploaded materials or prompts in their training data. If you are interested in such tools, ask a librarian for guidance.

This page is mostly based on the following source:  Mollick, E. (2025, January 26). Against “Brain Damage” [Substack newsletter]. One Useful Thing. https://www.oneusefulthing.org/p/against-brain-damage  

Critical Thinking and Generative AI

Some people have argued that generative AI tools like ChatGPT make you dumber or damage your brain. The good news is that they don't really. But these tools do have the potential to negatively impact your learning, your creativity, and your critical thinking abilities.
According to Ethan Mollick (2025), an influential researcher on AI, the danger isn’t that AI makes you “dumber”; it’s that it offers a tempting shortcut that can cause you to skip the real thinking and learning that help you grow as a student and thinker.

A recent study from MIT's Media Lab found that when students use generative AI, it can short-circuit the mental effort that leads to deeper understanding. That means that even when you’re using generative AI honestly (not just to cheat but to help with your work), you still have to be careful that it doesn’t backfire by preventing you from really learning. 

 

The Key: Use AI as a Tool, Not a Shortcut in Learning

Here are some tips from Mollick (2025) to make sure you are at the center of the learning and thinking process: 

  • Treat AI like a tutor, not a homework machine.
    • You may want to use specialized prompts (like the one linked here) that guide the AI to help you learn rather than just giving answers.
  • Don’t start with AI. Instead, do your own thinking first.
    • Start your own brainstorming and write down your ideas. Then, turn to AI to build on them or push your thinking further. Some examples:
      • “Combine ideas #3 and #7 in an extreme way,” “Even more extreme,” “Give me 10 more ideas like #42,” “Use superheroes as inspiration to make the idea even more interesting” (Mollick, 2025). 

 

Writing Is Thinking

As Mollick points out, many writers believe that the act of writing helps to think through ideas. If you let AI handle your writing, you may never figure out what you truly believe or how much you understand about a topic. 
 
Try using the following strategy to maximize your thinking while using AI: 

  • First, write out a full first draft without any help from AI.
  • Then, give the draft to AI and ask it to act as a reader or editor. Examples: 
      • Point out unclear writing or ideas.
      •  Improve the tone or language for different audiences.
      • Suggest ways to strengthen your writing, like alternative endings, clearer wording, or ways to make an argument stronger. (Mollick, 2025)

 

What’s Really at Stake

Mollick puts this well: 

AI doesn't damage our brains, but unthinking use can damage our thinking. What's at stake isn't our neurons but our habits of mind. There is plenty of work worth automating or replacing with AI (we rarely mourn the math we do with calculators), but also a lot of work where our thinking is important. ... Our fear of AI “damaging our brains” is actually a fear of our own laziness. … Your brain is safe. Your thinking, however, is up to you. (Mollick, 2025)

 

 



What is Data Literacy?

In our research and our daily lives, we constantly interact with data, but we rarely reflect on how well we understand it.

In order to critically consume, produce, and think about data, we need a basic framework for understanding it. Data literacy is our ability to interpret and understand data. On this page, we’ll briefly focus on a few related key concepts and some basic terminology.

We rarely encounter raw data in our everyday lives. Instead, we see data represented as numbers and charts that are meant to tell a data story or provide evidence for a claim.  

A single data point, or datum, can be almost anything: a measurement, an observation, a response to a survey question. Usually, a single data point doesn’t tell us much. But a collection of data points, or data, has the potential for us to make larger observations or draw conclusions about the information that we collected.

When we collect similar or related data together, we have a data set.

To illustrate these terms, here’s an example: 

"Suppose you were out hiking and you accidentally fell and broke your arm. Luckily, your friends are there to take you to the emergency room and help you fill out the mountain of paperwork the nurse hands you. Each answer you scribble down gives the doctor data they can use to decide how to help heal your arm. All the information together makes up a data set on you and your medical history.

Now some questions, like “Age?” or “Height?” give the doctor quantitative data or data that’s represented by numbers, or quantities.

But not everything about us can be written as a number. Like the question: how the heck did you break your arm? Information that’s describing qualities something has or a category it belongs to is called qualitative data. …

And after you make it through all the paperwork, the doctor might add data in other ways. If they take an x-ray, this photo becomes a data point in your medical history. Or they might record an audio clip of your heartbeat or video of an ultrasound. Your file has a long story to tell."

Source: Arizona State University, “What are Data and Data Literacy.”

Reading Visual Data

Data is more than just numbers: it can include images, text, or audio. Interpreting charts, graphs, and visuals is as important as reading numbers.

Visual literacy is the ability to read, interpret, and critically evaluate visual information, including but not limited to data representations such as charts, graphs, and maps.

Understanding visual data is key to spotting patterns, identifying misleading visuals, and making informed decisions.
When viewing a chart or graph, here are some starting points you can consider:

Infographic: How to interpret visual data. It outlines three steps: (1) Evaluate the data representation: check the chart title, chart type, and axis labels. (2) Analyze the data patterns: look for trends, outliers, and missing or exaggerated data. (3) Consider the context and interpretation: examine design choices, data source credibility, and data collection methods.

We encounter data everywhere in our lives, yet we don't always possess the skills to confidently interpret the facts and figures we see in news stories. 

A common technique for understanding a dataset is to describe it with measures of central tendency. One of the most common measures is the mean, which is often informally referred to as the average. Averages are useful in reporting because they summarize a large amount of data into a single value, but keep in mind that the underlying data varies around that single value.

Knowing the definition of average and how it’s calculated allows you to understand that the number doesn’t reflect all items in a dataset. Understanding what is meant by average can help you to appreciate the importance of Data Literacy and will help you to comprehend the information presented to you in data journalism. 

This video is used with the permission of its creator, Genevieve Milliken, Data Services Librarian, NYU Health Sciences Library.

 

Transcript

Media outlets such as news channels, websites, and radio stations deliver massive amounts of information to the public every day. The sheer volume of it can be difficult to process.

Added to this is the reality that what is reported—by even the most reliable outlets—is the product of someone else’s analysis and interpretation of raw data.

So how can anyone hope to make sense of what a reported statistic means?

The answer is highly complex, but there are helpful places to start.

One way of making sense of it all is to understand the term average.

The average, or mean, is derived by adding together all the points in a series and dividing that by the total number of points.

If we begin with this series of unordered numbers, the average is 14.18. Notice that this number does not appear in the series but is derived from it.

In general, the average is the most common term used when talking about statistical data.

When a report states that the average household income is $85,000, it does not signify that the average person will earn $85,000 per year.

It means that when all the incomes in the collected sample are added together and then divided by the number of data points in the set, the number derived is $85,000.

This number can be skewed upward or downward.

If the sample used to calculate income only included college students making around $5,000 per year, the number will be low.

If, however, the sample includes Stephen Kaufer, CEO of TripAdvisor, who made around $48 million last year, the final number will skew upward—pretty dramatically.

This becomes clear if we return to our numbered list.

The inclusion of three large numbers pushes the average higher than it would have been without them.

In other words, a story about income that includes very large paychecks has the potential to lead us to conclusions that may not be representative of what an average person—whatever that means—earns per year.

The lesson here is that it is important to understand that the average and statistics may not represent any one person or situation, but instead the particular dataset chosen.
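
To make the transcript's income example concrete, here is a minimal Python sketch using made-up numbers. It shows how a single very large value can pull the mean far above what most people in the sample actually earn.

```python
# Hypothetical annual incomes for five college students (made-up numbers).
incomes = [5_000, 6_000, 5_500, 7_000, 4_500]

mean_without_outlier = sum(incomes) / len(incomes)
print(f"Mean without outlier: ${mean_without_outlier:,.0f}")   # $5,600

# Add one CEO-sized paycheck (also made up) and recompute the mean.
incomes.append(48_000_000)
mean_with_outlier = sum(incomes) / len(incomes)
print(f"Mean with outlier: ${mean_with_outlier:,.0f}")         # roughly $8,004,667
```

The mean jumps from a few thousand dollars to several million, even though five of the six people in the sample earn under $7,000, which is exactly the kind of skew the video describes.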

How Visuals Can Mislead

"If you torture the data long enough, it will confess to anything" - economist Ronald Coase

A common misconception is that data is purely objective. We see charts and naturally feel that the information we're seeing is truthful and persuasive. After all, it’s just numbers, right? But take caution! As Alberto Cairo says in his book, How Charts Lie: 

Politicians, marketers, and advertisers throw numbers and charts at us with no expectation of our delving into them: the average family will save $100 a month thanks to this tax cut; the unemployment rate is at 4.5%, a historic low, thanks to our stimulus package; 59% of Americans disapprove of the president's performance; 9 out of 10 dentists recommend our toothpaste; there is a 20% chance of rain today; eating more chocolate may help you win the Nobel Prize (Cairo, xi). 

Source: Cairo, A. (2019). How charts lie: Getting smarter about visual information (First edition). W. W. Norton & Company, Inc.

In fact, sometimes even real, accurate data can be used to deceive.

 

Deceptive Data Example 1: Amazon Sales
Take a look at this chart of country of origin for production of items for sale on Amazon. What assumption might someone make when looking at these percentages?

Chart: "Made in China, Sold on Amazon" -- share of items sold on Amazon, by country of origin. This horizontal bar chart shows the percentage of Amazon products sold that originate from different countries, based on a 2024 survey of 1,064 first- and third-party sellers, and is accompanied by a world map highlighting each country.

  • China: 71%
  • United States: 30%
  • India: 14%
  • Germany: 6%
  • Mexico: 5%
  • Japan: 5%
  • Vietnam: 5%
  • Other countries: 26%

Note: Percentages total over 100% because sellers could select multiple source countries. Source: Jungle Scout via ECDB, as published by Statista in 2024.

At first glance, it may appear that the percentages represent a breakdown of a whole, as is typical in bar charts. But the total exceeds 100% (71% + 30% + 14% + … = well over 150%).

If someone did not read or understand the footnote carefully, it might be wrongly assumed the percentages represent exclusive product sourcing from each country. In reality, survey respondents could select multiple countries, meaning many sellers source from more than one location.

A stacked or grouped bar chart would have been more effective in showing overlapping data selections and preventing misinterpretation. This example of a revised chart may be a more accurate data visualization:

This stacked bar chart shows the percentage of Amazon sellers who source products from each country. It distinguishes between sellers who source only from that country and those who source from it along with others. Countries are listed on the x-axis and percentages on the y-axis.
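
If you are curious how such a chart might be built, here is a minimal Python (matplotlib) sketch with entirely hypothetical numbers. The point is only to show how a stacked bar chart can separate "sources only from this country" from "sources from this country and others" so the overlap is visible.

```python
import matplotlib.pyplot as plt

# Hypothetical percentages, for illustration only (not the survey's actual data).
countries = ["China", "United States", "India", "Germany", "Mexico"]
only_this_country = [40, 10, 4, 2, 1]      # sellers sourcing only from this country
this_and_others = [31, 20, 10, 4, 4]       # sellers sourcing from this country plus others

fig, ax = plt.subplots()
ax.bar(countries, only_this_country, label="Sources only from this country")
ax.bar(countries, this_and_others, bottom=only_this_country,
       label="Sources from this country and others")
ax.set_ylabel("Percentage of sellers")
ax.set_title("Where Amazon sellers source products (hypothetical data)")
ax.legend()
plt.show()
```

Because each bar is split into two clearly labeled segments, a reader can see at a glance that the categories overlap rather than adding up to 100%.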

Deceptive Data Example 2: 2024 Presidential Election Maps

Here is another example of deceiving data visualizations:

This is a map of the 2024 Presidential Election results by land area of each county. Does it look like most of the country voted red or blue? 

Map: 2024 Presidential Election Results by County (total area of each color: red 84.0%, blue 16.0%). A U.S. map visualizing the 2024 presidential election results at the county level using red and blue circles, with each circle representing a county and sized by land area, not population. Red circles indicate counties won by the Republican candidate (Donald Trump); blue circles indicate counties won by the Democratic candidate (Kamala Harris). Most of the map is dominated by red circles, especially in rural regions, giving the impression of a large Republican victory in terms of geography. Alaska and Hawaii are shown with similar styling in the lower left. Source: engaging-data.com; county-level election data from The New York Times, last updated 11/20/2024.

The same dataset was used to plot another map, this time sizing each county's bubble according to its population, with the largest bubbles representing the most populated counties. Does this visualization change your interpretation of whether more people voted red or blue?

Map: 2024 Presidential Election Results by County (total area of each color: red 49.0%, blue 51.0%). The same U.S. map, but with each circle sized by county population rather than land area, so the visual represents how many people voted for each candidate. Red circles indicate counties won by Donald Trump; blue circles indicate counties won by Kamala Harris. The larger blue circles, especially in major urban areas, make the map appear more evenly balanced or leaning blue. The chart notes that 51.0% of the population voted blue and 49.0% voted red, though this may reflect rounding rather than actual vote totals. Alaska and Hawaii are included at the bottom left with the same circle sizing method. Source: engaging-data.com; county-level election data from The New York Times Election API, last updated 11/20/2024.

 

In the first map, it appears that most of the country voted red. The map is dominated by red circles, suggesting a landslide victory. But this map shows land, not people. Large rural counties with low population density take up more visual space, skewing our perception of voter distribution.

In the second version of the map, this time sized by population, the size of each bubble now reflects how many people live in each county. High-population urban areas (often voting blue) appear much larger.

Each map tells a different story:

  • The land-based map suggests a strong red majority.
  • The population-based map shows a much closer presidential race, and gives the impression of a blue advantage.

In actuality, the 2024 popular vote was very close, with Trump winning the election. Actual 2024 results:

  • Donald Trump (Republican = red): 77,302,580 votes (49.8%)
  • Kamala Harris (Democrat = blue): 75,017,613 votes (48.3%)

This illustrates how even accurate data can be presented in ways that shape very different interpretations, depending on what is emphasized.


So, what's the point? 

You don’t need to be a data scientist to protect yourself from misleading data. But you can build some healthy skepticism in order to be an informed information consumer. Just remember, instead of accepting charts and statistics right away, try to question where the data came from, how it might have been collected, and whether it is accurately depicted in the source you're seeing. 

Visuals are powerful tools, but they can be used for persuasion just as much as for truth.

How can I Improve My Data Literacy?

There are an enormous number of introductory tutorials, videos, and webinars available to you through a quick internet search. If you would like to dig deeper into the subject, SAGE Campus gives Chapman University students the ability to create a free account to take demo courses. Some of the many available courses are listed below:

  • See Numbers in Data
  • Statistical Significance
  • Introduction to Data Visualisation

Where can I Find Data?

Check out the Leatherby Libraries guide on Data Sets and Resources to find all sorts of data for your research!

Check your understanding of AI literacy and data literacy by completing the practice quiz below. 

You may also open the quiz in a new tab or window using this link: AI Literacy and Data Literacy - FFC Practice Quiz