
Chapter 10: Using AI Chatbots in Data Analysis

Working with raw data can be overwhelming, requiring significant effort to make sense of unstructured and disorganized information. At first glance, trends, concepts, and patterns are not immediately visible, which makes analysis a daunting task. Through data analysis, you can organize and structure your findings and transform scattered information into meaningful insights. This process allows you to extract key themes, validate conclusions, and generate reliable predictions.

AI has transformed data analysis into something faster and more approachable. What once required hours of manual effort can now be handled through AI-powered tools that automate the heavy lifting. This shift gives researchers space to focus on higher-order work such as interpreting findings and writing. In the past, analyzing qualitative data often meant investing in costly software and learning specialized techniques. Now, with AI, it is possible to carry out nuanced analysis with accuracy while sidestepping many of those technical hurdles.

Researchers are already putting these tools to work. Rasheed et al. (2024), for example, created a large language model–based system that uses multiple agents to manage different parts of the analysis. These agents sift through large sets of text, detect recurring patterns, and organize the information in clear categories. The result is a process that reduces manual workload and improves consistency. Their study showed that AI-driven analysis not only accelerates the pace of research but also expands its reach, making it easier to manage complex datasets while improving accuracy.

Expanding on this, Chew et al. (2023) introduced LLM-Assisted Content Analysis (LACA), which uses GPT-3.5 to automate deductive coding. Their study demonstrated that AI can achieve coding accuracy comparable to human researchers while also refining coding schemes, streamlining content analysis, and reducing manual effort.

Similarly, Törnberg (2023) examined how ChatGPT-4 identifies the political leanings of Twitter users and found that it performed better than both expert coders and crowd workers in terms of accuracy and consistency. The study shows that large language models can use context to make complex judgments, which in turn enables automated text analysis at a scale and depth of interpretation once thought possible only for human researchers.

Xiao et al. (2023) pushed AI’s role in qualitative research further by testing how GPT-3 supports deductive coding when paired with expert-developed codebooks. They found that GPT-3 reached fair to substantial agreement with human coders, which underscores its potential for large-scale qualitative analysis. They also tested how variations in prompt design influenced accuracy and stressed the value of structured AI-based coding frameworks.

Dai et al. (2023) extended this research with their LLM-in-the-loop framework for thematic analysis, integrating AI and human collaboration. Their approach used in-context learning (ICL) with GPT-3.5 to assist in code generation, refinement, and theme identification. Through two case studies: one analyzing music listening experiences and another on password manager usage, they demonstrated that AI-assisted thematic analysis achieves coding quality comparable to human-led approaches while significantly reducing time and labor. The study’s high inter-annotator agreement suggests that AI is a reliable tool for accelerating thematic analysis without sacrificing analytical rigor.

These studies demonstrate AI’s growing influence in research, particularly in qualitative data analysis. As LLMs continue to advance, their integration into research methodologies will become even more refined, offering new possibilities for automation, pattern recognition, and predictive modeling.

Your choice of data analysis tools will depend on your research approach (whether quantitative, qualitative, or mixed methods) as well as the type of data you are working with, such as text, audio, video, or visual materials. While this book does not cover research methodologies in depth, the following sections will introduce a selection of AI-powered tools designed to support different data collection and analysis techniques, helping you streamline your research process.

I. Tips for Using AI in Data Analysis

Before we jump into the different AI tools, we need to share some important tips with you. We always like to preface AI tools with a short section on best practices because having the right know-how is just as important as having the right tool. AI is incredibly powerful, but if you don’t approach it with the right mindset and strategies, you risk misinterpreting data, introducing biases, or even compromising the integrity of your research. Familiarizing yourself with these foundational tips will help you make the most of AI in data analysis.

1.1. Practice with publicly available datasets

If you’re new to AI-powered data analysis, the best way to get comfortable with it is through hands-on experimentation. Instead of diving straight into your research data, start by practicing with hypothetical or publicly available datasets. One easy way to do this is by asking ChatGPT, Claude, or Gemini to generate a fake dataset tailored to your research area. You can use a prompt like:

“Generate a hypothetical dataset on [your topic], formatted as a CSV file, with [X] number of entries and columns for [variables you want].”

The AI will produce a structured dataset, which you can download as a CSV file and then upload into an AI data analysis tool we share in this chapter. This allows you to explore how the tool processes data, test its features, and refine your workflow before applying it to real research data. Alternatively, you can use online repositories that provide publicly available datasets, such as Kaggle.com and Data.gov. These platforms host a wide range of datasets that can be used for AI-assisted analysis. However, always review their usage policies to ensure compliance before using any data.

Practicing with sample datasets helps you build confidence and fluency in AI-driven analysis. You’ll get a feel for how different AI tools work, learn what to expect in terms of insights and accuracy, and even compare multiple tools to decide which one best fits your research needs. Once you’ve gained familiarity, you’ll be in a much better position to start analyzing your actual research data.
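If you prefer to generate a practice dataset yourself rather than asking a chatbot, a few lines of Python can do it. The sketch below is purely illustrative: the column names, value ranges, and the filename `practice_dataset.csv` are all made-up assumptions you would adapt to your own topic.

```python
import csv
import random

random.seed(42)  # fixed seed so the fake data is reproducible

# Hypothetical columns for a practice dataset on study habits;
# every name and range here is invented for illustration.
rows = [
    {
        "participant_id": i,
        "hours_studied": round(random.uniform(0, 12), 1),
        "exam_score": random.randint(40, 100),
        "major": random.choice(["Biology", "History", "CS"]),
    }
    for i in range(1, 51)
]

# Write the rows to a CSV file you can upload to an AI analysis tool.
with open("practice_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)

print(f"Wrote {len(rows)} rows")
```

The resulting file behaves exactly like a dataset an AI chatbot would generate for you, so it works well for test-driving the tools described later in this chapter.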

1.2. Ensure AI understands your data

When you first upload your dataset to an AI tool, it’s important to verify that the system correctly understands your data before diving into deeper analysis. One effective way to do this is by asking the AI to generate a summary or extract key insights in a concise paragraph. For instance, if we were using Claude as our data analyst, we would upload our dataset and prompt it to summarize the data’s main patterns, trends, or anomalies. We’d use the same approach with ChatGPT and then compare their reports. This quick comparison helps ensure that the AI interprets the dataset correctly before proceeding.

Now, you don’t have to use multiple tools for this step; running the same analysis in two different AI models is optional. However, if you have the time and want an extra layer of verification, this approach can be useful, especially for smaller research projects. The goal here is not to analyze the data twice but simply to use AI tools for an initial summary check. If both models generate similar insights, you can confidently proceed with your preferred AI tool for further analysis. We find ChatGPT and Claude to be particularly effective for this task, but as you’ll see in this chapter, many other AI tools can assist in data analysis. The key takeaway is to confirm the AI’s understanding of your dataset early on, ensuring alignment between your research objectives and the AI’s interpretation before proceeding with deeper analysis.

1.3. Protect sensitive information

To leverage AI for data analysis, you’ll often need to upload your data in formats like CSV, Excel, or PDF to the AI platform of your choice. However, as we’ve discussed in the ethics chapter, sharing documents with AI tools comes with serious privacy considerations. Before uploading anything, you need to ensure your data won’t be used for model training, something not all AI platforms make transparent.

Fortunately, some tools offer opt-out options. For example, Claude does not use user inputs for training by default, while ChatGPT allows you to disable data retention in the settings. But no matter what privacy assurances an AI tool provides, you should always take your own precautions. When you upload data to AI, you’re essentially storing it in the cloud, on someone else’s server. Who owns that server? Who controls access to that data? These are valid questions to consider before handing over research materials, if you hand them over at all.

To minimize risk, take a few precautionary steps: remove any identifiable information, anonymize sensitive data, and eliminate anything that could compromise privacy, whether yours, your institution’s, or your research participants’. AI tools are powerful enough to analyze anonymized data effectively, so why take unnecessary risks? Ensuring privacy and data security should always be a priority when incorporating AI into your research workflow.
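Anonymization can often be scripted before anything leaves your machine. The minimal sketch below assumes a hypothetical survey table with invented `name`, `email`, and `response` columns: it drops the direct identifier entirely and replaces names with short, stable hashes so the same participant always maps to the same pseudonym.

```python
import hashlib

def pseudonymize(value: str) -> str:
    # Stable short hash: identical inputs always yield the same code,
    # so repeated participants stay linkable without exposing names.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:10]

# Hypothetical survey rows; the column names are made up for illustration.
rows = [
    {"name": "Jane Doe", "email": "jane@example.com", "response": "Agree"},
    {"name": "John Roe", "email": "john@example.com", "response": "Disagree"},
]

anonymized = []
for row in rows:
    clean = dict(row)
    clean.pop("email")                            # remove direct identifiers
    clean["name"] = pseudonymize(clean["name"])   # replace names with pseudonyms
    anonymized.append(clean)

print(anonymized)
```

Only after a pass like this would you upload the file to an AI tool; keep the original, identifiable version on secure local storage.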

1.4. Use effective prompt engineering

We’ve emphasized throughout this book that mastering the skill of prompt engineering is essential when working with AI tools. The quality of responses you get is only as good as the prompts you provide. If your prompt is vague, the AI’s output will be too. If your instructions are unclear, you might not get the insights you need.

We won’t go into the finer details of prompt engineering here, as we’ve covered it extensively in earlier chapters. If you’re interested in diving deeper into this skill, we’ve even written a book on the subject, ChatGPT for Teachers: Mastering the Skill of Prompt Engineering, where we break down practical strategies for crafting effective AI prompts.

That said, as a quick refresher, here are key principles to follow when writing prompts for AI-driven data analysis:

  • Be clear and concise: Avoid vague or overly broad prompts; be as specific as possible.
  • Provide context: Give the AI enough background information to generate accurate and relevant responses.
  • Give specific instructions: If you need a particular type of analysis, specify the method or format you expect.
  • Iterate and refine: If the first response isn’t quite right, tweak your prompt and try again. AI works best through an interactive process.
  • Always fact-check AI-generated results: AI can analyze and summarize data quickly, but mistakes happen. Verify findings against reliable sources before using them in your research.

At the end of this chapter, you’ll also find a dedicated section with practical, ready-to-use prompts tailored for data analysis. These will help you get the most out of AI tools while ensuring precision and reliability in your research.

1.5. Watch out for AI biases

We’ve explored the issue of AI bias in depth in the ethics chapter, but it’s important to mention it here as well. Bias in AI-generated outcomes is a real and persistent problem, and as a researcher, you should always be aware of it and take steps to critically evaluate AI-generated insights.

Part of the problem lies in the training data used to build AI models, something you, as a user, have no control over. These models are trained on vast datasets, many of which reflect historical biases and systemic prejudices (Vallor, 2024). As a result, AI can unknowingly recycle and reinforce these biases in its outputs. While AI developers work to mitigate these issues, no bias-mitigation technique is perfect.

This is especially important when working with social or demographic data, where biased AI interpretations can lead to misleading conclusions. The best approach is to remain critical and skeptical, always cross-checking AI-generated insights with multiple sources and applying your own analytical judgment before drawing conclusions. AI is a tool, not an authority; your expertise remains essential in ensuring fairness and accuracy in your research.

1.6. Combine AI with human expertise

This point ties directly to the issue of AI bias and reinforces an important truth: AI is a tool, not a replacement for human judgment. While AI can process vast amounts of data, detect patterns, and automate calculations far more efficiently than a human could, it does not understand the context, nuances, or implications of the data in the same way you do as a researcher.

AI models operate based on probabilities and patterns derived from training data, meaning their outputs, while often impressive, lack deep reasoning, critical thinking, and ethical considerations. AI can only work with the data it is given, but you have the ability to question, cross-check, and contextualize findings. This is especially important when working with research that requires subjective interpretation, ethical considerations, or qualitative insights.

The best approach is to use AI as a collaborator, not a decision-maker. Treat it as an assistant that helps you process, analyze, and organize data, but always apply your own expertise to validate results, interpret insights, and draw conclusions. The combination of AI’s efficiency and your critical thinking is what ensures high-quality, reliable research.

II. Data Analysis AI Tools

For all of us in the world of research, we know that data analysis can be a headache. Whether you’re dealing with mountains of numbers, messy survey responses, or unstructured research findings, making sense of it all takes time, effort, and patience. Manually cleaning datasets, running statistical tests, and trying to spot patterns can feel like an endless grind. But here’s the good news: AI can take a huge chunk of that workload off your plate.

Now, you might be wondering, Which AI tool should I use? That depends on your research needs. If you want a versatile, conversational AI assistant, ChatGPT and Claude are great places to start. They can analyze your data, clean it up, run basic statistics, and even generate visualizations, all through simple prompts. If you need specialized analytics tools, options like Julius AI, Power BI, and Tableau can help you dig deeper into complex datasets, offering advanced statistical modeling and real-time data dashboards.

The best part? You don’t need to be a coding expert to use these tools. AI is making data analysis more accessible than ever, allowing researchers at all levels to leverage powerful techniques without having to master programming languages or statistical software. In the next section, we share a collection of AI-powered tools that can help you with your data analysis. We start with AI chatbots.

1. ChatGPT

When it comes to data analysis, ChatGPT offers a good place to start. You can upload your data and have it process, clean, analyze, and visualize the dataset for you. The visuals it generates are interactive, meaning you can hover over charts to explore your data in a dynamic way. You can chat with it, ask questions, request changes, and get instant feedback. Let us walk you through some of the most useful things you can do with ChatGPT when analyzing data.

1.1. Cleaning and Structuring Your Data

One of the first things you’ll likely need to do is clean and structure your data, and ChatGPT makes this simple. If your dataset includes missing values, duplicates, or inconsistent formats, you can ask ChatGPT to find and fix these issues. It can remove empty cells, standardize formats, correct mistakes, or flag outliers.

For instance, if your dataset has job titles written in different ways (like “Data Analyst,” “data analyst,” or “Analyst, Data”), you can ask ChatGPT to clean them up so they all follow the same style. You can also merge different datasets, combine columns, or restructure tables to better suit your analysis.
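Under the hood, this kind of cleanup is straightforward string normalization, which ChatGPT typically performs via Python. As a rough sketch of the idea, the snippet below maps a few invented job-title variants to one canonical form; the mapping table and the fallback rule are illustrative assumptions, not an exhaustive solution.

```python
# Map known variants of a job title to one canonical form.
# These variants are illustrative, not exhaustive.
CANONICAL = {
    "data analyst": "Data Analyst",
    "analyst, data": "Data Analyst",
}

def standardize_title(title: str) -> str:
    key = title.strip().lower()
    # Fall back to simple title-casing for values not in the map.
    return CANONICAL.get(key, title.strip().title())

titles = ["Data Analyst", "data analyst", "Analyst, Data", "  senior engineer "]
cleaned = [standardize_title(t) for t in titles]
print(cleaned)
```

Seeing the logic spelled out like this also helps you sanity-check what an AI tool did to your data: you can ask ChatGPT to show the code it ran and confirm it matches your intent.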

1.2. Exploring and Summarizing Insights

Once your data is cleaned up, ChatGPT can help you explore it and pull out key insights. You can ask simple questions like, “What are the main trends in this dataset?” or “Show me a summary of the sales by region.” It can calculate descriptive statistics like averages, medians, minimum and maximum values, standard deviations, and more. If your data includes categories, ChatGPT can show you frequency tables highlighting which values appear most often.
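These summaries are plain descriptive statistics, and it is worth knowing what they look like in code so you can verify the AI’s numbers. Here is a minimal sketch using Python’s standard library on invented sample columns (`sales` and `regions` are made-up data for illustration):

```python
import statistics
from collections import Counter

# Hypothetical numeric and categorical columns from a cleaned dataset.
sales = [120, 95, 130, 95, 210, 180, 95]
regions = ["North", "South", "North", "East", "North", "South", "East"]

summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "min": min(sales),
    "max": max(sales),
    "stdev": statistics.stdev(sales),  # sample standard deviation
}
freq = Counter(regions)  # frequency table for a categorical column

print(summary)
print(freq.most_common())
```

Spot-checking even one or two of these values against the AI’s report is a quick way to catch misread columns or silent errors.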

1.3. Creating Visualizations

You can ask ChatGPT to create charts and graphs from your data without writing a single line of code. These include bar charts, pie charts, scatter plots, histograms, box plots, and heat maps; it can do them all. Even better, these charts are interactive. You can zoom in, hover over points to see exact values, and adjust the chart’s appearance just by asking. If a chart doesn’t look quite right, you can say things like “Change the colors” or “Add labels to the X-axis” and ChatGPT will update it instantly.

1.4. Running Statistical Analysis

Beyond basic summaries, ChatGPT can also help you run statistical tests and models. You can perform regression analysis, check correlations, run t-tests, conduct ANOVA, or even forecast future trends with time-series analysis. If you’re unsure which test to use, you can ask ChatGPT for advice based on your data and research question. Behind the scenes, it writes and runs Python code for these operations, but you don’t need to touch the code unless you want to.
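To demystify what happens behind the scenes, here is a hand-rolled sketch of one such test: Welch's t statistic for two independent samples, computed from first principles with the standard library. The two groups are invented example scores; a complete analysis would also compute degrees of freedom and a p-value, which a library such as SciPy handles for you.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    ma, mb = statistics.mean(sample_a), statistics.mean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se = math.sqrt(va / len(sample_a) + vb / len(sample_b))  # standard error of the difference
    return (ma - mb) / se

group_a = [82, 75, 90, 88, 79]  # hypothetical scores, condition A
group_b = [70, 68, 77, 73, 72]  # hypothetical scores, condition B
t = welch_t(group_a, group_b)
print(round(t, 2))
```

When ChatGPT reports a test result, you can always ask it to show the code it ran and compare the logic against a reference like this.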

1.5. Working with Interactive Tables

Another feature that makes data work easier is how ChatGPT handles tables. When you upload your data, you can view it in an interactive table format. You can scroll through your data, highlight specific cells, and ask ChatGPT to calculate things like averages or totals from your selection. This makes navigating large datasets much more user-friendly compared to static outputs from other tools.

1.6. Generating Reports

After you’ve done your analysis, you can even ask ChatGPT to help write your research report. It can draft summaries of your findings, structure your results into sections, and format the report for clarity and flow. And if you’re using GPT-4o’s Canvas Mode, you can edit the draft right inside ChatGPT, fine-tuning the language or reorganizing content with ease.

1.7. Limitations to Keep in Mind

Of course, ChatGPT isn’t perfect. It works best with small to medium-sized datasets, and upload limits apply: you can only upload up to 10 files at a time, and very large or complex datasets might require more advanced tools. You’ll also need to prompt carefully, since clear, specific instructions lead to better results. And while ChatGPT is powerful, it’s not immune to errors. It can sometimes generate incorrect or misleading outputs, so it’s always good practice to manually double-check. Still, despite these limits, the combination of interactivity, flexibility, and ease of use makes ChatGPT one of the most accessible tools you can use to explore and analyze your research data.

2. Claude

Claude AI is another powerful tool for data analysis. It works similarly to ChatGPT but has some notable differences. While both AI models allow you to process data, extract insights, and generate visualizations, Claude AI operates using JavaScript for data processing, whereas ChatGPT relies on Python and its extensive numerical libraries. However, this occurs entirely in the background; you don’t need any technical knowledge or coding experience to use Claude. It is designed to be user-friendly, allowing you to simply upload your data and interact with it through natural language prompts.

To use Claude’s data analysis features, you first need to enable Artifacts in your account. To do this, click on your initials in the lower-left corner, select Settings, and toggle on Artifacts under the Feature Preview section. Additionally, you must enable the Analysis Tool, which allows Claude to run code and analyze data. To activate it, go to Feature Preview in the same settings menu and switch on Analysis Tool. Once these features are enabled, you can upload CSV files, generate visualizations, and perform various analytical tasks directly within your Claude chat.

Like ChatGPT, Claude allows you to create interactive data visualizations directly within your chat. With a simple prompt like “Create a bar chart for this dataset,” Claude can generate an interactive visualization where you can hover over different data points for more details. This feature is useful if you need a quick way to explore your data without switching between different platforms. You can also edit, download, and share these visualizations which makes it easier to integrate them into your reports or presentations.

Claude also offers a built-in analysis tool, which enables basic computations, statistical summaries, and exploratory data analysis (EDA). You can upload CSV files and ask Claude to summarize trends, detect anomalies, or calculate key metrics such as mean, median, and standard deviation. However, compared to ChatGPT, Claude struggles with large datasets. While ChatGPT can handle files up to 512MB, Claude has a 30MB limit per file and can only process about 2,000 to 10,000 records at a time, even with its Pro version. If you’re working with big datasets, this limitation can be frustrating, especially when Claude refuses to process files that exceed its length constraints.

Despite its strengths, Claude’s data analysis tool still has major drawbacks. It has a tendency to hallucinate results when dealing with complex filtering or large datasets, sometimes generating incorrect insights or mislabeling data points. Additionally, its visualization capabilities, while useful, sometimes fail to display data labels correctly, which makes it difficult to interpret results. Unlike ChatGPT, which can dynamically adjust chart formatting, labels, and colors, Claude’s output is more rigid and prone to truncating important information.

If your research involves lightweight data analysis, quick visualizations, or automated workflows, Claude AI can be a useful addition to your toolkit. However, if you need to process large datasets, run complex statistical models, or perform in-depth regression analysis, ChatGPT’s Advanced Data Analysis is the stronger choice. The best approach? Test both tools with your own research data and see which one aligns better with your workflow.

3. Prompt Samples

Here is a sample collection of prompts we prepared with the help of ChatGPT and Claude. Through several iterations of prompting, we refined these examples to make them as practical and effective as possible for academic research. You can use them with ChatGPT, Claude, or any AI tool that supports data analysis. Think of them as guideposts; you can further tweak and adapt them to fit your specific research needs. We’ve divided them into two main sections: prompts for data analysis and prompts for data visualization.

  1. Exploring & Summarizing Your Data
  • “Give me an overview of this dataset, including the number of rows, columns, and missing values.”
  • “List all column names along with their data types. Identify which ones are numerical, categorical, or text-based.”
  • “Summarize the key trends in this dataset. What are the most frequently occurring values in categorical columns?”
  • “Calculate basic summary statistics for all numerical columns, including mean, median, mode, and standard deviation.”
  • “What percentage of the data is missing, and which columns have the most missing values?”
  2. Cleaning & Preprocessing Your Data
  • “Detect and remove duplicate rows in this dataset.”
  • “Fill missing values in numerical columns using the mean (or another imputation method).”
  • “Standardize date formats in this dataset so they are all consistent.”
  • “Identify inconsistencies in categorical variables and correct variations in spelling or capitalization.”
  • “Convert categorical variables into numerical values suitable for statistical analysis.”
  3. Detecting Patterns & Relationships
  • “Find correlations between numerical variables in this dataset. Display the strongest positive and negative correlations.”
  • “Generate a frequency distribution for [categorical column] and identify the most common categories.”
  • “Segment the dataset into meaningful clusters based on [column name] using a clustering algorithm.”
  • “Analyze seasonal or time-based trends in [time-series column]. Identify any patterns over time.”
  • “Perform a principal component analysis (PCA) to reduce dimensionality and highlight key patterns.”
  4. Statistical & Hypothesis Testing
  • “Conduct a t-test to compare the means of [two groups] in [column name].”
  • “Perform an ANOVA test to determine if there is a significant difference between multiple groups in [column name].”
  • “Run a chi-square test to check for independence between [categorical column A] and [categorical column B].”
  • “Conduct a regression analysis to determine the relationship between [independent variable] and [dependent variable].”
  • “Calculate confidence intervals for [numerical column] at a 95% confidence level.”
  5. Data Quality & Anomaly Detection
  • “Check for outliers in [column name] using the interquartile range (IQR) method.”
  • “Identify any errors, inconsistencies, or unrealistic values in the dataset.”
  • “Detect any data entry mistakes in numerical fields, such as negative values where they shouldn’t be.”
  • “Find and flag extreme values that may skew the results of the analysis.”
  • “Analyze data distributions to determine if any transformations (e.g., log transformation) are needed to normalize skewed data.”
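As a reference for the outlier prompt above, here is what the interquartile-range (IQR) method looks like when written out by hand; the response-time values are invented for illustration, and the 1.5 multiplier is the conventional Tukey cutoff. Comparing an AI tool's flagged outliers against a simple implementation like this is a useful verification step.

```python
import statistics

def iqr_outliers(values):
    """Flag values outside 1.5 * IQR of the quartiles (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical response times (seconds) with one obvious extreme value.
times = [12, 14, 13, 15, 14, 13, 95]
print(iqr_outliers(times))
```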

Conclusion

In this chapter, we explored the transformative role of AI chatbots in data analysis, particularly within academic research. We began by addressing the challenges of working with raw, unstructured data and explained how AI has made the analysis process more efficient and accessible. Rather than relying on time-consuming manual methods or expensive software, we showed how researchers can use AI-driven tools like ChatGPT and Claude to clean data, extract insights, and generate visualizations, often achieving accuracy comparable to that of human experts. We discussed several studies demonstrating how AI can perform complex tasks such as thematic coding and content analysis, revealing patterns and trends in large datasets with minimal human intervention.

We also provided an overview of how to use AI chatbots, especially ChatGPT and Claude, to analyze data, and concluded with a curated collection of practical prompts we developed for AI-assisted data analysis. These prompts cover a range of tasks, from exploring and cleaning data to conducting hypothesis testing and detecting anomalies, and can serve as useful templates to guide academic researchers in applying AI tools effectively in their research.

License


The AI Turn in Academic Research Copyright © 2025 by Johanathan Woodworth and Mohamed Kharbach is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.