Streamlit & PandasAI - Prompt-Driven Data Analysis
In this post, we will explore how to build a Streamlit application that uses PandasAI and lets users prompt for insights into a dataset. PandasAI is a small library that combines Pandas with OpenAI's language models for data analysis.
The application lets users upload a CSV file, enter a text-based prompt to query the dataset with PandasAI, and then view (or visualize) the results.
We'll walk through the code step-by-step, explaining each feature along the way.
The associated video for this post can be found below:
Objectives
In this post, we'll see how to:
- Build a Streamlit UI that lets users upload a file, enter a text-based prompt, and click a button, with a spinner displayed while requests are in flight
- Take a user prompt and get some insight into the uploaded data via PandasAI
- Set up PandasAI to integrate with OpenAI language models
Setup and Building a Streamlit UI
To start with, let's install the dependencies into a virtual environment with the following command:
pip install pandasai streamlit
After this, create a main.py file that will contain our Streamlit application code.
Let's import both streamlit and pandas into this file, and set a title for the application with the st.title() function.
import streamlit as st
import pandas as pd
st.title("Prompt-driven data analysis with PandasAI")
The goal here is to let users upload a file, so we need a file-upload widget. We can create one with the st.file_uploader() function:
uploaded_file = st.file_uploader("Upload a CSV file for analysis", type=['csv'])
This displays a file-upload widget on the page.
We restrict the allowed file type to 'csv' so that users can only upload CSV files. Until a file is uploaded, uploaded_file is None; afterwards, it holds the uploaded file object.
Once a file has been uploaded, we can convert the CSV data to a Pandas DataFrame inside an if-statement. Let's add that code now:
if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    st.write(df.head(3))
We use the pd.read_csv() function to create the DataFrame, and then we display the first 3 rows of the DataFrame using the df.head(3) call. This provides a preview of the uploaded data, written out to our UI with the st.write() function.
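Streamlit's uploaded file object behaves like an in-memory binary file, which is why it can be passed straight to pd.read_csv(). As a minimal sketch that runs outside Streamlit, the snippet below uses io.BytesIO as a stand-in for the upload; the CSV contents are made-up sample rows, not real data.

```python
import io

import pandas as pd

# Made-up sample rows standing in for an uploaded CSV (illustrative only).
csv_bytes = b"name,age,survived\nAlice,29,1\nBob,41,0\nCara,35,1\nDan,52,0\n"

# io.BytesIO plays the role of the object returned by st.file_uploader().
fake_upload = io.BytesIO(csv_bytes)

df = pd.read_csv(fake_upload)
preview = df.head(3)  # same preview call as in the app
print(preview)
```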
User Prompts
We can now ask our users for a text-based prompt. We'll pass the text to PandasAI, along with our DataFrame, to initiate a call to the OpenAI APIs and get back some results.
Let's add an st.text_area() widget to our page, along with an st.button() widget that lets users submit the prompt:
if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    st.write(df.head(3))
    # new code below...
    prompt = st.text_area("Enter your prompt:")
    # Generate output
    if st.button("Generate"):
        if prompt:
            st.write("Generating response...")
        else:
            st.warning("Please enter a prompt.")
When the user clicks the button, st.button() returns True, and the script then checks whether a prompt has been entered: if so, it displays a placeholder message; if not, it shows a warning.
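One detail worth noting: the check on prompt treats a prompt made up only of spaces or newlines as valid, since any non-empty string is truthy in Python. A minimal sketch of a stricter check follows; clean_prompt is a hypothetical helper introduced here for illustration, not part of Streamlit or PandasAI.

```python
from typing import Optional


def clean_prompt(raw: Optional[str]) -> Optional[str]:
    """Return the prompt with surrounding whitespace removed, or None if empty.

    Hypothetical helper for illustration; not part of Streamlit or PandasAI.
    """
    if raw is None:
        return None
    stripped = raw.strip()
    return stripped or None


# Whitespace-only input is truthy as a plain string, but rejected here.
print(bool("   \n"))
print(clean_prompt("   \n"))
print(clean_prompt("  How many passengers survived?  "))
```

In the app, the value returned by st.text_area() could be passed through a helper like this before the if-statement.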
Let's check how this UI looks:
After uploading the CSV file (of Titanic data, in this case), we get a preview of the DataFrame, and then a prompt that allows the user to type in their question/prompt. Below, we have the button.
Now, when we actually click the button and submit our prompt, we want to use PandasAI to get some results based on the prompt.
Let's add that now.
Submitting Prompt and DataFrame with PandasAI
To do this, we need an API key from OpenAI, which you can generate from your OpenAI account.
Once we have an API key, we need to use the key and create an OpenAI object, as below:
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
# create an LLM by instantiating OpenAI object, and passing API token
llm = OpenAI(api_token="sk-YOUR_KEY_HERE")
# create PandasAI object, passing the LLM
pandas_ai = PandasAI(llm)
Note: in production apps, avoid hard-coding the API token. Load it from the environment instead, for example with a library such as python-dotenv or python-decouple.
The OpenAI object is an interface for querying OpenAI's language models via their APIs.
We now have a PandasAI object. When the user enters a prompt and submits it with the button, we want to use that prompt to get some results.
Let's add some code to call the pandas_ai.run() method now:
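As a minimal sketch of keeping the key out of the source code, the snippet below reads it from an environment variable using only the standard library. The load_api_token helper and the OPENAI_API_KEY variable name are assumptions made for this example; python-dotenv or python-decouple would populate or read the same kind of setting from a .env file.

```python
import os


def load_api_token(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API token from an environment variable.

    The variable name is an assumption for this sketch; use whatever name
    your deployment defines.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {env_var} is not set.")
    return token
```

The result could then be passed as OpenAI(api_token=load_api_token()) in place of the hard-coded string.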
if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    st.write(df.head(3))
    prompt = st.text_area("Enter your prompt:")
    # Generate output
    if st.button("Generate"):
        if prompt:
            # call pandas_ai.run(), passing dataframe and prompt
            with st.spinner("Generating response..."):
                st.write(pandas_ai.run(df, prompt))
        else:
            st.warning("Please enter a prompt.")
Because the API requests to OpenAI can take some time, we have also added a Streamlit spinner widget with the st.spinner() function.
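Calls to an external API can also fail, for example because of an invalid key or a network problem. A minimal sketch of one way to guard the call follows; run_query is a hypothetical wrapper, and the two stand-in functions take the place of pandas_ai.run so that the example runs without an API key.

```python
from typing import Any, Callable, Tuple


def run_query(run_fn: Callable[[Any, str], Any], df: Any, prompt: str) -> Tuple[bool, Any]:
    """Call run_fn(df, prompt) and report success or failure.

    Hypothetical wrapper: returns (True, result) on success and
    (False, error message) if the call raises.
    """
    try:
        return True, run_fn(df, prompt)
    except Exception as exc:  # broad on purpose: the UI should not crash
        return False, f"Request failed: {exc}"


# Stand-ins for pandas_ai.run, used only to exercise the wrapper.
def fake_run_ok(df: Any, prompt: str) -> str:
    return f"answer to: {prompt}"


def fake_run_error(df: Any, prompt: str) -> str:
    raise ConnectionError("network unreachable")


print(run_query(fake_run_ok, None, "How many rows?"))
print(run_query(fake_run_error, None, "How many rows?"))
```

In the app, the success flag could decide between st.write() for the result and st.error() for the message.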
We can now ask questions of the uploaded data. For example, for the Titanic data, I can ask how many people survived, and how many died, as per the screenshot below:
The output is informative, and we can get information without writing any code.
Very useful for non-technical users!
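For comparison, answering the same question by hand takes a line or two of pandas. The sketch below uses a tiny made-up DataFrame rather than the real Titanic file, and assumes a Survived column coded as 1 (survived) and 0 (died), as in the commonly distributed Titanic CSV.

```python
import pandas as pd

# Made-up rows for illustration; not the real Titanic data.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Survived": [0, 1, 1, 0, 0],
})

# Count how many rows fall into each outcome.
counts = df["Survived"].value_counts()
survived = int(counts.get(1, 0))
died = int(counts.get(0, 0))
print(f"Survived: {survived}, died: {died}")
```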
We can also produce Matplotlib charts using PandasAI. Check out the video if you want more information on this!
Summary
In this post, we've seen how to build a small Streamlit UI that lets users upload a file, and then enter a prompt that can be used to interrogate the uploaded data.
The interrogation occurs via the PandasAI library, which takes both the Pandas DataFrame and the user prompt in order to send requests to OpenAI's language models and generate a response.
The response can be in different formats - text-based outputs, summary statistics, and Matplotlib charts, for example.