

Image by Author | Canva
According to the data science report by Anaconda, data scientists spend nearly 60% of their time cleaning and organizing data. These are routine, time-consuming tasks, which makes them ideal candidates for ChatGPT to take over.
In this article, we'll explore five routine tasks that ChatGPT can handle if you use the right prompts, including cleaning and organizing data. We'll use a real data project from Gett, a London black-taxi app similar to Uber, used in their recruitment process, to show how it works in practice.
Case Study: Analyzing Failed Ride Orders from Gett
In this data project, Gett asks you to investigate failed ride orders by analyzing key matching metrics to understand why some customers didn't successfully get a car.
Here is the data description.
Now, let's explore it by uploading the data to ChatGPT.
In the next five steps, we'll walk through the routine tasks that ChatGPT can handle in a data project. The steps are shown below.
Step 1: Data Exploration and Analysis
In data exploration, we use the same functions every time, like head, info, or describe.
When we ask ChatGPT, we'll include these key functions in the prompt. We'll also paste the project description and attach the dataset.
We'll use the prompt below. Just replace the text inside the square brackets with the project description. You can find the project description here:
Here is the data project description: [paste here ]
Perform basic EDA: show head, info, summary stats, missing values, and a correlation heatmap.
Here is the output.
As you can see, ChatGPT summarizes the dataset by highlighting key columns and missing values, and then creates a correlation heatmap to explore relationships.
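Under the hood, the code ChatGPT runs for this prompt looks roughly like the sketch below. The tiny inline CSV and its column names (m_order_eta, order_status_key, and so on) are only stand-ins for the real Gett orders file, which you would load with pd.read_csv instead.

```python
import io
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Toy stand-in for the Gett orders data; real file and columns may differ.
csv = io.StringIO(
    "order_datetime,m_order_eta,order_status_key,cancellations_time_in_seconds\n"
    "18:08:07,60.0,4,198.0\n"
    "20:57:32,,9,\n"
    "12:07:50,477.0,4,46.0\n"
)
df = pd.read_csv(csv)

print(df.head())        # first rows
df.info()               # dtypes and non-null counts
print(df.describe())    # summary statistics for numeric columns
print(df.isna().sum())  # missing values per column

# Correlation heatmap of the numeric columns
corr = df.select_dtypes("number").corr()
plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.xticks(range(len(corr)), corr.columns, rotation=45, ha="right")
plt.yticks(range(len(corr)), corr.columns)
plt.colorbar()
plt.tight_layout()
plt.savefig("correlation_heatmap.png")
```

Knowing these function names also makes your prompts more precise: you can ask for exactly the views you want instead of whatever ChatGPT picks by default.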
Step 2: Data Cleaning
Both datasets contain missing values.
Let's write a prompt to work on this.
Clean this dataset: identify and handle missing values appropriately (e.g., drop or impute based on context). Provide a summary of the cleaning steps.
Here is a summary of what ChatGPT did:
ChatGPT converted the date column, dropped invalid orders, and imputed the missing values in m_order_eta.
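In pandas, those three cleaning steps might look like the sketch below. The inline data is again a hypothetical stand-in; the median imputation is one reasonable choice, though ChatGPT may pick a different strategy depending on context.

```python
import io
import pandas as pd

# Toy stand-in for the orders data; real file and column names may differ.
csv = io.StringIO(
    "order_datetime,m_order_eta,order_status_key\n"
    "18:08:07,60.0,4\n"
    "20:57:32,,9\n"
    "bad-value,100.0,4\n"
    "12:07:50,477.0,4\n"
)
df = pd.read_csv(csv)

# 1. Convert the date column; unparseable entries become NaT.
df["order_datetime"] = pd.to_datetime(
    df["order_datetime"], format="%H:%M:%S", errors="coerce"
)

# 2. Drop invalid orders (rows whose datetime could not be parsed).
df = df.dropna(subset=["order_datetime"])

# 3. Impute missing ETAs with the median of the remaining rows.
df["m_order_eta"] = df["m_order_eta"].fillna(df["m_order_eta"].median())

print(df.isna().sum())  # no missing values remain in these columns
```

Asking ChatGPT to "show a summary of the cleaning steps" is what lets you verify choices like these instead of trusting them blindly.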
Step 3: Generate Visualizations
To get the most out of your data, it is important to visualize the right things. Instead of generating random plots, we can guide ChatGPT by providing a link to a source, which is known as Retrieval-Augmented Generation.
We'll use this article. Here is the prompt:
Before generating visualizations, read this article on choosing the right plots for different data types and distributions: [LINK]. Then, provide the most suitable visualizations for this dataset, explain why each was chosen, and produce the plots in this chat by running code on the dataset.
Here is the output.
We have six different graphs that we produced with ChatGPT.
For each one, you will see why that graph was chosen, the graph itself, and its explanation.
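Two of the most common choices, a histogram for a numerical distribution and a bar plot for categorical counts, can be sketched as below. The small DataFrame is illustrative only; the real plots run on the attached Gett dataset.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Illustrative stand-in for the cleaned orders data.
df = pd.DataFrame({
    "m_order_eta": [60, 120, 300, 477, 90, 240],
    "order_status_key": [4, 9, 4, 4, 9, 4],
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: distribution of a numerical column.
axes[0].hist(df["m_order_eta"], bins=5)
axes[0].set_title("ETA distribution")

# Bar plot: counts per category of a categorical column.
counts = df["order_status_key"].value_counts()
axes[1].bar(counts.index.astype(str), counts.values)
axes[1].set_title("Order status counts")

fig.tight_layout()
fig.savefig("eda_plots.png")
```

The point of grounding the prompt in an article is that ChatGPT then matches plot types to data types like this, rather than defaulting to whatever chart is easiest.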
Step 4: Make Your Dataset Ready for Machine Learning
Now that we have handled missing values and explored the dataset, the next step is to prepare it for machine learning. This involves steps like encoding categorical variables and scaling numerical features.
Here is our prompt.
Prepare this dataset for machine learning: encode categorical variables, scale numerical features, and return a clean DataFrame ready for modeling. Briefly explain each step.
Here is the output.
Now your features have been scaled and encoded, so your dataset is ready for a machine learning model.
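A minimal version of that preparation, assuming one-hot encoding and standard scaling (ChatGPT may choose other encoders or scalers), looks like this. The column names and string labels here are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical slice of the cleaned orders data.
df = pd.DataFrame({
    "m_order_eta": [60.0, 120.0, 300.0, 477.0],
    "is_driver_assigned_key": [1, 0, 1, 0],
    "cancel_reason": ["client", "system", "client", "system"],
})

# One-hot encode the categorical column.
encoded = pd.get_dummies(df, columns=["cancel_reason"])

# Scale the numerical feature to zero mean and unit variance.
scaler = StandardScaler()
encoded[["m_order_eta"]] = scaler.fit_transform(encoded[["m_order_eta"]])

print(encoded.head())
```

Asking ChatGPT to "briefly explain each step" is worth keeping in the prompt, since choices like one-hot versus label encoding affect the models you can apply next.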
Step 5: Applying a Machine Learning Model
Let's move on to machine learning modeling. We'll use the following prompt structure to apply a basic machine learning model.
Use this dataset to predict [target variable]. Apply [model type] and report machine learning evaluation metrics like [accuracy, precision, recall, F1-score]. Use only the 5 most relevant features and explain your modeling steps.
Let's update this prompt based on our project.
Use this dataset to predict order_status_key. Apply a multiclass classification model (e.g., Random Forest), and report evaluation metrics like accuracy, precision, recall, and F1-score. Use only the 5 most relevant features and explain your modeling steps.
Now, paste this into the ongoing conversation and review the output.
Here is the output.
As you can see, the model performed well. Perhaps too well?
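The modeling code behind such an answer typically follows the pattern below. Here the feature matrix is synthetic (the real run uses the five features ChatGPT selects from the Gett data), which is exactly why a suspiciously high score deserves a second look: if a feature leaks the target, metrics will look too good.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Synthetic stand-in for the prepared feature matrix; real features differ.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels derived from two features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Report the metrics named in the prompt.
acc = accuracy_score(y_test, pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_test, pred, average="weighted"
)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

If the reported metrics look near-perfect on real data, ask ChatGPT to check for leakage, for example a feature that is only set after the order outcome is known.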
Bonus: Gemini CLI
Gemini has released an open-source agent that you can interact with from your terminal. You can install it using the code below. (60 model requests per minute and 1,000 requests per day for free.)
Besides ChatGPT, you can also use Gemini CLI to handle routine data science tasks, such as cleaning and exploration, and even to build a dashboard that automates them.
The Gemini CLI provides a straightforward command-line interface and is available at no cost. Let's start by installing it using the code below.
sudo npm install -g @google/gemini-cli
After running the code above, open your terminal and run the following command to start building with it:
gemini
Once you run the commands above, you'll see the Gemini CLI as shown in the screenshot below.
Gemini CLI lets you run code, ask questions, and even build apps directly from your terminal. In this case, we'll use Gemini CLI to build a Streamlit app that automates everything we've done so far: EDA, cleaning, visualization, and modeling.
To build a Streamlit app, we'll use a prompt that covers all the steps. It's shown below.
Build a Streamlit app that automates EDA and data cleaning, creates automated data visualizations, prepares the dataset for machine learning, and applies a machine learning model after the user selects the target variable.
Step 1 – Basic EDA:
• Display .head(), .info(), and .describe()
• Show missing values per column
• Show a correlation heatmap of numerical features
Step 2 – Data Cleaning:
• Detect columns with missing values
• Handle missing data appropriately (drop or impute)
• Display a summary of cleaning actions taken
Step 3 – Auto Visualizations:
• Before plotting, use these visualization principles:
• Use histograms for numerical distributions
• Use bar plots for categorical distributions
• Use boxplots or violin plots to compare categories
• Use scatter plots for numerical relationships
• Use correlation heatmaps for multicollinearity
• Use line plots for time series (if applicable)
• Generate the most relevant plots for this dataset
• Explain why each plot was chosen
Step 4 – Machine Learning Preparation:
• Encode categorical variables
• Scale numerical features
• Return a clean DataFrame ready for modeling
Step 5 – Apply Machine Learning Model:
• Offer the target variable choice to the user.
• Apply multiple machine learning models.
• Report evaluation metrics.
Each step should be displayed in a different tab. Run the Streamlit app after you build it.
It will ask for permission when creating directories or running code in your terminal.
After a few approval steps like these, the Streamlit app will be ready, as shown below.
Now, let's test it.
Final Thoughts
In this article, we first used ChatGPT to handle routine tasks, such as data cleaning, exploration, and data visualization. Next, we went one step further by using it to prepare our dataset for machine learning and to apply machine learning models.
Finally, we used Gemini CLI to create a Streamlit dashboard that performs all of these steps with just a click.
To demonstrate all of this, we used a data project from Gett. Although AI is not yet perfectly reliable for every task, you can leverage it to handle routine work and save yourself a lot of time.
Nate Rosidi is a data scientist in product strategy. He is also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.