

Image by Ideogram
Most of my days as a data scientist look like this:
- Stakeholder: “Can you tell us how much we made in advertising revenue in the last month and how much of that came from search ads?”
- Me: “Run an SQL query to extract the data and hand it to them.”
- Stakeholder: “I see. What’s our revenue forecast for the next 3 years?”
- Me: “Consolidate data from multiple sources, speak to the finance team, and build a model that forecasts revenue.”
Tasks like the above are ad hoc requests from business stakeholders. They take around 3–5 hours to complete and are usually unrelated to the core project I’m working on.
When data-related questions like these come in, they often require me to push the deadlines of current projects or work extra hours to get the job done. And this is where AI comes in.
Once AI models like ChatGPT and Claude became available, the team’s efficiency improved, as did my ability to respond to ad hoc stakeholder requests. AI dramatically reduced the time I spent writing code, generating SQL queries, and even collaborating with different teams to get the information I needed. After AI code assistants like Cursor were integrated with our codebases, the efficiency gains went even further. Tasks like the ones I just described could now be completed twice as fast as before.
Recently, when MCP servers started gaining popularity, I thought to myself:
Can I build an MCP that automates these data science workflows further?
I spent two days building this MCP server, and in this article, I’ll break down:
- The results and how much time I’ve saved with my data science MCP
- Resources and reference materials used to create the MCP
- The basic setup, APIs, and services I integrated into my workflow
# Building a Data Science MCP
In case you don’t already know what an MCP is, it stands for Model Context Protocol and is a framework that lets you connect a large language model to external services.
This video is a great introduction to MCPs.
// The Core Problem
The problem I wanted to solve with my new data science MCP was:
How do I consolidate information that’s scattered across various sources and generate results that can be used directly by stakeholders and team members?
To accomplish this, I built an MCP with three components, as shown in the flowchart below:


Image by Author | Mermaid
// Component 1: Query Bank Integration
As a knowledge base for my MCP, I used my team’s query bank (which contained questions, a sample query to answer each question, and some context about the tables).
When a stakeholder asks me a question like this:
What percentage of advertising revenue came from search ads?
I no longer have to look through multiple tables and column names to generate a query. The MCP instead searches the query bank for a similar question. It then gains context about the relevant tables it should query and adapts the sample queries to my specific question. All I need to do is call the MCP server, paste in my stakeholder’s request, and I get a relevant query in a few minutes.
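As a rough illustration of the "search the query bank for a similar question" step, here is a minimal sketch that ranks entries by word overlap (Jaccard similarity). The entries, table names, and the `find_similar` helper are all hypothetical, and the real server may well use a different matching method, such as embeddings or the LLM itself.

```python
import re

# Hypothetical query bank entries: question, sample SQL, and table context.
QUERY_BANK = [
    {
        "question": "How much advertising revenue did we make last month?",
        "sql": "SELECT SUM(revenue) FROM ad_revenue WHERE month = '2025-01'",
        "context": "ad_revenue: one row per ad type per day",
    },
    {
        "question": "How many video ad impressions did we serve last quarter?",
        "sql": "SELECT SUM(impressions) FROM ad_impressions WHERE format = 'video'",
        "context": "ad_impressions: one row per ad format per day",
    },
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_similar(question: str, bank: list[dict]) -> dict:
    """Return the bank entry whose question has the highest Jaccard overlap."""
    asked = tokenize(question)

    def score(entry: dict) -> float:
        known = tokenize(entry["question"])
        union = asked | known
        return len(asked & known) / len(union) if union else 0.0

    return max(bank, key=score)


match = find_similar(
    "What percentage of advertising revenue came from search ads?", QUERY_BANK
)
print(match["sql"])      # sample query to adapt
print(match["context"])  # table context handed to the model
```

The matched entry's sample query and table context are what the model would then adapt to the new question.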
// Component 2: Google Drive Integration
Product documentation is usually stored in Google Drive, whether it’s a slide deck, document, or spreadsheet.
I connected my MCP server to the team’s Google Drive so it had access to all our documentation across dozens of projects. This helps it quickly extract data and answer questions like:
Can you tell us how much we made in advertising revenue in the last month?
I also indexed these documents to extract specific keywords and titles, so the MCP only has to match the query against the keyword list rather than reading hundreds of pages at once.
For example, if someone asks a question related to “mobile video ads,” the MCP will first search through the document index to identify the most relevant files before looking through them.
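The keyword index described above can be pictured as a small inverted index: each keyword maps to the files that mention it, and a query is answered by counting keyword hits per file. This is only a sketch under assumed data. The file names and keywords are made up, and the actual index format is not described in this article.

```python
import re
from collections import defaultdict

# Hypothetical file names and extracted keywords, standing in for indexed Drive files.
DOCUMENTS = {
    "mobile_video_ads_launch.gdoc": "mobile video ads launch plan",
    "search_ads_pricing.gslides": "search ads pricing review",
    "q3_video_campaign.gsheet": "Q3 video campaign supply and demand",
}


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of file names that mention it."""
    index = defaultdict(set)
    for name, keywords in documents.items():
        for word in re.findall(r"[a-z0-9]+", keywords.lower()):
            index[word].add(name)
    return dict(index)


def rank_files(query: str, index: dict[str, set[str]]) -> list[str]:
    """Rank files by how many query keywords they match, best first."""
    hits = defaultdict(int)
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        for name in index.get(word, ()):
            hits[name] += 1
    # Sort by descending hit count, then by name so ties are deterministic.
    return sorted(hits, key=lambda name: (-hits[name], name))


index = build_index(DOCUMENTS)
ranked = rank_files("mobile video ads", index)
print(ranked)
```

Only the top-ranked files would then be opened and read in full, which keeps the amount of text passed to the model small.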
// Component 3: Local Document Access
This is the simplest component of the MCP: a local folder that the MCP searches through. I add or remove files as needed, which lets me add my own context, information, and instructions on top of my team’s projects.
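A local folder search can be as small as the sketch below. It assumes plain-text and Markdown files and a simple case-insensitive substring match. The function name, file types, and demo files are all assumptions for illustration.

```python
import tempfile
from pathlib import Path


def search_local_docs(folder: str, keyword: str) -> list[str]:
    """Return sorted names of .txt/.md files under folder that contain keyword."""
    needle = keyword.lower()
    matches = []
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix in {".txt", ".md"}:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in text.lower():
                matches.append(path.name)
    return sorted(matches)


# Demo against a throwaway folder holding two hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "video_ads_notes.md").write_text("Q3 video ad demand notes", encoding="utf-8")
    Path(tmp, "todo.txt").write_text("book a meeting room", encoding="utf-8")
    found = search_local_docs(tmp, "video")

print(found)
```

Because the folder is read on every call, adding or removing a file changes what the search sees without any re-indexing step.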
# Summary: How My Data Science MCP Works
Here’s an example of how my MCP currently works to answer ad hoc data requests:
- A question comes in: “How many video ad impressions did we serve in Q3, and how much ad demand do we have relative to supply?”
- The document retrieval MCP searches our project folder for “Q3,” “video,” “ad,” “demand,” and “supply,” and finds relevant project documents
- It then retrieves specific details about the Q3 video ad campaign, its supply, and demand from team documents
- It searches the query bank for similar questions on ad serves
- It uses the context obtained from the documents and query bank to generate an SQL query about Q3’s video campaign
- Finally, the query is passed to a separate MCP that is connected to Presto SQL, where it is automatically executed
- I then gather the results, review them, and send them to my stakeholders
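To make the middle of this flow concrete, the sketch below shows one way the retrieved document snippets and query bank matches could be bundled into a single block of context for the model before it writes the SQL. The `build_sql_prompt` function, the snippets, and the table names are hypothetical. In practice the MCP client decides how tool results are passed to the model.

```python
def build_sql_prompt(question: str, doc_snippets: list[str], bank_entries: list[dict]) -> str:
    """Combine the question, document context, and sample queries into one prompt."""
    lines = ["Question:", question, "", "Context from team documents:"]
    lines += [f"- {snippet}" for snippet in doc_snippets]
    lines += ["", "Similar questions and sample queries:"]
    for entry in bank_entries:
        lines.append(f"- {entry['question']}")
        lines.append(f"  SQL: {entry['sql']}")
    lines += ["", "Write one Presto SQL query that answers the question."]
    return "\n".join(lines)


question = (
    "How many video ad impressions did we serve in Q3, "
    "and how much ad demand do we have relative to supply?"
)
# Hypothetical retrieval results from the document and query bank steps.
doc_snippets = ["The Q3 video campaign is tracked in the (hypothetical) ad_impressions table."]
bank_entries = [
    {
        "question": "How many ad impressions did we serve last quarter?",
        "sql": "SELECT SUM(impressions) FROM ad_impressions",
    }
]

prompt = build_sql_prompt(question, doc_snippets, bank_entries)
print(prompt)
```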
# Implementation Details
Here is how I implemented this MCP:
// Step 1: Cursor Installation
I used Cursor as my MCP client. You can install Cursor from this link. It is essentially an AI code editor that can access your codebase and use it to generate or modify code.
// Step 2: Google Drive Credentials
Almost all the documents used by this MCP (including the query bank) were stored in Google Drive.
To give your MCP access to Google Drive, Sheets, and Docs, you’ll need to set up API access:
- Go to the Google Cloud Console and create a new project.
- Enable the following APIs: Google Drive, Google Sheets, Google Docs.
- Create credentials (OAuth 2.0 client ID) and save them in a file called credentials.json.
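Before wiring the credentials file into a server, it can help to confirm that it has the expected shape. An OAuth client file downloaded from the Google Cloud Console normally has a top-level "installed" or "web" section containing a client ID and secret. The sketch below checks only that, using the standard library. The function name and sample values are made up, and it does not contact Google or verify that the credentials work.

```python
import json

REQUIRED_FIELDS = ("client_id", "client_secret")


def check_oauth_client_file(raw_json: str) -> str:
    """Return the client type ('installed' or 'web') if the content looks valid."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for client_type in ("installed", "web"):
        section = data.get(client_type)
        if isinstance(section, dict):
            missing = [field for field in REQUIRED_FIELDS if not section.get(field)]
            if missing:
                raise ValueError(f"{client_type} section is missing: {', '.join(missing)}")
            return client_type
    raise ValueError("expected a top-level 'installed' or 'web' section")


# Placeholder content; a real file would be read with Path("credentials.json").read_text().
sample = json.dumps({"installed": {"client_id": "example-id", "client_secret": "example-secret"}})
client_type = check_oauth_client_file(sample)
print(client_type)
```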
// Step 3: Set Up FastMCP
FastMCP is an open-source Python framework used to build MCP servers. I followed this tutorial to build my first MCP server using FastMCP.
(Note: This tutorial uses Claude Desktop as the MCP client, but the steps are applicable to Cursor or any AI code editor of your choice.)
With FastMCP, you can create the MCP server with Google integration (sample code snippet below):
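from fastmcp import FastMCP

mcp = FastMCP("team-data-assistant")

@mcp.tool()
def search_team_docs(query: str) -> str:
    """Search team documents in Google Drive."""
    # get_google_services() is your own helper (not shown) that returns
    # the authenticated Google API clients.
    drive_service, _ = get_google_services()
    # Your search logic here
    return f"Searching for: {query}"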
// Step 4: Configure the MCP
Once your MCP is built, you can configure it in Cursor. This can be done by navigating to Cursor’s Settings window → Features → Model Context Protocol. Here, you’ll see a section where you can add an MCP server. When you click on it, a file called mcp.json will open, where you can include the configuration for your new MCP server.
This is an example of what your configuration should look like:
{
  "mcpServers": {
    "team-data-assistant": {
      "command": "python",
      "args": ["path/to/team_data_server.py"],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "path/to/credentials.json"
      }
    }
  }
}
After saving your changes to the JSON file, you can enable this MCP and start using it within Cursor.
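If the server does not show up after saving, a malformed config file is a common cause. The sketch below is a small sanity check and not part of Cursor. It covers only the fields used in the example configuration above, and the `list_server_problems` name is made up for illustration.

```python
import json


def list_server_problems(config_text: str) -> list[str]:
    """Return human-readable problems found in an mcp.json-style config."""
    config = json.loads(config_text)  # raises ValueError on invalid JSON
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or not servers:
        return ["no servers defined under 'mcpServers'"]
    problems = []
    for name, spec in servers.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if not isinstance(spec.get("command"), str):
            problems.append(f"{name}: 'command' must be a string")
        if not isinstance(spec.get("args", []), list):
            problems.append(f"{name}: 'args' must be a list")
    return problems


example = """
{
  "mcpServers": {
    "team-data-assistant": {
      "command": "python",
      "args": ["path/to/team_data_server.py"],
      "env": {"GOOGLE_APPLICATION_CREDENTIALS": "path/to/credentials.json"}
    }
  }
}
"""
problems = list_server_problems(example)
print(problems or "config looks OK")
```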
# Final Thoughts
This MCP server was a simple side project I decided to build to save time on my personal data science workflows. It’s not groundbreaking, but this tool solves my immediate pain point: spending hours answering ad hoc data requests that take away from the core projects I’m working on. I believe that a tool like this merely scratches the surface of what’s possible with generative AI and represents a broader shift in how data science work gets done.
The traditional data science workflow is moving away from:
- Spending hours finding data
- Writing code
- Building models
The focus is shifting away from hands-on technical work, and data scientists are now expected to look at the bigger picture and solve business problems. In some cases, we’re expected to oversee product decisions and step in as a product or project manager.
As AI continues to evolve, I believe that the lines between technical roles will become blurred. What will remain relevant is the skill of understanding business context, asking the right questions, interpreting results, and communicating insights. If you are a data scientist (or an aspiring one), there is no question that AI will change the way you work.
You have two choices: you can either adopt AI tools and build solutions that shape this change for your team, or let others build them for you.
Natassha Selvaraj is a self-taught data scientist with a passion for writing. Natassha writes on everything data science-related, a true master of all data topics. You can connect with her on LinkedIn or check out her YouTube channel.