

Image by Author | Canva
A strong portfolio is often the difference between making it and breaking it. But what exactly makes a portfolio strong? Numerous challenging projects? Slick design? Impressive data visualization? Yes and no. While these are necessary elements of a great portfolio, they’re so obvious that everyone knows you can’t do without them.
However, many data scientists make mistakes when trying to go beyond that. As a result, they’re interviewing with portfolios that nominally have everything but are actually not that great.
# The Framework
Here’s the framework that will help you avoid common mistakes when building a great portfolio.
# The Mistakes
Let’s now talk about the portfolio-building mistakes and how to avoid them using that framework.
// Mistake #1: Building Projects You Don’t Care About
Many portfolios give the impression that the projects are there just to tick a box: Titanic survival, Iris dataset, MNIST digits. You know, the typical stuff. It’s not only that you’ll be drowned out by thousands of similar portfolios; it also shows a lack of originality and of interest in what you’re doing. These are the autopilot projects.
Fix: Start with domains that interest you, e.g., sports, finance, music. When the topic interests you, you’ll go deeper without even trying. If you’re a sports fan, you might analyze shot efficiency in the NBA or choose from these cool project ideas for practice. A music fan might model playlist recommendations.
// Mistake #2: Using Whatever Data Falls Into Your Lap
Candidates often grab the first clean CSV they can find. The problem is that real data science doesn’t work that way.
Fix: You need to demonstrate that you know how to find the right data, access it, and reshape it for the later modeling stages. In your projects, use APIs (e.g., Twitter/X API), open government datasets (e.g., data.gov), and web-scraped sources (e.g., Awesome Public Datasets on GitHub). Use as many data sources as you can, evaluate the data, merge the sources into one dataset, and prepare it for modeling.
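To make the merging step concrete, here’s a minimal sketch in Python with pandas. The two inputs are hypothetical stand-ins (made-up column names and values) for an API response and an open-data file; the point is only to show how keys are aligned across sources and how to check which rows actually matched.

```python
import pandas as pd

# Hypothetical stand-ins for two sources. In a real project these would come
# from an API response and a downloaded open-data file.
api_records = [
    {"date": "2024-01-01", "city": "Boston", "mentions": 120},
    {"date": "2024-01-02", "city": "Boston", "mentions": 95},
    {"date": "2024-01-03", "city": "Boston", "mentions": 143},
]
open_data = pd.DataFrame(
    {
        "Date": ["01/01/2024", "01/02/2024", "01/04/2024"],
        "City": ["boston", "boston", "boston"],
        "complaints": [14, 9, 11],
    }
)


def combine_sources(api_records, open_data):
    """Bring both sources onto shared keys (date, city) and merge them."""
    api_df = pd.DataFrame(api_records)
    api_df["date"] = pd.to_datetime(api_df["date"], format="%Y-%m-%d")

    # The second source uses different column names, date format, and casing.
    gov_df = open_data.rename(columns={"Date": "date", "City": "city"})
    gov_df["date"] = pd.to_datetime(gov_df["date"], format="%m/%d/%Y")
    gov_df["city"] = gov_df["city"].str.title()

    # An outer merge with indicator=True keeps unmatched rows visible
    # instead of silently dropping them.
    merged = api_df.merge(gov_df, on=["date", "city"], how="outer", indicator=True)
    return merged.sort_values("date").reset_index(drop=True)


merged = combine_sources(api_records, open_data)
coverage = merged["_merge"].value_counts().to_dict()
print(merged)
print(coverage)
```

The `_merge` column added by `indicator=True` shows how many rows came from only one source, which is worth reporting in the project write-up rather than hiding.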
// Mistake #3: Treating Projects Like Kaggle Competitions
Kaggle competitions focus on optimizing for a single metric. That’s great for practice but doesn’t cut it in the real world. Accuracy in itself isn’t a goal. You’ll need to make a trade-off between the technical aspects of your model and the actual business or social impact.
Fix: Even if you use popular datasets from Kaggle, always offer a different angle and frame the problem so it has business or social value. For example, don’t just classify fake vs. real news. Show which words, phrases, or topics drive misinformation. Another example: Don’t just predict churn. Show how a 10% reduction in churn could save $2M in annual revenue.
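The arithmetic behind a claim like that fits in a few lines. All inputs below (customer count, churn rate, revenue per customer) are hypothetical, chosen only so the result lands on the $2M figure from the example; a real project would take them from the company’s own numbers.

```python
def revenue_saved_by_churn_reduction(customers, annual_churn_rate,
                                     revenue_per_customer, relative_reduction):
    """Annual revenue retained if churn falls by `relative_reduction` (e.g. 0.10)."""
    churned_now = customers * annual_churn_rate
    customers_retained = churned_now * relative_reduction
    return customers_retained * revenue_per_customer


# Hypothetical business: 100,000 customers, 20% annual churn,
# $1,000 in revenue per customer per year.
saved = revenue_saved_by_churn_reduction(
    customers=100_000,
    annual_churn_rate=0.20,
    revenue_per_customer=1_000,
    relative_reduction=0.10,
)
print(f"Estimated revenue retained: ${saved:,.0f}")
```

Stating the inputs next to the headline number lets a reader see which assumption the savings depend on most.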
// Mistake #4: Showing Only Models, Not Workflows
Many projects read like a sequence of Jupyter notebooks: importing libraries, then preprocessing data, then fitting models, and here’s the accuracy. It’s incomplete and boring. What’s missing is a demonstration of how you handle the different stages of a project and why you make certain decisions.
Fix: Make them end-to-end projects. Show every stage, from data collection to deployment and everything in between. Explain why you made key choices, e.g., why you picked one model over another, or why you engineered a certain feature. Use tools like Streamlit, Flask, or Power BI dashboards so that others can use your work. All this will make your projects look like applied problem-solving (e.g., Arch Desai’s portfolio), not a code walkthrough (e.g., this one).
// Mistake #5: Ending With a Model, Not Action
Data scientists often stop at a technical stage, e.g., showing the accuracy score. OK, but what do you do with it? You have to remember that what matters is the model’s practical use. The model’s technical side is only one part of that, the other being business or social impact.
Fix: End the project with a recommendation of what to do. For example, “This model suggests prioritizing inspections in restaurants serving high-risk cuisines during winter.”
# Project Example: Forecasting City Energy Demand to Cut Costs
In this section, I’ll create a mock project walkthrough to show you how the framework can be used in practice.
Domain: The domain I picked is energy consumption and sustainability. Living in a big city made me aware of how cities worldwide struggle with high electricity demand during peak hours. Forecasting demand more accurately can help utilities balance the grid, reduce costs, and cut emissions.
Data: The main source could be the U.S. Energy Information Administration (EIA). In addition, I could use the NOAA Weather API (e.g., for temperature and humidity) and holiday/event calendars (for spikes in demand).
Framing the Problem: Instead of framing the problem as “Predict electricity demand over time,” I’ll frame it as “How much money could the city save if it shifted peak loads using better demand forecasts?” With that, I turn a technical forecasting problem into a resource allocation and cost-saving problem.
Building End-to-End: The project would include these stages.
- Data Cleaning: Handle missing hours, align timestamps, normalize weather variables.
- Feature Engineering:
  - Lag features: demand in previous hours/days
  - Weather features: temperature, humidity
  - Calendar features: weekday, holiday flag, major events
- Modeling:
- Deployment: For example, I could create a dashboard showing 24-hour forecast vs. actual demand and simulate “what if” scenarios, e.g., adjusting demand by shifting industrial loads.
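The cleaning and feature engineering stages above could be sketched roughly like this with pandas. The data is synthetic, and the column names, the three-hour interpolation limit, and the choice of z-score normalization are my assumptions for illustration, not a prescription.

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)


def build_features(demand, weather, holidays):
    """demand: hourly Series (may have gaps); weather: hourly DataFrame;
    holidays: DatetimeIndex of holiday dates."""
    # Data cleaning: put demand on a complete hourly grid so that missing
    # hours become explicit NaNs, then fill short gaps by interpolation.
    full_index = pd.date_range(demand.index.min(), demand.index.max(), freq=HOUR)
    df = pd.DataFrame({"demand": demand.reindex(full_index)})
    df["demand"] = df["demand"].interpolate(limit=3)

    # Align weather timestamps to the same grid and normalize each variable.
    df = df.join(weather.reindex(full_index).ffill())
    for col in weather.columns:
        df[col + "_z"] = (df[col] - df[col].mean()) / df[col].std()

    # Lag features: demand one hour and one day earlier.
    df["lag_1h"] = df["demand"].shift(1)
    df["lag_24h"] = df["demand"].shift(24)

    # Calendar features: weekday (Monday = 0) and a holiday flag.
    df["weekday"] = df.index.dayofweek
    df["is_holiday"] = df.index.normalize().isin(holidays)

    # Rows without a full lag history cannot be used for training.
    return df.dropna()


# Synthetic three-day example with one missing hour.
idx = pd.date_range("2024-01-01", periods=72, freq=HOUR)
demand = pd.Series(range(1000, 1072), index=idx, dtype="float64").drop(idx[30])
weather = pd.DataFrame(
    {
        "temperature": [float(i % 24) for i in range(72)],
        "humidity": [50.0 + (i % 10) for i in range(72)],
    },
    index=idx,
)
holidays = pd.to_datetime(["2024-01-01"])

features = build_features(demand, weather, holidays)
print(features.head())
```

Dropping the first 24 rows is the price of the 24-hour lag; with real data spanning months or years, that loss is negligible.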
Action: We won’t stop at “the forecast has low RMSE.” Instead, let’s give a recommendation that has business and social impact, e.g., “If the city incentivized large businesses to shift 5% of consumption away from peak hours (predicted by the model), it could save $3.5M annually in grid costs.”
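A recommendation like this needs a cost estimate behind it. Here’s a rough sketch of how one could be computed from a forecast. The prices, peak hours, and load values are made up for illustration, and it assumes the shifted energy is simply billed at the off-peak price instead; it does not reproduce the $3.5M figure above.

```python
def peak_shift_savings(forecast_mwh, peak_hours, peak_price, offpeak_price,
                       shift_fraction=0.05):
    """Estimate daily savings from moving a share of peak-hour load off-peak.

    forecast_mwh: 24 forecast values, one per hour of the day (MWh).
    peak_hours: set of hour indices (0-23) treated as peak.
    peak_price / offpeak_price: cost per MWh in each period (hypothetical).
    """
    peak_load = sum(load for hour, load in enumerate(forecast_mwh)
                    if hour in peak_hours)
    shifted_mwh = peak_load * shift_fraction
    return shifted_mwh * (peak_price - offpeak_price)


# Hypothetical 24-hour forecast: flat 100 MWh with an evening peak of 200 MWh.
forecast = [100.0] * 24
for hour in (17, 18, 19, 20):
    forecast[hour] = 200.0

daily = peak_shift_savings(forecast, peak_hours={17, 18, 19, 20},
                           peak_price=90.0, offpeak_price=40.0)
annual = daily * 365
print(f"Daily: ${daily:,.0f}, annual: ${annual:,.0f}")
```

In the real project, the forecast would come from the model and the prices from the utility’s tariff, so the recommendation stays traceable to its inputs.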
# Bonus: Resources
As a bonus, here are some suggestions on which platforms you can use for practice and where to find the data.
// Platforms for Practicing
// Open Data Sources
// APIs for Real-Time Data
# Conclusion
You probably noticed that none of the mistakes mentioned are technical. That’s not accidental; the biggest mistake is forgetting that a portfolio is a demonstration of how you solve problems.
Focus on these two aspects, demonstration and problem-solving, and your portfolio will finally start looking like proof that you can do the job.
Nate Rosidi is a data scientist who also works in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.