
print(df.groupby('year')['pop'].mean())
print(df.groupby('year')['gdpPercap'].mean())
So far, so good. But what if we want to group our data by more than one column? We can do this by passing the columns in lists:
print(df.groupby(['year', 'continent'])
      [['lifeExp', 'gdpPercap']].mean())
                  lifeExp     gdpPercap
year continent
1952 Africa     39.135500   1252.572466
     Americas   53.279840   4079.062552
     Asia       46.314394   5195.484004
     Europe     64.408500   5661.057435
     Oceania    69.255000  10298.085650
1957 Africa     41.266346   1385.236062
     Americas   55.960280   4616.043733
     Asia       49.318544   5787.732940
     Europe     66.703067   6963.012816
     Oceania    70.295000  11598.522455
1962 Africa     43.319442   1598.078825
     Americas   58.398760   4901.541870
     Asia       51.563223   5729.369625
     Europe     68.539233   8365.486814
     Oceania    71.085000  12696.452430
This .groupby() operation takes our data and groups it first by year, and then by continent. Then, it generates mean values from the life-expectancy and GDP columns. This way, you can create groups in your data and control the order in which they are presented and calculated.
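As a minimal sketch of the two-level grouping described above, the following uses a tiny made-up DataFrame (the `sample` name and its values are illustrative assumptions, not the real dataset) that borrows the article's column names. It shows that the result is indexed by a (year, continent) pair.

```python
import pandas as pd

# Tiny made-up sample that mimics the article's column names;
# the numbers are illustrative, not real dataset values.
sample = pd.DataFrame({
    "year": [1952, 1952, 1952, 1957, 1957],
    "continent": ["Africa", "Africa", "Asia", "Africa", "Asia"],
    "lifeExp": [40.0, 42.0, 46.0, 44.0, 50.0],
    "gdpPercap": [1000.0, 1200.0, 5000.0, 1400.0, 5600.0],
})

# Group by year first, then by continent, and average the two numeric columns.
grouped_means = sample.groupby(["year", "continent"])[["lifeExp", "gdpPercap"]].mean()
print(grouped_means)

# The result has a two-level index, so a (year, continent) pair selects one row.
africa_1952 = grouped_means.loc[(1952, "Africa"), "lifeExp"]
print(africa_1952)
```

Swapping the order of the keys in the list passed to .groupby() changes which level comes first in the resulting index.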
If you want to “flatten” the results into a single, incrementally indexed frame, you can use the .reset_index() method on the results:
gb = df.groupby(['year', 'continent'])[['lifeExp', 'gdpPercap']].mean()
flat = gb.reset_index()
print(flat.head())
   year continent    lifeExp     gdpPercap
0  1952    Africa  39.135500   1252.572466
1  1952  Americas  53.279840   4079.062552
2  1952      Asia  46.314394   5195.484004
3  1952    Europe  64.408500   5661.057435
4  1952   Oceania  69.255000  10298.085650
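A related option, shown here only as a hedged sketch, is to pass as_index=False to .groupby() so the grouping keys stay as ordinary columns from the start. The `sample` frame below is made up for illustration; the article itself uses .reset_index().

```python
import pandas as pd

# Made-up sample data using the article's column names (values are illustrative).
sample = pd.DataFrame({
    "year": [1952, 1952, 1952, 1957, 1957],
    "continent": ["Africa", "Africa", "Asia", "Africa", "Asia"],
    "lifeExp": [40.0, 42.0, 46.0, 44.0, 50.0],
    "gdpPercap": [1000.0, 1200.0, 5000.0, 1400.0, 5600.0],
})

cols = ["lifeExp", "gdpPercap"]

# Approach from the article: aggregate, then flatten the two-level index.
by_reset = sample.groupby(["year", "continent"])[cols].mean().reset_index()

# Alternative: ask groupby to keep the keys as regular columns.
by_flag = sample.groupby(["year", "continent"], as_index=False)[cols].mean()

print(by_reset)
print(by_flag)
```

Both frames carry the grouping keys as regular columns with a plain 0-based index, which is usually what you want before exporting or merging.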
Grouped frequency counts
Something else we often do with data is compute frequencies. The nunique and value_counts methods can be used to get the unique values in a series and their frequencies. For instance, here’s how to find out how many countries we have in each continent:
print(df.groupby('continent')['country'].nunique())
continent
Africa      52
Americas    25
Asia        33
Europe      30
Oceania      2
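The example above uses nunique; value_counts answers a slightly different question, namely how many rows carry each value. A small sketch on made-up data (the `sample` frame and its country rows are illustrative assumptions) contrasts the two:

```python
import pandas as pd

# Made-up rows: one row per (country, year) observation, as in the article's dataset.
sample = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Ghana", "Japan", "Japan", "Japan"],
    "continent": ["Africa", "Africa", "Africa", "Asia", "Asia", "Asia"],
    "year": [1952, 1957, 1952, 1952, 1957, 1962],
})

# Distinct countries per continent (repeated rows for a country count once).
unique_countries = sample.groupby("continent")["country"].nunique()
print(unique_countries)

# Number of rows per continent (every observation counts).
rows_per_continent = sample["continent"].value_counts()
print(rows_per_continent)

# Number of rows per country within each continent.
rows_per_country = sample.groupby("continent")["country"].value_counts()
print(rows_per_country)
```

Here nunique counts Kenya once even though it appears in two rows, while value_counts reports both rows.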
Basic plotting with Pandas and Matplotlib
Most of the time, when you want to visualize data, you’ll use another library such as Matplotlib to generate those graphics. However, you can use Matplotlib directly (along with some other plotting libraries) to generate visualizations from within Pandas.
To use the simple Matplotlib extension for Pandas, first make sure you’ve installed Matplotlib with pip install matplotlib.
Now let’s look at the yearly life expectancies for the world population again:
global_yearly_life_expectancy = df.groupby('year')['lifeExp'].mean()
print(global_yearly_life_expectancy)
year
1952    49.057620
1957    51.507401
1962    53.609249
1967    55.678290
1972    57.647386
1977    59.570157
1982    61.533197
1987    63.212613
1992    64.160338
1997    65.014676
2002    65.694923
2007    67.007423
Name: lifeExp, dtype: float64
To create a basic plot from this, use:
import matplotlib.pyplot as plt
global_yearly_life_expectancy = df.groupby('year')['lifeExp'].mean()
# .plot() draws a line chart on the current Matplotlib figure
c = global_yearly_life_expectancy.plot().get_figure()
# save the current figure to disk
plt.savefig("output.png")
The plot will be saved to a file in the current working directory as output.png. The axes and other labeling on the plot can all be set manually, but for quick exports this method works fine.
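As one possible way to set the labeling manually, the sketch below rebuilds the first three yearly means from the output above (rounded) as a standalone Series, then sets a title and axis labels on the Axes object that .plot() returns. The `life_exp` name, the label text, the output filename, and the choice of the non-interactive "Agg" backend are assumptions made for this example.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# First three yearly means from the article's output, rounded to two decimals.
life_exp = pd.Series(
    [49.06, 51.51, 53.61],
    index=pd.Index([1952, 1957, 1962], name="year"),
    name="lifeExp",
)

# Series.plot() returns a Matplotlib Axes, which exposes the labeling methods.
ax = life_exp.plot()
ax.set_title("Global mean life expectancy")
ax.set_xlabel("Year")
ax.set_ylabel("Life expectancy (years)")

# Save through the figure that owns the axes, then release it.
out_path = Path("labeled_output.png")  # hypothetical filename
fig = ax.get_figure()
fig.savefig(out_path)
plt.close(fig)
```

Saving through the figure object rather than plt.savefig makes it explicit which figure is written when more than one is open.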
Conclusion
Python and Pandas offer many features you can’t get from spreadsheets. For one, they let you automate your work with data and make the results reproducible. Rather than write spreadsheet macros, which are clunky and limited, you can use Pandas to analyze, segment, and transform data, and use Python’s expressive power and package ecosystem (for instance, for graphing or rendering data to other formats) to do even more than you could with Pandas alone.

