点击排行

您现在的位置：首页 > 技术文档 > Python编程技巧

用Python实现数据的透视表的方法

来源：中文源码网浏览：689 次日期：2024-04-27 02:39:32

用Python实现数据的透视表的方法
在处理数据时，经常需要对数据分组计算均值或者计数，在Microsoft Excel中，可以通过透视表轻易实现简单的分组运算。而对于更加复杂的分组运算，Python中pandas包可以帮助我们实现。
1 数据
首先引入几个重要的包：
import pandas as pd
import numpy as np
from pandas import DataFrame,Series
通过代码构造数据集：
data=DataFrame({'key1':['a','b','c','a','c','a','b','a','c','a','b','c'],'key2':['one','two','three','two','one','one','three','one','two','three','one','two'],'num1':np.random.rand(12),'num2':np.random.randn(12)})
得到数据集如下：
data
key1 key2 num1 num2
0 a one 0.268705 0.084091
1 b two 0.876707 0.217794
2 c three 0.229999 0.574402
3 a two 0.707990 -1.444415
4 c one 0.786064 0.343244
5 a one 0.587273 1.212391
6 b three 0.927396 1.505372
7 a one 0.295271 -0.497633
8 c two 0.292721 0.098814
9 a three 0.369788 -1.157426
2 交叉表―分类计数
按照不同类进行计数统计是最常见透视功能，可以通
（1）crosstab
#函数：
crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, dropna=True, normalize=False)
crosstab的index和columns是必须要指定复制的参数：
pd.crosstab(data.key1,data.key2)
结果如下：
key2 one three two
key1
a 3 1 1
b 0 1 1
c 1 1 1
想要在边框处增加汇总项可以指定margin的值为True：
pd.crosstab(data.key1,data.key2,margins=True)
结果：
key2 one three two All
key1
a 3 1 1 5
b 1 1 1 3
c 1 1 2 4
All 5 3 4 12
（2）pivot_table
函数：
pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
使用pivot_table函数同样可以实现，运算函数默认值aggfunc='mean',指定为aggfunc='count'即可：
data.pivot_table('num1',index='key1',columns='key2',aggfunc='count')
结果相同：
key2 one three two
key1
a 3 1 1
b 1 1 1
c 1 1 2
（3）groupby
通过groupby相对来说会更加复杂，首先需要对data按照key1和key2进行聚类，然后进行count运算，再将key2的index重塑为columns：
data.groupby(['key1','key2'])['num1'].count().unstack()
结果：
key2 one three two
key1
a 3 1 1
b 1 1 1
c 1 1 2
3 其它透视表运算
（1）pivot_table
pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
要进行何种运算，只需要指定aggfunc即可。
默认计算均值：
data.pivot_table(index='key1',columns='key2')
out：
num1 num2
key2 one three two one three two
key1
a 0.193332 0.705657 0.203155 -0.165749 2.398164 -1.293595
b 0.167947 0.204545 0.661460 0.555850 -0.522528 0.143530
c 0.496993 0.033673 0.206028 -0.115093 0.024650 0.077726
分类汇总呢并求和：
data.pivot_table(index='key1',columns='key2',aggfunc='sum')
结果：
num1 num2
key2 one three two one three two
key1
a 0.579996 0.705657 0.203155 -0.497246 2.398164 -1.293595
b 0.167947 0.204545 0.661460 0.555850 -0.522528 0.143530
c 0.496993 0.033673 0.412055 -0.115093 0.024650 0.155452
也可以使用其它自定义函数：
#定义一个最大值减最小值的函数
def max_min (group):
return group.max()-group.min()
data.pivot_table(index='key1',columns='key2',aggfunc=max_min)
结果：
num1 num2
key2 one three two one three two
key1
a 0.179266 0.0 0.000 3.109405 0.0 0.000000
b 0.000000 0.0 0.000 0.000000 0.0 0.000000
c 0.000000 0.0 0.177 0.000000 0.0 1.609466
（2）通过groupby
普通的函数如mean，sum可以直接应用：
data.groupby(['key1','key2']).mean().unstack()
返回结果：
num1 num2
key2 one three two one three two
key1
a 0.193332 0.705657 0.203155 -0.165749 2.398164 -1.293595
b 0.167947 0.204545 0.661460 0.555850 -0.522528 0.143530
c 0.496993 0.033673 0.206028 -0.115093 0.024650 0.077726
以上这篇用Python实现数据的透视表的方法就是小编分享给大家的全部内容了，希望能给大家一个参考，也希望大家多多支持中文源码网。

上一篇：用python实现刷点击率的示例代码

下一篇：用Python实现筛选文件脚本的方法

点击排行

您现在的位置：首页 > 技术文档 > Python编程技巧

用Python实现数据的透视表的方法

相关内容