Pandas: Average value for the past n days
Problem description
I have a Pandas data frame like this:
import pandas as pd

test = pd.DataFrame({'Date': ['2016-04-01', '2016-04-01', '2016-04-02',
                              '2016-04-02', '2016-04-03', '2016-04-04',
                              '2016-04-05', '2016-04-06', '2016-04-06'],
                     'User': ['Mike', 'John', 'Mike', 'John', 'Mike', 'Mike',
                              'Mike', 'Mike', 'John'],
                     'Value': [1, 2, 1, 3, 4.5, 1, 2, 3, 6]
                     })
As you can see below, the data set does not necessarily have an observation for every day:
Date User Value
0 2016-04-01 Mike 1.0
1 2016-04-01 John 2.0
2 2016-04-02 Mike 1.0
3 2016-04-02 John 3.0
4 2016-04-03 Mike 4.5
5 2016-04-04 Mike 1.0
6 2016-04-05 Mike 2.0
7 2016-04-06 Mike 3.0
8 2016-04-06 John 6.0
I'd like to add a new column that shows, for each user, the average value over the past n days (in this case n = 2) if at least one of those days has data; otherwise the value should be NaN. For example, on 2016-04-06 John gets NaN because he has no data for 2016-04-05 or 2016-04-04. The result should look like this:
Date User Value Value_Average_Past_2_days
0 2016-04-01 Mike 1.0 NaN
1 2016-04-01 John 2.0 NaN
2 2016-04-02 Mike 1.0 1.00
3 2016-04-02 John 3.0 2.00
4 2016-04-03 Mike 4.5 1.00
5 2016-04-04 Mike 1.0 2.75
6 2016-04-05 Mike 2.0 2.75
7 2016-04-06 Mike 3.0 1.50
8 2016-04-06 John 6.0 NaN
After reading several posts in the forum, it seems that I should use a combination of group_by and a customized rolling_mean, but I couldn't quite figure out how to do it.
Recommended answer
I think you can first convert the Date column with to_datetime, then fill in the missing days using groupby with resample, and finally apply rolling:
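# Convert the strings to datetimes so the column can be resampled
test['Date'] = pd.to_datetime(test['Date'])
# Per user, resample to a daily frequency; days without data become NaN rows
df = test.groupby('User').apply(lambda x: x.set_index('Date').resample('1D').first())
print(df)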
User Value
User Date
John 2016-04-01 John 2.0
2016-04-02 John 3.0
2016-04-03 NaN NaN
2016-04-04 NaN NaN
2016-04-05 NaN NaN
2016-04-06 John 6.0
Mike 2016-04-01 Mike 1.0
2016-04-02 Mike 1.0
2016-04-03 Mike 4.5
2016-04-04 Mike 1.0
2016-04-05 Mike 2.0
2016-04-06 Mike 3.0
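# shift() excludes the current day; the window then covers the previous 2 days.
# group_keys=False keeps the original (User, Date) index on recent pandas versions.
df1 = (df.groupby(level=0, group_keys=False)['Value']
         .apply(lambda x: x.shift().rolling(min_periods=1, window=2).mean())
         .reset_index(name='Value_Average_Past_2_days'))
print(df1)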
User Date Value_Average_Past_2_days
0 John 2016-04-01 NaN
1 John 2016-04-02 2.00
2 John 2016-04-03 2.50
3 John 2016-04-04 3.00
4 John 2016-04-05 NaN
5 John 2016-04-06 NaN
6 Mike 2016-04-01 NaN
7 Mike 2016-04-02 1.00
8 Mike 2016-04-03 1.00
9 Mike 2016-04-04 2.75
10 Mike 2016-04-05 2.75
11 Mike 2016-04-06 1.50
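To see why the shift() matters, here is a small sketch on John's resampled values alone (the series is typed in by hand from the output above, and the names with_today and past_only are only illustrative). Without the shift, the 2-day window would include the current day's own value:

```python
import numpy as np
import pandas as pd

# John's daily values after resampling; NaN marks days without an observation
john = pd.Series([2.0, 3.0, np.nan, np.nan, np.nan, 6.0],
                 index=pd.date_range('2016-04-01', periods=6, freq='D'))

# Window of 2 rows ending at the current day (includes today's value)
with_today = john.rolling(window=2, min_periods=1).mean()

# Shift by one row first, so the window covers only the two previous days
past_only = john.shift().rolling(window=2, min_periods=1).mean()

print(pd.DataFrame({'Value': john,
                    'with_today': with_today,
                    'past_only': past_only}))
```

On 2016-04-06 the unshifted version averages in that day's own value, while the shifted version stays NaN because 2016-04-04 and 2016-04-05 have no data.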
# Left-merge the averages back onto the original rows
print(pd.merge(test, df1, on=['Date', 'User'], how='left'))
Date User Value Value_Average_Past_2_days
0 2016-04-01 Mike 1.0 NaN
1 2016-04-01 John 2.0 NaN
2 2016-04-02 Mike 1.0 1.00
3 2016-04-02 John 3.0 2.00
4 2016-04-03 Mike 4.5 1.00
5 2016-04-04 Mike 1.0 2.75
6 2016-04-05 Mike 2.0 2.75
7 2016-04-06 Mike 3.0 1.50
8 2016-04-06 John 6.0 NaN
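As a possible alternative (not the method from the answer above), a time-based rolling window can avoid the resample step altogether. The sketch below assumes a pandas version that supports offset windows with the closed argument, and at most one row per user per day; the window '2D' with closed='left' covers the two days before each date and excludes the date itself. The variable names result, grp and avg are only illustrative.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    'Date': ['2016-04-01', '2016-04-01', '2016-04-02', '2016-04-02',
             '2016-04-03', '2016-04-04', '2016-04-05', '2016-04-06',
             '2016-04-06'],
    'User': ['Mike', 'John', 'Mike', 'John', 'Mike', 'Mike',
             'Mike', 'Mike', 'John'],
    'Value': [1, 2, 1, 3, 4.5, 1, 2, 3, 6],
})
test['Date'] = pd.to_datetime(test['Date'])

result = test.copy()
result['Value_Average_Past_2_days'] = np.nan

# Process each user separately; rows are sorted by date first because
# time-based rolling needs a monotonic DatetimeIndex.
for user, grp in test.sort_values('Date').groupby('User'):
    values = grp.set_index('Date')['Value']
    # Window [t - 2 days, t): the two previous days, current day excluded.
    # An empty window yields NaN.
    avg = values.rolling('2D', closed='left').mean()
    # grp.index holds the original row labels, in the same order as avg
    result.loc[grp.index, 'Value_Average_Past_2_days'] = avg.to_numpy()

print(result)
```

This should reproduce the Value_Average_Past_2_days column requested in the question. Changing '2D' to another offset such as '7D' adjusts n.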
