Grouped Differences in Pandas with Multiple Fields
In this situation, we want to calculate how the score changes from one date to the next within each site and country combination.
To achieve this, we first sort the DataFrame by site, country, and date, so that the rows of each group are contiguous and in chronological order:
df = df.sort_values(by=['site', 'country', 'date'])
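A minimal, self-contained sketch of the setup step. The sample rows below are reconstructed from the printed output later in this section, keeping the original index labels 0 to 9. Parsing `date` with `pd.to_datetime` is an assumption; the original construction of `df` is not shown.

```python
import pandas as pd

# Sample data reconstructed from the printed output in this article;
# the index labels 0-9 match the row labels shown there.
df = pd.DataFrame(
    {
        "date": pd.to_datetime([
            "2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-02",
            "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-01", "2018-01-02",
        ]),
        "site": ["google", "google", "google", "google", "google",
                 "fb", "fb", "fb", "fb", "fb"],
        "country": ["us", "ch", "us", "us", "ch",
                    "us", "us", "us", "es", "gb"],
        "score": [100, 50, 70, 60, 10, 50, 55, 100, 100, 100],
    }
)

# diff() compares each row with the row before it in the *current* order,
# so sort first: each (site, country) group becomes contiguous and its rows
# are in chronological order.
df = df.sort_values(by=["site", "country", "date"])
print(df.index.tolist())  # [8, 9, 5, 6, 7, 1, 4, 0, 2, 3]
```

The resulting index order matches the row order of the printed output.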
Next, we use groupby and diff to compute the difference between consecutive scores within each site and country group:
df['diff'] = df.groupby(['site', 'country'])['score'].diff().fillna(0)
This computes the differences within each site and country group. The first row of each group has no previous row, so diff returns NaN there; fillna(0) replaces those NaN values with 0.
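To make the NaN handling concrete, here is a small sketch on a hypothetical five-row subset (already sorted by date within each group). It shows the raw output of the grouped diff before fillna, and checks that it equals the score minus the group-wise shifted score.

```python
import pandas as pd

# Hypothetical subset: two groups, rows already in date order within each.
df = pd.DataFrame(
    {
        "site": ["fb", "fb", "fb", "google", "google"],
        "country": ["us", "us", "us", "ch", "ch"],
        "score": [50, 55, 100, 50, 10],
    }
)

grouped = df.groupby(["site", "country"])["score"]

# Without fillna: the first row of each group has no predecessor, so it is NaN.
raw = grouped.diff()
print(raw.tolist())  # [nan, 5.0, 45.0, nan, -40.0]

# diff() within groups is the same as subtracting the group-wise shift(1).
via_shift = df["score"] - grouped.shift(1)
assert raw.equals(via_shift)

# fillna(0) turns each group's leading NaN into 0.0, as in the article.
df["diff"] = raw.fillna(0)
print(df["diff"].tolist())  # [0.0, 5.0, 45.0, 0.0, -40.0]
```

Whether to keep NaN or fill it with 0 is a design choice: 0 reads as "no change", while NaN makes it explicit that there was no earlier value to compare against.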
Finally, we display the results:
print(df)
Output:
         date    site  country  score   diff
8  2018-01-01      fb       es    100    0.0
9  2018-01-02      fb       gb    100    0.0
5  2018-01-01      fb       us     50    0.0
6  2018-01-02      fb       us     55    5.0
7  2018-01-03      fb       us    100   45.0
1  2018-01-01  google       ch     50    0.0
4  2018-01-02  google       ch     10  -40.0
0  2018-01-01  google       us    100    0.0
2  2018-01-02  google       us     70  -30.0
3  2018-01-03  google       us     60  -10.0
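Note that sort_values sorts values in their natural order (alphabetically for strings). To sort by an arbitrary order instead, such as a specific sequence of countries, store that order in a list and convert the column to an ordered categorical whose categories follow the list. sort_values then sorts by category position, so the result follows the order you provided.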
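A minimal sketch of the categorical approach. The custom country order (`country_order`) is hypothetical and chosen only to differ from alphabetical order; the sample rows are a small subset for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "site": ["google", "google", "fb", "fb", "fb"],
        "country": ["us", "ch", "us", "es", "gb"],
        "score": [100, 50, 50, 100, 100],
    }
)

# Hypothetical custom order for countries (deliberately not alphabetical).
country_order = ["us", "gb", "es", "ch"]

# An ordered Categorical sorts by position in `categories`, not by string value.
df["country"] = pd.Categorical(df["country"], categories=country_order, ordered=True)

df = df.sort_values(by=["site", "country"])
print(df["site"].tolist())     # ['fb', 'fb', 'fb', 'google', 'google']
print(df["country"].tolist())  # ['us', 'gb', 'es', 'us', 'ch']
```

If the categorical column is later used as a groupby key, passing `observed=True` to groupby keeps pandas from considering category combinations that never occur in the data.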