同比和环比:环比和同比用于描述统计数据的变化情况
- 环比:表示本次统计段与相连的上次统计段之间的比较。
- 比如 2010 年中国第一季度 GDP 为 G2010Q1 亿元,第二季度 GDP 为 G2010Q2 亿元,则第二季度 GDP 环比增长(G2010Q2-G2010Q1)/G2010Q1;
- 同比:即同期相比,表示某个特定统计段今年与去年之间的比较。
- 比如 2009 年中国第一季度 GDP 为 G2009Q1 亿元,则 2010 年第一季度的 GDP 同比增长为(G2010Q1-G2009Q1)/G2009Q1。
演示: 读取连续3年的天气数据
- 方法1:pandas.Series.pct_change
- 方法2:pandas.Series.shift
- 方法3:pandas.Series.diff
pct_change、shift、diff,都实现了跨越多行的数据计算
读取连续 3 年的天气数据
import pandas as pd %matplotlib inline fpath = "./datas/beijing_tianqi/beijing_tianqi_2017-2019.csv" df = pd.read_csv(fpath, index_col="ymd", parse_dates=True) df.head(3)
bWendu yWendu tianqi fengxiang fengli aqi aqiInfo aqiLevel ymd 2017-01-01 5℃ -3℃ 霾~晴 南风 1-2级 450 严重污染 6 2017-01-02 7℃ -6℃ 晴~霾 南风 1-2级 246 重度污染 5 2017-01-03 5℃ -5℃ 霾 南风 1-2级 320 严重污染 6
# 替换掉温度的后缀 ℃ df["bWendu"] = df["bWendu"].str.replace("℃", "").astype('int32') df.head(3)
bWendu yWendu tianqi fengxiang fengli aqi aqiInfo aqiLevel ymd 2017-01-01 5 -3℃ 霾~晴 南风 1-2级 450 严重污染 6 2017-01-02 7 -6℃ 晴~霾 南风 1-2级 246 重度污染 5 2017-01-03 5 -5℃ 霾 南风 1-2级 320 严重污染 6
# 新的df,为每个月的平均最高温 df = df[["bWendu"]].resample("M").mean()
# 将索引按照日期升序排列 df.sort_index(ascending=True, inplace=True) df.head()
bWendu ymd 2017-01-31 3.322581 2017-02-28 7.642857 2017-03-31 14.129032 2017-04-30 23.700000 2017-05-31 29.774194
df.index
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30', '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31', '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31', '2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30', '2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31', '2018-09-30', '2018-10-31', '2018-11-30', '2018-12-31', '2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30', '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31', '2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31'], dtype='datetime64[ns]', name='ymd', freq='M')
df.plot()
方法1:pandas.Series.pct_change
pct_change 方法直接算好了"(新-旧)/旧"的百分比
官方文档地址:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.pct_change.html
df["bWendu_way1_huanbi"] = df["bWendu"].pct_change(periods=1) df["bWendu_way1_tongbi"] = df["bWendu"].pct_change(periods=12) df.head(15)
bWendu bWendu_way1_huanbi bWendu_way1_tongbi ymd 2017-01-31 3.322581 NaN NaN 2017-02-28 7.642857 1.300277 NaN 2017-03-31 14.129032 0.848658 NaN 2017-04-30 23.700000 0.677397 NaN 2017-05-31 29.774194 0.256295 NaN 2017-06-30 30.966667 0.040051 NaN 2017-07-31 31.612903 0.020869 NaN 2017-08-31 30.129032 -0.046939 NaN 2017-09-30 27.866667 -0.075089 NaN 2017-10-31 17.225806 -0.381849 NaN 2017-11-30 9.566667 -0.444632 NaN 2017-12-31 4.483871 -0.531303 NaN 2018-01-31 1.322581 -0.705036 -0.601942 2018-02-28 4.892857 2.699477 -0.359813 2018-03-31 14.129032 1.887685 0.000000
方法2:pandas.Series.shift
shift用于移动数据,但是保持索引不变
官方文档地址:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.shift.html
# 见识一下shift做了什么事情 # 使用pd.concat合并Series列表变成一个大的df pd.concat( [df["bWendu"], df["bWendu"].shift(periods=1), df["bWendu"].shift(periods=12)], axis=1 ).head(15)
bWendu bWendu bWendu ymd 2017-01-31 3.322581 NaN NaN 2017-02-28 7.642857 3.322581 NaN 2017-03-31 14.129032 7.642857 NaN 2017-04-30 23.700000 14.129032 NaN 2017-05-31 29.774194 23.700000 NaN 2017-06-30 30.966667 29.774194 NaN 2017-07-31 31.612903 30.966667 NaN 2017-08-31 30.129032 31.612903 NaN 2017-09-30 27.866667 30.129032 NaN 2017-10-31 17.225806 27.866667 NaN 2017-11-30 9.566667 17.225806 NaN 2017-12-31 4.483871 9.566667 NaN 2018-01-31 1.322581 4.483871 3.322581 2018-02-28 4.892857 1.322581 7.642857 2018-03-31 14.129032 4.892857 14.129032
# 环比 series_shift1 = df["bWendu"].shift(periods=1) df["bWendu_way2_huanbi"] = (df["bWendu"]-series_shift1)/series_shift1 # 同比 series_shift2 = df["bWendu"].shift(periods=12) df["bWendu_way2_tongbi"] = (df["bWendu"]-series_shift2)/series_shift2
df.head(15)
bWendu bWendu_way1_huanbi ... bWendu_way2_huanbi bWendu_way2_tongbi ymd ... 2017-01-31 3.322581 NaN ... NaN NaN 2017-02-28 7.642857 1.300277 ... 1.300277 NaN 2017-03-31 14.129032 0.848658 ... 0.848658 NaN 2017-04-30 23.700000 0.677397 ... 0.677397 NaN 2017-05-31 29.774194 0.256295 ... 0.256295 NaN 2017-06-30 30.966667 0.040051 ... 0.040051 NaN 2017-07-31 31.612903 0.020869 ... 0.020869 NaN 2017-08-31 30.129032 -0.046939 ... -0.046939 NaN 2017-09-30 27.866667 -0.075089 ... -0.075089 NaN 2017-10-31 17.225806 -0.381849 ... -0.381849 NaN 2017-11-30 9.566667 -0.444632 ... -0.444632 NaN 2017-12-31 4.483871 -0.531303 ... -0.531303 NaN 2018-01-31 1.322581 -0.705036 ... -0.705036 -0.601942 2018-02-28 4.892857 2.699477 ... 2.699477 -0.359813 2018-03-31 14.129032 1.887685 ... 1.887685 0.000000 [15 rows x 5 columns]
方法3. pandas.Series.diff
pandas.Series.diff 用于新值减去旧值
官方文档:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.diff.html
pd.concat( [df["bWendu"], df["bWendu"].diff(periods=1), df["bWendu"].diff(periods=12)], axis=1 ).head(15)
bWendu bWendu bWendu ymd 2017-01-31 3.322581 NaN NaN 2017-02-28 7.642857 4.320276 NaN 2017-03-31 14.129032 6.486175 NaN 2017-04-30 23.700000 9.570968 NaN 2017-05-31 29.774194 6.074194 NaN 2017-06-30 30.966667 1.192473 NaN 2017-07-31 31.612903 0.646237 NaN 2017-08-31 30.129032 -1.483871 NaN 2017-09-30 27.866667 -2.262366 NaN 2017-10-31 17.225806 -10.640860 NaN 2017-11-30 9.566667 -7.659140 NaN 2017-12-31 4.483871 -5.082796 NaN 2018-01-31 1.322581 -3.161290 -2.00 2018-02-28 4.892857 3.570276 -2.75 2018-03-31 14.129032 9.236175 0.00
# 环比 series_diff1 = df["bWendu"].diff(periods=1) df["bWendu_way3_huanbi"] = series_diff1/(df["bWendu"]-series_diff1) # 同比 series_diff2 = df["bWendu"].diff(periods=12) df["bWendu_way3_tongbi"] = series_diff2/(df["bWendu"]-series_diff2)
df.head(15)
bWendu bWendu_way1_huanbi ... bWendu_way2_huanbi bWendu_way2_tongbi ymd ... 2017-01-31 3.322581 NaN ... NaN NaN 2017-02-28 7.642857 1.300277 ... 1.300277 NaN 2017-03-31 14.129032 0.848658 ... 0.848658 NaN 2017-04-30 23.700000 0.677397 ... 0.677397 NaN 2017-05-31 29.774194 0.256295 ... 0.256295 NaN 2017-06-30 30.966667 0.040051 ... 0.040051 NaN 2017-07-31 31.612903 0.020869 ... 0.020869 NaN 2017-08-31 30.129032 -0.046939 ... -0.046939 NaN 2017-09-30 27.866667 -0.075089 ... -0.075089 NaN 2017-10-31 17.225806 -0.381849 ... -0.381849 NaN 2017-11-30 9.566667 -0.444632 ... -0.444632 NaN 2017-12-31 4.483871 -0.531303 ... -0.531303 NaN 2018-01-31 1.322581 -0.705036 ... -0.705036 -0.601942 2018-02-28 4.892857 2.699477 ... 2.699477 -0.359813 2018-03-31 14.129032 1.887685 ... 1.887685 0.000000 [15 rows x 5 columns]