

  • Concatenation: pd.concat, DataFrame.append (removed in pandas 2.0; use pd.concat instead)
  • Merging: pd.merge, DataFrame.join
import pandas as pd
import numpy as np
from pandas import DataFrame,Series

I. Concatenating with pd.concat()


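join='outer' / 'inner': controls how labels on the other axis are handled. 'outer' (the default) keeps the union of the labels and fills the missing positions with NaN, while 'inner' keeps only the labels that appear in every object and drops the rest.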



1) Matching concatenation

df1 = DataFrame(data=np.random.randint(0,100,size=(3,4)))

    0   1   2   3
0  61  89  68  51
1  46  79   1  55
2  52   4  72  18
df2 = DataFrame(data=np.random.randint(0,100,size=(3,4)))

    0   1   2   3
0  15  62  20  78
1  60  79  70  58
2  71  87  20  95
pd.concat((df1,df2),axis=0)  # axis=0 stacks the rows vertically (along the index)

    0   1   2   3
0  61  89  68  51
1  46  79   1  55
2  52   4  72  18
0  15  62  20  78
1  60  79  70  58
2  71  87  20  95
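The row labels 0, 1, 2 repeat in the result above because pd.concat keeps each frame's original index. The sketch below uses small fixed frames (hypothetical values, chosen only to make the result reproducible) and the ignore_index parameter of pd.concat to show one way of renumbering the rows.

```python
import pandas as pd

# Fixed frames with hypothetical values, so the result is reproducible
top = pd.DataFrame([[1, 2], [3, 4]])
bottom = pd.DataFrame([[5, 6], [7, 8]])

# Default behaviour: each frame keeps its own row labels
stacked = pd.concat((top, bottom), axis=0)

# ignore_index=True discards the old labels and numbers the rows from 0
renumbered = pd.concat((top, bottom), axis=0, ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Whether to renumber depends on whether the original row labels carry meaning; if they do, keeping them may be the better choice.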

2) Non-matching concatenation



  • Outer join: fills the missing entries with NaN (the default)

  • Inner join: keeps only the matching columns

df1 = DataFrame(data=np.random.randint(0,100,size=(3,4)))
df2 = DataFrame(data=np.random.randint(0,100,size=(3,3)))
pd.concat((df1,df2),axis=0)  # default join='outer': the column missing from df2 is filled with NaN

    0   1   2     3
0  55  61  54  56.0
1  10  14   6  62.0
2  39  27  99  81.0
0  31  49  80   NaN
1  73  42  44   NaN
2  67  68  97   NaN
pd.concat((df1,df2),axis=0,join='inner')  # join='inner' keeps only the columns present in both frames

    0   1   2
0  55  61  54
1  10  14   6
2  39  27  99
0  31  49  80
1  73  42  44
2  67  68  97
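The two join modes can also be compared on deterministic data. This is a minimal sketch; the frame names wide and narrow and their contents are assumptions made for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for the random frames above
wide = pd.DataFrame(np.arange(12).reshape(3, 4))    # columns 0, 1, 2, 3
narrow = pd.DataFrame(np.arange(9).reshape(3, 3))   # columns 0, 1, 2

# Outer (default): union of the columns, gaps filled with NaN
outer = pd.concat((wide, narrow), axis=0)

# Inner: only the columns both frames share
inner = pd.concat((wide, narrow), axis=0, join='inner')

print(list(outer.columns), int(outer[3].isna().sum()))
print(list(inner.columns))
```

The outer result keeps column 3 with one NaN per row of narrow, while the inner result drops that column entirely.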

II. Merging with pd.merge()





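  • how: the join type. 'inner' (the default) keeps the intersection of the keys, 'outer' keeps their union, and 'left' / 'right' keep every key of the left / right frame

  • on: when several columns share a name, on selects the column(s) to merge on; it accepts a single column name or a list of names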
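As a rough illustration of the on parameter described above, the sketch below merges the same pair of frames once on a single column and once on a list of columns. The frames and their salary and bonus columns are hypothetical.

```python
import pandas as pd

# Hypothetical frames that share two column names: employee and group
left = pd.DataFrame({'employee': ['Bob', 'Jake'],
                     'group': ['Accounting', 'Engineering'],
                     'salary': [100, 200]})
right = pd.DataFrame({'employee': ['Bob', 'Jake'],
                      'group': ['Accounting', 'HR'],
                      'bonus': [10, 20]})

# Key is employee only; the two group columns are kept side by side
by_one = pd.merge(left, right, on='employee')

# Key is (employee, group); a row survives only if both values match
by_two = pd.merge(left, right, on=['employee', 'group'])

print(list(by_one.columns))
print(len(by_one), len(by_two))
```

With a single key both employees match and the group column is duplicated under suffixed names; with the two-column key only Bob's row matches.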

1) One-to-one merge

df1 = DataFrame({'employee':['Bob','Jake','Lisa'],
                 'group':['Accounting','Engineering','Engineering']})

   employee        group
0       Bob   Accounting
1      Jake  Engineering
2      Lisa  Engineering
df2 = DataFrame({'employee':['Lisa','Bob','Jake'],
                 'hire_date':[2004,2008,2012]})

   employee  hire_date
0      Lisa       2004
1       Bob       2008
2      Jake       2012
pd.merge(df1, df2)  # merges on employee, the only column the two frames have in common

   employee        group  hire_date
0       Bob   Accounting       2008
1      Jake  Engineering       2012
2      Lisa  Engineering       2004

2) Many-to-one merge

df3 = DataFrame({
    'employee':['Lisa','Jake'],
    'group':['Accounting','Engineering'],
    'hire_date':[2004,2016]})

   employee        group  hire_date
0      Lisa   Accounting       2004
1      Jake  Engineering       2016
df4 = DataFrame({'group':['Accounting','Engineering','Engineering'],
                 'supervisor':['Carly','Guido','Steve']})

         group  supervisor
0   Accounting       Carly
1  Engineering       Guido
2  Engineering       Steve
pd.merge(df3, df4)

   employee        group  hire_date  supervisor
0      Lisa   Accounting       2004       Carly
1      Jake  Engineering       2016       Guido
2      Jake  Engineering       2016       Steve
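In the result above Jake's row appears twice because Engineering occurs twice in the second frame. If the expected relationship between the keys is known in advance, the validate parameter of pd.merge can check it. The following is only a sketch; the frame names staff and leads are made up for illustration, and the values mirror the tables above.

```python
import pandas as pd

staff = pd.DataFrame({'employee': ['Lisa', 'Jake'],
                      'group': ['Accounting', 'Engineering']})
leads = pd.DataFrame({'group': ['Accounting', 'Engineering', 'Engineering'],
                      'supervisor': ['Carly', 'Guido', 'Steve']})

# Keys are unique on the left and repeated on the right, so this check passes
merged = pd.merge(staff, leads, on='group', validate='one_to_many')
print(len(merged))

# Claiming a one-to-one relationship is rejected with a MergeError
try:
    pd.merge(staff, leads, on='group', validate='one_to_one')
    one_to_one_ok = True
except pd.errors.MergeError:
    one_to_one_ok = False
print(one_to_one_ok)
```

A check like this can catch accidental row duplication early, at the cost of an extra pass over the keys.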

3) Many-to-many merge

df1 = DataFrame({'employee':['Bob','Jake','Lisa'],
                 'group':['Accounting','Engineering','Engineering']})

   employee        group
0       Bob   Accounting
1      Jake  Engineering
2      Lisa  Engineering
df2 = DataFrame({'group':['Engineering','Engineering','HR'],
                 'supervisor':['Carly','Guido','Steve']})

         group  supervisor
0  Engineering       Carly
1  Engineering       Guido
2           HR       Steve
pd.merge(df1,df2,how='right')  # how='right' is a right join: every row of df2 is kept

   employee        group  supervisor
0      Jake  Engineering       Carly
1      Lisa  Engineering       Carly
2      Jake  Engineering       Guido
3      Lisa  Engineering       Guido
4       NaN           HR       Steve
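To see how the choice of how changes a many-to-many result, the sketch below counts the rows produced by each join type. The values are taken from the two tables above; the row_counts dictionary is just an illustrative helper.

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                    'group': ['Accounting', 'Engineering', 'Engineering']})
df2 = pd.DataFrame({'group': ['Engineering', 'Engineering', 'HR'],
                    'supervisor': ['Carly', 'Guido', 'Steve']})

# Number of rows produced by each join type on the shared column, group
row_counts = {how: len(pd.merge(df1, df2, how=how))
              for how in ('inner', 'left', 'right', 'outer')}
print(row_counts)
```

The two Engineering rows on each side pair up into four rows in every case. 'left' adds Bob's unmatched row, 'right' adds the unmatched HR row, and 'outer' adds both, giving 4, 5, 5 and 6 rows respectively.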

4) Normalizing keys

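  • When columns conflict, that is, when the two frames share more than one column name, use on= to choose which column serves as the key; suffixes= controls how the remaining conflicting columns are renamed (the default is ('_x', '_y'))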
df1 = DataFrame({'employee':['Jack',"Summer","Steve"],
                 'group':['Accounting','Finance','Marketing']})

   employee       group
0      Jack  Accounting
1    Summer     Finance
2     Steve   Marketing
df2 = DataFrame({'employee':['Jack','Bob',"Jake"],
                 'group':['Accounting','sell','ceo'],
                 'hire_date':[2003,2009,2012]})

   employee       group  hire_date
0      Jack  Accounting       2003
1       Bob        sell       2009
2      Jake         ceo       2012
pd.merge(df1,df2,on='employee')  # without on=, merge would use both shared columns (employee and group); on= names the key explicitly

   employee     group_x     group_y  hire_date
0      Jack  Accounting  Accounting       2003
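The group_x and group_y names in the output come from the default suffixes. A minimal sketch of overriding them follows; the suffix strings '_left' and '_right' and the frame names are arbitrary choices made for illustration.

```python
import pandas as pd

left = pd.DataFrame({'employee': ['Jack', 'Summer', 'Steve'],
                     'group': ['Accounting', 'Finance', 'Marketing']})
right = pd.DataFrame({'employee': ['Jack', 'Bob', 'Jake'],
                      'group': ['Accounting', 'sell', 'ceo'],
                      'hire_date': [2003, 2009, 2012]})

# employee is the key; the non-key column group exists on both sides,
# so each copy gets the suffix for its side
merged = pd.merge(left, right, on='employee', suffixes=('_left', '_right'))
print(list(merged.columns))
print(len(merged))
```

Only Jack appears in both frames, so the default inner merge returns a single row.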
  • When the two tables have no column name in common, use left_on and right_on to specify which column on each side of the merge serves as the join key
df1 = DataFrame({'employee':['Bobs','Linda','Bill'],
                 'group':['Accounting','Product','Marketing'],
                 'hire_date':[1998,2017,2018]})

   employee       group  hire_date
0      Bobs  Accounting       1998
1     Linda     Product       2017
2      Bill   Marketing       2018
df2 = DataFrame({'name':['Lisa','Bobs','Bill'],
                 'hire_dates':[1998,2016,2007]})

   hire_dates  name
0        1998  Lisa
1        2016  Bobs
2        2007  Bill

pd.merge(df1,df2,left_on='employee',right_on='name',how='outer')

   employee       group  hire_date  hire_dates  name
0      Bobs  Accounting     1998.0      2016.0  Bobs
1     Linda     Product     2017.0         NaN   NaN
2      Bill   Marketing     2018.0      2007.0  Bill
3       NaN         NaN        NaN      1998.0  Lisa
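Because left_on and right_on keep both key columns, the result carries employee and name side by side. The sketch below does an inner merge on the same data and then drops the redundant column; the variable names are illustrative, and dropping the column is an optional clean-up step rather than something merge requires.

```python
import pandas as pd

left = pd.DataFrame({'employee': ['Bobs', 'Linda', 'Bill'],
                     'group': ['Accounting', 'Product', 'Marketing']})
right = pd.DataFrame({'name': ['Lisa', 'Bobs', 'Bill'],
                      'hire_dates': [1998, 2016, 2007]})

# Match left.employee against right.name (inner merge by default)
merged = pd.merge(left, right, left_on='employee', right_on='name')

# After an inner merge the two key columns hold identical values,
# so one of them can be dropped
tidy = merged.drop(columns='name')
print(list(tidy.columns))
print(list(tidy['employee']))
```

With an outer merge the two key columns can differ (one side may be NaN), so dropping one of them would lose information in that case.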

5) Inner and outer merges: outer takes the union of the keys, inner takes the intersection

  • Inner merge: keeps only the keys present in both frames (the default)
df6 = DataFrame({'name':['Peter','Paul','Mary'],
                 'food':['fish','beans','bread']})

    food   name
0   fish  Peter
1  beans   Paul
2  bread   Mary
df7 = DataFrame({'name':['Mary','Joseph'],
                 'drink':['wine','beer']})

   drink    name
0   wine    Mary
1   beer  Joseph
pd.merge(df6, df7)

    food  name  drink
0  bread  Mary   wine
  • Outer merge, how='outer': keeps every key and fills the gaps with NaN
pd.merge(df6, df7, how='outer')

    food    name  drink
0   fish   Peter    NaN
1  beans    Paul    NaN
2  bread    Mary   wine
3    NaN  Joseph   beer
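When an outer merge produces NaN, it can help to know which side each row came from. One option is the indicator parameter of pd.merge, which adds a _merge column. This is a sketch on the same data as above; the names people, drinks and source_of are illustrative.

```python
import pandas as pd

people = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],
                       'food': ['fish', 'beans', 'bread']})
drinks = pd.DataFrame({'name': ['Mary', 'Joseph'],
                       'drink': ['wine', 'beer']})

# indicator=True adds a _merge column: 'left_only', 'right_only' or 'both'
tagged = pd.merge(people, drinks, how='outer', indicator=True)

# Map each name to the side(s) it was found on
source_of = dict(zip(tagged['name'], tagged['_merge'].astype(str)))
print(source_of)
```

Only Mary is marked 'both', which matches the single row returned by the inner merge.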


