Tutorial 1 - Pandas

O Pandas é uma biblioteca Python fundamental para análise de dados. Ela fornece estruturas de dados eficientes e ferramentas para manipulação, limpeza e análise de dados.

Você pode baixar os dados aqui.

import pandas as pd

Criando um dataframe

data = {'name': ['Alice', 'Bob', 'Charlie', 'Alice', 'David', 'Bob','Camille'],
        'age': [25, 30, 35, 25, 40, 30,20],
        'title':['Sherlock','The Walking Dead','Dark',
        'Friends','Orange Is the New Black',
        'The Walking Dead','Narcos']}
df = pd.DataFrame(data)
print(df)
      name  age                    title
0    Alice   25                 Sherlock
1      Bob   30         The Walking Dead
2  Charlie   35                     Dark
3    Alice   25                  Friends
4    David   40  Orange Is the New Black
5      Bob   30         The Walking Dead
6  Camille   20                   Narcos

Carregando os dados

netflix = pd.read_excel("data1/netflix_series_limpo.xlsx")
imdb = pd.read_excel("data1/imdb_series.xlsx")
print(netflix.head)
<bound method NDFrame.head of      series_title     season               episode       Date
0            Away   Season 1                  Home 2020-10-06
1            Away   Season 1                Spektr 2020-10-06
2            Away   Season 1           Vital Signs 2020-10-05
3            Away   Season 1        Goodnight Mars 2020-10-05
4            Away   Season 1        A Little Faith 2020-10-05
..            ...        ...                   ...        ...
918  Orphan Black   Season 1     Natural Selection 2015-08-21
919   Bates Motel   Season 2   The Immutable Truth 2015-08-15
920   Bates Motel   Season 2               The Box 2015-08-15
921   Bates Motel   Season 2              Meltdown 2015-08-14
922   Bates Motel   Season 2     Presumed Innocent 2015-08-13

[923 rows x 4 columns]>

Também podemos ver rapidamente a estrutura de cada dataset, utilizando a função dtypes e describe().

netflix.describe()
Date
count 923
mean 2018-05-29 23:58:26.392199424
min 2015-08-13 00:00:00
25% 2017-01-06 00:00:00
50% 2018-09-10 00:00:00
75% 2019-07-30 00:00:00
max 2020-10-06 00:00:00
netflix.dtypes
series_title            object
season                  object
episode                 object
Date            datetime64[ns]
dtype: object

Vamos converter a coluna season em variável categórica.

netflix['season'].astype('category')
0       Season 1
1       Season 1
2       Season 1
3       Season 1
4       Season 1
         ...    
918     Season 1
919     Season 2
920     Season 2
921     Season 2
922     Season 2
Name: season, Length: 923, dtype: category
Categories (32, object): [' 1ª temporada', ' Back in Business', ' Berry Bitty Adventures', ' Chapter Eight', ..., ' Turma da Mônica', ' Volume 1', ' Volume 2', ' Welcome to Ever After High']

Podemos ver as dimensões do nosso dataframe:

netflix.shape
(923, 4)

Selecionando colunas

Para selecionar colunas utilizamos a forma df[['coluna']] onde df é o nome do dataframe e ‘coluna’ é o nome da coluna ou colunas que queremos selecionar.

netflix[['series_title','season']]
series_title season
0 Away Season 1
1 Away Season 1
2 Away Season 1
3 Away Season 1
4 Away Season 1
... ... ...
918 Orphan Black Season 1
919 Bates Motel Season 2
920 Bates Motel Season 2
921 Bates Motel Season 2
922 Bates Motel Season 2

923 rows × 2 columns

Ordenando os dados

Para ordenar as linhas, podemos utilizar a função sort_values() de forma a termos, por exemplo, uma lista de maior a menor de um determinado valor. Vamos usar o dataframe imdb para exemplificar, ordenando as linhas por ordem crescente de UserRating:

imdb.head()
series_name Episode series_ep season season_ep url UserRating UserVotes r1 r2 r3 r4 r5 r6 r7 r8 r9 r10
0 13 Reasons Why Tape 1, Side A 1 1 1 http://www.imdb.com/title/tt5174246/?ref_=ttep... 8.3 7016 0.055445 0.004418 0.004704 0.005844 0.014681 0.032640 0.105188 0.237030 0.246579 0.293472
1 13 Reasons Why Tape 1, Side B 2 1 2 http://www.imdb.com/title/tt5174248/?ref_=ttep... 8.0 5859 0.056665 0.004438 0.005803 0.008022 0.016726 0.045400 0.138420 0.311145 0.171872 0.241509
2 13 Reasons Why Tape 2, Side A 3 1 3 http://www.imdb.com/title/tt5174250/?ref_=ttep... 7.9 5509 0.058813 0.003993 0.005627 0.009984 0.022509 0.051734 0.160828 0.302414 0.140860 0.243238
3 13 Reasons Why Tape 2, Side B 4 1 4 http://www.imdb.com/title/tt5174252/?ref_=ttep... 8.1 5309 0.063477 0.003956 0.005086 0.007911 0.019213 0.045960 0.129215 0.298550 0.171219 0.255415
4 13 Reasons Why Tape 3, Side A 5 1 5 http://www.imdb.com/title/tt5174254/?ref_=ttep... 8.2 5252 0.066832 0.002666 0.004570 0.008378 0.018850 0.037129 0.114242 0.257997 0.203542 0.285796
imdb = imdb.sort_values('UserRating',
                ascending=False)

Agora vamos ver como ficou:

imdb[['series_name','UserRating']]
series_name UserRating
2359 Lúcifer 9.8
134 Dark 9.7
1404 The Walking Dead 9.7
1183 Friends 9.7
2371 Lúcifer 9.7
... ... ...
773 Pânico: A Série de TV 5.5
774 Pânico: A Série de TV 5.2
1908 Dracula 5.2
42 13 Reasons Why 5.2
137 Dracula 5.2

2597 rows × 2 columns

Filtrando Linhas

Podemos filtrar linhas utilizando o método df.query() onde df é o nome do dataframe.

imdb.query('UserVotes > 10000')
series_name Episode series_ep season season_ep url UserRating UserVotes r1 r2 r3 r4 r5 r6 r7 r8 r9 r10
2359 Lúcifer A Devil of My Word 55 3 24 http://www.imdb.com/title/tt8253126/?ref_=ttep... 9.8 10712 0.021004 0.001587 0.000840 0.000747 0.002054 0.002801 0.007468 0.018764 0.063107 0.881628
134 Dark The Paradise 26 3 8 http://www.imdb.com/title/tt12557704/?ref_=tte... 9.7 17468 0.020151 0.002633 0.003263 0.002805 0.004351 0.006698 0.012308 0.022556 0.059824 0.865411
1404 The Walking Dead No Way Out 57 6 9 http://www.imdb.com/title/tt4575388/?ref_=ttep... 9.7 24358 0.027424 0.003490 0.002094 0.002669 0.005665 0.007390 0.015519 0.033008 0.101979 0.800764
1371 The Walking Dead Too Far Gone 24 4 8 http://www.imdb.com/title/tt2948638/?ref_=ttep... 9.7 22061 0.044876 0.002584 0.001541 0.001496 0.004624 0.006029 0.013100 0.039255 0.126286 0.760210
985 The Walking Dead No Way Out 57 6 9 http://www.imdb.com/title/tt4575388/?ref_=ttep... 9.7 24358 0.027424 0.003490 0.002094 0.002669 0.005665 0.007390 0.015519 0.033008 0.101979 0.800764
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1411 The Walking Dead Last Day on Earth 64 6 16 http://www.imdb.com/title/tt4589574/?ref_=ttep... 6.6 26981 0.263667 0.036878 0.037285 0.037211 0.052259 0.049590 0.068826 0.090693 0.104221 0.259368
992 The Walking Dead Last Day on Earth 64 6 16 http://www.imdb.com/title/tt4589574/?ref_=ttep... 6.6 26981 0.263667 0.036878 0.037285 0.037211 0.052259 0.049590 0.068826 0.090693 0.104221 0.259368
1337 Stranger Things Chapter Seven: The Lost Sister 15 2 7 http://www.imdb.com/title/tt6020810/?ref_=ttep... 6.1 21371 0.166955 0.045716 0.047869 0.059239 0.100791 0.118198 0.157784 0.116794 0.058958 0.127696
1417 The Walking Dead Swear 70 7 6 http://www.imdb.com/title/tt5207734/?ref_=ttep... 5.6 11971 0.197978 0.042686 0.047615 0.060647 0.102164 0.140506 0.133823 0.087127 0.037591 0.149862
998 The Walking Dead Swear 70 7 6 http://www.imdb.com/title/tt5207734/?ref_=ttep... 5.6 11971 0.197978 0.042686 0.047615 0.060647 0.102164 0.140506 0.133823 0.087127 0.037591 0.149862

124 rows × 18 columns

Se quisermos combinar filtrado de linhas com seleção de colunas:

imdb_10 = imdb.query('UserVotes > 10000')[['series_name','Episode','UserRating','UserVotes']]
imdb_10.head(10)
series_name Episode UserRating UserVotes
2359 Lúcifer A Devil of My Word 9.8 10712
134 Dark The Paradise 9.7 17468
1404 The Walking Dead No Way Out 9.7 24358
1371 The Walking Dead Too Far Gone 9.7 22061
985 The Walking Dead No Way Out 9.7 24358
1314 Sherlock The Reichenbach Fall 9.7 33904
1307 Friends The Last One 9.7 11634
952 The Walking Dead Too Far Gone 9.7 22061
961 The Walking Dead No Sanctuary 9.6 23608
133 Dark Between the Time 9.6 13301
imdb_10.shape
(124, 4)

vamos identificar se há duplicados.

duplicates = imdb_10.duplicated()
print(duplicates)
2359    False
134     False
1404    False
1371    False
985      True
        ...  
1411    False
992      True
1337    False
1417    False
998      True
Length: 124, dtype: bool

Notamos que existem duplicados, portanto precisamos removê-los utilizando o método drop_duplicates().

imdb_10 = imdb_10.drop_duplicates(subset=['Episode'])

Vamos ver quantas linhas temos agora.

imdb_10.shape
(91, 4)

também podemos filtrar por strings. Por exemplo, vamos filtrar pelo seriado “Dark”:

imdb['series_name'].isin(['Dark'])
2359    False
134      True
1404    False
1183    False
2371    False
        ...  
773     False
774     False
1908    False
42      False
137     False
Name: series_name, Length: 2597, dtype: bool

Vemos que o código acima retorna uma série com valores lógicos (True-False) para cada linha. Para efetivamente ver o resultado do seriado Dark devemos:

imdb[imdb['series_name'].isin(['Dark'])]
series_name Episode series_ep season season_ep url UserRating UserVotes r1 r2 r3 r4 r5 r6 r7 r8 r9 r10
134 Dark The Paradise 26 3 8 http://www.imdb.com/title/tt12557704/?ref_=tte... 9.7 17468 0.020151 0.002633 0.003263 0.002805 0.004351 0.006698 0.012308 0.022556 0.059824 0.865411
133 Dark Between the Time 25 3 7 http://www.imdb.com/title/tt12557700/?ref_=tte... 9.6 13301 0.020374 0.001880 0.002556 0.002707 0.004285 0.006842 0.014811 0.031577 0.073904 0.841065
124 Dark An Endless Cycle 16 2 6 http://www.imdb.com/title/tt10454734/?ref_=tte... 9.6 11149 0.007983 0.000807 0.001256 0.001435 0.001884 0.005292 0.016055 0.045834 0.124765 0.794690
131 Dark Life and Death 23 3 5 http://www.imdb.com/title/tt12557688/?ref_=tte... 9.5 10359 0.021238 0.001834 0.002510 0.002993 0.005502 0.010908 0.020948 0.044985 0.123661 0.765421
126 Dark Endings and Beginnings 18 2 8 http://www.imdb.com/title/tt10457654/?ref_=tte... 9.5 11379 0.012040 0.002109 0.001846 0.002373 0.004482 0.009667 0.020301 0.045698 0.123473 0.778012
122 Dark The Travelers 14 2 4 http://www.imdb.com/title/tt10454726/?ref_=tte... 9.5 9968 0.007925 0.000903 0.001204 0.001505 0.002307 0.006120 0.019161 0.056180 0.166132 0.738563
123 Dark Lost and Found 15 2 5 http://www.imdb.com/title/tt10454732/?ref_=tte... 9.4 9210 0.007600 0.000869 0.001303 0.001629 0.002606 0.007166 0.023127 0.075570 0.197720 0.682410
125 Dark The White Devil 17 2 7 http://www.imdb.com/title/tt10457652/?ref_=tte... 9.3 9050 0.008066 0.001657 0.001878 0.001768 0.003757 0.009392 0.026851 0.083315 0.204972 0.658343
132 Dark Light and Shadow 24 3 6 http://www.imdb.com/title/tt12557694/?ref_=tte... 9.3 10060 0.024453 0.001889 0.002783 0.004175 0.006362 0.010934 0.025547 0.063221 0.127137 0.733499
116 Dark As You Sow, so You Shall Reap 8 1 8 http://www.imdb.com/title/tt7313316/?ref_=ttep... 9.2 9294 0.007209 0.001076 0.000753 0.001829 0.003443 0.011298 0.030773 0.101033 0.257586 0.585001
118 Dark Alpha and Omega 10 1 10 http://www.imdb.com/title/tt7313322/?ref_=ttep... 9.2 9758 0.010248 0.001947 0.001230 0.003279 0.005944 0.011273 0.032589 0.086801 0.229248 0.617442
121 Dark Ghosts 13 2 3 http://www.imdb.com/title/tt10454722/?ref_=tte... 9.2 9138 0.007004 0.000766 0.001970 0.002079 0.004487 0.008645 0.030422 0.100897 0.226855 0.616875
114 Dark Sic Mundus Creatus Est 6 1 6 http://www.imdb.com/title/tt7313312/?ref_=ttep... 9.1 9434 0.006996 0.000742 0.001272 0.001272 0.004240 0.014204 0.035616 0.115222 0.264151 0.556286
130 Dark The Origin 22 3 4 http://www.imdb.com/title/tt12557686/?ref_=tte... 9.1 9529 0.022248 0.003883 0.004722 0.004198 0.006611 0.013118 0.033582 0.082695 0.163816 0.665128
113 Dark Truths 5 1 5 http://www.imdb.com/title/tt7313308/?ref_=ttep... 9.0 9563 0.005751 0.000837 0.001046 0.002301 0.004183 0.014744 0.045070 0.134790 0.278992 0.512287
120 Dark Dark Matter 12 2 2 http://www.imdb.com/title/tt10454716/?ref_=tte... 9.0 9221 0.006941 0.001084 0.001193 0.002603 0.003579 0.010303 0.034920 0.122655 0.247479 0.569244
127 Dark Deja-vu 19 3 1 http://www.imdb.com/title/tt10414808/?ref_=tte... 9.0 10609 0.022622 0.003205 0.003959 0.004430 0.006221 0.014704 0.039872 0.098407 0.202658 0.603921
129 Dark Adam and Eva 21 3 3 http://www.imdb.com/title/tt12557682/?ref_=tte... 8.9 9364 0.022747 0.002777 0.004806 0.004058 0.008330 0.016339 0.040474 0.109141 0.194682 0.596647
128 Dark The Survivors 20 3 2 http://www.imdb.com/title/tt12557670/?ref_=tte... 8.9 9666 0.022967 0.003311 0.004138 0.004552 0.006104 0.017381 0.045520 0.119284 0.202773 0.573971
119 Dark Beginnings and Endings 11 2 1 http://www.imdb.com/title/tt7787482/?ref_=ttep... 8.9 9969 0.007122 0.001505 0.001404 0.003611 0.004213 0.013442 0.049955 0.154780 0.238239 0.525730
115 Dark Crossroads 7 1 7 http://www.imdb.com/title/tt7305824/?ref_=ttep... 8.8 8725 0.005845 0.001261 0.001032 0.002636 0.005501 0.015931 0.051920 0.194269 0.275186 0.446418
117 Dark Everything Is Now 9 1 9 http://www.imdb.com/title/tt7313320/?ref_=ttep... 8.8 8599 0.006978 0.001744 0.001512 0.001279 0.006280 0.018025 0.055472 0.185719 0.260147 0.462845
111 Dark Past and Present 3 1 3 http://www.imdb.com/title/tt7305820/?ref_=ttep... 8.7 9585 0.006051 0.001148 0.001669 0.002608 0.006886 0.023996 0.067710 0.208242 0.255921 0.425769
109 Dark Secrets 1 1 1 http://www.imdb.com/title/tt6305578/?ref_=ttep... 8.3 11242 0.007917 0.001245 0.003202 0.003914 0.011564 0.030422 0.106298 0.274239 0.194094 0.367105
112 Dark Double Lives 4 1 4 http://www.imdb.com/title/tt7305818/?ref_=ttep... 8.3 9088 0.005722 0.001761 0.001540 0.003631 0.009793 0.029820 0.102993 0.281690 0.180238 0.382812
110 Dark Lies 2 1 2 http://www.imdb.com/title/tt7305776/?ref_=ttep... 8.2 9903 0.006160 0.001616 0.003130 0.002827 0.011108 0.034939 0.119661 0.293547 0.165303 0.361709

vamos guardar este dataframe com o nome darkdf:

darkdf = imdb[imdb['series_name'].isin(['Dark'])]

Agrupando dados por categorias

Vamos agrupar os dados de imdb e calcular algumas informações: a contagem de capítulos por seriado, a média do UserRating por seriado e a soma total dos votos.

imdb_summary = imdb.groupby(['series_name']).agg({'series_name':"count",'UserRating':'mean','UserVotes':'sum'})
imdb_summary
series_name UserRating UserVotes
series_name
13 Reasons Why 49 7.248980 151452
Arquivo X 356 7.988202 969628
Away 10 7.080000 5388
Como Defender um Assassino 180 8.568889 228352
Dark 26 9.076923 264631
Dracula 6 6.966667 28858
Era Uma Vez 312 8.266026 397332
Friends 235 8.451915 853287
Greenleaf 60 7.586667 2777
Grimm: Contos de Terror 123 8.435772 104967
Hemlock Grove 33 7.490909 12538
La Casa de Papel 31 8.209677 161782
Lúcifer 75 8.744000 298412
Motel Bates 50 8.592000 78773
Narcos 30 8.760000 122160
O Bom Lugar 50 8.272000 98562
Orange Is the New Black 91 8.172527 156565
Orphan Black 50 8.528000 60532
Os Originais 184 8.828261 179686
Pânico: A Série de TV 29 7.537931 28078
Ratched 8 7.812500 8234
Sherlock 15 8.800000 373114
Sleepy Hollow 62 7.680645 33079
Sobrenatural 211 8.389573 499046
Stranger Things 25 8.676000 391288
The Walking Dead 288 8.006944 2394586
The Witcher 8 8.487500 122656

Podemos fazer o mesmo para o dataframe darkdf:

darkdf_grouped = darkdf.groupby(['series_name']).count()

darkdf_grouped
Episode series_ep season season_ep url UserRating UserVotes r1 r2 r3 r4 r5 r6 r7 r8 r9 r10
series_name
Dark 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26

Uma boa prática é retornar o nome da coluna, neste caso series_name para uma nova coluna, e desta forma, deixar o index novamente numérico, para isto utilizamos a função reset_index.

darkdf_grouped.reset_index()
series_name Episode series_ep season season_ep url UserRating UserVotes r1 r2 r3 r4 r5 r6 r7 r8 r9 r10
0 Dark 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26

Vamos fazer o mesmo para o imdb_summary:

imdb_summary2 = imdb_summary.rename(columns={'series_name':'Episodes'})
imdb_summary2.reset_index(inplace=True)

imdb_summary2.head(7)
series_name Episodes UserRating UserVotes
0 13 Reasons Why 49 7.248980 151452
1 Arquivo X 356 7.988202 969628
2 Away 10 7.080000 5388
3 Como Defender um Assassino 180 8.568889 228352
4 Dark 26 9.076923 264631
5 Dracula 6 6.966667 28858
6 Era Uma Vez 312 8.266026 397332

Modificando ou criando novas colunas

Similar à função mutate() no R, podemos usar o método df.assign em Python.

imdb_rtotal = imdb.assign(r_total=imdb['r1']+imdb['r2']+imdb['r3']+imdb['r4']+imdb['r5']+imdb['r6']+imdb['r7']+imdb['r8']+imdb['r9']+imdb['r10'])

Agora podemos selecionar apenas as colunas que nos interessam.

imdb_rtotal[['series_name','Episode','r_total']]
series_name Episode r_total
2359 Lúcifer A Devil of My Word 1.0
134 Dark The Paradise 1.0
1404 The Walking Dead No Way Out 1.0
1183 Friends The One Where Everybody Finds Out 1.0
2371 Lúcifer Who's da New King of Hell? 1.0
... ... ... ...
773 Pânico: A Série de TV Blindspots 1.0
774 Pânico: A Série de TV Endgame 1.0
1908 Dracula The Dark Compass 1.0
42 13 Reasons Why Senior Camping Trip 1.0
137 Dracula The Dark Compass 1.0

2597 rows × 3 columns

também podemos criar novas colunas utilizando outra notação, vamos assumir que queremos criar uma nova coluna no dataframe imdb_summary :

imdb_summary2['VotesPerEpisode']=imdb_summary2['UserVotes']/imdb_summary2['Episodes']

imdb_summary2.head(8)
series_name Episodes UserRating UserVotes VotesPerEpisode
0 13 Reasons Why 49 7.248980 151452 3090.857143
1 Arquivo X 356 7.988202 969628 2723.674157
2 Away 10 7.080000 5388 538.800000
3 Como Defender um Assassino 180 8.568889 228352 1268.622222
4 Dark 26 9.076923 264631 10178.115385
5 Dracula 6 6.966667 28858 4809.666667
6 Era Uma Vez 312 8.266026 397332 1273.500000
7 Friends 235 8.451915 853287 3631.008511

e agora vamos ordenar pela nova coluna VotesPerEpisode e arredondar para dois decimais:

imdb_summary2.sort_values('VotesPerEpisode',
                ascending=False).round(decimals=2)
series_name Episodes UserRating UserVotes VotesPerEpisode
21 Sherlock 15 8.80 373114 24874.27
24 Stranger Things 25 8.68 391288 15651.52
26 The Witcher 8 8.49 122656 15332.00
4 Dark 26 9.08 264631 10178.12
25 The Walking Dead 288 8.01 2394586 8314.53
11 La Casa de Papel 31 8.21 161782 5218.77
5 Dracula 6 6.97 28858 4809.67
14 Narcos 30 8.76 122160 4072.00
12 Lúcifer 75 8.74 298412 3978.83
7 Friends 235 8.45 853287 3631.01
0 13 Reasons Why 49 7.25 151452 3090.86
1 Arquivo X 356 7.99 969628 2723.67
23 Sobrenatural 211 8.39 499046 2365.15
15 O Bom Lugar 50 8.27 98562 1971.24
16 Orange Is the New Black 91 8.17 156565 1720.49
13 Motel Bates 50 8.59 78773 1575.46
6 Era Uma Vez 312 8.27 397332 1273.50
3 Como Defender um Assassino 180 8.57 228352 1268.62
17 Orphan Black 50 8.53 60532 1210.64
20 Ratched 8 7.81 8234 1029.25
18 Os Originais 184 8.83 179686 976.55
19 Pânico: A Série de TV 29 7.54 28078 968.21
9 Grimm: Contos de Terror 123 8.44 104967 853.39
2 Away 10 7.08 5388 538.80
22 Sleepy Hollow 62 7.68 33079 533.53
10 Hemlock Grove 33 7.49 12538 379.94
8 Greenleaf 60 7.59 2777 46.28

Juntando dois ou mais df

merge une dois DataFrames (df1 e df2) com base em colunas especificadas (on, left_on, right_on). O tipo de junção (how) define como as tabelas serão combinadas:

  • 'inner': Mantém apenas as linhas onde há correspondência nas colunas de junção (interseção).
  • 'left': Mantém todas as linhas do DataFrame esquerdo (df1) e as correspondentes do direito (df2).
  • 'right': Mantém todas as linhas do DataFrame direito (df2) e as correspondentes do esquerdo (df1).
  • 'outer': Mantém todas as linhas de ambos os DataFrames, preenchendo com NaN onde não há correspondência (união).

Use left_on e right_on quando as colunas de junção têm nomes diferentes nos DataFrames. Vamos a usar o pequeno df que foi criado no início deste tutorial para verificar o funcionamento do merge.

df
name age title
0 Alice 25 Sherlock
1 Bob 30 The Walking Dead
2 Charlie 35 Dark
3 Alice 25 Friends
4 David 40 Orange Is the New Black
5 Bob 30 The Walking Dead
6 Camille 20 Narcos

e vamos criar um pequeno df com dados do Netflix:

netflix_5 = {'season': ['Season 1', 'Season 2', 'Season 3', 'Season 4', 'Season 5', 'Part 1','Part 2'],
        'title': ['Sherlock', 'The Walking Dead', 'Dark', 'Friends', 'Orange Is the New Black', 'The Walking Dead','Narcos']}

netflix_5_df = pd.DataFrame(netflix_5)

netflix_5_df
season title
0 Season 1 Sherlock
1 Season 2 The Walking Dead
2 Season 3 Dark
3 Season 4 Friends
4 Season 5 Orange Is the New Black
5 Part 1 The Walking Dead
6 Part 2 Narcos

Vamos realizar o merge:

netflix_5_df.merge(df, on='title', how='left')
season title name age
0 Season 1 Sherlock Alice 25
1 Season 2 The Walking Dead Bob 30
2 Season 2 The Walking Dead Bob 30
3 Season 3 Dark Charlie 35
4 Season 4 Friends Alice 25
5 Season 5 Orange Is the New Black David 40
6 Part 1 The Walking Dead Bob 30
7 Part 1 The Walking Dead Bob 30
8 Part 2 Narcos Camille 20