Из-за того, как устроена таблица, она не так проста, как pd.read_html()
, в то время как это начало, вам нужно будет сделать некоторые манипуляции, чтобы получить желаемый формат:
import pandas as pd
link = "https://en.wikipedia.org/wiki/2018_in_film"
tables = pd.read_html(link,header=0)[5]
# find na values and shift cells right
i = 0
while i < 2:
row_shift = tables[tables['Unnamed: 7'].isnull()].index
tables.iloc[row_shift,:] = tables.iloc[row_shift,:].shift(1,axis=1)
i+=1
# create new column names
tables.columns = ['Month', 'Day', 'Title', 'Studio', 'Cast and crew', 'Genre', 'Country', 'Ref.']
# forward fill values
tables['Month'] = tables['Month'].ffill()
tables['Day'] = tables['Day'].ffill()
out:
Month Day Title Studio Cast and crew Genre Country Ref.
0 JANUARY 5 Insidious: The Last Key Universal Pictures / Blumhouse Productions Adam Robitel (director); Leigh Whannell (scree... Horror, Thriller US [33]
1 JANUARY 5 The Strange Ones Vertical Entertainment Lauren Wolkstein (director); Christopher Radcl... Drama US [34]
2 JANUARY 5 Stratton Momentum Pictures Simon West (director); Duncan Falconer, Warren... Action, Thriller IT, UK [35]
3 JANUARY 10 Sweet Country Samuel Goldwyn Films Warwick Thornton (director); David Tranter, St... Drama AUS [36]
4 JANUARY 12 The Commuter Lionsgate / StudioCanal / The Picture Company Jaume Collet-Serra (director); Byron Willinger... Action, Crime, Drama, Mystery, Thriller US, UK [37]
5 JANUARY 12 Proud Mary Screen Gems Babak Najafi (director); John S. Newman, Chris... Action, Thriller US [38]
6 JANUARY 12 Acts of Violence Lionsgate Premiere Brett Donowho (director); Nicolas Aaron Mezzan... Action, Thriller US [39]
...