Косинус Сходство строк на основе выбранных панд столбцов - PullRequest
1 голос
/ 23 сентября 2019

Я пытаюсь рекомендовать фильм по предметам.В наборе данных фильмов у меня есть метаданные о фильмах, таких как название, жанры, режиссеры, актеры, продюсеры, писатели, year_of_release и т. Д. В настоящее время я вычисляю сходство, основываясь на столбце жанра, только с помощью векторизатора tf-idf, разделив столбец жанра насписок, и он работает совершенно нормально.Вот код, который я использую:

def vector_cosine(df, index):
tf = TfidfVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
tfidf_matrix = tf.fit_transform(movies['genres'])
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
return cosine_sim[index]

# Function that get movie recommendations based on the cosine similarity score of movie genres 
def genre_recommendations(title, idx):
sim_scores = list(enumerate(vector_cosine(movies,idx)))
sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
sim_scores = sim_scores[1:6]
movie_indices = [i[0] for i in sim_scores]
recommendations = list(titles.iloc[movie_indices])
if title in recommendations: recommendations.remove(title)
return recommendations

, и это пример данных:

{"movie_id":"217d1207-effc-4605-bdef-899339615fe6","title":"Mamma Mia! Here We Go Again","release_date":"2018","actor_names":["Amanda Seyfried","Andy Garc\u00eda","Cher","Christine Baranski","Colin Firth","Dominic Cooper","Jessica Keenan Wynn","Julie Walters","Lily James","Meryl Streep","Pierce Brosnan","Stellan Skarsg\u00e5rd"],"director_names":["Ol Parker"],"producer_names":["Gary Goetzman","Judy Craymer"],"genres":"['Comedy', 'Music', 'Romance']","rating_aus":"PG","rating_nzl":"PG","rating_usa":null} {"movie_id":"59a0a1cd-5b83-401d-9b30-caa2c7e4f46c","title":"The Spy Who Dumped Me","release_date":"2018","actor_names":["Carolyn Pickles","Fred Melamed","Gillian Anderson","Hasan Minhaj","Ivanna Sakhno","James Fleet","Jane Curtin","Justin Theroux","Kate McKinnon","Mila Kunis","Paul Reiser","Sam Heughan"],"director_names":["Susanna Fogel"],"producer_names":null,"genres":"['Action', 'Comedy']","rating_aus":null,"rating_nzl":"R16","rating_usa":null} {"movie_id":"35e19192-6fb0-4c2b-9591-0d70deb30db6","title":"Beirut","release_date":"2018","actor_names":["Alon Aboutboul","Dean Norris","Douglas Hodge","Jon Hamm","Jonny Coyne","Kate Fleetwood","Larry Pine","Le\u00efla Bekhti","Mark Pellegrino","Rosamund Pike","Shea Whigham","Sonia Okacha"],"director_names":["Brad Anderson"],"producer_names":["Mike Weber","Monica Levinson","Shivani Rawat","Ted Field","Tony Gilroy"],"genres":"['Action', 'Thriller']","rating_aus":"MA15+","rating_nzl":"M","rating_usa":null} {"movie_id":"6e3c57d8-51c9-4b20-afe7-e09d949b2d7f","title":"Mile 22","release_date":"2018","actor_names":["Alexandra Vino","Iko Uwais","John Malkovich","Lauren Cohan","Lauren Mary Kim","Mark Wahlberg","Nikolai Nikolaeff","Poorna Jagannathan","Ronda Jean Rousey","Sala Baker","Sam Medina","Terry Kinney"],"director_names":["Peter Berg"],"producer_names":["Mark Wahlberg","Peter Berg","Stephen Levinson"],"genres":"['Action']","rating_aus":"MA15+","rating_nzl":"R16","rating_usa":null} {"movie_id":"cde85555-c944-44ef-a3b1-a5bf842646ad","title":"The Meg","release_date":"2018","actor_names":["Cliff Curtis","James Gaylyn","Jason Statham","Jessica McNamee","Li Bingbing","Masi Oka","Page Kennedy","Rainn Wilson","Robert Taylor","Ruby Rose","Tawanda Manyimo","Winston Chao"],"director_names":["Jon Turteltaub"],"producer_names":["Belle Avery","Colin Wilson","Lorenzo di Bonaventura"],"genres":"['Action', 'Science Fiction', 'Thriller']","rating_aus":"M","rating_nzl":"M","rating_usa":null} {"movie_id":"d1496722-2713-4b29-a323-18a0e2a0d6f3","title":"Monster's Ball","release_date":"2001","actor_names":["Amber Rules","Billy Bob Thornton","Charles Cowan Jr.","Coronji Calhoun","Gabrielle Witcher","Halle Berry","Heath Ledger","Peter Boyle","Sean Combs","Taylor LaGrange","Taylor Simpson","Yasiin Bey"],"director_names":["Marc Forster"],"producer_names":["Lee Daniels"],"genres":"['Drama', 'Romance']","rating_aus":null,"rating_nzl":"R16","rating_usa":null} {"movie_id":"844b309e-34ef-47bf-b981-eadc1d915886","title":"How to Be a Latin Lover","release_date":"2017","actor_names":["Anne McDaniels","Eugenio Derbez","Kristen Bell","Mckenna Grace","Michael Cera","Michaela Watkins","Raquel Welch","Rob Corddry","Rob Huebel","Rob Lowe","Rob Riggle","Salma Hayek"],"director_names":["Ken Marino"],"producer_names":null,"genres":"['Comedy']","rating_aus":"M","rating_nzl":"R13","rating_usa":null} {"movie_id":"991e3711-9918-41a0-b660-3c53ffa4901c","title":"Good Fortune","release_date":"2016","actor_names":null,"director_names":["Joshua Tickell","Rebecca Harrell Tickell"],"producer_names":null,"genres":"['Documentary']","rating_aus":"PG","rating_nzl":null,"rating_usa":null} {"movie_id":"7bf1935e-7d1a-437c-a71a-b4c70eb4f853","title":"Paper Heart","release_date":"2009","actor_names":["Charlyne Yi","Gill Summers","Jake Johnson","Martin Starr","Michael Cera","Seth Rogen"],"director_names":["Nicholas Jasenovec"],"producer_names":null,"genres":"['Comedy', 'Drama', 'Romance']","rating_aus":"M","rating_nzl":"M","rating_usa":null} {"movie_id":"4cd0e423-acde-4bf7-bf93-f77777a4de6f","title":"Daybreakers","release_date":"2009","actor_names":["Claudia Karvan","Emma Randall","Ethan Hawke","Harriet Minto","Day","Isabel Lucas","Jay Laga'aia","Michael Dorman","Mungo McKay","Sam Neill","Tiffany Lamb","Vince Colosimo","Willem Dafoe"],"director_names":["Michael Spierig","Peter Spierig"],"producer_names":["Bryan Furst","Chris Brown","Sean Furst","Todd Fellman"],"genres":"['Action', 'Fantasy', 'Horror', 'Science Fiction']","rating_aus":"MA15+","rating_nzl":"R16","rating_usa":null} {"movie_id":"c0a84525-46a4-4977-86af-7fc8cb014683","title":"Requiem for a Dream","release_date":"2000","actor_names":["Charlotte Aronofsky","Christopher McDonald","Ellen Burstyn","Janet Sarno","Jared Leto","Jennifer Connelly","Joanne Gordon","Louise Lasser","Marcia Jean Kurtz","Mark Margolis","Marlon Wayans","Suzanne Shepherd"],"director_names":["Darren Aronofsky"],"producer_names":["Eric Watson","Palmer West"],"genres":"['Crime', 'Drama']","rating_aus":null,"rating_nzl":"R18","rating_usa":null} {"movie_id":"fe5367fe-b558-4fbe-872f-ab041ef58213","title":"Grizzly Man","release_date":"2005","actor_names":["David Letterman","Jewel Palovak","Kathleen Parker","Sam Egli","Timothy Treadwell","Warren Queeney","Werner Herzog","Willy Fulton"],"director_names":["Werner Herzog"],"producer_names":["Erik Nelson"],"genres":"['Documentary']","rating_aus":"M","rating_nzl":null,"rating_usa":null} {"movie_id":"6bd66f6b-2834-4013-884f-7eaf257e09fb","title":"The Great Buck Howard","release_date":"2008","actor_names":["Adam Scott","Colin Hanks","Debra Monk","Emily Blunt","Griffin Dunne","John Malkovich","Jonathan Ames","Patrick Fischler","Ricky Jay","Steve Zahn","Tom Hanks","Wallace Langham"],"director_names":["Sean McGinly"],"producer_names":["Gary Goetzman","Tom Hanks"],"genres":"['Comedy', 'Drama']","rating_aus":"G","rating_nzl":"G","rating_usa":null} {"movie_id":"78b22fc2-5069-40da-b4ac-790ec3902a32","title":"Boo! A Madea Halloween","release_date":"2016","actor_names":["Andre Hall","Bella Thorne","Brock O'Hurn","Cassi Davis","Diamond White","Jimmy Tatro","Kian Lawley","Lexy Panterra","Liza Koshy","Patrice Lovely","Tyler Perry","Yousef Erakat"],"director_names":["Tyler Perry"],"producer_names":null,"genres":"['Comedy', 'Drama', 'Horror']","rating_aus":"M","rating_nzl":null,"rating_usa":null} {"movie_id":"bc5c3635-bbfb-4dd7-b0ed-408787fd5f43","title":"Fantastic Beasts and Where to Find Them","release_date":"2016","actor_names":["Alison Sudol","Carmen Ejogo","Colin Farrell","Dan Fogler","Eddie Redmayne","Ezra Miller","Johnny Depp","Jon Voight","Katherine Waterston","Ron Perlman","Samantha Morton","Zo\u00eb Kravitz"],"director_names":["David Yates"],"producer_names":["David Heyman","J.K. Rowling","Lionel Wigram","Steve Kloves"],"genres":"['Adventure', 'Family', 'Fantasy']","rating_aus":"M","rating_nzl":"M","rating_usa":null} {"movie_id":"1ec3a043-a6c4-44a0-9fb1-c948eb07cf85","title":"Silver Linings Playbook","release_date":"2012","actor_names":["Anupam Kher","Bonnie Aarons","Bradley Cooper","Brea Bee","Chris Tucker","Dash Mihok","Jacki Weaver","Jennifer Lawrence","John Ortiz","Julia Stiles","Robert De Niro","Shea Whigham"],"director_names":["David O. Russell"],"producer_names":["Bruce Cohen","Donna Gigliotti","Jonathan Gordon","Mark Kamine"],"genres":"['Comedy', 'Drama', 'Romance']","rating_aus":"M","rating_nzl":"M","rating_usa":null} {"movie_id":"4c0bbde5-7a34-4556-a481-3357ef69b651","title":"The Equalizer","release_date":"2014","actor_names":["Alex Veadov","Bill Pullman","Chlo\u00eb Grace Moretz","David Harbour","David Meunier","Denzel Washington","E. Roger Mitchell","Haley Bennett","Johnny Skourtis","Marton Csokas","Melissa Leo","Vladimir Kulich"],"director_names":["Antoine Fuqua"],"producer_names":["Alex Siskin","Denzel Washington","Jason Blumenthal","Mace Neufeld","Michael Sloan","Richard Wenk","Steve Tisch","Todd Black","Tony Eldridge"],"genres":"['Action', 'Crime', 'Thriller']","rating_aus":"MA15+","rating_nzl":"R18","rating_usa":null} {"movie_id":"809f3131-7445-4ffb-8b77-6de6699c85c4","title":"The Notebook","release_date":"2004","actor_names":["David Thornton","Ed Grady","Gena Rowlands","James Garner","James Marsden","Jennifer Echols","Joan Allen","Kevin Connolly","Rachel McAdams","Ryan Gosling","Sam Shepard","Starletta DuPois"],"director_names":["Nick Cassavetes"],"producer_names":["Lynn Harris","Mark Johnson"],"genres":"['Drama', 'Romance']","rating_aus":"PG","rating_nzl":"PG","rating_usa":null} {"movie_id":"49652b6d-b818-4fc4-9c57-9ec7b5c346cc","title":"The Matrix","release_date":"1999","actor_names":["Anthony Ray Parker","Belinda McClory","Carrie","Anne Moss","Gloria Foster","Hugo Weaving","Joe Pantoliano","Julian Arahanga","Keanu Reeves","Laurence Fishburne","Marcus Chong","Paul Goddard","Robert Taylor"],"director_names":["Lana Wachowski","Lilly Wachowski"],"producer_names":["Joel Silver"],"genres":"['Action', 'Science Fiction']","rating_aus":"M","rating_nzl":"M","rating_usa":null} {"movie_id":"d356a087-4e89-420b-867c-618544969302","title":"The Hunger Games","release_date":"2012","actor_names":["Alexander Ludwig","Donald Sutherland","Elizabeth Banks","Isabelle Fuhrman","Jennifer Lawrence","Josh Hutcherson","Lenny Kravitz","Liam Hemsworth","Stanley Tucci","Toby Jones","Wes Bentley","Woody Harrelson"],"director_names":["Gary Ross"],"producer_names":["Jon Kilik","Nina Jacobson"],"genres":"['Adventure', 'Fantasy', 'Science Fiction']","rating_aus":"M","rating_nzl":"M","rating_usa":null} {"movie_id":"0ef84c8c-ffc5-4896-9e3b-5303acba0ff3","title":"The Wolf of Wall Street","release_date":"2013","actor_names":["Brian Sacca","Henry Zebrowski","Jon Bernthal","Jon Favreau","Jonah Hill","Kenneth Choi","Kyle Chandler","Leonardo DiCaprio","Margot Robbie","Matthew McConaughey","P. J. Byrne","Rob Reiner"],"director_names":["Martin Scorsese"],"producer_names":["Emma Tillinger Koskoff","Joey McFarland","Leonardo DiCaprio","Martin Scorsese","Riza Aziz"],"genres":"['Comedy', 'Crime', 'Drama']","rating_aus":"R18+","rating_nzl":"R18","rating_usa":null}

Далее я хочу вычислить сходство на основе нескольких столбцов.Подскажите, пожалуйста, как мне этого добиться.

1 Ответ

1 голос
/ 23 сентября 2019

Вы можете использовать cosine_similarity из sklearn.metrics.pairwise, например:

sim_df = cosine_similarity(df)

Затем вы можете получить 10 лучших похожих фильмов для определенного фильма, например:

sim_df[movie_index].argsort()[-10:][::-1]
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...