I was trying to learn the SQL OVER (PARTITION BY ...) clause
using sqlite3 and pandas.
Here are the database and the query:
import sqlite3
import numpy as np
import pandas as pd
# create the database file if it does not exist
con = sqlite3.connect("student.db")
data = [('Jolly', 'Female', 20, 500),
        ('Jon', 'Male', 22, 545),
        ('Sara', 'Female', 25, 600),
        ('Laura', 'Female', 18, 400),
        ('Alan', 'Male', 20, 500),
        ('Kate', 'Female', 22, 500),
        ('Joseph', 'Male', 18, 643),
        ('Mice', 'Male', 23, 543),
        ('Wise', 'Male', 21, 499),
        ('Elis', 'Female', 27, 400)]
df = pd.DataFrame(data,columns=['name','gender','age','total_score'])
df['id'] = range(len(df))
df_copy = df.copy()
# load df to database
df.to_sql("student", con, if_exists='replace',index=False)
# join
# This method is memory inefficient
q = """
SELECT id, name,
       Aggregation.gender, Aggregation.Total_students,
       Aggregation.Average_Age, Aggregation.Total_Score
FROM student
INNER JOIN
    (SELECT gender,
            COUNT(gender) AS Total_students,
            AVG(age) AS Average_Age,
            SUM(total_score) AS Total_Score
     FROM student
     GROUP BY gender) AS Aggregation
ON Aggregation.gender = student.gender
"""
display(pd.read_sql_query(q,con))
I get output like this:
Now, when I try to run the same query using OVER (PARTITION BY ...), I get OperationalError: near "(": syntax error.
The code that produces the error is as follows:
# over partition
q3 = """ SELECT id, name, gender,
COUNT(gender) OVER (PARTITION BY gender) AS Total_students,
AVG(age) OVER (PARTITION BY gender) AS Average_Age,
SUM(total_score) OVER (PARTITION BY gender) AS Total_Score
FROM student
"""
pd.read_sql_query(q3,con)
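Window functions were added to SQLite in version 3.25.0, so one thing worth checking is which SQLite library Python's sqlite3 module is linked against. The sketch below is only a diagnostic under the assumption that an older bundled SQLite is the cause, which the error message alone does not confirm. The helper name `window_functions_available` is made up for illustration.

```python
import sqlite3

# Version of the underlying SQLite C library (not of the Python module).
print("SQLite library version:", sqlite3.sqlite_version)


def window_functions_available() -> bool:
    """Hypothetical helper: run a tiny window query on an in-memory database."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "SELECT COUNT(*) OVER (PARTITION BY x) FROM (SELECT 1 AS x)"
        ).fetchall()
        return True
    except sqlite3.OperationalError:
        # Older SQLite builds reject the OVER clause with a syntax error.
        return False
    finally:
        con.close()


print("Window functions available:", window_functions_available())
```

If this prints a version below 3.25.0, the OVER clause cannot be parsed by that build, regardless of how the query is written.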
How can I solve this problem?
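As a possible workaround that does not depend on the SQLite version, the same per-gender columns can be computed directly in pandas with `groupby(...).transform(...)`, which broadcasts each group's aggregate back onto every row, much like a window function does. This is a sketch using the data from the question, not a replacement for the SQL query. The name `result` is introduced here for illustration.

```python
import pandas as pd

data = [('Jolly', 'Female', 20, 500),
        ('Jon', 'Male', 22, 545),
        ('Sara', 'Female', 25, 600),
        ('Laura', 'Female', 18, 400),
        ('Alan', 'Male', 20, 500),
        ('Kate', 'Female', 22, 500),
        ('Joseph', 'Male', 18, 643),
        ('Mice', 'Male', 23, 543),
        ('Wise', 'Male', 21, 499),
        ('Elis', 'Female', 27, 400)]
df = pd.DataFrame(data, columns=['name', 'gender', 'age', 'total_score'])
df['id'] = range(len(df))

grouped = df.groupby('gender')

# Start from the plain columns, then attach one aggregate per row.
result = df[['id', 'name', 'gender']].copy()
# Counts non-null ids per gender; equivalent to COUNT(gender) here
# because this data has no missing values.
result['Total_students'] = grouped['id'].transform('count')
result['Average_Age'] = grouped['age'].transform('mean')
result['Total_Score'] = grouped['total_score'].transform('sum')

print(result)
```

Unlike the INNER JOIN version, this keeps the rows in their original order and needs no intermediate aggregated table.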