Question

I have this list of hierarchical URLs:

data = ["https://python-rq.org/","https://python-rq.org/a","https://python-rq.org/a/b","https://python-rq.org/c"]

I want to dynamically create a nested dictionary for every URL for which another URL exists that is a subdomain / subfolder of it.

I have already tried the following, but it does not return what I expect:

import re

result = []
for key,d in enumerate(data):
    form_dict = {}
    r_pattern = re.search(r"(http(s)?://(.*?)/)(.*)",d)
    r = r_pattern.group(4)
    if r == "":
        parent_url = r_pattern.group(3)
    else:
        parent_url = r_pattern.group(3) + "/"+r
    print(parent_url)
    temp_list = data.copy()
    temp_list.pop(key)
    form_dict["name"] = parent_url
    form_dict["children"] = []
    for t in temp_list:
        child_dict = {} 
        if parent_url in t:
            child_dict["name"] = t
            form_dict["children"].append(child_dict.copy())
    result.append(form_dict)

This is the expected output:

{
   "name":"https://python-rq.org/",
   "children":[
      {
         "name":"https://python-rq.org/a",
         "children":[
            {
               "name":"https://python-rq.org/a/b",
               "children":[

               ]
            }
         ]
      },
      {
         "name":"https://python-rq.org/c",
         "children":[

         ]
      }
   ]
}

Any advice?

Akaisteph7 · Answer 1 · July 9, 2019

This was a nice problem. I tried to continue with your regex approach but got stuck, and found that split is actually a good fit for this case. The following works:

data = ["https://python-rq.org/","https://python-rq.org/a","https://python-rq.org/a/b","https://python-rq.org/c"]
# Strip the trailing "/" from any URL that ends with one. This makes it a lot
# easier to match the URLs, and the slash is not needed for a correct link.
data = [x[:-1] if x.endswith("/") else x for x in data]
print(data)

result = []

# To find a matching parent
def find_match(d, res):
    for t in res:
        if d == t["name"]:
            return t
        elif ( len(t["children"])>0 ):
            temp = find_match(d, t["children"])
            if (temp):
                return temp 
    return None

while len(data) > 0:
    d = data[0]
    form_dict = {}
    l = d.split("/")
    # I removed regex as matching the last parentheses wasn't working out 
    # split does just what you need however
    parent = "/".join(l[:-1])
    data.pop(0)
    form_dict["name"] = d
    form_dict["children"] = []
    option = find_match(parent, result)
    if (option):
        option["children"].append(form_dict)
    else:
        result.append(form_dict)

print(result)

[{'name': 'https://python-rq.org', 'children': [{'name': 'https://python-rq.org/a', 'children': [{'name': 'https://python-rq.org/a/b', 'children': []}]}, {'name': 'https://python-rq.org/c', 'children': []}]}]
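
As a possible variation on the code above (a sketch, not part of the original answer): instead of searching the tree recursively for every URL, you could keep a dictionary that maps each URL to its node, so the parent is found with a single lookup. The function name `build_tree` and the `nodes` index are my own illustrative names. The sketch assumes that the URLs contain no duplicates and that a parent is always exactly one path segment shorter than its child; a URL whose parent is missing from the list becomes a root.

```python
def build_tree(urls):
    """Nest each URL under its parent path and return the list of root nodes."""
    nodes = {}   # URL without trailing "/" -> node dict, for direct parent lookup
    roots = []
    # A parent is always shorter than its children, so sorting by length
    # guarantees that every parent is processed before its children.
    for url in sorted(urls, key=lambda u: len(u.rstrip("/"))):
        key = url.rstrip("/")
        node = {"name": url, "children": []}   # keep the original spelling as the name
        nodes[key] = node
        parent_key = key.rsplit("/", 1)[0]     # drop the last path segment
        parent = nodes.get(parent_key)
        if parent is not None:
            parent["children"].append(node)
        else:
            roots.append(node)                 # no known parent: treat as a root
    return roots


data = ["https://python-rq.org/", "https://python-rq.org/a",
        "https://python-rq.org/a/b", "https://python-rq.org/c"]
tree = build_tree(data)
print(tree[0])
```

For the list from the question this yields a single root named "https://python-rq.org/" (trailing slash preserved, as in the expected output), so `tree[0]` is the dictionary the question asks for.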

How do I make a nested dictionary from a list of URLs?
