Writing data to JSON using Python - PullRequest
0 votes
/ October 16, 2019

I am trying to save my data to a JSON file using Python; my code is below. I can scrape the data, but I cannot save it to a JSON file. Can someone tell me where the problem is? I have not done anything like this before, and although I searched a lot, I could not find an exact solution.

Here is my code

from urllib.request import urlopen
from bs4 import BeautifulSoup
import json

for page in range(1,2):
    url = "https://stackoverflow.com/questions?tab=unanswered&page={}".format(page)
    html = urlopen(url)
    soup = BeautifulSoup(html,"html.parser")
    Container = soup.find_all("div", {"class":"question-summary"})
    for i in Container:
        try:
            title = i.find("a", {"class":"question-hyperlink"}).get_text()
            det = i.find("div", {"class":"excerpt"}).get_text()
            tags = i.find("div",{"class":"tags"}).get_text()
            votes = i.find("div",{"class":"votes"}).get_text()
            ans = i.find("div",{"class":"status"}).get_text()
            views = i.find("div",{"class":"views"}).get_text()
            time = i.find("span",{"class":"relativetime"}).get_text()
            print(title, det, tags, votes, ans, views, time )
        except: AttributeError
        ## the problem starts from here.
def questions(f):
    job_dict = {}
    job_dict['Title'] = title
    job_dict['Description'] = det
    job_dict['Tags'] = tags
    job_dict['Votes'] = votes
    job_dict['Answers'] = ans
    job_dict['Views'] = views
    job_dict['Time'] = time

    json_job = json.dumps(job_dict)
    f.seek(0)
    txt = f.readline()
    if txt.endswith("}"):
        f.write(",")
    f.write(json_job)

Answers [ 2 ]

0 votes
/ October 16, 2019

Before the loop, create a list all_jobs to hold all the data.

Inside try, create a dictionary with the job's data and append it to the all_jobs list.

After the loop, you can write everything at once.

If you try to write each job separately, you can easily end up with an invalid JSON file, because the file needs [ at the beginning and ] at the end, and your code never writes them.
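A minimal sketch of why that matters, using two made-up job dicts instead of scraped data: joining separate json.dumps() results with commas, which is roughly what questions(f) in the question tries to do, does not produce a parseable document, while dumping one list does.

```python
import json

# Hypothetical sample data standing in for scraped questions.
jobs = [
    {"Title": "First question", "Votes": "0"},
    {"Title": "Second question", "Votes": "1"},
]

# Each object written separately and joined by commas, without [ and ]:
piecewise = ",".join(json.dumps(job) for job in jobs)

try:
    json.loads(piecewise)
    piecewise_valid = True
except json.JSONDecodeError as ex:
    # The parser stops after the first object and reports "Extra data".
    print("Invalid JSON:", ex)
    piecewise_valid = False

# Collecting everything in one list and dumping it once gives a valid array.
whole = json.dumps(jobs)
print(piecewise_valid)
print(whole)
```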

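Also, except: AttributeError does not do what it looks like: it is a bare except that catches every exception, and its body is just the name AttributeError, so every error is silently ignored. Write except AttributeError: and put some code inside it, at least pass, but it is better to print a message saying that a problem occurred. If you only use pass, you will never know that you got an error, and sometimes that error is exactly what explains why the code produces no results.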
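A small sketch of the difference. FakeTag is a hypothetical stand-in for a BeautifulSoup tag so the example runs without bs4; the point it shows is that find() returns None when nothing matches, and calling .get_text() on None raises the AttributeError you want to see rather than hide.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag (only get_text is mimicked)."""

    def __init__(self, text):
        self.text = text

    def get_text(self, strip=False):
        return self.text.strip() if strip else self.text


def bare_except_demo():
    try:
        1 / 0  # raises ZeroDivisionError, not AttributeError
    except: AttributeError  # the question's line: a bare except whose body is just a name
    return "error silently ignored"


def safe_text(tag):
    """Return the tag's stripped text, or None (after printing a message) if the tag is missing."""
    try:
        return tag.get_text(strip=True)
    except AttributeError as ex:
        # find() returns None when nothing matches, and None has no get_text()
        print("Error:", ex)
        return None


print(bare_except_demo())               # the ZeroDivisionError vanished without a trace
print(safe_text(FakeTag("  Title  ")))  # Title
print(safe_text(None))                  # prints the error message, then None
```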


EDIT: By default it writes everything on one line, but that is still valid JSON and other tools can read it without problems. If you want the data in the file to be formatted, you can add indentation, e.g. json.dumps(all_jobs, indent=2).
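To illustrate with a hypothetical one-item list: both forms decode to the same data; indent only adds line breaks and spaces.

```python
import json

all_jobs = [{"Title": "Example", "Votes": "0"}]  # hypothetical sample

one_line = json.dumps(all_jobs)         # everything on a single line
pretty = json.dumps(all_jobs, indent=2)  # one key per line, indented by 2 spaces

print(one_line)
print(pretty)
```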

You can also clean the text before saving it: .get_text(strip=True)
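If you already have raw strings from plain .get_text(), like the det values full of \r\n and runs of spaces in the output of the last answer, a stdlib-only sketch of the clean-up could look like this. Note that get_text(strip=True) strips every text fragment inside the tag separately and joins them with no separator, so a tags div with several links comes out glued together (e.g. pythonjson); get_text(" ", strip=True) puts a space between the fragments instead.

```python
def clean(text):
    """Trim the text and collapse every whitespace run (spaces, tabs, newlines) to one space."""
    return " ".join(text.split())


# A raw value shaped like the "det" fields in the output of the last answer.
raw = "\r\n            I have set up JavaScript unit testing\n\nwith JS Test Driver ...\r\n        "

print(repr(raw.strip()))  # only the ends are trimmed, the inner blank line stays
print(repr(clean(raw)))   # all whitespace runs collapsed to single spaces
```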


from urllib.request import urlopen
from bs4 import BeautifulSoup
import json

all_jobs = []

for page in range(1, 2):
    url = "https://stackoverflow.com/questions?tab=unanswered&page={}".format(page)
    html = urlopen(url)
    soup = BeautifulSoup(html,"html.parser")
    Container = soup.find_all("div", {"class":"question-summary"})
    for i in Container:
        try:
            title = i.find("a", {"class":"question-hyperlink"}).get_text() # .get_text(strip=True)
            det = i.find("div", {"class":"excerpt"}).get_text()
            tags = i.find("div",{"class":"tags"}).get_text()
            votes = i.find("div",{"class":"votes"}).get_text()
            ans = i.find("div",{"class":"status"}).get_text()
            views = i.find("div",{"class":"views"}).get_text()
            time = i.find("span",{"class":"relativetime"}).get_text()

            print(title, det, tags, votes, ans, views, time )

            job_dict = {}
            job_dict['Title'] = title
            job_dict['Description'] = det
            job_dict['Tags'] = tags
            job_dict['Votes'] = votes
            job_dict['Answers'] = ans
            job_dict['Views'] = views
            job_dict['Time'] = time

            all_jobs.append(job_dict)

        except AttributeError as ex:
            print('Error:', ex)

# --- after loop ---

with open('output.json', 'w') as f:  # `with` closes the file automatically
    #f.write(json.dumps(all_jobs)) # all in one line
    f.write(json.dumps(all_jobs, indent=2))  # formatted, indented by 2 spaces
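A sketch of an alternative write step, assuming you want readable non-ASCII text in the file: json.dump() writes directly to the file object, and ensure_ascii=False keeps characters such as curly quotes as they are instead of the \u201c escapes visible in the output of the last answer. Reading the file back with json.load() is a quick validity check. The temporary directory is only there so the sketch does not overwrite a real output.json.

```python
import json
import os
import tempfile

# Hypothetical sample containing curly quotes (non-ASCII characters).
all_jobs = [{"Title": "How to use classes to \u201ccontrol dreams\u201d?", "Votes": "3"}]

path = os.path.join(tempfile.mkdtemp(), "output.json")

# ensure_ascii=False writes the characters themselves, so set the encoding explicitly.
with open(path, "w", encoding="utf-8") as f:
    json.dump(all_jobs, f, indent=2, ensure_ascii=False)

# Reading the file back confirms it is valid JSON with the same content.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == all_jobs)  # True
```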

EDIT: Importing directly into Elasticsearch with the elasticsearch module

from urllib.request import urlopen
from bs4 import BeautifulSoup
from elasticsearch import Elasticsearch

es = Elasticsearch()

for page in range(2):
    url = "https://stackoverflow.com/questions?tab=unanswered&page={}".format(page)
    html = urlopen(url)
    soup = BeautifulSoup(html,"html.parser")

    container = soup.find_all("div", {"class":"question-summary"})
    for item in container:
        try:
            job = {
                'Title': item.find("a", {"class":"question-hyperlink"}).get_text(strip=True),
                'Description': item.find("div", {"class":"excerpt"}).get_text(strip=True),
                'Tags': item.find("div",{"class":"tags"}).get_text(strip=True),
                'Votes': item.find("div",{"class":"votes"}).get_text(strip=True),
                'Answers': item.find("div",{"class":"status"}).get_text(strip=True),
                'Views': item.find("div",{"class":"views"}).get_text(strip=True),
                'Time': item.find("span",{"class":"relativetime"}).get_text(strip=True),
            }
        except AttributeError as ex:
            print('Error:', ex)
            continue

        # --- importing job to Elasticsearch ---

        res = es.index(index="stackoverflow", doc_type='job', body=job) # without `id` to autocreate `id` 
        print(res['result'])


# --- searching ---

es.indices.refresh(index="stackoverflow")  # make the newly indexed documents visible to search

res = es.search(index="stackoverflow", body={"query": {"match_all": {}}})
print("Got %d Hits:" % res['hits']['total']['value'])
for hit in res['hits']['hits']:
    #print(hit)
    print("%(Title)s: %(Tags)s" % hit["_source"])

0 votes
/ October 16, 2019

The code below works:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import json

data = []
for page in range(1, 2):
    url = "https://stackoverflow.com/questions?tab=unanswered&page={}".format(page)
    html = urlopen(url)
    soup = BeautifulSoup(html, "html.parser")
    Container = soup.find_all("div", {"class": "question-summary"})
    for i in Container:
        entry = {'title': i.find("a", {"class": "question-hyperlink"}).get_text(),
                 'det': i.find("div", {"class": "excerpt"}).get_text()}
        data.append(entry)
        # TODO Add more attributes


print(json.dumps(data))
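One way to fill in the TODO without repeating find() seven times is a table of field names, tag names, and CSS classes (copied from the question's code), with a None check so that a missing element does not crash the loop. FakeNode is a hypothetical stand-in for a bs4 Tag so the sketch runs on its own; with real data you would pass each item from soup.find_all(...) to build_entry() and append the result to data.

```python
# Field name -> (tag name, CSS class), taken from the question's find() calls.
FIELDS = {
    "title": ("a", "question-hyperlink"),
    "det": ("div", "excerpt"),
    "tags": ("div", "tags"),
    "votes": ("div", "votes"),
    "answers": ("div", "status"),
    "views": ("div", "views"),
    "time": ("span", "relativetime"),
}


def build_entry(item):
    """Build one dict per question; a missing element becomes None instead of raising."""
    entry = {}
    for key, (tag, css_class) in FIELDS.items():
        node = item.find(tag, {"class": css_class})
        entry[key] = node.get_text(strip=True) if node is not None else None
    return entry


class FakeNode:
    """Hypothetical stand-in for a bs4 Tag: only find() and get_text() are mimicked."""

    def __init__(self, text="", children=None):
        self.text = text
        self.children = children or {}

    def find(self, name, attrs):
        return self.children.get((name, attrs["class"]))

    def get_text(self, strip=False):
        return self.text.strip() if strip else self.text


item = FakeNode(children={
    ("a", "question-hyperlink"): FakeNode(" Example title "),
    ("div", "excerpt"): FakeNode("\r\n  Some excerpt ...\r\n "),
})
print(build_entry(item))
```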

Output

[{"title": "JsTestDriver on NetBeans stops testing after a failed assertion", "det": "\r\n            I have set up JavaScript unit testing with JS Test Driver on Netbeans as per this Link. However, unlike the results in that tutorial, no more tests are executed after an assertion fails. How can I ...\r\n        "}, {"title": "Receiving kAUGraphErr_CannotDoInCurrentContext when calling AUGraphStart for playback", "det": "\r\n            I'm working with AUGraph and Audio Units API to playback and record audio in my iOS app. Now I have a rare issue when an AUGraph is unable to start with the following error:\r\n  result = ...\r\n        "}, {"title": "SilverStripe PHP Forms - If I nest a SelectionGroup inside a FieldGroup, one of the related SelectionGroup_Items' Radio Box does not show up. Why?", "det": "\r\n            I have a form that has two FieldGroups, and in one of the FieldGroups I have a SelectionGroup.\n\nThe SelectionGroup_Items show up in the form FieldGroup but the radio boxes to select one of the options ...\r\n        "}, {"title": "Implementing a Neural Network in Haskell", "det": "\r\n            I'm trying to implement a neural network architecture in Haskell, and use it on MNIST.\n\nI'm using the hmatrix package for the linear algebra.\nMy training framework is built using the pipes package.\n\n...\r\n        "}, {"title": "dequeueBuffer: can't dequeue multiple buffers without setting the buffer count", "det": "\r\n            I'm getting the error below on Android 4.4.2 Moto X 2013 in a Rhomobile 5.0.2 WebView app. The app is compiled with SDK 19 and minAPI 17.\n\nAfter some research it seems that this is an issue with ...\r\n        "}, {"title": "How to read audio data from a 'MediaStream' object in a C++ addon", "det": "\r\n            After sweating blood and tears I've finally managed to set up a Node C++ addon and shove a web-platform standard MediaStream object into one of its C++ methods for good. 
For compatibility across ...\r\n        "}, {"title": "Akka finite state machine instances", "det": "\r\n            I am trying to leverage Akka's finite state machine framework for my use case. I am working on a system that processes a request that goes through various states.\n\nThe request here is the application ...\r\n        "}, {"title": "How to use classes to \u201ccontrol dreams\u201d?", "det": "\r\n            Background\n\nI've been playing around with Deep Dream and Inceptionism, using the Caffe framework to visualize layers of GoogLeNet, an architecture built for the Imagenet project, a large visual ...\r\n        "}, {"title": "iOS : Use of HKObserverQuery's background update completionHandler", "det": "\r\n            HKObserverQuery has the following method that supports receiving updates in the background:\n\n- initWithSampleType:predicate:updateHandler:\r\nThe updateHandler has a completionHandler which has the ...\r\n        "}, {"title": "Representing Parametric Survival Model in 'Counting Process' form in JAGS", "det": "\r\n            The Problem\n\nI am trying to build a survival-model in JAGS that allows for time-varying covariates. 
I'd like it to be a parametric model - for example, assuming survival follows the Weibull ...\r\n        "}, {"title": "Separate cookie jar per WebView in OS X", "det": "\r\n            I've been trying to achieve the goal of having a unique (not shared) cookie jar per WebView in macOS (cookies management works different for iOS).\n\nAfter reading a lot of StackOverflow questions and ...\r\n        "}, {"title": "Flexible Space in Android", "det": "\r\n            Using this tutorial to implement a Flexible Space pattern (the one with the collapsing toolbar).\n\nI'm trying to achieve a similar effect as in the Lollipop Contacts activity, which at the beginning ...\r\n        "}, {"title": "How do I upgrade to jlink (JDK 9+) from Java Web Start (JDK 8) for an auto-updating application?", "det": "\r\n            Java 8 and prior versions have Java Web Start, which auto-updates the application when we change it.  Oracle has recommended that users migrate to jlink, as that is the new Oracle technology.  So far, ...\r\n        "}, {"title": "Newly Published App reporting Version as \u201cUnknown\u201d in iTunes Connect", "det": "\r\n            New version of my app is 1.2. But in \"Sales and Trends\" in iTunes Connect I see \"unknown\" app version. Also new reviews not showing in App Store in \"Current version\" tab (only in all versions tab).\n...\r\n        "}, {"title": "VerifyError: Uninitialized object exists on backward branch / JVM Spec 4.10.2.4", "det": "\r\n            The JVM Spec 4.10.2.4 version 7, last paragraph, says\r\n  A valid instruction sequence must not have an uninitialized object on the operand stack or in a local variable at the target of a backwards ...\r\n        "}, {"title": "Pandas read_xml() method test strategies", "det": "\r\n            Interestingly, pandas I/O tools does not maintain a read_xml() method and the counterpart to_xml(). 
However, read_json proves tree-like structures can be implemented for dataframe import and read_html ...\r\n        "}, {"title": "Visual bug in Safari using jQuery Mobile - Content duplication", "det": "\r\n            I'm building a mobile app using jQuery Mobile 1.3.0, EaselJs 0.6.0 and TweenJs 0.4.0.\n\nSo, when I load the page, some content gets visually duplicated. The DIVs are not really duplicated, it is just ...\r\n        "}, {"title": "Saving child collections with OrmLite on Android with objects created from Jackson", "det": "\r\n            I have a REST service which I'm calling from my app, which pulls in a JSON object as a byte[] that is then turned into a nice nested collection of objects -- all of that bit works fine. What I then ...\r\n        "}, {"title": "Transitions with GStreamer Editing Services freezes, but works OK without transitions", "det": "\r\n            I'm trying to use gstreamer's GStreamer Editing Services to concatenate 2 videos, and to have a transition between the two.\n\nThis command, which just joins 2 segments of the videos together without a ...\r\n        "}, {"title": "Cannot log-in to rstudio-server", "det": "\r\n            I have previously successfully installed rstudio-server with brew install rstudio-server on a Mac OS X 10.11.4.\n\nNow, I am trying to login to rstudio-server 0.99.902 without success. From the client ...\r\n        "}, {"title": "How to implement Isotope with Pagination", "det": "\r\n            I am trying to implement isotope with pagination on my WordPress site (which obviously is a problem for most people). I've come up with a scenario which may work if I can figure a few things out.\n\nOn ...\r\n        "}, {"title": "Input range slider not working on iOS Safari when clicking on track", "det": "\r\n            I have a pretty straight-forward range input slider.  It works pretty well on all browsers except on iOS Safari.  
\n\nThe main problem I have is when I click on the slider track, it doesn't move the ...\r\n        "}, {"title": "Could not load IOSurface for time string. Rendering locally instead swift 4", "det": "\r\n            Could you help me with this problem when I running my project : \r\n  Could not load IOSurface for time string. Rendering locally instead\r\nI don't know what is going on with my codding ..... pleas help ....\r\n        "}, {"title": "Creating multiple aliases for the same QueryDSL path in Spring Data", "det": "\r\n            I have a generic Spring Data repository interface that extends QuerydslBinderCustomizer, allowing me to customize the query execution.  I am trying to extend the basic equality testing built into the ...\r\n        "}, {"title": "React Native WebView html <select> not opening options on Android tablets", "det": "\r\n            I am experiencing a very strange problem in React Native's WebView with HTML <select> tags on Android tablets.\n\nFor some reason, tapping on the rendered <select> button does not open the ...\r\n        "}, {"title": "In Xamarin.Forms Device.BeginInvokeOnMainThread() doesn\u2019t show message box from notification callback *only* in Release config on physical device", "det": "\r\n            I'm rewriting my existing (swift) iOS physical therapy app \"On My Nerves\" to Xamarin.Forms. It's a timer app to help people with nerve damage (like me!) do their desensitization exercises. 
You have ...\r\n        "}, {"title": "iOS 11: ATS (App Transport Security) no longer accepts custom anchor certs?", "det": "\r\n            I am leasing a self signed certificate using NSMutableURLRequest and when the certificate is anchored using a custom certificate with SecTrustSetAnchorCertificates IOS 11 fails with the following ...\r\n        "}, {"title": "What is an appropriate type for smart contracts?", "det": "\r\n            I'm wondering what is the best way to express smart contracts in typed languages such as Haskell or Idris (so you could, for example, compile it to run on the Ethereum network). My main concern is: ...\r\n        "}, {"title": "USB bulkTransfer between Android tablet and camera", "det": "\r\n            I would like to exchange data/commands between a camera and an Android tablet device using the bulkTransfer function. I wrote this Activity, but the method bulkTransfer returns -1 (error status). Why ...\r\n        "}, {"title": "ember-cli-code-coverage mocha showing 0% coverage when there are tests", "det": "\r\n            I'm using ember-cli-code-coverage with ember-cli-mocha. When I run COVERAGE=true ember test I'm getting 0% coverage for statements, functions, and lines. Yet, I have tests that are covering those ...\r\n        "}, {"title": "SNIReadSyncOverAsync Performance issue", "det": "\r\n            Recently I used dot Trace profiler to find the bottle necks in my application.\n\nSuddenly I have seen that in most of the places which is taking more time and more cpu usage too is ...\r\n        "}, {"title": "IOS: Text Selection in WKWebView (WKSelectionGranularityCharacter)", "det": "\r\n            I've got an app that uses a web view where text can be selected. It's long been an annoyance that you can't select text across a block boundary in UIWebView.  
WKWebView seems to fix this with a ...\r\n        "}, {"title": "Creating a shadow copy using the \u201cBackup\u201d context in a PowerShell", "det": "\r\n            I am in the process of writing a PowerShell script for backing up a windows computer using rsync. To this end, I am attempting to use WMI from said script to create a non-persistent Shadow copy with ...\r\n        "}, {"title": "`std::variant` vs. inheritance vs. other ways (performance)", "det": "\r\n            I'm wondering about std::variant performance. When should I not use it? It seems like virtual functions are still much better than using std::visit which surprised me!\n\nIn \"A Tour of C++\" Bjarne ...\r\n        "}, {"title": "Resources, scopes, permissions and policies in keycloak", "det": "\r\n            I want to create a fairly simple role-based access control system using Keycloak's authorizaion system. The system Keycloak is replacing allows us to create a \"user\", who is a member of one or more \"...\r\n        "}, {"title": "iOS Internal testing - Unable to download crash information?", "det": "\r\n            I have recently uploaded my app to the App Store for internal testing (TestFlight, iOS 8). I am currently the only tester. 
When I test using TestFlight, my app crashes; however, the same operation ...\r\n        "}, {"title": "Traversing lists and streams with a function returning a future", "det": "\r\n            Introduction\n\nScala's Future (new in 2.10 and now 2.9.3) is an applicative functor, which means that if we have a traversable type F, we can take an F[A] and a function A => Future[B] and turn them ...\r\n        "}, {"title": "Symfony2: how to get all entities of one type which are marked with \u201cEDIT\u201d ACL permission?", "det": "\r\n            Can someone tell me how to get all entities of one type which are marked with \"EDIT\" ACL permission?\n\nI would like to build a query with the Doctrine EntityManager.\r\n        "}, {"title": "How can we calculate \u201cflex-basis: auto & min-width\u201d and \u201cwidth at cross axis\u201d?", "det": "\r\n            I want to know how flex-basis: auto & min-width: 0 and width: auto is calculated (width is not set for parent element and flex item) . Therefore, I confirmed the specification of W3C. Is my ...\r\n        "}, {"title": "Rendering Angular components in Handsontable Cells", "det": "\r\n            In a project of mine I try to display Angular Components (like an Autocomplete Dropdown Search) in a table. Because of the requirements I have (like multi-selecting different cells with ctrl+click) I ...\r\n        "}, {"title": "Getting Symbols from debugged process MainModule", "det": "\r\n            I started writing a debugger in C#, to debug any process on my operating system. 
For now, it only can handle breakpoints (HW, SW, and Memory), but now I wanted to show the opcode of the process.\n\nMy ...\r\n        "}, {"title": "@Transactional in super classes not weaved when using load time weaving", "det": "\r\n            The project I am working on has a similar structure for the DAOs to the one bellow:\n\n/** \n* Base DAO class\n*/\n@Transactional    \npublic class JPABase {\n\n  @PersistenceContext\n  private EntityManager ...\r\n        "}, {"title": "Alert, confirm, and prompt not working after using History API on Safari, iOS", "det": "\r\n            After calling history.pushState in Safari on iOS, it's no longer possible to use alert(), confirm() or prompt(), when using the browser back button to change back.\n\nIs this an iOS bug? Are there any ...\r\n        "}, {"title": "ExoPlayer AudioTrack Stuttering", "det": "\r\n            I have my own implementation of TrackRenderer for a mp3 decoder, that I integrated. When a lollipop device goes to standby and comes back, its not always repeatable but the audio starts to stutter ...\r\n        "}, {"title": "How to diagnose COM-callable wrapper object creation failure?", "det": "\r\n            I am creating a COM object (from native code) using CoCreateInstance:\n\nconst \n   CLASS_GP2010: TGUID = \"{DC55D96D-2D44-4697-9165-25D790DD8593}\";\n\nhr = CoCreateInstance(CLASS_GP2010, nil, ...\r\n        "}, {"title": "libMobileGestalt MobileGestalt.c:890: MGIsDeviceOneOfType is not supported on this platform", "det": "\r\n            I am using Xcode 9 I kept getting this error when I load my app \r\n  libMobileGestalt MobileGestalt.c:890: MGIsDeviceOneOfType is not supported on this platform.\r\nHow to stop that?\r\n        "}, {"title": "How to add a builtin function in a GCC plugin?", "det": "\r\n            It is possible for a GCC plugin to add a new builtin function? If so, how to do it properly?\n\nGCC version is 5.3 (or newer). 
The code being compiled and processed by the plugin is written in C.\n\nIt is ...\r\n        "}, {"title": "Chain is null when retrieving private key", "det": "\r\n            I'm encrypting data in my app using a RSA keypair that I am storing in the Android keystore.\n\nI've been seeing NullPointerExceptions in the Play Store, but I have not been able to reproduce them:\n\n...\r\n        "}, {"title": "Managing the lifetimes of garbage-collected objects", "det": "\r\n            I am making a simplistic mark-and-compact garbage collector. Without going too much into details, the API it exposes is like this:\n\n/// Describes the internal structure of a managed object.\npub struct ...\r\n        "}, {"title": "Sneaking lenses and CPS past the value restriction", "det": "\r\n            I'm encoding a form of van Laarhoven lenses in OCaml but am having difficulty due to the value restriction.\n\nThe relevant code is as follows\n\nmodule Optic : sig\n  type (-'s, +'t, +'a, -'b) t\n  val ...\r\n        "}]
...