Parsing a discussion forum only gets me the first user's comment, not the other users' replies

Question

Could someone help me out? I can't seem to figure this one out. I have a file containing a list of URLs like these:

https://community.appian.com/discussions/f/administration/14/integrate-token-device-with-appian
https://community.appian.com/discussions/f/administration/27/how-do-we-configure-enable-appian-tempo
https://community.appian.com/discussions/f/administration/31/how-to-download-get-pdf-of-the-documentation
https://community.appian.com/discussions/f/administration/39/we-need-to-establish-a-single-signon-with-an-outside-web-site-for-which-a-certif
https://community.appian.com/discussions/f/administration/43/is-there-a-way-to-import-an-application-exported-from-appian-6-6-1and-import-in
https://community.appian.com/discussions/f/administration/47/we-are-having-issues-with-oracle-db-integration-for-appian-6-6-1-we-are-install

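The problem is that when I try to scrape them, I only get the first user's comment.

Does anyone know how I can get this working?

I tried BS4 but failed miserably: I only get the first user's comment and none of the other users' replies.

This is what I'm using: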

import json
from scrapegraphai.graphs import SmartScraperGraph

def main():
    graph_config = {
        "llm": {
            "model": "ollama/llama3",
            "temperature": 0,
            "base_url": "http://localhost:11434",
            "format": "json",  # Ollama needs the format to be specified explicitly
        },
        "embeddings": {
            "model": "ollama/nomic-embed-text",
        }
    }

    source_urls = []
    with open('cleaned-urls.txt', 'r') as f:
        sources = [line.strip() for line in f if line.strip()]
        source_urls.extend(sources)

    for source_url in source_urls:
        try:
            prompt = "find the best way to extract data, eliminate unnecessary fields and organise to only show the entire conversation and code snippets. make sure to include all text from the conversation and the users' answers. the first text is always the question and what follows is from other user replies"
            smart_scraper_graph = SmartScraperGraph(prompt=prompt, source=source_url, config=graph_config)
            result = smart_scraper_graph.run()
            output = json.dumps(result, indent=2)
            print(output)
        except Exception as e:
            print(f"An error occurred: {e}")

if __name__ == "__main__":
    main()

Tags: python, web-scraping, beautifulsoup

1 Answer

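You only get the first user's post (basically the question) because the replies are loaded into the page by XHR calls to an API endpoint, so they are not in the HTML the server first returns. It is possible to scrape the page completely with requests and BeautifulSoup, but it is quite convoluted. To keep the complexity down, I would suggest using Selenium in this case.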
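
One way to check this for yourself is to count the reply elements in the raw HTML that requests returns and compare that with what the browser shows. Below is a minimal sketch using only the standard library. It assumes replies are li elements carrying the threaded and content-item classes (the same elements the Selenium example in this answer targets). ReplyCounter and count_reply_items are hypothetical names, and the two HTML strings are made-up stand-ins for a real page:

```python
from html.parser import HTMLParser


class ReplyCounter(HTMLParser):
    """Counts <li> tags whose class list has both 'threaded' and 'content-item'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        # The class attribute may be missing or empty; treat both as no classes.
        classes = (dict(attrs).get("class") or "").split()
        if "threaded" in classes and "content-item" in classes:
            self.count += 1


def count_reply_items(html):
    """Return how many reply list items appear in the given HTML text."""
    parser = ReplyCounter()
    parser.feed(html)
    return parser.count


# Made-up stand-in for the HTML returned before any XHR call has run.
static_html = '<div class="thread-start">Question text</div><ul></ul>'

# Made-up stand-in for the DOM after the browser has loaded the replies.
rendered_html = (
    '<div class="thread-start">Question text</div>'
    '<ul>'
    '<li class="threaded content-item  ">First reply</li>'
    '<li class="threaded content-item  ">Second reply</li>'
    '</ul>'
)

print(count_reply_items(static_html))
print(count_reply_items(rendered_html))
```

If the count is zero for the text of requests.get(url).text but not for the page you see in the browser, the replies are being added after the initial load.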

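Here is an example of how to do this with Selenium. Keep in mind that this code will not run as-is on a headless machine or on Google Colab; you would need some extra packages to make it work there. It is meant to be run on a standard machine with Chrome installed, either in Jupyter or as a standalone Python file.

Since Stack Overflow is not a code-writing service, I am only printing the output to the terminal. You can drill further into the selected elements to get the time and author, clean up the content, parse the JSON, put it into a dataframe, save it to disk, and so on.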

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json

# Configure Chrome; the browser runs with a visible window (not headless).
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--window-size=1280,1080")

urls = [
'https://community.appian.com/discussions/f/administration/14/integrate-token-device-with-appian',
'https://community.appian.com/discussions/f/administration/27/how-do-we-configure-enable-appian-tempo',
'https://community.appian.com/discussions/f/administration/31/how-to-download-get-pdf-of-the-documentation',
'https://community.appian.com/discussions/f/administration/39/we-need-to-establish-a-single-signon-with-an-outside-web-site-for-which-a-certif',
'https://community.appian.com/discussions/f/administration/43/is-there-a-way-to-import-an-application-exported-from-appian-6-6-1and-import-in',
'https://community.appian.com/discussions/f/administration/47/we-are-having-issues-with-oracle-db-integration-for-appian-6-6-1-we-are-install'
]
with webdriver.Chrome(options=chrome_options) as driver:
    wait = WebDriverWait(driver, 15)
    for url in urls:
        driver.get(url)
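        # The question (plus any answers the site marks up) is embedded as JSON-LD.
        ld_script = wait.until(
            EC.presence_of_element_located((By.XPATH, '//script[@type="application/ld+json"]'))
        )
        main_q = json.loads(ld_script.get_attribute('innerHTML'))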
        print(main_q)
        print('____________________________________________________________')
        print('____________________________________________________________')
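        # Wait for the replies to appear; the class value, including its
        # trailing spaces, has to match the page markup exactly.
        replies = wait.until(
            EC.presence_of_all_elements_located((By.XPATH, '//li[@class="threaded content-item  "]'))
        )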
        for reply in replies:
            print(reply.text)
            print('____________________________________________________________')
        print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')

Terminal output:

{'@context': 'https://schema.org', '@type': 'QAPage', 'mainEntity': {'@type': 'Question', 'name': 'Integrate token device with Appian', 'text': 'how can I add login token device? \n OriginalPostID-22130 \n OriginalPostID-22130', 'answerCount': 1, 'upvoteCount': 0, 'dateCreated': '2012-01-21T23:04:00.6170000Z', 'author': {'@type': 'Person', 'name': 'alex.he38'}, 'suggestedAnswer': [{'@type': 'Answer', 'text': 'Yes', 'dateCreated': '2022-02-27T06:57:12.6030000Z', 'upvoteCount': 0, 'url': 'https://community.appian.com/discussions/f/administration/14/integrate-token-device-with-appian/91615', 'author': {'@type': 'Person', 'name': 'kalicharans0001'}}]}}
____________________________________________________________
____________________________________________________________
phillip.russell
Appian Employee
over 12 years ago
Can you provide a little more information about this request? Were you looking at implementing multi-factor authentication, or were you speaking about something different?
Vote Up
0
Vote Down
Sign in to reply
____________________________________________________________
alex.he38 over 12 years ago
Hi Phillip,
Thanks for your prompt response, I’m bidding for a financial group for 3 banks and one of this bank already using Token device ref# C100 to login users. www.ftsafe.com/.../epass.html
I am organizing a pilot to demo Appian and I would to like demo how Appian could integrate with Token devices as well. Is there any possibility or adapter?
Vote Up
0
Vote Down
Sign in to reply
Stefan Helzle
Certified Lead Developer
over 2 years ago in reply to alex.he38
Appian supports single sign on via SAML. If your identity provider supports these tokens, you should be good to go.
____________________________________________________________
Stefan Helzle
Certified Lead Developer
over 2 years ago in reply to alex.he38
Appian supports single sign on via SAML. If your identity provider supports these tokens, you should be good to go.
____________________________________________________________
kalicharans0001 over 2 years ago
Yes
____________________________________________________________
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{'@context': 'https://schema.org', '@type': 'QAPage', 'mainEntity': {'@type': 'Question', 'name': 'How do we configure/Enable Appian Tempo', 'text': 'How do we configure/Enable Appian Tempo ? \n OriginalPostID-23560 \n OriginalPostID-23560', 'answerCount': 1, 'upvoteCount': 0, 'dateCreated': '2012-02-07T17:28:48.6600000Z', 'author': {'@type': 'Person', 'name': 'santoshks'}, 'acceptedAnswer': {'@type': 'Answer', 'text': 'You need to configure a primary datasource. To do this, please review sections 6.1, 6.2 and 6.3 of the following: forum.appian.com/.../Appian_6.6_Windows_Installation_Guide_for_JBoss', 'dateCreated': '2012-02-07T17:32:08.5000000Z', 'upvoteCount': 0, 'url': 'https://community.appian.com/discussions/f/administration/27/how-do-we-configure-enable-appian-tempo/54', 'author': {'@type': 'Person', 'name': 'phillip.russell'}}}}
____________________________________________________________
____________________________________________________________
+1
phillip.russell
Appian Employee
over 12 years ago
You need to configure a primary datasource. To do this, please review sections 6.1, 6.2 and 6.3 of the following: forum.appian.com/.../Appian_6.6_Windows_Installation_Guide_for_JBoss
Vote Up
0
Vote Down
Sign in to reply
____________________________________________________________
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{'@context': 'https://schema.org', '@type': 'QAPage', 'mainEntity': {'@type': 'Question', 'name': 'How to download/get PDF of the documentation', 'text': 'Hi, I am a newbie to appian. I just downloaded the software for installation. Upon looking for documentation, I only see wiki pages and no zip archive or pdf document that I can download. Am I missing something or is the documentation only available via wiki pages ? Regards Zacharia \n OriginalPostID-23695 \n OriginalPostID-23695', 'answerCount': 1, 'upvoteCount': 0, 'dateCreated': '2012-02-09T00:56:55.0170000Z', 'author': {'@type': 'Person', 'name': 'zacm'}, 'acceptedAnswer': {'@type': 'Answer', 'text': 'In order to keep it updated and available at any time the documentation is only provided via the Documentation section online. This allows you to get the most recent information at any time anywhere.', 'dateCreated': '2012-02-09T01:00:32.7370000Z', 'upvoteCount': -1, 'url': 'https://community.appian.com/discussions/f/administration/31/how-to-download-get-pdf-of-the-documentation/71', 'author': {'@type': 'Person', 'name': 'Eduardo Fuentes'}}}}
____________________________________________________________
____________________________________________________________
+1
Eduardo Fuentes
Appian Employee
over 12 years ago
In order to keep it updated and available at any time the documentation is only provided via the Documentation section online. This allows you to get the most recent information at any time anywhere.
Vote Up
-1
Vote Down
Sign in to reply
____________________________________________________________
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{'@context': 'https://schema.org', '@type': 'QAPage', 'mainEntity': {'@type': 'Question', 'name': 'We need to establish a single signon with an outside web site for which a certif', 'text': 'We need to establish a single signon with an outside web site for which a certificate needs to be installed on the server. I'm assuming this is done through JBoss. Has anyone done this and what procedure did you follow? OriginalPostID-23833 OriginalPostID-23833', 'answerCount': 0, 'upvoteCount': 0, 'dateCreated': '2012-02-11T00:18:30.8670000Z', 'author': {'@type': 'Person', 'name': 'craigt'}}}
____________________________________________________________
____________________________________________________________
Jacob Rank
Appian Employee
over 12 years ago
Craig, do you mean that you need to setup SSL for your site? That's enabled via a web server. For an example see forum.appian.com/.../Configuring_Apache_Web_Server_with_JBoss

On the other hand if you need to use the cert for client authentication or as part of the system truststore it would be a completely different configuration involving the JBoss JVM.
Vote Up
0
Vote Down
Sign in to reply
____________________________________________________________
craigt over 12 years ago
It's not for our site. It's to do a single signon to another website to display some data.
Vote Up
0
Vote Down
Sign in to reply
____________________________________________________________
Jacob Rank
Appian Employee
over 12 years ago
I wasn't sure if perhaps the SSO solution required your site to be using certain SSL cert. Which SSO provider are you working with?
____________________________________________________________
phillip.russell
Appian Employee
over 12 years ago
Why the need to deploy a certificate? Is it an Intermediate CA that's not being sent in the chain? Do you have OpenSSL installed on the Appian server, and if so, what's the result of running "openssl s_client -connect" against the SSO URL?
____________________________________________________________
craigt over 12 years ago
It's a certificate the site author provided and indicated it needed to be sent when prompted. We don't have OpenSSL currently installed.
____________________________________________________________
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{'@context': 'https://schema.org', '@type': 'QAPage', 'mainEntity': {'@type': 'Question', 'name': 'Is there a way to import an application exported from Appian 6.6.1and import  in', 'text': 'Is there a way to import an application exported from Appian 6.6.1and import into Appian 6.6.0? OriginalPostID-24338 OriginalPostID-24338', 'answerCount': 0, 'upvoteCount': 0, 'dateCreated': '2012-02-16T16:49:28.3970000Z', 'author': {'@type': 'Person', 'name': 'Bill'}}}
____________________________________________________________
____________________________________________________________
Eduardo Fuentes
Appian Employee
over 12 years ago
As a best practice you shouldn't be doing this, even though, some basic items can be still compatible and importable by just changing the Appian-Version in the META-INF/MANIFEST.MF to 6.6.0.0 , this approach is not supported and Appian cannot guarantee this import completely fine, which makes sense if we take in consideration that new versions have new features that may not be present/compatible with previous versions.
Vote Up
0
Vote Down
Sign in to reply
____________________________________________________________
Bill over 12 years ago
I tried your solution and worked with some errors. 36 of 48 imported successfully. I have the import log. How do I post the log?
Vote Up
0
Vote Down
Sign in to reply
____________________________________________________________
Eduardo Fuentes
Appian Employee
over 12 years ago
Go this post: forum.appian.com/.../3557 and use the "Add Attachments" button; follow these steps: forum.appian.com/.../Uploading_a_Forum_Attachment

Let me know the name of the folder where you uploaded it so I can take a look.
____________________________________________________________
Myles Weber
Appian Employee
over 12 years ago
Basically, this puts the system in an unsupported status. Nobody should be doing this that cares about their production system.
____________________________________________________________
Bill over 12 years ago
I have gone to the discussion, hit add attachment, selected Default Community, Appian KC, Discussion Topic Attachements, Eduardo Fuentes, selected Upload Document, browsed to the file, gave it a description, selected create, It indicated that it worked but I can not find the file to select it an add as an attachment. The file is named import_failure_log.zip. I don't know what I'm doing wrong but it is not being loaded.
____________________________________________________________
Eduardo Fuentes
Appian Employee
over 12 years ago
The import log confirms what Myles and I said; this is not supported because some features are not compatible with previous versions, if you see your import log, you have two problems; the first one is the target envrionment doesn't have a primary data soruce configured, and second, data stores from 6.6.1 are internally different from 6.6.1 therefore they are not importable into an old version of Appian. Please use an installation of 6.6.1 instead.
____________________________________________________________
Bill over 12 years ago
Thanks, That is what I thought. However, I did set up a primary data source. I wonder why it doesn't see it?
____________________________________________________________
Eduardo Fuentes
Appian Employee
over 12 years ago
Although this is definitely not going to solve your problem of using the unsupported approach of importing a 6.6.1 one app in 6.6 you need to make sure you have configured the primary data source correctly to take advantage of the new features that require the data source.

Take a look at the beginning of the application server log, if the configuration is right you will see something like this:

Validating and initializing the primary data source: java:/AppianPrimaryDS
[java:/AppianPrimaryDS] Checking schema and migrating if necessary...
[java:/AppianPrimaryDS] Schema check/migration completed successfully.
...
____________________________________________________________
Eduardo Fuentes
Appian Employee
over 12 years ago
If you see any errors related to your primary data source during JBoss startup (search for your JNDI name in the most recent entries in your log) share them with me so we can see what the issue is.
____________________________________________________________
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{'@context': 'https://schema.org', '@type': 'QAPage', 'mainEntity': {'@type': 'Question', 'name': 'We are having issues with Oracle DB integration for Appian 6.6.1. We are install', 'text': 'We are having issues with Oracle DB integration for Appian 6.6.1. We are installing 6.6.1 and reusing the existing Primary DataSource configured for our Appian 6.5.1 installation.But the JBoss App server throwing error saying "Invalid Schema" for the Primary DataSource. As I understand from the documentation, Appian automatically updates the Schema for the newer version if we use an existing schema(for older version). Can someone please provide insight into what might be causing this issue? OriginalPostID-24783 OriginalPostID-24783', 'answerCount': 0, 'upvoteCount': 0, 'dateCreated': '2012-02-21T14:25:44.6330000Z', 'author': {'@type': 'Person', 'name': 'prosenjitd'}}}
____________________________________________________________
____________________________________________________________
Eduardo Fuentes
Appian Employee
over 12 years ago
Is this error thrown on 6.6.1 or when you try to run the old 6.5.1? Once Appian 6.6.1 updates the schema you can't use that one with the old install. If this error is happening on the upgraded environment, check the complete application-server.log, there should be more information on why the schema cannot be updated (e.g. permissions/privileges)
Vote Up
0
Vote Down
Sign in to reply
____________________________________________________________
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
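
The reply text in the output above still contains the page's voting widgets ("Vote Up", a count, "Vote Down", "Sign in to reply"). Below is a minimal sketch of a cleanup step, assuming those labels appear exactly as printed above. clean_reply_text and WIDGET_LINES are hypothetical names, not part of Selenium:

```python
# Widget labels seen in the terminal output; assumed to be stable.
WIDGET_LINES = {"Vote Up", "Vote Down", "Sign in to reply"}


def clean_reply_text(text):
    """Drop voting-widget lines from the text of one reply element."""
    lines = [line.strip() for line in text.splitlines()]
    kept = []
    for i, line in enumerate(lines):
        if line in WIDGET_LINES:
            continue
        # The vote count sits on its own line between "Vote Up" and "Vote Down".
        between_votes = (
            0 < i < len(lines) - 1
            and lines[i - 1] == "Vote Up"
            and lines[i + 1] == "Vote Down"
        )
        if between_votes:
            continue
        kept.append(line)
    return "\n".join(kept).strip()


# Shortened copy of one reply from the output above.
sample = (
    "Eduardo Fuentes\n"
    "Appian Employee\n"
    "over 12 years ago\n"
    "The documentation is only provided via the Documentation section online.\n"
    "Vote Up\n"
    "-1\n"
    "Vote Down\n"
    "Sign in to reply"
)

print(clean_reply_text(sample))
```

In the Selenium loop this would be called as clean_reply_text(reply.text). Nested replies show up both inside their parent's text and as separate entries in the output above, so removing those duplicates would need a further step.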

For more details, see the Selenium documentation.
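
The JSON-LD dictionary printed for each thread (main_q in the code above) can also be flattened into rows ready for a CSV file or a dataframe. Below is a minimal sketch, assuming the dictionary follows the schema.org QAPage shape seen in the terminal output. flatten_qa_page and _row are hypothetical helpers, and the sample is abbreviated from the first thread. Note that several threads in the output report 'answerCount': 0 even though replies are visible on the page, so this complements the reply scraping rather than replacing it:

```python
import csv
import io


def _row(role, item):
    """Build one flat record from a Question or Answer dict."""
    return {
        "role": role,
        "author": item.get("author", {}).get("name"),
        "date": item.get("dateCreated"),
        "text": item.get("text", "").strip(),
    }


def flatten_qa_page(page):
    """Return the question followed by any accepted/suggested answers."""
    question = page.get("mainEntity", {})
    rows = [_row("question", question)]
    for key in ("acceptedAnswer", "suggestedAnswer"):
        answers = question.get(key, [])
        # In the output above, acceptedAnswer is a dict and suggestedAnswer a list.
        if isinstance(answers, dict):
            answers = [answers]
        rows.extend(_row(key, answer) for answer in answers)
    return rows


# Abbreviated from the first thread in the terminal output.
sample_page = {
    "@type": "QAPage",
    "mainEntity": {
        "@type": "Question",
        "text": "how can I add login token device?",
        "dateCreated": "2012-01-21T23:04:00.6170000Z",
        "author": {"@type": "Person", "name": "alex.he38"},
        "suggestedAnswer": [{
            "@type": "Answer",
            "text": "Yes",
            "dateCreated": "2022-02-27T06:57:12.6030000Z",
            "author": {"@type": "Person", "name": "kalicharans0001"},
        }],
    },
}

rows = flatten_qa_page(sample_page)

# Write the rows as CSV into an in-memory buffer; a real file works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["role", "author", "date", "text"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```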
