simplexml_load_file returns an empty object


I'm writing a script that finds certain forms on SEC EDGAR through its RSS feed. The feed URL itself is fine, but when I parse it with simplexml_load_file() and print_r() the result, no object is shown.

The failing code:

// Parse the RSS feed
$feed = simplexml_load_file($rss_feed_url);

:-(

I've been tearing my hair out over this for hours. Here is my full PHP code:

<?php
function get_sec_filings_with_phrase($phrase, $days_back = 2) {
    // Define the RSS feed URL
    $rss_feed_url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=8-K&count=100&output=atom";

    // Calculate date range
    $end_date = new DateTime();
    $start_date = (new DateTime())->sub(new DateInterval('P' . $days_back . 'D'))->setTime(0, 0);

    // Parse the RSS feed
    $feed = simplexml_load_file($rss_feed_url);

    print_r($feed); // prints nothing

    $filings = [];

    // Iterate over entries in the feed
    foreach ($feed->entry as $entry) {
        // The feed's entries carry <updated>, not <published>
        $entry_date = new DateTime((string)$entry->updated);

        // Check if the entry is within the desired date range
        if ($entry_date >= $start_date && $entry_date <= $end_date) {
            // The summary lists the filing's items; it is not the full filing text
            $filing_content = (string)$entry->summary;

            // Check if the phrase is present in the filing content
            if (stripos($filing_content, $phrase) !== false) {
                $filings[] = [
                    "title" => (string)$entry->title,
                    "link" => (string)$entry->link['href'],
                    "date" => $entry_date->format("Y-m-d H:i:s")
                ];
            }
        }
    }

    return $filings;
}

// Keyword
$phrase = "bank";
$filings = get_sec_filings_with_phrase($phrase);

// show results as  HTML
if (!empty($filings)) {
    echo "<table border='1'>";
    echo "<tr><th>Title</th><th>Date</th><th>Link</th></tr>";
    foreach ($filings as $filing) {
        echo "<tr>";
        echo "<td>".$filing['title']."</td>";
        echo "<td>".$filing['date']."</td>";
        echo "<td><a href='".$filing['link']."'>".$filing['link']."</a></td>";
        echo "</tr>";
    }
    echo "</table>";
} else {
    echo "No filings found in the last 48 hours containing the keyword '". $phrase. "'.";
}
?>

Sample of the RSS feed (it is in Atom format):

<?xml version="1.0" encoding="ISO-8859-1" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Latest Filings - Thu, 02 May 2024 15:27:03 EDT</title>
<link rel="alternate" href="/cgi-bin/browse-edgar?action=getcurrent"/>
<link rel="self" href="/cgi-bin/browse-edgar?action=getcurrent"/>
<id>https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent</id>
<author><name>Webmaster</name><email>[email protected]</email></author>
<updated>2024-05-02T15:27:03-04:00</updated>
<entry>
<title>8-K - KKR FS Income Trust (0001930679) (Filer)</title>
<link rel="alternate" type="text/html" href="https://www.sec.gov/Archives/edgar/data/1930679/000110465924056303/0001104659-24-056303-index.htm"/>
<summary type="html">
 &lt;b&gt;Filed:&lt;/b&gt; 2024-05-02 &lt;b&gt;AccNo:&lt;/b&gt; 0001104659-24-056303 &lt;b&gt;Size:&lt;/b&gt; 193 KB
&lt;br&gt;Item 8.01: Other Events
&lt;br&gt;Item 9.01: Financial Statements and Exhibits
</summary>
<updated>2024-05-02T15:19:50-04:00</updated>
<category scheme="https://www.sec.gov/" label="form type" term="8-K"/>
<id>urn:tag:sec.gov,2008:accession-number=0001104659-24-056303</id>
</entry>
<entry>
<title>8-K - Catalyst Bancorp, Inc. (0001849867) (Filer)</title>
<link rel="alternate" type="text/html" href="https://www.sec.gov/Archives/edgar/data/1849867/000184986724000015/0001849867-24-000015-index.htm"/>
<summary type="html">
 &lt;b&gt;Filed:&lt;/b&gt; 2024-05-02 &lt;b&gt;AccNo:&lt;/b&gt; 0001849867-24-000015 &lt;b&gt;Size:&lt;/b&gt; 1 MB
&lt;br&gt;Item 2.02: Results of Operations and Financial Condition
&lt;br&gt;Item 5.02: Departure of Directors or Certain Officers; Election of Directors; Appointment of Certain Officers: Compensatory Arrangements of Certain Officers
&lt;br&gt;Item 7.01: Regulation FD Disclosure
&lt;br&gt;Item 9.01: Financial Statements and Exhibits
</summary>
<updated>2024-05-02T15:19:06-04:00</updated>
<category scheme="https://www.sec.gov/" label="form type" term="8-K"/>
<id>urn:tag:sec.gov,2008:accession-number=0001849867-24-000015</id>
</entry>
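To check the parsing and filtering logic offline, separately from the network request, the sample above can be passed to simplexml_load_string(). This is a minimal sketch, assuming a trimmed copy of the sample (one entry, with the closing </feed> that the excerpt omits added so it is well-formed) and a hypothetical helper name, filter_entries(). It reads <updated> because these entries have no <published> element, and it searches for "Item 2.02" rather than "bank" only because that phrase occurs in the trimmed summary.

```php
<?php
// Trimmed copy of the sample feed: one entry, plus the closing </feed> the excerpt omits.
$sample = <<<'XML'
<?xml version="1.0" encoding="ISO-8859-1" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Latest Filings - Thu, 02 May 2024 15:27:03 EDT</title>
<entry>
<title>8-K - Catalyst Bancorp, Inc. (0001849867) (Filer)</title>
<link rel="alternate" type="text/html" href="https://www.sec.gov/Archives/edgar/data/1849867/000184986724000015/0001849867-24-000015-index.htm"/>
<summary type="html">
 &lt;b&gt;Filed:&lt;/b&gt; 2024-05-02
&lt;br&gt;Item 2.02: Results of Operations and Financial Condition
</summary>
<updated>2024-05-02T15:19:06-04:00</updated>
<category scheme="https://www.sec.gov/" label="form type" term="8-K"/>
</entry>
</feed>
XML;

/**
 * Hypothetical helper: returns entries whose <updated> time lies in [$start, $end]
 * and whose summary contains $phrase (case-insensitive).
 */
function filter_entries(SimpleXMLElement $feed, string $phrase, DateTimeInterface $start, DateTimeInterface $end): array
{
    $matches = [];
    foreach ($feed->entry as $entry) {
        // These Atom entries carry <updated>, not <published>.
        $date = new DateTimeImmutable((string) $entry->updated);
        if ($date < $start || $date > $end) {
            continue;
        }
        // The summary is HTML-escaped text; casting gives the unescaped string.
        if (stripos((string) $entry->summary, $phrase) !== false) {
            $matches[] = [
                'title' => (string) $entry->title,
                'link'  => (string) $entry->link['href'],
                'date'  => $date->format('Y-m-d H:i:s'),
            ];
        }
    }
    return $matches;
}

$feed  = simplexml_load_string($sample);
$start = new DateTimeImmutable('2024-05-01T00:00:00-04:00');
$end   = new DateTimeImmutable('2024-05-03T00:00:00-04:00');
$result = filter_entries($feed, 'Item 2.02', $start, $end);
print_r($result);
```

The default Atom namespace needs no special handling here: SimpleXML's property access ($feed->entry) matches unprefixed elements in a default namespace.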

I can't figure out why simplexml_load_file() isn't returning an object. Any clues?

Answer

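SEC.gov detects automated tools that don't identify themselves and rejects their requests. By default (when the user_agent ini setting is empty), PHP's HTTP stream wrapper, which simplexml_load_file() uses, sends no User-Agent header. The request therefore fails, simplexml_load_file() returns false, and print_r(false) prints nothing. You can get through by sending a User-Agent header, as a browser does, for example with the cURL library: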

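<?php

// Define the RSS feed URL
$rss_feed_url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=8-K&count=10&output=atom";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $rss_feed_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// SEC asks automated clients to declare who they are, e.g. a company name plus contact e-mail
curl_setopt($ch, CURLOPT_USERAGENT, 'My Custom User-Agent/1.0');
$xml = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error = curl_error($ch);
curl_close($ch);

// Stop early instead of handing an error page (or false) to the XML parser
if ($xml === false || $status !== 200) {
    die("Request failed (HTTP $status): $error");
}

$feed = simplexml_load_string($xml);
if ($feed === false) {
    die("Response is not valid XML");
}

print_r($feed);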
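Whichever way the feed is fetched, it helps to check the parse result explicitly, since simplexml_load_file() and simplexml_load_string() return false on failure and print_r(false) prints an empty string. Below is a minimal sketch with two hypothetical helpers: parse_feed() turns libxml's collected errors into an exception, and load_feed() shows a cURL-free alternative that sets the header through the HTTP stream wrapper's user_agent context option. The User-Agent value in the usage line is a placeholder; replace it with your own name and contact address.

```php
<?php
/**
 * Hypothetical helper: parse an XML string and throw with libxml's messages
 * instead of silently returning false.
 */
function parse_feed(string $xml): SimpleXMLElement
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $feed = simplexml_load_string($xml);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false) {
        $messages = array_map(fn (LibXMLError $e) => trim($e->message), $errors);
        throw new RuntimeException('Could not parse feed: ' . implode('; ', $messages));
    }
    return $feed;
}

/**
 * Hypothetical helper: fetch a feed with an explicit User-Agent, without cURL,
 * via the HTTP stream wrapper's user_agent context option.
 */
function load_feed(string $url, string $userAgent): SimpleXMLElement
{
    $context = stream_context_create(['http' => ['user_agent' => $userAgent]]);
    $xml = file_get_contents($url, false, $context); // false on HTTP errors such as 403
    if ($xml === false) {
        throw new RuntimeException("Request failed: $url");
    }
    return parse_feed($xml);
}

// Usage (placeholder User-Agent; use your own name and contact e-mail):
// $feed = load_feed($rss_feed_url, 'Sample Company Name admin@example.com');
```

Throwing here means a blocked request or an HTML error page surfaces as a readable message rather than as an empty print_r().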
© www.soinside.com 2019 - 2024. All rights reserved.