Plotting non-uniform time-series data from a text file

Question

This question is a follow-up to "How to read a .txt file to plot a graph".

I have a file containing time-series data in the following format:

00:01:28,102,103,103 20-03-2024
00:02:16,111,110,110
00:02:33,108,109,109
00:02:49,107,108,108
24 hours read....
23:58:54,111,112,112
23:59:11,109,110,110
23:59:47,115,116,117
00:00:04,115,116,116 21-03-2024
00:00:20,121,122,120
00:00:36,124,125,125
24 hours read...
23:59:02,115,115,116
23:59:19,114,114,114
23:59:51,113,114,115
00:00:07,113,114,115 22-03-2024
00:00:24,116,117,115
00:00:45,115,115,116
24 hours read
23:59:08,101,101,100
23:59:32,103,103,102
23:59:48,102,102,102
Next day 

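Each line contains a timestamp and three numeric readings, and some lines also carry a date marking the start of a new day. I'm trying to plot this data with pandas and matplotlib but have two main problems: the x-axis labels (hours) overlap, and the plot is slow to load.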

Here is my current plotting approach:

plt.figure(figsize=(15,9))
plt.xlabel('Day')
plt.ylabel('Voltage')
# Plot three series from the data
plt.plot(C0Temp, C1Temp, label="Voltage", color=LineColorTemp1Text)
plt.plot(C2Temp, C3Temp, label="Max", color='r')
plt.plot(C4Temp, C5Temp, label="Min", color='g')
plt.legend()

# Attempt to format x-axis to handle daily data
locator = mdates.AutoDateLocator(minticks=12, maxticks=24)
plt.gcf().axes[0].xaxis.set_major_locator(locator)
plt.xticks(rotation=45)

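I'm looking for guidance on how to plot this data efficiently day by day, or even across months, so that the x-axis labels stay readable and the plot loads quickly.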

python pandas matplotlib time-series txt
Answer

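Because the text file does not follow a uniform format, it has to be parsed line by line. This lets the script cope with variations in how the data is written, such as whether a line carries a date, and with non-data lines such as "24 hours read..." and "Next day". By examining each line, the script separates data entries from metadata or comments, so only relevant information is processed. The result is a structured dataset ready for analysis and visualization, even though the input file is irregular.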
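As an alternative to chained string checks, the same line-by-line rule can be expressed as a single regular expression that accepts only data lines, so marker lines such as "24 hours read..." and "Next day" fall through without being listed one by one. This is a sketch, not the answer's code: the pattern, the parse_lines helper and the sample string are hypothetical, and the pattern assumes the layout seen in the question (HH:MM:SS, three numeric readings, and an optional dd-mm-yyyy date after a space).

```python
import re
from datetime import datetime, time

# Hypothetical pattern inferred from the sample: "HH:MM:SS,v1,v2,v3", optionally
# followed by whitespace and a "dd-mm-yyyy" date that marks the start of a new day.
DATA_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})"                                    # HH:MM:SS
    r",(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)"  # three readings
    r"(?:\s+(\d{2}-\d{2}-\d{4}))?$"                            # optional dd-mm-yyyy
)


def parse_lines(lines):
    """Yield (datetime, v1, v2, v3) for each data line; all other lines are skipped."""
    current_date = None
    for raw in lines:
        match = DATA_LINE.match(raw.strip())
        if match is None:
            continue  # e.g. "24 hours read...", "Next day" or blank lines
        hhmmss, v1, v2, v3, day = match.groups()
        if day is not None:
            current_date = datetime.strptime(day, "%d-%m-%Y").date()
        if current_date is None:
            continue  # a reading before the first date cannot be placed in time
        stamp = datetime.combine(current_date, time.fromisoformat(hhmmss))
        yield stamp, float(v1), float(v2), float(v3)


sample = """00:01:28,102,103,103 20-03-2024
00:02:16,111,110,110
24 hours read....
23:59:47,115,116,117
00:00:04,115,116,116 21-03-2024
Next day"""

rows = list(parse_lines(sample.splitlines()))
print(rows[0])  # (datetime.datetime(2024, 3, 20, 0, 1, 28), 102.0, 103.0, 103.0)
```

With the real file, the open file object can be passed directly (with open('00_test.txt') as f: rows = list(parse_lines(f))), and the tuples go straight into pd.DataFrame(rows, columns=['DateTime', 'Value1', 'Value2', 'Value3']) followed by set_index('DateTime').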

My recommendation is to standardize the format of the measurement output.
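If the logger's output can be changed, one possible standardized layout (an assumption, not something the OP's device is known to support) writes a header row and a full ISO-8601 timestamp on every line. pandas can then read and index the file in a single read_csv call with no custom parsing. In this sketch io.StringIO stands in for the real file, and the three rows are copied from the question's sample.

```python
import io

import pandas as pd

# Hypothetical standardized log: a header row, then one full ISO-8601 timestamp
# and the three readings on every line, with no marker lines such as "Next day".
standardized = io.StringIO(
    "DateTime,Value1,Value2,Value3\n"
    "2024-03-20T00:01:28,102,103,103\n"
    "2024-03-20T00:02:16,111,110,110\n"
    "2024-03-21T00:00:04,115,116,116\n"
)

# One call parses the timestamps and uses them as a DatetimeIndex
df = pd.read_csv(standardized, parse_dates=["DateTime"], index_col="DateTime")
print(df)
```

Because every line carries its own date, midnight rollovers no longer depend on remembering the last date line, and the file can be split, concatenated or filtered without breaking.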

Parse the file

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Initialize variables
timestamps = []
values1 = []
values2 = []
values3 = []
current_date = None

# Implement parsing logic to accurately handle the lines with and without dates
# 00_test.txt is the data from the OP in a text file
with open('00_test.txt', "r") as file:
    for line in file:
        line = line.strip()

        if not line or "hours read" in line or "Next day" in line:
            continue  # Skip non-data lines
        
        parts = line.split(',')
        if len(parts) != 4:
            continue  # Skip anything that is not "time,val1,val2,val3[ date]"

        time, val1, val2, val3 = parts
        if val3.count('-') == 2:  # The last field also carries a date, e.g. "103 20-03-2024"
            val3, date = val3.split()
            current_date = pd.to_datetime(date, format="%d-%m-%Y")

        # Record every data line, including the one that carries the date
        if current_date is not None:  # Ensure a date has been set
            datetime_str = f"{current_date.date()} {time}"
            datetime_obj = pd.to_datetime(datetime_str, format="%Y-%m-%d %H:%M:%S")
            timestamps.append(datetime_obj)
            values1.append(float(val1))
            values2.append(float(val2))
            values3.append(float(val3))

Create the DataFrame

# Build the DataFrame once, after the loop has collected every row
df = pd.DataFrame({'DateTime': timestamps, 'Value1': values1, 'Value2': values2, 'Value3': values3})
df.set_index('DateTime', inplace=True)

Plot

The plot draws the DataFrame at a fixed figure size, with a small marker at every data point and labeled axes. Major ticks on the x-axis show the date in 'Y-m-d' format, and minor ticks mark the time every 4 hours (04:00 to 20:00). Major tick labels are rotated 90 degrees and centered, while minor tick labels stay horizontal and centered. Grid lines are drawn at both major and minor ticks, styled differently to separate dates from times, and the layout is tightened to make room for the rotated labels.

Many existing questions already cover plotting pandas DataFrames and formatting the datetime x-axis of a pandas DataFrame plot. I encourage you to explore those resources and adapt the plot to your requirements. For further plotting questions or specific adjustments, consider posting a new question that references the existing discussions.

# Plot the DataFrame directly; x_compat=True makes pandas use matplotlib's own date
# handling, so the mdates locators and formatters below apply as expected
ax = df.plot(marker='.', figsize=(15, 9), xlabel='Time', ylabel='Voltage', x_compat=True)

# Major ticks: one per day, labeled with the date in 'Y-m-d' format
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

# Minor ticks: the time of day at 04:00, 08:00, 12:00, 16:00 and 20:00
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=range(4, 21, 4)))  # Adjust the interval as needed
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%H:%M'))

# Enhance the display for readability
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90, ha="center")  # Rotate major tick labels vertically
plt.setp(ax.xaxis.get_minorticklabels(), rotation=0, ha="center")  # Keep minor tick labels horizontal and centered

ax.xaxis.grid(True, which='major', linestyle='-', linewidth=0.5, color='black')  # Major grid lines
ax.xaxis.grid(True, which='minor', linestyle=':', linewidth=0.5, color='gray')  # Minor grid lines

plt.tight_layout()  # Adjust layout to make room for tick labels
plt.show()


df

                     Value1  Value2  Value3
DateTime                                   
2024-03-20 00:01:28   102.0   103.0   103.0
2024-03-20 00:02:16   111.0   110.0   110.0
2024-03-20 00:02:33   108.0   109.0   109.0
2024-03-20 00:02:49   107.0   108.0   108.0
2024-03-20 23:58:54   111.0   112.0   112.0
2024-03-20 23:59:11   109.0   110.0   110.0
2024-03-20 23:59:47   115.0   116.0   117.0
2024-03-21 00:00:04   115.0   116.0   116.0
2024-03-21 00:00:20   121.0   122.0   120.0
2024-03-21 00:00:36   124.0   125.0   125.0
2024-03-21 23:59:02   115.0   115.0   116.0
2024-03-21 23:59:19   114.0   114.0   114.0
2024-03-21 23:59:51   113.0   114.0   115.0
2024-03-22 00:00:07   113.0   114.0   115.0
2024-03-22 00:00:24   116.0   117.0   115.0
2024-03-22 00:00:45   115.0   115.0   116.0
2024-03-22 23:59:08   101.0   101.0   100.0
2024-03-22 23:59:32   103.0   103.0   102.0
2024-03-22 23:59:48   102.0   102.0   102.0
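The answer above does not address the slow loading or month-long ranges. One common approach, offered here as a suggestion rather than part of the original answer, is to downsample before plotting: resampling into fixed time bins and drawing the mean with a min-max band keeps the number of plotted points small, and AutoDateLocator paired with ConciseDateFormatter picks tick spacing that stays readable whether the span is a day or a month. The data below is synthetic, and the downsample and plot_summary helpers and the "1h" bin size are hypothetical choices to adjust.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative synthetic data (NOT the OP's measurements): one reading every 17 s
# for 31 days (about 157,000 rows), indexed like the answer's df.
n = 31 * 24 * 3600 // 17
index = pd.Timestamp("2024-03-01") + pd.to_timedelta(np.arange(n) * 17, unit="s")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    110 + rng.normal(0, 3, size=(n, 3)),
    index=index.rename("DateTime"),
    columns=["Value1", "Value2", "Value3"],
)


def downsample(frame, column="Value1", rule="1h"):
    """Hypothetical helper: min/mean/max of one column per time bin.

    Empty bins become NaN, which matplotlib leaves as gaps in the line.
    """
    return frame[column].resample(rule).agg(["min", "mean", "max"])


def plot_summary(summary, ylabel="Voltage"):
    """Hypothetical helper: mean line with a shaded min-max band."""
    fig, ax = plt.subplots(figsize=(15, 6))
    ax.plot(summary.index, summary["mean"], label="mean")
    ax.fill_between(summary.index, summary["min"], summary["max"], alpha=0.3, label="min-max")
    # AutoDateLocator chooses hours, days or weeks depending on the visible span;
    # ConciseDateFormatter avoids repeating the year and month on every label.
    locator = mdates.AutoDateLocator()
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig, ax


summary = downsample(df)  # 744 hourly rows instead of ~157,000 raw readings
fig, ax = plot_summary(summary)
plt.show()
```

For a single day at full resolution, partial-string indexing such as df.loc["2024-03-20"] selects only that date before plotting; a coarser rule such as "1D" suits multi-month views.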
© www.soinside.com 2019 - 2024. All rights reserved.