PowerShell - Parse a date/time string into a DateTime object

Question (votes: 0, answers: 2)

I am trying to parse a file and convert its date strings into DateTime objects using the en-US culture, but I get an ArgumentOutOfRangeException.

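I am stuck on two problems:

  • a) I need the month, but it looks like I cannot use the "a" UFormat specifier. (I could discard the day names "Wed", "Sun", "Sat" from the strings to make the script easier.)
  • b) "now" needs to be replaced with "".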
BARCODE     LOCATION    LIBRARY            STORAGEPOLICY                     RETAIN UNTILL DATE       
-------     --------    -------            -------------                     ------------------       
L40065L8    IEPort1     DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_WEEK    Wed Mar 31 10:13:07 2021 
L40063L8    slot 1      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_MONTH   Sun Mar  6 22:34:39 2022 
L40072L8    slot 5      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_ANNUAL  now                      
L40071L8    slot 6      DRP_TAPE_DRPLTO                                      now                      
L40070L8    slot 7      DRP_TAPE_DRPLTO                                      now                      
L40064L8    slot 8      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_MONTH   Sat Mar 19 11:10:37 2022

$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")

$lines = $lines | Select-Object -Skip 2
$objects = $lines | % {
    return [PSCustomObject]@{
        BARCODE  = $_.Substring(0,8).Trim()
        LOCATION = $_.Substring(12,8).Trim()
        LIBRARY = $_.Substring(24,15).Trim()
        STORAGEPOLICY = $_.Substring(44,33).Trim()
        RETAINUNTIL = [datetime]::ParseExact($_.Substring(78,25).Trim()), "a dd hh:mm:ss yyyy", [Globalization.CultureInfo]::CreateSpecificCulture('en-US'))
    }
}

$objects

Can anyone help me?

powershell date datetime parsing
2 Answers
3 votes

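As @Matt mentioned in the comments, the first part of your problem is the data format. When you use Substring(78, 25) you are relying on exact column widths, and for your data they are off by one. Starting one character too late, a 25-character read can also run past the end of the line, which would explain the ArgumentOutOfRangeException.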

PS> $line = "L40065L8    IEPort1     DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_WEEK    Wed Mar 31 10:13:07 2021 "
PS> $line.Substring(78)
ed Mar 31 10:13:07 2021

This gives "ed Mar 31 10:13:07 2021 " rather than the "Wed Mar 31 10:13:07 2021 " you were probably expecting.
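As a minimal sketch of that off-by-one, the snippet below rebuilds the first data row with explicit padding (the widths 12, 12, 19 and 34 are read off the sample data, and the variable names are illustrative only) and then checks where the date really starts and what Substring(78, 25) does:

```powershell
# Rebuild the first data row with explicit padding so the column offsets are unambiguous.
$line = 'L40065L8'.PadRight(12) + 'IEPort1'.PadRight(12) + 'DRP_TAPE_DRPLTO'.PadRight(19) +
        '_DRP_GLB_SECOND_COPY_TAPE_WEEK'.PadRight(34) + 'Wed Mar 31 10:13:07 2021 '

$dateStart  = $line.IndexOf('Wed')   # index where the date column really begins
$lineLength = $line.Length

# Substring(78, 25) needs 78 + 25 = 103 characters, one more than the line has.
$errorType = $null
try {
    $null = $line.Substring(78, 25)
} catch {
    # PowerShell wraps .NET method errors, so unwrap to the innermost exception.
    $errorType = $_.Exception.GetBaseException().GetType().Name
}

"$dateStart / $lineLength / $errorType"
# 77 / 102 / ArgumentOutOfRangeException
```

So the date column starts at index 77, not 78, and with this padding a 25-character read from index 78 overruns the line.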

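If you can, it would be best to change the data format to something like CSV or JSON so the fields are easier to extract. If you cannot, you could try calculating the column positions dynamically, for example: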

$columns = [regex]::Matches($lines[1], "-+").Index;
# 0
# 12
# 24
# 43
# 77

This finds the start position of each "------" run in the header underline. You can then do the following:

$objects = $lines | % {
    return [PSCustomObject] @{
        BARCODE  = $_.Substring($columns[0], $columns[1] - $columns[0]).Trim()
        LOCATION = $_.Substring($columns[1], $columns[2] - $columns[1]).Trim()
        LIBRARY = $_.Substring($columns[2], $columns[3] - $columns[2]).Trim()
        STORAGEPOLICY = $_.Substring($columns[3], $columns[4] - $columns[3]).Trim()
        RETAINUNTIL = [datetime]::ParseExact(
            $_.Substring($columns[4]).Trim(),
            "a dd hh:mm:ss yyyy",
            [Globalization.CultureInfo]::CreateSpecificCulture("en-US")
        )
    }
}

Except that now we get this error:

Exception calling "ParseExact" with "3" argument(s): "String 'Wed Mar 31 10:13:07 2021' was not recognized as a valid DateTime."

We can fix that by using a format string that matches the input:

[datetime]::ParseExact(
   "Wed Mar 31 10:13:07 2021",
   "ddd MMM dd HH:mm:ss yyyy",
   [Globalization.CultureInfo]::CreateSpecificCulture("en-US")
)
# 31 March 2021 10:13:07

But you also have this date format:

Sun Mar  6 22:34:39 2022

(two spaces before the day when it is a single digit)

So we need to use the overload of ParseExact that takes an array of format strings instead, so that both formats are allowed:

[datetime]::ParseExact(
   "Sun Mar  6 22:34:39 2022",
   [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
   [Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
   "None"  
)
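As a hedged alternative sketch (not part of the original answer), you could collapse runs of whitespace before parsing, assuming that is acceptable for your data. The custom "d" specifier accepts a one- or two-digit day when parsing, so a single format string then covers both inputs. The variable names here are illustrative only:

```powershell
$culture = [Globalization.CultureInfo]::GetCultureInfo('en-US')
$samples = 'Wed Mar 31 10:13:07 2021', 'Sun Mar  6 22:34:39 2022'

$parsed = foreach ($text in $samples) {
    # Collapse any run of whitespace into a single space before parsing.
    $normalized = $text.Trim() -replace '\s+', ' '
    # "d" matches a one- or two-digit day, so one format covers both inputs.
    [datetime]::ParseExact($normalized, 'ddd MMM d HH:mm:ss yyyy', $culture)
}

# Print in a culture-independent format.
$parsed | ForEach-Object {
    $_.ToString('yyyy-MM-dd HH:mm:ss', [Globalization.CultureInfo]::InvariantCulture)
}
# 2021-03-31 10:13:07
# 2022-03-06 22:34:39
```

The trade-off is an extra string operation per row in exchange for not having to enumerate every spacing variant as a separate format.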

Then we need to allow for the literal string now, so your final code becomes:

$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")

$columns = [regex]::Matches($lines[1], "-+").Index;

$lines = $lines | Select-Object -Skip 2
$objects = $lines | % {
    return [PSCustomObject] @{
        BARCODE  = $_.Substring($columns[0], $columns[1] - $columns[0]).Trim()
        LOCATION = $_.Substring($columns[1], $columns[2] - $columns[1]).Trim()
        LIBRARY = $_.Substring($columns[2], $columns[3] - $columns[2]).Trim()
        STORAGEPOLICY = $_.Substring($columns[3], $columns[4] - $columns[3]).Trim()
        RETAINUNTIL = if( $_.Substring($columns[4]).Trim() -eq "now" ) {
            " " } else {
            [datetime]::ParseExact(
                $_.Substring($columns[4]).Trim(),
                [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
                [Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
                "None"
            )
        }
    }
}

$objects | ft

#BARCODE  LOCATION LIBRARY         STORAGEPOLICY                    RETAINUNTIL
#-------  -------- -------         -------------                    -----------
#L40065L8 IEPort1  DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_WEEK   31/03/2021 10:13:07
#L40063L8 slot 1   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_MONTH  06/03/2022 22:34:39
#L40072L8 slot 5   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_ANNUAL
#L40071L8 slot 6   DRP_TAPE_DRPLTO
#L40070L8 slot 7   DRP_TAPE_DRPLTO
#L40064L8 slot 8   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_MONTH  19/03/2022 11:10:37
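If you would rather get $null than a " " placeholder (so that the RETAINUNTIL property is either a DateTime or empty), a small helper built on TryParseExact is one option. This is a sketch only: ConvertTo-RetainDate is a hypothetical name, and treating every unparseable value like "now" is an assumption you may not want.

```powershell
# Hypothetical helper: returns a [datetime], or $null for "now" and any other unparseable text.
function ConvertTo-RetainDate {
    param([string] $Text)

    $formats = [string[]] @('ddd MMM dd HH:mm:ss yyyy', 'ddd MMM  d HH:mm:ss yyyy')
    $culture = [Globalization.CultureInfo]::GetCultureInfo('en-US')
    $result  = [datetime]::MinValue

    # TryParseExact reports failure through its return value instead of throwing.
    $ok = [datetime]::TryParseExact(
        $Text.Trim(), $formats, $culture,
        [Globalization.DateTimeStyles]::None, [ref] $result)

    if ($ok) { $result } else { $null }
}

$fromDate = ConvertTo-RetainDate 'Sat Mar 19 11:10:37 2022 '   # a [datetime]
$fromNow  = ConvertTo-RetainDate 'now'                         # $null
```

In the pipeline above you could then write RETAINUNTIL = ConvertTo-RetainDate $_.Substring($columns[4]) in place of the if/else block.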

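Update

Inspired by mklement0's answer, it might be useful to have a general-purpose parser for your file format. This one returns a set of pscustomobjects whose properties match the file's headers: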

function ConvertFrom-MyFormat
{

    param
    (
        [Parameter(Mandatory=$true)]
        [string[]] $Lines
    )

    # find the runs of dashes in the separator line so we can access each one's index and length.
    # (named $columnMatches to avoid overwriting the automatic $Matches variable, and wrapped
    # in @() so it is a plain array that supports negative indexes)
    $columnMatches = @([regex]::Matches($Lines[1], "-+"));

    # extract the header names from the first line using the
    # positions of the dash runs in the second line as a cutting guide
    $headers = $columnMatches | foreach-object {
        $Lines[0].Substring($_.Index, $_.Length);
    }

    # process the data lines and return a custom object for each one.
    # (the property names will match the headers)
    $Lines | select-object -Skip 2 | foreach-object {
        $line = $_;
        $values = [ordered] @{};
        0..($columnMatches.Count-2) | foreach-object {
            $values.Add($headers[$_], $line.Substring($columnMatches[$_].Index, $columnMatches[$_+1].Index - $columnMatches[$_].Index));
        }
        $values.Add($headers[-1], $line.Substring($columnMatches[-1].Index));
        new-object PSCustomObject -Property $values;
    }

}

Your main code then becomes a matter of cleaning up and restructuring the results of that function:

$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")

$objects = ConvertFrom-MyFormat -Lines $lines | foreach-object {
    return new-object PSCustomObject -Property ([ordered] @{
        BARCODE = $_.BARCODE.Trim()
        LOCATION = $_.LOCATION.Trim()
        LIBRARY = $_.LIBRARY.Trim()
        STORAGEPOLICY = $_.STORAGEPOLICY.Trim()
        RETAINUNTIL = if( $_."RETAIN UNTILL DATE".Trim() -eq "now" ) {
            " " } else {
            [datetime]::ParseExact(
                $_."RETAIN UNTILL DATE".Trim(),
                [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
                [Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
                "None"
            )
        }
    })
}

$objects | ft;

2 votes

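mclayton's helpful answer provides a good explanation and an effective solution.

Let me complement it with an approach that:

  • parses fixed-column-width input files generically
  • assumes that the column widths can be reliably inferred from the separator line (the second line), so that each run of separator characters (e.g. -------) together with the whitespace that follows it, up to the start of the next run, marks one column.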

Note: The code requires PowerShell (Core) 7, but could be adapted to work in Windows PowerShell.

$sepChar = '-' # The char. used on the separator line to indicate column spans.
Get-Content c:\temp\qmedia.txt | ForEach-Object {
  $line = $_
  switch ($_.ReadCount) {
    1 { 
      # Header line: save for later analysis
      $headerLine = $line
      break
    } 
    2 { 
      # Separator line: it is the only reliable indicator of column width.
      # Construct a regex that captures the column values.
      # With the sample input's separator line, the resulting regex is:
      #     (.{12})(.{12})(.{19})(.{34})(.*)
      # Note: Syntax requires PowerShell 7
      $reCaptureColumns = 
        $line -replace ('{0}+[^{0}]*' -f [regex]::Escape($sepChar)), 
                       { 
                         if ($_.Index + $_.Value.Length -lt $line.Length) { "(.{$($_.Value.Length)})" }
                         else { '(.*)'}
                       }
      # Break the header line into column names.
      if ($headerLine -notmatch $reCaptureColumns) { Throw "Unexpected header line format: $headerLine" }
      # Save the array of column names.
      $columnNames = $Matches[1..($Matches.Count - 1)].Trim()
      break
    }
    default {
      # Data line:
      if ($line -notmatch $reCaptureColumns) { Throw "Unexpected line format: $line" }
      # Construct an ordered hashtable from the column values.
      $oht = [ordered] @{ }
      foreach ($ndx in 1..$columnNames.Count) {
        $oht[$columnNames[$ndx-1]] = $Matches[$ndx].Trim()
      }
      [pscustomobject] $oht # Convert to [pscustomobject] and output.
    }
  }
}

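The above outputs a stream of [pscustomobject] instances, which allows for robust and convenient further processing, such as the date parsing you need, as shown in mclayton's answer. (You could of course integrate that processing directly into the code above, but I wanted to show the fixed-width parsing solution separately.)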
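For Windows PowerShell, where -replace does not accept a script block, one possible adaptation is to build the capture regex with [regex]::Replace and a MatchEvaluator delegate. The sketch below is an assumption about how that adaptation might look, not code from the answer. It constructs the sample separator and data lines with explicit padding, and its variable names are illustrative:

```powershell
# Separator and data lines rebuilt with explicit padding (widths taken from the sample input).
$sepLine  = ('-' * 7).PadRight(12) + ('-' * 8).PadRight(12) + ('-' * 7).PadRight(19) +
            ('-' * 13).PadRight(34) + ('-' * 18)
$dataLine = 'L40063L8'.PadRight(12) + 'slot 1'.PadRight(12) + 'DRP_TAPE_DRPLTO'.PadRight(19) +
            '_DRP_GLB_SECOND_COPY_TAPE_MONTH'.PadRight(34) + 'Sun Mar  6 22:34:39 2022'

# Turn each dash run plus its trailing whitespace into a fixed-width capture group.
# The last column becomes (.*) so that it tolerates missing trailing whitespace.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
    param($m)
    if ($m.Index + $m.Length -lt $sepLine.Length) { "(.{$($m.Length)})" } else { '(.*)' }
}
$reCaptureColumns = [regex]::Replace($sepLine, '-+[^-]*', $evaluator)

# Apply the generated regex to a data line and trim each captured column value.
if ($dataLine -notmatch $reCaptureColumns) { throw "Unexpected line format: $dataLine" }
$values = 1..($Matches.Count - 1) | ForEach-Object { $Matches[$_].Trim() }

$reCaptureColumns
# (.{12})(.{12})(.{19})(.{34})(.*)
```

Here $values holds the five trimmed column values of the data line, which you can feed into the same ordered-hashtable step as in the PowerShell 7 version.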
