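I'm trying to parse a file and convert a date string into a DateTime object using the en-US culture, but I get an ArgumentOutOfRangeException.
I'm stuck on two errors. The input file (c:\temp\qmedia.txt) is shown first, followed by my script: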
BARCODE     LOCATION    LIBRARY            STORAGEPOLICY                     RETAIN UNTILL DATE
-------     --------    -------            -------------                     ------------------
L40065L8    IEPort1     DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_WEEK    Wed Mar 31 10:13:07 2021
L40063L8    slot 1      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_MONTH   Sun Mar  6 22:34:39 2022
L40072L8    slot 5      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_ANNUAL  now
L40071L8    slot 6      DRP_TAPE_DRPLTO                                      now
L40070L8    slot 7      DRP_TAPE_DRPLTO                                      now
L40064L8    slot 8      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_MONTH   Sat Mar 19 11:10:37 2022
$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")
$lines = $lines | Select-Object -Skip 2
$objects = $lines | % {
return [PSCustomObject]@{
BARCODE = $_.Substring(0,8).Trim()
LOCATION = $_.Substring(12,8).Trim()
LIBRARY = $_.Substring(24,15).Trim()
STORAGEPOLICY = $_.Substring(44,33).Trim()
RETAINUNTIL = [datetime]::ParseExact($_.Substring(78,25).Trim()), "a dd hh:mm:ss yyyy", [Globalization.CultureInfo]::CreateSpecificCulture('en-US'))
}
}
$objects
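Can anyone help me?
As @Matt mentioned in the comments, the first part of your problem is the data format: when you use Substring(78, 25) you are relying on the exact column widths being correct, and for your data they don't appear to be...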
PS> $line = "L40065L8    IEPort1     DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_WEEK    Wed Mar 31 10:13:07 2021 "
PS> $line.Substring(78)
ed Mar 31 10:13:07 2021
This gives "ed Mar 31 10:13:07 2021" rather than the "Wed Mar 31 10:13:07 2021" you were probably expecting.
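The ArgumentOutOfRangeException itself comes from Substring being asked for more characters than the line contains. As a minimal sketch, assuming the column layout described in this answer (the date column starting at index 77), a hypothetical helper Get-FixedWidthField (not part of the original script) can clamp the requested length to what is actually available:

```powershell
# Hypothetical helper: behaves like Substring(...).Trim(),
# but never reads past the end of the string.
function Get-FixedWidthField {
    param(
        [string] $Text,
        [int] $Start,
        [int] $Length
    )
    if ($Start -ge $Text.Length) { return '' }
    $available = [Math]::Min($Length, $Text.Length - $Start)
    return $Text.Substring($Start, $available).Trim()
}

# Rebuild the first data line with PadRight so the column widths are explicit.
$line = 'L40065L8'.PadRight(12) + 'IEPort1'.PadRight(12) + 'DRP_TAPE_DRPLTO'.PadRight(19) +
        '_DRP_GLB_SECOND_COPY_TAPE_WEEK'.PadRight(34) + 'Wed Mar 31 10:13:07 2021'

# $line.Substring(77, 25) would throw here: only 24 characters remain after index 77.
$retain = Get-FixedWidthField -Text $line -Start 77 -Length 25
$retain
```

This only avoids the exception; it does not fix a wrong start index, which still has to come from the real column positions.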
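If you can, it would be best to change the data format to something like CSV or JSON so that you can extract the fields more easily. If you can't, you could try calculating the column widths dynamically, for example: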
$columns = [regex]::Matches($lines[1], "-+").Index;
# 0
# 12
# 24
# 43
# 77
This finds the start position of each "------" header underline, after which you can do the following:
$objects = $lines | % {
return [PSCustomObject] @{
BARCODE = $_.Substring($columns[0], $columns[1] - $columns[0]).Trim()
LOCATION = $_.Substring($columns[1], $columns[2] - $columns[1]).Trim()
LIBRARY = $_.Substring($columns[2], $columns[3] - $columns[2]).Trim()
STORAGEPOLICY = $_.Substring($columns[3], $columns[4] - $columns[3]).Trim()
RETAINUNTIL = [datetime]::ParseExact(
$_.Substring($columns[4]).Trim(),
"a dd hh:mm:ss yyyy",
[Globalization.CultureInfo]::CreateSpecificCulture("en-US")
)
}
}
Except now we get this error:
Exception calling "ParseExact" with "3" argument(s): "String 'Wed Mar 31 10:13:07 2021' was not recognized as a valid DateTime."
which we can fix with:
[datetime]::ParseExact(
"Wed Mar 31 10:13:07 2021",
"ddd MMM dd HH:mm:ss yyyy",
[Globalization.CultureInfo]::CreateSpecificCulture("en-US")
)
# 31 March 2021 10:13:07
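But you also have this date format:
Sun Mar  6 22:34:39 2022
(two spaces when the day part is a single digit)
so we need to use the overload of ParseExact that takes an array of formats instead, to allow both: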
[datetime]::ParseExact(
"Sun Mar  6 22:34:39 2022",
[string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
[Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
"None"
)
Then we need to allow for the literal string "now", so your final code becomes:
$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")
$columns = [regex]::Matches($lines[1], "-+").Index;
$lines = $lines | Select-Object -Skip 2
$objects = $lines | % {
return [PSCustomObject] @{
BARCODE = $_.Substring($columns[0], $columns[1] - $columns[0]).Trim()
LOCATION = $_.Substring($columns[1], $columns[2] - $columns[1]).Trim()
LIBRARY = $_.Substring($columns[2], $columns[3] - $columns[2]).Trim()
STORAGEPOLICY = $_.Substring($columns[3], $columns[4] - $columns[3]).Trim()
RETAINUNTIL = if( $_.Substring($columns[4]).Trim() -eq "now" ) {
" " } else {
[datetime]::ParseExact(
$_.Substring($columns[4]).Trim(),
[string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
[Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
"None"
)
}
}
}
$objects | ft
#BARCODE  LOCATION LIBRARY         STORAGEPOLICY                    RETAINUNTIL
#-------  -------- -------         -------------                    -----------
#L40065L8 IEPort1  DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_WEEK   31/03/2021 10:13:07
#L40063L8 slot 1   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_MONTH  06/03/2022 22:34:39
#L40072L8 slot 5   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_ANNUAL
#L40071L8 slot 6   DRP_TAPE_DRPLTO
#L40070L8 slot 7   DRP_TAPE_DRPLTO
#L40064L8 slot 8   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_MONTH  19/03/2022 11:10:37
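As a possible variation on the date handling (a sketch, not the code from this answer), the extra space before single-digit days can be collapsed before parsing, so that a single format string is enough. The function name ConvertTo-RetainDate is hypothetical, and returning $null for "now" is an assumption; the code above emits " " instead.

```powershell
# Hypothetical helper: normalises whitespace, maps "now" to $null,
# and parses everything else with a single en-US format string.
function ConvertTo-RetainDate {
    param([string] $Text)

    # Collapse runs of whitespace so "Mar  6" becomes "Mar 6".
    $clean = ($Text -replace '\s+', ' ').Trim()
    if ($clean -eq 'now') { return $null }

    $culture = [Globalization.CultureInfo]::GetCultureInfo('en-US')
    # A single "d" accepts both one- and two-digit days when parsing.
    return [datetime]::ParseExact($clean, 'ddd MMM d HH:mm:ss yyyy', $culture)
}

$singleDigitDay = ConvertTo-RetainDate 'Sun Mar  6 22:34:39 2022'
$doubleDigitDay = ConvertTo-RetainDate 'Wed Mar 31 10:13:07 2021'
$noDate         = ConvertTo-RetainDate 'now'
```

Any other unexpected value (including an empty string) still throws from ParseExact, which is usually what you want for a file you don't fully control.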
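Update
Inspired by mklement0's answer, it might be useful to have a generic parser for your file format. This returns a set of pscustomobjects whose properties match the file's headers: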
function ConvertFrom-MyFormat
{
    param
    (
        [Parameter(Mandatory=$true)]
        [string[]] $Lines
    )
    # find the header underlines (runs of dashes) so we can access each one's index and length.
    # (stored in $underlines rather than $matches, which is an automatic variable in PowerShell)
    $underlines = [regex]::Matches($Lines[1], "-+");
    # extract the header names from the first line using the
    # positions of the underlines in the second line as a cutting guide
    $headers = $underlines | foreach-object {
        $Lines[0].Substring($_.Index, $_.Length);
    }
    # process the data lines and return a custom object for each one.
    # (the property names will match the headers)
    $Lines | select-object -Skip 2 | foreach-object {
        $line = $_;
        $values = [ordered] @{};
        # every column except the last runs up to the start of the next column
        0..($underlines.Count-2) | foreach-object {
            $values.Add($headers[$_], $line.Substring($underlines[$_].Index, $underlines[$_+1].Index - $underlines[$_].Index));
        }
        # the last column runs to the end of the line
        $values.Add($headers[-1], $line.Substring($underlines[-1].Index));
        new-object PSCustomObject -Property $values;
    }
}
Your main code then just becomes a matter of cleaning up and restructuring the result of that function:
$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")
$objects = ConvertFrom-MyFormat -Lines $lines | foreach-object {
return new-object PSCustomObject -Property ([ordered] @{
BARCODE = $_.BARCODE.Trim()
LOCATION = $_.LOCATION.Trim()
LIBRARY = $_.LIBRARY.Trim()
STORAGEPOLICY = $_.STORAGEPOLICY.Trim()
RETAINUNTIL = if( $_."RETAIN UNTILL DATE".Trim() -eq "now" ) {
" " } else {
[datetime]::ParseExact(
$_."RETAIN UNTILL DATE".Trim(),
[string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
[Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
"None"
)
}
})
}
$objects | ft;
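Once the rows are objects, re-exporting them in a friendlier format such as CSV (as suggested earlier) is straightforward. The following is only a sketch: $sample stands in for the parsed $objects, with values taken from the question's data, and the ISO-style "s" date format is an assumption about what the consumer wants.

```powershell
# Sample objects standing in for the parsed $objects above.
$sample = @(
    [pscustomobject]@{ BARCODE = 'L40065L8'; LOCATION = 'IEPort1'; RETAINUNTIL = [datetime]'2021-03-31T10:13:07' }
    [pscustomobject]@{ BARCODE = 'L40072L8'; LOCATION = 'slot 5';  RETAINUNTIL = ' ' }
)

# Emit dates in a sortable, culture-independent form; leave non-dates ("now" rows) empty.
$csv = $sample |
    Select-Object BARCODE, LOCATION, @{
        Name       = 'RETAINUNTIL'
        Expression = { if ($_.RETAINUNTIL -is [datetime]) { $_.RETAINUNTIL.ToString('s') } else { '' } }
    } |
    ConvertTo-Csv -NoTypeInformation

$csv
```

Piping to Export-Csv with a path instead of ConvertTo-Csv would write the same rows to a file.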
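mclayton's helpful answer provides a good explanation and an effective solution.
Let me complement it with a generic approach to fixed-width parsing that relies solely on the separator line, in which each run of - characters (e.g. -------) denotes a column. Note: The code requires PowerShell (Core) 7, but could be adapted to work in Windows PowerShell too.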
$sepChar = '-' # The char. used on the separator line to indicate column spans.
Get-Content c:\temp\qmedia.txt | ForEach-Object {
$line = $_
switch ($_.ReadCount) {
1 {
# Header line: save for later analysis
$headerLine = $line
break
}
2 {
# Separator line: it is the only reliable indicator of column width.
# Construct a regex that captures the column values.
# With the sample input's separator line, the resulting regex is:
# (.{12})(.{12})(.{19})(.{34})(.*)
# Note: Syntax requires PowerShell 7
$reCaptureColumns =
$line -replace ('{0}+[^{0}]*' -f [regex]::Escape($sepChar)),
{
if ($_.Index + $_.Value.Length -lt $line.Length) { "(.{$($_.Value.Length)})" }
else { '(.*)'}
}
# Break the header line into column names.
if ($headerLine -notmatch $reCaptureColumns) { Throw "Unexpected header line format: $headerLine" }
# Save the array of column names.
$columnNames = $Matches[1..($Matches.Count - 1)].Trim()
break
}
default {
# Data line:
if ($line -notmatch $reCaptureColumns) { Throw "Unexpected line format: $line" }
# Construct an ordered hashtable from the column values.
$oht = [ordered] @{ }
foreach ($ndx in 1..$columnNames.Count) {
$oht[$columnNames[$ndx-1]] = $Matches[$ndx].Trim()
}
[pscustomobject] $oht # Convert to [pscustomobject] and output.
}
}
}
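The above outputs a stream of [pscustomobject] instances, which allows for robust, convenient further processing, such as the date parsing you need, as shown in mclayton's answer (you could of course integrate that processing directly into the code above, but I wanted to show the fixed-width parsing solution on its own).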
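As for the Windows PowerShell adaptation mentioned in the note, the only PowerShell 7-specific part is passing a script block to -replace. One possible substitute (a sketch, not taken from the answer) is [regex]::Replace with a script block cast to a MatchEvaluator delegate. The separator line is rebuilt here with PadRight purely so the example is self-contained:

```powershell
$sepChar = '-'

# Separator line with the sample input's column widths (12, 12, 19, 34, rest of line).
$line = ('-' * 7).PadRight(12) + ('-' * 8).PadRight(12) + ('-' * 7).PadRight(19) +
        ('-' * 13).PadRight(34) + ('-' * 18)

# Same pattern as above: a run of separator chars plus the gap that follows it.
$pattern = '{0}+[^{0}]*' -f [regex]::Escape($sepChar)

# The delegate receives each Match; every column but the last gets a fixed-width group.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
    param($m)
    if ($m.Index + $m.Length -lt $line.Length) { "(.{$($m.Length)})" } else { '(.*)' }
}

$reCaptureColumns = [regex]::Replace($line, $pattern, $evaluator)
$reCaptureColumns
```

The rest of the script (the switch on ReadCount and the use of $Matches) should not need changes for this substitution.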