Need to replace runs of 13 spaces in one very long line of a text file

Question  Votes: 2  Answers: 2

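I have a file (1.8 MB) that consists of a single, very long line of text. The values on that line are usually separated by 13 spaces. I want to replace each of these 13-space separators with a pipe (|) so that I can process the text file with SSIS.

So far I have not managed to process this file programmatically with a batch file.

I tried the code below, which I took from another SO post.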

@echo off
REM create empty file:
break>R1.txt
setlocal enabledelayedexpansion
REM prevent empty lines by adding line numbers (find /v /n "")
REM parse the file, taking the second token (*, %%b) with delimiters
REM ] (to eliminate line numbers) and space (to eliminate leading spaces)
for /f "tokens=1,* delims=] " %%a in ('find /v /n "" ^<PXZP_SND_XZ01_GFT10553.dat') do (
  call :sub1 "%%b"
  REM write the string without quotes:
  REM removing the quotes from the string would make the special chars poisonous again
  >>PXZP_SND_XZ01_GFT10553.dat echo(!s:"=!
)

REM Show the written file:
type PXZP_SND_XZ01_GFT10553.dat 
goto :eof

:sub1
set S=%*
REM do 13 times (adapt to your Needs):
for /l %%i in (1,1,13) do (
  REM replace "space quote" with "quote" (= removing the last space
  set S=!S: "=|!
)
goto :eof

Can anyone help me? A sample of my text file:

96859471/971 AAAA HAWAII               96860471/971 BBBB HAWAII               96861471/971 CCCC HAWAII               96863471/971 DDDD HAWAII               
batch-file cmd str-replace
2 Answers
2
votes

Use the right tool for the job.

Set Inp = wscript.Stdin
Set Outp = wscript.Stdout
Outp.Write Replace(Inp.ReadAll, "             ", "|")

To use:

cscript //nologo "C:\Replace13Spaces.vbs" < "c:\folder\inputfile.txt" > "C:\Folder\Outputfile.txt"

Or use a regular expression to replace two or more whitespace characters with a bar.

Set Inp = wscript.Stdin
Set Outp = wscript.Stdout
Set regEx = New RegExp
regEx.Pattern = "\s{2,}"
regEx.IgnoreCase = True
regEx.Global = True
Outp.Write regEx.Replace(Inp.ReadAll, "|")
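
Why the regular expression is the more forgiving of the two can be seen on a line whose separators are wider than 13 spaces. The following is a minimal sketch in Python rather than VBScript, purely for illustration; the `sample` string and its 15-space gaps are assumptions made up for this example, not data taken from the asker's file.

```python
import re

# Hypothetical sample: two values, each followed by a 15-space gap (assumed width).
gap = " " * 15
sample = "96859471/971 AAAA HAWAII" + gap + "96860471/971 BBBB HAWAII" + gap

# Fixed-width replacement: only the first 13 spaces of each gap become a bar.
exact = sample.replace(" " * 13, "|")

# Same pattern as in the answer: any run of two or more whitespace characters.
by_regex = re.sub(r"\s{2,}", "|", sample)

print(repr(exact))
print(repr(by_regex))
```

With these assumed inputs the fixed-width version leaves two stray spaces after every bar, while the regex version does not. Both produce a trailing bar for the trailing gap. Note that `\s` also matches tabs and line breaks, so for a file that contains real line breaks a pattern such as ` {2,}` may be the safer choice.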

There are two other ways to do this.

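  1. As in the first approach, call Replace several times, going from the longest to the shortest predefined run of spaces, e.g. 13, 10, 8, or 5 spaces.
  2. Split on 2 spaces. Filter the array to exclude empty array elements. Then Join the array with | as the delimiter.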
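
The two alternatives above can be sketched as follows. This is Python rather than VBScript and only illustrates the idea; the function names, the default widths (taken from the example widths in item 1) and the test line are assumptions. The second function also strips each piece, a step the answer does not mention, because splitting an odd number of spaces into pairs leaves one space attached to the next value.

```python
def replace_longest_first(text, widths=(13, 10, 8, 5), bar="|"):
    """Approach 1: replace runs of spaces, longest predefined width first."""
    for width in sorted(widths, reverse=True):
        text = text.replace(" " * width, bar)
    return text


def split_filter_join(text, bar="|"):
    """Approach 2: split on two spaces, drop empty pieces, join with a bar."""
    # strip() is an addition: it removes the single space left over from an odd-length run
    pieces = (piece.strip() for piece in text.split("  "))
    return bar.join(piece for piece in pieces if piece)


# Hypothetical test line: three values separated by exactly 13 spaces.
line = "A 1" + " " * 13 + "B 2" + " " * 13 + "C 3"
print(replace_longest_first(line))
print(split_filter_join(line))
```

Single spaces inside a value survive in both variants, since neither touches a lone space between two non-space characters.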

4
votes

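A `for /F` loop cannot handle lines longer than about 8190 characters. However, there is a way to read files with longer lines: use `set /P` in a loop, together with input redirection `<`. `set /P` reads at most 1023 characters unless a line break or the end of the file is encountered first; executing it multiple times against the same opened (input-redirected) file handle makes it possible to read a very long line in portions of 1023 characters, because `set /P` does not reset the file pointer.

Another challenge is returning (echoing) the very long line, which is again impossible with the `echo` command because of the line limit of about 8190 characters (it applies to command lines and to variable contents). Block-wise processing helps here too: first, get the end-of-file character (EOF, ASCII 0x1A); then take a portion of the text, append an EOF and write the result to a temporary file using `echo` (which appends a line break) together with output redirection `>`; next, copy the file onto itself using `copy`, but read it in ASCII text mode to discard the EOF and everything after it (hence the line break previously appended by `echo`), and write it in binary mode to get an exact copy of the resulting data; finally, output the file content using `type`.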

The following script uses these techniques (see all the explanatory `rem` remarks in the code):

@echo off
setlocal EnableExtensions DisableDelayedexpansion

rem // Define constants here:
set "_INPUT=.\PXZP_SND_XZ01_GFT10553.dat" & rem // (this is the input file)
set "_OUTPUT=.\R1.txt" & rem // (set to `con` to display the result on the console)
set "_TEMPF=%TEMP%\%~n0_%RANDOM%.tmp" & rem // (specifies a temporary file)
set "_SEARCH=     " & rem // (this is the string to be found)
set "_REPLAC=|" & rem // (this is the replacement string)
set "_LTRIM=#" & rem // (set to something to left-trim sub-strings)
(set _LF=^
%= blank line =%
) & rem // (this block stores a new-line character in a variable)
rem // This stores an end-of-file character in a variable:
for /F %%E in ('forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x1A"') do set "_EOF=%%E"

rem /* The input file is going to be processed in a sub-routine,
rem    which accesses the file content via input redirection `<`: */
< "%_INPUT%" > "%_OUTPUT%" call :PROCESS

endlocal
exit /B


:PROCESS
    rem // Reset variables that store a partial string to be processed and a separator:
    set "PART=" & set "SEP="
    setlocal EnableDelayedExpansion
:READ
    rem /* At this point 1023 characters are read from the input file at most, until
    rem    a line-break or the end of the file is encountered:*/
    set "NEW=" & set /P NEW=""
    rem // The read characters are appended to a string buffer that will be processed:
    set "PART=!PART!!NEW!"
    rem /* Skip processing when the string buffer is empty, which is the case when the end
    rem    of the file has already been reached: */
:LOOP
    if defined PART (
        rem /* Make the search string accessible as a `for` meta-variable reference in
        rem    order not to have to use normal (immediate) `%`-expansion, which could cause
        rem    trouble with some special characters under some circumstances: */
        for /F delims^=^ eol^= %%K in ("!_SEARCH!") do (
            rem /* Try to split the string buffer at the first search string and store the
            rem    portion at the right, using sub-string substitution: */
            set "RIGHT=!PART:*%%K=!"
            rem /* Check whether the split was successful, hence whether a search string
            rem    even occurred in the string buffer; if not, jump back and read more
            rem    characters; otherwise (when the end of the file was reached) clear the
            rem    right portion and continue processing: */
            if "!RIGHT!"=="!PART!" if not defined NEW (set "RIGHT=") else goto :READ
            rem /* Clear the variable that will receive the portion left to the first
            rem    occurrence of the search string in the string buffer; then replace each
            rem    occurrence in the string buffer by a new-line character: */
            set "LEFT=" & set ^"PART=!PART:%%K=^%_LF%%_LF%!^"
            rem /* Iterate over all lines of the altered string buffer, which is now a
            rem    multi-line string, then get the first line, which constitutes the
            rem    portion at the left of the first search string; the (first) line is
            rem    preceded by an `_` just for it not to appear blank, because `for /F`
            rem    skips over empty lines; this character is removed later: */
            for /F delims^=^ eol^= %%L in (^"_!PART!^") do (
                rem // Execute the loop body only for the first iteration:
                if not defined LEFT (
                    rem /* Store the (augmented) left portion with delayed expansion
                    rem    disabled in order not to get trouble with `!` in the string: */
                    setlocal DisableDelayedExpansion & set "LEFT=%%L"
                    rem // Enable delayed expansion to be able to safely echo the string:
                    setlocal EnableDelayedExpansion
                    rem /* Write to a temporary file the output string, which consists of
                    rem    a replacement string (except for the very first time), the left
                    rem    portion with the preceding `_` removed and an end-of-file
                    rem    character; a line-break is automatically appended by `echo`: */
                    > "!_TEMPF!" echo(!SEP!!LEFT:~1!%_EOF%
                    rem /* Copy the temporary file onto itself, but remove the end-of-file
                    rem    character and everything after, then type the file content;
                    rem    this is a safe way of echoing a string without a line-break: */
                    > nul copy /Y /A "!_TEMPF!" + nul "!_TEMPF!" /B & type "!_TEMPF!"
                    rem /* Restore the environment present at the beginning of the loop
                    rem    body, then ensure the left portion not to appear empty: */
                    endlocal & endlocal & set "LEFT=_"
                )
            )
            rem // If specified, left-trim the right portion, so remove leading spaces:
            if defined _LTRIM (
                for /F "tokens=* eol= delims= " %%T in ("!RIGHT!_") do (
                    for /F delims^=^ eol^= %%S in (^""!NEW!"^") do (
                        endlocal & set "NEW=%%~S" & set "RIGHT=%%T"
                    )
                    setlocal EnableDelayedExpansion & set "RIGHT=!RIGHT:~,-1!"
                )
            )
            rem // Set the replacement string now to skip it only for the first output:
            set "SEP=!_REPLAC!"
            rem /* Move the right portion into the string buffer; if there is still some
            rem    amount of text left, jump back to find more occurrences of the search
            rem    string; if not, jump back and read more characters, unless the end of
            rem    the file has already been reached: */
            set "PART=!RIGHT!" & if defined PART (
                if defined NEW if "!PART:~1024!"=="" goto :READ
                goto :LOOP
            ) else if defined NEW goto :READ
        )
    )
    endlocal
    rem // Clean up the temporary file:
    del "%_TEMPF%"
    exit /B

The following limitations apply:

  • the string portions between two adjacent search strings (= 5 × SPACE in the approach above) must be shorter than about 8190 characters;
  • the search string must not be empty, must not begin with `*` or `~`, and must not contain `=` or `!`;
  • the replacement string must not contain `=`.
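
The chunk-wise idea behind this answer (read a bounded portion, process what is certainly complete, carry the rest over to the next portion) can also be sketched outside batch. The following is a minimal illustration in Python, not a translation of the script: the function name `replace_in_chunks`, its parameters, and the in-memory streams are assumptions, and the 1023-character default merely echoes the `set /P` limit described above.

```python
import io


def replace_in_chunks(src, dst, search=" " * 13, replacement="|", chunk_size=1023):
    """Copy src to dst, replacing `search` even when it straddles two chunks."""
    if not search:
        raise ValueError("search string must not be empty")
    keep = len(search) - 1  # longest tail that could still be the start of a match
    buffer = ""
    while True:
        chunk = src.read(chunk_size)
        buffer += chunk
        out = []
        pos = 0
        # Replace every complete occurrence currently in the buffer, left to right.
        while True:
            hit = buffer.find(search, pos)
            if hit < 0:
                break
            out.append(buffer[pos:hit])
            out.append(replacement)
            pos = hit + len(search)
        rest = buffer[pos:]
        if not chunk:  # end of input: nothing more can complete a match
            out.append(rest)
            dst.write("".join(out))
            return
        # Hold back the tail that might combine with the next chunk.
        cut = max(len(rest) - keep, 0)
        out.append(rest[:cut])
        buffer = rest[cut:]
        dst.write("".join(out))


# Hypothetical usage with tiny chunks so that separators cross chunk borders.
source = io.StringIO("AAAA" + " " * 13 + "BBBB" + " " * 13 + "CCCC")
target = io.StringIO()
replace_in_chunks(source, target, chunk_size=8)
print(target.getvalue())
```

Holding back `len(search) - 1` characters is what keeps a separator that is split across two reads from being missed; the batch script achieves the same effect by appending each newly read portion to its `PART` buffer before searching again.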