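How do I search a .tsv file for multiple string matches and export them to a database?
What I'm trying to do is search a large file called mdata.tsv (1.5m rows) for each string in an array, then output the column data of the matching rows.
This is the code I'm currently stuck on: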
<?php
$file = fopen("mdata.tsv","r"); //open file
$movies = glob('./uploads/Videos/*/*/*/*.mp4', GLOB_BRACE); //Find all the movies
$movID = array(); //Array for movies IDs
//Get XML and add the IDs to $movID()
foreach ($movies as $movie){
    $pos = strrpos($movie, '/');
    $xml = simplexml_load_file((substr($movie, 0, $pos + 1) .'movie.xml'));
    array_push($movID, $xml->id);
}
//Loop through the TSV rows and search for the $tmdbID then print out the movies category.
foreach ($movID as $tmdbID) {
    while(($row = fgetcsv($file, 0, "\t")) !== FALSE) {
        fseek($file,0);
        $myString = $row[0];
        $b = strstr( $myString, $tmdbID );
        //Dump out the row for the sake of clarity.
        //var_dump($row);
        $myString = $row[0];
        if ($b == $tmdbID){
            echo 'Match ' . $row[0] .' '. $row[8];
        } // Displays movie ID and category
    }
}
fclose($file);
?>
Sample of the tsv file:
tt0043936 movie The Lawton Story The Lawton Story 0 1949 \N \N Drama,Family
tt0043937 short The Prize Pest The Prize Pest 0 1951 \N 7 Animation,Comedy,Family
tt0043938 movie The Prowler The Prowler 0 1951 \N 92 Drama,Film-Noir,Thriller
tt0043939 movie Przhevalsky Przhevalsky 0 1952 \N \N Biography,Drama
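It looks like you can simplify this code by using in_array() instead of the nested loops: read the file once and check whether each row's ID is in the list of IDs you want. One change needed for this to work is that the $movID array must hold strings, so cast $xml->id (which is a SimpleXMLElement object) with (string) before storing it. Dropping the fseek($file,0) call also matters, because it rewound the file after every row, so the loop kept re-reading the first line.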
$file = fopen("mdata.tsv","r"); //open file
$movies = glob('./uploads/Videos/*/*/*/*.mp4', GLOB_BRACE); //Find all the movies
$movID = array(); //Array for movies IDs
//Get XML and add the IDs to $movID()
foreach ($movies as $movie){
    $pos = strrpos($movie, '/');
    $xml = simplexml_load_file((substr($movie, 0, $pos + 1) .'movie.xml'));
    // Store ID as string
    $movID[] = (string) $xml->id;
}
// Read the TSV once and print each row whose ID is in $movID
while(($row = fgetcsv($file, 0, "\t")) !== FALSE) {
    if ( in_array($row[0], $movID) ){
        echo 'Match ' . $row[0] .' '. $row[8] . PHP_EOL; // Displays movie ID and category
    }
}
fclose($file); // Close the file once every row has been read
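With 1.5m rows and a long ID list, in_array() scans the whole list for every row. One possible variation, as a rough sketch only: flip the ID list into array keys so that each check becomes an isset() lookup. findGenresById() is a made-up helper name, not something from the code above. The sketch also swaps fgetcsv() for fgets() plus explode(), which assumes the file is plain tab-separated text with no quoted fields and that the category sits in column index 8, as in the sample rows.

```php
<?php
/**
 * Hypothetical helper: returns [id => category] for every TSV row
 * whose first column is one of the wanted IDs.
 */
function findGenresById(string $tsvPath, array $wantedIds): array
{
    // Flip the list so the IDs become array keys; isset() on a key
    // is a hash lookup instead of a scan through the whole list.
    $wanted = array_flip($wantedIds);
    $matches = [];

    $file = fopen($tsvPath, 'r');
    if ($file === false) {
        throw new RuntimeException("Cannot open $tsvPath");
    }

    while (($line = fgets($file)) !== false) {
        // Assumption: fields are separated by tabs only, with no quoting.
        $row = explode("\t", rtrim($line, "\r\n"));
        // Keep the row only if its ID is wanted and column 8 exists.
        if (isset($wanted[$row[0]], $row[8])) {
            $matches[$row[0]] = $row[8];
        }
    }

    fclose($file);
    return $matches;
}
```

You would call it as findGenresById('mdata.tsv', $movID) after building $movID with the (string) cast, then loop over the returned array to echo each ID and category or to hand the pairs to whatever database code you use.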