I had a small task: batch-compute the total page count for a set of PDF files. I wrote (well, found) a small function for it:
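function getPageTotal($path)
{
    // Open the PDF for reading; give up if it cannot be opened.
    if (!$fp = @fopen($path, 'r')) {
        return false;
    }
    $max = 0;
    while (!feof($fp)) {
        // Read one line (at most 254 bytes per call).
        $line = fgets($fp, 255);
        // Look for "/Count N" entries and keep the largest N seen.
        if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
            preg_match('/[0-9]+/', $matches[0], $matches2);
            if ($max < $matches2[0]) {
                $max = $matches2[0];
            }
        }
    }
    fclose($fp);
    return $max;
}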
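In practice, a handful of files failed to parse. After sampling and inspecting them, I found that every problematic file used Mac-style line endings (a bare carriage return, \r), so the culprit was clearly fgets not recognizing Mac line endings.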
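The failure can be reproduced without a real PDF. Presumably what happens is this: when fgets() does not see a line ending, fgets($fp, 255) simply returns the next 254 bytes, so a "/Count N" entry can be cut in half at a chunk boundary and never match. The sketch below assumes auto_detect_line_endings is at its default (Off); scanCount() is a hypothetical helper written only for this demo, and it reads from an in-memory stream instead of a file.

```php
<?php
// Same scanning idea as getPageTotal(), but on an in-memory stream so the
// effect is easy to reproduce. scanCount() is a hypothetical demo helper.
function scanCount(string $data): int
{
    $fp = fopen('php://memory', 'w+');
    fwrite($fp, $data);
    rewind($fp);

    $max = 0;
    // fgets() returns at most 254 bytes per call when the length is 255.
    while (($line = fgets($fp, 255)) !== false) {
        if (preg_match('/\/Count ([0-9]+)/', $line, $m)) {
            $max = max($max, (int) $m[1]);
        }
    }
    fclose($fp);
    return $max;
}

// 250 bytes of padding push "/Count 12" across the 254-byte chunk boundary
// when the preceding line ending is not recognized.
$padding = str_repeat('x', 250);

$unix = scanCount($padding . "\n/Count 12\n"); // LF: split at "\n", finds 12
$mac  = scanCount($padding . "\r/Count 12\r"); // CR only: "/Co" + "unt 12", finds nothing
```

With LF line endings the entry sits on its own line and is found; with CR-only endings the first read ends in "/Co" and the second starts with "unt 12", so the pattern never matches.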
I checked the manual entry for fgets:
Note: If PHP is not properly recognizing the line endings when reading files either on or created by a Macintosh computer, enabling the auto_detect_line_endings run-time configuration option may help resolve the problem.
I had never paid any attention to the auto_detect_line_endings runtime setting before. OK, a small change:
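function getPageTotal($path)
{
    // Let fgets() also recognize CR-only (Macintosh) line endings.
    ini_set("auto_detect_line_endings", true);
    // Open the PDF for reading; give up if it cannot be opened.
    if (!$fp = @fopen($path, 'r')) {
        return false;
    }
    $max = 0;
    while (!feof($fp)) {
        // Read one line (at most 254 bytes per call).
        $line = fgets($fp, 255);
        // Look for "/Count N" entries and keep the largest N seen.
        if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
            preg_match('/[0-9]+/', $matches[0], $matches2);
            if ($max < $matches2[0]) {
                $max = $matches2[0];
            }
        }
    }
    fclose($fp);
    return $max;
}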
Problem solved!
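One caveat: the auto_detect_line_endings directive has been deprecated since PHP 8.1, so on newer versions it may be worth avoiding line-based reading altogether. Below is a minimal sketch of that alternative, not the approach used in this post: it reads the whole file and scans it with preg_match_all, so the line-ending convention no longer matters. The names getPageTotalFromString() and getPageTotalWholeFile() are hypothetical, and the "/Count N" heuristic is the same as above.

```php
<?php
// Sketch: find the largest "/Count N" value in a string, independent of
// line endings. Hypothetical helper, not part of the original function.
function getPageTotalFromString(string $content): int
{
    $max = 0;
    if (preg_match_all('/\/Count ([0-9]+)/', $content, $matches)) {
        // $matches[1] holds the captured digit strings.
        $max = max(array_map('intval', $matches[1]));
    }
    return $max;
}

// Sketch: whole-file variant; returns false if the file cannot be read.
function getPageTotalWholeFile(string $path)
{
    $content = @file_get_contents($path);
    if ($content === false) {
        return false;
    }
    return getPageTotalFromString($content);
}
```

The trade-off is memory: the entire file is loaded at once, which is fine for ordinary documents but may not suit very large PDFs.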
The manual's explanation of auto_detect_line_endings:
When turned on, PHP will examine the data read by fgets() and file() to see if it is using Unix, MS-Dos or Macintosh line-ending conventions.
This enables PHP to interoperate with Macintosh systems, but defaults to Off, as there is a very small performance penalty when detecting the EOL conventions for the first line, and also because people using carriage-returns as item separators under Unix systems would experience non-backwards-compatible behaviour.
Note: This configuration option was introduced in PHP 4.3.0
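To make the idea of "detecting the EOL convention" concrete, here is a rough userland sketch. It is not PHP's internal implementation; detectLineEnding() and splitLines() are hypothetical names, and the sample passed in is assumed to be a complete string rather than a partial read buffer.

```php
<?php
// Sketch: report the first line ending found in a sample of text.
// Returns "\r\n" (MS-DOS), "\r" (Macintosh), "\n" (Unix), or null if none.
function detectLineEnding(string $sample): ?string
{
    // "\r\n" is listed first so a CRLF pair is not mistaken for a bare CR.
    if (!preg_match('/\r\n|\r|\n/', $sample, $m)) {
        return null;
    }
    return $m[0];
}

// Sketch: split text into lines regardless of which convention it uses.
function splitLines(string $data): array
{
    return preg_split('/\r\n|\r|\n/', $data);
}
```

A caller could use detectLineEnding() on the first chunk of a file to decide how to split the rest, or simply use splitLines() when the content is already in memory.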