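Documentation for the Baidu OCR service: Baidu OCR documentation
Baidu's API endpoints require authentication.
For details on Baidu's authentication, see: Baidu API authentication
Here is a brief outline of how Baidu API authentication works:
The authentication field can be carried either in the header of your HTTP request or in the URL. I put it in the request header.
Add a field named Authorization to the request headers; its value is the signed authentication string.
The authentication string is generated according to the following rules:
It is built from three main parts: CanonicalRequest, SigningKey and Signature. The Signature is computed from the first two with HMAC-SHA256 (a keyed hash, not encryption).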
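Authentication string format: bce-auth-v1/{accessKeyId}/{timestamp}/{expirationPeriodInSeconds}/{signedHeaders}/{Signature}
bce-auth-v1: Baidu's authentication protocol prefix
accessKeyId: your AK (Access Key ID)
timestamp: the time at which the signature takes effect, in UTC
expirationPeriodInSeconds: how long the signature stays valid, in seconds (1800 by default)
signedHeaders: the header fields that are included in the signature
Signature: the signature string
For the detailed generation process, see the official documentation: Generating the authentication string
Baidu already provides a demo that generates the authentication string, so there is no need to write it by hand. Authentication string generation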
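The fields above can be put together as in the minimal sketch below. It mirrors the steps the official demo performs, but it is only an illustration under stated assumptions: the AK/SK pair is the sample pair from Baidu's demo, the timestamp is fixed, only the Host header is signed, and the variable names are my own.

```php
<?php
// Sample credentials taken from Baidu's demo; replace with your own AK/SK.
$accessKeyId = '0b0f67dfb88244b289b72b142befad0c';
$secretAccessKey = 'bad522c2126a4618a8125f4b6cf6356f';

// The timestamp must be UTC in the form YYYY-MM-DDThh:mm:ssZ (fixed here for illustration).
$timestamp = gmdate('Y-m-d\TH:i:s\Z', 1430123029);
$expirationPeriodInSeconds = 1800;

// Step 1: derive the SigningKey from the SK and the auth string prefix.
$authStringPrefix = "bce-auth-v1/$accessKeyId/$timestamp/$expirationPeriodInSeconds";
$signingKey = hash_hmac('sha256', $authStringPrefix, $secretAccessKey);

// Step 2: build the CanonicalRequest: method, URI, query string, headers.
// Assumption: an OCR POST with no query parameters, signing only the Host header.
$canonicalRequest = implode("\n", array(
    'POST',
    '/api/v1/ocr/general',
    '',
    'host:word.bj.baidubce.com',
));

// Step 3: the Signature is the HMAC-SHA256 of the CanonicalRequest under the SigningKey.
$signature = hash_hmac('sha256', $canonicalRequest, $signingKey);

// Step 4: assemble the final authentication string.
$signedHeaders = 'host';
$authorization = "$authStringPrefix/$signedHeaders/$signature";
echo $authorization, PHP_EOL;
```

The resulting string is what goes into the Authorization header. In a real request the timestamp would be the current time and the canonical headers would match the headers actually sent.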
The PHP demo:
<?php
/*
* Copyright (c) 2014 Baidu.com, Inc. All Rights Reserved
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You may obtain a copy of
* the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations under
* the License.
*/
namespace BaiduBce\Auth;
class SignOption
{
const EXPIRATION_IN_SECONDS = 'expirationInSeconds';
const HEADERS_TO_SIGN = 'headersToSign';
const TIMESTAMP = 'timestamp';
const DEFAULT_EXPIRATION_IN_SECONDS = 1800;
const MIN_EXPIRATION_IN_SECONDS = 300;
const MAX_EXPIRATION_IN_SECONDS = 129600;
}
class HttpUtil
{
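// Per RFC 3986, every character must be percent-encoded except:
// 1. upper- and lower-case ASCII letters
// 2. decimal digits
// 3. period '.', tilde '~', hyphen '-' and underscore '_'
public static $PERCENT_ENCODED_STRINGS;
// Populate the encoding lookup table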
public static function __init()
{
HttpUtil::$PERCENT_ENCODED_STRINGS = array();
for ($i = 0; $i < 256; ++$i) {
HttpUtil::$PERCENT_ENCODED_STRINGS[$i] = sprintf("%%%02X", $i);
}
// a-z are not encoded
foreach (range('a', 'z') as $ch) {
HttpUtil::$PERCENT_ENCODED_STRINGS[ord($ch)] = $ch;
}
// A-Z are not encoded
foreach (range('A', 'Z') as $ch) {
HttpUtil::$PERCENT_ENCODED_STRINGS[ord($ch)] = $ch;
}
// 0-9 are not encoded
foreach (range('0', '9') as $ch) {
HttpUtil::$PERCENT_ENCODED_STRINGS[ord($ch)] = $ch;
}
// The following 4 characters are not encoded
HttpUtil::$PERCENT_ENCODED_STRINGS[ord('-')] = '-';
HttpUtil::$PERCENT_ENCODED_STRINGS[ord('.')] = '.';
HttpUtil::$PERCENT_ENCODED_STRINGS[ord('_')] = '_';
HttpUtil::$PERCENT_ENCODED_STRINGS[ord('~')] = '~';
}
// '/' must not be encoded when encoding a URI path
public static function urlEncodeExceptSlash($path)
{
return str_replace("%2F", "/", HttpUtil::urlEncode($path));
}
// Encode using the lookup table
public static function urlEncode($value)
{
$result = '';
for ($i = 0; $i < strlen($value); ++$i) {
$result .= HttpUtil::$PERCENT_ENCODED_STRINGS[ord($value[$i])];
}
return $result;
}
// Build the canonical query string
public static function getCanonicalQueryString(array $parameters)
{
// No parameters: return an empty string
if (count($parameters) == 0) {
return '';
}
$parameterStrings = array();
foreach ($parameters as $k => $v) {
// Skip the Authorization field
if (strcasecmp('Authorization', $k) == 0) {
continue;
}
if (!isset($k)) {
throw new \InvalidArgumentException(
"parameter key should not be null"
);
}
if (isset($v)) {
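// With a value: encode key and value and place them on either side of '='
$parameterStrings[] = HttpUtil::urlEncode($k)
. '=' . HttpUtil::urlEncode((string) $v);
} else {
// Without a value: encode only the key, put it left of '=' and leave the right side empty
$parameterStrings[] = HttpUtil::urlEncode($k) . '=';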
}
}
// Sort lexicographically
sort($parameterStrings);
// Join them with '&'
return implode('&', $parameterStrings);
}
// Build the canonical URI
public static function getCanonicalURIPath($path)
{
// An empty path becomes '/'
if (empty($path)) {
return '/';
} else {
// Every URI must start with '/'
if ($path[0] == '/') {
return HttpUtil::urlEncodeExceptSlash($path);
} else {
return '/' . HttpUtil::urlEncodeExceptSlash($path);
}
}
}
// Build the canonical HTTP header string
public static function getCanonicalHeaders($headers)
{
// No headers: return an empty string
if (count($headers) == 0) {
return '';
}
$headerStrings = array();
foreach ($headers as $k => $v) {
// Skip entries whose key is null
if ($k === null) {
continue;
}
// A null value is treated as an empty string
if ($v === null) {
$v = '';
}
// Trim, then encode, then join name and value with ':'
$headerStrings[] = HttpUtil::urlEncode(strtolower(trim($k))) . ':' . HttpUtil::urlEncode(trim($v));
}
// Sort lexicographically
sort($headerStrings);
// Join them with '\n'
return implode("\n", $headerStrings);
}
}
HttpUtil::__init();
class SampleSigner
{
const BCE_AUTH_VERSION = "bce-auth-v1";
const BCE_PREFIX = 'x-bce-';
// When headersToSign is not specified, the following HTTP headers are signed by default:
// 1.host
// 2.content-length
// 3.content-type
// 4.content-md5
public static $defaultHeadersToSign;
public static function __init()
{
SampleSigner::$defaultHeadersToSign = array(
"host",
"content-length",
"content-type",
"content-md5",
);
}
// Signing function
public function sign(
array $credentials,
$httpMethod,
$path,
$headers,
$params,
$options = array()
) {
// Set how long the signature stays valid
if (!isset($options[SignOption::EXPIRATION_IN_SECONDS])) {
// Default: 1800 seconds
$expirationInSeconds = SignOption::DEFAULT_EXPIRATION_IN_SECONDS;
} else {
$expirationInSeconds = $options[SignOption::EXPIRATION_IN_SECONDS];
}
// Read the AK and SK
$accessKeyId = $credentials['ak'];
$secretAccessKey = $credentials['sk'];
// Set the timestamp. Note: a timestamp you supply yourself must be in UTC
if (!isset($options[SignOption::TIMESTAMP])) {
// Default: the current time
$timestamp = new \DateTime();
} else {
$timestamp = $options[SignOption::TIMESTAMP];
}
$timestamp->setTimezone(new \DateTimeZone("UTC"));
// Build authString
$authString = SampleSigner::BCE_AUTH_VERSION . '/' . $accessKeyId . '/'
. $timestamp->format("Y-m-d\TH:i:s\Z") . '/' . $expirationInSeconds;
// Derive the signing key from the SK and authString
$signingKey = hash_hmac('sha256', $authString, $secretAccessKey);
// Build the canonical URI
$canonicalURI = HttpUtil::getCanonicalURIPath($path);
// Build the canonical query string
$canonicalQueryString = HttpUtil::getCanonicalQueryString($params);
// Fill in headersToSign, i.e. state which headers take part in the signature
$headersToSign = null;
if (isset($options[SignOption::HEADERS_TO_SIGN])) {
$headersToSign = $options[SignOption::HEADERS_TO_SIGN];
}
// Build the canonical headers
$canonicalHeader = HttpUtil::getCanonicalHeaders(
SampleSigner::getHeadersToSign($headers, $headersToSign)
);
// Join the names in headersToSign with ';'
$signedHeaders = '';
if ($headersToSign !== null) {
$signedHeaders = strtolower(
trim(implode(";", array_keys($headersToSign)))
);
}
// Assemble the canonical request
$canonicalRequest = "$httpMethod\n$canonicalURI\n"
. "$canonicalQueryString\n$canonicalHeader";
// Sign the canonical request with the signing key
$signature = hash_hmac('sha256', $canonicalRequest, $signingKey);
// Assemble the final authentication string
$authorizationHeader = "$authString/$signedHeaders/$signature";
return $authorizationHeader;
}
// Select the headers that should be signed, based on headersToSign
public static function getHeadersToSign($headers, $headersToSign)
{
// Headers whose value is empty after trimming are not signed
$filter_empty = function($v) {
return trim((string) $v) !== '';
};
$headers = array_filter($headers, $filter_empty);
// Normalize the header keys: strip surrounding whitespace and lower-case them
$trim_and_lower = function($str){
return strtolower(trim($str));
};
$temp = array();
$process_keys = function($k, $v) use(&$temp, $trim_and_lower) {
$temp[$trim_and_lower($k)] = $v;
};
array_map($process_keys, array_keys($headers), $headers);
$headers = $temp;
// Keep the header keys for later use
$header_keys = array_keys($headers);
$filtered_keys = null;
if ($headersToSign !== null) {
// If headersToSign is given, filter by it
// Pre-process headersToSign: strip surrounding whitespace and lower-case
$headersToSign = array_map($trim_and_lower, $headersToSign);
// Keep only the headers listed in headersToSign
$filtered_keys = array_intersect_key($header_keys, $headersToSign);
} else {
// Without headersToSign, select headers by the default rule
$filter_by_default = function($k) {
return SampleSigner::isDefaultHeaderToSign($k);
};
$filtered_keys = array_filter($header_keys, $filter_by_default);
}
// Return the headers that take part in the signature
return array_intersect_key($headers, array_flip($filtered_keys));
}
// Check whether a header is signed by default:
// 1. it is one of host, content-type, content-md5, content-length
// 2. it starts with x-bce-
public static function isDefaultHeaderToSign($header)
{
$header = strtolower(trim($header));
if (in_array($header, SampleSigner::$defaultHeadersToSign)) {
return true;
}
return substr_compare($header, SampleSigner::BCE_PREFIX, 0, strlen(SampleSigner::BCE_PREFIX)) == 0;
}
}
SampleSigner::__init();
// Signing example
$signer = new SampleSigner();
$credentials = array("ak" => "0b0f67dfb88244b289b72b142befad0c","sk" => "bad522c2126a4618a8125f4b6cf6356f");
$httpMethod = "PUT";
$path = "/v1/test/myfolder/readme.txt";
$headers = array("Host" => "bj.bcebos.com",
"Content-Length" => 8,
"Content-MD5" => "NFzcPqhviddjRNnSOGo4rw==",
"Content-Type" => "text/plain",
"x-bce-date" => "2015-04-27T08:23:49Z");
$params = array("partNumber" => 9, "uploadId" => "a44cc9bab11cbd156984767aad637851");
date_default_timezone_set("PRC");
$timestamp = new \DateTime();
$timestamp->setTimestamp(1430123029);
$options = array(SignOption::TIMESTAMP => $timestamp);
$ret = $signer->sign($credentials, $httpMethod, $path, $headers, $params, $options);
print $ret;
So all we need to do is switch the demo from its default BOS endpoint to the OCR endpoint:
$signer = new SampleSigner();
$credentials = array("ak" => "your AK","sk" => "your SK");
$httpMethod = "POST";
$host= "word.bj.baidubce.com";
$path = "/api/v1/ocr/general";
$url="http://".$host.$path;
date_default_timezone_set('UTC');
$bceDate = date("Y-m-d") . "T" . date("H:i:s") . "Z";
$headers = array(
"host" => $host,
"x-bce-date" => $bceDate);
$params = array();
date_default_timezone_set("PRC");
$timestamp = new \DateTime();
$timestamp->setTimestamp(time());
$options = array(SignOption::TIMESTAMP => $timestamp); // You can also choose which header fields to sign here, e.g. SignOption::HEADERS_TO_SIGN=>array("field1"=>"value","field2"=>"value")
$ret = $signer->sign($credentials, $httpMethod, $path, $headers, $params, $options);
// With the signed authentication string in hand, prepare the API request
$tempfile="test.jpg";
$handle = fopen($tempfile,'rb');
$file_content = fread($handle,filesize($tempfile));
fclose($handle);
$encoded = base64_encode($file_content);
$data="image=".urlencode($encoded);
$head = array(
"host:{$host}",
"x-bce-date:{$bceDate}",
"Authorization:{$ret}",
"content-type: application/x-www-form-urlencoded"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $head);
curl_setopt($ch, CURLOPT_POSTFIELDS,$data);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST' );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
$output = curl_exec($ch);
curl_close($ch);
print_r($output);
However, the official demo has a small pitfall: calls to the OCR endpoint kept failing with
BecResponseException{httpStatus='401', requestId='null', code='AuthError', message='Bad signature or AK string and SK string do not match.'}
The fix is as follows.
Change the official demo code like this:
// Fill in headersToSign, i.e. state which headers take part in the signature
$headersToSign = null;
if (isset($options[SignOption::HEADERS_TO_SIGN])) {
$headersToSign = $options[SignOption::HEADERS_TO_SIGN];
}
// Keep the filtered headers so that signedHeaders is no longer left empty
$headersToSign=SampleSigner::getHeadersToSign($headers, $headersToSign);
// Build the canonical headers
$canonicalHeader = HttpUtil::getCanonicalHeaders($headersToSign);
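The reason is as follows.
Baidu's documentation states:
CanonicalHeaders: the result of selectively encoding the header section of the HTTP request.
You may decide for yourself which headers to encode. The only requirement of the Baidu Cloud API is that the Host field must be encoded. In most cases we recommend encoding the following headers:
Host
Content-Length
Content-Type
Content-MD5
All headers that start with x-bce-
If not all of these headers appear in your HTTP request, the ones that are absent need not be encoded.
If you encode exactly the recommended set, {signedHeaders} in the authentication string can simply be left empty.
You may also choose your own set of headers to encode. If you encode a header outside the recommended set, or your HTTP request contains a recommended header that you choose not to encode, then you must fill in {signedHeaders} in the authentication string. To do so, convert the names of all headers encoded at this stage to lower case, sort them lexicographically, and join them with semicolons (;).
In other words, signing Host, Content-Length, Content-Type, Content-MD5 and the headers that start with x-bce- is recommended, and if all of them are signed, signedHeaders may be empty. The demo Baidu provides inspects the header array you pass in and automatically signs any of these fields it finds, so it leaves signedHeaders empty.
In practice, however, an empty signedHeaders causes the OCR endpoint to reject the request with the bad signature error above.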
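To make the rule from Baidu's documentation concrete, here is a minimal sketch of how a non-empty signedHeaders value can be derived from the headers that were signed. The helper name buildSignedHeaders is hypothetical and not part of Baidu's demo. Unlike the demo, it also sorts the names, as the quoted documentation asks.

```php
<?php
// Hypothetical helper (not in Baidu's demo): build the {signedHeaders} value
// from an array of header name => value pairs that were signed.
function buildSignedHeaders(array $headersToSign)
{
    // Lower-case and trim every header name.
    $names = array_map(function ($name) {
        return strtolower(trim($name));
    }, array_keys($headersToSign));
    // Sort lexicographically, then join with ';'.
    sort($names, SORT_STRING);
    return implode(';', $names);
}

// The two headers signed in the OCR example; the values are illustrative.
$signed = array(
    'x-bce-date' => '2015-04-27T08:23:49Z',
    'Host' => 'word.bj.baidubce.com',
);
echo buildSignedHeaders($signed), PHP_EOL; // host;x-bce-date
```

With the modified demo, the same value comes from applying array_keys to the filtered header array, which is why the fix stores the result of getHeadersToSign back into $headersToSign.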
Original article: https://www.blear.cn/article/baidu-ocr-demo
When reposting, please credit the source with a link.