Getting Error Handling Right in gRPC

sofia, 2022-01-19, Related Technology

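Contents

Code Example
Error Handling in gRPC
Passing Error Metadata Using gRPC Metadata
Google's Richer Error Model
Using a Custom Error Model
Global Interceptor for Error Handling
Using a Spring Interceptor
Summary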

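Getting error handling right can be tricky, and even more so in gRPC. The current version of gRPC has only limited built-in error handling, based on simple status codes and metadata. In this article, we look at the limitations of gRPC error handling and how to overcome them to build a robust error handling framework.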

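Code Example

The working code example for this article is available on GitHub. To run the example, clone the repository at https://github.com/techdozo/grpc/tree/master/grpc-spring-boot and import grpc-spring-boot as a project into your favorite IDE.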

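The code example consists of two microservices:

Product Gateway: acts as an API gateway (a client of Product Service) and exposes a REST API (the product API gateway Gradle module)
Product Service: exposes a gRPC API (the product service Gradle module)

There is also a third Gradle module, commons, which contains the common exceptions used by both Product Gateway and Product Service.

You can start the services from your IDE by running the main methods of ProductGatewayApplication and ProductApplication respectively.

You can test the application by calling the Product Gateway API as follows: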

curl --location --request GET 'http://localhost:8080/products/32c29935-da42-4801-825a-ac410584c281' \
--data-raw ''

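Error Handling in gRPC

By default, gRPC relies heavily on status codes (https://grpc.io/docs/guides/error/#error-status-codes) for error handling. This approach has some drawbacks, which are easiest to understand through an example.

In our sample application, the server-side Product Service exposes the gRPC method getProduct. This API fetches the product from ProductRepository and returns the response to the client, as shown below: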

public void getProduct(
    GetProductRequest request, StreamObserver<GetProductResponse> responseObserver)  {

  String productId = request.getProductId();
  var product = productRepository.get(productId);

  var response =
      GetProductResponse.newBuilder()
          .setName(product.getName())
          .setDescription(product.getDescription())
          .setPrice(product.getPrice())
          .setUserId(product.getUserId())
          .build();
  responseObserver.onNext(response);
  responseObserver.onCompleted();
  log.info("Finished calling Product API service..");
}

ProductRepository fetches the data from productStorage and returns the product, or throws an error if the product is not found:

public Product get(String productId) {
  var product = Optional.ofNullable(productStorage.get(productId));
  return product.orElseThrow(() -> new ResourceNotFoundException("Product ID not found"));
}

You might ask why we need to throw a custom exception at all, and why we cannot simply throw the gRPC-specific StatusRuntimeException, like this:

product.orElseThrow(() -> Status.NOT_FOUND.withDescription("Product ID not found").asRuntimeException());

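The biggest benefit is separation of concerns. You don't want to pollute your business logic with gRPC-specific code that belongs to the transport (API) layer.

The responsibility of the client application (Product Gateway) is to call the server application and convert the received response into a domain object. If an error occurs, it simply wraps the error in a domain-specific exception, as ServiceException(error.getCause()), and throws it upstream for handling.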

//Client call
public Product getProduct(String productId) {
  Product product = null;
  try {
    var request = GetProductRequest.newBuilder().setProductId(productId).build();
    var productApiServiceBlockingStub = ProductServiceGrpc.newBlockingStub(managedChannel);
    var response = productApiServiceBlockingStub.getProduct(request);
    // Map to domain object
    product = ProductMapper.MAPPER.map(response);
  } catch (StatusRuntimeException error) {
    log.error("Error while calling product service, cause {}", error.getMessage());
    throw new ServiceException(error.getCause());
  }
  return product;
}

This looks simple, but there is a catch. If an error occurs, you will see the following on the client side:

io.grpc.StatusRuntimeException: UNKNOWN

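Why do we see a StatusRuntimeException with status UNKNOWN?

gRPC wraps the custom exception ResourceNotFoundException in a StatusRuntimeException, discards the error message, and assigns the default status code UNKNOWN.

We can improve the error handling by catching ResourceNotFoundException in the server-side service and calling responseObserver.onError(...), as follows: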

//Server Product Service API
public void getProduct(
    GetProductRequest request, StreamObserver<GetProductResponse> responseObserver) {
  String productId = request.getProductId();
  try {
    var product = productRepository.get(productId);
    var response =
        GetProductResponse.newBuilder()
            .setName(product.getName())
            .setDescription(product.getDescription())
            .setPrice(product.getPrice())
            .setUserId(product.getUserId())
            .build();
    responseObserver.onNext(response);
    responseObserver.onCompleted();
  } catch (ResourceNotFoundException error) {
    log.error("Product id, {} not found", productId);
    var status = Status.NOT_FOUND.withDescription(error.getMessage()).withCause(error);
    responseObserver.onError(status.asException());
  }
  log.info("Finished calling Product API service..");
}

On the client side, you will now see:

Error while calling product service, cause NOT_FOUND: Product ID not found

Notice that the client does not receive the original ResourceNotFoundException thrown by the server. In fact, error.getCause() on the client returns null:

throw new ServiceException(error.getCause()); //error.getCause() is null

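Why?

According to the official documentation of Status.withCause(Throwable) (https://grpc.github.io/grpc-java/javadoc/io/grpc/Status.html#withCause-java.lang.Throwable-), the cause is not transmitted from server to client:

Create a derived instance of Status with the given cause. However, the cause is not transmitted from server to client.

grpc-java documentation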

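Passing Error Metadata Using gRPC Metadata

But what if you need to pass some error metadata back to the client? For example, in our sample application we may want to pass the product ID and a standard error message when an error occurs. This can be achieved with gRPC metadata.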

public Product get(String productId) {
  var product = Optional.ofNullable(productStorage.get(productId));

  return product.orElseThrow(
      () ->
          new ResourceNotFoundException(
              "Product ID not found",
              Map.of("resource_id", productId, "message", "Product ID not found")));
}

Fortunately, the ResourceNotFoundException class has an overloaded constructor that accepts additional error metadata: ResourceNotFoundException(String message, Map<String, String> errorMetaData).
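
For reference, such an exception could look like the minimal sketch below. The actual class lives in the commons module of the sample repository and may differ; the field name, the single-argument constructor delegating to the overloaded one, and the defensive copy are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch of the custom exception described above; the real class
// in the sample repository may be implemented differently.
class ResourceNotFoundException extends RuntimeException {

  private final Map<String, String> errorMetaData;

  // Message-only variant, as used in the first ProductRepository example.
  ResourceNotFoundException(String message) {
    this(message, Map.of());
  }

  // Overloaded variant that also carries error metadata.
  ResourceNotFoundException(String message, Map<String, String> errorMetaData) {
    super(message);
    // Immutable copy so callers cannot modify the metadata afterwards.
    this.errorMetaData = Map.copyOf(errorMetaData);
  }

  Map<String, String> getErrorMetaData() {
    return errorMetaData;
  }
}
```

Because the exception is unchecked, ProductRepository.get(...) can throw it without declaring it, which keeps the business logic free of transport-specific types.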

We can change the Product Service API call to catch ResourceNotFoundException and call responseObserver.onError(statusRuntimeException) with the metadata attached, as follows:

public void getProduct(
    GetProductRequest request, StreamObserver<GetProductResponse> responseObserver) {
  String productId = request.getProductId();
  try {
    var product = productRepository.get(productId);
    var response =
        GetProductResponse.newBuilder()
            .setName(product.getName())
            .setDescription(product.getDescription())
            .setPrice(product.getPrice())
            .setUserId(product.getUserId())
            .build();
    responseObserver.onNext(response);
    responseObserver.onCompleted();
  } catch (ResourceNotFoundException error) {
    log.error("Product id, {} not found", productId);
    var errorMetaData = error.getErrorMetaData();
    var metadata = new Metadata();    
    errorMetaData.entrySet().stream() 
        .forEach(
            entry ->
                metadata.put(
                    Metadata.Key.of(entry.getKey(), Metadata.ASCII_STRING_MARSHALLER),
                    entry.getValue()));
    var statusRuntimeException =
        Status.NOT_FOUND.withDescription(error.getMessage()).asRuntimeException(metadata); 
    responseObserver.onError(statusRuntimeException);
  }
  log.info("Finished calling Product API service..");
}

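Let's walk through what is happening here.

Get the error metadata from the custom ResourceNotFoundException by calling error.getErrorMetaData().
For each key-value pair of the error metadata, create a key with Metadata.Key.of(entry.getKey(), Metadata.ASCII_STRING_MARSHALLER).
Store the key-value pair in the metadata by calling metadata.put(key, value).
Create a StatusRuntimeException by passing the metadata to Status.asRuntimeException(metadata).
Call responseObserver.onError(statusRuntimeException) to signal the error condition.

On the client side, you can catch StatusRuntimeException and read the metadata from the error, as follows: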

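} catch (StatusRuntimeException error) {

  // getTrailers() may return null when the error carries no metadata
  Metadata trailers = error.getTrailers();
  Set<String> keys = trailers == null ? Set.of() : trailers.keys();

  for (String key : keys) {
    // Keys ending in "-bin" are binary and cannot be read with the ASCII marshaller
    if (key.endsWith(Metadata.BINARY_HEADER_SUFFIX)) {
      continue;
    }
    Metadata.Key<String> k = Metadata.Key.of(key, Metadata.ASCII_STRING_MARSHALLER);
    log.info("Received key {}, with value {}", k, trailers.get(k));
  }
}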

If an error occurs, the code above prints:

Received key Key{name='resource_id'}, with value 32c29935-da42-4801-825a-ac410584c281
Received key Key{name='content-type'}, with value application/grpc
Received key Key{name='message'}, with value Product ID not found

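As you can see, it is not clear which metadata entries are related to the error, because the metadata can also contain other information such as the content type (or tracing information). Of course, you can define your own convention, for example prefixing all error metadata keys with err_.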
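
Such a prefix convention might be sketched as follows. To keep the example self-contained, a plain Map stands in for the key-value pairs read from io.grpc.Metadata; the class name, method names, and the exact err_ prefix handling are assumptions for illustration, not part of the sample repository.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating an "err_" key-prefix convention.
// A plain Map stands in for the entries of io.grpc.Metadata.
class ErrorMetadataConvention {

  static final String ERROR_PREFIX = "err_";

  // Server side: prefix every error metadata key before sending it.
  static Map<String, String> withErrorPrefix(Map<String, String> errorMetaData) {
    Map<String, String> prefixed = new LinkedHashMap<>();
    errorMetaData.forEach((key, value) -> prefixed.put(ERROR_PREFIX + key, value));
    return prefixed;
  }

  // Client side: keep only the prefixed entries and strip the prefix again.
  static Map<String, String> extractErrorMetadata(Map<String, String> trailers) {
    Map<String, String> errors = new LinkedHashMap<>();
    trailers.forEach(
        (key, value) -> {
          if (key.startsWith(ERROR_PREFIX)) {
            errors.put(key.substring(ERROR_PREFIX.length()), value);
          }
        });
    return errors;
  }
}
```

With this convention, an entry such as content-type is ignored on the client, while err_resource_id is recovered as resource_id. The drawback is that every client has to know the convention.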

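There is another, cleaner way to propagate error metadata.

Google's Richer Error Model

Google's google.rpc.Status provides richer error handling capabilities. Google APIs use this approach, but it is not yet part of the official gRPC error model. Internally it still uses metadata, but in a cleaner way. google.rpc.Status is defined as: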

package google.rpc;

// The `Status` type defines a logical error model that is suitable for
// different programming environments, including REST APIs and RPC APIs.
message Status {
  // A simple error code that can be easily handled by the client. The
  // actual error code is defined by `google.rpc.Code`.
  int32 code = 1;

  // A developer-facing human-readable error message in English. It should
  // both explain the error and offer an actionable resolution to it.
  string message = 2;

  // Additional error information that the client code can use to handle
  // the error, such as retry info or a help link.
  repeated google.protobuf.Any details = 3;
}

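You should be aware of the caveats associated with this approach (https://grpc.io/docs/guides/error/#richer-error-model): mainly, it is not supported by all language libraries, and implementations may be inconsistent across languages.

Using a Custom Error Model

Define your own custom error model as: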

message ErrorDetail {
  // Error code
  string errorCode = 1;
  //Error message
  string message = 2;
  // Additional metadata associated with the Error
  map<string, string> metadata = 3;
}

On the server side (Product Service), build the ErrorDetail model and add it to com.google.rpc.Status by calling addDetails(Any.pack(errorInfo)), as follows:

//Catch Block
} catch (ResourceNotFoundException error) {
   log.error("Product id, {} not found", productId);
   var errorMetaData = error.getErrorMetaData();
   Resources.ErrorDetail errorInfo =
       Resources.ErrorDetail.newBuilder()
           .setErrorCode("ResourceNotFound")
           .setMessage(error.getMessage())
           .putAllMetadata(errorMetaData)
           .build();
   com.google.rpc.Status status =
       com.google.rpc.Status.newBuilder()
           .setCode(Code.NOT_FOUND.getNumber())
           .setMessage("Product id not found")
           .addDetails(Any.pack(errorInfo))
           .build();
   responseObserver.onError(StatusProto.toStatusRuntimeException(status));
 }

On the client side (Product Gateway), change the catch block to:

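//Catch Block
} catch (StatusRuntimeException error) {
   com.google.rpc.Status status = io.grpc.protobuf.StatusProto.fromThrowable(error);
   Resources.ErrorDetail errorInfo = null;
   // status is null when the error carries no google.rpc.Status details
   if (status != null) {
     for (Any any : status.getDetailsList()) {
       if (!any.is(Resources.ErrorDetail.class)) {
         continue;
       }
       try {
         errorInfo = any.unpack(Resources.ErrorDetail.class);
       } catch (InvalidProtocolBufferException e) {
         log.error("Could not unpack error detail", e);
       }
     }
   }
   if (errorInfo == null) {
     // No ErrorDetail attached: fall back to wrapping the original error
     throw new ServiceException(error);
   }
   log.info(" Error while calling product service, reason {} ", errorInfo.getMessage());
   throw new ServiceException(errorInfo.getMessage(), errorInfo.getMetadataMap());
 }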
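
The catch block above relies on ServiceException accepting either a cause or a message plus a metadata map. A minimal sketch of such a domain exception is shown below; the real class in the commons module may differ, and the field and accessor names are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of the domain exception thrown by the gateway; the real
// class in the sample repository may be implemented differently.
class ServiceException extends RuntimeException {

  private final Map<String, String> errorMetaData;

  // Wraps a low-level failure when no structured details are available.
  ServiceException(Throwable cause) {
    super(cause);
    this.errorMetaData = Map.of();
  }

  // Carries the message and metadata extracted from the error detail.
  ServiceException(String message, Map<String, String> errorMetaData) {
    super(message);
    this.errorMetaData = Map.copyOf(errorMetaData);
  }

  Map<String, String> getErrorMetaData() {
    return errorMetaData;
  }
}
```

Keeping the metadata on the domain exception lets the REST layer of the gateway build its error response without depending on any gRPC or protobuf types.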

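Using a Predefined Error Model

Instead of defining your own error model, you can use one of the predefined error models from error_details.proto. For example, you can use ErrorInfo, which is defined as: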

message ErrorInfo {

  // The reason of the error. This is a constant value that identifies the
  // proximate cause of the error. Error reasons are unique within a particular
  // domain of errors. This should be at most 63 characters and match
  // /[A-Z0-9_]+/.
  string reason = 1;

  // The logical grouping to which the "reason" belongs. The error domain
  // is typically the registered service name of the tool or product that
  // generates the error. Example: "pubsub.googleapis.com". If the error is
  // generated by some common infrastructure, the error domain must be a
  // globally unique value that identifies the infrastructure. For Google API
  // infrastructure, the error domain is "googleapis.com".
  string domain = 2;

  // Additional structured details about this error.
  // Keys should match /[a-zA-Z0-9-_]/ and be limited to 64 characters in
  // length. When identifying the current value of an exceeded limit, the units
  // should be contained in the key, not the value.  For example, rather than
  // {"instanceLimit": "100/request"}, should be returned as,
  // {"instanceLimitPerRequest": "100"}, if the client exceeds the number of
  // instances that can be created in a single (batch) request.
  map<string, string> metadata = 3;
}

On the server side (Product Service), you can use com.google.rpc.ErrorInfo as follows:

} catch (ResourceNotFoundException error) {
  var errorMetaData = error.getErrorMetaData();
  ErrorInfo errorInfo =
       ErrorInfo.newBuilder()
           .setReason("Resource not found")
           .setDomain("Product")
           .putAllMetadata(errorMetaData)
           .build();
  com.google.rpc.Status status =
       com.google.rpc.Status.newBuilder()
           .setCode(Code.NOT_FOUND.getNumber())
           .setMessage("Product id not found")
           .addDetails(Any.pack(errorInfo))
           .build();
  responseObserver.onError(StatusProto.toStatusRuntimeException(status));
}

The only change on the client is to replace the custom-compiled error class with ErrorInfo:

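//Catch Block
} catch (StatusRuntimeException error) {
   com.google.rpc.Status status = io.grpc.protobuf.StatusProto.fromThrowable(error);
   ErrorInfo errorInfo = null;
   // status is null when the error carries no google.rpc.Status details
   if (status != null) {
     for (Any any : status.getDetailsList()) {
       if (!any.is(ErrorInfo.class)) {
         continue;
       }
       try {
         errorInfo = any.unpack(ErrorInfo.class);
       } catch (InvalidProtocolBufferException e) {
         log.error("Could not unpack ErrorInfo", e);
       }
     }
   }
   if (errorInfo == null) {
     // No ErrorInfo attached: fall back to wrapping the original error
     throw new ServiceException(error);
   }
   log.info(" Error while calling product service, reason {} ", errorInfo.getReason());
   throw new ServiceException(errorInfo.getReason(), errorInfo.getMetadataMap());
 }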

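Global Interceptor for Error Handling

Catching and rethrowing exceptions in the server-side Product Service can quickly become complicated and unwieldy. With complex business logic, you may end up with code like catch (ResourceNotFoundException | ServiceException | OtherException error).

We can simplify this by using a gRPC interceptor. The interceptor catches these exceptions and handles them accordingly, as follows: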

public class GlobalExceptionHandlerInterceptor implements ServerInterceptor {

  @Override
  public <T, R> ServerCall.Listener<T> interceptCall(
      ServerCall<T, R> serverCall, Metadata headers, ServerCallHandler<T, R> serverCallHandler) {
    ServerCall.Listener<T> delegate = serverCallHandler.startCall(serverCall, headers);
    return new ExceptionHandler<>(delegate, serverCall, headers);
  }

  private static class ExceptionHandler<T, R>
      extends ForwardingServerCallListener.SimpleForwardingServerCallListener<T> {

    private final ServerCall<T, R> delegate;
    private final Metadata headers;

    ExceptionHandler(
        ServerCall.Listener<T> listener, ServerCall<T, R> serverCall, Metadata headers) {
      super(listener);
      this.delegate = serverCall;
      this.headers = headers;
    }

    @Override
    public void onHalfClose() {
      try {
        super.onHalfClose();
      } catch (RuntimeException ex) {
        handleException(ex, delegate, headers);
        throw ex;
      }
    }

    private void handleException(
        RuntimeException exception, ServerCall<T, R> serverCall, Metadata headers) {
      // Catch specific Exception and Process
      if (exception instanceof ResourceNotFoundException) {
        var errorMetaData = ((ResourceNotFoundException) exception).getErrorMetaData();
        // Build google.rpc.ErrorInfo
        var errorInfo =
            ErrorInfo.newBuilder()
                .setReason("Resource not found")
                .setDomain("Product")
                .putAllMetadata(errorMetaData)
                .build();

        com.google.rpc.Status rpcStatus =
            com.google.rpc.Status.newBuilder()
                .setCode(Code.NOT_FOUND.getNumber())
                .setMessage("Product id not found")
                .addDetails(Any.pack(errorInfo))
                .build();

        var statusRuntimeException = StatusProto.toStatusRuntimeException(rpcStatus);

        var newStatus = Status.fromThrowable(statusRuntimeException);
        // Get metadata from statusRuntimeException
        Metadata newHeaders = statusRuntimeException.getTrailers();

        serverCall.close(newStatus, newHeaders);
      } else {
        serverCall.close(Status.UNKNOWN, headers);
      }
    }
  }
}

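Let's walk through what is happening here:

First, ExceptionHandler is created by extending ForwardingServerCallListener.SimpleForwardingServerCallListener<T> and overriding onHalfClose().
The handleException(...) method first builds google.rpc.ErrorInfo and then adds the ErrorInfo to com.google.rpc.Status; StatusProto.toStatusRuntimeException(rpcStatus) internally builds new metadata containing the ErrorInfo.
Because serverCall.close(status, newHeaders) takes an io.grpc.Status, we need to convert com.google.rpc.Status by calling Status.fromThrowable(statusRuntimeException).
Then all we need to do is call serverCall.close(status, newHeaders).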
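
The handleException(...) method above handles only ResourceNotFoundException, using an instanceof check. As more exception types are added, one option is a small lookup table from exception class to status code. The following is a minimal, self-contained sketch of that idea and is not part of the sample repository: StatusCodeMapper and its methods are hypothetical names, and plain strings stand in for io.grpc.Status.Code values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table from exception type to a status code name.
// Strings stand in for io.grpc.Status.Code to keep the sketch dependency-free.
class StatusCodeMapper {

  private final Map<Class<? extends Throwable>, String> codes = new LinkedHashMap<>();

  // Registers the status code to use for the given exception type.
  StatusCodeMapper register(Class<? extends Throwable> type, String statusCode) {
    codes.put(type, statusCode);
    return this;
  }

  // Walks up the class hierarchy so subclasses inherit the mapping of their
  // nearest registered ancestor; unregistered exceptions map to UNKNOWN.
  String resolve(Throwable error) {
    for (Class<?> type = error.getClass(); type != null; type = type.getSuperclass()) {
      String code = codes.get(type);
      if (code != null) {
        return code;
      }
    }
    return "UNKNOWN";
  }
}
```

For example, after registering NoSuchElementException as NOT_FOUND, resolve(...) returns NOT_FOUND for that exception and its subclasses, and UNKNOWN for anything unregistered, which mirrors the fallback branch of the interceptor above.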

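The only change required in the server-side implementation of the Product Service API is to remove the catch block and the exception handling logic, as follows: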

public void getProduct(
    GetProductRequest request, StreamObserver<GetProductResponse> responseObserver) {

  String productId = request.getProductId();
  var product = productRepository.get(productId);
  var response =
      GetProductResponse.newBuilder()
          .setName(product.getName())
          .setDescription(product.getDescription())
          .setPrice(product.getPrice())
          .setUserId(product.getUserId())
          .build();
  responseObserver.onNext(response);
  responseObserver.onCompleted();
}

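On the client side nothing changes: we can still obtain the ErrorInfo instance with errorInfo = any.unpack(ErrorInfo.class).

Using a Spring Interceptor

If you can use grpc-spring-boot-starter, everything becomes much simpler. You only need to create a class annotated with @GrpcAdvice and provide methods that handle individual exceptions, as follows: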

@GrpcAdvice
public class ExceptionHandler {

  @GrpcExceptionHandler(ResourceNotFoundException.class)
  public StatusRuntimeException handleResourceNotFoundException(ResourceNotFoundException cause) {
    var errorMetaData = cause.getErrorMetaData();
    var errorInfo =
        ErrorInfo.newBuilder()
            .setReason("Resource not found")
            .setDomain("Product")
            .putAllMetadata(errorMetaData)
            .build();
    var status =
        com.google.rpc.Status.newBuilder()
            .setCode(Code.NOT_FOUND.getNumber())
            .setMessage("Resource not found")
            .addDetails(Any.pack(errorInfo))
            .build();
    return StatusProto.toStatusRuntimeException(status);
  }
}

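This approach is similar to Spring error handling. You only need to define a method annotated with @GrpcExceptionHandler for a specific error condition, for example @GrpcExceptionHandler(ResourceNotFoundException.class). That's it; no other change is needed on the server side.

Summary

Getting error handling right in gRPC can be tricky. Officially, gRPC relies heavily on status codes and metadata to handle errors. We can use gRPC metadata to pass additional error metadata from the server application to the client application. Google's google.rpc.Status provides richer error handling capabilities, but it is not fully supported in all languages. A global gRPC interceptor can be defined to handle all error conditions centrally. The Spring Boot wrapper library yidongnan/grpc-spring-boot-starter (https://github.com/yidongnan/grpc-spring-boot-starter) provides a cleaner way to handle errors.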

Original article: https://techdozo.dev/getting-error-handling-right-in-grpc/