
Go Microservices at Scale: Patterns from Production

Go · Microservices · gRPC · RabbitMQ

After spending years building and maintaining Go microservices in production, I have strong opinions about what works and what does not. These patterns are not theoretical. They come from debugging 3 AM incidents, untangling service dependencies, and learning the hard way that "it works on my machine" means nothing when you have dozens of services talking to each other.

gRPC vs REST: When to Use Which

I default to gRPC for internal service-to-service communication and REST for anything client-facing. The reasoning is straightforward: gRPC gives you strongly typed contracts via protobuf, efficient binary serialization, and built-in streaming. REST gives you browser compatibility and easier debugging with curl.

The mistake I see teams make is using REST everywhere internally because "it's simpler." It is simpler, until you are debugging a production issue caused by a field name mismatch between two services that went undetected for weeks because JSON does not enforce schemas at the transport layer.

The Handler-Service-Repository Pattern

Every Go microservice I build follows the same layered structure:

  • Handler layer - Translates gRPC/HTTP requests into domain calls. No business logic here.
  • Service layer - All business logic lives here. Testable in isolation.
  • Repository layer - Database access only. Swappable for testing with interfaces.

This is not revolutionary, but the discipline of keeping these boundaries intact is what separates services that are a joy to work on from ones that slowly rot into a tangle.

RabbitMQ for Async Processing

Anything that does not need a synchronous response goes to a message queue. Notification delivery, audit logging, analytics events - these should never block a user-facing request.

Here is a pattern I use for a reliable RabbitMQ consumer with graceful shutdown:

func StartConsumer(ctx context.Context, ch *amqp.Channel, queue string, handler func([]byte) error) error {
    // autoAck=false: we acknowledge manually, so a crash never loses a message.
    msgs, err := ch.Consume(queue, "", false, false, false, false, nil)
    if err != nil {
        return fmt.Errorf("failed to register consumer: %w", err)
    }

    for {
        select {
        case <-ctx.Done():
            log.Println("shutting down consumer gracefully")
            return nil
        case msg, ok := <-msgs:
            if !ok {
                // The delivery channel closes if the connection drops.
                return fmt.Errorf("channel closed unexpectedly")
            }
            if err := handler(msg.Body); err != nil {
                log.Printf("handler error: %v, nacking message", err)
                // requeue=true: hand the message back for a retry.
                _ = msg.Nack(false, true)
                continue
            }
            _ = msg.Ack(false)
        }
    }
}

The key details: manual acknowledgment (not auto-ack), nack with requeue on failure, and a context-based shutdown that lets in-flight messages finish processing before the service exits.

Error Handling: Wrap, Don't Swallow

Go's explicit error handling is a feature, not a tax. The pattern I enforce across every service:

  1. Wrap errors with context using fmt.Errorf("operation failed: %w", err).
  2. Never log and return - do one or the other, not both.
  3. Define sentinel errors for expected failure cases so callers can branch on them.
  4. Return gRPC status codes that actually mean something - codes.NotFound vs codes.Internal matters for client retry logic.

Graceful Shutdown Is Non-Negotiable

Every service must handle SIGTERM cleanly. That means draining HTTP/gRPC connections, finishing in-flight message processing, and closing database pools. I have seen production incidents caused by services that just exit on signal, leaving database transactions hanging and messages half-processed.

The pattern is always the same: trap the signal, cancel a root context, and let every component use that context to wind down. It takes 20 minutes to implement and saves you hours of debugging orphaned state.

What I Would Do Differently

If I were starting fresh, I would invest more in structured logging from day one. Adding correlation IDs and trace context to logs after the fact is painful. I would also standardize on a single error reporting approach across all services instead of letting each team pick their own.

The Go ecosystem for microservices is mature enough that you do not need to reinvent anything. But you do need discipline in applying patterns consistently. The services that cause the fewest incidents are not the ones with the cleverest code. They are the ones that are boringly predictable.