APIs & Web
REST, GraphQL, authentication, and web protocols
REST constraints: stateless, uniform interface, etc.
What are the REST architectural constraints?
REST (Representational State Transfer) has 6 constraints:
1. Client-Server:
Separation of concerns between UI and data storage.
Client (UI) ←→ Server (Data/Logic)
Independent evolution
2. Stateless:
Each request contains all information needed. Server stores no client context.
# Bad: Server remembers previous request
GET /next-page
# Good: Client sends all context
GET /users?page=2&limit=10
Authorization: Bearer token123
3. Cacheable:
Responses must define if they're cacheable.
Cache-Control: public, max-age=3600
ETag: "abc123"
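A minimal sketch of what honoring these headers looks like in handler code (ETag revalidation; the `(status, headers, body)` handler shape and names are illustrative, not from any framework):

```python
import hashlib
import json

def get_user_response(user, if_none_match=None):
    """Return (status, headers, body) the way an HTTP handler would."""
    body = json.dumps(user, sort_keys=True)
    etag = '"' + hashlib.md5(body.encode()).hexdigest() + '"'
    if if_none_match == etag:
        # Client's cached copy is still fresh: no body needed
        return 304, {"ETag": etag}, None
    return 200, {"ETag": etag, "Cache-Control": "public, max-age=3600"}, body
```

On a repeat request the client sends the stored ETag in If-None-Match and receives a body-less 304.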
4. Uniform Interface:
Standardized way to interact with resources.
- Resource identification (URIs)
- Resource manipulation through representations
- Self-descriptive messages
- HATEOAS (Hypermedia)
5. Layered System:
Client can't tell if connected directly to server.
Client → Load Balancer → API Gateway → Server → Database
6. Code on Demand (Optional):
Server can send executable code (JavaScript).
Uniform Interface Sub-constraints:
Resource Identification:
/users/123 # Specific user
/users/123/orders # User's orders
Resource Manipulation:
GET /users/123 # Get representation
PUT /users/123 # Replace with representation
DELETE /users/123 # Delete resource
Self-Descriptive:
Content-Type: application/json
Accept: application/json
HATEOAS:
{
"id": 123,
"name": "Alice",
"_links": {
"self": "/users/123",
"orders": "/users/123/orders"
}
}
Key Points to Look For:
- Knows all 6 constraints
- Understands statelessness
- Can explain uniform interface
Follow-up: Why is statelessness important for scalability?
HTTP methods: GET, POST, PUT, PATCH, DELETE semantics
Explain the semantics of HTTP methods and when to use each.
HTTP Methods:
| Method | Purpose | Idempotent | Safe | Body |
|---|---|---|---|---|
| GET | Read resource | Yes | Yes | No |
| POST | Create resource | No | No | Yes |
| PUT | Replace resource | Yes | No | Yes |
| PATCH | Partial update | No* | No | Yes |
| DELETE | Remove resource | Yes | No | Optional |
*PATCH can be idempotent depending on implementation
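The footnote can be made concrete: whether PATCH is idempotent depends on the patch semantics. A sketch with plain dicts standing in for resources (names are illustrative):

```python
def apply_merge_patch(resource, patch):
    # JSON-merge-style: fields set to absolute values,
    # so applying twice gives the same result as applying once (idempotent)
    resource.update(patch)
    return resource

def apply_increment_patch(resource, field, delta):
    # Relative operation: every application changes the outcome,
    # so this style of PATCH is NOT idempotent
    resource[field] = resource.get(field, 0) + delta
    return resource

user = {"name": "Alice", "logins": 1}
apply_merge_patch(user, {"email": "a@example.com"})
apply_merge_patch(user, {"email": "a@example.com"})  # no further change
apply_increment_patch(user, "logins", 1)
apply_increment_patch(user, "logins", 1)  # logins keeps growing
```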
GET - Read:
GET /users/123
Response: 200 OK with user data
# Should NEVER modify data
# Cacheable
POST - Create:
POST /users
Body: {"name": "Alice", "email": "alice@example.com"}
Response: 201 Created
Location: /users/456
PUT - Replace:
PUT /users/123
Body: {"name": "Alice Updated", "email": "new@example.com"}
Response: 200 OK or 204 No Content
# Replaces ENTIRE resource
# Omitted fields become null/default
PATCH - Partial Update:
PATCH /users/123
Body: {"email": "new@example.com"}
Response: 200 OK
# Updates ONLY specified fields
# Other fields unchanged
DELETE - Remove:
DELETE /users/123
Response: 204 No Content or 200 OK
# Idempotent: Deleting twice = same result
Common Mistakes:
# Wrong: Using GET to delete
GET /users/123/delete
# Wrong: Using POST for everything
POST /users/123/update
# Wrong: PUT for partial update
PUT /users/123 {"email": "new@example.com"}
# This would null out name!
Key Points to Look For:
- Knows idempotent vs safe
- Understands PUT vs PATCH
- Correct method for operation
Follow-up: What's the difference between safe and idempotent?
HTTP status codes: when to use which
What are the common HTTP status codes and when should you use them?
Status Code Categories:
- 1xx: Informational
- 2xx: Success
- 3xx: Redirection
- 4xx: Client Error
- 5xx: Server Error
Success (2xx):
200 OK # Generic success, response body included
201 Created # Resource created (POST)
204 No Content # Success, no response body (DELETE)
202 Accepted # Async processing started
Redirection (3xx):
301 Moved Permanently # URL changed permanently, update bookmarks
302 Found # Temporary redirect
304 Not Modified # Cached version is fresh
Client Error (4xx):
400 Bad Request # Malformed request, validation failed
401 Unauthorized # Not authenticated (needs login)
403 Forbidden # Authenticated but not authorized
404 Not Found # Resource doesn't exist
405 Method Not Allowed # Wrong HTTP method
409 Conflict # State conflict (duplicate, version mismatch)
422 Unprocessable Entity # Validation error (semantic)
429 Too Many Requests # Rate limited
Server Error (5xx):
500 Internal Server Error # Generic server error
502 Bad Gateway # Upstream server error
503 Service Unavailable # Temporarily down
504 Gateway Timeout # Upstream timeout
Common Scenarios:
| Scenario | Status Code |
|---|---|
| Get user successfully | 200 |
| Create user | 201 |
| Delete user | 204 |
| User not found | 404 |
| Invalid email format | 400 |
| Email already exists | 409 |
| Wrong password | 401 |
| No permission for resource | 403 |
| Server crashed | 500 |
| Rate limit exceeded | 429 |
Common Mistakes:
# Wrong: 200 for everything
POST /users → 200 OK (should be 201)
# Wrong: 200 for errors
GET /users/999 → 200 {"error": "Not found"} (should be 404)
# Wrong: 500 for validation
POST /users {"email": "invalid"} → 500 (should be 400)
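The mistakes above, corrected in a framework-free sketch (handlers return a `(status, body)` pair; names and data are illustrative):

```python
USERS = {123: {"id": 123, "name": "Alice"}}

def get_user(user_id):
    """Return (status, body) the way an HTTP handler would."""
    user = USERS.get(user_id)
    if user is None:
        return 404, {"error": "User not found"}  # not 200 with an error body
    return 200, user

def create_user(body):
    if "@" not in body.get("email", ""):
        return 400, {"error": "Invalid email"}  # validation failure is 400, not 500
    new_id = max(USERS) + 1
    USERS[new_id] = {"id": new_id, **body}
    return 201, USERS[new_id]  # creation is 201, not 200
```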
Key Points to Look For:
- Knows common codes
- Uses appropriate code per scenario
- Distinguishes 400 vs 401 vs 403
Follow-up: When would you use 409 Conflict?
REST vs RPC style APIs
What's the difference between REST and RPC style APIs?
REST (Resource-Oriented):
Focus on resources (nouns).
GET /users/123 # Get user
POST /users # Create user
PUT /users/123 # Update user
DELETE /users/123 # Delete user
GET /users/123/orders # Get user's orders
RPC (Action-Oriented):
Focus on actions (verbs).
POST /getUser {"userId": 123}
POST /createUser {"name": "Alice"}
POST /updateUser {"userId": 123, "name": "Bob"}
POST /deleteUser {"userId": 123}
POST /getUserOrders {"userId": 123}
Comparison:
| Aspect | REST | RPC |
|---|---|---|
| Focus | Resources | Actions |
| URLs | Nouns | Verbs |
| HTTP methods | Semantic meaning | Usually POST |
| Caching | Built-in (GET) | Manual |
| Discoverability | HATEOAS | Documentation |
| Flexibility | Standardized | Custom |
When RPC Makes Sense:
1. Actions that aren't CRUD:
# REST (awkward)
POST /emails/123/send-action
# RPC (natural)
POST /sendEmail {"emailId": 123}
2. Complex operations:
# REST
POST /transfers
{"from": "account1", "to": "account2", "amount": 100}
# RPC
POST /transferMoney
{"fromAccount": "account1", "toAccount": "account2", "amount": 100}
3. Batch operations:
# RPC
POST /batchCreateUsers
{"users": [{...}, {...}, {...}]}
Modern Hybrid Approach:
# RESTful resources
GET /users/123
POST /users
# RPC for actions
POST /users/123/activate
POST /users/123/reset-password
POST /orders/456/cancel
gRPC (Modern RPC):
service UserService {
rpc GetUser(GetUserRequest) returns (User);
rpc CreateUser(CreateUserRequest) returns (User);
}
Key Points to Look For:
- Knows fundamental difference
- Recognizes when each fits
- Understands hybrid approach
Follow-up: When would you choose gRPC over REST?
HATEOAS explained
What is HATEOAS and why is it important?
HATEOAS (Hypermedia as the Engine of Application State):
Responses include links to related resources and available actions.
Without HATEOAS:
{
"id": 123,
"name": "Alice",
"status": "active"
}
Client must know all URLs and valid operations.
With HATEOAS:
{
"id": 123,
"name": "Alice",
"status": "active",
"_links": {
"self": {"href": "/users/123"},
"orders": {"href": "/users/123/orders"},
"deactivate": {"href": "/users/123/deactivate", "method": "POST"}
}
}
Benefits:
1. Discoverability:
// API root
GET /api
{
"_links": {
"users": {"href": "/api/users"},
"products": {"href": "/api/products"},
"orders": {"href": "/api/orders"}
}
}
2. Decoupled Clients:
// Client follows links, not hardcoded URLs
const user = await fetch('/api/users/123');
const orders = await fetch(user._links.orders.href);
3. State-Aware Actions:
// Pending order
{
"id": 456,
"status": "pending",
"_links": {
"cancel": {"href": "/orders/456/cancel"},
"pay": {"href": "/orders/456/pay"}
}
}
// Shipped order - different available actions
{
"id": 456,
"status": "shipped",
"_links": {
"track": {"href": "/orders/456/tracking"},
"return": {"href": "/orders/456/return"}
}
}
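Server-side, the state-dependent links above come from a small amount of logic over the resource's status; a sketch (URLs and field names are illustrative):

```python
def order_representation(order):
    # Build _links from the order's current state so clients discover
    # valid actions instead of hardcoding them
    links = {"self": {"href": f"/orders/{order['id']}"}}
    if order["status"] == "pending":
        links["cancel"] = {"href": f"/orders/{order['id']}/cancel"}
        links["pay"] = {"href": f"/orders/{order['id']}/pay"}
    elif order["status"] == "shipped":
        links["track"] = {"href": f"/orders/{order['id']}/tracking"}
        links["return"] = {"href": f"/orders/{order['id']}/return"}
    return {**order, "_links": links}
```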
Common Formats:
HAL (Hypertext Application Language):
{
"_links": {
"self": {"href": "/orders/123"}
},
"_embedded": {
"items": [...]
}
}
JSON:API:
{
"data": {
"type": "users",
"id": "123",
"links": {"self": "/users/123"}
}
}
Reality Check:
- Most APIs don't fully implement HATEOAS
- Client developers often prefer documentation
- Can add complexity
Key Points to Look For:
- Understands the concept
- Knows benefits and trade-offs
- Can implement basic example
Follow-up: Why don't most APIs implement full HATEOAS?
API versioning strategies
What are different API versioning strategies? What are the trade-offs?
Versioning Strategies:
1. URL Path Versioning:
GET /v1/users/123
GET /v2/users/123
Pros: Clear, easy caching, easy routing
Cons: Not RESTful (resource = user, not v1/user)
2. Query Parameter:
GET /users/123?version=1
GET /users/123?version=2
Pros: Optional parameter, same resource
Cons: Can be forgotten, caching complex
3. Header Versioning:
GET /users/123
Accept: application/vnd.myapi.v1+json
# or
X-API-Version: 1
Pros: Clean URLs, RESTful
Cons: Hidden, harder to test in browser
4. Content Negotiation:
GET /users/123
Accept: application/vnd.company.user-v2+json
Pros: True REST, per-resource versioning
Cons: Complex, client must specify
Comparison:
| Strategy | Visibility | Cacheable | RESTful |
|---|---|---|---|
| URL path | High | Easy | No |
| Query param | Medium | Complex | Somewhat |
| Header | Low | Medium | Yes |
| Content-type | Low | Medium | Yes |
Best Practices:
1. Start with versioning:
# Even v1 makes it clear API is versioned
/v1/users
2. Semantic versioning for major changes:
v1 → v2: Breaking changes
v1.1: New features (backward compatible)
3. Deprecation policy:
# Response headers
Deprecation: true
Sunset: Tue, 31 Dec 2024 23:59:59 GMT
Link: </v2/users>; rel="successor-version"
4. Support multiple versions:
# Version routing (Flask-style)
@app.route('/v1/users/<id>')
def get_user_v1(id):
    user = db.get_user(id)
    return UserSerializerV1(user).data

@app.route('/v2/users/<id>')
def get_user_v2(id):
    user = db.get_user(id)
    return UserSerializerV2(user).data  # Different format
Key Points to Look For:
- Knows multiple strategies
- Understands trade-offs
- Has deprecation strategy
Follow-up: How do you handle breaking changes in APIs?
Pagination: offset vs cursor-based
Compare offset-based and cursor-based pagination.
Offset-Based (Traditional):
GET /users?page=5&limit=20
# or
GET /users?offset=80&limit=20
SELECT * FROM users
ORDER BY created_at DESC
LIMIT 20 OFFSET 80;
Problems:
1. Performance: Large offsets scan many rows
2. Inconsistency: New records shift pages
Page 1: [A, B, C, D, E]
# New item X inserted
Page 2: [E, F, G, H, I] # E duplicated!
Cursor-Based (Keyset):
GET /users?limit=20
Response:
{
"data": [...],
"next_cursor": "eyJpZCI6MTIzfQ=="
}
GET /users?limit=20&cursor=eyJpZCI6MTIzfQ==
-- Cursor contains: id=123, created_at=2024-01-15
SELECT * FROM users
WHERE (created_at, id) < ('2024-01-15', 123)
ORDER BY created_at DESC, id DESC
LIMIT 20;
Comparison:
| Aspect | Offset | Cursor |
|---|---|---|
| Jump to page | Yes | No |
| Performance | O(offset) | O(log n) index seek |
| Consistency | Poor | Good |
| Complexity | Simple | Complex |
| Bidirectional | Easy | Harder |
Implementation:
# Offset-based (db.query / db.query_scalar are illustrative DB helpers;
# parameterized queries avoid SQL injection)
def get_users_offset(page, limit):
    offset = (page - 1) * limit
    users = db.query("SELECT * FROM users LIMIT %s OFFSET %s", (limit, offset))
    total_count = db.query_scalar("SELECT COUNT(*) FROM users")
    return {
        "data": users,
        "page": page,
        "total_pages": -(-total_count // limit)  # ceiling division
    }

# Cursor-based
def get_users_cursor(cursor, limit):
    if cursor:
        last_id, last_created = base64.b64decode(cursor).decode().split(',')
        users = db.query("""
            SELECT * FROM users
            WHERE (created_at, id) < (%s, %s)
            ORDER BY created_at DESC, id DESC
            LIMIT %s
        """, (last_created, last_id, limit))
    else:
        users = db.query(
            "SELECT * FROM users ORDER BY created_at DESC, id DESC LIMIT %s",
            (limit,))
    next_cursor = None
    if len(users) == limit:
        last = users[-1]
        next_cursor = base64.b64encode(f"{last.id},{last.created_at}".encode()).decode()
    return {"data": users, "next_cursor": next_cursor}
When to Use:
Offset: Small datasets, need page jumping, simple implementation
Cursor: Large datasets, real-time data, infinite scroll
Key Points to Look For:
- Knows performance difference
- Understands consistency issue
- Can implement both
Follow-up: How do you implement cursor-based pagination for complex sorting?
API Design
Designing intuitive API endpoints
What are best practices for designing intuitive API endpoints?
Naming Conventions:
1. Use Nouns, Not Verbs:
# Good
GET /users
POST /users
GET /users/123
# Bad
GET /getUsers
POST /createUser
GET /getUserById
2. Use Plural Nouns:
# Good
GET /users
GET /users/123
# Inconsistent
GET /user
GET /user/123
3. Hierarchical Resources:
# User's orders
GET /users/123/orders
# Specific order
GET /users/123/orders/456
# Order's items
GET /users/123/orders/456/items
4. Keep It Flat When Possible:
# If orders have unique IDs
GET /orders/456
# Instead of
GET /users/123/orders/456
5. Use Query Parameters for Filtering:
GET /users?status=active&role=admin
GET /orders?from=2024-01-01&to=2024-12-31
GET /products?category=electronics&sort=price&order=asc
6. Actions as Sub-resources:
# Non-CRUD actions
POST /users/123/activate
POST /orders/456/cancel
POST /accounts/789/transfer
7. Consistent Naming:
# Pick a style and stick with it
/user-profiles # kebab-case
/user_profiles # snake_case (less common)
/userProfiles # camelCase (less common)
Good Examples:
GET /articles # List articles
GET /articles/123 # Get article
POST /articles # Create article
PUT /articles/123 # Update article
DELETE /articles/123 # Delete article
GET /articles/123/comments # Article's comments
POST /articles/123/comments # Add comment
GET /articles?author=john&tag=tech # Filter
Key Points to Look For:
- Nouns over verbs
- Consistent pluralization
- Logical hierarchy
- Proper HTTP methods
Follow-up: How do you handle actions that don't fit CRUD?
Request/Response body best practices
What are best practices for API request and response bodies?
Request Body:
1. Use camelCase consistently:
{
"firstName": "Alice",
"lastName": "Smith",
"emailAddress": "alice@example.com"
}
2. Don't include what's in URL:
# Bad - ID in both
PUT /users/123
{"id": 123, "name": "Alice"}
# Good
PUT /users/123
{"name": "Alice"}
3. Accept partial updates for PATCH:
PATCH /users/123
{"email": "new@example.com"}
# Only email changes
Response Body:
1. Envelope pattern (optional):
{
"data": {...},
"meta": {
"page": 1,
"total": 100
}
}
2. Include resource ID:
{
"id": 123,
"name": "Alice",
"createdAt": "2024-01-15T10:30:00Z"
}
3. Use ISO 8601 for dates:
{
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-16T14:45:30Z"
}
4. Consistent null handling:
// Option 1: Include null
{"middleName": null}
// Option 2: Omit null fields
{}
// Pick one and be consistent
5. Collection responses:
{
"data": [
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"}
],
"pagination": {
"page": 1,
"perPage": 20,
"total": 150,
"totalPages": 8
}
}
6. Avoid deep nesting:
// Too deep
{
"user": {
"profile": {
"settings": {
"notifications": {
"email": true
}
}
}
}
}
// Better: Flatten or separate endpoint
{
"notificationSettings": {
"email": true
}
}
Key Points to Look For:
- Consistent naming convention
- Proper date formatting
- Reasonable structure
Follow-up: How do you handle field expansion/sparse fieldsets?
Error handling and error responses
How should API error responses be designed?
Error Response Structure:
{
"error": {
"code": "VALIDATION_ERROR",
"message": "The request body contains invalid data",
"details": [
{
"field": "email",
"message": "Must be a valid email address",
"code": "INVALID_FORMAT"
},
{
"field": "age",
"message": "Must be at least 18",
"code": "MIN_VALUE"
}
],
"requestId": "req_abc123",
"documentationUrl": "https://api.example.com/docs/errors#VALIDATION_ERROR"
}
}
Components:
1. Machine-readable code:
"code": "RESOURCE_NOT_FOUND"
// Not just HTTP status code
// Client can handle programmatically
2. Human-readable message:
"message": "User with ID 123 was not found"
// Safe to display to end users (sometimes)
3. Field-level errors:
"details": [
{"field": "email", "message": "Invalid format"}
]
// For form validation
4. Request tracking:
"requestId": "req_abc123"
// For debugging/support
Error Examples:
400 Validation Error:
{
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid request parameters",
"details": [
{"field": "email", "message": "Required field missing"}
]
}
}
401 Authentication Error:
{
"error": {
"code": "AUTHENTICATION_REQUIRED",
"message": "Valid authentication token required"
}
}
403 Authorization Error:
{
"error": {
"code": "INSUFFICIENT_PERMISSIONS",
"message": "You don't have permission to access this resource"
}
}
404 Not Found:
{
"error": {
"code": "RESOURCE_NOT_FOUND",
"message": "User not found",
"details": {
"resourceType": "User",
"resourceId": "123"
}
}
}
500 Internal Error:
{
"error": {
"code": "INTERNAL_ERROR",
"message": "An unexpected error occurred",
"requestId": "req_abc123"
}
}
// Don't expose internal details!
Best Practices:
1. Always return JSON, even for errors
2. Include request ID for debugging
3. Don't expose stack traces in production
4. Use consistent structure across all errors
5. Provide actionable messages when possible
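The practices above can be centralized in one helper so every endpoint emits the same envelope; a sketch (field names follow the examples above, `uuid` stands in for real request-ID plumbing):

```python
import uuid

def error_response(status, code, message, details=None):
    """Build (status, body) with a consistent error envelope."""
    body = {
        "error": {
            "code": code,                                # machine-readable
            "message": message,                          # human-readable
            "requestId": f"req_{uuid.uuid4().hex[:8]}",  # for debugging/support
        }
    }
    if details:
        body["error"]["details"] = details               # field-level errors
    return status, body
```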
Key Points to Look For:
- Consistent error structure
- Machine and human readable
- Field-level validation details
- Security awareness (no stack traces)
Follow-up: How do you handle errors in async/webhook APIs?
Rate limiting and throttling
How do you implement rate limiting in an API?
Purpose:
- Prevent abuse
- Ensure fair usage
- Protect backend resources
- Manage costs
Response Headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1609459200
# When limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Rate Limiting Strategies:
1. Per User/API Key:
# 1000 requests per hour per user
@rate_limit(limit=1000, period=3600, key=lambda req: req.user_id)
def api_endpoint(request):
    ...
2. Per IP:
# For unauthenticated endpoints
@rate_limit(limit=100, period=60, key=lambda req: req.ip)
def public_endpoint(request):
    ...
3. Per Endpoint:
# Different limits for different endpoints
@rate_limit(limit=10, period=60)  # 10/min for expensive
def search_endpoint(request):
    ...

@rate_limit(limit=100, period=60)  # 100/min for cheap
def get_user(request):
    ...
Implementation:
class RateLimiter:
    def __init__(self, redis):
        self.redis = redis

    def is_allowed(self, key, limit, window):
        redis_key = f"rate:{key}"
        pipe = self.redis.pipeline()
        pipe.incr(redis_key)
        pipe.ttl(redis_key)
        count, ttl = pipe.execute()
        if ttl == -1:
            # First request in this window: start the clock.
            # (Setting expire on every request would never let the window reset.)
            self.redis.expire(redis_key, window)
            ttl = window
        if count > limit:
            return False, {'limit': limit, 'remaining': 0, 'reset': ttl}
        return True, {'limit': limit, 'remaining': limit - count, 'reset': ttl}
Tiered Limits:
Free tier: 100 requests/day
Basic tier: 1,000 requests/day
Pro tier: 10,000 requests/day
Enterprise: 100,000 requests/day
Handling 429 Response:
async function apiCall() {
    const response = await fetch('/api/data');
    if (response.status === 429) {
        const retryAfter = response.headers.get('Retry-After');
        await sleep(retryAfter * 1000);
        return apiCall(); // Retry
    }
    return response.json();
}
Key Points to Look For:
- Uses proper headers
- Has retry guidance
- Considers different strategies
Follow-up: How do you handle rate limiting in a distributed system?
API documentation (OpenAPI/Swagger)
How do you document an API? What is OpenAPI/Swagger?
OpenAPI (formerly Swagger):
Standard specification for describing REST APIs.
Basic Structure:
openapi: 3.0.0
info:
title: User API
version: 1.0.0
description: API for managing users
servers:
- url: https://api.example.com/v1
paths:
/users:
get:
summary: List all users
parameters:
- name: page
in: query
schema:
type: integer
default: 1
responses:
'200':
description: List of users
content:
application/json:
schema:
type: array
items:
$ref: '#/components/schemas/User'
post:
summary: Create a user
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/CreateUser'
responses:
'201':
description: User created
'400':
description: Validation error
/users/{id}:
get:
summary: Get user by ID
parameters:
- name: id
in: path
required: true
schema:
type: integer
responses:
'200':
description: User found
'404':
description: User not found
components:
schemas:
User:
type: object
properties:
id:
type: integer
name:
type: string
email:
type: string
format: email
createdAt:
type: string
format: date-time
CreateUser:
type: object
required:
- name
- email
properties:
name:
type: string
minLength: 1
maxLength: 100
email:
type: string
format: email
securitySchemes:
bearerAuth:
type: http
scheme: bearer
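The `CreateUser` constraints above (required fields, `minLength`/`maxLength`, email format) are exactly what spec-driven validators enforce; a hand-rolled sketch of the same checks (real tooling generates this from the spec):

```python
def validate_create_user(body):
    """Return a list of validation errors (empty means valid)."""
    errors = []
    name = body.get("name")
    if name is None:
        errors.append("name: required")
    elif not 1 <= len(name) <= 100:
        errors.append("name: length must be 1-100")
    email = body.get("email")
    if email is None:
        errors.append("email: required")
    elif "@" not in email:
        errors.append("email: invalid format")  # crude stand-in for format: email
    return errors
```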
Tools:
- Swagger UI: Interactive documentation
- Swagger Editor: Write/edit specs
- Code generators: Generate client SDKs
- Validation: Validate requests/responses
Best Practices:
1. Keep spec up-to-date (generate from code if possible)
2. Include examples
3. Document all error responses
4. Use semantic descriptions
5. Version your documentation
Key Points to Look For:
- Knows OpenAPI structure
- Understands benefits
- Mentions tooling
Follow-up: How do you keep documentation in sync with code?
Backward compatibility strategies
How do you maintain backward compatibility when evolving an API?
Backward Compatible Changes (Safe):
1. Adding fields:
// v1 response
{"id": 1, "name": "Alice"}
// v1.1 response (compatible)
{"id": 1, "name": "Alice", "email": "alice@example.com"}
2. Adding optional parameters:
// v1
GET /users
// v1.1 - new optional filter
GET /users?includeInactive=true
3. Adding new endpoints:
// New endpoint doesn't break existing
POST /users/123/preferences
Breaking Changes (Require New Version):
1. Removing fields:
// Breaking: Clients may depend on 'email'
// v1: {"id": 1, "name": "Alice", "email": "..."}
// v2: {"id": 1, "name": "Alice"}
2. Renaming fields:
// Breaking: Different key
// v1: {"userName": "alice"}
// v2: {"username": "alice"}
3. Changing field types:
// Breaking: String to object
// v1: {"location": "New York"}
// v2: {"location": {"city": "New York", "country": "USA"}}
Strategies:
1. Dual writing:
// Include both old and new format
{
"name": "Alice", // Old
"fullName": "Alice", // New
"location": "NYC", // Old (deprecated)
"address": { // New
"city": "NYC"
}
}
2. Deprecation period:
# Mark field as deprecated
Deprecation: true
Sunset: 2025-01-01
{
"oldField": "value", // Deprecated, still works
"_warnings": [
"'oldField' is deprecated, use 'newField' instead"
]
}
3. Version negotiation:
GET /users/123
Accept: application/vnd.api+json; version=2
# Fall back to latest if not specified
4. Adapter pattern:
def get_user(user_id, version):
    user = db.get_user(user_id)
    if version == 1:
        return UserSerializerV1(user).data
    return UserSerializerV2(user).data
Key Points to Look For:
- Knows safe vs breaking changes
- Has deprecation strategy
- Understands transition period
Follow-up: How do you communicate breaking changes to API consumers?
Authentication & Security
Session-based vs Token-based auth
Compare session-based and token-based authentication.
Session-Based:
1. User logs in
2. Server creates session, stores in DB/Redis
3. Server sends session ID in cookie
4. Browser sends cookie with each request
5. Server validates session
Client ←─── Cookie: sessionId=abc123 ───→ Server
│
Session Store
Token-Based (JWT):
1. User logs in
2. Server creates signed token
3. Server sends token to client
4. Client stores token, sends in header
5. Server validates signature
Client ←─── Authorization: Bearer eyJhbG... ───→ Server
│
Validate signature
(no DB lookup)
Comparison:
| Aspect | Session | Token |
|---|---|---|
| Storage | Server-side | Client-side |
| Scalability | Need shared store | Stateless |
| Revocation | Easy (delete session) | Hard (need blocklist) |
| Mobile | Cookie issues | Works well |
| CSRF | Vulnerable | Not vulnerable |
| Size | Small (session ID) | Larger (JWT payload) |
Session Implementation:
# Login
def login(username, password):
    user = authenticate(username, password)
    session_id = generate_session_id()
    redis.setex(f"session:{session_id}", 3600, user.id)
    response.set_cookie('session_id', session_id, httponly=True)

# Middleware
def get_current_user(request):
    session_id = request.cookies.get('session_id')
    user_id = redis.get(f"session:{session_id}")
    return User.get(user_id)
Token Implementation:
# Login
def login(username, password):
    user = authenticate(username, password)
    token = jwt.encode({
        'user_id': user.id,
        'exp': datetime.utcnow() + timedelta(hours=1)
    }, SECRET_KEY)
    return {'token': token}

# Middleware
def get_current_user(request):
    auth = request.headers.get('Authorization', '')
    if not auth.startswith('Bearer '):
        raise AuthenticationRequired()
    payload = jwt.decode(auth.split()[1], SECRET_KEY, algorithms=['HS256'])
    return User.get(payload['user_id'])
When to Use:
Session: Traditional web apps, need easy revocation
Token: APIs, mobile apps, microservices, SPAs
Key Points to Look For:
- Knows trade-offs
- Understands stateless nature of tokens
- Considers revocation challenges
Follow-up: How do you handle token refresh?
JWT structure and validation
How does JWT work? What are the security considerations?
JWT Structure:
header.payload.signature
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.
eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4ifQ.
SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
Header (Base64):
{
"alg": "HS256",
"typ": "JWT"
}
Payload (Base64):
{
"sub": "user123",
"name": "John Doe",
"role": "admin",
"iat": 1609459200,
"exp": 1609462800
}
Signature:
HMACSHA256(
base64UrlEncode(header) + "." + base64UrlEncode(payload),
secret
)
Validation Steps:
import base64, hashlib, hmac, json, time

def validate_jwt(token, secret: bytes):
    # 1. Split token into its Base64URL segments
    header_b64, payload_b64, signature_b64 = token.split('.')
    # 2. Verify signature (constant-time comparison)
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected_sig = hmac.new(secret, signing_input, hashlib.sha256).digest()
    actual_sig = base64.urlsafe_b64decode(signature_b64 + '=' * (-len(signature_b64) % 4))
    if not hmac.compare_digest(expected_sig, actual_sig):
        raise InvalidSignature()
    # 3. Decode payload
    claims = json.loads(base64.urlsafe_b64decode(payload_b64 + '=' * (-len(payload_b64) % 4)))
    # 4. Check expiration
    if claims['exp'] < time.time():
        raise TokenExpired()
    # 5. Check issuer/audience (optional)
    if claims.get('iss') != EXPECTED_ISSUER:
        raise InvalidIssuer()
    return claims
Security Considerations:
1. Never store sensitive data:
// Bad - JWT is only encoded, not encrypted
{"password": "secret123", "ssn": "123-45-6789"}
// Good
{"user_id": "123", "role": "admin"}
2. Use strong secrets:
# Bad
SECRET = "password123"
# Good
SECRET = os.environ['JWT_SECRET'] # 256+ bits of entropy
3. Always validate expiration:
if claims['exp'] < time.time():
raise TokenExpired()
4. Use appropriate algorithm:
# Good
jwt.decode(token, secret, algorithms=['HS256'])
# Vulnerable to algorithm confusion
jwt.decode(token, secret, algorithms=['none', 'HS256'])
5. Set reasonable expiration:
# Access token: Short-lived (15 min - 1 hour)
# Refresh token: Longer (days/weeks)
Key Points to Look For:
- Knows JWT structure
- Understands signature purpose
- Security-aware
Follow-up: What's the difference between HS256 and RS256?
OAuth 2.0 flows explained
Explain the main OAuth 2.0 flows and when to use each.
OAuth 2.0 Roles:
- Resource Owner: User
- Client: Application requesting access
- Authorization Server: Issues tokens
- Resource Server: API with protected resources
1. Authorization Code Flow (Most Secure):
For: Server-side applications
User → Client: "Login with Google"
Client → Auth Server: Redirect to login
User → Auth Server: Logs in, consents
Auth Server → Client: Authorization code
Client → Auth Server: Exchange code for tokens
Auth Server → Client: Access token, refresh token
Client → Resource Server: API calls with token
1. https://auth.server/authorize?
response_type=code&
client_id=xxx&
redirect_uri=https://app.com/callback&
scope=read&
state=xyz
2. Callback: https://app.com/callback?code=AUTH_CODE&state=xyz
3. POST https://auth.server/token
grant_type=authorization_code&
code=AUTH_CODE&
client_secret=xxx
2. Authorization Code with PKCE:
For: Mobile/SPA (no client secret)
Same as above, but with code_verifier/code_challenge
Prevents code interception attacks
1. Generate code_verifier (random URL-safe string, 43-128 chars)
2. Create code_challenge = BASE64URL(SHA256(code_verifier))
3. Send code_challenge in authorize request
4. Send code_verifier in token request
5. Server verifies: BASE64URL(SHA256(code_verifier)) == stored challenge
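The steps above map directly onto a few lines of code (the S256 method from RFC 7636; function names are illustrative):

```python
import base64
import hashlib
import secrets

def make_code_verifier():
    # 32 random bytes -> 43 URL-safe chars (spec allows 43-128)
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()

def make_code_challenge(verifier):
    # code_challenge = BASE64URL(SHA256(code_verifier)), unpadded
    digest = hashlib.sha256(verifier.encode()).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

def server_verifies(verifier, stored_challenge):
    # Step 5: recompute the challenge from the verifier sent to /token
    return make_code_challenge(verifier) == stored_challenge
```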
3. Client Credentials Flow:
For: Machine-to-machine (no user)
Client → Auth Server: client_id + client_secret
Auth Server → Client: Access token
POST https://auth.server/token
grant_type=client_credentials&
client_id=xxx&
client_secret=xxx
4. Implicit Flow (Deprecated):
For: SPAs (legacy, insecure)
Token returned directly in URL fragment
No refresh tokens
Vulnerable to token leakage
Flow Selection:
| Application Type | Recommended Flow |
|---|---|
| Server-side web app | Authorization Code |
| SPA | Authorization Code + PKCE |
| Mobile app | Authorization Code + PKCE |
| Machine-to-machine | Client Credentials |
Key Points to Look For:
- Knows multiple flows
- Recommends PKCE for public clients
- Understands security implications
Follow-up: How do refresh tokens work?
API keys vs OAuth tokens
When would you use API keys vs OAuth tokens?
API Keys:
GET /api/data
X-API-Key: sk_live_abc123
# or (discouraged: query strings end up in logs and browser history)
GET /api/data?api_key=sk_live_abc123
OAuth Tokens:
GET /api/data
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
Comparison:
| Aspect | API Keys | OAuth Tokens |
|---|---|---|
| Identifies | Application | User + Application |
| User context | No | Yes |
| Expiration | Typically long-lived | Short-lived |
| Scopes | Basic/none | Fine-grained |
| Revocation | Manual | Token expiry/blocklist |
| Delegation | No | Yes |
Use API Keys When:
- Server-to-server communication
- No user context needed
- Simple authentication sufficient
- Rate limiting by client
- Public APIs
# API key for service account
weather_api.get_forecast(api_key=WEATHER_API_KEY)
stripe.Charge.create(api_key=STRIPE_SECRET_KEY)
Use OAuth When:
- Acting on behalf of users
- Need user consent
- Fine-grained permissions
- Third-party integrations
- User data access
# OAuth for user's data
google_calendar.get_events(access_token=user_token)
github.get_repositories(access_token=user_token)
Best Practices for API Keys:
1. Different keys for different purposes:
sk_live_xxx # Production
sk_test_xxx # Testing
pk_xxx # Public (limited permissions)
2. Key rotation:
# Support multiple active keys
valid_keys = [current_key, previous_key]
3. Secure storage:
# Never in code
# Environment variables or secrets manager
API_KEY = os.environ['API_KEY']
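Combining the practices above: keys sourced from the environment (with hypothetical fallback values so the snippet runs standalone), and constant-time comparison against both the current and previous key during rotation:

```python
import hmac
import os

VALID_KEYS = [
    os.environ.get("API_KEY_CURRENT") or "sk_live_current",    # illustrative names
    os.environ.get("API_KEY_PREVIOUS") or "sk_live_previous",
]

def is_valid_key(candidate):
    # compare_digest avoids leaking key contents via timing side channels
    return any(hmac.compare_digest(candidate, key) for key in VALID_KEYS)
```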
Key Points to Look For:
- Knows when to use each
- Understands user context difference
- Security awareness
Follow-up: How do you securely store API keys?
CORS: why it exists and how to configure
What is CORS and why does it exist?
CORS (Cross-Origin Resource Sharing):
Browser security mechanism controlling cross-origin requests.
Same-Origin Policy:
Page at: https://app.com
Can request: https://app.com/api ✓
Cannot request: https://api.other.com ✗ (different origin)
Origin = Protocol + Domain + Port:
https://example.com:443
└─┬─┘ └────┬─────┘└┬┘
protocol domain port
Why CORS Exists:
Prevents malicious sites from making requests to other sites using your credentials.
// Evil site at evil.com
// Without CORS, this could steal data:
fetch('https://bank.com/api/account', {
credentials: 'include' // Sends bank.com cookies
})
How CORS Works:
Simple Requests (GET, POST with simple headers):
Request:
GET /api/data HTTP/1.1
Origin: https://app.com
Response:
Access-Control-Allow-Origin: https://app.com
Preflight Requests (Complex requests):
# Browser sends OPTIONS first
OPTIONS /api/data HTTP/1.1
Origin: https://app.com
Access-Control-Request-Method: PUT
Access-Control-Request-Headers: Content-Type
# Server responds with allowed methods/headers
Access-Control-Allow-Origin: https://app.com
Access-Control-Allow-Methods: GET, PUT, POST, DELETE
Access-Control-Allow-Headers: Content-Type
Access-Control-Max-Age: 86400
# If allowed, browser sends actual request
PUT /api/data HTTP/1.1
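Server-side, answering a preflight is a small decision function; a framework-free sketch that returns `(status, headers)` (origins and methods are illustrative):

```python
ALLOWED_ORIGINS = {"https://app.com", "https://admin.app.com"}
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE"}

def preflight_response(origin, requested_method):
    if origin not in ALLOWED_ORIGINS or requested_method not in ALLOWED_METHODS:
        return 403, {}  # browser will block the actual request
    return 204, {
        "Access-Control-Allow-Origin": origin,  # echo the specific origin
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": "Content-Type, Authorization",
        "Access-Control-Max-Age": "86400",  # let the browser cache the preflight
    }
```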
Server Configuration:
# Flask
from flask import Flask
from flask_cors import CORS, cross_origin

app = Flask(__name__)
CORS(app, origins=['https://app.com', 'https://admin.app.com'])

# Or for specific routes
@app.route('/api/data')
@cross_origin(origins=['https://app.com'])
def get_data():
    ...
// Express
const cors = require('cors');
app.use(cors({
origin: ['https://app.com', 'https://admin.app.com'],
methods: ['GET', 'POST', 'PUT', 'DELETE'],
allowedHeaders: ['Content-Type', 'Authorization'],
credentials: true
}));
Common CORS Headers:
Access-Control-Allow-Origin: https://app.com
Access-Control-Allow-Methods: GET, POST, PUT
Access-Control-Allow-Headers: Content-Type, Authorization
Access-Control-Allow-Credentials: true
Access-Control-Max-Age: 86400
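The server-side logic behind those headers is simple: echo the request's Origin back only if it is allowlisted. A framework-free sketch (hypothetical `cors_headers` helper and origin list):

```python
from typing import Optional

# Hypothetical sketch: compute the CORS response headers for a request's
# Origin header. Only allowlisted origins get Access-Control-Allow-Origin.
ALLOWED_ORIGINS = {"https://app.com", "https://admin.app.com"}

def cors_headers(origin: Optional[str]) -> dict:
    if origin in ALLOWED_ORIGINS:
        return {
            # Echo the specific origin: '*' cannot be combined with
            # Access-Control-Allow-Credentials: true
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Credentials": "true",
            "Vary": "Origin",  # keep caches from mixing per-origin responses
        }
    return {}  # no CORS headers: the browser blocks the cross-origin read
```

Note that an absent or unrecognized Origin simply gets no CORS headers; the request still reaches the server, but the browser refuses to expose the response.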
Key Points to Look For:
- Understands purpose (security)
- Knows preflight requests
- Can configure server
Follow-up: Why is Access-Control-Allow-Origin: * risky?
HTTPS and TLS basics
How does HTTPS/TLS work? Why is it important?
HTTPS = HTTP + TLS (Transport Layer Security)
What TLS Provides:
1. Encryption: Data can't be read in transit
2. Authentication: Verify server identity
3. Integrity: Data can't be modified
TLS Handshake (simplified; this is the classic TLS 1.2 flow, TLS 1.3 condenses it to a single round trip):
Client Server
│ │
│─── ClientHello ─────────────→│ (supported ciphers)
│ │
│←── ServerHello ──────────────│ (chosen cipher)
│←── Certificate ──────────────│ (server's cert)
│←── ServerHelloDone ──────────│
│ │
│─── ClientKeyExchange ───────→│ (encrypted pre-master)
│─── ChangeCipherSpec ────────→│
│─── Finished ────────────────→│
│ │
│←── ChangeCipherSpec ─────────│
│←── Finished ─────────────────│
│ │
│←── Encrypted Data ──────────→│
Certificate Verification:
1. Server sends certificate
2. Client checks:
- Valid date range
- Issued by trusted CA
- Domain matches
- Not revoked
3. If valid, proceed
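For reference, the checks listed above are exactly what a default TLS client enables. A small sketch inspecting Python's stdlib defaults:

```python
import ssl

# Python's default client context performs the certificate checks above.
ctx = ssl.create_default_context()

print(ctx.check_hostname)                     # True: domain must match the cert
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True: cert must chain to a trusted CA
```

Disabling either of these (a common "quick fix" for certificate errors) silently gives up the authentication guarantee.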
Why HTTPS Matters:
1. Privacy:
HTTP: Passwords and tokens are visible to anyone on the network
HTTPS: Encrypted; can't be read in transit
2. Integrity:
HTTP: A man-in-the-middle can modify responses
HTTPS: Tampering is detected
3. Authentication:
HTTP: Can't verify server identity
HTTPS: Certificate proves identity
4. SEO & Browser Features:
- Google ranks HTTPS higher
- Modern features require HTTPS (geolocation, camera)
- Browsers show "Not Secure" for HTTP
Implementation:
# Nginx
# A separate port-80 server block handles the redirect: an `if ($scheme = http)`
# inside the port-443 block would never match, since that block only receives HTTPS
server {
    listen 80;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
}
Key Points to Look For:
- Knows what TLS provides
- Understands certificate purpose
- Recognizes importance
Follow-up: What is certificate pinning?
Advanced Web
GraphQL vs REST trade-offs
When would you choose GraphQL over REST?
REST:
GET /users/123
GET /users/123/posts
GET /users/123/posts/456/comments
# Multiple round trips
GraphQL:
query {
user(id: "123") {
name
posts {
title
comments {
text
}
}
}
}
# Single request, exact data needed
Comparison:
| Aspect | REST | GraphQL |
|---|---|---|
| Endpoint | Multiple | Single |
| Data fetching | Fixed structure | Client specifies |
| Over-fetching | Common | Avoided |
| Under-fetching | Common | Avoided |
| Caching | HTTP caching | Complex |
| Versioning | URL versioning | Schema evolution |
| Learning curve | Lower | Higher |
Choose GraphQL When:
- Complex, nested data
- Multiple clients (web, mobile) with different needs
- Rapid frontend iteration
- Bandwidth constrained (mobile)
# Mobile gets minimal data
query {
user(id: "123") { name }
}
# Web gets full data
query {
user(id: "123") {
name
email
posts { title, createdAt }
}
}
Choose REST When:
- Simple CRUD operations
- HTTP caching important
- File uploads/downloads
- Team unfamiliar with GraphQL
- Public API with simple needs
GraphQL Challenges:
1. N+1 queries: DataLoader needed
2. Caching: No HTTP caching
3. Security: Rate limiting by field complexity
4. Monitoring: Different tooling needed
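The DataLoader technique mentioned for N+1 can be sketched without any library: collect every id requested in the same event-loop tick, then issue one batch fetch. A minimal, hypothetical stdlib-only version (the real DataLoader adds caching and error handling):

```python
import asyncio

# Hypothetical DataLoader-style batcher: loads requested in the same
# event-loop tick are coalesced into a single batch fetch.
class BatchLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # async: list of keys -> list of values
        self.pending = []          # (key, future) pairs awaiting dispatch
        self.scheduled = False

    def load(self, key):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((key, fut))
        if not self.scheduled:
            self.scheduled = True
            loop.call_soon(self._dispatch)  # flush once this tick ends
        return fut

    def _dispatch(self):
        batch, self.pending, self.scheduled = self.pending, [], False
        asyncio.ensure_future(self._run(batch))

    async def _run(self, batch):
        values = await self.batch_fn([k for k, _ in batch])  # ONE query, not N
        for (_, fut), value in zip(batch, values):
            fut.set_result(value)

async def fetch_users(ids):
    # stand-in for a single "SELECT ... WHERE id IN (...)" round trip
    return [{"id": i, "name": f"user{i}"} for i in ids]

async def main():
    loader = BatchLoader(fetch_users)
    # three resolver calls, one database round trip
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(3))
```

In a GraphQL server, each resolver calls `loader.load(id)`; the batcher turns N per-field lookups into one query per request.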
Key Points to Look For:
- Knows trade-offs, not just hype
- Understands caching challenge
- Can choose based on requirements
Follow-up: How do you handle the N+1 problem in GraphQL?
WebSockets for real-time communication
When would you use WebSockets? How do they work?
WebSocket: Full-duplex, persistent connection between client and server.
HTTP vs WebSocket:
HTTP:
Client → Request → Server
Client ← Response ← Server
(Connection closed)
WebSocket:
Client ←→ Persistent connection ←→ Server
(Bidirectional, always open)
Handshake:
# Client request
GET /chat HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
# Server response
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Use Cases:
- Real-time chat
- Live notifications
- Collaborative editing
- Gaming
- Live dashboards
- Stock tickers
Server Implementation:
# Python with the websockets library
import asyncio
import websockets

connected = set()

async def handler(websocket):
    connected.add(websocket)
    try:
        async for message in websocket:
            # Broadcast to all connected clients
            for conn in connected:
                await conn.send(f"User: {message}")
    finally:
        connected.remove(websocket)

async def main():
    # serve() must run inside an event loop; keep the server alive forever
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()

asyncio.run(main())
Client Implementation:
const ws = new WebSocket('wss://example.com/chat');
ws.onopen = () => {
console.log('Connected');
ws.send('Hello!');
};
ws.onmessage = (event) => {
console.log('Received:', event.data);
};
ws.onclose = () => {
console.log('Disconnected');
// Implement reconnection
};
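The reconnection hinted at in `onclose` is usually scheduled with exponential backoff plus jitter, so a fleet of disconnected clients doesn't reconnect in lockstep. A hypothetical sketch of the delay schedule (the same formula applies in the browser client):

```python
import random

# Hypothetical sketch: exponential backoff with "full jitter" for
# WebSocket reconnection attempts. Returns a delay in seconds.
def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    exp = min(cap, base * (2 ** attempt))   # 1, 2, 4, 8, ... capped at 30s
    return random.uniform(0, exp)           # jitter spreads out the herd
```

On a successful reconnect the attempt counter resets to 0.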
Scaling Challenges:
Multiple servers need shared state:
Server A ─────┐
Server B ─────┼──── Redis Pub/Sub ─── Broadcast
Server C ─────┘
When NOT to Use:
- Simple request/response
- Infrequent updates (use polling or SSE)
- One-way server updates (use SSE)
Key Points to Look For:
- Knows handshake process
- Understands use cases
- Considers scaling
Follow-up: How do you handle WebSocket reconnection?
Server-Sent Events (SSE)
What are Server-Sent Events? When would you use them over WebSockets?
SSE: One-way server-to-client streaming over HTTP.
Client ← Events ← Server
(One-way: server → client only)
Format:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
event: message
data: Hello, world!
event: update
data: {"price": 150.25}
event: message
data: Line 1
data: Line 2
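The wire format above is easy to produce by hand. A hypothetical helper that serializes one event, including the multi-line `data:` case:

```python
# Hypothetical helper that serializes one SSE event in the format above.
def sse_event(data, event=None, event_id=None) -> str:
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets clients resume via Last-Event-ID
    for line in str(data).splitlines():
        lines.append(f"data: {line}")     # multi-line payloads => repeated data: lines
    return "\n".join(lines) + "\n\n"      # the blank line terminates the event
```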
Client:
const eventSource = new EventSource('/events');
eventSource.onmessage = (event) => {
console.log(event.data);
};
eventSource.addEventListener('update', (event) => {
const data = JSON.parse(event.data);
console.log('Price:', data.price);
});
eventSource.onerror = () => {
console.log('Connection lost, reconnecting...');
// Auto-reconnects!
};
Server:
# Flask
import json
import time
from flask import Response

@app.route('/events')
def events():
    def generate():
        while True:
            data = get_latest_data()
            yield f"event: update\ndata: {json.dumps(data)}\n\n"
            time.sleep(1)
    return Response(generate(), mimetype='text/event-stream')
SSE vs WebSocket:
| Aspect | SSE | WebSocket |
|---|---|---|
| Direction | Server → Client | Bidirectional |
| Protocol | HTTP | WebSocket |
| Auto-reconnect | Built-in | Manual |
| Browser support | Good | Excellent |
| Complexity | Simple | More complex |
| Binary data | No | Yes |
Use SSE When:
- Server-to-client updates only
- News feeds, notifications
- Live scores, stock prices
- Progress updates
Use WebSocket When:
- Bidirectional needed (chat)
- Binary data needed
- Low latency critical
Key Points to Look For:
- Knows difference from WebSocket
- Understands use cases
- Can implement basic server
Follow-up: How do you handle SSE with load balancers?
HTTP/2 and HTTP/3 improvements
What improvements do HTTP/2 and HTTP/3 bring?
HTTP/1.1 Problems:
- Head-of-line blocking
- Multiple connections needed
- Text headers (large, no compression)
- No server push
HTTP/2 Improvements:
1. Multiplexing:
HTTP/1.1:
Request 1 ─────────────────────→ Response 1
Request 2 ────────────────────→ Response 2
Request 3 ─────────────────→ Response 3
HTTP/2:
Request 1 ─┐
Request 2 ──┼──→ Single connection ──→ Responses interleaved
Request 3 ─┘
2. Header Compression (HPACK):
HTTP/1.1: Same headers sent repeatedly
HTTP/2: Headers compressed, only differences sent
3. Server Push (since deprecated by major browsers in favor of 103 Early Hints):
Client requests index.html
Server pushes: index.html + style.css + app.js
(Without waiting for additional requests)
4. Binary Protocol:
HTTP/1.1: Text-based parsing
HTTP/2: Binary framing, efficient parsing
5. Stream Prioritization:
CSS: High priority
Images: Lower priority
HTTP/3 Improvements (QUIC):
1. UDP-based:
HTTP/2: TCP (connection setup overhead)
HTTP/3: QUIC over UDP (faster connection)
2. No Head-of-Line Blocking:
HTTP/2: Lost packet blocks all streams (TCP)
HTTP/3: Lost packet blocks only affected stream (QUIC)
3. Connection Migration:
HTTP/2: Connection lost on network change (WiFi → 4G)
HTTP/3: Connection persists across network changes
4. Built-in Encryption:
HTTP/3: TLS 1.3 required, integrated into handshake
Summary:
| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Multiplexing | No | Yes | Yes |
| Header compression | No | HPACK | QPACK |
| Transport | TCP | TCP | QUIC/UDP |
| HOL blocking | Yes | TCP level | No |
| Connection setup | Slow | Medium | Fast |
Key Points to Look For:
- Knows multiplexing benefit
- Understands QUIC advantages
- Can explain HOL blocking
Follow-up: Does HTTP/2 make domain sharding obsolete?
gRPC for inter-service communication
What is gRPC? When would you use it over REST?
gRPC: High-performance RPC framework using Protocol Buffers.
Protocol Buffers (Schema):
syntax = "proto3";
service UserService {
rpc GetUser(GetUserRequest) returns (User);
rpc ListUsers(ListUsersRequest) returns (stream User);
rpc CreateUser(User) returns (User);
}
message User {
int32 id = 1;
string name = 2;
string email = 3;
}
message GetUserRequest {
int32 id = 1;
}
Generated Code:
# Server
class UserServicer(user_pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        user = db.get_user(request.id)
        return user_pb2.User(id=user.id, name=user.name)

# Client
channel = grpc.insecure_channel('localhost:50051')
stub = user_pb2_grpc.UserServiceStub(channel)
user = stub.GetUser(user_pb2.GetUserRequest(id=123))
gRPC Features:
1. Streaming:
// Server streaming
rpc ListUsers(Query) returns (stream User);
// Client streaming
rpc UploadUsers(stream User) returns (Summary);
// Bidirectional
rpc Chat(stream Message) returns (stream Message);
2. Built-in Features:
- Deadlines/timeouts
- Cancellation
- Load balancing
- Authentication
gRPC vs REST:
| Aspect | gRPC | REST |
|---|---|---|
| Protocol | HTTP/2 | HTTP/1.1 or 2 |
| Payload | Binary (Protobuf) | Text (JSON) |
| Schema | Required (.proto) | Optional (OpenAPI) |
| Code generation | Built-in | External tools |
| Streaming | Native | SSE/WebSocket |
| Browser support | Limited | Full |
| Performance | Higher | Lower |
Use gRPC When:
- Microservices communication
- High performance needed
- Streaming required
- Polyglot environment (code gen)
- Internal services
Use REST When:
- Public APIs
- Browser clients
- Simple CRUD
- Wide compatibility needed
- Human-readable debugging
Key Points to Look For:
- Knows Protocol Buffers
- Understands streaming types
- Can compare to REST
Follow-up: How do you handle gRPC errors?
API Gateway pattern
What is the API Gateway pattern? What problems does it solve?
API Gateway: Single entry point for all client requests.
Without Gateway:
Client ─→ Service A
Client ─→ Service B
Client ─→ Service C
With Gateway:
Client ─→ API Gateway ─→ Service A
─→ Service B
─→ Service C
Responsibilities:
1. Request Routing:
/users/* → User Service
/orders/* → Order Service
/products/* → Product Service
2. Authentication/Authorization:
Client → Gateway (validates token) → Services
Services don't need to implement auth
3. Rate Limiting:
100 requests/minute per user
Enforced at gateway
4. Request/Response Transformation:
// Client gets unified format
{
"user": {...},
"orders": [...]
}
// Gateway aggregates from multiple services
5. Caching:
Frequently accessed data cached at gateway
Reduces load on services
6. Load Balancing:
Gateway distributes requests across service instances
7. Circuit Breaking:
Gateway fails fast when service is down
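The rate-limiting responsibility above (e.g. 100 requests/minute per user) is commonly implemented as a token bucket. A minimal, hypothetical in-memory sketch; production gateways keep the counters in a shared store such as Redis:

```python
import time

# Hypothetical token-bucket limiter: defaults approximate 100 requests/minute.
class TokenBucket:
    def __init__(self, capacity: int = 100, refill_per_sec: float = 100 / 60):
        self.capacity = capacity
        self.tokens = float(capacity)      # start full: allows short bursts
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should respond with HTTP 429
```

The gateway keeps one bucket per API key or user and checks `allow()` before routing the request.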
Implementation:
# Kong example
services:
- name: user-service
url: http://users:8080
routes:
- name: users-route
paths: ["/users"]
service: user-service
plugins:
- name: rate-limiting
config:
minute: 100
- name: jwt
Patterns:
BFF (Backend for Frontend):
Mobile Client ─→ Mobile Gateway ─→ Services
Web Client ────→ Web Gateway ───→ Services
Popular Gateways:
- Kong
- AWS API Gateway
- NGINX
- Envoy
- Zuul
Trade-offs:
- Single point of failure (need HA)
- Added latency (extra hop)
- Complexity
- Potential bottleneck
Key Points to Look For:
- Knows multiple responsibilities
- Understands trade-offs
- Mentions BFF pattern
Follow-up: How do you ensure API Gateway doesn't become a bottleneck?
Microservices Communication
Service mesh: What problem does it solve?
What is a service mesh and what problems does it solve in microservices architectures?
Service Mesh: Infrastructure layer that handles service-to-service communication.
Without Service Mesh:
┌─────────────┐ ┌─────────────┐
│ Service A │────────→│ Service B │
│ (with all │ │ (with all │
│ networking │ │ networking │
│ logic) │ │ logic) │
└─────────────┘ └─────────────┘
Each service implements:
- Load balancing
- Retry logic
- Circuit breaker
- mTLS
- Metrics/tracing
With Service Mesh:
┌─────────────────────────────────────────┐
│ Service Mesh │
│ ┌───────┐ Proxy ──────→ Proxy ┌───────┐│
│ │Svc A │←────→ ←────→│Svc B ││
│ └───────┘ Sidecar Sidecar └───────┘│
│ │
│ Control Plane │
│ (config, certs, policies) │
└─────────────────────────────────────────┘
Services focus on business logic
Proxy handles networking concerns
Problems Solved:
1. Observability:
# All traffic flows through proxies
# Automatic collection of:
- Request rates
- Error rates
- Latencies (p50, p99)
- Distributed traces
- Service dependency graphs
2. Security:
# mTLS between all services (automatic)
# Policy: Only Service A can call Service B
authorization:
- from: service-a
to: service-b
methods: [GET, POST]
3. Traffic Management:
# Canary deployment
route:
- destination: service-v1
weight: 90
- destination: service-v2
weight: 10
# Retry and timeout policies
retries:
attempts: 3
retryOn: 5xx
timeout: 5s
4. Resilience:
# Circuit breaker
outlierDetection:
consecutiveErrors: 5
interval: 30s
baseEjectionTime: 30s
# Rate limiting per service
Architecture Components:
Data Plane (Sidecars):
- Envoy proxies alongside each service
- Intercepts all network traffic
- Enforces policies
Control Plane:
- Configuration management
- Certificate authority
- Service discovery
- Policy distribution
Popular Service Meshes:
- Istio (most feature-rich)
- Linkerd (simpler, lighter)
- Consul Connect
- AWS App Mesh
When to Use:
- Many microservices (>10-20)
- Need consistent observability
- Security requirements (mTLS)
- Complex traffic routing needs
- Polyglot services
When NOT to Use:
- Few services
- Simple architecture
- Team not ready for complexity
- Performance-critical (adds latency ~1-2ms)
Key Points to Look For:
- Understands sidecar pattern
- Knows problems it solves
- Aware of trade-offs (complexity, latency)
Follow-up: How does a service mesh differ from an API gateway?
RabbitMQ vs Kafka: When to use each?
When would you choose RabbitMQ over Kafka, and vice versa?
RabbitMQ: Traditional message broker with smart broker/dumb consumer model.
Kafka: Distributed event streaming platform with dumb broker/smart consumer model.
RabbitMQ:
Producer → Exchange → Queue → Consumer
↘ Queue → Consumer
- Messages deleted after consumption
- Broker tracks what's consumed
- Push model to consumers
Kafka:
Producer → Topic [Partition 0] → Consumer Group
[Partition 1] → Consumer Group
[Partition 2] → Consumer Group
- Messages retained (configurable time)
- Consumers track their offset
- Pull model from consumers
Comparison:
| Aspect | RabbitMQ | Kafka |
|---|---|---|
| Model | Message queue | Event log |
| Message retention | Until consumed | Time/size based |
| Ordering | Per queue | Per partition |
| Throughput | ~10K/sec | ~1M/sec |
| Latency | Lower (~1ms) | Higher (~5-10ms) |
| Replay | No | Yes |
| Routing | Flexible (exchanges) | Topic-based |
| Consumer model | Push | Pull |
Choose RabbitMQ When:
1. Task Distribution:
# Work queue - distribute tasks among workers
# Each task processed by ONE worker
channel.basic_publish(
    exchange='',
    routing_key='task_queue',
    body=task_data,
    properties=pika.BasicProperties(delivery_mode=2)  # persistent message
)
2. Complex Routing:
# Route based on patterns
# Fanout, Direct, Topic, Headers exchanges
channel.exchange_declare(exchange='logs', exchange_type='topic')
channel.basic_publish(
    exchange='logs',
    routing_key='error.payment.us',  # pattern matching
    body=message
)
3. Request/Reply Pattern:
# RPC over messaging
result = client.call(request) # Built-in correlation
4. Priority Queues:
# Process urgent messages first
channel.queue_declare(queue='tasks', arguments={'x-max-priority': 10})
Choose Kafka When:
1. Event Sourcing:
# Replay events to rebuild state
consumer.seek(partition, offset=0)  # start from the beginning
for event in consumer:
    apply_event(event)
2. High Throughput:
# Millions of messages per second
# Log aggregation, metrics collection
producer.send('metrics', value=metric_data)
3. Multiple Consumers:
# Same event consumed by multiple services
# Each consumer group gets all messages
# Analytics service reads events
# Notification service reads same events
# Audit service reads same events
4. Stream Processing:
# Real-time data pipelines
# (pseudocode in the style of Kafka Streams / KSQL)
stream.filter(lambda x: x['amount'] > 1000) \
      .map(lambda x: enrich(x)) \
      .to('high_value_transactions')
Summary:
RabbitMQ: "Smart broker"
- Traditional messaging
- Complex routing
- Lower latency
- Task queues
Kafka: "Dumb broker"
- Event streaming
- High throughput
- Event replay
- Multiple consumers
Key Points to Look For:
- Knows fundamental difference (queue vs log)
- Can recommend based on use case
- Understands throughput/latency trade-offs
Follow-up: How would you handle exactly-once delivery in Kafka?
API composition patterns in microservices
How do you handle API composition when data is spread across multiple microservices?
Problem: Client needs data from multiple services.
Client needs: Order + Customer + Product details
Data lives in: Order Service, Customer Service, Product Service
Patterns:
1. API Gateway Composition:
Client → API Gateway → Order Service
→ Customer Service
→ Product Service
↓
Aggregated Response
# Gateway aggregates responses
import asyncio

async def get_order_details(order_id):
    order = await order_service.get(order_id)
    customer = await customer_service.get(order.customer_id)
    products = await asyncio.gather(*[
        product_service.get(pid) for pid in order.product_ids
    ])
    return {
        "order": order,
        "customer": customer,
        "products": products
    }
Pros: Simple client, centralized logic
Cons: Gateway becomes complex, single point of failure
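One way the gateway mitigates downstream failures is graceful degradation: collect per-service errors instead of failing the whole composed response. A hypothetical sketch with `asyncio.gather(return_exceptions=True)` (the service stubs here are stand-ins; `get_customer` simulates an outage):

```python
import asyncio

# Hypothetical sketch: degrade gracefully when a non-essential service fails.
async def get_order(order_id):
    return {"id": order_id, "total": 99.99}

async def get_customer(customer_id):
    raise TimeoutError("customer service is down")  # simulated outage

async def get_order_details(order_id):
    order, customer = await asyncio.gather(
        get_order(order_id),
        get_customer("cust_1"),
        return_exceptions=True,   # collect failures instead of raising
    )
    if isinstance(order, Exception):
        raise order               # order data is essential: propagate the error
    if isinstance(customer, Exception):
        customer = None           # customer block is optional: omit it
    return {"order": order, "customer": customer}
```

The client then renders the order and shows a placeholder for the missing customer block, rather than receiving a 5xx for the whole page.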
2. Backend for Frontend (BFF):
Mobile App → Mobile BFF → Services
Web App → Web BFF → Services
// Mobile BFF - minimal data
app.get('/orders/:id', async (req, res) => {
  const order = await getOrder(req.params.id);
  res.json({
    id: order.id,
    total: order.total,
    status: order.status
  });
});

// Web BFF - full details
app.get('/orders/:id', async (req, res) => {
  const [order, customer, products] = await Promise.all([...]);
  res.json({
    ...order,
    customer: { name: customer.name, email: customer.email },
    products: products.map(p => ({ name: p.name, price: p.price }))
  });
});
Pros: Optimized per client, clear ownership
Cons: Code duplication, multiple codebases
3. GraphQL Federation:
# Each service defines its types
# Gateway federates them
# Order Service
type Order @key(fields: "id") {
id: ID!
total: Float!
customer: Customer! # Reference
}
# Customer Service
type Customer @key(fields: "id") {
id: ID!
name: String!
orders: [Order!]! # Extension
}
# Client query - single request
query {
order(id: "123") {
total
customer {
name
email
}
products {
name
price
}
}
}
Pros: Flexible queries, type safety
Cons: Complexity, learning curve
4. Client-Side Composition:
// Client makes multiple calls
async function getOrderPage(orderId) {
  const order = await fetch(`/orders/${orderId}`).then(r => r.json());
  const customer = await fetch(`/customers/${order.customerId}`).then(r => r.json());
  const products = await Promise.all(
    order.productIds.map(id => fetch(`/products/${id}`).then(r => r.json()))
  );
  return { order, customer, products };
}
Pros: Simple services, flexibility
Cons: Multiple round trips, client complexity, over-fetching
5. Materialized View / CQRS:
Services → Events → Read Service → Denormalized View
Order Created → Update order-details view
Customer Updated → Update order-details view
Product Changed → Update order-details view
Client → Read Service (single call)
Pros: Fast reads, simple queries
Cons: Eventual consistency, event handling complexity
Choosing a Pattern:
| Pattern | Best For |
|---|---|
| API Gateway | Simple aggregation, few services |
| BFF | Multiple client types, different needs |
| GraphQL | Flexible queries, rapid frontend iteration |
| Client-side | Simple apps, few compositions |
| CQRS/Materialized | High read volume, complex aggregations |
Key Points to Look For:
- Knows multiple patterns
- Can recommend based on context
- Considers trade-offs
Follow-up: How do you handle failures in one of the composed services?
Webhook design and reliability
How do you design a reliable webhook system?
Webhook: Server-to-server HTTP callback when events occur.
Your Service → Event → HTTP POST → Customer's Endpoint
Example: Stripe sends payment.succeeded to your /webhooks/stripe
Design Considerations:
1. Payload Design:
{
"id": "evt_123abc",
"type": "order.completed",
"created": "2024-01-15T10:30:00Z",
"data": {
"order_id": "ord_456",
"total": 99.99,
"currency": "USD"
},
"api_version": "2024-01-01"
}
Include:
- Unique event ID (for deduplication)
- Event type
- Timestamp
- API version
- Minimal data (or link to fetch full data)
2. Security:
# Signature verification
import hmac
import hashlib
def verify_signature(payload, signature, secret):
    expected = hmac.new(
        secret.encode(),
        payload.encode(),
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

# Webhook handler
@app.post('/webhooks')
def handle_webhook(request):
    signature = request.headers.get('X-Webhook-Signature')
    if not verify_signature(request.body, signature, WEBHOOK_SECRET):
        return Response(status=401)
    # Process event...
Headers to send:
X-Webhook-Signature: sha256=abc123...
X-Webhook-ID: evt_123abc
X-Webhook-Timestamp: 1705312200
Content-Type: application/json
3. Retry Strategy:
# Exponential backoff with jitter
import random
import requests

retry_delays = [60, 300, 1800, 7200, 43200]  # 1m, 5m, 30m, 2h, 12h

def send_webhook(event, attempt=0):
    try:
        response = requests.post(
            event.endpoint_url,
            json=event.payload,
            headers=get_headers(event),
            timeout=30
        )
        if 200 <= response.status_code < 300:
            mark_delivered(event)
        elif response.status_code >= 500:
            schedule_retry(event, attempt)
        else:
            # 4xx - don't retry (client error)
            mark_failed(event)
    except requests.Timeout:
        schedule_retry(event, attempt)

def schedule_retry(event, attempt):
    if attempt < len(retry_delays):
        delay = retry_delays[attempt] + random.randint(0, 60)  # jitter
        queue.enqueue_in(delay, send_webhook, event, attempt + 1)
    else:
        mark_failed(event)
        notify_customer(event)
4. Idempotency (Receiver Side):
# Receiver must handle duplicates
@app.post('/webhooks/provider')
def handle_webhook(request):
    event_id = request.json['id']

    # Check if already processed
    if redis.exists(f"webhook:{event_id}"):
        return Response(status=200)  # acknowledge but don't reprocess

    # Process the event
    process_event(request.json)

    # Mark as processed
    redis.setex(f"webhook:{event_id}", 86400, "processed")
    return Response(status=200)
5. Delivery Status Dashboard:
CREATE TABLE webhook_deliveries (
id SERIAL PRIMARY KEY,
event_id VARCHAR(255),
endpoint_url TEXT,
status VARCHAR(20), -- pending, delivered, failed
attempts INT DEFAULT 0,
last_attempt_at TIMESTAMP,
response_code INT,
response_body TEXT,
created_at TIMESTAMP
);
-- Customer can see delivery status
-- Manual retry option for failed webhooks
6. Event Types & Filtering:
# Let customers subscribe to specific events
webhook_config = {
"url": "https://customer.com/webhooks",
"events": ["order.completed", "order.refunded"],
"secret": "whsec_..."
}
Best Practices:
| Practice | Description |
|---|---|
| Quick response | Receiver should respond fast (<5s), process async |
| Verify signatures | Always validate webhook authenticity |
| Idempotent handling | Same event twice = same result |
| Log everything | Both sender and receiver |
| Provide replay | Let customers re-request missed events |
| Version your webhooks | Include API version in payload |
Key Points to Look For:
- Security (signatures)
- Retry with backoff
- Idempotency
- Monitoring/dashboard
Follow-up: How do you handle a customer endpoint that's been down for hours?