APIs & Web
REST, GraphQL, authentication, and web protocols
REST constraints: stateless, uniform interface, etc.
What are the REST architectural constraints?
REST (Representational State Transfer) has 6 constraints:
1. Client-Server:
Separation of concerns between UI and data storage.
Client (UI) ←→ Server (Data/Logic)
Independent evolution
2. Stateless:
Each request contains all information needed. Server stores no client context.
# Bad: Server remembers previous request
GET /next-page
# Good: Client sends all context
GET /users?page=2&limit=10
Authorization: Bearer token123
3. Cacheable:
Responses must define if they're cacheable.
Cache-Control: public, max-age=3600
ETag: "abc123"
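A minimal sketch of what honoring these headers looks like in handler code (ETag revalidation; the `(status, headers, body)` handler shape and names are illustrative, not from any framework):

```python
import hashlib
import json

def get_user_response(user, if_none_match=None):
    """Return (status, headers, body) the way an HTTP handler would."""
    body = json.dumps(user, sort_keys=True)
    etag = '"' + hashlib.md5(body.encode()).hexdigest() + '"'
    if if_none_match == etag:
        # Client's cached copy is still fresh: no body needed
        return 304, {"ETag": etag}, None
    return 200, {"ETag": etag, "Cache-Control": "public, max-age=3600"}, body
```

On a repeat request the client sends the stored ETag in If-None-Match and receives a body-less 304.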
4. Uniform Interface:
Standardized way to interact with resources.
- Resource identification (URIs)
- Resource manipulation through representations
- Self-descriptive messages
- HATEOAS (Hypermedia)
5. Layered System:
Client can't tell if connected directly to server.
Client → Load Balancer → API Gateway → Server → Database
6. Code on Demand (Optional):
Server can send executable code (JavaScript).
Uniform Interface Sub-constraints:
Resource Identification:
/users/123 # Specific user
/users/123/orders # User's orders
Resource Manipulation:
GET /users/123 # Get representation
PUT /users/123 # Replace with representation
DELETE /users/123 # Delete resource
Self-Descriptive:
Content-Type: application/json
Accept: application/json
HATEOAS:
{
"id": 123,
"name": "Alice",
"_links": {
"self": "/users/123",
"orders": "/users/123/orders"
}
}
Key Points to Look For:
- Knows all 6 constraints
- Understands statelessness
- Can explain uniform interface
Follow-up: Why is statelessness important for scalability?
HTTP methods: GET, POST, PUT, PATCH, DELETE semantics
Explain the semantics of HTTP methods and when to use each.
HTTP Methods:
| Method | Purpose | Idempotent | Safe | Body |
|---|---|---|---|---|
| GET | Read resource | Yes | Yes | No |
| POST | Create resource | No | No | Yes |
| PUT | Replace resource | Yes | No | Yes |
| PATCH | Partial update | No* | No | Yes |
| DELETE | Remove resource | Yes | No | Optional |
*PATCH can be idempotent depending on implementation
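The footnote can be made concrete: whether PATCH is idempotent depends on the patch semantics. A sketch with plain dicts standing in for resources (names are illustrative):

```python
def apply_merge_patch(resource, patch):
    # JSON-merge-style: fields set to absolute values,
    # so applying twice gives the same result as applying once (idempotent)
    resource.update(patch)
    return resource

def apply_increment_patch(resource, field, delta):
    # Relative operation: every application changes the outcome,
    # so this style of PATCH is NOT idempotent
    resource[field] = resource.get(field, 0) + delta
    return resource

user = {"name": "Alice", "logins": 1}
apply_merge_patch(user, {"email": "a@example.com"})
apply_merge_patch(user, {"email": "a@example.com"})  # no further change
apply_increment_patch(user, "logins", 1)
apply_increment_patch(user, "logins", 1)  # logins keeps growing
```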
GET - Read:
GET /users/123
Response: 200 OK with user data
# Should NEVER modify data
# Cacheable
POST - Create:
POST /users
Body: {"name": "Alice", "email": "alice@example.com"}
Response: 201 Created
Location: /users/456
PUT - Replace:
PUT /users/123
Body: {"name": "Alice Updated", "email": "new@example.com"}
Response: 200 OK or 204 No Content
# Replaces ENTIRE resource
# Omitted fields become null/default
PATCH - Partial Update:
PATCH /users/123
Body: {"email": "new@example.com"}
Response: 200 OK
# Updates ONLY specified fields
# Other fields unchanged
DELETE - Remove:
DELETE /users/123
Response: 204 No Content or 200 OK
# Idempotent: Deleting twice = same result
Common Mistakes:
# Wrong: Using GET to delete
GET /users/123/delete
# Wrong: Using POST for everything
POST /users/123/update
# Wrong: PUT for partial update
PUT /users/123 {"email": "new@example.com"}
# This would null out name!
Key Points to Look For:
- Knows idempotent vs safe
- Understands PUT vs PATCH
- Correct method for operation
Follow-up: What's the difference between safe and idempotent?
HTTP status codes: when to use which
What are the common HTTP status codes and when should you use them?
Status Code Categories:
- 1xx: Informational
- 2xx: Success
- 3xx: Redirection
- 4xx: Client Error
- 5xx: Server Error
Success (2xx):
200 OK # Generic success, response body included
201 Created # Resource created (POST)
204 No Content # Success, no response body (DELETE)
202 Accepted # Async processing started
Redirection (3xx):
301 Moved Permanently # URL changed permanently, update bookmarks
302 Found # Temporary redirect
304 Not Modified # Cached version is fresh
Client Error (4xx):
400 Bad Request # Malformed request, validation failed
401 Unauthorized # Not authenticated (needs login)
403 Forbidden # Authenticated but not authorized
404 Not Found # Resource doesn't exist
405 Method Not Allowed # Wrong HTTP method
409 Conflict # State conflict (duplicate, version mismatch)
422 Unprocessable Entity # Validation error (semantic)
429 Too Many Requests # Rate limited
Server Error (5xx):
500 Internal Server Error # Generic server error
502 Bad Gateway # Upstream server error
503 Service Unavailable # Temporarily down
504 Gateway Timeout # Upstream timeout
Common Scenarios:
| Scenario | Status Code |
|---|---|
| Get user successfully | 200 |
| Create user | 201 |
| Delete user | 204 |
| User not found | 404 |
| Invalid email format | 400 |
| Email already exists | 409 |
| Wrong password | 401 |
| No permission for resource | 403 |
| Server crashed | 500 |
| Rate limit exceeded | 429 |
Common Mistakes:
# Wrong: 200 for everything
POST /users → 200 OK (should be 201)
# Wrong: 200 for errors
GET /users/999 → 200 {"error": "Not found"} (should be 404)
# Wrong: 500 for validation
POST /users {"email": "invalid"} → 500 (should be 400)
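The mistakes above, corrected in a framework-free sketch (handlers return a `(status, body)` pair; names and data are illustrative):

```python
USERS = {123: {"id": 123, "name": "Alice"}}

def get_user(user_id):
    """Return (status, body) the way an HTTP handler would."""
    user = USERS.get(user_id)
    if user is None:
        return 404, {"error": "User not found"}  # not 200 with an error body
    return 200, user

def create_user(body):
    if "@" not in body.get("email", ""):
        return 400, {"error": "Invalid email"}  # validation failure is 400, not 500
    new_id = max(USERS) + 1
    USERS[new_id] = {"id": new_id, **body}
    return 201, USERS[new_id]  # creation is 201, not 200
```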
Key Points to Look For:
- Knows common codes
- Uses appropriate code per scenario
- Distinguishes 400 vs 401 vs 403
Follow-up: When would you use 409 Conflict?
REST vs RPC style APIs
What's the difference between REST and RPC style APIs?
REST (Resource-Oriented):
Focus on resources (nouns).
GET /users/123 # Get user
POST /users # Create user
PUT /users/123 # Update user
DELETE /users/123 # Delete user
GET /users/123/orders # Get user's orders
RPC (Action-Oriented):
Focus on actions (verbs).
POST /getUser {"userId": 123}
POST /createUser {"name": "Alice"}
POST /updateUser {"userId": 123, "name": "Bob"}
POST /deleteUser {"userId": 123}
POST /getUserOrders {"userId": 123}
Comparison:
| Aspect | REST | RPC |
|---|---|---|
| Focus | Resources | Actions |
| URLs | Nouns | Verbs |
| HTTP methods | Semantic meaning | Usually POST |
| Caching | Built-in (GET) | Manual |
| Discoverability | HATEOAS | Documentation |
| Flexibility | Standardized | Custom |
When RPC Makes Sense:
1. Actions that aren't CRUD:
# REST (awkward)
POST /emails/123/send-action
# RPC (natural)
POST /sendEmail {"emailId": 123}
2. Complex operations:
# REST
POST /transfers
{"from": "account1", "to": "account2", "amount": 100}
# RPC
POST /transferMoney
{"fromAccount": "account1", "toAccount": "account2", "amount": 100}
3. Batch operations:
# RPC
POST /batchCreateUsers
{"users": [{...}, {...}, {...}]}
Modern Hybrid Approach:
# RESTful resources
GET /users/123
POST /users
# RPC for actions
POST /users/123/activate
POST /users/123/reset-password
POST /orders/456/cancel
gRPC (Modern RPC):
service UserService {
rpc GetUser(GetUserRequest) returns (User);
rpc CreateUser(CreateUserRequest) returns (User);
}
Key Points to Look For:
- Knows fundamental difference
- Recognizes when each fits
- Understands hybrid approach
Follow-up: When would you choose gRPC over REST?
HATEOAS explained
What is HATEOAS and why is it important?
HATEOAS (Hypermedia as the Engine of Application State):
Responses include links to related resources and available actions.
Without HATEOAS:
{
"id": 123,
"name": "Alice",
"status": "active"
}
Client must know all URLs and valid operations.
With HATEOAS:
{
"id": 123,
"name": "Alice",
"status": "active",
"_links": {
"self": {"href": "/users/123"},
"orders": {"href": "/users/123/orders"},
"deactivate": {"href": "/users/123/deactivate", "method": "POST"}
}
}
Benefits:
1. Discoverability:
// API root
GET /api
{
"_links": {
"users": {"href": "/api/users"},
"products": {"href": "/api/products"},
"orders": {"href": "/api/orders"}
}
}
2. Decoupled Clients:
// Client follows links, not hardcoded URLs
const user = await fetch('/api/users/123');
const orders = await fetch(user._links.orders.href);
3. State-Aware Actions:
// Pending order
{
"id": 456,
"status": "pending",
"_links": {
"cancel": {"href": "/orders/456/cancel"},
"pay": {"href": "/orders/456/pay"}
}
}
// Shipped order - different available actions
{
"id": 456,
"status": "shipped",
"_links": {
"track": {"href": "/orders/456/tracking"},
"return": {"href": "/orders/456/return"}
}
}
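Server-side, the state-dependent links above come from a small amount of logic over the resource's status; a sketch (URLs and field names are illustrative):

```python
def order_representation(order):
    # Build _links from the order's current state so clients discover
    # valid actions instead of hardcoding them
    links = {"self": {"href": f"/orders/{order['id']}"}}
    if order["status"] == "pending":
        links["cancel"] = {"href": f"/orders/{order['id']}/cancel"}
        links["pay"] = {"href": f"/orders/{order['id']}/pay"}
    elif order["status"] == "shipped":
        links["track"] = {"href": f"/orders/{order['id']}/tracking"}
        links["return"] = {"href": f"/orders/{order['id']}/return"}
    return {**order, "_links": links}
```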
Common Formats:
HAL (Hypertext Application Language):
{
"_links": {
"self": {"href": "/orders/123"}
},
"_embedded": {
"items": [...]
}
}
JSON:API:
{
"data": {
"type": "users",
"id": "123",
"links": {"self": "/users/123"}
}
}
Reality Check:
- Most APIs don't fully implement HATEOAS
- Client developers often prefer documentation
- Can add complexity
Key Points to Look For:
- Understands the concept
- Knows benefits and trade-offs
- Can implement basic example
Follow-up: Why don't most APIs implement full HATEOAS?
API versioning strategies
What are different API versioning strategies? What are the trade-offs?
Versioning Strategies:
1. URL Path Versioning:
GET /v1/users/123
GET /v2/users/123
Pros: Clear, easy caching, easy routing
Cons: Not RESTful (resource = user, not v1/user)
2. Query Parameter:
GET /users/123?version=1
GET /users/123?version=2
Pros: Optional parameter, same resource
Cons: Can be forgotten, caching complex
3. Header Versioning:
GET /users/123
Accept: application/vnd.myapi.v1+json
# or
X-API-Version: 1
Pros: Clean URLs, RESTful
Cons: Hidden, harder to test in browser
4. Content Negotiation:
GET /users/123
Accept: application/vnd.company.user-v2+json
Pros: True REST, per-resource versioning
Cons: Complex, client must specify
Comparison:
| Strategy | Visibility | Cacheable | RESTful |
|---|---|---|---|
| URL path | High | Easy | No |
| Query param | Medium | Complex | Somewhat |
| Header | Low | Medium | Yes |
| Content-type | Low | Medium | Yes |
Best Practices:
1. Start with versioning:
# Even v1 makes it clear API is versioned
/v1/users
2. Semantic versioning for major changes:
v1 → v2: Breaking changes
v1.1: New features (backward compatible)
3. Deprecation policy:
# Response headers
Deprecation: true
Sunset: Tue, 31 Dec 2024 23:59:59 GMT
Link: </v2/users>; rel="successor-version"
4. Support multiple versions:
# Version routing (Flask-style)
@app.route('/v1/users/<id>')
def get_user_v1(id):
    user = db.get_user(id)
    return UserSerializerV1(user).data

@app.route('/v2/users/<id>')
def get_user_v2(id):
    user = db.get_user(id)
    return UserSerializerV2(user).data  # Different format
Key Points to Look For:
- Knows multiple strategies
- Understands trade-offs
- Has deprecation strategy
Follow-up: How do you handle breaking changes in APIs?
Pagination: offset vs cursor-based
Compare offset-based and cursor-based pagination.
Offset-Based (Traditional):
GET /users?page=5&limit=20
# or
GET /users?offset=80&limit=20
SELECT * FROM users
ORDER BY created_at DESC
LIMIT 20 OFFSET 80;
Problems:
1. Performance: Large offsets scan many rows
2. Inconsistency: New records shift pages
Page 1: [A, B, C, D, E]
# New item X inserted
Page 2: [E, F, G, H, I] # E duplicated!
Cursor-Based (Keyset):
GET /users?limit=20
Response:
{
"data": [...],
"next_cursor": "eyJpZCI6MTIzfQ=="
}
GET /users?limit=20&cursor=eyJpZCI6MTIzfQ==
-- Cursor contains: id=123, created_at=2024-01-15
SELECT * FROM users
WHERE (created_at, id) < ('2024-01-15', 123)
ORDER BY created_at DESC, id DESC
LIMIT 20;
Comparison:
| Aspect | Offset | Cursor |
|---|---|---|
| Jump to page | Yes | No |
| Performance | O(offset) | O(log n) index seek |
| Consistency | Poor | Good |
| Complexity | Simple | Complex |
| Bidirectional | Easy | Harder |
Implementation:
# Offset-based (db.query / db.query_scalar are illustrative DB helpers;
# parameterized queries avoid SQL injection)
def get_users_offset(page, limit):
    offset = (page - 1) * limit
    users = db.query("SELECT * FROM users LIMIT %s OFFSET %s", (limit, offset))
    total_count = db.query_scalar("SELECT COUNT(*) FROM users")
    return {
        "data": users,
        "page": page,
        "total_pages": -(-total_count // limit)  # ceiling division
    }

# Cursor-based
def get_users_cursor(cursor, limit):
    if cursor:
        last_id, last_created = base64.b64decode(cursor).decode().split(',')
        users = db.query("""
            SELECT * FROM users
            WHERE (created_at, id) < (%s, %s)
            ORDER BY created_at DESC, id DESC
            LIMIT %s
        """, (last_created, last_id, limit))
    else:
        users = db.query(
            "SELECT * FROM users ORDER BY created_at DESC, id DESC LIMIT %s",
            (limit,))
    next_cursor = None
    if len(users) == limit:
        last = users[-1]
        next_cursor = base64.b64encode(f"{last.id},{last.created_at}".encode()).decode()
    return {"data": users, "next_cursor": next_cursor}
When to Use:
Offset: Small datasets, need page jumping, simple implementation
Cursor: Large datasets, real-time data, infinite scroll
Key Points to Look For:
- Knows performance difference
- Understands consistency issue
- Can implement both
Follow-up: How do you implement cursor-based pagination for complex sorting?
API Design
Designing intuitive API endpoints
What are best practices for designing intuitive API endpoints?
Naming Conventions:
1. Use Nouns, Not Verbs:
# Good
GET /users
POST /users
GET /users/123
# Bad
GET /getUsers
POST /createUser
GET /getUserById
2. Use Plural Nouns:
# Good
GET /users
GET /users/123
# Inconsistent
GET /user
GET /user/123
3. Hierarchical Resources:
# User's orders
GET /users/123/orders
# Specific order
GET /users/123/orders/456
# Order's items
GET /users/123/orders/456/items
4. Keep It Flat When Possible:
# If orders have unique IDs
GET /orders/456
# Instead of
GET /users/123/orders/456
5. Use Query Parameters for Filtering:
GET /users?status=active&role=admin
GET /orders?from=2024-01-01&to=2024-12-31
GET /products?category=electronics&sort=price&order=asc
6. Actions as Sub-resources:
# Non-CRUD actions
POST /users/123/activate
POST /orders/456/cancel
POST /accounts/789/transfer
7. Consistent Naming:
# Pick a style and stick with it
/user-profiles # kebab-case
/user_profiles # snake_case (less common)
/userProfiles # camelCase (less common)
Good Examples:
GET /articles # List articles
GET /articles/123 # Get article
POST /articles # Create article
PUT /articles/123 # Update article
DELETE /articles/123 # Delete article
GET /articles/123/comments # Article's comments
POST /articles/123/comments # Add comment
GET /articles?author=john&tag=tech # Filter
Key Points to Look For:
- Nouns over verbs
- Consistent pluralization
- Logical hierarchy
- Proper HTTP methods
Follow-up: How do you handle actions that don't fit CRUD?
Request/Response body best practices
What are best practices for API request and response bodies?
Request Body:
1. Use camelCase consistently:
{
"firstName": "Alice",
"lastName": "Smith",
"emailAddress": "alice@example.com"
}
2. Don't include what's in URL:
# Bad - ID in both
PUT /users/123
{"id": 123, "name": "Alice"}
# Good
PUT /users/123
{"name": "Alice"}
3. Accept partial updates for PATCH:
PATCH /users/123
{"email": "new@example.com"}
# Only email changes
Response Body:
1. Envelope pattern (optional):
{
"data": {...},
"meta": {
"page": 1,
"total": 100
}
}
2. Include resource ID:
{
"id": 123,
"name": "Alice",
"createdAt": "2024-01-15T10:30:00Z"
}
3. Use ISO 8601 for dates:
{
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-16T14:45:30Z"
}
4. Consistent null handling:
// Option 1: Include null
{"middleName": null}
// Option 2: Omit null fields
{}
// Pick one and be consistent
5. Collection responses:
{
"data": [
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"}
],
"pagination": {
"page": 1,
"perPage": 20,
"total": 150,
"totalPages": 8
}
}
6. Avoid deep nesting:
// Too deep
{
"user": {
"profile": {
"settings": {
"notifications": {
"email": true
}
}
}
}
}
// Better: Flatten or separate endpoint
{
"notificationSettings": {
"email": true
}
}
Key Points to Look For:
- Consistent naming convention
- Proper date formatting
- Reasonable structure
Follow-up: How do you handle field expansion/sparse fieldsets?
Error handling and error responses
How should API error responses be designed?
Error Response Structure:
{
"error": {
"code": "VALIDATION_ERROR",
"message": "The request body contains invalid data",
"details": [
{
"field": "email",
"message": "Must be a valid email address",
"code": "INVALID_FORMAT"
},
{
"field": "age",
"message": "Must be at least 18",
"code": "MIN_VALUE"
}
],
"requestId": "req_abc123",
"documentationUrl": "https://api.example.com/docs/errors#VALIDATION_ERROR"
}
}
Components:
1. Machine-readable code:
"code": "RESOURCE_NOT_FOUND"
// Not just HTTP status code
// Client can handle programmatically
2. Human-readable message:
"message": "User with ID 123 was not found"
// Safe to display to end users (sometimes)
3. Field-level errors:
"details": [
{"field": "email", "message": "Invalid format"}
]
// For form validation
4. Request tracking:
"requestId": "req_abc123"
// For debugging/support
Error Examples:
400 Validation Error:
{
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid request parameters",
"details": [
{"field": "email", "message": "Required field missing"}
]
}
}
401 Authentication Error:
{
"error": {
"code": "AUTHENTICATION_REQUIRED",
"message": "Valid authentication token required"
}
}
403 Authorization Error:
{
"error": {
"code": "INSUFFICIENT_PERMISSIONS",
"message": "You don't have permission to access this resource"
}
}
404 Not Found:
{
"error": {
"code": "RESOURCE_NOT_FOUND",
"message": "User not found",
"details": {
"resourceType": "User",
"resourceId": "123"
}
}
}
500 Internal Error:
{
"error": {
"code": "INTERNAL_ERROR",
"message": "An unexpected error occurred",
"requestId": "req_abc123"
}
}
// Don't expose internal details!
Best Practices:
1. Always return JSON, even for errors
2. Include request ID for debugging
3. Don't expose stack traces in production
4. Use consistent structure across all errors
5. Provide actionable messages when possible
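The practices above can be centralized in one helper so every endpoint emits the same envelope; a sketch (field names follow the examples above, `uuid` stands in for real request-ID plumbing):

```python
import uuid

def error_response(status, code, message, details=None):
    """Build (status, body) with a consistent error envelope."""
    body = {
        "error": {
            "code": code,                                # machine-readable
            "message": message,                          # human-readable
            "requestId": f"req_{uuid.uuid4().hex[:8]}",  # for debugging/support
        }
    }
    if details:
        body["error"]["details"] = details               # field-level errors
    return status, body
```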
Key Points to Look For:
- Consistent error structure
- Machine and human readable
- Field-level validation details
- Security awareness (no stack traces)
Follow-up: How do you handle errors in async/webhook APIs?
Rate limiting and throttling
How do you implement rate limiting in an API?
Purpose:
- Prevent abuse
- Ensure fair usage
- Protect backend resources
- Manage costs
Response Headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1609459200
# When limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Rate Limiting Strategies:
1. Per User/API Key:
# 1000 requests per hour per user
@rate_limit(limit=1000, period=3600, key=lambda req: req.user_id)
def api_endpoint(request):
    ...
2. Per IP:
# For unauthenticated endpoints
@rate_limit(limit=100, period=60, key=lambda req: req.ip)
def public_endpoint(request):
    ...
3. Per Endpoint:
# Different limits for different endpoints
@rate_limit(limit=10, period=60)  # 10/min for expensive
def search_endpoint(request):
    ...

@rate_limit(limit=100, period=60)  # 100/min for cheap
def get_user(request):
    ...
Implementation:
class RateLimiter:
    def __init__(self, redis):
        self.redis = redis

    def is_allowed(self, key, limit, window):
        redis_key = f"rate:{key}"
        pipe = self.redis.pipeline()
        pipe.incr(redis_key)
        pipe.ttl(redis_key)
        count, ttl = pipe.execute()
        if ttl == -1:
            # First request in this window: start the clock.
            # (Setting expire on every request would never let the window reset.)
            self.redis.expire(redis_key, window)
            ttl = window
        if count > limit:
            return False, {'limit': limit, 'remaining': 0, 'reset': ttl}
        return True, {'limit': limit, 'remaining': limit - count, 'reset': ttl}
Tiered Limits:
Free tier: 100 requests/day
Basic tier: 1,000 requests/day
Pro tier: 10,000 requests/day
Enterprise: 100,000 requests/day
Handling 429 Response:
async function apiCall() {
    const response = await fetch('/api/data');
    if (response.status === 429) {
        const retryAfter = response.headers.get('Retry-After');
        await sleep(retryAfter * 1000);
        return apiCall(); // Retry
    }
    return response.json();
}
Key Points to Look For:
- Uses proper headers
- Has retry guidance
- Considers different strategies
Follow-up: How do you handle rate limiting in a distributed system?
API documentation (OpenAPI/Swagger)
How do you document an API? What is OpenAPI/Swagger?
OpenAPI (formerly Swagger):
Standard specification for describing REST APIs.
Basic Structure:
openapi: 3.0.0
info:
title: User API
version: 1.0.0
description: API for managing users
servers:
- url: https://api.example.com/v1
paths:
/users:
get:
summary: List all users
parameters:
- name: page
in: query
schema:
type: integer
default: 1
responses:
'200':
description: List of users
content:
application/json:
schema:
type: array
items:
$ref: '#/components/schemas/User'
post:
summary: Create a user
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/CreateUser'
responses:
'201':
description: User created
'400':
description: Validation error
/users/{id}:
get:
summary: Get user by ID
parameters:
- name: id
in: path
required: true
schema:
type: integer
responses:
'200':
description: User found
'404':
description: User not found
components:
schemas:
User:
type: object
properties:
id:
type: integer
name:
type: string
email:
type: string
format: email
createdAt:
type: string
format: date-time
CreateUser:
type: object
required:
- name
- email
properties:
name:
type: string
minLength: 1
maxLength: 100
email:
type: string
format: email
securitySchemes:
bearerAuth:
type: http
scheme: bearer
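The `CreateUser` constraints above (required fields, `minLength`/`maxLength`, email format) are exactly what spec-driven validators enforce; a hand-rolled sketch of the same checks (real tooling generates this from the spec):

```python
def validate_create_user(body):
    """Return a list of validation errors (empty means valid)."""
    errors = []
    name = body.get("name")
    if name is None:
        errors.append("name: required")
    elif not 1 <= len(name) <= 100:
        errors.append("name: length must be 1-100")
    email = body.get("email")
    if email is None:
        errors.append("email: required")
    elif "@" not in email:
        errors.append("email: invalid format")  # crude stand-in for format: email
    return errors
```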
Tools:
- Swagger UI: Interactive documentation
- Swagger Editor: Write/edit specs
- Code generators: Generate client SDKs
- Validation: Validate requests/responses
Best Practices:
1. Keep spec up-to-date (generate from code if possible)
2. Include examples
3. Document all error responses
4. Use semantic descriptions
5. Version your documentation
Key Points to Look For:
- Knows OpenAPI structure
- Understands benefits
- Mentions tooling
Follow-up: How do you keep documentation in sync with code?
Backward compatibility strategies
How do you maintain backward compatibility when evolving an API?
Backward Compatible Changes (Safe):
1. Adding fields:
// v1 response
{"id": 1, "name": "Alice"}
// v1.1 response (compatible)
{"id": 1, "name": "Alice", "email": "alice@example.com"}
2. Adding optional parameters:
// v1
GET /users
// v1.1 - new optional filter
GET /users?includeInactive=true
3. Adding new endpoints:
// New endpoint doesn't break existing
POST /users/123/preferences
Breaking Changes (Require New Version):
1. Removing fields:
// Breaking: Clients may depend on 'email'
// v1: {"id": 1, "name": "Alice", "email": "..."}
// v2: {"id": 1, "name": "Alice"}
2. Renaming fields:
// Breaking: Different key
// v1: {"userName": "alice"}
// v2: {"username": "alice"}
3. Changing field types:
// Breaking: String to object
// v1: {"location": "New York"}
// v2: {"location": {"city": "New York", "country": "USA"}}
Strategies:
1. Dual writing:
// Include both old and new format
{
"name": "Alice", // Old
"fullName": "Alice", // New
"location": "NYC", // Old (deprecated)
"address": { // New
"city": "NYC"
}
}
2. Deprecation period:
# Mark field as deprecated
Deprecation: true
Sunset: 2025-01-01
{
"oldField": "value", // Deprecated, still works
"_warnings": [
"'oldField' is deprecated, use 'newField' instead"
]
}
3. Version negotiation:
GET /users/123
Accept: application/vnd.api+json; version=2
# Fall back to latest if not specified
4. Adapter pattern:
def get_user(user_id, version):
    user = db.get_user(user_id)
    if version == 1:
        return UserSerializerV1(user).data
    return UserSerializerV2(user).data
Key Points to Look For:
- Knows safe vs breaking changes
- Has deprecation strategy
- Understands transition period
Follow-up: How do you communicate breaking changes to API consumers?
Authentication & Security
Session-based vs Token-based auth
Compare session-based and token-based authentication.
Session-Based:
1. User logs in
2. Server creates session, stores in DB/Redis
3. Server sends session ID in cookie
4. Browser sends cookie with each request
5. Server validates session
Client ←─── Cookie: sessionId=abc123 ───→ Server
│
Session Store
Token-Based (JWT):
1. User logs in
2. Server creates signed token
3. Server sends token to client
4. Client stores token, sends in header
5. Server validates signature
Client ←─── Authorization: Bearer eyJhbG... ───→ Server
│
Validate signature
(no DB lookup)
Comparison:
| Aspect | Session | Token |
|---|---|---|
| Storage | Server-side | Client-side |
| Scalability | Need shared store | Stateless |
| Revocation | Easy (delete session) | Hard (need blocklist) |
| Mobile | Cookie issues | Works well |
| CSRF | Vulnerable | Not vulnerable |
| Size | Small (session ID) | Larger (JWT payload) |
Session Implementation:
# Login
def login(username, password):
    user = authenticate(username, password)
    session_id = generate_session_id()
    redis.setex(f"session:{session_id}", 3600, user.id)
    response.set_cookie('session_id', session_id, httponly=True)

# Middleware
def get_current_user(request):
    session_id = request.cookies.get('session_id')
    user_id = redis.get(f"session:{session_id}")
    return User.get(user_id)
Token Implementation:
# Login
def login(username, password):
    user = authenticate(username, password)
    token = jwt.encode({
        'user_id': user.id,
        'exp': datetime.utcnow() + timedelta(hours=1)
    }, SECRET_KEY)
    return {'token': token}

# Middleware
def get_current_user(request):
    auth = request.headers.get('Authorization', '')
    if not auth.startswith('Bearer '):
        raise AuthenticationRequired()
    payload = jwt.decode(auth.split()[1], SECRET_KEY, algorithms=['HS256'])
    return User.get(payload['user_id'])
When to Use:
Session: Traditional web apps, need easy revocation
Token: APIs, mobile apps, microservices, SPAs
Key Points to Look For:
- Knows trade-offs
- Understands stateless nature of tokens
- Considers revocation challenges
Follow-up: How do you handle token refresh?
JWT structure and validation
How does JWT work? What are the security considerations?
JWT Structure:
header.payload.signature
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.
eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4ifQ.
SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
Header (Base64):
{
"alg": "HS256",
"typ": "JWT"
}
Payload (Base64):
{
"sub": "user123",
"name": "John Doe",
"role": "admin",
"iat": 1609459200,
"exp": 1609462800
}
Signature:
HMACSHA256(
base64UrlEncode(header) + "." + base64UrlEncode(payload),
secret
)
Validation Steps:
import base64, hashlib, hmac, json, time

def validate_jwt(token, secret: bytes):
    # 1. Split token into its Base64URL segments
    header_b64, payload_b64, signature_b64 = token.split('.')
    # 2. Verify signature (constant-time comparison)
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected_sig = hmac.new(secret, signing_input, hashlib.sha256).digest()
    actual_sig = base64.urlsafe_b64decode(signature_b64 + '=' * (-len(signature_b64) % 4))
    if not hmac.compare_digest(expected_sig, actual_sig):
        raise InvalidSignature()
    # 3. Decode payload
    claims = json.loads(base64.urlsafe_b64decode(payload_b64 + '=' * (-len(payload_b64) % 4)))
    # 4. Check expiration
    if claims['exp'] < time.time():
        raise TokenExpired()
    # 5. Check issuer/audience (optional)
    if claims.get('iss') != EXPECTED_ISSUER:
        raise InvalidIssuer()
    return claims
Security Considerations:
1. Never store sensitive data:
// Bad - JWT is only encoded, not encrypted
{"password": "secret123", "ssn": "123-45-6789"}
// Good
{"user_id": "123", "role": "admin"}
2. Use strong secrets:
# Bad
SECRET = "password123"
# Good
SECRET = os.environ['JWT_SECRET'] # 256+ bits of entropy
3. Always validate expiration:
if claims['exp'] < time.time():
raise TokenExpired()
4. Use appropriate algorithm:
# Good
jwt.decode(token, secret, algorithms=['HS256'])
# Vulnerable to algorithm confusion
jwt.decode(token, secret, algorithms=['none', 'HS256'])
5. Set reasonable expiration:
# Access token: Short-lived (15 min - 1 hour)
# Refresh token: Longer (days/weeks)
Key Points to Look For:
- Knows JWT structure
- Understands signature purpose
- Security-aware
Follow-up: What's the difference between HS256 and RS256?
OAuth 2.0 flows explained
Explain the main OAuth 2.0 flows and when to use each.
OAuth 2.0 Roles:
- Resource Owner: User
- Client: Application requesting access
- Authorization Server: Issues tokens
- Resource Server: API with protected resources
1. Authorization Code Flow (Most Secure):
For: Server-side applications
User → Client: "Login with Google"
Client → Auth Server: Redirect to login
User → Auth Server: Logs in, consents
Auth Server → Client: Authorization code
Client → Auth Server: Exchange code for tokens
Auth Server → Client: Access token, refresh token
Client → Resource Server: API calls with token
1. https://auth.server/authorize?
response_type=code&
client_id=xxx&
redirect_uri=https://app.com/callback&
scope=read&
state=xyz
2. Callback: https://app.com/callback?code=AUTH_CODE&state=xyz
3. POST https://auth.server/token
grant_type=authorization_code&
code=AUTH_CODE&
client_secret=xxx
2. Authorization Code with PKCE:
For: Mobile/SPA (no client secret)
Same as above, but with code_verifier/code_challenge
Prevents code interception attacks
1. Generate code_verifier (random URL-safe string, 43-128 chars)
2. Create code_challenge = BASE64URL(SHA256(code_verifier))
3. Send code_challenge in authorize request
4. Send code_verifier in token request
5. Server verifies: BASE64URL(SHA256(code_verifier)) == stored challenge
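The steps above map directly onto a few lines of code (the S256 method from RFC 7636; function names are illustrative):

```python
import base64
import hashlib
import secrets

def make_code_verifier():
    # 32 random bytes -> 43 URL-safe chars (spec allows 43-128)
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()

def make_code_challenge(verifier):
    # code_challenge = BASE64URL(SHA256(code_verifier)), unpadded
    digest = hashlib.sha256(verifier.encode()).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

def server_verifies(verifier, stored_challenge):
    # Step 5: recompute the challenge from the verifier sent to /token
    return make_code_challenge(verifier) == stored_challenge
```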
3. Client Credentials Flow:
For: Machine-to-machine (no user)
Client → Auth Server: client_id + client_secret
Auth Server → Client: Access token
POST https://auth.server/token
grant_type=client_credentials&
client_id=xxx&
client_secret=xxx
4. Implicit Flow (Deprecated):
For: SPAs (legacy, insecure)
Token returned directly in URL fragment
No refresh tokens
Vulnerable to token leakage
Flow Selection:
| Application Type | Recommended Flow |
|---|---|
| Server-side web app | Authorization Code |
| SPA | Authorization Code + PKCE |
| Mobile app | Authorization Code + PKCE |
| Machine-to-machine | Client Credentials |
Key Points to Look For:
- Knows multiple flows
- Recommends PKCE for public clients
- Understands security implications
Follow-up: How do refresh tokens work?
API keys vs OAuth tokens
When would you use API keys vs OAuth tokens?
API Keys:
GET /api/data
X-API-Key: sk_live_abc123
# or (discouraged: query strings end up in logs and browser history)
GET /api/data?api_key=sk_live_abc123
OAuth Tokens:
GET /api/data
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
Comparison:
| Aspect | API Keys | OAuth Tokens |
|---|---|---|
| Identifies | Application | User + Application |
| User context | No | Yes |
| Expiration | Typically long-lived | Short-lived |
| Scopes | Basic/none | Fine-grained |
| Revocation | Manual | Token expiry/blocklist |
| Delegation | No | Yes |
Use API Keys When:
- Server-to-server communication
- No user context needed
- Simple authentication sufficient
- Rate limiting by client
- Public APIs
# API key for service account
weather_api.get_forecast(api_key=WEATHER_API_KEY)
stripe.Charge.create(api_key=STRIPE_SECRET_KEY)
Use OAuth When:
- Acting on behalf of users
- Need user consent
- Fine-grained permissions
- Third-party integrations
- User data access
# OAuth for user's data
google_calendar.get_events(access_token=user_token)
github.get_repositories(access_token=user_token)
Best Practices for API Keys:
1. Different keys for different purposes:
sk_live_xxx # Production
sk_test_xxx # Testing
pk_xxx # Public (limited permissions)
2. Key rotation:
# Support multiple active keys
valid_keys = [current_key, previous_key]
3. Secure storage:
# Never in code
# Environment variables or secrets manager
API_KEY = os.environ['API_KEY']
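Combining the practices above: keys sourced from the environment (with hypothetical fallback values so the snippet runs standalone), and constant-time comparison against both the current and previous key during rotation:

```python
import hmac
import os

VALID_KEYS = [
    os.environ.get("API_KEY_CURRENT") or "sk_live_current",    # illustrative names
    os.environ.get("API_KEY_PREVIOUS") or "sk_live_previous",
]

def is_valid_key(candidate):
    # compare_digest avoids leaking key contents via timing side channels
    return any(hmac.compare_digest(candidate, key) for key in VALID_KEYS)
```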
Key Points to Look For:
- Knows when to use each
- Understands user context difference
- Security awareness
Follow-up: How do you securely store API keys?
CORS: why it exists and how to configure
What is CORS and why does it exist?
CORS (Cross-Origin Resource Sharing):
Browser security mechanism controlling cross-origin requests.
Same-Origin Policy:
Page at: https://app.com
Can request: https://app.com/api ✓
Cannot request: https://api.other.com ✗ (different origin)
Origin = Protocol + Domain + Port:
https://example.com:443
└─┬─┘ └────┬─────┘└┬┘
protocol domain port
Why CORS Exists:
Prevents malicious sites from making requests to other sites using your credentials.
// Evil site at evil.com
// Without CORS, this could steal data:
fetch('https://bank.com/api/account', {
credentials: 'include' // Sends bank.com cookies
})
How CORS Works:
Simple Requests (GET, POST with simple headers):
Request:
GET /api/data HTTP/1.1
Origin: https://app.com
Response:
Access-Control-Allow-Origin: https://app.com
Preflight Requests (Complex requests):
# Browser sends OPTIONS first
OPTIONS /api/data HTTP/1.1
Origin: https://app.com
Access-Control-Request-Method: PUT
Access-Control-Request-Headers: Content-Type
# Server responds with allowed methods/headers
Access-Control-Allow-Origin: https://app.com
Access-Control-Allow-Methods: GET, PUT, POST, DELETE
Access-Control-Allow-Headers: Content-Type
Access-Control-Max-Age: 86400
# If allowed, browser sends actual request
PUT /api/data HTTP/1.1
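Server-side, answering a preflight is a small decision function; a framework-free sketch that returns `(status, headers)` (origins and methods are illustrative):

```python
ALLOWED_ORIGINS = {"https://app.com", "https://admin.app.com"}
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE"}

def preflight_response(origin, requested_method):
    if origin not in ALLOWED_ORIGINS or requested_method not in ALLOWED_METHODS:
        return 403, {}  # browser will block the actual request
    return 204, {
        "Access-Control-Allow-Origin": origin,  # echo the specific origin
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": "Content-Type, Authorization",
        "Access-Control-Max-Age": "86400",  # let the browser cache the preflight
    }
```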
Server Configuration:
# Flask
from flask import Flask
from flask_cors import CORS, cross_origin

app = Flask(__name__)
CORS(app, origins=['https://app.com', 'https://admin.app.com'])

# Or for specific routes
@app.route('/api/data')
@cross_origin(origins=['https://app.com'])
def get_data():
    ...
// Express
const cors = require('cors');
app.use(cors({
origin: ['https://app.com', 'https://admin.app.com'],
methods: ['GET', 'POST', 'PUT', 'DELETE'],
allowedHeaders: ['Content-Type', 'Authorization'],
credentials: true
}));
Common CORS Headers:
Access-Control-Allow-Origin: https://app.com
Access-Control-Allow-Methods: GET, POST, PUT
Access-Control-Allow-Headers: Content-Type, Authorization
Access-Control-Allow-Credentials: true
Access-Control-Max-Age: 86400
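The server-side logic behind those headers is simple: echo the request's Origin back only if it is allowlisted. A framework-free sketch (hypothetical `cors_headers` helper and origin list):

```python
from typing import Optional

# Hypothetical sketch: compute the CORS response headers for a request's
# Origin header. Only allowlisted origins get Access-Control-Allow-Origin.
ALLOWED_ORIGINS = {"https://app.com", "https://admin.app.com"}

def cors_headers(origin: Optional[str]) -> dict:
    if origin in ALLOWED_ORIGINS:
        return {
            # Echo the specific origin: '*' cannot be combined with
            # Access-Control-Allow-Credentials: true
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Credentials": "true",
            "Vary": "Origin",  # keep caches from mixing per-origin responses
        }
    return {}  # no CORS headers: the browser blocks the cross-origin read
```

Note that an absent or unrecognized Origin simply gets no CORS headers; the request still reaches the server, but the browser refuses to expose the response.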
Key Points to Look For:
- Understands purpose (security)
- Knows preflight requests
- Can configure server
Follow-up: Why is Access-Control-Allow-Origin: * risky?
HTTPS and TLS basics
How does HTTPS/TLS work? Why is it important?
HTTPS = HTTP + TLS (Transport Layer Security)
What TLS Provides:
1. Encryption: Data can't be read in transit
2. Authentication: Verify server identity
3. Integrity: Data can't be modified
TLS Handshake (simplified; this is the classic TLS 1.2 flow, TLS 1.3 condenses it to a single round trip):
Client Server
│ │
│─── ClientHello ─────────────→│ (supported ciphers)
│ │
│←── ServerHello ──────────────│ (chosen cipher)
│←── Certificate ──────────────│ (server's cert)
│←── ServerHelloDone ──────────│
│ │
│─── ClientKeyExchange ───────→│ (encrypted pre-master)
│─── ChangeCipherSpec ────────→│
│─── Finished ────────────────→│
│ │
│←── ChangeCipherSpec ─────────│
│←── Finished ─────────────────│
│ │
│←── Encrypted Data ──────────→│
Certificate Verification:
1. Server sends certificate
2. Client checks:
- Valid date range
- Issued by trusted CA
- Domain matches
- Not revoked
3. If valid, proceed
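For reference, the checks listed above are exactly what a default TLS client enables. A small sketch inspecting Python's stdlib defaults:

```python
import ssl

# Python's default client context performs the certificate checks above.
ctx = ssl.create_default_context()

print(ctx.check_hostname)                     # True: domain must match the cert
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True: cert must chain to a trusted CA
```

Disabling either of these (a common "quick fix" for certificate errors) silently gives up the authentication guarantee.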
Why HTTPS Matters:
1. Privacy:
HTTP: Passwords and tokens are visible to anyone on the network
HTTPS: Encrypted; can't be read in transit
2. Integrity:
HTTP: A man-in-the-middle can modify responses
HTTPS: Tampering is detected
3. Authentication:
HTTP: Can't verify server identity
HTTPS: Certificate proves identity
4. SEO & Browser Features:
- Google ranks HTTPS higher
- Modern features require HTTPS (geolocation, camera)
- Browsers show "Not Secure" for HTTP
Implementation:
# Nginx
# A separate port-80 server block handles the redirect: an `if ($scheme = http)`
# inside the port-443 block would never match, since that block only receives HTTPS
server {
    listen 80;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
}
Key Points to Look For:
- Knows what TLS provides
- Understands certificate purpose
- Recognizes importance
Follow-up: What is certificate pinning?
Advanced Web
GraphQL vs REST trade-offs
When would you choose GraphQL over REST?
REST:
GET /users/123
GET /users/123/posts
GET /users/123/posts/456/comments
# Multiple round trips
GraphQL:
query {
user(id: "123") {
name
posts {
title
comments {
text
}
}
}
}
# Single request, exact data needed
Comparison:
| Aspect | REST | GraphQL |
|---|---|---|
| Endpoint | Multiple | Single |
| Data fetching | Fixed structure | Client specifies |
| Over-fetching | Common | Avoided |
| Under-fetching | Common | Avoided |
| Caching | HTTP caching | Complex |
| Versioning | URL versioning | Schema evolution |
| Learning curve | Lower | Higher |
Choose GraphQL When:
- Complex, nested data
- Multiple clients (web, mobile) with different needs
- Rapid frontend iteration
- Bandwidth constrained (mobile)
# Mobile gets minimal data
query {
user(id: "123") { name }
}
# Web gets full data
query {
user(id: "123") {
name
email
posts { title, createdAt }
}
}
Choose REST When:
- Simple CRUD operations
- HTTP caching important
- File uploads/downloads
- Team unfamiliar with GraphQL
- Public API with simple needs
GraphQL Challenges:
1. N+1 queries: DataLoader needed
2. Caching: No HTTP caching
3. Security: Rate limiting by field complexity
4. Monitoring: Different tooling needed
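The DataLoader technique mentioned for N+1 can be sketched without any library: collect every id requested in the same event-loop tick, then issue one batch fetch. A minimal, hypothetical stdlib-only version (the real DataLoader adds caching and error handling):

```python
import asyncio

# Hypothetical DataLoader-style batcher: loads requested in the same
# event-loop tick are coalesced into a single batch fetch.
class BatchLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # async: list of keys -> list of values
        self.pending = []          # (key, future) pairs awaiting dispatch
        self.scheduled = False

    def load(self, key):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((key, fut))
        if not self.scheduled:
            self.scheduled = True
            loop.call_soon(self._dispatch)  # flush once this tick ends
        return fut

    def _dispatch(self):
        batch, self.pending, self.scheduled = self.pending, [], False
        asyncio.ensure_future(self._run(batch))

    async def _run(self, batch):
        values = await self.batch_fn([k for k, _ in batch])  # ONE query, not N
        for (_, fut), value in zip(batch, values):
            fut.set_result(value)

async def fetch_users(ids):
    # stand-in for a single "SELECT ... WHERE id IN (...)" round trip
    return [{"id": i, "name": f"user{i}"} for i in ids]

async def main():
    loader = BatchLoader(fetch_users)
    # three resolver calls, one database round trip
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(3))
```

In a GraphQL server, each resolver calls `loader.load(id)`; the batcher turns N per-field lookups into one query per request.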
Key Points to Look For:
- Knows trade-offs, not just hype
- Understands caching challenge
- Can choose based on requirements
Follow-up: How do you handle the N+1 problem in GraphQL?
WebSockets for real-time communication
When would you use WebSockets? How do they work?
WebSocket: Full-duplex, persistent connection between client and server.
HTTP vs WebSocket:
HTTP:
Client → Request → Server
Client ← Response ← Server
(Connection closed)
WebSocket:
Client ←→ Persistent connection ←→ Server
(Bidirectional, always open)
Handshake:
# Client request
GET /chat HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
# Server response
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Use Cases:
- Real-time chat
- Live notifications
- Collaborative editing
- Gaming
- Live dashboards
- Stock tickers
Server Implementation:
# Python with the websockets library
import asyncio
import websockets

connected = set()

async def handler(websocket):
    connected.add(websocket)
    try:
        async for message in websocket:
            # Broadcast to all connected clients
            for conn in connected:
                await conn.send(f"User: {message}")
    finally:
        connected.remove(websocket)

async def main():
    # serve() must run inside an event loop; keep the server alive forever
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()

asyncio.run(main())
Client Implementation:
const ws = new WebSocket('wss://example.com/chat');
ws.onopen = () => {
console.log('Connected');
ws.send('Hello!');
};
ws.onmessage = (event) => {
console.log('Received:', event.data);
};
ws.onclose = () => {
console.log('Disconnected');
// Implement reconnection
};
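The reconnection hinted at in `onclose` is usually scheduled with exponential backoff plus jitter, so a fleet of disconnected clients doesn't reconnect in lockstep. A hypothetical sketch of the delay schedule (the same formula applies in the browser client):

```python
import random

# Hypothetical sketch: exponential backoff with "full jitter" for
# WebSocket reconnection attempts. Returns a delay in seconds.
def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    exp = min(cap, base * (2 ** attempt))   # 1, 2, 4, 8, ... capped at 30s
    return random.uniform(0, exp)           # jitter spreads out the herd
```

On a successful reconnect the attempt counter resets to 0.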
Scaling Challenges:
Multiple servers need shared state:
Server A ─────┐
Server B ─────┼──── Redis Pub/Sub ─── Broadcast
Server C ─────┘
When NOT to Use:
- Simple request/response
- Infrequent updates (use polling or SSE)
- One-way server updates (use SSE)
Key Points to Look For:
- Knows handshake process
- Understands use cases
- Considers scaling
Follow-up: How do you handle WebSocket reconnection?
Server-Sent Events (SSE)
What are Server-Sent Events? When would you use them over WebSockets?
SSE: One-way server-to-client streaming over HTTP.
Client ← Events ← Server
(One-way: server → client only)
Format:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
event: message
data: Hello, world!
event: update
data: {"price": 150.25}
event: message
data: Line 1
data: Line 2
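The wire format above is easy to produce by hand. A hypothetical helper that serializes one event, including the multi-line `data:` case:

```python
# Hypothetical helper that serializes one SSE event in the format above.
def sse_event(data, event=None, event_id=None) -> str:
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets clients resume via Last-Event-ID
    for line in str(data).splitlines():
        lines.append(f"data: {line}")     # multi-line payloads => repeated data: lines
    return "\n".join(lines) + "\n\n"      # the blank line terminates the event
```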
Client:
const eventSource = new EventSource('/events');
eventSource.onmessage = (event) => {
console.log(event.data);
};
eventSource.addEventListener('update', (event) => {
const data = JSON.parse(event.data);
console.log('Price:', data.price);
});
eventSource.onerror = () => {
console.log('Connection lost, reconnecting...');
// Auto-reconnects!
};
Server:
# Flask
import json
import time
from flask import Response

@app.route('/events')
def events():
    def generate():
        while True:
            data = get_latest_data()
            yield f"event: update\ndata: {json.dumps(data)}\n\n"
            time.sleep(1)
    return Response(generate(), mimetype='text/event-stream')
SSE vs WebSocket:
| Aspect | SSE | WebSocket |
|---|---|---|
| Direction | Server → Client | Bidirectional |
| Protocol | HTTP | WebSocket |
| Auto-reconnect | Built-in | Manual |
| Browser support | Good | Excellent |
| Complexity | Simple | More complex |
| Binary data | No | Yes |
Use SSE When:
- Server-to-client updates only
- News feeds, notifications
- Live scores, stock prices
- Progress updates
Use WebSocket When:
- Bidirectional needed (chat)
- Binary data needed
- Low latency critical
Key Points to Look For:
- Knows difference from WebSocket
- Understands use cases
- Can implement basic server
Follow-up: How do you handle SSE with load balancers?
HTTP/2 and HTTP/3 improvements
What improvements do HTTP/2 and HTTP/3 bring?
HTTP/1.1 Problems:
- Head-of-line blocking
- Multiple connections needed
- Text headers (large, no compression)
- No server push
HTTP/2 Improvements:
1. Multiplexing:
HTTP/1.1:
Request 1 ─────────────────────→ Response 1
Request 2 ────────────────────→ Response 2
Request 3 ─────────────────→ Response 3
HTTP/2:
Request 1 ─┐
Request 2 ──┼──→ Single connection ──→ Responses interleaved
Request 3 ─┘
2. Header Compression (HPACK):
HTTP/1.1: Same headers sent repeatedly
HTTP/2: Headers compressed, only differences sent
3. Server Push (since deprecated by major browsers in favor of 103 Early Hints):
Client requests index.html
Server pushes: index.html + style.css + app.js
(Without waiting for additional requests)
4. Binary Protocol:
HTTP/1.1: Text-based parsing
HTTP/2: Binary framing, efficient parsing
5. Stream Prioritization:
CSS: High priority
Images: Lower priority
HTTP/3 Improvements (QUIC):
1. UDP-based:
HTTP/2: TCP (connection setup overhead)
HTTP/3: QUIC over UDP (faster connection)
2. No Head-of-Line Blocking:
HTTP/2: Lost packet blocks all streams (TCP)
HTTP/3: Lost packet blocks only affected stream (QUIC)
3. Connection Migration:
HTTP/2: Connection lost on network change (WiFi → 4G)
HTTP/3: Connection persists across network changes
4. Built-in Encryption:
HTTP/3: TLS 1.3 required, integrated into handshake
Summary:
| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Multiplexing | No | Yes | Yes |
| Header compression | No | HPACK | QPACK |
| Transport | TCP | TCP | QUIC/UDP |
| HOL blocking | Yes | TCP level | No |
| Connection setup | Slow | Medium | Fast |
Key Points to Look For:
- Knows multiplexing benefit
- Understands QUIC advantages
- Can explain HOL blocking
Follow-up: Does HTTP/2 make domain sharding obsolete?
gRPC for inter-service communication
What is gRPC? When would you use it over REST?
gRPC: High-performance RPC framework using Protocol Buffers.
Protocol Buffers (Schema):
syntax = "proto3";
service UserService {
rpc GetUser(GetUserRequest) returns (User);
rpc ListUsers(ListUsersRequest) returns (stream User);
rpc CreateUser(User) returns (User);
}
message User {
int32 id = 1;
string name = 2;
string email = 3;
}
message GetUserRequest {
int32 id = 1;
}
Generated Code:
# Server
class UserServicer(user_pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        user = db.get_user(request.id)
        return user_pb2.User(id=user.id, name=user.name)

# Client
channel = grpc.insecure_channel('localhost:50051')
stub = user_pb2_grpc.UserServiceStub(channel)
user = stub.GetUser(user_pb2.GetUserRequest(id=123))
gRPC Features:
1. Streaming:
// Server streaming
rpc ListUsers(Query) returns (stream User);
// Client streaming
rpc UploadUsers(stream User) returns (Summary);
// Bidirectional
rpc Chat(stream Message) returns (stream Message);
2. Built-in Features:
- Deadlines/timeouts
- Cancellation
- Load balancing
- Authentication
gRPC vs REST:
| Aspect | gRPC | REST |
|---|---|---|
| Protocol | HTTP/2 | HTTP/1.1 or 2 |
| Payload | Binary (Protobuf) | Text (JSON) |
| Schema | Required (.proto) | Optional (OpenAPI) |
| Code generation | Built-in | External tools |
| Streaming | Native | SSE/WebSocket |
| Browser support | Limited | Full |
| Performance | Higher | Lower |
Use gRPC When:
- Microservices communication
- High performance needed
- Streaming required
- Polyglot environment (code gen)
- Internal services
Use REST When:
- Public APIs
- Browser clients
- Simple CRUD
- Wide compatibility needed
- Human-readable debugging
Key Points to Look For:
- Knows Protocol Buffers
- Understands streaming types
- Can compare to REST
Follow-up: How do you handle gRPC errors?
API Gateway pattern
What is the API Gateway pattern? What problems does it solve?
API Gateway: Single entry point for all client requests.
Without Gateway:
Client ─→ Service A
Client ─→ Service B
Client ─→ Service C
With Gateway:
Client ─→ API Gateway ─→ Service A
─→ Service B
─→ Service C
Responsibilities:
1. Request Routing:
/users/* → User Service
/orders/* → Order Service
/products/* → Product Service
2. Authentication/Authorization:
Client → Gateway (validates token) → Services
Services don't need to implement auth
3. Rate Limiting:
100 requests/minute per user
Enforced at gateway
4. Request/Response Transformation:
// Client gets unified format
{
"user": {...},
"orders": [...]
}
// Gateway aggregates from multiple services
5. Caching:
Frequently accessed data cached at gateway
Reduces load on services
6. Load Balancing:
Gateway distributes requests across service instances
7. Circuit Breaking:
Gateway fails fast when service is down
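The rate-limiting responsibility above (e.g. 100 requests/minute per user) is commonly implemented as a token bucket. A minimal, hypothetical in-memory sketch; production gateways keep the counters in a shared store such as Redis:

```python
import time

# Hypothetical token-bucket limiter: defaults approximate 100 requests/minute.
class TokenBucket:
    def __init__(self, capacity: int = 100, refill_per_sec: float = 100 / 60):
        self.capacity = capacity
        self.tokens = float(capacity)      # start full: allows short bursts
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should respond with HTTP 429
```

The gateway keeps one bucket per API key or user and checks `allow()` before routing the request.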
Implementation:
# Kong example
services:
- name: user-service
url: http://users:8080
routes:
- name: users-route
paths: ["/users"]
service: user-service
plugins:
- name: rate-limiting
config:
minute: 100
- name: jwt
Patterns:
BFF (Backend for Frontend):
Mobile Client ─→ Mobile Gateway ─→ Services
Web Client ────→ Web Gateway ───→ Services
Popular Gateways:
- Kong
- AWS API Gateway
- NGINX
- Envoy
- Zuul
Trade-offs:
- Single point of failure (need HA)
- Added latency (extra hop)
- Complexity
- Potential bottleneck
Key Points to Look For:
- Knows multiple responsibilities
- Understands trade-offs
- Mentions BFF pattern
Follow-up: How do you ensure API Gateway doesn't become a bottleneck?
Microservices Communication
Service mesh: What problem does it solve?
What is a service mesh and what problems does it solve in microservices architectures?
Service Mesh: Infrastructure layer that handles service-to-service communication.
Without Service Mesh:
┌─────────────┐ ┌─────────────┐
│ Service A │────────→│ Service B │
│ (with all │ │ (with all │
│ networking │ │ networking │
│ logic) │ │ logic) │
└─────────────┘ └─────────────┘
Each service implements:
- Load balancing
- Retry logic
- Circuit breaker
- mTLS
- Metrics/tracing
With Service Mesh:
┌─────────────────────────────────────────┐
│ Service Mesh │
│ ┌───────┐ Proxy ──────→ Proxy ┌───────┐│
│ │Svc A │←────→ ←────→│Svc B ││
│ └───────┘ Sidecar Sidecar └───────┘│
│ │
│ Control Plane │
│ (config, certs, policies) │
└─────────────────────────────────────────┘
Services focus on business logic
Proxy handles networking concerns
Problems Solved:
1. Observability:
# All traffic flows through proxies
# Automatic collection of:
- Request rates
- Error rates
- Latencies (p50, p99)
- Distributed traces
- Service dependency graphs
2. Security:
# mTLS between all services (automatic)
# Policy: Only Service A can call Service B
authorization:
- from: service-a
to: service-b
methods: [GET, POST]
3. Traffic Management:
# Canary deployment
route:
- destination: service-v1
weight: 90
- destination: service-v2
weight: 10
# Retry and timeout policies
retries:
attempts: 3
retryOn: 5xx
timeout: 5s
4. Resilience:
# Circuit breaker
outlierDetection:
consecutiveErrors: 5
interval: 30s
baseEjectionTime: 30s
# Rate limiting per service
Architecture Components:
Data Plane (Sidecars):
- Envoy proxies alongside each service
- Intercepts all network traffic
- Enforces policies
Control Plane:
- Configuration management
- Certificate authority
- Service discovery
- Policy distribution
Popular Service Meshes:
- Istio (most feature-rich)
- Linkerd (simpler, lighter)
- Consul Connect
- AWS App Mesh
When to Use:
- Many microservices (>10-20)
- Need consistent observability
- Security requirements (mTLS)
- Complex traffic routing needs
- Polyglot services
When NOT to Use:
- Few services
- Simple architecture
- Team not ready for complexity
- Performance-critical (adds latency ~1-2ms)
Key Points to Look For:
- Understands sidecar pattern
- Knows problems it solves
- Aware of trade-offs (complexity, latency)
Follow-up: How does a service mesh differ from an API gateway?
RabbitMQ vs Kafka: When to use each?
When would you choose RabbitMQ over Kafka, and vice versa?
RabbitMQ: Traditional message broker with smart broker/dumb consumer model.
Kafka: Distributed event streaming platform with dumb broker/smart consumer model.
RabbitMQ:
Producer → Exchange → Queue → Consumer
↘ Queue → Consumer
- Messages deleted after consumption
- Broker tracks what's consumed
- Push model to consumers
Kafka:
Producer → Topic [Partition 0] → Consumer Group
[Partition 1] → Consumer Group
[Partition 2] → Consumer Group
- Messages retained (configurable time)
- Consumers track their offset
- Pull model from consumers
Comparison:
| Aspect | RabbitMQ | Kafka |
|---|---|---|
| Model | Message queue | Event log |
| Message retention | Until consumed | Time/size based |
| Ordering | Per queue | Per partition |
| Throughput | ~10K/sec | ~1M/sec |
| Latency | Lower (~1ms) | Higher (~5-10ms) |
| Replay | No | Yes |
| Routing | Flexible (exchanges) | Topic-based |
| Consumer model | Push | Pull |
Choose RabbitMQ When:
1. Task Distribution:
# Work queue - distribute tasks among workers
# Each task processed by ONE worker
channel.basic_publish(
    exchange='',
    routing_key='task_queue',
    body=task_data,
    properties=pika.BasicProperties(delivery_mode=2)  # persistent message
)
2. Complex Routing:
# Route based on patterns
# Fanout, Direct, Topic, Headers exchanges
channel.exchange_declare(exchange='logs', exchange_type='topic')
channel.basic_publish(
    exchange='logs',
    routing_key='error.payment.us',  # pattern matching
    body=message
)
3. Request/Reply Pattern:
# RPC over messaging
result = client.call(request) # Built-in correlation
4. Priority Queues:
# Process urgent messages first
channel.queue_declare(queue='tasks', arguments={'x-max-priority': 10})
Choose Kafka When:
1. Event Sourcing:
# Replay events to rebuild state
consumer.seek(partition, offset=0)  # start from the beginning
for event in consumer:
    apply_event(event)
2. High Throughput:
# Millions of messages per second
# Log aggregation, metrics collection
producer.send('metrics', value=metric_data)
3. Multiple Consumers:
# Same event consumed by multiple services
# Each consumer group gets all messages
# Analytics service reads events
# Notification service reads same events
# Audit service reads same events
4. Stream Processing:
# Real-time data pipelines
# (pseudocode in the style of Kafka Streams / KSQL)
stream.filter(lambda x: x['amount'] > 1000) \
      .map(lambda x: enrich(x)) \
      .to('high_value_transactions')
Summary:
RabbitMQ: "Smart broker"
- Traditional messaging
- Complex routing
- Lower latency
- Task queues
Kafka: "Dumb broker"
- Event streaming
- High throughput
- Event replay
- Multiple consumers
Key Points to Look For:
- Knows fundamental difference (queue vs log)
- Can recommend based on use case
- Understands throughput/latency trade-offs
Follow-up: How would you handle exactly-once delivery in Kafka?
API composition patterns in microservices
How do you handle API composition when data is spread across multiple microservices?
Problem: Client needs data from multiple services.
Client needs: Order + Customer + Product details
Data lives in: Order Service, Customer Service, Product Service
Patterns:
1. API Gateway Composition:
Client → API Gateway → Order Service
→ Customer Service
→ Product Service
↓
Aggregated Response
# Gateway aggregates responses
import asyncio

async def get_order_details(order_id):
    order = await order_service.get(order_id)
    customer = await customer_service.get(order.customer_id)
    products = await asyncio.gather(*[
        product_service.get(pid) for pid in order.product_ids
    ])
    return {
        "order": order,
        "customer": customer,
        "products": products
    }
Pros: Simple client, centralized logic
Cons: Gateway becomes complex, single point of failure
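One way the gateway mitigates downstream failures is graceful degradation: collect per-service errors instead of failing the whole composed response. A hypothetical sketch with `asyncio.gather(return_exceptions=True)` (the service stubs here are stand-ins; `get_customer` simulates an outage):

```python
import asyncio

# Hypothetical sketch: degrade gracefully when a non-essential service fails.
async def get_order(order_id):
    return {"id": order_id, "total": 99.99}

async def get_customer(customer_id):
    raise TimeoutError("customer service is down")  # simulated outage

async def get_order_details(order_id):
    order, customer = await asyncio.gather(
        get_order(order_id),
        get_customer("cust_1"),
        return_exceptions=True,   # collect failures instead of raising
    )
    if isinstance(order, Exception):
        raise order               # order data is essential: propagate the error
    if isinstance(customer, Exception):
        customer = None           # customer block is optional: omit it
    return {"order": order, "customer": customer}
```

The client then renders the order and shows a placeholder for the missing customer block, rather than receiving a 5xx for the whole page.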
2. Backend for Frontend (BFF):
Mobile App → Mobile BFF → Services
Web App → Web BFF → Services
// Mobile BFF - minimal data
app.get('/orders/:id', async (req, res) => {
  const order = await getOrder(req.params.id);
  res.json({
    id: order.id,
    total: order.total,
    status: order.status
  });
});

// Web BFF - full details
app.get('/orders/:id', async (req, res) => {
  const [order, customer, products] = await Promise.all([...]);
  res.json({
    ...order,
    customer: { name: customer.name, email: customer.email },
    products: products.map(p => ({ name: p.name, price: p.price }))
  });
});
Pros: Optimized per client, clear ownership
Cons: Code duplication, multiple codebases
3. GraphQL Federation:
# Each service defines its types
# Gateway federates them
# Order Service
type Order @key(fields: "id") {
id: ID!
total: Float!
customer: Customer! # Reference
}
# Customer Service
type Customer @key(fields: "id") {
id: ID!
name: String!
orders: [Order!]! # Extension
}
# Client query - single request
query {
order(id: "123") {
total
customer {
name
email
}
products {
name
price
}
}
}
Pros: Flexible queries, type safety
Cons: Complexity, learning curve
4. Client-Side Composition:
// Client makes multiple calls
async function getOrderPage(orderId) {
  const order = await fetch(`/orders/${orderId}`).then(r => r.json());
  const customer = await fetch(`/customers/${order.customerId}`).then(r => r.json());
  const products = await Promise.all(
    order.productIds.map(id => fetch(`/products/${id}`).then(r => r.json()))
  );
  return { order, customer, products };
}
Pros: Simple services, flexibility
Cons: Multiple round trips, client complexity, over-fetching
5. Materialized View / CQRS:
Services → Events → Read Service → Denormalized View
Order Created → Update order-details view
Customer Updated → Update order-details view
Product Changed → Update order-details view
Client → Read Service (single call)
Pros: Fast reads, simple queries
Cons: Eventual consistency, event handling complexity
Choosing a Pattern:
| Pattern | Best For |
|---|---|
| API Gateway | Simple aggregation, few services |
| BFF | Multiple client types, different needs |
| GraphQL | Flexible queries, rapid frontend iteration |
| Client-side | Simple apps, few compositions |
| CQRS/Materialized | High read volume, complex aggregations |
Key Points to Look For:
- Knows multiple patterns
- Can recommend based on context
- Considers trade-offs
Follow-up: How do you handle failures in one of the composed services?
Webhook design and reliability
How do you design a reliable webhook system?
Webhook: Server-to-server HTTP callback when events occur.
Your Service → Event → HTTP POST → Customer's Endpoint
Example: Stripe sends payment.succeeded to your /webhooks/stripe
Design Considerations:
1. Payload Design:
{
"id": "evt_123abc",
"type": "order.completed",
"created": "2024-01-15T10:30:00Z",
"data": {
"order_id": "ord_456",
"total": 99.99,
"currency": "USD"
},
"api_version": "2024-01-01"
}
Include:
- Unique event ID (for deduplication)
- Event type
- Timestamp
- API version
- Minimal data (or link to fetch full data)
2. Security:
# Signature verification
import hmac
import hashlib
def verify_signature(payload, signature, secret):
    expected = hmac.new(
        secret.encode(),
        payload.encode(),
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

# Webhook handler
@app.post('/webhooks')
def handle_webhook(request):
    signature = request.headers.get('X-Webhook-Signature')
    if not verify_signature(request.body, signature, WEBHOOK_SECRET):
        return Response(status=401)
    # Process event...
Headers to send:
X-Webhook-Signature: sha256=abc123...
X-Webhook-ID: evt_123abc
X-Webhook-Timestamp: 1705312200
Content-Type: application/json
3. Retry Strategy:
# Exponential backoff with jitter
import random
import requests

retry_delays = [60, 300, 1800, 7200, 43200]  # 1m, 5m, 30m, 2h, 12h

def send_webhook(event, attempt=0):
    try:
        response = requests.post(
            event.endpoint_url,
            json=event.payload,
            headers=get_headers(event),
            timeout=30
        )
        if 200 <= response.status_code < 300:
            mark_delivered(event)
        elif response.status_code >= 500:
            schedule_retry(event, attempt)
        else:
            # 4xx - don't retry (client error)
            mark_failed(event)
    except requests.Timeout:
        schedule_retry(event, attempt)

def schedule_retry(event, attempt):
    if attempt < len(retry_delays):
        delay = retry_delays[attempt] + random.randint(0, 60)  # jitter
        queue.enqueue_in(delay, send_webhook, event, attempt + 1)
    else:
        mark_failed(event)
        notify_customer(event)
4. Idempotency (Receiver Side):
# Receiver must handle duplicates
@app.post('/webhooks/provider')
def handle_webhook(request):
    event_id = request.json['id']

    # Check if already processed
    if redis.exists(f"webhook:{event_id}"):
        return Response(status=200)  # acknowledge but don't reprocess

    # Process the event
    process_event(request.json)

    # Mark as processed
    redis.setex(f"webhook:{event_id}", 86400, "processed")
    return Response(status=200)
5. Delivery Status Dashboard:
CREATE TABLE webhook_deliveries (
id SERIAL PRIMARY KEY,
event_id VARCHAR(255),
endpoint_url TEXT,
status VARCHAR(20), -- pending, delivered, failed
attempts INT DEFAULT 0,
last_attempt_at TIMESTAMP,
response_code INT,
response_body TEXT,
created_at TIMESTAMP
);
-- Customer can see delivery status
-- Manual retry option for failed webhooks
6. Event Types & Filtering:
# Let customers subscribe to specific events
webhook_config = {
"url": "https://customer.com/webhooks",
"events": ["order.completed", "order.refunded"],
"secret": "whsec_..."
}
Best Practices:
| Practice | Description |
|---|---|
| Quick response | Receiver should respond fast (<5s), process async |
| Verify signatures | Always validate webhook authenticity |
| Idempotent handling | Same event twice = same result |
| Log everything | Both sender and receiver |
| Provide replay | Let customers re-request missed events |
| Version your webhooks | Include API version in payload |
Key Points to Look For:
- Security (signatures)
- Retry with backoff
- Idempotency
- Monitoring/dashboard
Follow-up: How do you handle a customer endpoint that's been down for hours?