GFS Performance Characteristics

Performance findings from load testing file retrieval through the storage API (storage.cloud.eddisonso.com).

Test Setup

  • Tool: Custom Go perftest framework (~/perftest); a minimal sketch of the load pattern appears after this list
  • Endpoint: GET /storage/public/cat.jpg (155KB file)
  • Duration: 60 seconds per test
  • Network capacity: 200 Mb/s
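
The perftest framework itself is not reproduced in this document. The following is only a minimal sketch of the load pattern described above (fixed worker count, fixed duration, ops/sec and throughput counters), assuming nothing about the real tool beyond the setup listed:

```go
// Minimal fixed-duration, fixed-concurrency load generator (illustrative only;
// URL, duration, and worker count mirror the test setup above).
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	const (
		url      = "https://storage.cloud.eddisonso.com/storage/public/cat.jpg"
		workers  = 10
		duration = 60 * time.Second
	)
	var ops, bytes atomic.Int64
	deadline := time.Now().Add(duration)
	done := make(chan struct{})

	for i := 0; i < workers; i++ {
		go func() {
			// Default transport reuses connections (keep-alive). For the
			// no-keep-alive runs, use &http.Transport{DisableKeepAlives: true}.
			client := &http.Client{}
			for time.Now().Before(deadline) {
				resp, err := client.Get(url)
				if err != nil {
					continue
				}
				n, _ := io.Copy(io.Discard, resp.Body)
				resp.Body.Close()
				ops.Add(1)
				bytes.Add(n)
			}
			done <- struct{}{}
		}()
	}
	for i := 0; i < workers; i++ {
		<-done
	}

	secs := duration.Seconds()
	fmt.Printf("ops/sec: %.1f, throughput: %.1f Mb/s\n",
		float64(ops.Load())/secs, float64(bytes.Load())*8/1e6/secs)
}
```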

Results

Throughput Scaling

Before fixes (gateway 500m CPU, HAProxy 100m CPU, PostgreSQL 500m CPU):

| Concurrency | Ops/sec | Throughput | Notes |
|---|---|---|---|
| 10 (keep-alive) | 44 | ~54 Mb/s | Single clean run |
| 50 (keep-alive) | 36 | ~44 Mb/s | Gateway CPU-bound at 500m limit |
| 50 (keep-alive, 2000m gateway) | 43 | ~53 Mb/s | After raising gateway CPU limit |
| 10 (no keep-alive) | 41 | ~50 Mb/s | New TLS connection per request |
| 1000 (keep-alive) | 121 | ~147 Mb/s | Approaching network cap |
| 1000 (no keep-alive) | 108 | ~133 Mb/s | ~9% overhead from TLS handshakes |

After fixes (gateway 2000m, HAProxy 500m, PostgreSQL 1000m):

| Concurrency | Ops/sec | Throughput | Notes |
|---|---|---|---|
| 5 (no keep-alive) | 96 | ~118 Mb/s | +140% vs before |
| 10 (no keep-alive) | 91 | ~112 Mb/s | +117% vs before |
| 25 (no keep-alive) | 63 | ~77 Mb/s | Bandwidth/TLS limited |
| 50 (no keep-alive) | 61 | ~75 Mb/s | +69% vs before |

Per-Request Latency Breakdown

From a single external request with curl:

| Phase | Time | % of Total |
|---|---|---|
| DNS lookup | 12ms | 14% |
| TCP handshake | 3ms | 3% |
| TLS handshake | 17ms | 20% |
| Server processing (TTFB) | 50ms | 57% |
| Data transfer (155KB) | 5ms | 6% |
| Total | 87ms | 100% |

Only 6% of the request time is spent transferring data. The rest is overhead.
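
The numbers above come from curl's timing output. A similar per-phase breakdown can be captured in Go with net/http/httptrace; this is a hedged sketch of that approach, not the tool used for the measurement above:

```go
// Per-phase request timing via net/http/httptrace, mirroring the phases in the
// table above (DNS, TCP, TLS, server processing, transfer).
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	var dnsStart, dnsDone, connDone, tlsDone, firstByte time.Time
	trace := &httptrace.ClientTrace{
		DNSStart:             func(httptrace.DNSStartInfo) { dnsStart = time.Now() },
		DNSDone:              func(httptrace.DNSDoneInfo) { dnsDone = time.Now() },
		ConnectDone:          func(network, addr string, err error) { connDone = time.Now() },
		TLSHandshakeDone:     func(tls.ConnectionState, error) { tlsDone = time.Now() },
		GotFirstResponseByte: func() { firstByte = time.Now() },
	}

	req, err := http.NewRequest("GET", "https://storage.cloud.eddisonso.com/storage/public/cat.jpg", nil)
	if err != nil {
		panic(err)
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	start := time.Now()
	resp, err := http.DefaultTransport.RoundTrip(req) // fresh connection, so all phases are visible
	if err != nil {
		panic(err)
	}
	n, _ := io.Copy(io.Discard, resp.Body)
	resp.Body.Close()
	done := time.Now()

	fmt.Printf("DNS lookup:               %v\n", dnsDone.Sub(dnsStart))
	fmt.Printf("TCP handshake:            %v\n", connDone.Sub(dnsDone))
	fmt.Printf("TLS handshake:            %v\n", tlsDone.Sub(connDone))
	fmt.Printf("Server processing (TTFB): %v\n", firstByte.Sub(tlsDone))
	fmt.Printf("Data transfer (%d bytes): %v\n", n, done.Sub(firstByte))
	fmt.Printf("Total:                    %v\n", done.Sub(start))
}
```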

Internal GFS Read Path Breakdown

Measured by running benchmarks directly inside the SFS pod (eliminating all external network overhead):

| Operation | p50 | Description |
|---|---|---|
| GetFile gRPC | 0.67ms | Master metadata lookup |
| GetChunkLocations gRPC | 0.71ms | Master chunk location lookup |
| Chunkserver TCP read (155KB) | ~5.5ms | Connect + transfer from chunkserver |
| Full GFS read (all combined) | 6.4ms | Two gRPC calls + chunkserver read |
| SFS handler simulation | 6.9ms | GetFile + Full Read (what the HTTP handler does) |
| SFS HTTP overhead (no GFS) | 3.7ms | Handler routing, response writing |
| Total in-cluster HTTP request | ~11ms | SFS HTTP + GFS read |

The GFS master is fast (~8000 ops/sec for metadata lookups). The chunkserver TCP read dominates the internal path.

Internal capacity: 550 ops/sec (10 concurrent) = 83 MB/s (over 650 Mb/s) through GFS — far exceeding the 200 Mb/s external link.
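
Put together, a read is two small master RPCs followed by one chunkserver transfer. The sketch below is purely illustrative: MasterClient, ChunkLocation, and the chunkserver wire exchange are hypothetical stand-ins; only the call names GetFile and GetChunkLocations come from the measurements above.

```go
// Illustrative composition of the internal read path: two master lookups, then
// a TCP read from the chunkserver. Types and the wire format are placeholders,
// not the project's real gRPC stubs.
package gfsread

import (
	"context"
	"fmt"
	"io"
	"net"
)

type ChunkLocation struct {
	Handle string // chunk handle to request from the chunkserver
	Addr   string // chunkserver host:port
}

type MasterClient interface {
	GetFile(ctx context.Context, path string) (size int64, err error)            // ~0.67ms p50
	GetChunkLocations(ctx context.Context, path string) ([]ChunkLocation, error) // ~0.71ms p50
}

// ReadFile performs the full read: metadata lookup, chunk location lookup,
// then the chunkserver transfer (~5.5ms for 155KB), ~6.4ms combined.
func ReadFile(ctx context.Context, m MasterClient, path string, w io.Writer) error {
	size, err := m.GetFile(ctx, path) // size is needed for the Content-Length header
	if err != nil {
		return err
	}
	locs, err := m.GetChunkLocations(ctx, path)
	if err != nil || len(locs) == 0 {
		return fmt.Errorf("locate chunks for %s: %v", path, err)
	}
	conn, err := net.Dial("tcp", locs[0].Addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	// The chunkserver request format is project-specific; assume the handle is
	// sent and the chunk bytes stream back.
	if _, err := fmt.Fprintln(conn, locs[0].Handle); err != nil {
		return err
	}
	_, err = io.CopyN(w, conn, size)
	return err
}
```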

External Overhead Stack

For a single external request (~46ms):

| Layer | Time |
|---|---|
| GFS read (master + chunkserver) | ~7ms |
| SFS HTTP handler | ~4ms |
| Gateway TCP proxy | ~1ms |
| TLS handshake | ~17ms |
| TCP + DNS | ~17ms |
| Total | ~46ms |

Request Path

Every file read traverses four network hops:

client -> gateway (TLS termination + reverse proxy)
-> SFS backend (HTTP handler)
-> GFS master (gRPC metadata lookup)
-> GFS chunkserver (TCP data read)
<- back through all layers

Latency Under Concurrency

Latency distribution measured with per-request timing (15s runs, no keep-alive). Before CPU limit fixes:

| Workers | Ops/sec | p50 | p90 | p99 | max |
|---|---|---|---|---|---|
| 1 | 22 | 46ms | 52ms | 70ms | 95ms |
| 2 | 47 | 42ms | 50ms | 60ms | 90ms |
| 3 | 72 | 40ms | 50ms | 98ms | 262ms |
| 5 | 99 | 39ms | 90ms | 204ms | 26s |
| 10 | 40 | 109ms | 612ms | 977ms | 1.3s |

Throughput scaled linearly from 1-5 workers (22→99 ops/sec) then collapsed at 10. Root cause was CFS CPU throttling on HAProxy and PostgreSQL (see below).

Control: Health Endpoint (no GFS, 19 bytes)

| Workers | Ops/sec | p50 | p99 |
|---|---|---|---|
| 1 | 33 | 30ms | 46ms |
| 5 | 231 | 21ms | 37ms |
| 10 | 333 | 30ms | 50ms |

Linear scaling, no tail latency. All internal components (gateway, TLS, backend services) have ample headroom.

Primary Bottleneck: CFS CPU Throttling (resolved)

The throughput cliff at 5-6 workers was caused by Linux CFS CPU throttling on the database path. Every file download queries PostgreSQL for namespace visibility (canAccessNamespace), and both HAProxy and PostgreSQL had tight CPU limits.

Root Cause: CPU Cgroup Throttling

When a pod exceeds its CPU limit, the CFS scheduler pauses all its processes for the remainder of the 100ms period. This caused ~90ms latency spikes on every throttled query.
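
Throttling shows up directly in the pod's cgroup statistics. A small sketch for dumping the relevant counters from inside a pod, assuming cgroup v2 at /sys/fs/cgroup/cpu.stat with the cgroup v1 path as a fallback:

```go
// Dump CFS throttling counters from the container's cgroup.
// v2 cpu.stat reports nr_periods, nr_throttled, throttled_usec;
// v1 reports nr_periods, nr_throttled, throttled_time (nanoseconds).
package main

import (
	"fmt"
	"os"
)

func main() {
	for _, path := range []string{"/sys/fs/cgroup/cpu.stat", "/sys/fs/cgroup/cpu/cpu.stat"} {
		data, err := os.ReadFile(path)
		if err != nil {
			continue
		}
		fmt.Printf("%s:\n%s", path, data)
		return
	}
	fmt.Println("no cpu.stat found; is this running inside a cgroup-limited pod?")
}
```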

| Component | Old CPU Limit | Throttle Events | Effect |
|---|---|---|---|
| HAProxy | 100m | 8,456 | 53.6% of queries >10ms at c=10 |
| PostgreSQL | 500m | 3,784 | 27.1% of queries >10ms at c=10 |

Evidence

DB query benchmarks (inside SFS pod, namespace lookup query):

| Target | c=1 ops/sec | c=10 ops/sec | Spike rate at c=10 |
|---|---|---|---|
| Via HAProxy (100m CPU) | 165 | 168 (flat!) | 53.6% |
| Direct to PostgreSQL (500m CPU) | 746 | 381 | 27.1% |

HAProxy at 100m was the primary bottleneck — only 10% of a CPU core for proxying all database traffic.

Fix Applied

| Component | Before | After |
|---|---|---|
| Gateway | 500m | 2000m |
| HAProxy | 100m | 500m |
| PostgreSQL | 500m | 1000m |

After fix — DB query throughput through HAProxy:

| Workers | Before (ops/sec) | After (ops/sec) | Improvement |
|---|---|---|---|
| c=1 | 165 | 631 | 3.8x |
| c=3 | 134 | 892 | 6.7x |
| c=5 | 152 | 828 | 5.4x |
| c=10 | 168 | 874 | 5.2x |

Spike rate at c=10 dropped from 53.6% to 13.6%.

Layer-by-Layer Isolation

| Test | Ops/sec (10 workers) | p50 | Conclusion |
|---|---|---|---|
| Full stack (gateway + TLS) | 91 | | After fix |
| Direct to SFS (in-cluster, no TLS) | 43 | 108ms | Before fix; gateway/TLS add less than 5% overhead |
| Health endpoint (no GFS) | 333 | 30ms | GFS path adds ~7ms per request internally |
| GFS read (inside SFS pod, 10 concurrent) | 550 | 17ms | Internal GFS capacity far exceeds external link |

Remaining Bottlenecks

1. External Bandwidth (200 Mb/s)

At peak throughput (96 ops/sec × 155KB = ~118 Mb/s), the external link is ~59% utilized. TLS handshake overhead (~17ms per connection with no keep-alive) limits per-worker bandwidth utilization.

2. Duplicate GFS Master Calls (RESOLVED)

Before: Each file read made two gRPC calls to the GFS master:

  1. GetFile() — to get file size for Content-Length header
  2. GetChunkLocations() — to find which chunkserver holds the data

After: Both calls eliminated by:

  • Using getCachedChunks() instead of GetChunkLocations() in ReadToWithNamespace()
  • Adding FileSizeWithNamespace() that computes size from cached chunk locations

Now each download makes zero fresh master calls if chunks are cached (5-minute TTL).
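
getCachedChunks() and FileSizeWithNamespace() are the project's own functions and are not shown here. As a rough illustration of the pattern they imply (a TTL map keyed by path, with file size computed from the cached locations), a sketch with hypothetical types:

```go
// Hypothetical TTL cache for chunk locations, illustrating the pattern behind
// getCachedChunks() and FileSizeWithNamespace(); types and the fetch callback
// are placeholders, not the project's real API.
package chunkcache

import (
	"sync"
	"time"
)

type ChunkLocation struct {
	Handle string // chunk handle
	Addr   string // chunkserver address
	Size   int64  // bytes stored in this chunk
}

type entry struct {
	chunks  []ChunkLocation
	expires time.Time
}

type Cache struct {
	mu    sync.Mutex
	ttl   time.Duration
	data  map[string]entry
	fetch func(path string) ([]ChunkLocation, error) // falls through to the master on a miss
}

func New(ttl time.Duration, fetch func(string) ([]ChunkLocation, error)) *Cache {
	return &Cache{ttl: ttl, data: make(map[string]entry), fetch: fetch}
}

// Chunks returns cached locations when fresh; otherwise it asks the master once
// and caches the result for the TTL (5 minutes in the text above).
func (c *Cache) Chunks(path string) ([]ChunkLocation, error) {
	c.mu.Lock()
	if e, ok := c.data[path]; ok && time.Now().Before(e.expires) {
		c.mu.Unlock()
		return e.chunks, nil
	}
	c.mu.Unlock()

	chunks, err := c.fetch(path)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.data[path] = entry{chunks: chunks, expires: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return chunks, nil
}

// FileSize computes the file size from cached chunk locations, mirroring the
// idea of deriving size without a fresh GetFile() call.
func (c *Cache) FileSize(path string) (int64, error) {
	chunks, err := c.Chunks(path)
	if err != nil {
		return 0, err
	}
	var total int64
	for _, ch := range chunks {
		total += ch.Size
	}
	return total, nil
}
```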

3. Per-Request Database Queries (RESOLVED)

Before: Every file download performed a database query to check namespace visibility (canAccessNamespace), which under concurrent load caused ~90ms latency spikes when HAProxy was throttled.

After: Namespace visibility information is now cached in-memory with a 30-second TTL:

  • Cache hit: no database round-trip
  • Cache miss: single query, result cached for future requests
  • Invalidation: automatic on namespace visibility updates

This eliminates the database as a bottleneck on the download path.

4. Gateway Forces Connection: close

The gateway injects Connection: close on every proxied request and opens a new TCP connection to the backend per request. This prevents HTTP keep-alive between gateway and backend.
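
For contrast, a gateway that reused backend connections would route all requests through one shared transport with keep-alives enabled. The sketch below shows that alternative in Go; the backend address and listen port are made up, and this is not the actual gateway, which terminates TLS and proxies at the TCP level.

```go
// Sketch of the keep-alive alternative (not the current gateway): an HTTP
// reverse proxy with a shared Transport reuses backend TCP connections instead
// of sending Connection: close and dialing the backend once per request.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	backend, err := url.Parse("http://sfs-backend:8080") // illustrative backend address
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)
	proxy.Transport = &http.Transport{
		MaxIdleConnsPerHost: 100,              // pool of reusable connections to the backend
		IdleConnTimeout:     90 * time.Second, // keep idle connections alive between requests
		// DisableKeepAlives: true would reproduce the current one-connection-per-request behavior.
	}
	// TLS termination is omitted here; the real gateway terminates TLS in front of this step.
	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```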

5. No Caching Layer

There is no caching at the SFS level. Every request reads from GFS end-to-end, even for the same file requested thousands of times.

GFS as Object Storage

GFS was designed for large sequential reads/writes (following Google's GFS paper). This makes it a poor fit for small-file object storage workloads:

| Characteristic | GFS Design | Object Storage Need |
|---|---|---|
| File size | Large (GB+) | Any (KB to GB) |
| Access pattern | Sequential append/read | Random read/write |
| Metadata cost | gRPC round-trip per read | Colocated/cached |
| Chunk granularity | 64MB | Packed small objects |
| Write model | Append-only | Overwrite support |

Where GFS Excels

GFS is well-suited for workloads like log aggregation (used by log-service):

  • Append-only writes (logs are write-once)
  • Large sequential writes (continuous log streams)
  • Large sequential reads (scanning log ranges)
  • Low metadata pressure (appending to few open files, chunk allocations amortized over 64MB)

Where GFS Struggles

For the storage service serving many small files to users:

  • Every read pays the metadata round-trip (~1.4ms for two gRPC calls) regardless of file size
  • For a 1KB file, useful data transfer is less than 1% of total request time
  • No built-in caching or HTTP-aware optimizations (range requests, ETags, etc.)

Potential Improvements

  1. Use cached chunk locations on the read path (DONE: ReadToWithNamespace now uses getCachedChunks())
  2. Add an in-memory/on-disk cache at the SFS layer — skip GFS entirely for hot files
  3. HTTP caching headers — allow browsers and proxies to cache responses (ETag, Cache-Control); see the sketch below
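
A minimal sketch of improvement #3, assuming the handler already knows the file's size and modification time from (cached) metadata; the handler shape and the mtime+size validator are illustrative, not the existing SFS code:

```go
// Hypothetical handler fragment: set ETag and Cache-Control so clients and
// proxies can reuse or revalidate responses, and answer If-None-Match with
// 304 Not Modified without touching GFS at all.
package sfshttp

import (
	"fmt"
	"io"
	"net/http"
	"strconv"
	"time"
)

// serveFile streams body to the client unless the client's cached copy
// (identified by ETag) is still current. A content hash would be a stronger
// validator than mtime+size.
func serveFile(w http.ResponseWriter, r *http.Request, size int64, modTime time.Time, body io.Reader) {
	etag := fmt.Sprintf(`"%x-%x"`, modTime.UnixNano(), size)

	w.Header().Set("ETag", etag)
	w.Header().Set("Cache-Control", "public, max-age=300") // browsers/proxies may reuse for 5 minutes

	if r.Header.Get("If-None-Match") == etag {
		w.WriteHeader(http.StatusNotModified) // no GFS read, no response body
		return
	}

	w.Header().Set("Content-Length", strconv.FormatInt(size, 10))
	io.Copy(w, body) // stream from GFS or an SFS-level cache
}
```

If whole files were available as an io.ReadSeeker (for example from an SFS-level cache, improvement #2), the standard library's http.ServeContent would also handle Range requests and If-Modified-Since revalidation, addressing part of the range-request gap noted under "Where GFS Struggles".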