Opening

“Why are you still creating batches by extending Thread when you could use Spring Batch?”

Yes, whoever asks this is right. If you're building something new in 2025, Spring Batch or a commercial batch framework is the correct choice. But what about reality? Legacy systems that are 10 or 15 years old are still full of batch classes that extend Thread. And when one of those batches crashes after hours, a developer who doesn't understand the difference between a user thread and a daemon thread can't even find the root cause.

Real-World Batch Systems - The Era of extends Thread

Why Still extends Thread?

Looking inside financial institutions or large enterprise systems reveals a surprising sight: hundreds of critical business batch jobs implemented as classes that extend Thread, scheduled by cron or by crude Thread.sleep() loops.

Why do these structures still survive?

Historical Reasons: Until the mid-2000s, Spring Batch didn't exist (its 1.0 release arrived in 2008). All we had was plain Java Thread and JDBC. Thread-based batches written back then are still maintained "because they work well."

The Beauty of Simplicity: Everyone understands a batch class that extends Thread. Implement run() and you're done; debugging is easy, local testing is convenient, and it runs immediately without any framework configuration.

Independence: Each batch runs in its own JVM, so if one dies, it doesn't affect the others. Even a memory leak is harmless, because the JVM terminates after each run.

Operations Team Preference: Operations teams love simplicity. “java -cp batch.jar com.company.batch.DailySettlementThread” - that’s it. Complex batch admin screens? Not necessary.

Thread and Daemon - Why Batches Never End

User Thread vs Daemon Thread

Do you know the conditions under which a Java application terminates? The JVM exits only when all User Threads have ended (or something calls System.exit()). The important point is that Daemon Threads are not included in this count. The sketch after the two lists below shows the difference in action.

User Thread Characteristics:

  • The thread executing the main method is the typical example
  • Threads created with new Thread() (including classes that extend Thread) are User Threads by default - daemon status is inherited from the creating thread
  • JVM terminates only when all User Threads end
  • Batch classes that extend Thread all operate as User Threads

Daemon Thread Characteristics:

  • Used for background work such as GC and monitoring
  • Forcibly terminated immediately when all User Threads end
  • Requires explicit setting with setDaemon(true)
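
Here is that sketch - minimal, runnable, using nothing beyond the JDK:

public class DaemonDemo {
    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(5_000); // simulate five seconds of batch work
                System.out.println("worker done");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // worker.setDaemon(true); // uncomment and "worker done" never prints:
        //                         // the JVM exits as soon as main (the last user thread) ends
        worker.start();
        System.out.println("main done");
    }
}

As a user thread (the default), worker keeps the JVM alive until the five seconds pass. Flip it to a daemon and the process exits right after "main done".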

Common Thread Problems in Batch Processing

Case 1: Batch That Won’t Terminate

The run() method of the batch class that extends Thread has finished, but the JVM won't exit. The usual causes (a minimal reproduction of the first one follows the list):

  • ExecutorService not shutdown()
  • Connection pool maintenance threads created as User Threads
  • File watching or directory polling threads still running
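
A minimal sketch of the first cause, assuming nothing beyond the JDK:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NeverEndingBatch extends Thread {
    @Override
    public void run() {
        ExecutorService pool = Executors.newFixedThreadPool(4); // creates non-daemon worker threads
        pool.submit(() -> System.out.println("chunk processed"));
        // run() returns here, but pool.shutdown() was never called:
        // four idle user threads keep the JVM alive indefinitely
    }

    public static void main(String[] args) {
        new NeverEndingBatch().start();
    }
}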

Case 2: Batch That Dies Suddenly

The batch terminates suddenly while processing. This happens because important work was processed with Daemon Threads:

  • Asynchronous logging processed with Daemon Thread
  • Important post-processing delegated to Daemon Thread
  • Main batch Thread terminates first, forcing Daemon Thread termination

Thread Management Best Practices

1. Always Clean Up ExecutorService

If you use an ExecutorService for parallel processing in a batch, you must terminate it properly. Combine shutdown() and awaitTermination() for a safe stop, as sketched below. Be especially careful with shutdownNow(): it interrupts running tasks.
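
A version close to the pattern in the ExecutorService Javadoc; the timeouts are placeholders to tune per batch:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

static void shutdownGracefully(ExecutorService pool) {
    pool.shutdown(); // stop accepting new tasks, let queued ones finish
    try {
        if (!pool.awaitTermination(60, TimeUnit.MINUTES)) {
            pool.shutdownNow(); // last resort: interrupts running tasks
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                System.err.println("thread pool did not terminate");
            }
        }
    } catch (InterruptedException e) {
        pool.shutdownNow();
        Thread.currentThread().interrupt(); // preserve the interrupt status
    }
}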

2. Use Daemon Threads Carefully

Use Daemon Threads only for supplementary tasks like logging, monitoring, and statistics collection. Never process business logic or data processing in Daemon Threads.

3. Monitor Thread State

Make it a habit to check active threads at batch start and before termination. Thread.getAllStackTraces() lets you check all currently running threads.
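
For instance, a pre-exit check might look like this (printf standing in for your logger):

// Before exiting, log any user threads that would keep the JVM alive
Thread.getAllStackTraces().keySet().stream()
      .filter(t -> t.isAlive() && !t.isDaemon())
      .forEach(t -> System.out.printf("still running: %s (state=%s)%n",
              t.getName(), t.getState()));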

The Uncomfortable Truth About Transaction Rollback

The Limitations of @Transactional

Spring’s @Transactional is convenient, but it can be a trap in large-scale batches. What happens when you put @Transactional on a method processing 1 million records?

Memory Issues: On the application side, an ORM persistence context holds every touched entity until commit, so an OutOfMemoryError is only a matter of time; on the database side, undo information keeps accumulating for the entire transaction.

Rollback Segment Exhaustion: Database rollback segments are not infinite. Large transactions can exhaust rollback segments.

Lock Contention: Long transactions block other operations. If online services and batches use the same tables, serious performance degradation occurs.

Database-Specific Rollback Limits

Each database has physical limits on the amount of data that can be rolled back. While adjustable through configuration, it cannot be increased indefinitely.

Oracle’s Case

Oracle uses rollback segments (Undo Segments). The UNDO_RETENTION parameter sets the retention period, and rollback segment size expands dynamically. However, tablespace size is the limit.

General recommendations:

  • Set rollback segments to 10% of the largest table size
  • Use 4 or fewer rollback segments per transaction
  • Allocate separate large rollback segments for batch operations

Actual limits:

  • The undo space available to a single transaction is limited by the size of the undo tablespace
  • Typically several GB to tens of GB
  • Exhaust it and the writer fails with "ORA-30036: unable to extend segment in undo tablespace", while long-running readers start seeing "ORA-01555: snapshot too old"

MySQL (InnoDB) Case

InnoDB uses Undo Logs, with size capped by innodb_max_undo_log_size (default 1GB). Oversized undo tablespaces are truncated automatically only when innodb_undo_log_truncate is enabled (the default in MySQL 8.0).

Transaction limit calculation:

  • Each rollback segment has 1024 undo slots
  • Default settings: 128 rollback segments × 1024 slots = up to 131,072 concurrent transactions
  • For single large transactions, Undo Log can grow to tens of GB

Real experience:

  • Cases exist where Undo Log grew to 180GB
  • In such cases, rollback takes hours to days
  • Rollback continues even after forced termination and restart

PostgreSQL’s Case

PostgreSQL handles MVCC differently. Unlike Oracle or MySQL, there are no separate rollback segments: old row versions stay in the table itself, and WAL (Write-Ahead Log) serves crash recovery and replication rather than rollback. Rolling back is therefore nearly instant, but the dead row versions must be reclaimed later by VACUUM.

WAL size limits:

  • max_wal_size: Maximum WAL size between checkpoints (default 1GB, PostgreSQL 9.5+)
  • min_wal_size: Minimum WAL size to maintain for recycling (default 80MB)
  • max_wal_size is a soft limit, so it can actually be exceeded (under high load or archiving delays)
  • Older versions (9.4 and below) use checkpoint_segments parameter

Transaction size limits:

  • Theoretically unlimited as long as disk space allows
  • In practice, dead tuples not cleaned by VACUUM accumulate, causing table bloat
  • A single long transaction holds back the visibility horizon, so VACUUM cannot reclaim dead tuples anywhere until it ends

Practical Batch Transaction Strategies

1. Chunk Processing

Don’t process everything in one transaction. Processing in chunks of 1,000 or 10,000 records gives you (a JDBC sketch follows the list):

  • Predictable memory usage
  • Partial reprocessing possible on failure
  • Minimized lock contention with other operations
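
A minimal JDBC sketch of the idea, assuming the usual java.sql imports; table and column names are illustrative, and FETCH FIRST would be LIMIT on MySQL:

void settleInChunks(Connection conn, java.time.LocalDate date) throws SQLException {
    conn.setAutoCommit(false);
    long lastId = 0;
    int rows;
    do {
        rows = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, amount FROM daily_sales " +
                "WHERE sale_date = ? AND id > ? ORDER BY id FETCH FIRST 10000 ROWS ONLY")) {
            ps.setDate(1, java.sql.Date.valueOf(date));
            ps.setLong(2, lastId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lastId = rs.getLong("id");
                    // processSettlement(rs); // per-row business logic
                    rows++;
                }
            }
        }
        conn.commit(); // one transaction per chunk, not per run
    } while (rows == 10_000);
}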

2. Checkpoint Method

Record processing progress in a separate table (sketched after the list):

  • Store last processed ID or timestamp
  • Restart from checkpoint on failure
  • No need for full rollback
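
A sketch, assuming a hypothetical batch_checkpoint table with job_name and last_id columns:

// Read the last checkpoint; 0 means a fresh run
long loadCheckpoint(Connection conn, String job) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT last_id FROM batch_checkpoint WHERE job_name = ?")) {
        ps.setString(1, job);
        try (ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getLong(1) : 0L;
        }
    }
}

// Update progress inside the same transaction as the chunk itself,
// so the checkpoint and the data can never diverge
void saveCheckpoint(Connection conn, String job, long lastId) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE batch_checkpoint SET last_id = ? WHERE job_name = ?")) {
        ps.setLong(1, lastId);
        ps.setString(2, job);
        ps.executeUpdate();
    }
}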

3. Compensation Transactions

Consider compensation transactions that perform opposite operations instead of rollback:

  • Batch version of Saga pattern
  • Implement compensation logic for each step
  • Flexible response to partial failures

4. Temporary Table Utilization

Use temporary tables for large data processing (one possible shape follows the list):

  • Minimize locks on original tables
  • Apply all at once after processing
  • Only DROP temporary table on failure
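
In JDBC it can look roughly like this, assuming autocommit is off; exact CREATE syntax varies by database, and note that DDL implicitly commits on Oracle and MySQL:

try (Statement st = conn.createStatement()) {
    // Stage into a work table so the original is never locked for long
    st.execute("CREATE TABLE settlement_stage AS SELECT * FROM settlement WHERE 1 = 0");
    // ... bulk-insert processed rows into settlement_stage ...
    st.execute("INSERT INTO settlement SELECT * FROM settlement_stage"); // apply in one short step
    st.execute("DROP TABLE settlement_stage");
    conn.commit();
} // on failure, recovery is just DROP TABLE settlement_stage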

Real-World Case - Lessons from 100 Million Record Settlement Batch

Problem Situation

We had a batch that settled 100 million daily sales records. It started out simple: query all the data with MyBatis and process it in a loop:

@Transactional
public void settleDailySales(LocalDate date) {
    // One transaction and one List for the entire day's data
    List<Sale> sales = saleMapper.selectDailySales(date);
    for (Sale sale : sales) {
        processSettlement(sale);
    }
}

A month later, this batch died with an OutOfMemoryError. Obviously: it was trying to load 100 million records into memory at once.

First Improvement - Direct JDBC Usage and Cursor

Having hit MyBatis’s limits, we decided to use JDBC directly, streaming rows through a cursor with PreparedStatement and ResultSet. But problems remained: the transaction became far too long, the undo log grew explosively, and online services started throwing “ORA-01555: snapshot too old” errors.
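
The read side looked roughly like this. With Oracle, setting a fetch size is enough; PostgreSQL also needs autocommit off to stream, and MySQL needs useCursorFetch or a fetch size of Integer.MIN_VALUE:

conn.setAutoCommit(false);
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM daily_sales WHERE sale_date = ?")) {
    ps.setFetchSize(1_000); // stream 1,000 rows at a time instead of materializing all 100M
    ps.setDate(1, java.sql.Date.valueOf(date));
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // processSettlement(...) row by row, in constant memory
        }
    }
}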

Second Improvement - Chunk Processing

We changed to process in chunks of 10,000 records while managing JDBC Connections directly. Each chunk was processed in a separate transaction, and after commit, we obtained a new Connection to process the next chunk. But after processing 50 million records, a network failure interrupted the batch, and we couldn’t tell where to restart processing.

Final Solution

We eventually designed it this way (skeleton after the list):

  1. Partition-based Parallel Processing: Split daily partitions into 24 hourly chunks
  2. Checkpoint Table: Record processing status of each chunk in separate table
  3. Thread Pool Utilization: Parallel processing with 4 threads, each with independent transactions
  4. Failure Recovery Strategy: Reprocess only failed chunks, manual processing after 3 retries
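
In skeleton form - checkpointDao, settleHour, and shutdownGracefully are hypothetical names, and error handling is trimmed:

ExecutorService pool = Executors.newFixedThreadPool(4);        // 3. four parallel workers
List<Future<?>> results = new ArrayList<>();
for (int hour = 0; hour < 24; hour++) {                        // 1. 24 hourly chunks
    final int h = hour;
    results.add(pool.submit(() -> {
        if (checkpointDao.isDone("settlement", date, h)) {
            return null;                                       // 2. skip chunks already done
        }
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);                         // independent transaction per chunk
            settleHour(conn, date, h);
            checkpointDao.markDone(conn, "settlement", date, h); // same transaction as the data
            conn.commit();
        }
        return null;
    }));
}
for (Future<?> f : results) {
    f.get();                                                    // 4. failed chunks surface here for retry
}
shutdownGracefully(pool);                                       // see the ExecutorService section above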

Results:

  • Processing time: 8 hours → 2 hours
  • Memory usage: Peak 32GB → Stable 4GB
  • Disaster recovery: Full reprocessing → Failed chunks only (average 5 minutes)

Thread and Transaction Debugging Tips

Thread Dump Analysis

When a batch is stuck or won’t terminate, thread dumps are the best debugging tool:

jstack <PID> > thread_dump.txt
kill -3 <PID>  # Dumps to the JVM's standard output

Patterns to watch for:

  • Threads in BLOCKED state - Possible deadlock
  • WAITING on monitor - Synchronization issues
  • Many TIMED_WAITING - Possible connection pool exhaustion

Transaction Monitoring

Oracle:

-- Current active transactions and Undo usage
SELECT s.sid, s.serial#, s.username, t.used_ublk, t.used_urec
FROM v$session s, v$transaction t
WHERE s.taddr = t.addr
ORDER BY t.used_ublk DESC;

-- Rollback segment usage
SELECT segment_name, status, tablespace_name,
       bytes/1024/1024 as size_mb,
       blocks, extents
FROM dba_rollback_segs;

MySQL:

-- InnoDB transaction status
SELECT * FROM information_schema.INNODB_TRX;

-- Check Undo Log size
SELECT NAME, ALLOCATED_SIZE/1024/1024/1024 AS SIZE_GB 
FROM INFORMATION_SCHEMA.INNODB_TABLESPACES 
WHERE NAME LIKE '%undo%';

-- History List Length (MVCC load indicator)
SHOW ENGINE INNODB STATUS;  -- Check TRANSACTIONS section

PostgreSQL:

-- Long-running transactions
SELECT pid, age(clock_timestamp(), query_start), query 
FROM pg_stat_activity
WHERE state != 'idle' 
AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start;

-- WAL usage
SELECT name, setting, unit 
FROM pg_settings 
WHERE name IN ('max_wal_size', 'min_wal_size', 'wal_keep_size');

Additional Considerations

The Importance of Connection Pool Settings

An often overlooked aspect of batch processing is Connection Pool configuration (an example follows the list):

  • validationQuery/testOnBorrow: Essential for long-running batches. Without validation you may not notice a broken DB connection and keep processing, only to fail at commit time.
  • maxWait: If too short, it will fail to obtain connections under high load. Set more generously for batches than online services.
  • removeAbandoned: Be careful as batches may use a single connection for a long time by nature.
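
For example, with Apache Commons DBCP2 (the values are illustrative starting points, not recommendations):

BasicDataSource ds = new BasicDataSource(); // org.apache.commons.dbcp2
ds.setUrl("jdbc:oracle:thin:@//db-host:1521/BATCH"); // hypothetical URL
ds.setValidationQuery("SELECT 1 FROM DUAL");  // Oracle; plain "SELECT 1" elsewhere
ds.setTestOnBorrow(true);                     // validate before each use, not at commit time
ds.setMaxWaitMillis(60_000);                  // batches can afford to wait longer than online traffic
ds.setRemoveAbandonedOnBorrow(false);         // a batch legitimately holds one connection for hours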

Utilizing Bulk Operations

Performance improves significantly when you use the bulk-processing features each database provides (a JDBC batching sketch follows the list):

  • JDBC Batch: Utilize PreparedStatement.addBatch() and executeBatch()
  • MySQL LOAD DATA INFILE: Often 20x or more faster than row-by-row INSERTs for bulk loading CSV files
  • Oracle SQL*Loader: External utility optimized for large data loads
  • PostgreSQL COPY: The fastest method for bulk data input
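
A minimal JDBC batching sketch; Sale, getId(), and getAmount() are illustrative, and flushing every 1,000 rows keeps the driver's buffer bounded:

try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO settlement (sale_id, amount) VALUES (?, ?)")) {
    int count = 0;
    for (Sale sale : sales) {
        ps.setLong(1, sale.getId());
        ps.setBigDecimal(2, sale.getAmount());
        ps.addBatch();
        if (++count % 1_000 == 0) {
            ps.executeBatch(); // flush every 1,000 rows
        }
    }
    ps.executeBatch(); // flush the remainder
}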

Considering Transaction Isolation Levels

Choose the isolation level to match each batch's characteristics (how to set it follows the list):

  • READ UNCOMMITTED: For statistical batches where speed matters more than accuracy (effectively MySQL-only; Oracle doesn't offer it and PostgreSQL treats it as READ COMMITTED)
  • READ COMMITTED: Suitable for most batch operations
  • REPEATABLE READ: For settlement batches where consistency is critical
  • SERIALIZABLE: Rarely used, causes severe performance degradation
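
With plain JDBC this is one call per connection; with Spring it's an annotation attribute:

conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
// or, declaratively:
// @Transactional(isolation = Isolation.READ_COMMITTED)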

Closing

I know Spring Batch and commercial batch frameworks are good. But many real-world systems still run on batch classes that extend Thread. To build stable batches in such environments:

  1. Understand Thread Lifecycle: The difference between User Thread and Daemon Thread is fundamental
  2. Split Transactions Small: Don’t try to solve everything with one @Transactional
  3. Know Database Limits: Understand each DB’s rollback mechanism and limits
  4. Make Monitoring a Habit: Thread dumps and transaction monitoring are essential
  5. Design for Failure: Checkpoints and reprocessing strategies are not optional but mandatory

Don’t dismiss legacy systems. They contain 15 years of accumulated business logic and exception handling. New technology is good, but solid fundamentals are more important.

And remember the law: “Batches crash at 3 AM.” Prepare in advance.