The Uncomfortable Truth About Batch Processing - Thread, Daemon, and Transaction Rollback Limitations
Why @Transactional Alone Cannot Handle Large-Scale Batch Processing
Opening
“Why are you still creating batches by extending Thread when you could use Spring Batch?”
Yes, they’re right. If building something new in 2025, Spring Batch or commercial batch frameworks would be the correct choice. But what about reality? Legacy systems that are 10 or 15 years old are still full of batch classes that extend Thread. And when these batches crash after work hours, developers who don’t understand the difference between Thread and Daemon can’t even find the root cause.
Real-World Batch Systems - The Era of Thread extends
Why Still Thread extends Structure?
Looking inside financial institutions or large enterprise systems reveals a surprising sight. Hundreds of critical business logic batch jobs are implemented as classes that extend Thread, scheduled with cron or Thread.sleep().
Why do these structures still survive?
Historical Reasons: Spring Batch didn’t exist until the late 2000s (its 1.0 release arrived in 2008). All we had was pure Java Thread and JDBC. Thread-based batches created then are maintained “because they work well.”
The Beauty of Simplicity: Everyone understands batch classes that extend Thread. Just implement the run() method, debugging is easy, and local testing is convenient. Immediate execution without complex framework configuration.
Independence: Each batch runs in an independent JVM, so if one dies, it doesn’t affect others. Even with memory leaks, the JVM terminates after execution, so no problem.
Operations Team Preference: Operations teams love simplicity. “java -cp batch.jar com.company.batch.DailySettlementThread” - that’s it. Complex batch admin screens? Not necessary.
Thread and Daemon - Why Batches Never End
User Thread vs Daemon Thread
Do you know the conditions for a Java application to terminate? The JVM terminates only when all User Threads have ended. The important point here is that Daemon Threads are not included in this count.
User Thread Characteristics:
- The thread executing the main method is typical
- Classes that extend Thread or threads created with new Thread() inherit the daemon flag of the thread that created them, so anything spawned from main is a User Thread by default
- JVM terminates only when all User Threads end
- Batch classes that extend Thread all operate as User Threads
Daemon Thread Characteristics:
- For background tasks like GC, monitoring
- Forcibly terminated immediately when all User Threads end
- Requires explicit setting with setDaemon(true)
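A minimal demonstration of the difference; the class name and sleep time are illustrative. Leave the worker as a User Thread and the JVM waits for it; uncomment setDaemon(true) and the JVM exits as soon as main() returns.

public class DaemonVsUserDemo {
    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(5_000);                    // stands in for real batch work
                System.out.println("work done");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // worker.setDaemon(true);   // uncomment: JVM exits immediately, "work done" never prints
        worker.start();
        // main() returns here; a User Thread worker keeps the JVM alive until it finishes
    }
}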
Common Thread Problems in Batch Processing
Case 1: Batch That Won’t Terminate
The run() method of the batch class that extends Thread has finished, but the JVM won’t terminate. The usual causes:
- An ExecutorService that was never shut down (its worker threads are non-daemon by default)
- Connection pool maintenance threads created as User Threads
- File watching or directory polling threads still running
Case 2: Batch That Dies Suddenly
The batch terminates suddenly while processing. This happens because important work was processed with Daemon Threads:
- Asynchronous logging processed with Daemon Thread
- Important post-processing delegated to Daemon Thread
- Main batch Thread terminates first, forcing Daemon Thread termination
Thread Management Best Practices
1. Always Clean Up ExecutorService
If using ExecutorService for parallel processing in batches, you must properly terminate it. Combine shutdown() and awaitTermination() for safe termination. Be especially careful with shutdownNow() as it interrupts running tasks.
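A minimal sketch of that shutdown sequence; the pool size and timeouts are illustrative, not recommendations.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchExecutorCleanup {
    public static void runBatch() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // submit chunk-processing tasks to the pool here ...
        } finally {
            pool.shutdown();                                   // stop accepting new tasks
            try {
                if (!pool.awaitTermination(30, TimeUnit.MINUTES)) {
                    pool.shutdownNow();                        // interrupts running tasks - last resort
                }
            } catch (InterruptedException e) {
                pool.shutdownNow();
                Thread.currentThread().interrupt();
            }
        }
        // Without shutdown(), the pool's non-daemon worker threads keep the JVM alive forever.
    }
}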
2. Use Daemon Threads Carefully
Use Daemon Threads only for supplementary tasks like logging, monitoring, and statistics collection. Never process business logic or data processing in Daemon Threads.
3. Monitor Thread State
Make it a habit to check active threads at batch start and before termination. Thread.getAllStackTraces() lets you check all currently running threads.
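For example, a small helper along those lines (the class and method names are illustrative):

public class ThreadAudit {
    // Logs every live non-daemon thread; call at batch start and again right before the expected exit.
    public static void logLiveUserThreads() {
        Thread.getAllStackTraces().keySet().stream()
              .filter(t -> !t.isDaemon())
              .forEach(t -> System.out.printf("user thread still alive: %s (%s)%n",
                      t.getName(), t.getState()));
    }
}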
The Uncomfortable Truth About Transaction Rollback
The Limitations of @Transactional
Spring’s @Transactional is convenient, but it can be a trap in large-scale batches. What happens when you put @Transactional on a method processing 1 million records?
Memory Issues: When everything runs in a single transaction, changed objects pile up on the application side (especially in an ORM persistence context) until the commit, potentially causing out-of-memory errors, while the database keeps accumulating undo for the entire transaction.
Rollback Segment Exhaustion: Database rollback segments are not infinite. Large transactions can exhaust rollback segments.
Lock Contention: Long transactions block other operations. If online services and batches use the same tables, serious performance degradation occurs.
Database-Specific Rollback Limits
Each database has physical limits on the amount of data that can be rolled back. While adjustable through configuration, it cannot be increased indefinitely.
Oracle’s Case
Oracle uses rollback segments (Undo Segments). The UNDO_RETENTION parameter sets the retention period, and rollback segment size expands dynamically. However, tablespace size is the limit.
General recommendations:
- Set rollback segments to 10% of the largest table size
- A common rule of thumb is one rollback segment for every four concurrent transactions (a single transaction always writes to exactly one undo segment)
- Allocate separate large rollback segments for batch operations
Actual limits:
- Undo space available for a single transaction is limited by Undo tablespace size
- Typically several GB to tens of GB
- When a transaction exhausts the available undo space it fails with “ORA-30036: unable to extend segment in undo tablespace”, and long-running readers start hitting “ORA-01555: snapshot too old”
MySQL (InnoDB) Case
InnoDB uses Undo Logs. An undo tablespace that grows beyond innodb_max_undo_log_size (default 1GB) is truncated automatically, but only while innodb_undo_log_truncate is enabled (the default in MySQL 8.0).
Transaction limit calculation:
- Each rollback segment has 1024 undo slots
- Default settings: 128 rollback segments × 1024 slots = approximately 131,072 concurrent transactions supported
- For single large transactions, Undo Log can grow to tens of GB
Real experience:
- Cases exist where Undo Log grew to 180GB
- In such cases, rollback takes hours to days
- Rollback continues even after forced termination and restart
PostgreSQL’s Case
PostgreSQL implements MVCC by keeping multiple row versions directly in the table itself; unlike Oracle or MySQL, there are no separate rollback segments. WAL (Write-Ahead Log) exists for durability and crash recovery rather than for rollback.
WAL size limits:
- max_wal_size: Maximum WAL size between checkpoints (default 1GB, PostgreSQL 9.5+)
- min_wal_size: Minimum WAL size to maintain for recycling (default 80MB)
- max_wal_size is a soft limit, so it can actually be exceeded (under high load or archiving delays)
- Older versions (9.4 and below) use checkpoint_segments parameter
Transaction size limits:
- Theoretically unlimited as long as disk space allows
- In practice, dead tuples not cleaned by VACUUM accumulate, causing table bloat
- A single long-running transaction holds back the cleanup horizon, preventing VACUUM from removing dead tuples even in tables the batch never touches
Practical Batch Transaction Strategies
1. Chunk Processing
Don’t process everything in one transaction. Process in chunks of 1,000 or 10,000 records instead (a sketch follows this list):
- Predictable memory usage
- Partial reprocessing possible on failure
- Minimized lock contention with other operations
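A minimal JDBC sketch of the chunked approach; the table, columns, and chunk size are illustrative, and the per-row settlement logic is omitted.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class ChunkedSettlement {
    private static final int CHUNK_SIZE = 10_000;          // illustrative

    // Walks the table in id order, committing one chunk at a time.
    public void run(DataSource dataSource, long startId) throws Exception {
        long lastId = startId;
        int processed;
        do {
            processed = 0;
            try (Connection conn = dataSource.getConnection()) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT id, amount FROM sale WHERE id > ? ORDER BY id")) {
                    ps.setLong(1, lastId);
                    ps.setMaxRows(CHUNK_SIZE);              // cap the chunk size portably
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            lastId = rs.getLong("id");
                            // settle this row via the same connection ...
                            processed++;
                        }
                    }
                }
                conn.commit();                              // one transaction per chunk
            }
        } while (processed == CHUNK_SIZE);                  // a partial chunk means we are done
    }
}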
2. Checkpoint Method
Record processing progress in a separate table (a small example follows the list):
- Store last processed ID or timestamp
- Restart from checkpoint on failure
- No need for full rollback
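A sketch of the checkpoint idea; the batch_checkpoint table, its columns, and the assumption that one row per job is seeded in advance are all illustrative.

import java.sql.*;

public class CheckpointDao {
    // Persist the last processed id; call this inside the same transaction as the chunk commit.
    static void saveCheckpoint(Connection conn, String jobName, long lastId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE batch_checkpoint SET last_id = ?, updated_at = CURRENT_TIMESTAMP WHERE job_name = ?")) {
            ps.setLong(1, lastId);
            ps.setString(2, jobName);
            ps.executeUpdate();
        }
    }

    // On restart, resume from the stored id instead of rolling everything back.
    static long loadCheckpoint(Connection conn, String jobName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT last_id FROM batch_checkpoint WHERE job_name = ?")) {
            ps.setString(1, jobName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 0L;
            }
        }
    }
}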
3. Compensation Transactions
Consider compensation transactions that perform opposite operations instead of rollback (rough sketch after the list):
- Batch version of Saga pattern
- Implement compensation logic for each step
- Flexible response to partial failures
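A rough sketch of the pattern: push an undo action after each completed step and run them in reverse order when a later step fails. All method names here are made up for illustration.

import java.util.ArrayDeque;
import java.util.Deque;

public class CompensatingBatch {
    public void runSteps() {
        Deque<Runnable> compensations = new ArrayDeque<>();
        try {
            debitSourceAccounts();
            compensations.push(this::creditSourceAccountsBack);   // undo for step 1

            creditTargetAccounts();
            compensations.push(this::debitTargetAccountsBack);    // undo for step 2

            writeSettlementLedger();
        } catch (RuntimeException e) {
            // Instead of one giant rollback, apply the opposite operations newest-first.
            while (!compensations.isEmpty()) {
                compensations.pop().run();
            }
            throw e;
        }
    }

    private void debitSourceAccounts() { /* ... */ }
    private void creditSourceAccountsBack() { /* ... */ }
    private void creditTargetAccounts() { /* ... */ }
    private void debitTargetAccountsBack() { /* ... */ }
    private void writeSettlementLedger() { /* ... */ }
}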
4. Temporary Table Utilization
Use temporary tables for large data processing:
- Minimize locks on original tables
- Apply all at once after processing
- Only DROP temporary table on failure
Real-World Case - Lessons from 100 Million Record Settlement Batch
Problem Situation
We had a batch that settled 100 million daily sales records. It started simple. Using MyBatis to query and process all data:
@Transactional
public void settleDailySales(LocalDate date) {
    // Loads every row for the day into memory at once - the root cause of the later OOM
    List<Sale> sales = saleMapper.selectDailySales(date);
    for (Sale sale : sales) {
        processSettlement(sale);
    }
}
A month later, this batch died with an OutOfMemoryError. The cause was obvious: it was trying to load 100 million records into memory at once.
First Improvement - Direct JDBC Usage and Cursor
Having hit MyBatis’s limits, we decided to use JDBC directly, streaming rows through a cursor with PreparedStatement and ResultSet. But problems remained. The transaction became too long, the Undo Log grew explosively, and online services started throwing “ORA-01555: snapshot too old” errors.
Second Improvement - Chunk Processing
We changed to process in chunks of 10,000 records while managing JDBC Connections directly. Each chunk was processed in a separate transaction, and after commit, we obtained a new Connection to process the next chunk. But after processing 50 million records, a network failure interrupted the batch, and we couldn’t tell where to restart processing.
Final Solution
We eventually designed it this way:
- Partition-based Parallel Processing: Split daily partitions into 24 hourly chunks
- Checkpoint Table: Record processing status of each chunk in separate table
- Thread Pool Utilization: Parallel processing with 4 threads, each with independent transactions
- Failure Recovery Strategy: Reprocess only failed chunks, manual processing after 3 retries
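A condensed sketch of that structure; class names, pool size, timeout, and retry count are illustrative, and the checkpoint/chunk logic shown earlier is assumed inside processHourWithRetries.

import java.time.LocalDate;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HourlyPartitionRunner {
    // 24 hourly partitions processed by 4 worker threads, each chunk in its own transaction.
    public void runDay(LocalDate date) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int hour = 0; hour < 24; hour++) {
            final int h = hour;
            pool.submit(() -> processHourWithRetries(date, h));   // hours already marked DONE are skipped
        }
        pool.shutdown();
        pool.awaitTermination(6, TimeUnit.HOURS);
    }

    private void processHourWithRetries(LocalDate date, int hour) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                // read the checkpoint row for (date, hour), process remaining chunks, mark DONE
                return;
            } catch (RuntimeException e) {
                // log and retry; after the third failure the chunk is left for manual handling
            }
        }
    }
}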
Results:
- Processing time: 8 hours → 2 hours
- Memory usage: Peak 32GB → Stable 4GB
- Disaster recovery: Full reprocessing → Failed chunks only (average 5 minutes)
Thread and Transaction Debugging Tips
Thread Dump Analysis
When a batch is stuck or won’t terminate, thread dumps are the best debugging tool:
jstack <PID> > thread_dump.txt
kill -3 <PID> # Dump to standard output
Patterns to watch for:
- Threads in BLOCKED state - Possible deadlock
- WAITING on monitor - Synchronization issues
- Many TIMED_WAITING - Possible connection pool exhaustion
Transaction Monitoring
Oracle:
-- Current active transactions and Undo usage
SELECT s.sid, s.serial#, s.username, t.used_ublk, t.used_urec
FROM v$session s, v$transaction t
WHERE s.taddr = t.addr
ORDER BY t.used_ublk DESC;
-- Rollback segment usage
SELECT segment_name, status, tablespace_name,
bytes/1024/1024 as size_mb,
blocks, extents
FROM dba_rollback_segs;
MySQL:
-- InnoDB transaction status
SELECT * FROM information_schema.INNODB_TRX;
-- Check Undo Log size
SELECT NAME, ALLOCATED_SIZE/1024/1024/1024 AS SIZE_GB
FROM INFORMATION_SCHEMA.INNODB_TABLESPACES
WHERE NAME LIKE '%undo%';
-- History List Length (MVCC load indicator)
SHOW ENGINE INNODB STATUS; -- Check TRANSACTIONS section
PostgreSQL:
-- Long-running transactions
SELECT pid, age(clock_timestamp(), query_start), query
FROM pg_stat_activity
WHERE state != 'idle'
AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start;
-- WAL usage
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_wal_size', 'min_wal_size', 'wal_keep_size');
Additional Considerations
The Importance of Connection Pool Settings
An often overlooked aspect of batch processing is Connection Pool configuration (a configuration sketch follows the list):
- validationQuery/testOnBorrow: Essential for long-running batches. You might not realize the DB connection is broken and continue processing, only to fail at commit time.
- maxWait: If too short, it will fail to obtain connections under high load. Set more generously for batches than online services.
- removeAbandoned: Be careful as batches may use a single connection for a long time by nature.
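Those property names map naturally onto Apache Commons DBCP2; a hedged configuration sketch follows (URL, credentials, and values are illustrative, and other pools such as HikariCP use different setter names).

import org.apache.commons.dbcp2.BasicDataSource;

public class BatchDataSourceFactory {
    public static BasicDataSource create() {
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl("jdbc:oracle:thin:@//db-host:1521/BATCH");   // illustrative URL
        ds.setUsername("batch_user");
        ds.setPassword("...");

        ds.setValidationQuery("SELECT 1 FROM DUAL");   // detect broken connections early
        ds.setTestOnBorrow(true);                      // validate before handing a connection out
        ds.setMaxWaitMillis(60_000);                   // batches can afford to wait longer than online traffic
        ds.setRemoveAbandonedOnBorrow(false);          // a batch may legitimately hold one connection for hours
        return ds;
    }
}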
Utilizing Bulk Operations
Performance improves significantly when using bulk processing features provided by each database:
- JDBC Batch: Utilize PreparedStatement.addBatch() and executeBatch() (sketched after this list)
- MySQL LOAD DATA INFILE: Over 20x faster than INSERT for bulk loading CSV files
- Oracle SQL*Loader: External utility optimized for large data loads
- PostgreSQL COPY: The fastest method for bulk data input
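A minimal sketch of the JDBC batch approach from the first bullet; the settlement table and batch size are illustrative, and auto-commit is assumed to be off.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BulkInsertExample {
    private static final int BATCH_SIZE = 1_000;   // illustrative

    // Inserts settlement rows with JDBC batching instead of one round trip per row.
    void insertSettlements(Connection conn, List<long[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO settlement (sale_id, amount) VALUES (?, ?)")) {
            int count = 0;
            for (long[] row : rows) {
                ps.setLong(1, row[0]);
                ps.setLong(2, row[1]);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();             // flush the accumulated statements in one round trip
                }
            }
            ps.executeBatch();                     // flush the remainder
            conn.commit();                         // assumes conn.setAutoCommit(false) was called earlier
        }
    }
}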
Considering Transaction Isolation Levels
Choose appropriate isolation levels based on batch processing characteristics (a small snippet follows the list):
- READ UNCOMMITTED: For statistical batches where speed is more important than accuracy
- READ COMMITTED: Suitable for most batch operations
- REPEATABLE READ: For settlement batches where consistency is critical
- SERIALIZABLE: Rarely used, causes severe performance degradation
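For a plain JDBC batch, the level can be pinned on the connection itself; READ COMMITTED here is just an example, and with Spring the equivalent would be the isolation attribute of @Transactional.

import java.sql.Connection;
import java.sql.SQLException;

public class BatchIsolation {
    // Pin the isolation level on the batch's own JDBC connection.
    static void configure(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    }
}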
Closing
I know Spring Batch or commercial batch frameworks are good. But many real-world systems still run on batch classes that extend Thread. To create stable batches in such environments:
- Understand Thread Lifecycle: The difference between User Thread and Daemon Thread is fundamental
- Split Transactions Small: Don’t try to solve everything with one @Transactional
- Know Database Limits: Understand each DB’s rollback mechanism and limits
- Make Monitoring a Habit: Thread dumps and transaction monitoring are essential
- Design for Failure: Checkpoints and reprocessing strategies are not optional but mandatory
Don’t dismiss legacy systems. They contain 15 years of accumulated business logic and exception handling. New technology is good, but solid fundamentals are more important.
And remember the law: “Batches crash at 3 AM.” Prepare in advance.