Bulk Insert (大量 Insert) 的三種技術

羅國榮 2021/12/22 18:37:16

4 15318

前言

大量 Insert 資料至資料庫是很常會用到的實務功能，例如: 匯入客戶名單，增加入庫貨品...等等。

在Java的技術領域裡，可以做到「高效率」大量 Insert 資料的技術不只有一種，分別在JDBC、JdbcTemplate、JPA都有支援。

開發環境

Java 11

MySQL Community 8.0.27.0

Spring Tool Suite 4.11.1

建立 Spring Boot 專案

利用 Spring Tool Suite 來建立 Spring Boot 專案非常的方便快速。

步驟1：新增專案

步驟2：設定專案名稱

步驟3：選擇套件

步驟4：專案建立完成

步驟5：設定 application.properties

server.port=8080

logging.level.com.example=debug

# JPA
spring.jpa.hibernate.ddl-auto=create
spring.jpa.show-sql=false
spring.jpa.properties.hibernate.format_sql=false
#spring.jpa.properties.hibernate.jdbc.batch_size=10
#spring.jpa.properties.hibernate.order_inserts=true

# MySQL
spring.datasource.url=jdbc:mysql://localhost:3306/demo
spring.datasource.username=root
spring.datasource.password=root
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

※ 在使用 JdbcTemplate 機制處理大量 Insert 時，將 spring.jpa.properties.hibernate.jdbc.batch_size 和 spring.jpa.properties.hibernate.order_inserts 的註解打開，有機會改善 Insert 效能。(效能的改善會受限於 JDBC 驅動程式和資料庫)

步驟6：建立 Entity Bean

package com.example.model;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

import lombok.Data;

@Data
@Entity
public class Customer {

	@Id
	@Column
	@GeneratedValue(strategy = GenerationType.IDENTITY)
	private Long id;

	private String firstName;

	private String lastName;

	public Customer(String firstName, String lastName) {
		this.firstName = firstName;
		this.lastName = lastName;
	}

}

※ 由於有引入 Lombok 套件，於是可以使用 @Data 注釋自動建立 getter & setter 函式，可以簡化不少建立 getter & setter 函式的操作。

※ 畫面上雖然看不到 getter & setter 函式，其卻是真實存在著，可從 Outline 視窗中看得到，如下圖

步驟7：建立 Repository Bean

package com.example.Repository;

import org.springframework.data.jpa.repository.JpaRepository;

import com.example.model.Customer;

public interface CustomerRepository extends JpaRepository<Customer, Long> {
}

步驟8：建立 Controller Bean

package com.example.controller;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import javax.sql.DataSource;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import com.example.Repository.CustomerRepository;
import com.example.model.Customer;

import lombok.extern.slf4j.Slf4j;

@Slf4j
@RestController
public class CustomerController {

	@Autowired
	CustomerRepository customerRepository;

	@Autowired
	JdbcTemplate jdbcTemplate;

	@Autowired
	DataSource dataSource;

	static final int BATCH_SIZE = 1000;

	/**
	 * 準備要 Insert 的 Customer List
	 * 
	 * @return
	 */
	private List<Customer> prepareCustomerList() {
		List<Customer> customers = new ArrayList<>();
		for (int i = 0; i < BATCH_SIZE; i++) {
			customers.add(new Customer("firstName", "lastName"));
		}
		return customers;
	}

//...... 後續程式碼，在文章中列出並說明

}

※ 由於有引入 Lombok 套件，於是可以使用 @Slf4j 注釋讓其自動建立 log 物件，便於使用 log 物件輸出訊息。例: log.debug("......");

範例說明

一、使用 JPA 大量新增資料

	/**
	 * 利用 JPA 機制的 saveAll() 新增大批資料
	 * 
	 * @return
	 */
	@GetMapping("/jpaCase1")
	public ResponseEntity<String> jpaCase1() {
		List<Customer> customers = prepareCustomerList();

		long start = System.currentTimeMillis();

		customerRepository.saveAll((Iterable<Customer>) customers);

		long end = System.currentTimeMillis();

		String result = String.format("Total time: %d 毫秒", (end - start));

		log.debug(result);

		return ResponseEntity.ok(result);
	}

執行結果：

執行時間：540 毫秒

重點說明：

1) 使用 JPA 機制處理大量 Insert ，需使用 saveAll() ，才能發生批次新增的效果。

2) 可以用很簡單的程式語法就能夠做到批次新增。

3) 首次執行花費時間 540 毫秒(次快)。

二、使用 JdbcTemplate 大量新增資料

	/**
	 * 利用 JDBCTemplate 新增大批資料
	 * 
	 * @return
	 */
	@GetMapping("/jdbcTemplateCase1")
	public ResponseEntity<String> jdbcTemplateCase1() {
		String sql = "INSERT INTO customer (first_name, last_name) VALUES (?,?)";

		List<Customer> customers = prepareCustomerList();

		long start = System.currentTimeMillis();

		jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
			@Override
			public void setValues(PreparedStatement ps, int i) throws SQLException {
				ps.setString(1, customers.get(i).getFirstName());
				ps.setString(2, customers.get(i).getLastName());
			}

			@Override
			public int getBatchSize() {
				return BATCH_SIZE;
			}
		});

		long end = System.currentTimeMillis();

		String result = String.format("Total time: %d 毫秒", (end - start));

		log.debug(result);

		return ResponseEntity.ok(result);
	}

執行結果：

執行時間: 3026 毫秒

重點說明：

1) 使用 JdbcTemplat 機制處理大量 Insert ，需使用 batchUpdate() 搭配 BatchPreparedStatementSetter 物件，才能發生批次新增的效果。

2) 程式語法的寫法比 JPA 機制略為複雜，不過仍是很容易上手。

3) 首次執行花費時間 3026 毫秒(最慢)。

三、使用 JDBC 大量新增資料

	/**
	 * 利用 JDBC 新增大批資料
	 * 
	 * @return
	 * @throws SQLException
	 */
	@GetMapping("/jdbcCase")
	public ResponseEntity<String> jdbcCase() {
		Connection connection = null;
		PreparedStatement pstmt = null;
		long end = 0L;
		String result = "";

		String sql = "INSERT INTO customer (first_name, last_name) VALUES (?,?)";

		List<Customer> customers = prepareCustomerList();

		long start = System.currentTimeMillis();

		try {
			connection = dataSource.getConnection();

			connection.setAutoCommit(false);

			pstmt = connection.prepareStatement(sql);
			
			for (Customer customer : customers) {
				pstmt.setString(1, customer.getFirstName());
				pstmt.setString(2, customer.getLastName());
				pstmt.addBatch();
			}

			pstmt.executeBatch();

			end = System.currentTimeMillis();

			result = String.format("Total time: %d 毫秒", (end - start));

			connection.commit();
		} catch (SQLException e) {
			try {
				connection.rollback();
			} catch (SQLException sqlException) {
				sqlException.printStackTrace();
			}
		} finally {
			if (pstmt != null) {
				try {
					pstmt.close();
				} catch (SQLException e) {
					log.debug("preparedStatement.close() fail.", e);
				}
			}

			if (connection != null) {
				try {
					connection.setAutoCommit(true);
					connection.close();
				} catch (SQLException e) {
					log.debug("connection.close() fail.", e);
				}
			}
		}

		log.debug(result);

		return ResponseEntity.ok(result);
    }