Program Tip

In SQL, is UPDATE always faster than DELETE + INSERT?

programtip 2020. 12. 3. 19:08



Suppose you have a simple table with the following fields:

  1. ID: int, auto-increment (identity), primary key
  2. Name: varchar(50), unique, has a unique index
  3. Tag: int

My application never uses the ID field for lookups; it always works against the Name field.

From time to time I need to change a Tag value. I'm using the following simple SQL:

UPDATE Table SET Tag = XX WHERE Name = YY;

I was wondering whether anyone knows if the above is always faster than:

DELETE FROM Table WHERE Name = YY;
INSERT INTO Table (Name, Tag) VALUES (YY, XX);

Again, I know the ID changes in the second example, but that does not matter for my application.
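The ID side effect the question mentions is easy to see in a toy sketch. This uses Python's sqlite3 as a stand-in for the real DBMS, with a simplified version of the table above; the data values are invented.

```python
import sqlite3

# In-memory sketch of the question's table; SQLite stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT UNIQUE, Tag INT)")
conn.execute("INSERT INTO T (Name, Tag) VALUES ('YY', 1)")

# UPDATE changes Tag in place; the ID stays the same.
conn.execute("UPDATE T SET Tag = 2 WHERE Name = 'YY'")
print(conn.execute("SELECT ID, Tag FROM T WHERE Name = 'YY'").fetchone())  # (1, 2)

# DELETE + INSERT re-creates the row; AUTOINCREMENT hands out a fresh ID.
conn.execute("DELETE FROM T WHERE Name = 'YY'")
conn.execute("INSERT INTO T (Name, Tag) VALUES ('YY', 3)")
print(conn.execute("SELECT ID, Tag FROM T WHERE Name = 'YY'").fetchone())  # (2, 3)
```

Whether the old ID value can ever be reused after DELETE + INSERT depends on the engine's identity/sequence rules, which is exactly why the question flags the ID change.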


I'm very late with this answer, but since I faced a similar question, I ran a test with JMeter and a MySQL server on the same machine, where I used:

  1. A Transaction Controller (generating a parent sample) containing two JDBC Requests: a Delete and an Insert statement.
  2. A separate JDBC Request containing an Update statement.

After running the test for 500 loops, I got the following results:

DEL + INSERT - Average: 62 ms

UPDATE - Average: 30 ms

Results: (screenshot omitted)


The bigger the table (in number and size of columns), the more expensive it becomes to delete and insert rather than update, because you have to pay the price of UNDO and REDO. DELETE consumes more UNDO space than UPDATE, and your REDO contains twice as many statements as necessary.

Besides, it is plain wrong from a business point of view: consider how much harder it would be to understand a notional audit trail on that table.


There are some scenarios involving bulk updates of all the rows in a table where it is faster to create a new table using CTAS from the old table (applying the update in the projection of the SELECT clause), drop the old table, and rename the new one. The side effects are re-creating indexes, managing constraints, and renewing grants, but it is worth considering.
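A minimal sketch of that CTAS-then-rename pattern, using SQLite via Python as a stand-in; the table name and data are invented, and a real system would also re-create indexes, constraints, and grants as the answer notes.

```python
import sqlite3

# Hypothetical bulk update via CTAS: build a new table with the updated values
# in the SELECT projection, drop the old table, and rename the new one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (ID INTEGER PRIMARY KEY, Amount INT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 100), (2, 200)])

# The "update" (doubling every Amount) happens in the projection, not row by row.
conn.execute("CREATE TABLE Orders_new AS SELECT ID, Amount * 2 AS Amount FROM Orders")
conn.execute("DROP TABLE Orders")
conn.execute("ALTER TABLE Orders_new RENAME TO Orders")

print(conn.execute("SELECT Amount FROM Orders ORDER BY ID").fetchall())  # [(200,), (400,)]
```

Note that CREATE TABLE ... AS SELECT does not carry over the primary key or indexes in SQLite either, which illustrates the "side effects" the answer warns about.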


One command on the same row should always be faster than two commands on that same row, so the UPDATE alone would be better.

EDIT - set up the table:

create table YourTable
(YourName  varchar(50)  primary key
,Tag int
)

insert into YourTable values ('first value',1)

This took 1 second on my system (SQL Server 2005):

SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
UPDATE YourTable set YourName='new name'
while @x<10000
begin
    Set @x=@x+1
    update YourTable set YourName='new name' where YourName='new name'
    SET @y=@y+@@ROWCOUNT
end
print @y

This took 2 seconds on my system:

SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
while @x<10000
begin
    Set @x=@x+1
    DELETE YourTable WHERE YourName='new name'
    insert into YourTable values ('new name',1)
    SET @y=@y+@@ROWCOUNT
end
print @y

The body of your question is unrelated to the question in the title.

To answer the title:

In SQL, is UPDATE always faster than DELETE + INSERT?

the answer is NO!

Just Google for

  • "expensive direct update" * "sql server"
  • "deferred update" * "sql server"

In these cases the engine realizes the update internally as a delete + insert pair, which costs more (more processing) than a direct in-place update. These are the cases when:

  • one updates a field that is part of a unique (or primary) key, or
  • the new data does not fit (is bigger than) the row space allocated before the update (or even the maximum row size), resulting in fragmentation,
  • etc.

My quick (and non-exhaustive) search, which does not pretend to be comprehensive, gave me [1] and [2]:

[1]
Update Operations
(Sybase® SQL Server Performance and Tuning Guide
Chapter 7: The SQL Server Query Optimizer)
http://www.lcard.ru/~nail/sybase/perf/11500.htm
[2]
UPDATE Statements May be Replicated as DELETE/INSERT Pairs
http://support.microsoft.com/kb/238254


I just tried updating 43 fields in a table with 44 fields; the remaining field was the primary clustered key.

The update took 8 seconds.

A DELETE + INSERT completed faster than the minimum time interval that "Client Statistics" reports via SQL Server Management Studio.

Peter

MS SQL 2008


Keep in mind that the actual fragmentation that occurs when DELETE + INSERT is issued, as opposed to a correctly implemented UPDATE, will make a great difference over time.

That's why, for instance, the REPLACE INTO that MySQL implements is discouraged in favor of the INSERT INTO ... ON DUPLICATE KEY UPDATE ... syntax.


In your case, I believe the update will be faster.

Remember indexes!

You have defined a primary key, and it will likely automatically become a clustered index (at least SQL Server does so). A clustered index means the records are physically laid out on disk according to the index. The DELETE operation itself won't cause much trouble; even after one record goes away, the index stays correct. But when you INSERT a new record, the DB engine will have to put this record in the correct location, which under some circumstances will cause some "reshuffling" of the old records to "make room" for the new one. That is where it will slow down the operation.

An index (especially a clustered one) works best if the values are ever-increasing, so the new records just get appended to the tail. Maybe you can add an extra INT IDENTITY column to serve as the clustered index; this will simplify insert operations.


What if you have a few million rows? Each row starts with one piece of data, perhaps a client name. As you collect data for clients, their entries must be updated. Now, suppose the client data is collected on numerous other machines, from which it is later gathered and put into the database. If each client has unique information, then you cannot perform a bulk update; i.e., there is no where-clause criterion you can use to update multiple clients in one shot. On the other hand, you could perform bulk inserts. So the question might be better posed as follows: is it better to perform millions of single updates, or to compile them into large bulk deletes and inserts? In other words, instead of running "update [table] set field=data where clientid=123" a million times, you do "delete from [table] where clientid in ([all clients to be updated]); insert into [table] values (data for client1), (data for client2), ...".

Is either choice better than the other, or are you screwed both ways?
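One way to explore that question is a small benchmark sketch, here with Python's sqlite3 and a made-up table and row count; absolute and relative timings will differ substantially on a real client-server DBMS, so treat the numbers as illustrative only.

```python
import sqlite3
import time

# Made-up table: 10,000 clients, each needing a per-row update.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (clientid INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO clients VALUES (?, ?)", [(i, "old") for i in range(10000)])

# Strategy 1: millions-of-single-updates pattern (one UPDATE per client).
t0 = time.perf_counter()
conn.executemany("UPDATE clients SET data = ? WHERE clientid = ?",
                 [("new", i) for i in range(10000)])
t_update = time.perf_counter() - t0

# Strategy 2: one bulk delete plus a bulk insert of the fresh rows.
t0 = time.perf_counter()
conn.execute("DELETE FROM clients")
conn.executemany("INSERT INTO clients VALUES (?, ?)",
                 [(i, "new") for i in range(10000)])
t_delins = time.perf_counter() - t0

print(f"per-row updates: {t_update:.4f}s  bulk delete+insert: {t_delins:.4f}s")
```

On an embedded engine like SQLite the two can land surprisingly close; on a networked DBMS, round trips, logging, and index maintenance dominate, which is why the thread's answers disagree.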


Delete + Insert is almost always faster because an Update has way more steps involved.

Update:

  1. Look for the row using PK.
  2. Read the row from disk.
  3. Check for which values have changed
  4. Raise the onUpdate Trigger with populated :NEW and :OLD variables
  5. Write New variables to disk (The entire row)

    (This repeats for every row you're updating)

Delete + Insert:

  1. Mark rows as deleted (Only in the PK).
  2. Insert new rows at the end of the table.
  3. Update PK Index with locations of new records.

    (This doesn't repeat; all of it can be performed in a single block of operations.)

Using Insert + Delete will fragment your file system, but not that fast. Doing a lazy optimization in the background will always free unused blocks and pack the table altogether.


Obviously, the answer varies based on what database you are using, but UPDATE can always be implemented faster than DELETE+INSERT. Since in-memory ops are mostly trivial anyways, given a hard-drive based database, an UPDATE can change a database field in-place on the hdd, while a delete would remove a row (leaving an empty space), and insert a new row, perhaps to the end of the table (again, it's all in the implementation).

The other, minor, issue is that when you UPDATE a single variable in a single row, the other columns in that row remain the same. If you DELETE and then do an INSERT, you run the risk of forgetting about other columns and consequently leaving them behind (in which case you would have to do a SELECT before your DELETE to temporarily store your other columns before writing them back with INSERT).
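That column-loss risk is easy to demonstrate with a toy sketch (Python's sqlite3; table and column names are invented): a careless DELETE + INSERT that only supplies the changed column silently resets the others.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT UNIQUE, email TEXT, tag INT)")
conn.execute("INSERT INTO users VALUES ('YY', 'yy@example.com', 1)")

# UPDATE touches only tag; email is left alone.
conn.execute("UPDATE users SET tag = 2 WHERE name = 'YY'")
print(conn.execute("SELECT email, tag FROM users").fetchone())  # ('yy@example.com', 2)

# DELETE + INSERT without re-supplying email silently drops it.
conn.execute("DELETE FROM users WHERE name = 'YY'")
conn.execute("INSERT INTO users (name, tag) VALUES ('YY', 3)")
print(conn.execute("SELECT email, tag FROM users").fetchone())  # (None, 3)
```

Avoiding this requires the SELECT-before-DELETE dance the answer describes, which is extra code and an extra round trip that a plain UPDATE never needs.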


The question of speed is irrelevant without a specific speed problem.

If you are writing SQL code to make a change to an existing row, you UPDATE it. Anything else is incorrect.

If you're going to break the rules of how code should work, then you'd better have a damn good, quantified reason for it, and not a vague idea of "This way is faster", when you don't have any idea what "faster" is.


It depends on the product. A product could be implemented that (under the covers) converts all UPDATEs into a (transactionally wrapped) DELETE and INSERT. Provided the results are consistent with the UPDATE semantics.

I'm not saying I'm aware of any product that does this, but it's perfectly legal.


Every write to the database has lots of potential side effects.

  • Delete: a row must be removed, indexes updated, foreign keys checked and possibly cascade-deleted, etc.
  • Insert: a row must be allocated (this might be in place of a deleted row, or might not be); indexes must be updated, foreign keys checked, etc.
  • Update: one or more values must be updated; perhaps the row's data no longer fits into that block of the database, so more space must be allocated, which may cascade into multiple blocks being re-written or lead to fragmented blocks; if the value has foreign-key constraints, they must be checked, etc.

For a very small number of columns or if the whole row is updated Delete+insert might be faster, but the FK constraint problem is a big one. Sure, maybe you have no FK constraints now, but will that always be true? And if you have a trigger it's easier to write code that handles updates if the update operation is truly an update.

Another issue to think about is that sometimes inserting and deleting hold different locks than updating. The DB might lock the entire table while you are inserting or deleting, as opposed to just locking a single record while you are updating that record.

In the end, I'd suggest just updating a record if you mean to update it. Then check your DB's performance statistics and the statistics for that table to see if there are performance improvements to be made. Anything else is premature.

An example from the ecommerce system I work on: We were storing credit-card transaction data in the database in a two-step approach: first, write a partial transaction to indicate that we've started the process. Then, when the authorization data is returned from the bank update the record. We COULD have deleted then re-inserted the record but instead we just used update. Our DBA told us that the table was fragmented because the DB was only allocating a small amount of space for each row, and the update caused block-chaining since it added a lot of data. However, rather than switch to DELETE+INSERT we just tuned the database to always allocate the whole row, this means the update could use the pre-allocated empty space with no problems. No code change required, and the code remains simple and easy to understand.


In specific cases, Delete + Insert saves you time. I have a table with 30,000-odd rows, and there is a daily update/insert of these records from a data file. The upload process generates 95% update statements, since the records already exist, and 5% inserts for the ones that do not. Alternatively, uploading the data file records into a temp table, deleting the target table's rows that match the temp table, and then inserting the same rows from the temp table showed a 50% gain in time.
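The staging-table pattern described above can be sketched as follows (Python's sqlite3 with invented table names and rows; a real data-file load would replace the executemany calls):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(1, "old"), (2, "old"), (3, "keep")])

# Load the daily feed into a staging table (95% existing ids, 5% new ones).
conn.execute("CREATE TEMP TABLE staging (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(1, "new"), (2, "new"), (4, "new")])

# One set-based delete of the matching target rows, then one bulk insert.
conn.execute("DELETE FROM target WHERE id IN (SELECT id FROM staging)")
conn.execute("INSERT INTO target SELECT id, val FROM staging")

print(conn.execute("SELECT id, val FROM target ORDER BY id").fetchall())
# [(1, 'new'), (2, 'new'), (3, 'keep'), (4, 'new')]
```

The win comes from replacing thousands of single-row statements with two set-based ones; the 50% figure is specific to the answerer's workload and engine.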

Reference URL: https://stackoverflow.com/questions/1271641/in-sql-is-update-always-faster-than-deleteinsert
