응용 프로그램 개발자의 데이터베이스 개발 실수

Program Tip

응용 프로그램 개발자의 데이터베이스 개발 실수

programtip 2020. 10. 3. 11:32

응용 프로그램 개발자의 데이터베이스 개발 실수

응용 프로그램 개발자가 저지르는 일반적인 데이터베이스 개발 실수는 무엇입니까?

1. 적절한 지표를 사용하지 않음

이것은 비교적 쉬운 일이지만 여전히 항상 발생합니다. 외래 키에는 인덱스가 있어야합니다. 에서 필드를 사용하는 경우 WHERE(아마도) 인덱스가 있어야합니다. 이러한 인덱스는 종종 실행해야하는 쿼리를 기반으로 여러 열을 포함해야합니다.

2. 참조 무결성을 적용하지 않음

데이터베이스는 여기에서 다를 수 있지만 데이터베이스가 참조 무결성을 지원하는 경우 (즉, 모든 외래 키가 존재하는 엔터티를 가리 키도록 보장됨)이를 사용해야합니다.

MySQL 데이터베이스에서이 오류를 보는 것은 매우 일반적입니다. MyISAM이 지원한다고 생각하지 않습니다. InnoDB는 그렇습니다. MyISAM을 사용하거나 InnoDB를 사용하지만 어쨌든 사용하지 않는 사람들을 찾을 수 있습니다.

여기 더 :

3. 대리 (기술) 기본 키가 아닌 자연 키 사용

자연 키는 (표면적으로) 고유 한 외부 적으로 의미있는 데이터를 기반으로하는 키입니다. 일반적인 예로는 제품 코드, 두 글자로 된 주 코드 (US), 주민등록번호 등이 있습니다. 대리 또는 기술 기본 키는 시스템 외부에서 전혀 의미가없는 키입니다. 이들은 순전히 엔터티를 식별하기 위해 발명되었으며 일반적으로 자동 증가 필드 (SQL Server, MySQL 등) 또는 시퀀스 (특히 Oracle)입니다.

제 생각에는 항상 서로 게이트 키를 사용해야합니다. 이 문제는 다음 질문에서 제기되었습니다.

이것은 당신이 보편적 합의를 얻지 못할 다소 논란의 여지가있는 주제입니다. 자연 키가 어떤 상황에서는 괜찮다고 생각하는 사람들을 찾을 수 있지만, 틀림없이 불필요한 것 외에는 대리 키에 대한 비판을 찾지 못할 것입니다. 당신이 나에게 묻는다면 그것은 아주 작은 단점입니다.

국가 조차도 존재하지 않을 수 있습니다 (예 : 유고 슬라비아).

4. DISTINCT작동 하는 데 필요한 쿼리 작성

ORM 생성 쿼리에서 자주 볼 수 있습니다. Hibernate의 로그 출력을 보면 모든 쿼리가 다음으로 시작하는 것을 볼 수 있습니다.

SELECT DISTINCT ...

이것은 중복 행을 반환하여 중복 개체를 가져 오지 않도록하는 간단한 방법입니다. 때때로 사람들도이 일을하는 것을 볼 수 있습니다. 너무 많이 보면 진짜 위험 신호입니다. 그것은 DISTINCT나쁘거나 유효한 응용 프로그램이 없습니다. 두 가지 모두에 해당하지만 올바른 쿼리를 작성하기위한 대리 또는 임시 방편은 아닙니다.

내가 DISTINCT를 싫어하는 이유 에서 :

제 생각에 상황이 악화되기 시작하는 곳은 개발자가 상당한 쿼리를 작성하고 테이블을 함께 결합 할 때 갑자기 그가 중복 (또는 더 많은) 행과 즉각적인 응답을 얻는 것처럼 보인다 는 것을 깨닫는 것입니다 . 이 "문제"에 대한 그의 "솔루션"은 DISTINCT 키워드를 사용하고 POOF 그의 모든 문제를 해결하는 것입니다.

5. 조인보다 집계 선호

데이터베이스 애플리케이션 개발자가 흔히 저지르는 또 다른 실수는 GROUP BY조인에 비해 집계 (예 : 절)가 얼마나 더 비싼 지 깨닫지 못하는 것 입니다.

이것이 얼마나 널리 퍼져 있는지 알려 드리기 위해 저는 여기에이 주제에 대해 여러 번 글을 썼고 그에 대해 많이 비추천했습니다. 예를 들면 :

From SQL 문- "join"대 "group by and having" :

첫 번째 쿼리 :
SELECT userid
FROM userrole
WHERE roleid IN (1, 2, 3)
GROUP by userid
HAVING COUNT(1) = 3
쿼리 시간 : 0.312 초

두 번째 쿼리 :
SELECT t1.userid
FROM userrole t1
JOIN userrole t2 ON t1.userid = t2.userid AND t2.roleid = 2
JOIN userrole t3 ON t2.userid = t3.userid AND t3.roleid = 3
AND t1.roleid = 1
쿼리 시간 : 0.016 초

맞습니다. 내가 제안한 조인 버전 은 집계 버전보다 20 배 빠릅니다.

6.보기를 통해 복잡한 쿼리를 단순화하지 않음

모든 데이터베이스 벤더가 뷰를 지원하는 것은 아니지만, 뷰를 지원하는 경우 신중하게 사용하면 쿼리를 크게 단순화 할 수 있습니다. 예를 들어, 한 프로젝트 에서 CRM에 대한 일반 당사자 모델 을 사용했습니다 . 이것은 매우 강력하고 유연한 모델링 기술이지만 많은 조인으로 이어질 수 있습니다. 이 모델에는 다음이 있습니다.

파티 : 사람과 조직;
당사자 역할 : 당사자가 수행 한 작업 (예 : 직원 및 고용주)
당사자 역할 관계 : 해당 역할이 서로 관련되는 방식.

예:

Ted는 Party의 하위 유형 인 Person입니다.
Ted는 많은 역할을 가지고 있으며 그중 하나는 직원입니다.
인텔은 조직이며 당사자의 하위 유형입니다.
인텔은 많은 역할을하고 있으며 그중 하나는 고용주입니다.
Intel은 Ted를 사용합니다. 즉, 각각의 역할간에 관계가 있습니다.

따라서 Ted를 그의 고용주와 연결하기 위해 결합 된 5 개의 테이블이 있습니다. 모든 직원이 개인 (조직이 아님)이라고 가정하고 다음 도우미보기를 제공합니다.

CREATE VIEW vw_employee AS
SELECT p.title, p.given_names, p.surname, p.date_of_birth, p2.party_name employer_name
FROM person p
JOIN party py ON py.id = p.id
JOIN party_role child ON p.id = child.party_id
JOIN party_role_relationship prr ON child.id = prr.child_id AND prr.type = 'EMPLOYMENT'
JOIN party_role parent ON parent.id = prr.parent_id = parent.id
JOIN party p2 ON parent.party_id = p2.id

그리고 갑자기 원하는 데이터를 매우 간단하게 볼 수 있지만 매우 유연한 데이터 모델이 있습니다.

7. 입력을 살균하지 않음

이것은 거대한 것입니다. 이제 저는 PHP를 좋아하지만 무엇을하는지 모르면 공격에 취약한 사이트를 만드는 것이 정말 쉽습니다. Bobby Tables 의 이야기 보다 더 잘 요약하는 것은 없습니다 .

URL, 양식 데이터 및 쿠키 를 통해 사용자가 제공 한 데이터 는 항상 적대적인 것으로 취급되고 삭제되어야합니다. 기대하는 것을 얻고 있는지 확인하십시오.

8. 준비된 진술을 사용하지 않음

준비된 문은 삽입, 업데이트 및 WHERE절에 사용 된 데이터를 뺀 쿼리를 컴파일 한 다음 나중에 제공하는 경우입니다. 예를 들면 :

SELECT * FROM users WHERE username = 'bob'

SELECT * FROM users WHERE username = ?

또는

SELECT * FROM users WHERE username = :username

플랫폼에 따라.

이 작업을 통해 데이터베이스가 무릎을 꿇는 것을 보았습니다. 기본적으로 모든 최신 데이터베이스가 새 쿼리를 만날 때마다 컴파일해야합니다. 이전에 본 쿼리가 발견되면 데이터베이스에 컴파일 된 쿼리와 실행 계획을 캐시 할 수있는 기회를 제공합니다. 쿼리를 많이 수행하면 데이터베이스가이를 파악하고 그에 따라 최적화 할 수있는 기회를 갖게됩니다 (예 : 컴파일 된 쿼리를 메모리에 고정).

준비된 문을 사용하면 특정 쿼리가 사용되는 빈도에 대한 의미있는 통계도 제공됩니다.

준비된 문은 또한 SQL 주입 공격으로부터 사용자를 더 잘 보호합니다.

9. 충분히 정규화하지 않음

데이터베이스 정규화 는 기본적으로 데이터베이스 디자인을 최적화하는 프로세스 또는 데이터를 테이블로 구성하는 방법입니다.

이번 주에 누군가가 배열을 파열하여 데이터베이스의 단일 필드에 삽입 한 코드를 발견했습니다. 정규화는 해당 배열의 요소를 자식 테이블의 별도 행으로 처리하는 것입니다 (예 : 일대 다 관계).

이것은 사용자 ID 목록을 저장하는 최상의 방법 에서도 나왔습니다 .

다른 시스템에서 목록이 직렬화 된 PHP 배열에 저장되는 것을 보았습니다.

그러나 정규화의 부족은 여러 형태로 나타납니다.

더:

10. 너무 많은 정규화

이것은 이전 요점과 모순되는 것처럼 보일 수 있지만 많은 것들과 마찬가지로 정규화는 도구입니다. 그것은 목적을위한 수단이지 그 자체가 목적이 아닙니다. 많은 개발자들이 이것을 잊어 버리고 "수단"을 "끝"으로 취급하기 시작한다고 생각합니다. 단위 테스트가 이에 대한 대표적인 예입니다.

나는 한때 클라이언트를 위해 거대한 계층 구조를 가진 시스템에서 일한 적이 있습니다.

Licensee ->  Dealer Group -> Company -> Practice -> ...

의미있는 데이터를 얻으려면 약 11 개의 테이블을 함께 조인해야했습니다. 너무 멀리 떨어진 정규화의 좋은 예였습니다.

요컨대, 신중하고 고려 된 비정규 화는 성능에 큰 이점을 줄 수 있지만이 작업을 수행 할 때는 정말 조심해야합니다.

더:

11. 독점 호 사용

배타적 아크는 둘 이상의 외래 키를 사용하여 테이블을 생성하는 일반적인 실수이며 그중 하나만 널이 아닐 수 있습니다. 큰 실수. 한 가지는 데이터 무결성을 유지하는 것이 훨씬 더 어려워집니다. 결국, 참조 무결성이 있더라도 둘 이상의 외래 키가 설정되는 것을 막는 것은 없습니다 (복잡한 검사 제약에도 불구하고).

에서 관계형 데이터베이스 디자인에 대한 실용 가이드 :

코드를 작성하는 것이 어색하고 유지 관리 문제가 더 많이 발생할 수 있으므로 가능한 한 독점적 인 아크 생성에 대해 강력히 권고했습니다.

12. 쿼리에 대한 성능 분석을 전혀 수행하지 않음

실용주의는 특히 데이터베이스 세계에서 최고를 지배합니다. 원칙이 교리가되었다는 원칙을 고수한다면 아마도 실수를 한 것입니다. 위에서 집계 쿼리의 예를 살펴 보겠습니다. 집계 버전은 "좋아"보일 수 있지만 성능이 좋지 않습니다. 성능 비교는 논쟁을 끝냈어야했지만 (그러나 그렇지 않았 음) 요점은 더 중요합니다. 애초에 그러한 정보가 부족한 견해를 내세우는 것은 무지하고 위험 할 수도 있습니다.

13. UNION ALL, 특히 UNION 구조에 대한 과도한 의존

SQL 용어의 UNION은 일치하는 데이터 세트를 연결하기 만합니다. 즉, 동일한 유형과 열 수를가집니다. 그들 사이의 차이점은 UNION ALL은 단순한 연결이며 가능한 한 선호되어야하는 반면 UNION은 암시 적으로 DISTINCT를 수행하여 중복 튜플을 제거한다는 것입니다.

DISTINCT와 같은 UNION이 그 자리를 차지합니다. 유효한 신청서가 있습니다. 그러나 특히 하위 쿼리에서 많은 작업을 수행하는 경우 아마도 뭔가 잘못하고있는 것입니다. 쿼리 구성이 잘못되었거나 데이터 모델이 잘못 설계되어 이러한 작업을 수행해야하는 경우 일 수 있습니다.

특히 조인 또는 종속 하위 쿼리에 사용될 때 UNION은 데이터베이스를 손상시킬 수 있습니다. 가능한 한 피하십시오.

14. 쿼리에서 OR 조건 사용

이것은 무해한 것처럼 보일 수 있습니다. 결국 AND는 괜찮습니다. 아니면 괜찮을까요? 잘못된. 기본적으로 AND 조건 은 데이터 세트를 제한 하는 반면 OR 조건 은 데이터 세트를 증가 시키지만 최적화에 적합하지는 않습니다. 특히 다른 OR 조건이 교차하여 최적화 프로그램이 결과에 대한 DISTINCT 연산을 효과적으로 수행하도록 할 수 있습니다.

나쁜:

... WHERE a = 2 OR a = 5 OR a = 11

보다 나은:

... WHERE a IN (2, 5, 11)

이제 SQL 최적화 프로그램이 첫 번째 쿼리를 두 번째 쿼리로 효과적으로 전환 할 수 있습니다. 그러나 그렇지 않을 수도 있습니다. 그냥 하지마.

15. 고성능 솔루션에 적합하도록 데이터 모델을 설계하지 않음

이것은 정량화하기 어려운 점입니다. 일반적으로 그 효과로 관찰됩니다. 상대적으로 간단한 작업에 대한 질의를 작성하거나 상대적으로 간단한 정보를 찾기위한 쿼리가 효율적이지 않다면 데이터 모델이 좋지 않을 수 있습니다.

어떤면에서이 요점은 이전의 모든 사항을 요약하지만 쿼리 최적화와 같은 작업을 두 번째로 수행해야 할 때 먼저 수행하는 경우가 많다는 경고 이야기입니다. 무엇보다도 성능을 최적화하기 전에 좋은 데이터 모델이 있는지 확인해야합니다. Knuth가 말했듯이 :

조기 최적화는 모든 악의 근원입니다

16. 데이터베이스 트랜잭션의 잘못된 사용

특정 프로세스에 대한 모든 데이터 변경은 원자 적이어야합니다. 즉, 작업이 성공하면 완전히 수행됩니다. 실패하면 데이터는 변경되지 않습니다. - '반 완료'변경 가능성이 없어야합니다.

이상적으로이를 달성하는 가장 간단한 방법은 전체 시스템 설계가 단일 INSERT / UPDATE / DELETE 문을 통해 모든 데이터 변경을 지원하도록 노력하는 것입니다. 이 경우 데이터베이스 엔진이 자동으로 처리해야하므로 특별한 트랜잭션 처리가 필요하지 않습니다.

그러나 어떤 프로세스에서 데이터를 일관된 상태로 유지하기 위해 여러 명령문을 하나의 단위로 수행해야하는 경우 적절한 트랜잭션 제어가 필요합니다.

첫 번째 명령문 전에 트랜잭션을 시작하십시오.
마지막 문 이후 트랜잭션을 커밋합니다.
오류가 발생하면 트랜잭션을 롤백하십시오. 그리고 매우 NB! 오류 다음에 오는 모든 문을 건너 뛰거나 중단하는 것을 잊지 마십시오.

또한 이와 관련하여 데이터베이스 연결 계층과 데이터베이스 엔진이 상호 작용하는 방법의 세부 사항에주의를 기울이는 것이 좋습니다.

17. '집합 기반'패러다임을 이해하지 못함

SQL 언어는 특정 종류의 문제에 적합한 특정 패러다임을 따릅니다. 다양한 벤더별 확장에도 불구하고이 언어는 Java, C #, Delphi 등과 같은 언어에서 사소한 문제를 처리하는 데 어려움을 겪습니다.

이러한 이해 부족은 몇 가지 방식으로 나타납니다.

databse에 너무 많은 절차 적 또는 명령 적 논리를 부적절하게 부과합니다.
부적절하거나 과도한 커서 사용. 특히 단일 쿼리로 충분할 때.
트리거가 다중 행 업데이트에 영향을받는 행당 한 번 실행된다고 잘못 가정합니다.

책임의 명확한 구분을 결정하고 적절한 도구를 사용하여 각 문제를 해결하도록 노력하십시오.

개발자가 저지른 주요 데이터베이스 설계 및 프로그래밍 실수

이기적인 데이터베이스 설계 및 사용. 개발자는 데이터에서 다른 이해 관계자의 요구를 고려하지 않고 데이터베이스를 개인 영구 개체 저장소로 취급하는 경우가 많습니다. 이는 애플리케이션 설계자에게도 적용됩니다. 열악한 데이터베이스 설계 및 데이터 무결성은 제 3자가 데이터 작업을 어렵게 만들고 시스템의 수명주기 비용을 크게 증가시킬 수 있습니다. 보고 및 MIS는 응용 프로그램 디자인에서 열악한 사촌이되는 경향이 있으며 사후 고려 사항으로 만 수행됩니다.
비정규 화 된 데이터를 남용합니다. 비정규 화 된 데이터를 과도하게 사용하고 애플리케이션 내에서이를 유지하려는 시도는 데이터 무결성 문제의 원인이됩니다. 비정규 화는 드물게 사용하십시오. 쿼리에 조인을 추가하고 싶지 않다고해서 비정규 화에 대한 변명이 아닙니다.
SQL 작성을 두려워합니다. SQL은 로켓 과학이 아니며 실제로 그 일을 잘 수행합니다. O / R 매핑 계층은 단순하고 해당 모델에 잘 맞는 95 %의 쿼리를 수행하는 데 매우 능숙합니다. 때로는 SQL이 작업을 수행하는 가장 좋은 방법입니다.
독단적 인 '저장 프로 시저 없음'정책. 저장 프로 시저가 나쁘다고 생각하는지 여부에 관계없이 이러한 종류의 독단적 인 태도는 소프트웨어 프로젝트에 적합하지 않습니다.
데이터베이스 디자인을 이해하지 못합니다. 정규화는 당신의 친구이며 로켓 과학 이 아닙니다. 조인과 카디널리티는 매우 간단한 개념입니다. 데이터베이스 응용 프로그램 개발에 참여한다면이를 이해하지 못할 이유가 없습니다.

데이터베이스 스키마에서 버전 제어를 사용하지 않음
라이브 데이터베이스에 대해 직접 작업
고급 데이터베이스 개념 (인덱스, 클러스터형 인덱스, 제약 조건, 구체화 된 뷰 등)을 읽고 이해하지 않음
확장 성 테스트 실패 ... 3 또는 4 행의 테스트 데이터는 실제 라이브 성능에 대한 실제 그림을 제공하지 않습니다.

저장 프로 시저에 대한 과도한 사용 및 / 또는 의존성.

일부 응용 프로그램 개발자는 저장 프로 시저를 중간 계층 / 프런트 엔드 코드의 직접적인 확장으로 간주합니다. 이것은 마이크로 소프트 스택 개발자들에게 공통적 인 특징 인 것처럼 보이며 (저는 하나이지만 그로부터 성장했습니다) 복잡한 비즈니스 로직 및 워크 플로 처리를 수행하는 많은 저장 프로 시저를 생성합니다. 이것은 다른 곳에서 훨씬 낫습니다.

저장 프로시 저는 일부 실제 기술 요소가 사용 (예 : 성능 및 보안)을 필요로한다는 사실이 실제로 입증 된 경우 유용합니다. 예를 들어, 대규모 데이터 세트의 집계 / 필터링을 "데이터에 가깝게"유지합니다.

저는 최근에 비즈니스 로직과 규칙의 70 %가 1400 SQL Server 저장 프로 시저 (나머지 UI 이벤트 처리기)에 구현 된 대규모 Delphi 데스크톱 응용 프로그램을 유지 관리하고 향상시키는 데 도움을 주어야했습니다. 이는 주로 TSQL에 효과적인 단위 테스트를 도입하기 어렵고 캡슐화가 부족하며 도구 (디버거, 편집자)가 부족하기 때문에 악몽이었습니다.

과거에 Java 팀과 함께 일하면서 저는 종종 그 환경에서 완전히 반대되는 경우가 많다는 것을 금방 알게되었습니다. Java Architect는 "데이터베이스는 코드가 아닌 데이터를위한 것입니다."라고 말했습니다.

요즘에는 저장된 procs를 전혀 고려하지 않는 것이 실수라고 생각하지만 유용한 이점을 제공하는 상황에서는 (기본값이 아닌) 아껴서 사용해야합니다 (다른 답변 참조).

첫 번째 문제? 장난감 데이터베이스에서만 테스트합니다. 그래서 그들은 데이터베이스가 커질 때 SQL이 크롤링 될 것이라는 사실을 모르고 누군가가 와서 그것을 나중에 고쳐야합니다 (당신이들을 수있는 소리는 내 이빨이 갈리는 소리입니다).

인덱스를 사용하지 않습니다.

상관 된 하위 쿼리로 인한 성능 저하

대부분의 경우 상관 된 하위 쿼리를 피하려고합니다. 하위 쿼리 내에 외부 쿼리의 열에 대한 참조가있는 경우 하위 쿼리가 상관됩니다. 이 경우 하위 쿼리는 반환 된 모든 행에 대해 적어도 한 번 실행되며 상관 하위 쿼리를 포함하는 조건이 적용된 후 다른 조건이 적용되면 더 많이 실행될 수 있습니다.

인위적인 예와 Oracle 구문을 용서하십시오. 그러나 마지막으로 매장이 하루에 $ 10,000 미만의 매출을 올린 이후 어떤 매장에서든 고용 된 모든 직원을 찾고 싶다고 가정 해 보겠습니다.

select e.first_name, e.last_name
from employee e
where e.start_date > 
        (select max(ds.transaction_date)
         from daily_sales ds
         where ds.store_id = e.store_id and
               ds.total < 10000)

이 예의 하위 쿼리는 store_id에 의해 외부 쿼리와 상호 연관되며 시스템의 모든 직원에 대해 실행됩니다. 이 쿼리를 최적화 할 수있는 한 가지 방법은 하위 쿼리를 인라인보기로 이동하는 것입니다.

select e.first_name, e.last_name
from employee e,
     (select ds.store_id,
             max(s.transaction_date) transaction_date
      from daily_sales ds
      where ds.total < 10000
      group by s.store_id) dsx
where e.store_id = dsx.store_id and
      e.start_date > dsx.transaction_date

In this example, the query in the from clause is now an inline-view (again some Oracle specific syntax) and is only executed once. Depending on your data model, this query will probably execute much faster. It would perform better than the first query as the number of employees grew. The first query could actually perform better if there were few employees and many stores (and perhaps many of stores had no employees) and the daily_sales table was indexed on store_id. This is not a likely scenario but shows how a correlated query could possibly perform better than an alternative.

I've seen junior developers correlate subqueries many times and it usually has had a severe impact on performance. However, when removing a correlated subquery be sure to look at the explain plan before and after to make sure you are not making the performance worse.

In my experience:
Not communicating with experienced DBAs.

Using Access instead of a "real" database. There are plenty of great small and even free databases like SQL Express, MySQL, and SQLite that will work and scale much better. Apps often need to scale in unexpected ways.

Forgetting to set up relationships between the tables. I remember having to clean this up when I first started working at my current employer.

Using Excel for storing (huge amounts of) data.

I have seen companies holding thousands of rows and using multiple worksheets (due to the row limit of 65535 on previous versions of Excel).

Excel is well suited for reports, data presentation and other tasks, but should not be treated as a database.

I'd like to add: Favoring "Elegant" code over highly performing code. The code that works best against databases is often ugly to the application developer's eye.

Believing that nonsense about premature optimization. Databases must consider performance in the original design and in any subsequent development. Performance is 50% of database design (40% is data integrity and the last 10% is security) in my opinion. Databases which are not built from the bottom up to perform will perform badly once real users and real traffic are placed against the database. Premature optimization doesn't mean no optimization! It doesn't mean you should write code that will almost always perform badly because you find it easier (cursors for example which should never be allowed in a production database unless all else has failed). It means you don't need to look at squeezing out that last little bit of performance until you need to. A lot is known about what will perform better on databases, to ignore this in design and development is short-sighted at best.

Not using parameterized queries. They're pretty handy in stopping SQL Injection.

This is a specific example of not sanitizing input data, mentioned in another answer.

I hate it when developers use nested select statements or even functions the return the result of a select statement inside the "SELECT" portion of a query.

I'm actually surprised I don't see this anywhere else here, perhaps I overlooked it, although @adam has a similar issue indicated.

Example:

SELECT
    (SELECT TOP 1 SomeValue FROM SomeTable WHERE SomeDate = c.Date ORDER BY SomeValue desc) As FirstVal
    ,(SELECT OtherValue FROM SomeOtherTable WHERE SomeOtherCriteria = c.Criteria) As SecondVal
FROM
    MyTable c

In this scenario, if MyTable returns 10000 rows the result is as if the query just ran 20001 queries, since it had to run the initial query plus query each of the other tables once for each line of result.

Developers can get away with this working in a development environment where they are only returning a few rows of data and the sub tables usually only have a small amount of data, but in a production environment, this kind of query can become exponentially costly as more data is added to the tables.

A better (not necessarily perfect) example would be something like:

SELECT
     s.SomeValue As FirstVal
    ,o.OtherValue As SecondVal
FROM
    MyTable c
    LEFT JOIN (
        SELECT SomeDate, MAX(SomeValue) as SomeValue
        FROM SomeTable 
        GROUP BY SomeDate
     ) s ON c.Date = s.SomeDate
    LEFT JOIN SomeOtherTable o ON c.Criteria = o.SomeOtherCriteria

This allows database optimizers to shuffle the data together, rather than requery on each record from the main table and I usually find when I have to fix code where this problem has been created, I usually end up increasing the speed of queries by 100% or more while simultaneously reducing CPU and memory usage.

For SQL-based databases:

Not taking advantage of CLUSTERED INDEXES or choosing the wrong column(s) to CLUSTER.
Not using a SERIAL (autonumber) datatype as a PRIMARY KEY to join to a FOREIGN KEY (INT) in a parent/child table relationship.
Not UPDATING STATISTICS on a table when many records have been INSERTED or DELETED.
Not reorganizing (i.e. unloading, droping, re-creating, loading and re-indexing) tables when many rows have been inserted or deleted (some engines physically keep deleted rows in a table with a delete flag.)
Not taking advantage of FRAGMENT ON EXPRESSION (if supported) on large tables which have high transaction rates.
Choosing the wrong datatype for a column!
Not choosing a proper column name.
Not adding new columns at the end of the table.
Not creating proper indexes to support frequently used queries.
creating indexes on columns with few possible values and creating unnecessary indexes.
...more to be added.

Not taking a backup before fixing some issue inside production database.
Using DDL commands on stored objects(like tables, views) in stored procedures.
Fear of using stored proc or fear of using ORM queries wherever the one is more efficient/appropriate to use.
Ignoring the use of a database profiler, which can tell you exactly what your ORM query is being converted into finally and hence verify the logic or even for debugging when not using ORM.

Not doing the correct level of normalization. You want to make sure that data is not duplicated, and that you are splitting data into different as needed. You also need to make sure you are not following normalization too far as that will hurt performance.

Treating the database as just a storage mechanism (i.e. glorified collections library) and hence subordinate to their application (ignoring other applications which share the data)

Dismissing an ORM like Hibernate out of hand, for reasons like "it's too magical" or "not on my database".
Relying too heavily on an ORM like Hibernate and trying to shoehorn it in where it isn't appropriate.

1 - Unnecessarily using a function on a value in a where clause with the result of that index not being used.

Example:

where to_char(someDate,'YYYYMMDD') between :fromDate and :toDate

instead of

where someDate >= to_date(:fromDate,'YYYYMMDD') and someDate < to_date(:toDate,'YYYYMMDD')+1

And to a lesser extent: Not adding functional indexes to those values that need them...

2 - Not adding check constraints to ensure the validity of the data. Constraints can be used by the query optimizer, and they REALLY help to ensure that you can trust your invariants. There's just no reason not to use them.

3 - Adding unnormalized columns to tables out of pure laziness or time pressure. Things are usually not designed this way, but evolve into this. The end result, without fail, is a ton of work trying to clean up the mess when you're bitten by the lost data integrity in future evolutions.

Think of this, a table without data is very cheap to redesign. A table with a couple of millions records with no integrity... not so cheap to redesign. Thus, doing the correct design when creating the column or table is amortized in spades.

4 - not so much about the database per se but indeed annoying. Not caring about the code quality of SQL. The fact that your SQL is expressed in text does not make it OK to hide the logic in heaps of string manipulation algorithms. It is perfectly possible to write SQL in text in a manner that is actually readable by your fellow programmer.

This has been said before, but: indexes, indexes, indexes. I've seen so many cases of poorly performing enterprise web apps that were fixed by simply doing a little profiling (to see which tables were being hit a lot), and then adding an index on those tables. This doesn't even require much in the way of SQL writing knowledge, and the payoff is huge.

Avoid data duplication like the plague. Some people advocate that a little duplication won't hurt, and will improve performance. Hey, I'm not saying that you have to torture your schema into Third Normal Form, until it's so abstract that not even the DBA's know what's going on. Just understand that whenever you duplicate a set of names, or zipcodes, or shipping codes, the copies WILL fall out of synch with each other eventually. It WILL happen. And then you'll be kicking yourself as you run the weekly maintenance script.

And lastly: use a clear, consistent, intuitive naming convention. In the same way that a well written piece of code should be readable, a good SQL schema or query should be readable and practically tell you what it's doing, even without comments. You'll thank yourself in six months, when you have to to maintenance on the tables. "SELECT account_number, billing_date FROM national_accounts" is infinitely easier to work with than "SELECT ACCNTNBR, BILLDAT FROM NTNLACCTS".

Not executing a corresponding SELECT query before running the DELETE query (particularly on production databases)!

The most common mistake I've seen in twenty years: not planning ahead. Many developers will create a database, and tables, and then continually modify and expand the tables as they build out the applications. The end result is often a mess and inefficient and difficult to clean up or simplify later on.

a) Hardcoding query values in string
b) Putting the database query code in the "OnButtonPress" action in a Windows Forms application

I have seen both.

Not paying enough attention towards managing database connections in your application. Then you find out the application, the computer, the server, and the network is clogged.

Thinking that they are DBAs and data modelers/designers when they have no formal indoctrination of any kind in those areas.
Thinking that their project doesn't require a DBA because that stuff is all easy/trivial.
Failure to properly discern between work that should be done in the database, and work that should be done in the app.
Not validating backups, or not backing up.
Embedding raw SQL in their code.

Here is a link to video called ‘Classic Database Development Mistakes and five ways to overcome them’ by Scott Walz

Not having an understanding of the databases concurrency model and how this affects development. It's easy to add indexes and tweak queries after the fact. However applications designed without proper consideration for hotspots, resource contention and correct operation (Assuming what you just read is still valid!) can require significant changes within the database and application tier to correct later.

Not understanding how a DBMS works under the hood.

You cannot properly drive a stick without understanding how a clutch works. And you cannot understand how to use a Database without understanding that you are really just writing to a file on your hard disk.

Specifically:

Do you know what a Clustered Index is? Did you think about it when you designed your schema?
Do you know how to use indexes properly? How to reuse an index? Do you know what a Covering Index is?
So great, you have indexes. How big is 1 row in your index? How big will the index be when you have a lot of data? Will that fit easily into memory? If it won't it's useless as an index.
Have you ever used EXPLAIN in MySQL? Great. Now be honest with yourself: Did you understand even half of what you saw? No, you probably didn't. Fix that.
Do you understand the Query Cache? Do you know what makes a query un-cachable?
Are you using MyISAM? If you NEED full text search, MyISAM's is crap anyway. Use Sphinx. Then switch to Inno.

Using an ORM to do bulk updates
Selecting more data than needed. Again, typically done when using an ORM
Firing sqls in a loop.
Not having good test data and noticing performance degradation only on live data.

참고URL : https://stackoverflow.com/questions/621884/database-development-mistakes-made-by-application-developers

'Program Tip' 카테고리의 다른 글

JavaScript 변수를 설정 해제하는 방법은 무엇입니까? (0)	2020.10.03
보기가 창 계층 구조에없는 UIViewController에 UIViewController를 제공하려고합니다. (0)	2020.10.03
std :: move () 란 무엇이며 언제 사용해야합니까? (0)	2020.10.03
ExpandoObject의 진정한 이점은 무엇입니까? (0)	2020.10.03
모든 정규식 일치 (0)	2020.10.03

현재글응용 프로그램 개발자의 데이터베이스 개발 실수

programtip

응용 프로그램 개발자의 데이터베이스 개발 실수

응용 프로그램 개발자의 데이터베이스 개발 실수

'Program Tip' 카테고리의 다른 글

'Program Tip'의 다른글

티스토리툴바

응용 프로그램 개발자의 데이터베이스 개발 실수

응용 프로그램 개발자의 데이터베이스 개발 실수

'Program Tip' 카테고리의 다른 글

'Program Tip'의 다른글

관련글

티스토리툴바