Skip to content

Fabrizio1994/ECGClassification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

240 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Arrhythmia detection using on standard ECG databases

Dependencies

The modules are implemented for use with Python 3.x and they consist of the following dependencies:

  • scipy
  • numpy
  • matplotlib

Introduction

The function of human body is frequently associated with signals of electrical, chemical, or acoustic origin.
Extracting useful information from these biomedical signals has been found very helpful in explaining and identifying various pathological conditions.
The most important are the signals which are originated from the heart's electrical activity. This electrical activity of the human heart, though it is quite low in amplitude (about 1 mV) can be detected on the body surface and recorded as an electrocardiogram (ECG) signal.
The ECG arise because active tissues within the heart generate electrical currents, which flow most intensively within the heart muscle itself, and with lesser intensity throughout the body. The flow of current creates voltages between the sites on the body surface where the electrodes are placed.

QRS Region

The QRS complex is the central part of an ECG. It corresponds to the depolarization of the right and left ventricles of the human heart. A Q wave is any downward deflection after the P wave. An R wave follows as an upward deflection, and the S wave is any downward deflection after the R wave. An example of a PQRST segment can be seen in the picture below. QRS complex
The normal ECG signal consists of P, QRS and T waves. The QRS interval is a measure of the total duration of ventricular tissue depolarization.
QRS detection provides the fundamental reference for almost all automated ECG analysis algorithms. Before to perform QRS detection, removal or suppresion of noise is required.

MIT-BIH Arrhythmia Database

The MIT-BIH Arrhytmia DB contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects (records 201 and 202 are from the same subject) studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Of these, 23 were chosen at random from a collection of over 4000 Holter tapes, and the other 25 (the "200 series") were selected to include examples of uncommon but clinically important arrhythmias that would not be well represented in a small random sample.
Each signal contains cardiologists annotations, which describe the behaviour of the signal in the location in which they are placed.

Incart Database

This database consists of 75 annotated recordings extracted from 32 Holter records. Each record is 30 minutes long and contains 12 standard leads, each sampled at 257 Hz. Each signal contains cardiologists annotations, which describe the behaviour of the signal in the location in which they are placed. The algorithm generally places beat annotations in the middle of the QRS complex (as determined from all 12 leads). The locations have not been manually corrected, however, and there may be occasional misaligned annotations as a result.

Annotations

Beat Annotation Meaning
N Normal Beat
L Left bundle branch block beat
R Right bundle branch block beat
B Bundle branch block beat (unspecified)
A Atrial premature beat
a Aberrated atrial premature beat
J Nodal (junctional) premature beat
S Supraventricular premature beat
V Premature ventricular contraction
r R-on-T premature ventricular contraction
F Fusion of ventricular and normal beat
e Atrial escape beat
j Nodal (junctional) escape beat
n Supraventricular escape beat (atrial or nodal)
E Ventricular escape beat
/ Paced beat
f Fusion of paced and normal beat
Q Unclassifiable beat
? Beat not classified during

For the other kind of annotations, which do not refer to QRS region, look at the references.

MIT-BIH ANNOTATIONS

patient A N E S V F R j f / a L e J Q
100 33 2239 0 0 1 0 0 0 0 0 0 0 0 0 0
101 3 1860 0 0 0 0 0 0 0 0 0 0 0 0 2
102 0 99 0 0 4 0 0 0 56 2028 0 0 0 0 0
103 2 2082 0 0 0 0 0 0 0 0 0 0 0 0 0
104 0 163 0 0 2 0 0 0 666 1380 0 0 0 0 18
105 0 2526 0 0 41 0 0 0 0 0 0 0 0 0 5
106 0 1507 0 0 520 0 0 0 0 0 0 0 0 0 0
107 0 0 0 0 59 0 0 0 0 2078 0 0 0 0 0
108 4 1739 0 0 17 2 0 1 0 0 0 0 0 0 0
109 0 0 0 0 38 2 0 0 0 0 0 2492 0 0 0
111 0 0 0 0 1 0 0 0 0 0 0 2123 0 0 0
112 2 2537 0 0 0 0 0 0 0 0 0 0 0 0 0
113 0 1789 0 0 0 0 0 0 0 0 6 0 0 0 0
114 10 1820 0 0 43 4 0 0 0 0 0 0 0 2 0
115 0 1953 0 0 0 0 0 0 0 0 0 0 0 0 0
116 1 2302 0 0 109 0 0 0 0 0 0 0 0 0 0
117 1 1534 0 0 0 0 0 0 0 0 0 0 0 0 0
118 96 0 0 0 16 0 2166 0 0 0 0 0 0 0 0
119 0 1543 0 0 444 0 0 0 0 0 0 0 0 0 0
121 1 1861 0 0 1 0 0 0 0 0 0 0 0 0 0
122 0 2476 0 0 0 0 0 0 0 0 0 0 0 0 0
123 0 1515 0 0 3 0 0 0 0 0 0 0 0 0 0
124 2 0 0 0 47 5 1531 5 0 0 0 0 0 29 0
200 30 1743 0 0 826 2 0 0 0 0 0 0 0 0 0
201 30 1625 0 0 198 2 0 10 0 0 97 0 0 1 0
202 36 2061 0 0 19 1 0 0 0 0 19 0 0 0 0
203 0 2529 0 0 444 1 0 0 0 0 2 0 0 0 4
205 3 2571 0 0 71 11 0 0 0 0 0 0 0 0 0
207 107 0 105 0 105 0 86 0 0 0 0 1457 0 0 0
208 0 1586 0 2 992 373 0 0 0 0 0 0 0 0 2
209 383 2621 0 0 1 0 0 0 0 0 0 0 0 0 0
210 0 2423 1 0 194 10 0 0 0 0 22 0 0 0 0
212 0 923 0 0 0 0 1825 0 0 0 0 0 0 0 0
213 0 3251 0 0 0 0 0 0 0 0 0 0 0 0 0
214 0 0 0 0 256 1 0 0 0 0 0 2003 0 0 2
215 3 3195 0 0 164 1 0 0 0 0 0 0 0 0 0
217 0 244 0 0 162 0 0 0 260 1542 0 0 0 0 0
219 7 2082 0 0 64 1 0 0 0 0 0 0 0 0 0
220 94 1954 0 0 0 0 0 0 0 0 0 0 0 0 0
221 0 2031 0 0 396 0 0 0 0 0 0 0 0 0 0
222 208 2062 0 0 0 0 0 212 0 0 0 0 0 1 0
223 72 2029 0 0 473 14 0 0 0 0 1 0 16 0 0
228 3 1688 0 0 362 0 0 0 0 0 0 0 0 0 0
230 0 2255 0 0 1 0 0 0 0 0 0 0 0 0 0
231 1 314 0 0 2 0 1254 0 0 0 0 0 0 0 0
232 1382 0 0 0 0 0 397 1 0 0 0 0 0 0 0
233 7 2230 0 0 831 11 0 0 0 0 0 0 0 0 0
234 0 2700 0 0 3 0 0 0 0 0 0 0 0 50 0

QRS and R peak detection

The first phase we have to work on is to describe different approaches for encountering the problems of QRS and R peak detection, and evaluating and comparing the results obtained using the same standard ECG databases.
In order to do this, we used three different algorithms:

  • one based on the KNN classifier
  • one based on a heuristic method
  • one based on the PanTompkins approach
pip install wfdb

KNN APPROACH

Signal Preprocessing

Before the classification phase, signals are processed by using a band pass filter, in order to reduce the recording noise that would lead to uncorrect classifications.
The sample frequency is set to 360 sample per second.
We used the butter method from scipy to obtain the filter coefficients in the following way:

b, a = signal.butter(N, Wn, btype="bandpass")

where N is the order of the filter, Wn is a scalar giving the critical frequencies, btype is the type of the filter.
Once we get the coefficients, we apply a digital filter forward and backward to a signal :

filtered_channel = filtfilt(b, a, x)

where b and a are the coefficients we computed above and x is the array of data to be filtered.
Once we filtered the channels, we apply the gradient to the whole signal with the diff function from numpy. It actually calculates the n-th discrete difference along the given axis. After this, we just square the signal to get no zeros and eventually, we normalize the gradient.

Data Loading

Data are available in the PhysioNet website, precisely at the link below:
https://www.physionet.org/physiobank/database/mitdb/
Raw signals are loaded inside the Python module using the wfdb library.

Classification

In this scope, the QRS detection problem is encountered as a binary classification problem:
A signal is decomposed in features of variable size that can be detected either as a QRS complex or not, and considered as input for a KNN classifier.
A portion of the signal is labeled as a QRS complex depending on whether it contains a Beat Annotation.
Then a classifier is trained with the 80% of the length of the signal, while the last 20% is used for testing purpose.
KNN classifiers are trained by means of a 5-fold Cross Validated Grid Search in the following space of parameters :

parameters = {  
'n_neighbors': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19],  
'weights': ['uniform', 'distance'] 
}

The Grid Search provides, for eache signal, the best configuration of parameters according to the accuracy score.
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html for further readings.

Feature Extraction

In this approach, the signal is divided in windows of samples of fixed sizes of 36 samples, which are considered as features. The feature vector is composed by the gradients referred to all samples in the window.
A window is labeled as QRS region if contains a Beat Annotation.

RPEAK DETECTION HEURISTIC

Signal Processing

Also here, the filter chosen is a passband since they maximize the energy of different QRS complexes and reduce the effect of P/T waves, motion artifacts and muscle noise. After filtering, a first-order forward differentiation is applied to emphasize large slope and high-frequency content of the QRS complex. The derivative operation reduces the effect of large P/T waves. A rectification process is then employed to obtain a positive-valued signal that eliminates detection problems in case of negative QRS complexes. In this approach, a new nonlinear transformation based on squaring, thresholding process and Shannon energy transformation is designed to avoid to misconsider some R-peak.

For further information, please see the references.

Pan Tompkins Approach

In order to get all the peaks of each signal we used a matlab version of the algorithm wrote by Pan-Tompkins which you can find in the repository. It takes the signals as input and give a list containing the peaks' location as output.

EVALUATION

MIT-BIH Arrythmia Database

KNN generic Pan Tompkins
signal precision recall precision recall precision recall
100 1 0,962 0,557 0,999 1 1
101 0,997 0,965 0,949 0,989 0,998 0,997
103 1 0,963 0,995 0,99 0,999 1
105 0,976 0,952 0,218 0,713 0,992 0,980
106 0,998 0,934 0,859 0,883 0,996 0,997
108 0,881 0,772 0,065 0,437 0,909 0,763
109 0,998 0,962 0,947 0,976 0,997 1
111 0,999 0,91 0,89 0,836 0,991 0,997
112 1 0,968 0,978 0,462 1 1
113 1 0,968 1 0,832 1 1
114 1 0,97 0,996 0,981 0,733 0,961
115 1 0,942 1 0,95 1 1
116 0,999 0,824 0,877 0,879 0,99 0,999
117 1 0,961 0,995 0,48 1 1
118 0,999 0,962 0,547 0,293 1 1
119 0,996 0,96 0,998 0,325 1 0,999
121 1 0,967 0,009 0,101 0,999 1
122 1 0,967 1 0,224 1 1
123 1 0,968 0,998 0,905 0,998 1
124 1 0,948 0,836 0,644 0,988 0,994
200 0,999 0,956 0,715 0,959 0,997 0,997
201 0,997 0,885 0,881 0,978 0,966 1
202 1 0,965 0,951 0,987 0,996 1
203 0,989 0,735 0,271 0,804 0,946 0,967
205 0,998 0,96 0,765 1 0,996 0,999
207 0,977 0,683 0,604 0,964 0,832 0,880
208 0,932 0,8 0,836 0,958 0,970 0,982
209 0,999 0,964 0,999 0,977 1 1
210 0,995 0,916 0,085 0,97 0,974 0,987
212 1 0,957 0,969 0,958 1 1
213 0,993 0,95 0,989 0,857 0,990 0,990
214 0,999 0,957 0,944 0,887 0,996 0,997
215 1 0,966 0,867 0,996 0,999 1
219 1 0,951 0,992 0,396 1 1
220 1 0,963 1 0,844 1 1
221 1 0,956 0,995 0,911 0,999 1
222 0,949 0,922 0,638 0,857 0,990 0,990
223 1 0,851 0,852 0,929 0,995 1
228 0,995 0,946 0,193 0,705 0,996 0,995
230 1 0,949 0,992 0,992 1 1
231 1 0,957 1 0,998 0,997 1
232 0,999 0,949 0,988 0,948 1 0,999
233 0,996 0,945 0,986 0,939 0,999 1
234 1 0,942 0,988 0,969 0,999 1
average 0,986 0,919 0,789 0,788 0,981 0,987

RR ANALYSIS FOR BEAT CLASSIFICATION

Going on we have to classify the peaks we located with the approaches described above. For this purpose we decided to use a rule based approach described by Tsipouras. The paper we took into account can be found in the references at the end of this readme. Once we labeled the beats, we went on labeling the RR intervals and the results can be found below. Following the rules written in the paper we can finally detect Arrhythmias.

RR RESULTS

SENSITIVITY

100 series

- N PVC VF BII
RPEAK 98% 62% - -
ANNOTATION 99% 85% - -
PAN-TOMPKINS 90% 41% - -

200 series

- N PVC VF BII
RPEAK 88% 35% 1% 100%
ANNOTATION 93% 74% 51% 100%
PAN-TOMPKINS 87% 37% 36% 100%

PRECISION

100 series

- N PVC VF BII
RPEAK 99% 34% - 0%
ANNOTATION 99% 46% - -
PAN-TOMPKINS 98% 20% 0% 0%

200 series

- N PVC VF BII
RPEAK 92% 26% 25% 0.2%
ANNOTATION 98% 55% 50% 0.25%
PAN-TOMPKINS 92% 25% 40% 0.2%

Beat Classification

Weighted SVM

- N S V F Average
Recall 0.795 0.688 0.829 0.928 0.81
Precision 0.985 0.318 0.878 0.05 0.55

Average Accuracy: 0.795 rebalanced dataset, with scale factors [-14, 4, 1, 10] C = 0.1

Neural Network Approach

References

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors