Chapter 4
Data Processing
Once the developed system has been set up, we proceed with the analysis applied
to the collected measurements: (1) change point detection, (2) anomaly detection,
(3) activity correlation. Each item is responsible for a particular function and
represents a different level of analysis. The first technique detects and registers
every unexpected sudden drop or jump during the monitoring process. The next
method, anomaly detection, aims at detecting and highlighting unusual patterns
in the signal that might be of particular interest to a medical specialist.
Both algorithms are complementary and serve as a preliminary step towards the
on-line monitoring system. Ideally, after a change point is detected we should add
other parameters and perform correlation analysis. As discussed in Chapter 2,
fuzzy logic can be used in this case to combine medical parameters and perform
a certain level of reasoning about the patient's health condition. However, as
previously mentioned, the particular qualities of the correlation have not been
thoroughly verified for continuous datasets. Therefore it is essential
to conduct a series of experiments involving patients from different age groups
and with various diagnoses. After collecting this data we can apply the
above-mentioned techniques to examine the nature of the correlation between the
medical parameters involved in the monitoring. This will help to enhance the final
reasoning algorithm and avoid triggering an alarm when it is not necessary. Finally,
as a separate part of the correlation analysis, we consider accelerometer
data and examine how it can complement and improve the reasoning algorithm.
4.1 Change Point Detection
The first part of data processing in the developed system focuses on the change
point detection procedure. Ideally, we consider on-line monitoring of the patient,
which should normally be maintained on a regular, continuous basis.
However, if processing is performed on the phone, it is superfluous to analyze
situations in which nothing is wrong with the patient and all the parameters
are within the normal range. Doing so would unnecessarily complicate the process
and increase memory consumption. Thus, there is no use in full-scale analysis unless
one of the parameters is significantly outside the normal range and/or an abrupt
drop or jump is detected. The first case can be easily handled by thresholding
the input signal (pulse rate or oxygen saturation). The second case, however,
requires more sophisticated analysis. At the same time, apart from
on-line monitoring, there is a high demand for off-line processing of the data,
where change point detection can also be effective. It registers every abrupt
change in the signal and gives a better perspective on the patient's health profile.
The algorithm described in Chapter 2 [2] is suitable for on-line monitoring;
however, prediction of the data distribution is not among the goals of this thesis.
An alternative is to recalculate the mean value at every time step, after
each new observation is received. Normally, all the measurements vary within a
certain range around the mean value, represented by a green line in Figure 4.1.
Every time the next pulse measurement jumps or drops significantly from this line,
we register this point and mark it as a danger point, as it might be potentially
dangerous for the patient.
The whole process consists of several steps followed by a plotting command.
First, we load the file and split the parameters into three categories:
time, pulse and oxygen saturation. The next step excludes all "corrupted"
measurements and replaces them with the previous value in the sequence, so that an
accurate mean value can be calculated:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
where n is the number of measurements in the current segment and x_i is the i-th
measurement. We then compare each received measurement with this mean and detect
change points throughout the signal. Once a change point is registered, we reset n
and start calculating the mean from zero. This allows us to update the mean value
and split the data into different segments throughout the signal.
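The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used in the thesis: the names `clean`, `detect_change_points`, `is_corrupted` and `max_deviation` are hypothetical, the rule for what counts as a corrupted sample is left to the caller, and the change point itself is assumed to become the first sample of the new segment.

```python
def clean(values, is_corrupted):
    """Replace each corrupted sample with the previous accepted value.

    A corrupted first sample has no predecessor and is kept unchanged.
    """
    cleaned = []
    for v in values:
        if is_corrupted(v) and cleaned:
            cleaned.append(cleaned[-1])
        else:
            cleaned.append(v)
    return cleaned


def detect_change_points(values, max_deviation):
    """Return indices of samples that deviate from the running segment mean
    by more than max_deviation. The mean is restarted after each detection."""
    change_points = []
    total, n = 0.0, 0
    for i, x in enumerate(values):
        if n > 0 and abs(x - total / n) > max_deviation:
            change_points.append(i)
            total, n = 0.0, 0  # reset: a new segment starts here
        total += x
        n += 1
    return change_points


# Hypothetical pulse samples; 0 stands for a corrupted reading.
pulse = clean([70, 71, 0, 69, 70, 95, 96, 94, 60], lambda v: v <= 0)
danger_points = detect_change_points(pulse, max_deviation=15)
```

Because only a running sum and a counter are stored per segment, the memory cost stays constant, which matters if the processing runs on the phone.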
A zoomed-in part of the parameter variations in Figure 4.1 demonstrates the
danger point detection process. We can observe two sectors where the pulse
measurement first drops and then jumps beyond the normal range. Both cases are
detected and marked as change/danger points. The green line represents the
mean value of the pulse signal and is used as the main criterion for abnormality.
Oxygen saturation is depicted with a red line and (in the case of on-line reasoning)
is addressed after a danger point has been detected, initiating correlation analysis.
The pseudocode for the selected algorithm is presented below:
(in Java)
- READ FROM FILE
(in C++)
WHILE (x(i))
    IF (incoming signal is corrupted)
        replace(x(i), x(i-1))
    mean(x(i))
    IF (x(i) > upper_threshold OR x(i) < lower_threshold)
        x(i) is a danger_point
        add(oximetry, activity, age, weight)
        fuzzyCorrelation(pulse, oximetry, activity, age, weight)
        alarmLevel()
END WHILE
(in Java)
- SEND INFORMATION
After performing change point detection on the collected data, we are able to
provide a summary of results for subsequent analysis by medical specialists
(see Table 5.3 in Chapter 5). However, we can also take one extra step towards
the on-line monitoring system and create a simulation model. This model will
serve as a testing platform before the above-mentioned algorithms are transferred
directly to the processing device. It can also perform analysis of the collected
data and run tests with different types of input. It is essential to confirm the
reliability of the system before on-line analysis takes place on a real device.
The scheme of the simulation model in Figure 4.2 contains different medical
parameters and personal data as input, a processing block which performs change
point detection, and a fuzzy logic block.
Figure 4.2: Signal Processing Scheme
After a change/danger point is detected, the system should execute a
decision-making algorithm based on the fuzzy logic block. It can be implemented by
attaching the rest of the parameters and executing correlation analysis with fuzzy
logic rules. However, it is still unclear how the collected measurements should
be processed in terms of correlation. The large amount of continuous data has
not been thoroughly explored yet and the nature of the correlation is not verified.
Therefore, we concentrate our attention on off-line processing and consider
the on-line reasoning options as future work, which will be discussed in
Chapter 6.
4.2 Anomaly Detection
The next step towards complete correlation analysis in the off-line mode is the
anomaly detection procedure. It is intended to complement change point
detection and has several main distinctions, discussed in Section 2.3.2 of Chapter
2. Here, our aim is to search through the whole dataset and find the pieces
of data which are the most unusual and rare. In many cases, these anomalies
can be associated with an emergency situation, and thus should be registered
and highlighted for further medical investigation. This problem has been
addressed previously, which resulted in a wide choice of algorithms to implement
[7][31][29]. However, a particular approach involving Symbolic Aggregate
Approximation of the signal has proved its efficiency when compared to other
methods [16]. It can be applied to medical data and significantly reduces
computational demands when it comes to processing large datasets. We describe
the algorithm and analyze its performance in the next section.
Anomaly Detection with SAX
As previously mentioned, we are looking for a simple and straightforward
technique which would help to find and highlight the most unusual parts of the
data sequence. Given the number of measurements we collect, it is
important to employ an algorithm which can significantly simplify data processing
without reducing accuracy. Symbolic Aggregate Approximation (SAX) is
a relatively novel technique, described by Eamonn Keogh, Jessica
Lin and Ada Fu in their article "HOT SAX: Finding the most Unusual
Time Series Subsequences: Algorithms and Application" [16]. It was previously
tested on medical data, including anomaly detection in electrocardiograms and
change detection in patient monitoring [17].
The first step in the SAX implementation is data conversion: we need to transform
the numeric data into a symbolic format. This procedure is subdivided into
the following parts. In the first step we represent a given time series of length
n in a w-dimensional space, where the i-th element is calculated with the
equation:
\[ \bar{C}_i = \frac{w}{n} \sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w} i} C_j \]
In other words, we divide the data into w equally sized "frames", calculate the mean
value of the data within each frame and form a vector consisting of these mean
values [16]. This form of data representation is also known as Piecewise Aggregate
Approximation (PAA).
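As a small illustration of the frame averaging, a PAA step might look like the sketch below. The function name `paa` is hypothetical, and the sketch assumes that the series length n is an exact multiple of w; other lengths would need a weighting scheme that is not covered here.

```python
def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal frames."""
    n = len(series)
    if w <= 0 or n % w != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of w")
    frame = n // w  # number of samples per frame (n / w)
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]


reduced = paa([1, 2, 3, 4, 5, 6, 7, 8], w=4)  # four frames of two samples each
```

Each output element is the average of n/w consecutive samples, so the series shrinks from n values to w values before any symbols are assigned.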
In the next step we apply a further transformation to obtain a discrete signal
[16]. Tests on more than 50 datasets showed that normalized subsequences
have a highly Gaussian distribution [11]. Thus, it is possible to determine the
"breakpoints" that will produce equal-sized areas under the Gaussian curve. So,
after we have obtained the PAA of a time series, all coefficients that are below
the smallest breakpoint are mapped to the symbol "a", all coefficients greater than
or equal to the smallest breakpoint and less than the second smallest breakpoint
are mapped to the symbol "b", etc. Figure 4.3 below
depicts the result of the algorithm.
Figure 4.3: A sample signal after mapping
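The mapping from PAA coefficients to symbols can be sketched as follows. This is a minimal illustration, assuming that the coefficients come from a z-normalized series and that breakpoints are taken from the standard normal distribution; the names `breakpoints`, `znormalize` and `to_symbols` are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def breakpoints(alphabet_size):
    """Cut points that split the standard normal curve into equal-area regions."""
    return [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def znormalize(series):
    """Shift and scale a series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def to_symbols(paa_values, alphabet_size):
    """Map each PAA coefficient to a letter: 'a' below the smallest breakpoint,
    'b' from the smallest up to (not including) the second smallest, etc."""
    cuts = breakpoints(alphabet_size)
    # bisect_right counts the breakpoints <= value, so a coefficient equal
    # to a breakpoint falls into the upper region.
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in paa_values)


word = to_symbols([-1.0, -0.2, 0.3, 1.2], alphabet_size=4)
```

With an alphabet of four symbols the breakpoints lie at roughly -0.67, 0 and 0.67, so each letter is expected to cover a quarter of the normalized values.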
Once a signal has been transformed to a symbolic representation, we can
perform anomaly detection. A brute force algorithm can be employed
for finding discords. We simply take each possible subsequence
obtained after the transformation and find the distance to its nearest non-self
match. The subsequence with the greatest such distance is the discord. The
pseudocode of the method is shown below.
Figure 4.4: Brute force algorithm
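For readers without access to the figure, a brute force discord search along the lines described in the text might be sketched as below. This is an assumption-laden illustration rather than a copy of Figure 4.4: it uses the Euclidean distance on fixed-length windows, treats overlapping windows as self matches, and the names `euclidean`, `brute_force_discord` and `window` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, window):
    """Return (start index, distance) of the subsequence whose nearest
    non-self match is farthest away."""
    best_dist, best_loc = -1.0, None
    last = len(series) - window + 1
    for p in range(last):                 # every candidate subsequence
        nearest = math.inf
        for q in range(last):             # every possible match
            if abs(p - q) >= window:      # non-self match: no overlap with p
                d = euclidean(series[p:p + window], series[q:q + window])
                nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_dist, best_loc = nearest, p
    return best_loc, best_dist


# A periodic signal with one unusual sample at index 8.
signal = [0, 1, 0, 1, 0, 1, 0, 1, 5, 1, 0, 1, 0, 1, 0, 1]
location, distance = brute_force_discord(signal, window=2)
```

The two nested loops over all subsequences are what make the cost grow quadratically with the length of the series.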
However, in spite of the simplicity and straightforward implementation
of this procedure, there is one significant drawback: it has O(m^2)
time complexity, which makes it inapplicable to large datasets. In
order to improve the algorithm and reduce complexity, the previous code (Figure
4.4) was modified into a new version, depicted in Figure 4.5.
Figure 4.5: Modified brute force algorithm
The main distinction from the earlier method lies in the way we order
and search for the discord. This becomes possible due to the following observations:
• In the inner loop we don’t actually need to find the true nearest neighbor
to the current candidate. As soon as we find any subsequence that is
closer to the current candidate than the best_so_far, we can abandon the
instance of the inner loop, safe in the knowledge that the current candidate
could not be the time series discord.
• The utility of the above optimization depends on the order in which the
outer loop considers the candidates for the discord, and the order in which
the inner loop visits the other subsequences in its attempt to find a sequence
that will allow an early abandon of the inner loop [16].
Unlike the standard brute force algorithm, the modified version is able to break
out of the search loop much earlier, without going through the entire dataset. At
the same time, the symbolic representation of the dataset can help to define a
particular order for the algorithm which significantly reduces computational
time. With the introduced improvements the search can, in the best case,
require only O(m) time.
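The early abandoning idea from the two observations above can be illustrated with the following sketch. It is not the code of Figure 4.5: the visiting orders are passed in as plain index lists (`outer_order` and `inner_order` are hypothetical parameters), whereas the SAX-based heuristics that produce a good order are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_discord(series, window, outer_order=None, inner_order=None):
    """Discord search that abandons the inner loop as soon as the current
    candidate is known to be no better than the best one found so far."""
    last = len(series) - window + 1
    outer = list(range(last)) if outer_order is None else list(outer_order)
    inner = list(range(last)) if inner_order is None else list(inner_order)
    best_dist, best_loc = 0.0, None
    for p in outer:
        nearest = math.inf
        for q in inner:
            if abs(p - q) < window:
                continue                  # skip self matches
            d = euclidean(series[p:p + window], series[q:q + window])
            if d < best_dist:
                break                     # early abandon: p cannot be the discord
            nearest = min(nearest, d)
        else:
            # Reached only when the inner loop was not abandoned.
            if nearest != math.inf and nearest > best_dist:
                best_dist, best_loc = nearest, p
    return best_loc, best_dist


signal = [0, 1, 0, 1, 0, 1, 0, 1, 5, 1, 0, 1, 0, 1, 0, 1]
# Visiting the unusual region first lets later candidates be abandoned quickly.
order = [8, 7] + [i for i in range(15) if i not in (7, 8)]
location, distance = find_discord(signal, window=2, outer_order=order)
```

The result is the same discord distance as an exhaustive search; only the amount of work depends on the chosen orders, which is why a good ordering heuristic matters.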
Table 4.1: Brute force vs. SAX
Number of samples (duration)    Brute force    SAX
1000 (45 min)                   15.6 sec       3.7 sec
5000 (3 h)                      7 min          17.5 sec
10000 (6 h)                     28 min         35.7 sec
A number of tests with different amounts of data, combined in Table 4.1,
demonstrate the change in processing time mentioned before. We can clearly
see the direct advantage of using the modified version of the brute force
algorithm. It becomes even more essential in our case given the final goal
of on-line monitoring, the huge amount of data and the limited computational
power of the processing device. The result of SAX anomaly detection
is illustrated in Figure 4.6.
We use a subplot where the first plot represents a particular section of the
dataset with detected anomalies and the second graph corresponds to the pulse
variation within the same period of time. This representation makes it easier
to register the time when an anomaly has been detected. It will be useful when
creating a patient's personal health profile in future work.
Another important issue concerns signal correlation. A concrete answer about the
pulse/oxygen saturation/activity interrelation can only be given after collecting a
moderately large database of measurements. This process presumes close
collaboration with physiologists and other specialists. However, our aim at this
stage of the project is to provide reasonable assistance for future research.
Thus we consider it useful to extend the anomaly detection procedure
and calculate correlation coefficients between pulse and oxygen saturation for
anomaly sections. Whenever a discord is detected, we can check the correlation
value at the corresponding index, register the signals' behavior and follow the
tendency throughout the entire dataset. A sliding window is used for this
purpose: it goes through the time series and applies a Matlab
function giving a coefficient as an output. What we have in the end is a number
between -1 and 1 whose magnitude shows the strength of the relation between the
two parameters. The closer this number is to 1, the more closely an increase in one
signal is accompanied by an increase in the other. If
this value is negative, pulse and oxygen saturation change in opposite directions.
We provide more results of the processing with SAX in Chapter 5
of the thesis.
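A sliding-window correlation of this kind could be sketched as follows. The Matlab function used in the thesis is not named in the text, so the sketch assumes the Pearson correlation coefficient; the names `pearson`, `sliding_correlation` and `window`, as well as the sample values, are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one window is constant
    return cov / (sx * sy)


def sliding_correlation(pulse, spo2, window):
    """Coefficient for every window position; entry i covers samples i..i+window-1."""
    return [
        pearson(pulse[i:i + window], spo2[i:i + window])
        for i in range(len(pulse) - window + 1)
    ]


pulse = [70, 72, 75, 80, 78, 74]
spo2 = [98, 97, 96, 94, 95, 96]
coefficients = sliding_correlation(pulse, spo2, window=3)
```

The coefficient at the index of a detected discord can then be looked up directly. Windows in which one of the signals is constant have no defined correlation and are reported as NaN in this sketch.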
4.3 Activity Correlation
An important part of the signal processing is correlation with the third parameter,
representing the patient's activity during the monitoring time. In some
situations, a person’s position or current movement can be a crucial factor in
a decision-making algorithm. This issue has been a subject of recent research
on continuous supervision of the health state [40].
This data, as previously mentioned in Chapter 3, comes
directly from the built-in accelerometer of the smartphone and is stored
in the internal memory of the phone in a special format consisting of four
columns. As in the case of pulse and oxygen saturation, the first column is the
exact time of the measurement. The next three columns represent the x, y, and z
values, respectively, of the phone’s acceleration.
This data can potentially provide us with information about the patient's posture
and allow us to perform motion detection. It has the potential to improve the
analysis of the monitoring system. According to research at the Berlin Technical
University [40], a general approach is to distinguish between high and
low activity, where low activity is further divided into passivity and marginal
activity, e.g. caused by slow posture changes during sleep. To quantify
the term activity, an empirically
developed activity measure Act is calculated:
\[ Act = E\left[\left| v_a^2 - E[v_a^2] \right|\right]; \quad v_a = \sqrt{a_x^2 + a_y^2 + a_z^2} \]
Feature extraction and then classification are executed whenever this value rises
above a threshold for a certain frame of accelerometer data.
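For a single frame of accelerometer samples, the activity measure can be computed as in the sketch below. It assumes that the expectation E[.] is estimated by the arithmetic mean over the frame; the function name `activity_measure` and the sample values are hypothetical.

```python
def activity_measure(frame):
    """Act = E[|v_a^2 - E[v_a^2]|] for a frame of (ax, ay, az) samples,
    with E[.] estimated as the mean over the frame."""
    v2 = [ax * ax + ay * ay + az * az for ax, ay, az in frame]  # v_a squared
    mean_v2 = sum(v2) / len(v2)
    return sum(abs(v - mean_v2) for v in v2) / len(v2)


resting = [(0.0, 0.0, 1.0)] * 4             # phone lying still
moving = [(0.0, 0.0, 1.0), (0.0, 0.0, 3.0)]  # acceleration magnitude varies
act_resting = activity_measure(resting)
act_moving = activity_measure(moving)
```

A frame with constant acceleration gives Act = 0, while variation in the squared magnitude raises the value, which is what the threshold comparison relies on.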
Another example is a fall detection routine which warns the user every time
a fall is detected [41]. The main steps of this functionality are listed below:
1. Filter data to remove accelerometer offset
2. Look for high acceleration value
3. If found check for high delta acceleration in previous 3 seconds, else return
to step 1
4. If found check for change in device orientation over next 10 seconds, else
return to step 1
5. Declare that a fall has occurred, use change in orientation to determine
fall type
6. Issue Alert with time, fall type, mote id
After the orientation of the accelerometer sensor in relation to the ground is
added, the data is further tested with the algorithm described and a fall is detected.
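The listed steps can be sketched roughly as follows. This is only an illustration of the control flow, not the routine from [41]: all threshold values, the `offset` calibration tuple and the use of the mean acceleration direction as an orientation estimate are assumptions, and the determination of the fall type and the alert itself are left out.

```python
import math


def magnitude(v):
    return math.sqrt(sum(c * c for c in v))


def mean_vector(vectors):
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(3))


def angle_deg(u, v):
    """Angle between two vectors in degrees (0 if either vector is zero)."""
    norm = magnitude(u) * magnitude(v)
    if norm == 0:
        return 0.0
    cos = sum(a * b for a, b in zip(u, v)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def detect_falls(samples, offset=(0.0, 0.0, 0.0),
                 high_acc=2.5, high_delta=1.5, min_angle=60.0):
    """samples: list of (t_seconds, ax, ay, az), acceleration in g.
    Returns a list of (time, orientation change in degrees)."""
    # Step 1: remove the (assumed known) accelerometer offset.
    data = [(t, (ax - offset[0], ay - offset[1], az - offset[2]))
            for t, ax, ay, az in samples]
    falls = []
    for t, v in data:
        if magnitude(v) < high_acc:                  # step 2
            continue
        before = [u for s, u in data if t - 3.0 <= s < t]
        after = [u for s, u in data if t < s <= t + 10.0]
        if not before or not after:
            continue
        mags = [magnitude(u) for u in before] + [magnitude(v)]
        if max(mags) - min(mags) < high_delta:       # step 3
            continue
        turn = angle_deg(mean_vector(before), mean_vector(after))
        if turn >= min_angle:                        # step 4
            falls.append((t, turn))                  # step 5
    return falls


# Synthetic recording at 10 Hz: upright, one impact at t = 3 s, then lying.
upright = [(i / 10, 0.0, 0.0, 1.0) for i in range(30)]
impact = [(3.0, 0.0, 0.0, 3.0)]
lying = [(i / 10, 1.0, 0.0, 0.0) for i in range(31, 131)]
events = detect_falls(upright + impact + lying)
```

In this sketch the gravity direction before and after the impact serves as a simple stand-in for the device orientation; a real implementation would have to handle sensor noise and repeated detections around one impact.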
However, at this point, the initial problem is to examine the correlation
between the accelerometer data and the pulse variation. A first step is to match a
particular change in activity to the medical data and vice versa. For this
purpose we present both parameters along the time axis. After performing
fall detection or any other motion detection, we can observe the precise time of
the event and correlate this event with a concrete change in pulse or oxygen
saturation.
Figure 4.7: Pulse variation and acceleration measurements vs. time
All the straight lines on the activity (first) graph (see Figure 4.7) show
a zero activity level during the night or rest time. They correspond to low
pulse rate values. The process of correlation should be entirely supervised by
medical specialists in order to prepare a reliable background for future advanced
analysis, which can further improve the level of on-line monitoring and
reasoning.