Available Technology
A method for determining if two series of events are related, or are merely similar by coincidence
Technology:
Maximum covariance of time-uncertain series for dynamic time warping algorithm
Markets Addressed
Industries that rely on data sets that change with time will be able to use this method to determine just how similar new data sets are to previously obtained and characterized data sets.
For Medical Devices:
Many medical devices monitor the body’s electrical signals over time to determine the health of an individual. Examples include electrocardiography (EKG) and electroencephalography (EEG) for surveillance of the heart and brain, respectively. Next-generation medical devices are beginning to monitor other time-dependent signals, such as walking gait, in order to remotely determine the health of patients and allow for quicker medical intervention when such intervention is necessary. This algorithm will find application in comparing patient signals with signals associated with medical events.
For Manufacturing:
Industries that rely on strict quality control in their process engineering, such as the pharmaceutical and chemical industries, often use chromatographic data to evaluate whether a certain process is performing as desired. This data takes the form of a signal over time, and engineers are trained to visually identify abnormalities in the resulting graph. The algorithm presented here can be used to quantify the quality of chromatographic data mathematically rather than visually. This can be advantageous for reducing user error and expanding the number of graphs that can be simultaneously monitored.
For Research:
The burgeoning fields of tissue engineering and stem cell therapy routinely transform one cell type into another. Often the success of this transformation is determined by gene expression over time. There is currently a focus on quality control of the therapeutic cells, but determining the exact state of the cell can be difficult. The algorithm presented here will allow for gene expression patterns to be compared to standards and the results quantified in a meaningful way. This process may be useful both for producing therapeutic cells and for subsequently confirming cell identity before use.
Innovations and Advantages
Current computational methods allow for two time-uncertain series of data to be matched to each other, but cannot account for how likely this match was to occur by chance. Consequently, the chance of such a match being a false positive is not quantified. This lack of confidence is problematic when it is important to classify a time series as signifying a positive or negative result, or when one wants to state that two time series are likely related.
The algorithm presented here was developed by Harvard mathematicians to address this unmet challenge. First, two time series, one or both of which are time-uncertain, are compared and a maximum covariance (or other ‘goodness of fit’ metric) is determined. This maximum covariance can be determined by any of the well-established methods in the field. Next, one of the two series is selected to be randomized under the constraint that it maintains the same statistical characteristics as the original. The non-randomized series is then compared to the randomized series and a maximum covariance is determined. This process is repeated many times. The result is the fraction of randomized series whose maximum covariance with the non-randomized series equals or exceeds that of the original pair. This fraction estimates the probability that the similarity between the two original series could have arisen by chance.
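The steps above can be illustrated with a minimal Python sketch. It is not the inventors' implementation and rests on several assumptions: the goodness-of-fit metric is a simple maximum covariance over small integer time shifts (a stand-in for dynamic time warping or any other alignment method), the randomized series are produced by phase randomization (one common way to preserve a series' power spectrum, chosen here only for illustration), and all function names and parameters (max_lagged_covariance, phase_randomized_surrogate, chance_fraction, max_lag, n_surrogates) are hypothetical.

```python
import numpy as np

def max_lagged_covariance(x, y, max_lag=5):
    """Stand-in fit metric (assumption): largest covariance over integer shifts."""
    x = x - x.mean()
    y = y - y.mean()
    n = len(x)
    best = -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:n - lag]
        else:
            a, b = x[:n + lag], y[-lag:]
        best = max(best, float(np.mean(a * b)))
    return best

def phase_randomized_surrogate(x, rng):
    """Random series with the same power spectrum as x (illustrative choice)."""
    n = len(x)
    spectrum = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    phases[0] = 0.0           # zero-frequency term stays real
    if n % 2 == 0:
        phases[-1] = 0.0      # Nyquist term must stay real for even n
    surrogate = np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=n)
    return surrogate + x.mean()

def chance_fraction(x, y, n_surrogates=500, max_lag=5, seed=0):
    """Fraction of randomized series matching x at least as well as y does."""
    rng = np.random.default_rng(seed)
    observed = max_lagged_covariance(x, y, max_lag)
    hits = 0
    for _ in range(n_surrogates):
        randomized = phase_randomized_surrogate(y, rng)
        if max_lagged_covariance(x, randomized, max_lag) >= observed:
            hits += 1
    return hits / n_surrogates
```

A small returned fraction suggests the observed match is unlikely to be a coincidence; a large one suggests the match should be rejected. In practice the alignment metric and the randomization scheme would be chosen to suit the data at hand.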
With this advance, time-series data can be correlated and the quality of that correlation quantified. This allows for the rejection of a correlation if it is likely to have happened by chance alone.
Additional Information
Intellectual Property Status: Patent(s) pending
Publications:
Aach, J and Church, GM. “Aligning gene expression time series with time warping algorithms” Bioinformatics, 17, 495-508 (2001)
Gordon, AD and Buckland, ST. “A permutation test for assessing the similarity of ordered sequences” Mathematical Geology, 28 (1996)
Haam, E and Huybers, P. “A test for the presence of covariance between time-uncertain series of data with application to the Dongge Cave speleothem and atmospheric radiocarbon records” Paleoceanography, PA2209 (2010)
Inventor(s):
Haam, Kwang-Chung Eddie
Huybers, Peter John
For further information, please contact:
Sam Liss, Director of Business Development
(617) 495-4371
Reference Harvard Case #3929
