Excel : Scan Numeric Data From a Time Series Plot Image

This post introduces a very nice scanning excel tool for scanning numerical values approximately from a plot image. We sometimes encounter the case when some academic papers show a figure plotting data which is hard to find. In this case, I think this might be useful.


Scan Numeric Data From a Figure Image by using Excel



Sometimes we might find ourself trying to find a raw data used in an academic paper but it is hard to find in many cases. In this case, we would think that if we read numeric value from a figure which plot data, it will be very good even though it's not that accurate. Since our purpose is to replicate results of that article approximately.

I found an interesting Excel tool for this purpose. So I just introduce how to use it. You can find a sample Excel file (DigitGraph.xls) from the following blog.



Data Plot


The following figure is a plot of a time series of yield spreads from an academic paper but it is hard to find this data. This data ranges from 1995/01 ~ 2007/03 and the number of observations is 147.
Excel : Scan Numeric Data From a Time Series Plot Image


How to use


Given the above plot image and template Excel file (DigitGraph.xls) downloaded which has some vba macro function, by applying the following procedures, we can get a set of generated numerical values from the plot image.


  1. Paste a graph image
  2. Insert a rectangular box and name it as "Axes"
  3. draw one freeform shape line over a real time series (mimic a time series) and name it as "measured"
  4. call a vba macro function like with CTRL+SHIFT+ENTER
    "=digitgraph("measured", max X, max Y, min X, min Y)"
  5. apply a linear interpolation to match the time scale


Numeric arguments of digitgraph() function are max X, max Y, min X, min Y with the first fixed string argument "measured".

There are two points to note. First, drawing a freeform shape line on a time series is essentially mimicking that time series so that it involves some numerical difference since we can't draw it exactly only one time.


Second, the 2nd and 3rd arguments of digitgraph() function is min X and min Y from the Axes box respectively and the 4th and 5th arguments are max X and max Y from the Axes box respectively.

Therefore, we need to be positioning the Axes box appropriately to measure this min and max values easily.


Carefully I try to draw the freeform shape line that passes as close as possible to the original time series. I set max of X and Y as 147 (the number of last observation) and 30 ( maximum of Y axis) respectively and also set min X and Y as 1 (the number of first observation) and -20 (minimum of Y axis).



Descriptions of Each Step


1) Paste a image which is a drawing of a plot of a time series.
Excel : Scan Numeric Data From a Time Series Plot Image


2) Insert a rectangular box and name it as "Axes"
Excel : Scan Numeric Data From a Time Series Plot Image


3) draw one freeform shape line over a real time series (mimic a time series) by a mouse or an electronic pen and name it as "measured"
Excel : Scan Numeric Data From a Time Series Plot Image


Based on the box and freeform shape line, I use ("measured", 147,30,1, -20) as arguments.

In sum, a rectangular box and freeform shape line are superimposed on the original image plot as follows.


Excel : Scan Numeric Data From a Time Series Plot Image


Run digitgraph() function with CTRL+SHIFT+ENTER


Now we call a vba macro function like "=digitgraph("measured",147,30,1,-20)" with CTRL+SHIFT+ENTER like the next figure. In this case the number of generated x-y pair is 685.

Sometimes when the neighborhoods of the last point has some inconsistency but we can just select the appropriate value for the last point. It is no problem since our purpose is to approximate the original time series which is not known to us now.


In particular, it is necessary to call this function with two columns of 685 cells selected (dragged). But we don't know 685 in advance. Therefore, we need to know the number of total rows from a trial run and then use the exact number of total rows.

Excel : Scan Numeric Data From a Time Series Plot Image

The time scale of the generated X values is not the same as that of the original time series. Therefore I apply a linear interpolation and match the same time scale of the original one and finally can get a set of generated data which contains a total of 147 observations.



Final result with a linear Interpolation


The original data ranges from 1995/01 to 2007/03 as explained earlier. Therefore I assign the appropriate dates to the corresponding row numbers and compare two graphs from the generated and original data.

Excel : Scan Numeric Data From a Time Series Plot Image

It is worth noting that the generated result is dependent on the degree to which mimicking result is affected by precise scales.

Period.


No comments:

Post a Comment