Scan Numeric Data From a Figure Image by using Excel
Sometimes we might find ourself trying to find a raw data used in an academic paper but it is hard to find in many cases. In this case, we would think that if we read numeric value from a figure which plot data, it will be very good even though it's not that accurate. Since our purpose is to replicate results of that article approximately.
I found an interesting Excel tool for this purpose. So I just introduce how to use it. You can find a sample Excel file (DigitGraph.xls) from the following blog.
Data Plot
The following figure is a plot of a time series of yield spreads from an academic paper but it is hard to find this data. This data ranges from 1995/01 ~ 2007/03 and the number of observations is 147.
How to use
Given the above plot image and template Excel file (DigitGraph.xls) downloaded which has some vba macro function, by applying the following procedures, we can get a set of generated numerical values from the plot image.
|
Numeric arguments of digitgraph() function are max X, max Y, min X, min Y with the first fixed string argument "measured".
There are two points to note. First, drawing a freeform shape line on a time series is essentially mimicking that time series so that it involves some numerical difference since we can't draw it exactly only one time.
Second, the 2nd and 3rd arguments of digitgraph() function is min X and min Y from the Axes box respectively and the 4th and 5th arguments are max X and max Y from the Axes box respectively.
Therefore, we need to be positioning the Axes box appropriately to measure this min and max values easily.
Carefully I try to draw the freeform shape line that passes as close as possible to the original time series. I set max of X and Y as 147 (the number of last observation) and 30 ( maximum of Y axis) respectively and also set min X and Y as 1 (the number of first observation) and -20 (minimum of Y axis).
Descriptions of Each Step
1) Paste a image which is a drawing of a plot of a time series.
2) Insert a rectangular box and name it as "Axes"
3) draw one freeform shape line over a real time series (mimic a time series) by a mouse or an electronic pen and name it as "measured"
Based on the box and freeform shape line, I use ("measured", 147,30,1, -20) as arguments.
In sum, a rectangular box and freeform shape line are superimposed on the original image plot as follows.
Run digitgraph() function with CTRL+SHIFT+ENTER
Now we call a vba macro function like "=digitgraph("measured",147,30,1,-20)" with CTRL+SHIFT+ENTER like the next figure. In this case the number of generated x-y pair is 685.
Sometimes when the neighborhoods of the last point has some inconsistency but we can just select the appropriate value for the last point. It is no problem since our purpose is to approximate the original time series which is not known to us now.
In particular, it is necessary to call this function with two columns of 685 cells selected (dragged). But we don't know 685 in advance. Therefore, we need to know the number of total rows from a trial run and then use the exact number of total rows.
The time scale of the generated X values is not the same as that of the original time series. Therefore I apply a linear interpolation and match the same time scale of the original one and finally can get a set of generated data which contains a total of 147 observations.
Final result with a linear Interpolation
The original data ranges from 1995/01 to 2007/03 as explained earlier. Therefore I assign the appropriate dates to the corresponding row numbers and compare two graphs from the generated and original data.
It is worth noting that the generated result is dependent on the degree to which mimicking result is affected by precise scales.
Period.
No comments:
Post a Comment