Sunday, 19 July 2015

Mini Blog: Classical Cryptography's Hill Cipher

Today's blog post will cover a cipher from classical cryptography. It is not used nowadays because it is easily broken. In fact, the way it works is understandable to any high schooler or first-year college student with an elementary knowledge of linear algebra.

Say you have a message you want to encrypt, e.g. "I like pie". We convert each character into a numerical representation (e.g. perhaps ASCII). Then we form a matrix out of these numbers. Let's call it matrix B.

Let's use the concepts of matrix multiplication and invertible matrices to encrypt and decrypt this message matrix B.

We can easily encrypt it by matrix-multiplying it with an arbitrary matrix A. The resulting matrix, AB, will be sent as the "encrypted message". Note that of course, the number of columns in matrix A must be equal to the number of rows in matrix B for matrix multiplication of A and B to work.

Now, you can send this encrypted message AB to anyone. But how can they decrypt it? Simple. We can use the property of invertible matrices.

(A^-1)(AB) = B

So, we just need to multiply the encrypted message AB by the inverse of matrix A to get back the decrypted message B. Note that this means A must be an invertible matrix.
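To make this concrete, here is a minimal Python sketch of the idea above. It is a simplified illustration rather than a faithful Hill cipher (a real Hill cipher works modulo the alphabet size, which I've left out); the 2x2 key matrix A is an arbitrary choice with determinant 1, so its inverse is also an integer matrix.

```python
# Key matrix A, chosen with det(A) = 1 so A^-1 has integer entries.
A = [[2, 3],
     [1, 2]]
A_inv = [[2, -3],
         [-1, 2]]  # the inverse of A

def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

message = "I like pie"
codes = [ord(c) for c in message]   # numerical (ASCII) representation
if len(codes) % 2:
    codes.append(ord(' '))          # pad so the codes fill a 2-row matrix

B = [codes[0::2], codes[1::2]]      # the message matrix B

encrypted = mat_mul(A, B)              # AB, the "encrypted message"
decrypted = mat_mul(A_inv, encrypted)  # (A^-1)(AB) = B

recovered = ''.join(chr(decrypted[0][i]) + chr(decrypted[1][i])
                    for i in range(len(decrypted[0])))
print(recovered)  # -> I like pie
```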


Saturday, 4 July 2015

MiniBlog: Paint Applications, and Splines

A friend of mine recently mentioned a conversation he had with his dad about splines, which reminded me of a good software application of splines that my professor once showed me.

What are Splines?

I honestly think the Wikipedia definition is much more clear than whatever way I could define a spline, so here you go:

In mathematics, a spline is a numeric function that is piecewise-defined by polynomial functions, and which possesses a sufficiently high degree of smoothness at the places where the polynomial pieces connect (which are known as knots).
In interpolating problems, spline interpolation is often preferred to polynomial interpolation because it yields similar results to interpolating with higher degree polynomials while avoiding instability due to Runge's phenomenon. In computer graphics, parametric curves whose coordinates are given by splines are popular because of the simplicity of their construction, their ease and accuracy of evaluation, and their capacity to approximate complex shapes through curve fitting and interactive curve design.

Polynomial interpolation is where you have a set of dots/points, and you try to fit ONE polynomial function over the whole thing. Runge's phenomenon is where you try to fit a high-degree polynomial function over a set of equidistant data points, but then near the edges, really big oscillations occur, which isn't good for data analysis and prediction!

Runge's phenomenon, where the blue curve is a 5th-order interpolating polynomial, and the green curve, which oscillates badly, is a 9th-order interpolating polynomial

The problem of most simple drawing applications

When I was a newb at Android application development, I tried to make a simple drawing application to learn more about touch gestures on a mobile device. Android has good APIs for gesture detection. In the custom Canvas class that you make for your drawing application, you can override the onTouchEvent method and handle the incoming user gestures.

To draw what the user draws, you simply have to get the x-y point when the user first presses down on the screen, and then draw a line between that point and the point the user moves to next. This means that as the user moves his finger across the screen, Android is getting x-y coordinates every XXX milliseconds, and each XXX milliseconds, Android draws a line between the previous x-y coordinates the user was at and the current x-y coordinates the user is currently at.

This has some problems, of course. Let's say the user swiped his finger really fast in a short amount of time. Because the Android API only gets x-y coordinates of the user's finger vs. time every XXX milliseconds, the resulting line that the user drew on the drawing application doesn't look smooth. It looks like a piece-wise function of straight lines connected to each other by dots, instead of a smooth curve. Below is an illustration I made about this:

Notice the lack of smoothness due to the C-shaped curve being composed of multiple straight lines

So, in order to fix this, all you have to do is keep the history of all these points, from when the user first presses down until the user releases his finger. Then, using that list of point objects, you use spline interpolation to make the whole line smoother, and draw that smooth curve onto the canvas!
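As a sketch of that smoothing step: the real code would live in Android/Java, but here is the same idea in Python using SciPy (an assumption purely for illustration), fitting a parametric cubic spline through sparse touch points and sampling it densely.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sparse (x, y) touch samples, as if captured every few milliseconds.
points = [(0, 0), (1, 2), (3, 3), (6, 2), (8, 0)]
x = np.array([p[0] for p in points], dtype=float)
y = np.array([p[1] for p in points], dtype=float)

t = np.arange(len(points))             # parametrize the curve by sample index
sx, sy = CubicSpline(t, x), CubicSpline(t, y)

t_fine = np.linspace(0, len(points) - 1, 100)
smooth = list(zip(sx(t_fine), sy(t_fine)))  # 100 points along a smooth curve
```

Drawing short segments between the 100 resampled points looks like one smooth stroke, whereas drawing the original 5 points directly gives the connected-straight-lines look described above.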

Edge Detection For Dummies

Author's Note: I'm working on an application at Microsoft that uses an edge detection library. Obviously, as the lowly undergraduate intern that I am, all I need to do is call a method and it just magically detects edges in images. But I wanted to understand more about edge detection, and not just use "a library". This blog post is part of my "for dummies" series, where I try to explain hard concepts in a simple way without super-sophisticated language, so that the average person (like me) can understand the concept.

References for all the information I'm about to explain are at the bottom of this blog post.

Edge detection is what it sounds like. You have an image, and you want to detect "edges". Pictures are worth a thousand words, so the below figure should answer any of your questions on what "edge detection" is.

In the study of edge detection and image processing, an edge is formally defined as a "significant local change in the image intensity, usually associated with a discontinuity in either the image intensity or the first derivative of the image intensity". This sounds really hard, but bear with me here.

Let me first talk about image intensity, and a bit later in this post, I'll get back to the "first derivative of image intensity". Image intensity is a bit hard to describe, but for a normal greyscale image, like the one I'm about to show you below, it can be thought of as "dark vs light", or colour. If you want to learn more about image intensity, you can find it here.

Now I'll try to explain more about what they mean by "discontinuity" and "first derivative of image intensity". Discontinuities in the image intensity can be visualized as step discontinuities, where the image intensity value literally changes from one value to a totally different value. You can represent this as a step function. Discontinuities can also be in the form of line discontinuities, where the image intensity also abruptly changes value, but then returns to the starting value within some short distance. Think of a very thick, vertical, black line you draw in a paint application. As you go across the line, you go from the paint application's default white background, to black, and then to white again.

Now, in a perfect image, these discontinuities would be sharp and we could easily detect them. However, because we don't live in an ideal world and because camera apps sometimes like to "smoothen" captured photos, sharp discontinuities rarely exist in real photos. The step and line discontinuities you will actually obtain from processing the image will instead look like "ramp functions" or "roof functions". Examples of these two functions are drawn in the below figure.

In addition to sharp discontinuities not existing, another reason why developing a good edge detection algorithm is hard is that images have "noise" in them, and we need algorithms that are pretty good at filtering out this "noise".

As you can see in the above figure, there is noise on the leftmost and rightmost parts of the graph, where both parts seem to have this "ramp" function... but as you can tell from the actual image above the graph, those ramp-like parts are just noise. There is nothing in the left and right parts of the actual image.

I'm going to now try to write a short list of definitions of some commonly used terms, which will be used later in this blog post. I will try to make these terms as understandable as possible.


Edge Point - I'm sure you all know what a point means. An edge point is basically a point in the image that is part of what is considered to be an edge by whatever edge detection algorithm you use. So, basically, it is an (x, y) coordinate at a location where there is a significant image intensity difference.

Edge Fragment - An edge fragment is basically an edge point, along with that particular point's "edge orientation". An edge orientation is basically what direction/orientation the edge is. For example, I could draw a line horizontally on my paint application, and that edge would have a horizontal orientation. Obviously, this edge orientation is an actual angle/mathematical value, but I want to make things simple.

Edge - When image processing experts talk about "edges", they actually mean "edge points" or "edge fragments". Read the definition below on an edge detector if you don't know what I meant from that.

Edge Detector - an edge detecting algorithm that actually produces "edges" (a.k.a. edge points or edge fragments) from an image.

Contour - A contour is a list of "edges" (a.k.a. edge points/edge fragments), or the mathematical curve that models that list of "edges", forming what most of us would call an "edge".

Edge Linking - "Edge linking is a process of forming an ordered list of "edges" from an unordered list. By convention, "edges" are ordered by traversal in a clockwise direction". To simplify, edge linking links edge points together on a small scale (e.g. if edge point A and edge point B are within a few pixels of each other, we determine that they can be linked).

Edge Following - "Edge following is the process of searching the {filtered} image to determine contours." To simplify, edge following is where you look at the whole image, filter out the noise, and then determine the contours. It is edge detection on a big scale, rather than a small scale.

Edge coordinates could be in the coordinate system of the original image (e.g. an image of pixel width X and pixel height Y), though more likely, it will be in a different coordinate system as many edge detectors use a process called "filtering" in order to filter out noise. Filtering an image actually changes the coordinate system of the image, which may translate or scale the image coordinates.

Now, edge detectors cannot be perfect. The types of edges that can be obtained from an edge detector are: correct edges (so a human testing an edge detector can verify that "yep, it identified this edge correctly"), false edges (where the algorithm mistakes a non-edge for an edge), and missing edges (where the algorithm should have identified an edge but didn't). False edges are called false positives, and missing edges are called false negatives.

Now, remember when I talked about the "first derivative of image intensity"? What does this mean? Well, let's take the example of a step edge associated with a significant difference in image intensity. The first derivative, if you've taken an intro calculus course, is just that - the first derivative. The slope. The below figure, taken from a neat website I referenced at the bottom of this blog post, has two images: the first one is the step function (though more like a continuous ramp function), and the graph below that is the derivative of that ramp function - a.k.a. the "first derivative of image intensity".

I will now define the "gradient". The gradient, as the figure above describes, is the first derivative of image intensity, and can be used to detect edges. How? Well, we know that the first graph can symbolize an edge because it goes from a low value to a high value very abruptly. Every time a change like this occurs, the first derivative has a local maximum! So, that means every time a big enough local maximum occurs, we can determine that there is an edge point!

So, you can imagine that an image is just a huge 2D array of these continuous functions of image intensities, and by interpreting all of these graphs, we can eventually detect the edges in a whole image.

Now, the figure above only represents a 1D gradient and a 1D image intensity. Pictures are 2D, so we need to somehow define a 2D gradient. Below is a screenshot of a page from a textbook that describes the "formal definition" of 2D gradient:

Most edge detection algorithms use something called a "mask" in order to approximate the gradient values in an image. Examples of some different types of masks are shown in the below figure.

How do you use one of these masks, you might ask? The method is pretty simple. Let's say we are using the Sobel mask (which has a mask in the x-direction and the y-direction). For each 3x3 group of pixels, we apply Sobel's x- and y- masks by a process known as "convolution". An easy example of how to do convolution is in this link.
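Here is a small Python sketch of that convolution step, applying the standard Sobel masks to a toy image (NumPy is assumed just to keep the arithmetic short; the toy image values are made up):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
SOBEL_Y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

def convolve3x3(image, mask):
    """Convolve an image with a 3x3 mask (valid region only)."""
    h, w = image.shape
    flipped = mask[::-1, ::-1]          # true convolution flips the mask
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * flipped)
    return out

# A toy "image": dark left half, bright right half -- a vertical step edge.
img = np.zeros((5, 6))
img[:, 3:] = 255.0

gx = convolve3x3(img, SOBEL_X)
gy = convolve3x3(img, SOBEL_Y)
magnitude = np.sqrt(gx ** 2 + gy ** 2)  # gradient magnitude per pixel
# magnitude is largest along the vertical edge between the two halves.
```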

These matrices that get convolved with the original source image's matrix are not only known as masks, but also filters, because they filter out noisy signals to get the signal that we want - the edges.

There are other types of filters as well, such as ones with sizes of 2x1, 1x2, et cetera. But these are a bit hard to think about because there isn't really a "center" pixel in that mask/matrix. Some of these mask matrices can be very large as well, which we will talk about later in this blog post.

The simplest numerical approximation for gradient would be:

Gx = f[i, j+1] - f[i, j]
Gy = f[i, j] - f[i+1, j]

where j corresponds to the x direction and i to the negative y direction.

The corresponding masks for this simple gradient approximation are illustrated below:
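In code, that simple approximation is just a forward difference. A short NumPy sketch (toy image values are made up):

```python
import numpy as np

def simple_gradient(f):
    """Forward-difference approximation:
    Gx = f[i, j+1] - f[i, j], Gy = f[i, j] - f[i+1, j]."""
    gx = f[:, 1:] - f[:, :-1]
    gy = f[:-1, :] - f[1:, :]
    return gx, gy

f = np.array([[0., 0., 255., 255.],
              [0., 0., 255., 255.]])
gx, gy = simple_gradient(f)
# gx responds only at the step between columns 1 and 2; gy is all zeros here.
```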

Algorithms for edge detection generally contain these three steps:

Filtering: Images have noise. Filtering is used to reduce this noise. However, there's actually a trade-off, because by filtering to reduce noise, you could potentially reduce edge strength in the images as well. More filtering to reduce noise results in a loss of edge strength.

Enhancement: Enhancement emphasizes pixels where there are local "first derivative intensity value" maxima, and is usually performed by computing the gradient magnitude. There are several ways to compute the gradient (e.g. Roberts, Sobel, etc.), which we will talk about below.

Detection: In an image, there might be lots and lots of local maxima when looking at the first derivative of image intensity. How do we actually determine what is an edge and what isn't? Typically, making a threshold of e.g. "if the maxima is above X value, then it's an edge!" is used.
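The detection step really is as simple as thresholding. A tiny Python sketch, using a made-up gradient-magnitude array for illustration:

```python
import numpy as np

# A toy gradient-magnitude map: the middle column has large values.
magnitude = np.array([[0.2, 3.9, 0.1],
                      [0.3, 4.2, 0.4]])

threshold = 1.0                 # "if the maxima is above X value, it's an edge!"
edges = magnitude > threshold   # boolean edge map
# edges marks only the middle column as edge points.
```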

Many edge detectors have been developed in the last two decades. I will now discuss some commonly used edge detectors.

Roberts Operator

The Roberts cross operator provides a simple approximation to the gradient magnitude:

Ok, maybe it doesn't look simple. You don't need to formally know the definitions, though, as you can just "use the mask". The mask that you would use is shown in the following figure:
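Since the screenshot isn't reproduced here, below is a sketch using the standard Roberts cross masks (a widely published pair, stated from general knowledge rather than from the original figure), applied to one made-up 2x2 window:

```python
import numpy as np

ROBERTS_1 = np.array([[1, 0],
                      [0, -1]])
ROBERTS_2 = np.array([[0, 1],
                      [-1, 0]])

def roberts_magnitude(window):
    """Approximate gradient magnitude for one 2x2 pixel window."""
    g1 = np.sum(window * ROBERTS_1)
    g2 = np.sum(window * ROBERTS_2)
    return float(np.sqrt(g1 ** 2 + g2 ** 2))

window = np.array([[0., 255.],
                   [0., 255.]])    # a vertical step edge
print(roberts_magnitude(window))   # 255 * sqrt(2), about 360.6
```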

Sobel Operator

The Sobel operator is actually probably one of the most commonly used edge detectors, according to the below screenshot of the textbook I read. The screenshot also gives you the Sobel matrix:

Second Derivative Operators

Now that I've talked about how to do edge detection using the gradient of image intensity, I will now try to explain how you can also use the "second derivative of image intensity" in order to do edge detection. The edge detectors I talked about earlier calculated the first derivative and, if it was above a certain X threshold, the presence of an edge point was assumed. This actually results in the detection of too many edge points. A better approach would be to find only the points that have local maxima in gradient values and consider them edge points. This means that at edge points, there will be a peak in the first derivative and, equivalently, there will be a zero crossing in the second derivative. 

Thus, edge points may be detected by finding the zero crossings of the second derivative of the image intensity. This is the second major way of edge detection, instead of just the "gradient" way. 
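A one-dimensional sketch of this zero-crossing idea in Python, using a made-up intensity profile:

```python
import numpy as np

signal = np.array([0., 0., 1., 4., 8., 9., 9.])  # a ramp-like edge profile

first = np.diff(signal)    # first derivative: peaks near the edge
second = np.diff(first)    # second derivative: changes sign at that peak

# A zero crossing (positive to negative) in the second derivative
# marks the edge location.
crossings = [i for i in range(len(second) - 1)
             if second[i] > 0 and second[i + 1] < 0]
print(crossings)  # -> [2]
```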

There are two operators in two dimensions that correspond to the second derivative: the Laplacian and second directional derivative. This sounds hard, but basically, there are two ways at looking at the second derivative, and correspondingly, two different ways of making an algorithm for edge detection using the "zero crossing" method.

I won't go too much into this, other than the fact that the Laplacian and the second directional derivative can give you masks to use in your edge detection code, just like the "gradient" methods:

I will make a follow-up blog post that goes more into zero-crossing methods of edge detection, including the Canny edge detector algorithm.


Friday, 3 July 2015

Mini Blog: Shadows on Celestial Bodies, Umbras, and Umbrellas

I'm going to be starting a series called "Mini Blog", where each post in this series is significantly shorter than the lengthy posts I usually write. In fact, sometimes it may be just a picture. Like this one!

This picture defines umbra, penumbra, and antumbra.

Penumbra in this picture is B, and umbra is A.

The reason why the moon doesn't totally cover up the sun from our perspective on Earth during an annular eclipse is that the distance between the moon and the earth is around the same as the distance at which the apex of the moon's umbra (cast by the sun) lies. Therefore, we see an antumbra instead of a full umbra.

Another fun fact: umbra is the Latin word for shadow. The English word, "umbrella", is also derived from the Latin word "umbra".

Thursday, 2 July 2015

Identifying Concavity/Convexity in Polygons (for Dummies)

The following blog post is purely based off of this link. I like explaining things in my own way to make them more understandable, which is what I will attempt to do here.

This blog post is about how to check whether a polygon is convex or not (and subsequently, whether a polygon goes clockwise or counter-clockwise). If you do not understand what convexity means, then I think a picture would be the best way to explain it:

As you can see from the above picture (which was taken off of the site I linked above), this polygon has concavity in all of its vertices except at (x5, y5).

If you do not understand what a clockwise vs counter-clockwise polygon is, then the following description will help: say all of the edges of a polygon are represented as vectors pointing from one vertex to the next. Did you visualize it? Are all the vectors going clockwise or counter-clockwise? That is basically the difference between a clockwise vs counter-clockwise polygon.

Now let us get back to determining the convexity of a polygon. This "isConvex" logic is easily applicable in software engineering if you are, for example, building an app that requires the user to manipulate some sort of polygon, where the polygon must always stay convex. Using this "convexity" check will help in implementing that.

The test to check convexity of a vertex would be to check the cross product of the two adjacent sides that meet at that vertex.

The result of a cross product of two vectors is another vector that is 90 degrees from both of those two vectors. However, the vector could point either way along the axis that is 90 degrees from both of the two original vectors. By convention, we use the right hand rule to know which direction the resultant vector points. This can be visualized in the below figure.

Figure 1

So by convention, if a is "behind" b like in this figure, then the cross product vector would point up.

That said, let's say we changed it so that the index finger and the middle finger in this illustration swapped places, so that it looks something like this:

Figure 2. Like Figure 1, but where a is now pointing in b's direction, and b is pointing in a's direction.

Forgive me for my horrible drawing skills. As you can see, because the middle finger (which represents vector b) is now more "behind" vector a (which is now more "in front"/pointing towards the audience), the cross product will instead point downwards into the ground. This is just a convention mathematicians set up.

So, surprisingly enough, with this convention, we can actually test whether a polygon is convex or not. 

Figures 3 and 4

I will now describe how this test actually works. Basically, you do this test at each vertex by looking at its adjacent sides as vectors. The two vectors of a vertex are visualized above in the left-most picture (Figure 3) as the two red arrows, and the vertex is the yellow point.

You can do a cross product of these two vectors and see if the resultant cross product is going out of the page or into the page using the right hand rule.

By definition, the cross product of two vectors that both lie in the x-y plane can be simply defined as:

cross product = ((x_i - x_{i-1}), (y_i - y_{i-1})) x ((x_{i+1} - x_i), (y_{i+1} - y_i)) = (x_i - x_{i-1}) * (y_{i+1} - y_i) - (y_i - y_{i-1}) * (x_{i+1} - x_i)

The points (x_i, y_i), (x_{i-1}, y_{i-1}), and (x_{i+1}, y_{i+1}) can all be seen in the two pictures I drew above.

Now, because a vector can be "moved" along its line of direction, in the right-most picture (Figure 4) I moved the vector (that pointed from i-1 towards i) into the position shown by the blue arrow. I did this to make it easier to visualize how you can use the right hand rule in this test, where the thumb can be at the yellow vertex, and your middle and index fingers can be represented by the vector (i to i+1) and the blue vector.

Using the cross product equation above, we determine that the blue arrow would be vector a, and the red arrow that goes from i to i+1 would be vector b. This corresponds with Figure 2, so if you do the math for the specific example shown in Figure 4, the cross product is negative (going into the ground/page). A negative cross product at a convex vertex (such as the one in Figure 4) means that the polygon is a clockwise polygon. This makes sense, since in Figures 3 and 4, I drew the two red arrows going clockwise around the polygon!

In order to actually test for convexity versus concavity, you do this same test for all vertices of your polygon. If the cross products at all vertices have the same sign, then your polygon is convex. If your polygon has a mixture of positive and negative cross products at the vertices, then the polygon must be concave at some points.

To determine exactly which points are convex or concave depends on whether you made the vectors of the polygon edges go around the polygon in a clockwise or counter-clockwise fashion. Like I said before, if your edge vectors go clockwise around your polygon, then convex vertices give negative-signed cross products. So if the polygon is clockwise and the cross product at a certain vertex comes out positive, you can conclude that vertex is concave.
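Putting the whole test together, here is a minimal Python sketch of the vertex-by-vertex cross-product check (the example polygons are made up for illustration):

```python
def cross_z(p_prev, p, p_next):
    """z-component of the cross product at vertex p, using the vectors
    (i-1 -> i) and (i -> i+1), exactly as in the formula above."""
    ax, ay = p[0] - p_prev[0], p[1] - p_prev[1]
    bx, by = p_next[0] - p[0], p_next[1] - p[1]
    return ax * by - ay * bx

def is_convex(polygon):
    """True if the cross products at all vertices share one sign."""
    n = len(polygon)
    signs = set()
    for i in range(n):
        z = cross_z(polygon[i - 1], polygon[i], polygon[(i + 1) % n])
        if z != 0:
            signs.add(z > 0)
    return len(signs) <= 1

square = [(0, 0), (2, 0), (2, 2), (0, 2)]          # convex
arrow = [(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]   # concave at (1, 1)
print(is_convex(square))  # -> True
print(is_convex(arrow))   # -> False
```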

Sunday, 28 June 2015

Humanity Forgetting Terrible Diseases (& History of Tuberculosis)

Sanatorium for Tuberculosis patients

This blog post was inspired by the recent PBS documentary "The Forgotten Plague" (which concentrates on the history of TB, primarily in the United States).

It seems that recently, numerous diseases that we thought were "virtually wiped out", such as whooping cough and even polio, have revived in certain areas of the United States. This is primarily due to parents refusing to vaccinate their children, thereby denying them protection from these life-threatening diseases.

One explanation is that humans, even within a generation or two, completely forget the horrible history of these terrible sicknesses.

I'd like to focus on one in particular that was widespread and very common up until the early-to-mid 1900s: tuberculosis, or "consumption", as it was called back when the disease was still common. The reason it was called "consumption" was that a classic symptom was weight loss.

Although TB was thought to be hereditary, people also thought that fresh air and going outside in the wilderness could help reduce the symptoms. In fact, although there were many other reasons why people went out into the American West, one major cause was the fact that developers advertised how "being in the wilderness" and "going out West" can cure TB.

In 1882, the bacterium that causes TB was discovered, and it was named the "tuberculosis bacillus" due to its tuber-like shape. So, when people realized that TB was not hereditary, and was in fact contagious to anyone in the vicinity of a TB-infected person, many special isolated places sprung up to house infected people, and most of these places were in the wilderness.

These places were called "sanatoriums", the first of which was founded by Edward Trudeau, who was also the first American to verify that TB was in fact caused by a bacterium. Doing so took him an excruciatingly long time at the microscope, keeping the room temperature at exactly 36-37 degrees Celsius for hours (which was extremely difficult back then).

Trudeau's "sanatorium" had grown in popularity in its name, and as a result, many more sprung up, including in Europe as well. All of these sanatoriums tried to mimic how Trudeau ran his sanatorium: patients needed to be in bed all day, eat enough food to be well, and surprisingly, and sitting outside on the porch for hours (even if it was freezing cold outside).

What Trudeau discovered is basically common sense today: that rest and relaxation helps your immune system fight against foreign bacteria.

Eventually, in 1943, the antibiotic streptomycin, which could effectively combat tuberculosis, was discovered by Albert Schatz. Schatz painstakingly spent hours and hours getting samples of dirt from outside and isolating the bacteria in them, until he found one that was effective against TB.

And it's 2015 right now, only 72 years later, and most of us have forgotten about this plague - this forgotten plague. In an interview in the documentary, an elderly man who was still alive to tell his stories about TB talked about his experiences in a sanatorium, and how his mom, his dad, and most of his relatives died from the disease. Then one day in 1943, he was finally told that there was a cure, and he lived to tell the tale thanks to this miracle antibiotic.

Unfortunately for us millennials, there are now many strains of tuberculosis that are resistant to streptomycin, and currently a combination of different antibiotics is used to treat tuberculosis. Perhaps in the near future there will be even more resistant tuberculosis strains; I am a bit scared by that, and by any other harmful bacteria that can develop resistance.

Saturday, 20 June 2015

Knapsack Problem For Dummies

This blog post will be a "knapsack problem explanation for dummies".

I hope this might help others who are also not CS majors, but want to know more about this famous problem (like I did).

The Knapsack Problem
You have n items. Each has a certain weight, w_i, and a certain cost, c_i. We want to fill up our backpack, which only holds a maximum weight W, with these items. While doing so, we must maximize the total cost of the items in the backpack (our optimization goal!). Of course, we cannot exceed the weight W, so we can only fit a limited number of items, which means the number of items in our backpack must be less than or equal to the n items available to us.

There are two "types" of this knapsack problem. The first is the divisible knapsack problem, and the second is the indivisible knapsack problem. The difference between the two is that just one is the knapsack problem I just described, with the additional note that each of the n amount of items can be divided into a fraction of that i'th item, whereas the other one assumes that all n items are indivisible. So, you could think of the divisible knapsack problem as a problem where all the items are fully constructed lego toys, and that in order to truly maximize the lego blocks you can stash in your knapsack, you can of course try to divide each lego toy by ripping it apart to only take the most valuable lego parts.

Divisible Knapsack Problem
This version of the problem assumes that arbitrary fractions of items can be included in the knapsack (like the lego example I used). This problem can be solved using the following algorithm:
  1. Sort all of the items from the biggest c_i/w_i ratio (the cost-per-weight ratio) to the smallest. The complexity of this step depends on which sorting algorithm you employ; e.g. merge sort would give you O(n log n).
  2. Loop through the list of items you just sorted in step 1, and as long as the knapsack is not yet full, add that i'th item into the knapsack.
  3. Once you got to the i'th item that CANNOT fit into the knapsack anymore, figure out how much the knapsack needs until it is truly full. For example, your knapsack's total allowable weight might be 5 kg, and from step 2, you have already filled it up to 4 kg. The next i'th item that could not be stored into the knapsack weighs 2 kg. Therefore, you break apart/cut apart that i'th item into halves in this case, and store one of those halves into your knapsack. Your knapsack should now be optimized for the best total cost per weight ratio.
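The three steps above can be sketched in Python as follows (the item costs and weights are made up for illustration):

```python
def fractional_knapsack(items, capacity):
    """Greedy solution to the divisible knapsack.
    items: list of (cost, weight) pairs."""
    # Step 1: sort by cost-per-weight ratio, biggest first.
    items = sorted(items, key=lambda cw: cw[0] / cw[1], reverse=True)
    total_cost = 0.0
    for cost, weight in items:
        if capacity <= 0:
            break
        take = min(weight, capacity)        # steps 2-3: whole item, or a fraction
        total_cost += cost * (take / weight)
        capacity -= take
    return total_cost

items = [(60, 10), (100, 20), (120, 30)]  # (cost, weight)
print(fractional_knapsack(items, 50))     # -> 240.0
```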
This algorithm can be called a "greedy algorithm". A greedy algorithm is just a term for an algorithm that goes through a set of items, and after comparing some of the items, e.g. items A and B, it never goes back to compare them again. This should be OK in this scenario, unless somehow item A magically becomes a lot more valuable later, which could occur in real situations. For example, let's say you were in some online MMO game where each player only has a limited inventory (based on "weight"). In this game, the value of all items changes all the time depending on the in-game economy. E.g. lots of players might have thought the "AWESOME SWORD" was super OP at the time, and its in-market price was 1000 in-game currency, but then all of a sudden the MMO updated and obtaining the "AWESOME SWORD" became a lot easier. What happens? The cost of the "AWESOME SWORD" goes down, of course.

Indivisible Knapsack Problem
Remember that this version of the knapsack problem is essentially the same, except that items must be included into the knapsack in all of its entirety. If the item as a whole does not fit into the knapsack, we cannot "cut it up into pieces" and fit those "pieces" into the knapsack.

This version of the knapsack problem actually makes the problem a bit more difficult.

We could go with a "greedy" approach. In fact, we could potentially use our previous greedy approach, which was to sort the items by cost per weight and fill the backpack with the items that have the highest cost per weight. However, if you actually tested this with lots of test cases, you would discover that this approach does not always give you the optimal solution.

There are other "greedy" approaches as well, including;
  • sorting by cost (not cost per weight!) and filling the knapsack until it is full
  • sorting by weight (from least weight to greatest), and filling the knapsack until it is full
These two may sometimes actually give the optimal result, but as with the cost per weight approach, they do not always give the optimal solution.

So, how do we solve the indivisible knapsack problem optimally every time? Not with a greedy approach, as no greedy approach is guaranteed to give an optimal solution.

However, two concepts can help us find an algorithm that solves the indivisible knapsack problem: backtracking and dynamic programming.

Backtracking is basically brute force (though I am leaving a bit of detail out for simplicity).

Dynamic programming is easier to explain with an example first, and a more general definition after. Take the knapsack problem. The problem statement is quite general: you have an arbitrary number n of items, each with an arbitrary cost and an arbitrary weight. We can create many more specific subproblems simply by plugging actual numbers into these variables. For example, we can solve the knapsack problem for a 5 kg knapsack with 2 available items: a 5 kg $1 item, and a 2 kg $2 item. By solving many of these specific problems and saving (caching) each numerical answer, whenever we need to solve the knapsack problem again for one of those particular situations, we can just "look up" the answer instead of computing the whole thing again.

Better still, caching answers is not only efficient in itself: each cached answer makes future computations cheaper too, because they can build on the answers already stored. What I just described is called memoization.

For example, we all know the recursive definition of the Fibonacci function: return F(n-1) + F(n-2). We can improve the naive implementation with memoization. Each time someone calls the function, we cache the answer, along with the answers to its subproblems F(n-1), F(n-2), and so on. Say the user calls F(135): the answer for F(135) is cached, and so are all of its sub-answers F(134), F(133), F(132), ..., down to F(3). Then, when the function is called again for an even bigger number such as F(264), the computer looks up F(135) in the blink of an eye and only has to compute its way up from there to F(264).
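A minimal sketch of this in Python (the function name and cache layout are my own choices):

```python
# Memoized Fibonacci. The cache survives between calls, so a later call
# for a bigger n reuses everything computed earlier.
cache = {1: 1, 2: 1}  # base cases

def fib(n):
    if n not in cache:
        cache[n] = fib(n - 1) + fib(n - 2)  # each subproblem is computed once
    return cache[n]

print(fib(10))  # 55
# After fib(135), the cache holds F(3)..F(135); a later fib(264) only has
# to compute the values from F(136) up to F(264).
```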

Ok, I just explained a lot, and we will try to incorporate all of these new concepts into the actual solution for the indivisible knapsack problem.

We know that there should be an optimal solution, and that this solution includes some set of items. A natural subproblem for the indivisible knapsack problem is whether or not each item belongs to the "optimal collection of items to put into the knapsack". We then solve all of these subproblems, and from there we can assemble the best solution.

If item n is in the solution: the solution includes the cost of the n'th item.
Knapsack(n, W) = C_n + Knapsack(n-1, W - w_n)
where the second term, Knapsack(n-1, W - w_n), is already precomputed thanks to memoization.

If item n is not in the solution: the solution does not consider the n'th item's cost or weight at all.
Knapsack(n, W) = Knapsack(n-1, W)
where Knapsack(n-1, W) is already precomputed thanks to memoization.
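The two cases above can be sketched as top-down memoization. The item costs and weights below are hypothetical, and items are 1-indexed to match the text:

```python
# Top-down memoized knapsack implementing the two-case recurrence.
from functools import lru_cache

costs = [0, 60, 100, 120]   # costs[n] is C_n (index 0 unused)
weights = [0, 10, 20, 30]   # weights[n] is w_n

@lru_cache(maxsize=None)
def knapsack(n, W):
    if n == 0 or W == 0:
        return 0
    best = knapsack(n - 1, W)               # case: item n not in the solution
    if weights[n] <= W:                     # case: item n in the solution
        best = max(best, costs[n] + knapsack(n - 1, W - weights[n]))
    return best

print(knapsack(3, 50))  # 220 (items 2 and 3)
```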

The full and official solution for the indivisible knapsack problem is:
  1. Create a two-dimensional array with rows 0 to n and columns 0 to W (the max weight of the knapsack). The first column and first row are all 0, because a knapsack that holds 0 weight has a maximum cost solution of 0, and a knapsack that may consider 0 items has a maximum cost solution of 0.
  2. For each (i,j)th element in the array, where i is a number between 1 and n and j is a number between 1 and W:
    1. Initialize the (i,j)th solution as just SolMatrix[i-1][j] (the previously calculated solution without the i'th item considered, i.e. with only items 1 to i-1 considered).
    2. Change the (i,j)th solution to SolMatrix[i-1][j - weight_of_ith_item] + cost_of_ith_item IF:
      1. (weight_of_ith_item ≤ j) AND
      2. (SolMatrix[i-1][j - weight_of_ith_item] + cost_of_ith_item > SolMatrix[i-1][j])
I worry that the algorithm for the above indivisible knapsack problem is a bit hard to understand, from how I typed it up.
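In case it helps, the numbered steps can be sketched as a bottom-up table in Python. The items passed in are hypothetical (cost, weight) pairs:

```python
# Bottom-up knapsack table. sol[i][j] holds the best total cost using only
# items 1..i under weight limit j.
def knapsack_table(items, W):
    n = len(items)
    # Step 1: an (n+1) x (W+1) table; row 0 and column 0 stay all zeros.
    sol = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost_i, weight_i = items[i - 1]
        for j in range(1, W + 1):
            sol[i][j] = sol[i - 1][j]           # step 2.1: item i not taken
            if weight_i <= j:                   # step 2.2.1: item i fits...
                take = sol[i - 1][j - weight_i] + cost_i
                if take > sol[i][j]:            # step 2.2.2: ...and is better
                    sol[i][j] = take
    return sol[n][W]

print(knapsack_table([(60, 10), (100, 20), (120, 30)], 50))  # 220
```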

If you need any further explanations on the indivisible knapsack problem, please click the following links for more details, as they have helped me tremendously on understanding the concepts explained in this post:

How to Deal with Rejection

I was recently rejected from a very prestigious national engineering scholarship. I spent many days finishing the arduous application process, only to receive the email explaining that "the selection process was very tough", followed a few words later by the unfortunate news: "we are sorry to inform you...".

I have to admit that I have a lot of general anxiety about anything and everything I encounter in life, though I try very hard to hide my fears and worries. I have especially low self-confidence, and constantly fight the thought that "I'm not good enough" in every situation. Because of this, my first thought after receiving the rejection email was: "I'm not good enough.

I suck.

I'll just give up."

I feel very weak for having thought this way at many points in my life, and for admitting it right now. I have thought this exact way in numerous other situations. However, I know plenty of people who get stuck in this kind of mental rut whenever they aim for success.

I know this is probably obvious to 99% of you reading this, but what I'm about to say was a real eye-opener for a pessimist like me, and will hopefully be positive motivation for anyone reading: the mindset I exhibited is a terrible way to deal with rejection. I should change it, because it will never make me improve; it will only hurt me and my confidence more.

After seeking advice from friends, books, and articles online, I began to realize that instead of being stuck in a prolonged period of depression after a rejection, I should change my mindset to be more productive and focused on improvement.

"I got rejected from XXX. My approach was probably wrong and I need to improve more!"

This is the wisest way to respond to a rejection, in my opinion. There is nothing wrong with you, even if you got rejected. You simply took an approach that didn't work for that specific situation. Instead of moping around, look for improvements you could easily make to raise your chances of success at future opportunities (trust me, there will always be future opportunities!)

Don't be afraid of rejection either. If you never get rejected, it means you aren't stepping out of your comfort zone, and aren't trying to reach your limits. Try to reach them!

Sunday, 7 June 2015

Tell Your World

To remember the intangible feelings that I hold dear in my heart
I broke free from this life, this somber life that some of you know
I hummed these secret songs for years, but now I want to share one with you…
So spread this news far wide… and let it fly high with soaring birds in the blue clear sky!

It’s all those words to share and bare with others
It’s all those notes to sing and send to others
Each and every link forms a new connection
It echoes far away into the distance...

It’s all those words to share and bare with others
It’s all those notes to sing and send to others
Then all your links combine into a circle
With everyone together “tell your world” it’s everywhere!

The dark encapsulates your room, you're blinded by terrible heartache
But the light that shines right through, those gaps between your fingers, brilliantly
I've caught this little tune, and tapped its beat on top of the window sill
And it'll reverberate... and resonate throughout the air and into the sky!

It’s all those words to fare and bare with others
It’s all those notes to sing and shout to others
Each and every link forms a new connection
It echoes far away into the distance...

It’s all those words to fare and bare with others
It’s all those notes to sing and shout to others
Then all your links combine into a circle
With everyone together “tell this world” it’s everywhere!

Never doubted the days, this song that I played
will heal everyone's sorrows,
Life is such a wondrous journey that some people may lose their footpath on...
Now every moment shakes, with each tapping beat, so our world begins to change,
Tell your world today... about the sounds within our hearts!

It’s all those words to share and bare with others
It’s all those notes to sing and send to others
Each and every person forms a connection
We echo far away into the distance...

It’s all those words to share and bare with others
It’s all those notes to sing and shout to others
Then everyone holds hands in a big circle
With everyone together “tell your world” it’s everywhere!

The lyrics above are set to the tune of the Japanese Vocaloid song "Tell Your World" by Hatsune Miku. They are not a direct translation, as I wanted to be more creative. You can find the music video for the song here.

The chorus lyrics specifically are inspired by Zessei Bijin's English cover. You can find it here. Everything else is 100% my own words.

Sunday, 10 May 2015

How Cats Always Land on their Feet

My boyfriend recently showed me a gif he saw on Reddit that startled me at first, as it involved a cat falling from a high building. You'd think the cat would die, but it didn't. It righted itself to land feet first on the ground, and scurried away.

You can see the gif here: (Scary, isn't it?)

This blog post details the findings my boyfriend and I uncovered while researching how cats always land feet first, and how cats can survive falls from high buildings. Resources that aided this post's information are linked or mentioned throughout.

Why don't cats go SPLAT when they fall from a really high building?

Everyone who took physics knows that the reason why we all fall and come back down to earth is because of gravity. That is,

F = mg

The only force resisting this on earth is air resistance (drag):

F_drag = (1/2) Cd ρ A v²

where
  Cd = coefficient of drag (around 0.2)
  ρ  = density of air
  A  = projected area of the object
  v  = velocity of the object

Now, you probably know that terminal velocity is basically the maximum velocity a body reaches after some seconds of free fall. The velocity stays constant from then on because the air resistance force and the gravitational force cancel each other out, so the net force F = 0.

During free fall, a cat is usually splaying its body out like a flying squirrel.

When calculating the terminal velocity of the cat, we see that because the cat splays its body out, its "projected area" is bigger, so its terminal velocity is smaller. Measurements put the terminal velocity of an average cat at around 60 mph. Compare that to about 120 mph for an average skydiver.
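As a rough sanity check on those numbers, here is the terminal-velocity formula that falls out of setting drag equal to gravity. All of the masses, drag coefficients, and areas below are illustrative guesses on my part, not measured values:

```python
import math

# Terminal velocity: drag balances gravity, so m*g = (1/2)*rho*Cd*A*v^2,
# which gives v_t = sqrt(2*m*g / (rho*Cd*A)). Inputs are rough guesses.
def terminal_velocity_mph(mass_kg, drag_coeff, area_m2, rho=1.225, g=9.81):
    v_ms = math.sqrt(2 * mass_kg * g / (rho * drag_coeff * area_m2))
    return v_ms * 2.23694  # convert m/s to mph

# A splayed-out 4 kg cat presents a lot of area relative to its mass...
print(round(terminal_velocity_mph(4.0, 0.9, 0.10)))   # ~60 mph
# ...while a belly-down 75 kg skydiver does not.
print(round(terminal_velocity_mph(75.0, 1.0, 0.42)))  # ~120 mph
```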

Because of this, cats will generally not die from the impact force when landing on the ground. They will not fall to their death like a human likely would.

How does a cat always land feet first? (An Explanation of the 'Cat Righting Reflex')

First, I need to explain angular momentum. Angular momentum measures how much an object is rotating; it depends on the moment of inertia and the angular velocity of the body, L = Iω. It is defined in the picture below:

I is the moment of inertia (roughly, how much mass sits far from the axis of rotation), and ω is the angular velocity. This picture also shows why a ballerina twirls faster when she pulls her arms in: angular momentum is conserved. In case (a), I is big because her arms are spread out. In the second case, I is smaller, and since L must stay constant, ω must increase, so she spins faster.
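The ballerina example in numbers (all the values below are made up for illustration):

```python
# Conservation of angular momentum, L = I*w: with no external torque,
# halving the moment of inertia doubles the spin rate.
I_arms_out = 4.0   # kg*m^2, hypothetical moment of inertia, arms spread
w_arms_out = 2.0   # rad/s, hypothetical spin rate

L = I_arms_out * w_arms_out   # L stays fixed for the whole spin

I_arms_in = 2.0               # arms pulled in: mass closer to the axis
w_arms_in = L / I_arms_in

print(w_arms_in)  # 4.0 -- she spins twice as fast
```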

So, now, we should start explaining how a cat actually falls on their feet.

Before photography existed (so we could capture, frame by frame, how exactly a cat's body orients itself feet-first toward the ground), a cat flipping feet first seemed like some kind of paradox. This is because it appeared to violate the law of conservation of angular momentum.

This is described in the following youtube video:

A screenshot is below:

As the above screenshot of the video describes, it seems that every cat in the world is violating the law of conservation of angular momentum, since angular momentum is always conserved unless an outside torque acts on the body.

The only two forces that act on the cat are gravity and air resistance.

Gravity cannot really add or remove angular momentum from the cat: it acts on the cat's center of mass, which lies on the cat's axis of rotation, so it exerts no torque about that axis and cannot affect the conservation of the cat's angular momentum. The only other force is air resistance, but that isn't the correct answer either.

After carefully photographing what cats do in free fall, we realize that cats do not in fact violate this law, and do not use air resistance to help conserve angular momentum. But how do they conserve it, you may ask?

Please click the following gif, as it will explain to you how angular momentum is preserved when a cat is rotating to land feet-first:

A screen shot is here:

After looking at the gif, you can see now how angular momentum is actually conserved! Cats have a very flexible back, so they are able to do these weird rotations of their bodies to orientate themselves upright!

Other fun facts about how cats have 9 lives

How can the cat from the first link I shared in this post survive such a tall height? Read the following from Emily Russell's presentation on "The Physics of Cats" (2007)
A study released in the Journal of the American Veterinary Medical Association in 1987 looked at 132 cats which had fallen from high-rise windows in New York. The average fall height was 5.5 stories. 90% of the cats survived. Above a fall height of 7 stories (around 70 feet), the number of injuries the cat sustained actually decreased. This is because once it has turned over and reached terminal velocity, the cat relaxes its muscles, so that its landing is softer.
Neat, eh? TIL (Today I learned) how cats fall safely to the ground!

Friday, 8 May 2015

My Recent Discoveries and Ramblings on Tetraethyllead Gas & Alexander Gettler's Forewarnings

I am personally an avid fan of documentaries; I have watched them since I could first understand English after immigrating to Canada. Recently, though, it feels harder and harder to find the specific kinds of documentaries I like. Many of the ones I encounter are just interviews about various political and cultural events, often with ordinary average joes giving their biased opinions on such matters. I like documentaries that tell me facts, like science or history documentaries that give me a story about how a man discovered vaccination, or how D-Day was planned.

This brings me to a delightful documentary I recently watched, and was initially going to skip: "The Poisoner's Handbook: The Standards for the Rest of America". It chronicles the lives of two influential scientists, Alexander Gettler and Charles Norris, and their constant fight to legitimize forensic science, even when no one believed them, or when people had conflicts of interest in murders they were trying to solve with facts, not stupidity. Gettler was a genius in chemistry, and Norris studied medicine at Columbia. Together they ran New York's first "forensic science" office, if you could call it that. Back in those times (the early 1900s), many people did not accept the legitimacy of forensic science, so Gettler and Norris faced a lot of opposition, while all they were trying to do was make science a significant voice in the argument over whether a person was innocent or guilty of a crime.

Some of the bodies they investigated showed evidence of serious chronic lead poisoning. These people had worked in a factory producing what the car companies (e.g. GM) called "ethyl gas". That name was very euphemistic: the gas contained lead, in a form the human body absorbs easily, enough to make a person succumb to lead poisoning. Its proper name is tetraethyllead.

3D Ball and Stick Model of TEL Compound

So, the car companies wanted to use this new "ethyl gas" because it was very profitable (tetraethyllead was an anti-knock additive), and they told the public it was "safe". But after Gettler (the toxicologist) found that all of the factory workers who died had lead poisoning, he publicly denounced ethyl gas and called for it to be banned.

The documentary then describes how, through a bit of corruption, the car companies hired "industry scientists" who deemed ethyl gas safe as long as factory workers wore gloves and other safety equipment, and claimed the gas itself would not harm the public even with millions of cars emitting it. So "ethyl gas" went unbanned. By the end of the documentary I was still a bit confused, as it never explained what eventually happened to "ethyl gas". Was it banned? Did lots of people get poisoned from inhaling car exhaust over the decades? What really happened?

Well, a quick Google search gave a very startling result: tetraethyllead gas was only banned in most industrialized countries in 2000. It wasn't even banned before I was born! That honestly really startles me. Lead has been well known as a lethal poison for thousands of years! How did the car companies get away with this?

Although I haven't read more into the following yet, I am particularly interested and concerned about just how much lead (exactly) has accumulated in the environment since the 1920s (when ethyl gas was first widely used).

I then came across the name Derek Bryce-Smith. Reading about him, I noticed many striking similarities between his attempts to get ethyl gas banned and Gettler's back in the early 1900s. Bryce-Smith was also ostracized and criticized; the car companies didn't like what he had to say about the dangers of their gas. But both he and Gettler understood their social responsibility, and confronted the "dogma and financial interests" of the big car companies.

Here's a quote I'd like to share, from the Telegraph's article about Bryce-Smith:

As a result [of tetraethyllead gas], countless millions of children suffered damage to their brains. By the time leaded fuel was phased out in Britain, at least one child in 20 – by the government’s own figures – had been exposed at levels known to diminish intelligence: in India, the proportion was one in two. No one will ever know what potential was lost worldwide in consequence, or how many children developed severe learning difficulties. But raised lead levels have also been associated with ADHD, accelerated ageing, and even a greater tendency to criminal behaviour.
This is frightening. I never even knew all of this happened. I'm also disappointed that the general public had no idea about this either. I hope more people know about the story of Alexander Gettler (and his work to make forensic science legitimate in criminal cases), and of course, Derek Bryce-Smith, who campaigned against lead gas more than half a century ago, and died in 2011. Fortunately by the time of his death, only a few countries still used this lead-laden gas.


Resources: (where I found out TEL wasn't banned until 2000 in most industrialized countries, and discovered the social good that Derek Bryce-Smith set out, where he tried to ban ethyl gas.) (The Guardian's obituary of Derek Bryce-Smith) (Another article about Derek Bryce-Smith. Has a lot more detail than the Guardian's.)

Tuesday, 10 March 2015

Developing the "Footsies" App at PennApps

Author's Note: I have decided to do a blog post for each of the hackathons/projects I have done so far. I feel I can explain what I did and how each hackathon went much better in a blog post than by just boringly talking about it on my resume.

Author's Note 2: Thanks to Global Hackathon Seoul for giving me a Raspberry Pi as the winner of the "describe your hack" contest! I submitted this blog post as my submission!

Github Repo of the App:

My Experience at the Hackathon

Poster of PennApps Photographed at PennApps 2015

As you can tell from the above picture, I fortunately got the opportunity to attend PennApps for the first time! (I'd never gone from Canada to the US just for a hackathon before!)

We figured we wanted to build a hardware and health app, since PennApps was promoting those two types of hacks that year. So we got our hands on a Sensoria Sock, which is basically a sock with pressure sensors and an accelerometer; we could read its raw values into our Android application over Bluetooth, and interpret the data from there.

We got the idea to create a gait (walking) and posture analysis/rehabilitation app. Until now, most smart fitness devices have only measured heart rate, blood pressure, and other vitals; there has never been a consumer device that measures the pressure on the bottom of your foot (your gait). It is also novel because patients can send their gait data to doctors, and the app can help diagnose and rehabilitate walking problems, all from a mobile phone.


Posture and the way you walk are important. Very slight variations in walking posture, multiplied by thousands of steps a day, can mean extra stress on various parts of your foot, leg, and even the rest of your body. Thus, our target user is anyone interested in analyzing their own walking behavior. Some people may have been walking wrong their whole lives without even knowing it. The app lets people who might not even know they have common foot problems "self-diagnose" themselves (though obviously, consulting a real doctor is better).


Our app, Footsies, currently runs on Android. Although Sensoria has developed a suite of frameworks for the smart sock on iOS and Windows Phone, there was no major framework for Android. The technical challenge was therefore significant: we had to store and interpret all of the raw sensor data ourselves, manually analyze the accelerometer and pressure data, calibrate the app so it could accurately determine what counts as a step, and manually build a real-time visual map of the pressures on the foot. The app connects to the Sensoria smart sock via Bluetooth, and lets the user get a variety of information from the device.

It calibrates to each user's step pressure values through a short, easy process: the user poses in the four basic gait phases (pictured below) for a few seconds each, letting the app take more meaningful, fine-tuned measurements for accurate diagnosis, monitoring, etc.

We had the user calibrate the sensor with the four main stance phases of gait.

This kind of screen appeared for each calibration pose. The user would press the calibrate button, and after all the calibrations were done, the main menu (the figure below this one) would pop up.

The app has three main modes: diagnosis, monitor, and training.

In diagnosis mode, the user is asked to take 10 normal steps, and based on those steps the app attempts a diagnosis from the gait pressures measured. In particular, we focused on two "simple-to-diagnose" foot conditions (we had limited time at the hackathon, and could only research these two): pronation (inward-rolling feet) and supination (outward-rolling feet).

Our app could easily diagnose these, and with more time at the hackathon, we could have researched more common foot problems and diagnosed those conditions too.

Even if no meaningful conclusion is reached, the app graphs the data (gait phase type as a function of time), which may help to find correlations. This graph data is also sent to the Google Fit API and to our own server. The server helps "connect" doctors to patients by letting the doctor see the data remotely, without the patient needing to be physically present.

Screen that had an interactive circle that filled up as you took the 10 steps to diagnose your feet problem.

Note: I unfortunately didn't take a screenshot of the "diagnostics results" page... but that screen looked cool, trust me. We had a good designer (a.k.a. one of my teammates!). Below is a figure demonstrating roughly what the graph looked like (we just used a simple open-source interactive graph library for Android from Github).

Link to the library here:

In monitor mode, the user can watch the details of their every step. A "heat" map of pressure is shown, which can reveal, for example, whether a user has a habit of leaning inwards or outwards. Collected data can be sent to the doctor, so in essence our app can help patients with foot problems without forcing them to stay in the hospital; it can just as easily be used for therapy in the patient's own home.

Screenshot of monitor screen. The foot at the bottom was an interactive heat map I made, that highlighted with color the varying intensities of pressure on the foot. The official frameworks for Sensoria Sock on iOS and Windows Phone had this heat-map, but Android didn't, so I made it myself by heavily calibrating/modifying/hacking.

In training mode, the user can decide on a number of steps as a goal, and the app will count them to that goal while alerting them of any bad posture steps they made along the way. Footsies even has Pebble integration; the smartwatch will alert you via vibrations if you make a "bad step".

Screenshot of training screen. The graph updates in real time each time the user takes a step.