Showing posts with label computer science. Show all posts
Showing posts with label computer science. Show all posts

Sunday, 19 July 2015

Mini Blog: Classical Cryptography's Hill Cypher

The contents of this blog post today will detail about a cipher in classical cryptography. It is not used nowadays because it is easily vulnerable. In fact, the way it works can also be understandable to any high schooler or first year college student who has an elementary-level knowledge of linear algebra.

Say you have a message you want to decrypt, e.g. "I like pie". We convert each character into a numerical representation (e.g. perhaps ASCII). Then we form a matrix out of these characters. Let's call it matrix B.

Let's use the concept of matrix multiplication and invertible matrices in order to decrypt and encrypt this matrix B message.

We can easily encrypt it by matrix-multiplying it with an arbitrary matrix A. The resulting matrix, AB, will be sent as the "encrypted message". Note that of course, the number of columns in matrix A must be equal to the number of rows in matrix B for matrix multiplication of A and B to work.

Now, you can send this encrypted message AB to anyone. But how can they decrypt it? Simple. We can use the property of invertible matrices.

(A^-1)(AB) = B

So, we just need to multiply the inverse of matrix A to the encrypted message AB, to get back the decrypted message B. Note that A must be an invertible matrix then.


Saturday, 4 July 2015

Edge Detection For Dummies

Author's Note: I'm working on an application that uses an edge detection library at Microsoft. Obviously, as a lowly undergraduate intern that I am, all I need to do is call a method and it just magically detects edges on images. I wanted to understand more about edge detection and not just use "a library". This blog post is part of my "for dummies" series, where I try to explain hard concepts in a simple way without super-sophisticated language, so that the average person (like me) can understand the concept.

References for all the information I'm about to explain is at the bottom of this blog post.

Edge detection is what it sounds like. You have an image, and you want to detect "edges". Pictures are worth a thousand words, so the below figure should answer any of your questions on what "edge detection" is.

In the study of edge detection and image processing, an edge is formally defined as a "significant local change in the image intensity, usually associated with a discontinuity in either the image intensity or the first derivative of the image intensity". This sounds really hard, but bear with me here.

Let me first talk about image intensity, and a bit later in this post, I'll get back to the "first derivative of image intensity". Image intensity is a bit hard to describe, but for a normal greyscale image, like the one I'm about to show you below, it can be thought of as "dark vs light", or colour. If you want to learn more about image intensity, you can find it here.

Now I'll try to explain more about what they mean by "discontinuity" and "first derivative of image intensity". Discontinuities in the image intensity can be visualized as step discontinuities, where the image intensity value literally changes from one value to a totally different value. You can represent this as a step function. Discontinuities can also be in the form of line discontinuities, where the image intensity also abruptly changes value, but then returns to the starting value within some short distance. Think of a very thick, vertical, black line you draw in a paint application. As you go across the line, you go from the paint application's default white background, to black, and then to white again.

Now, in a perfect image, these discontinuities would be sharp and we can easily detect them. However, because we don't live an ideal world and because camera apps sometime like to "smoothen" captured photos, sharp discontinuities rarely exist in real photos. The step and line discontinuities you will actually obtain from processing the image will instead look like "ramp functions" or "roof functions". Examples of these two functions are drawn in the below figure.

In addition to sharp discontinuities not existing, another reason why developing a a good edge detection algorithm is hard is because images have "noise" in them, and we need to make algorithms pretty good at being able to filter out this "noise".

As you can see in this above figure, there is noise on the left most and right most part of the graph, where both parts seem to have this "ramp" function... but as you can tell from the actual image above the graph, those ramp-like parts are just noise. There is nothing on the left and right parts of the actual image.

I'm going to now try to write a short list of definitions of some commonly used terms, which will be used later in this blog post. I will try to make these terms as understandable as possible.


Edge Point - I'm sure you all know what a point means. An edge point is basically a point in the image that is part of what is considered to be an edge by whatever edge detection algorithm you use. So, basically, it is an (x, y) coordinate at a location where there is a significant image intensity difference.

Edge Fragment - An edge fragment is basically an edge point, along with that particular point's "edge orientation". An edge orientation is basically what direction/orientation the edge is. For example, I could draw a line horizontally on my paint application, and that edge would have a horizontal orientation. Obviously, this edge orientation is an actual angle/mathematical value, but I want to make things simple.

Edge - When image processing experts talk about "edges", they actually mean "edge points" or "edge fragments". Read the definition below on an edge detector if you don't know what I meant from that.

Edge Detector - an edge detecting algorithm that actually produces "edges" (a.k.a. edge points or edge fragments) from an image.

Contour - A contour is a list of "edges" (a.k.a. edge points/edge fragments) or the mathematical curve that models the list of "edges" (a.k.a. edge points/edge fragments), into something most of us define as an "edge".

Edge Linking - "Edge linking is a process of forming an ordered list of "edges" from an unordered list. By convention, "edges" are ordered by traversal in a clockwise direction". To simplify, edge linking is by linking edge points together on a small scale (e.g. within a few pixels of each other, we determine that edge point A and edge point B can be linked).

Edge Following - "Edge following is the process of searching the {filtered} image to determine contours." To simplify, edge following is where you look at the whole image by filtering out noises, and then determining the lines. It is edge detection on a big scale, rather than a small scale.

Edge coordinates could be in the coordinate system of the original image (e.g. an image of pixel width X and pixel height Y), though more likely, it will be in a different coordinate system as many edge detectors use a process called "filtering" in order to filter out noise. Filtering an image actually changes the coordinate system of the image, which may translate or scale the image coordinates.

Now, edge detectors cannot be perfect. The types of edges that can be obtained from an edge detector are: correct edges (so a human testing an edge detector can verify that "yep it identified this edge correctly"), false edges (where the algorithm mistaken a non-edge for an edge), and missing edges (where the algorithm should have identified an edge but didn't). False edges are called false positives, and missing edges are called false negatives.

Now, remember when I talked about the "first derivative of image intensity"? What does this mean? Well, let's take the example of a step edge associated with a significant difference in image intensity. The first derivative, if you've took an intro to calculus course, is just that - the first derivative. The slope. The below figure, taken from a neat website I referenced at the bottom of this blog post, has two images: the first one is the step function (though more like a continuous ramp function), and the graph below that is the derivative of that ramp function - a.k.a. the "first derivative of image intensity".

I will now define the "gradient". The gradient, as the figure above describes, is the first derivative of image intensity, and can be used to detect edges. How? Well, we know that the first graph can symbolize an edge because it goes from a low value to a high value very abruptly. Every time a change like this occurs, the first derivative has a local maxima! So, that means everytime a big enough local maxima occurs, we can determine that there is an edge point!

So, you can imagine that an image is just a huge 2D array of these continuous functions of image intensities, and by interpreting all of these graphs, we can eventually detect the edges in a whole image.

Now, the figure above only represents a 1D gradient and a 1D image intensity. Pictures are 2D, so we need to somehow define a 2D gradient. Below is a screenshot of a page from a textbook that describes the "formal definition" of 2D gradient:

Most edge detection algorithms uses something called a "mask" in order to approximate the gradient values in an image. An example of some different types of masks is shown in the below figure.

How do you use one of these masks, you might ask? The method is pretty simple. Let's say we are using the Sobel mask (which has a mask in the x-direction and the y-direction). For each 3x3 group of pixels, we apply Sobel's x- and y- masks by a process known as "convolution". An easy example of how to do convolution is in this link.

These matrices that get convoluted with the original source image's matrix are not only known as masks, but also filters, because they filter out noisy signals to get the signal that we want - the edges.

There are other types of filters as well, such as ones with sizes of 2x1, 1x2, et cetera. But these these are a bit hard to think about because there isn't really a "center" pixel in that mask/matrix thing. Some of these mask matrices can be very huge as well, which we will talk about below in this blog post.

The simplest numerical approximation for gradient would be:

 Gx = f[i,j + 1]- f[i,j]

and Gy = f[i, j] - f[i + 1,j]

j corresponds to the x direction and i to the negative y direction.

The corresponding masks for this simple gradient approximation are illustrated below:

Algorithms for edge detection generally contain these three steps:

Filtering: Images have noise. Filtering is used to reduce this noise. However, there's actually a trade-off, because by filtering to reduce noise, you could potentially reduce edge strength in the images as well. More filtering to reduce noise results in a loss of edge strength.

Enhancement: Enhancement emphasizes pixels where there are local "first derivative intensity value" maxima and are usually performed by computing the gradient magnitude. There are several ways to compute the gradient (e.g. Roberts, Sobel, etc.), which we will talk about below

Detection: In an image, there might be lots and lots of local maxima when looking at the first derivative of image intensity. How do we actually determine what is an edge and what isn't? Typically, making a threshold of e.g. "if the maxima is above X value, then it's an edge!" is used.

Many edge detectors have been developed in the last two decades. I will discuss now talk about some commonly used edge detectors.

Roberts Operator

The Roberts cross operator provides a simple approximation to the gradient magnitude:

Ok, maybe it doesn't look simple. You don't need to formally know the definitions, as you can just "use the mask". Here's the mask that you would use in the following figure though:

Sobel Operator

The Sobel operator is actually probably one of the most commonly used edge detectors, quoting from the below screenshot of the textbook I read. Also, the below screenshot gives you the Sobel matrix:

Second Derivative Operators

Now that I've talked about how to do edge detection using the gradient of image intensity, I will now try to explain how you can also use the "second derivative of image intensity" in order to do edge detection. The edge detectors I talked about earlier calculated the first derivative and, if it was above a certain X threshold, the presence of an edge point was assumed. This actually results in the detection of too many edge points. A better approach would be to find only the points that have local maxima in gradient values and consider them edge points. This means that at edge points, there will be a peak in the first derivative and, equivalently, there will be a zero crossing in the second derivative. 

Thus, edge points may be detected by finding the zero crossings of the second derivative of the image intensity. This is the second major way of edge detection, instead of just the "gradient" way. 

There are two operators in two dimensions that correspond to the second derivative: the Laplacian and second directional derivative. This sounds hard, but basically, there are two ways at looking at the second derivative, and correspondingly, two different ways of making an algorithm for edge detection using the "zero crossing" method.

I won't explain too much into this, other than the fact that Laplacian and second directional derivative can give you masks that you can use in your edge detection code, just like the "gradient" methods:

I will make a follow up blog-post that goes more into zero-crossing methods of edge detection, including talking about the canny-edge detector algorithm.


Saturday, 20 June 2015

Knapsack Problem For Dummies

This blog post will be a "knapsack problem explanation to dummies".

I hope this might help others who are also not CS majors, but want to know more about this famous problem (like I did).

The Knapsack Problem
You have an n amount of items. Each has a certain weight, w_i, and a certain cost, c_i. We want to fill up our backpack, that only holds a maximum weight W with these items. While doing so, we must maximize the total cost of the items in the backpack (our optimization goal!). Of course, we cannot exceed the weight W, so we can only fit a limited number of items, which means the maximum number of items in our backpack must be be less or equal to the n amount of items available to us.

There are two "types" of this knapsack problem. The first is the divisible knapsack problem, and the second is the indivisible knapsack problem. The difference between the two is that just one is the knapsack problem I just described, with the additional note that each of the n amount of items can be divided into a fraction of that i'th item, whereas the other one assumes that all n items are indivisible. So, you could think of the divisible knapsack problem as a problem where all the items are fully constructed lego toys, and that in order to truly maximize the lego blocks you can stash in your knapsack, you can of course try to divide each lego toy by ripping it apart to only take the most valuable lego parts.

Divisible Knapsack Problem
This version of the problem assumes that arbitrary fractions of items can be included in the knapsack (like the lego example I used). This problem can be solved using the following algorithm:
  1. Sort all of the items from the one that has the biggest c_i/w_i ratio (the cost per weight ratio) to the least cost per weight ratio. Typically the complexity of this sorting algorithm depends on which sorting algorithm you employ, e.g. merge sort would give you a complexity of O(nlogn).
  2. Loop through the list of items you just sorted in step 1, and as long as the knapsack is not yet full, add that i'th item into the knapsack.
  3. Once you got to the i'th item that CANNOT fit into the knapsack anymore, figure out how much the knapsack needs until it is truly full. For example, your knapsack's total allowable weight might be 5 kg, and from step 2, you have already filled it up to 4 kg. The next i'th item that could not be stored into the knapsack weighs 2 kg. Therefore, you break apart/cut apart that i'th item into halves in this case, and store one of those halves into your knapsack. Your knapsack should now be optimized for the best total cost per weight ratio.
This algorithm can be called a "greedy algorithm". A greedy algorithm is just a term for an algorithm that goes through a set of items, and after comparing some of the items e.g. items A and items B, it never goes back to compare it again. This should be ok in this scenario, unless somehow magically item A becomes a lot more valuable, which could occur in real situations. For example, let's say you were in some online MMO game where each player only has a limited inventory (based on "weight"). In this game, the value of all items change all the time depending on the in-game economy (e.g. lots of players might thought "AWESOME SWORD" was super OP at the time, and the in-market price for that was 1000 in-game currency, but then all of a sudden the MMO updated and now obtaining the "AWESOME SWORD" is a lot easier. What happens? The cost of the "AWESOME SWORD" goes down of course.)

Indivisible Knapsack Problem
Remember that this version of the knapsack problem is essentially the same, except that items must be included into the knapsack in all of its entirety. If the item as a whole does not fit into the knapsack, we cannot "cut it up into pieces" and fit those "pieces" into the knapsack.

This version of the knapsack problem actually makes the problem a bit more difficult.

We could go with a "greedy" approach. In fact, we could potentially use our previous greedy approach, which was to sort the items based on cost per weight, and fill the backpack with the items that had highest cost per weight. However, if you actually tested this with lots of test cases, you will discover that sometimes this approach will not always give you the most optimal solution.

There are other "greedy" approaches as well, including;
  • sorting by cost (not cost per weight!) and filling the knapsack until it is full
  • sorting by weight (from least weight to greatest), and filling the knapsack until it is full
These two may sometimes actually give the optimal result, but as with the cost per weight approach, they do not always give the optimal solution.

So, how do we solve the indivisible knapsack problem optimally every time? There is no greedy approach which is optimal, as none will 100% give an optimal solution.

However, we can use some concepts to help us achieve in finding the algorithm to solve this indivisible knapsack problem. These concepts include backtracking and dynamic programming.

Backtracking is basically just brute forcing (though I am leaving maybe a bit of stuff out for simplification.)

Dynamic programming can be explained better with an example first, and then a more general definition after. Let's take the example of the knapsack problem. The problem definition is pretty general. You have an arbitrary n amount of items, each with an arbitrary cost and an arbitrary weight. We can create a lot of more specific subproblems by purely inserting actual numbers into each of these variables. For example, we can try to solve the knapsack problem with a 5 kg knapsack, with 2 available items: a 5 kg $1 item, and a 2 kg $2 item. By solving many of these more specific problems, and remembering the actual numerical answer for each of these by saving them/caching them, whenever we need to solve the knapsack problem again for those particular specific situations (like the specific one described above), we can just "look up" the answer instead of computing the whole answer again.

Oh and also better still, not only is this caching of answers is efficient in itself, but with each cache of each answer, any future computations can become easier as well. Basically, future computations can rely on the previous cached answers to be even faster to be solved. What I just described is something called memoization.

For example, we all know the recursive solution for a fibonacci function. It's just to return F(n-1) + F(n-2). We can actually improve this using memoization. Each time the user/someone calls this fibonacci function, we cache the actual answer, including its sub-answers for F(n-1) and F(n-2). For example, say we use memoization. The user calls F(135). Then, whenever the function is called again, for maybe even a bigger number e.g. F(264), because we called F(135) the first time, the answer for F(135) is cached, and so are its subsequent sub-fibonacci problems e.g. F(134), F(133), F(132), ... , to F(3). Therefore, when calling F(264), the computer will just figure out what F(135) is in a blink of an eye, and compute all the way up to F(264).

Ok, I just explained a lot, and we will try to incorporate all of these new concepts into the actual solution for the indivisible knapsack problem.

We know that there should be an optimal solution, and that this solution includes some number of items. An easy general subproblem for the indivisible knapsack problem would be whether or not each item is actually part of the "optimal collection of items to be put into the backpack". We then solve all of these sub problems, and from there we can get the "best solution".

If n is in solution: the solution will have the cost of the n'th item in the solution.
Knapsack(n,W) = C_n + Knapsack(n-1, W-w_n)
Where that second term, Knapsack(n-1, W-w_n) is precomputed due to memoization.

If n is not in solution: the solution will not consider the n'th item's cost or weight at all.
Knapsack(n,W) = Knapsack(n-1, W)
Where Knapsack(n-1, W) is precomputed due to memoization.

The full and official solution for the indivisible knapsack problem is:
  1. Create a two-dimensional array with its rows being 0 to n, and its columns being numbers from 0 to W (max weight of the knapsack). Obviously the first column and the first row have values of all 0 because a backpack that holds 0 weight will have a maximum cost solution of 0, and a backpack that can contain a maximum of 0 items will have a maximum cost solution of 0.
  2. For each (i.j)th element in the array, where i is a number between 1 and n and j is a number in between 1 and W
    1. Initialize the (i,j)th solution as just SolMatrix[i-1][j] (which is the previously calculated solution without the i'th item considered, and only items 1 to i-1th items considered.)
    2. Change the (i,j)th solution to SolMatrix[i-1][v-w_i]+ cost_of_ith_item IF:
      1. (weight_of_ith_item ≤ j) AND
      2. (SolMatrix[i-1][j-weight_of_ith_item]+ cost_of_ith_item > SolMatrix[i-1][j]) 
I worry that the algorithm for the above indivisible knapsack problem is a bit hard to understand, from how I typed it up.

If you need any further explanations on the indivisible knapsack problem, please click the following links for more details, as they have helped me tremendously on understanding the concepts explained in this post:

Tuesday, 14 June 2011


Edit: I have deleted the picture of a graphic poster on this post. I accidentally deleted the picture along with a lot of other pictures that were saved on Google picasa, that apparently now stores Blogspot pictures. This graphic poster had the following text: "Keep Calm and Save Computer Science".

My local high school (which is probably the most academic school in my region) might be cancelling COMPUTER SCIENCE!!!

''Grade 11 Computer Science at WCI has been made inaccessible to most of the students who registered. Because of this the Grade 11 and 12 Computer Science program at WCI is in danger of being canceled.

The purpose of this page is to bring this issue to WCI students, their parents, and the administration of WCI. ICS3UI, Grade 11 Computer Science, has been put in the same block as Pre-AP Math. This means that around two thirds to three quarters of the students will probably have to drop the course, and the course will likely be canceled as a result. Students have talked to the administration, and they have acknowledged that there is no practical impediment to changing the time of the course. However, they refuse to do so. Mr. Damian is fine with moving the course, so this is not inconveniencing staff. The course is likely to be cancelled if it is not moved, so it would not hurt students either. There is no reason why the course block should be changed. This group was made to try to give the students who registered in the course (around 30) the opportunity to study Computer Science at WCI.''