Zen and The Art of Sketch Recognition: 2010

Tuesday, December 14, 2010

Reading #30 Tahuti

Summary
Tahuti is a sketch-based framework for creating UML diagrams, which is another application domain for sketch recognition. It also gives the option to type in text. In the user study, participants were more satisfied with Tahuti than with other UML diagram creation software.

Discussion
I believe UML diagrams should be relatively easy to recognize, so I expect recognition to work with high accuracy. Are there any more papers left to read?

Reading #29 Scratch Input

Comment:
Johnatan

Summary
This paper proposes a system that recognizes a sketch using only the sound of the sketching device. A stethoscope microphone is attached to the surface on which the sketch is made, and the audio is recorded during sketching. The audio is then processed, and recognition decisions are made by looking at the amplitude data. The author reports about 90% accuracy for some simple shapes.

Discussion
Having worked on the same idea, I can say that the problem is not an easy one. There are too many variations in the recordings of the same shape. Only very, very simple shapes can be recognized effectively. I also wonder whether there is any application domain for this idea.

Reading #28 iCanDraw

Summary
iCanDraw is a master's thesis by Daniel Dixon that assists people in drawing a human face by recognizing how well they are doing. It provides useful interface tools, such as rulers and markers, to help users adjust the proportions of the face elements in advance. It provides the user with a headshot to draw and checks the user's level of correctness by comparing the sketch to a contour set extracted from the photo.

Discussion
Dan is a cool friend of mine and actually I was a test user for this. It was a fun experience. I remember that I did better with the aid of the system. Can u hear me Dan??

Reading #27 K-Sketch

Summary
K-Sketch proposes an easy-to-use sketch interface for making simple 2D animations. In that sense, K-Sketch is to animation what Teddy is to modeling. The user can draw animation paths as strokes, and sketched figures are animated by translation along those paths. K-Sketch is intended mostly for entertainment among amateur users.

Discussion
I have seen a video of K-Sketch; it looked immature but promising. Also, my advisor knows the person who developed the system and told me that he actually applied to A&M for a faculty position once but did not get selected :/

Reading #26

Summary
This paper proposes a multi-player game to collect sketch data for researchers, which is the same deal as in reading #24. The game has drawing and describing modes. In drawing mode, the player sketches something based on a description text. In description mode, the player is expected to do the inverse: describe a given sketch in text. The description of one player is sent to another player to draw, and that drawing can be sent to yet another player to describe, so the message may change significantly along the way.

Discussion
This is just an extended version of paper #24.

Reading #25 Image Retrieval

Summary
This paper proposes a sketch-based query system for searching a database of millions of images. The backbone of the search system is the image descriptors, essentially data obtained by performing edge detection on the images. The descriptor of the sketch is obtained the same way an image descriptor is obtained.

Discussion
This is another application domain for sketching, one where words are not enough to describe things. It reminded me of another work, where a sketch-based query system was used to retrieve 3D models from a mesh database.

Reading #24 Games for Sketch Data Collection

Comment:
Drew

Summary
This paper came up with two simple game ideas that make collecting sketch data from users fun. In the first game, Picturephone, one or more players draw a sketch based on a description read by another player. In the second one, Stellasketch, the players individually label a sketch drawn by a user. At the end of the day, plenty of strokes should have been collected from the users.

Discussion
This is a cool idea indeed, if you need some sketch data. However, I am not sure how they can find a context for the data collected.

Reading #23 InkSeine

Summary
InkSeine is a note-taking app for Tablet PCs that uses a gesture-based interface for functions such as searching, linking, and gathering. The interface promotes linking and interaction between existing notes.

Discussion
This is actually more of an HCI paper than a sketch recognition one, though sketch recognition could be thought of as a subset of HCI. The system sounds neat, but I need to see it in action for a good evaluation.

Reading #22 Plushie

Comment:
Sam

Summary
Plushie is a system that allows inexperienced users to design their own plush toys using the Teddy system. The gesture-based interface provides tools tailored for manipulating the mesh for sewing. The system can also run a physics simulation on the mesh to simulate the dynamics of the real toy.

Discussion
I think this is a cool way to commercialize Teddy; Teddy has an almost grandma-friendly interface, and the models designed using the Teddy system look like nothing but plush toys. I'd like to know how well it did in the market.

Reading #20 MathPad2

Comment:
Chris

Summary
The paper presents a prototype application for sketching math equations and diagrams. The application allows handwriting of regular math notation and free form diagrams. The relations between the two can be established implicitly by the application or by explicit user gestures. This process is called association. The application can also solve some complex equations.

Discussion
I think this is a very important domain in sketch recognition. Writing equations and diagrams for papers using mark-up languages has always been painful for scholars. I would not care about the equation solver, but if this tool just worked fine in converting handwritten equations and diagrams to a mark-up language, it would be a useful one.

Reading #21 Teddy

Summary
Teddy is pretty much the paper that started sketch-based modeling. The idea is that, given a closed free-form curve, the system finds its medial axis and places evenly distributed circles along the medial axis, perpendicular to the projection plane. These circles are then combined by triangulation to generate a mesh. Using the same interface, the user can modify the mesh and create handles or holes.

Discussion
I actually implemented a fair number of ideas from this paper for my own research. At the time it was published, it was a revolutionary approach to modeling. However, it could not find real-life applications besides being a digital toy.

Monday, December 13, 2010

Reading #18 Spatial Recognition Text & Graphics

Summary

This paper is another one that uses relation graphs like #16, but it uses the graph to build spatial relationships between strokes. After building the graph, they use a classifier to classify the known subgraphs.

Discussion

It has been interesting to me again, since it used graphs, but I did not quite get how their classifier worked. If someone can comment on that, I'd be happy.

Reading #17 Distinguishing Text vs Graphics

Comment: sampath

Summary

This is another paper on distinguishing text from graphics. It classifies strokes with a feature set that includes the gaps between strokes, their relation to each other, and some characteristic stroke features; about 9 features are used in total.

Discussion
The classifier they use sounds like overkill to me. The entropy paper was a lot simpler and still had similar accuracy.

Reading #16 Graph Based Symbol Recognizer

Comment:
Johnatan.

Summary

As the name suggests, this paper recognizes symbols by building a relational graph and matching it against existing graph templates.

Graph matching is a hard problem in general (subgraph isomorphism is NP-complete). This paper uses four methods to find the closest match: Stochastic Matching, Greedy Matching, Sort Matching, and Error-Driven Matching.

The method scored about 90% accuracy when tested on about 20 shapes.
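To see why matching gets expensive, here is a minimal brute-force sketch. It is not one of the paper's four methods; it simply tries every assignment of template nodes to candidate nodes and counts mismatched relation labels. The node names, the relation labels ("perp", "parallel") and the cost function are my own assumptions for illustration.

```python
from itertools import permutations

def match_cost(template_edges, cand_edges, mapping):
    """Count template edges whose relation label is not matched in the candidate."""
    cost = 0
    for pair, label in template_edges.items():
        u, v = tuple(pair)
        if cand_edges.get(frozenset((mapping[u], mapping[v]))) != label:
            cost += 1
    return cost

def best_match(template_nodes, template_edges, cand_nodes, cand_edges):
    """Try every node assignment (factorial cost!) and keep the cheapest one."""
    best = None
    for perm in permutations(cand_nodes, len(template_nodes)):
        mapping = dict(zip(template_nodes, perm))
        cost = match_cost(template_edges, cand_edges, mapping)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best

# Hypothetical "U" shape: two sides perpendicular to a base, parallel to each other.
template_edges = {
    frozenset(("a", "b")): "perp",
    frozenset(("b", "c")): "perp",
    frozenset(("a", "c")): "parallel",
}
cand_edges = {
    frozenset(("s1", "s3")): "perp",
    frozenset(("s3", "s2")): "perp",
    frozenset(("s1", "s2")): "parallel",
}
cost, mapping = best_match(["a", "b", "c"], template_edges, ["s1", "s2", "s3"], cand_edges)
```

With n nodes this loop visits n! assignments, which is the reason approximate strategies such as greedy or stochastic search become attractive.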

Discussion
This idea is particularly interesting to me since I have been using graphs for my recognition assignments. In fact, our final project was almost entirely based on this idea. Graph matching is a hard problem, but you can reduce the complexity by exploiting geometric data, as both we and they did.

Saturday, December 11, 2010

Reading #15 Image Based Recognizer

Summary
This paper introduces an image-based approach to sketch recognition, as the title suggests. In this method, a sketch is converted to a 48x48 bitmap image and compared against bitmap templates. Rotation invariance is achieved using polar coordinate analysis. The method scored about 90% accuracy overall.
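A rough sketch of the bitmap idea, under my own assumptions: the stroke is scaled into a 48x48 grid and two grids are compared with the Tanimoto coefficient (shared inked cells over all inked cells). The polar-coordinate rotation handling is left out, and the function names are hypothetical.

```python
def rasterize(pts, size=48):
    """Scale a stroke into a size x size grid and return the set of inked cells."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    cells = set()
    steps = 2 * size  # sample each segment in steps of at most half a cell
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        for k in range(steps + 1):
            t = k / steps
            col = min(int((x0 + t * (x1 - x0) - min_x) / span * size), size - 1)
            row = min(int((y0 + t * (y1 - y0) - min_y) / span * size), size - 1)
            cells.add((row, col))
    return cells

def tanimoto(a, b):
    """Similarity in [0, 1]: inked cells in common divided by inked cells in either."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
moved = [(5, 5), (25, 5), (25, 25), (5, 25), (5, 5)]  # same square, scaled and shifted
diagonal = [(0, 0), (10, 10)]
same = tanimoto(rasterize(square), rasterize(moved))
different = tanimoto(rasterize(square), rasterize(diagonal))
```

Because the stroke is normalized to its bounding box before rasterizing, the comparison is insensitive to position and scale, but not to rotation.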

Discussion
This paper essentially discusses a grid-based method. The idea is similar to "electronic napkin". I am actually surprised to see their 90% accuracy. I'd have expected it to be lower than that.

Reading #14 Shape vs Text

Summary
This paper brings up the shape vs. text discussion again, approaching it from the entropy side, where entropy is a measure of the degree of randomness in a source. Text strokes generally have higher entropy than shape strokes, and the paper is based on this idea. It introduces an entropy model alphabet, which encodes the degree of curvature each letter goes through, and measures the density based on the bounding box of the letters. The method had 92% overall accuracy in distinguishing text.
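The entropy idea can be sketched in a few lines. This is a simplified version under my own assumptions, not the paper's exact model: each turning angle along the stroke is mapped to one of a few symbols (the bin count of 6 is arbitrary), and Shannon entropy is computed over the symbol counts.

```python
import math
from collections import Counter

def turning_angles(pts):
    """Absolute change of heading (0..pi) at every interior point of the stroke."""
    angles = []
    for a, b, c in zip(pts, pts[1:], pts[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        d = abs(h2 - h1)
        angles.append(min(d, 2 * math.pi - d))
    return angles

def stroke_entropy(pts, bins=6):
    """Shannon entropy (bits) of the turning-angle symbols; bins is an assumption."""
    symbols = [min(int(a / math.pi * bins), bins - 1) for a in turning_angles(pts)]
    total = len(symbols)
    counts = Counter(symbols)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

line = [(i, 0) for i in range(10)]
zigzag = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2), (1, 1), (2, 0)]
```

A straight line uses a single symbol and scores zero, while a stroke that keeps changing direction in different ways, as handwriting tends to, scores higher.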

Discussion
I believe that this paper does a better job than the ink features paper in handling this problem. I think entropy should be the way to go when it comes to distinguishing text from shapes.

Reading #13 Ink Features

Summary
This paper experimented with a method for distinguishing shapes from text using a set of 46 features, including curvature, speed, and intersections. The experiment was run with 26 participants, and only 8 of these features were found to be helpful for recognition.

Discussion:
The problem addressed in this paper is an important one but the contribution of the paper seems to be nothing substantial.

Reading #12 Constellation Models

Summary
This paper talks about a pictorial approach for recognizing strokes of certain classes. The approach describes the parts of the big picture in relation to other parts of the scene. The lower-level parts are recognized using a very simple feature set, such as bounding boxes, slopes, and diagonals.

Each model is trained with labeled data, and a probability distribution is computed for each object. The recognition process then becomes a maximum-likelihood (ML) search for the given case. The model was tested on facial recognition.

Discussion
The paper reminded me of LADDER, where constraints are declared between lower-level shapes to recognize higher-level shapes. For sure, this is a common recognition approach, but it can fail drastically if recognition of the lower-level shapes fails.

Reading #11 LADDER

Summary
This paper describes LADDER, a cool language developed for describing shapes for recognition purposes. Using LADDER, one can define shapes by describing geometric constraints between primitives such as lines, curves, arcs, rectangles, and polygons. These constraints define how the primitives should interact with each other to form a meaningful shape. The constraints, such as parallel, above, and perpendicular, are based on human perception rather than precise distance measures.

Using LADDER, it is also possible to construct high-level shapes from previously defined lower-level shapes. So it works like nesting matryoshka dolls in that sense. It is also possible to specify options to override recognition and allow beautification.

Discussion
LADDER is a first. I think it is pretty cool in that sense. I really like that it uses relaxed constraints based on human cognition rather than exact measurements. One challenge is that it becomes harder to define shapes programmatically as complexity and detail increase. However, it still works fine most of the time.

Monday, October 18, 2010

Reading #10 Herot

Summary:
This paper presents the Hunch system, a set of Fortran programs designed to process freehand sketches. The programs are designed to be modular: one program's output can be the input to another. STRAIT, the corner-finding program, finds corners based on speed and curvature. CURVIT fits the strokes to B-splines.

STRAIT used a latching method to merge vertices that fell within the same radius, which failed in some cases; in particular, corners that were apart in z value were being latched. Overtracing, a user's intention to represent a single line with several overlapping strokes, posed a similar problem: it was not easy to distinguish between two carefully drawn parallel strokes and overtraced ones. The program could also infer 3D data in perspective drawings by using some simple projection rules. It also had a room-finding algorithm for simple floor plans that worked by mapping the drawing to a simple 2D grid.
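A minimal sketch of radius-based latching, to show where it goes wrong. This is my own greedy version, not the Hunch code: each point snaps to the first earlier point within a fixed radius, and the radius value is an assumption.

```python
import math

def latch(points, radius):
    """Greedily snap each point to the first earlier anchor within radius."""
    anchors = []
    snapped = []
    for p in points:
        for a in anchors:
            if math.dist(p, a) <= radius:
                snapped.append(a)  # close enough: treat as the same vertex
                break
        else:
            anchors.append(p)      # nothing nearby: p becomes a new anchor
            snapped.append(p)
    return snapped

# Endpoints that were meant to meet get merged, as intended.
merged = latch([(0, 0), (10, 0), (10.4, 0.3), (0.2, -0.1)], radius=1.0)

# Two corners the user drew deliberately apart also get merged.
wrong = latch([(0, 0), (0.8, 0)], radius=1.0)
```

The second call is the failure case: a purely 2D distance test cannot tell an intentional small gap (or corners separated in depth) from sloppy drawing.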

Discussion:

Considering STRAIT and CURVIT, this paper could be thought of as the father of PaleoSketch. This paper was particularly interesting to me since, 30 years ago, they tried to do some of the work I'm doing for my research :) I was amazed to see that they ran into the same kinds of problems that I did, especially with latching and overtracing.

Saturday, October 16, 2010

Reading #9 PaleoSketch

Summary
PaleoSketch is a low-level recognizer that recognizes eight primitive shapes and their combinations. The system also produces beautified versions of these shapes. The recognition process starts with pre-recognition, where the stroke is processed and duplicate points are removed. Next, a series of graphs and values are computed from the stroke data. In addition, the paper introduces two new features: NDDE (normalized distance between direction extremes) and DCR (direction change ratio). Tails are also removed in the pre-recognition step. Finally, the stroke is tested for overtracing and closedness in this phase.

In the recognition process, strokes are tested against all 8 primitives one by one: Line, Polyline, Ellipse, Spiral, Arc, Circle, Curve, and Complex. Each test has its own requirements. Most of them compute the error between the best fit of the primitive and the original stroke and compare it against a threshold. If the stroke meets the requirements of a test, it is labeled as that primitive.

PaleoSketch uses a ranking system to pick the best primitive guess for the given stroke. Each primitive has an initial score based on its complexity. The score of a complex primitive is computed as the sum of the scores of its components. The best shape is the one with the lowest score.
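The ranking rule is easy to sketch. The base scores below are made up for illustration and are not the values from the paper; only the "sum the components, lowest wins" logic follows the description above.

```python
# Hypothetical complexity scores: simpler primitives cost less.
BASE_SCORE = {"line": 1, "arc": 3, "circle": 5, "ellipse": 5, "curve": 5, "spiral": 5}

def score(interpretation):
    """A primitive scores its base value; a complex shape sums its components."""
    if isinstance(interpretation, str):
        return BASE_SCORE[interpretation]
    return sum(score(part) for part in interpretation)

def best_interpretation(candidates):
    """Pick the candidate interpretation with the lowest total score."""
    return min(candidates, key=score)

# One arc (3) beats four line segments (4) ...
pick_a = best_interpretation(["arc", ("line", "line", "line", "line")])
# ... but two lines (2) beat a single curve (5).
pick_b = best_interpretation(["curve", ("line", "line")])
```

The effect is that a complex interpretation is only preferred when it is made of few, cheap pieces.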

Discussion
PaleoSketch is the best primitive recognizer available on the market :) It not only breaks a stroke down into primitive components but can also optionally beautify the output. Having used it for academic purposes, I must say that I was mostly satisfied with its performance.

Monday, October 11, 2010

Reading #8 $N recognizer

Summary
As the title suggests, this paper is an extension of the $1 recognizer that attempts to make it a multistroke recognizer. $N shares similar goals with $1, but it is more versatile by:
1. recognizing gestures comprising multiple strokes,
2. automatically generalizing from one multistroke template to all possible multistrokes with alternative stroke orderings and directions,
3. recognizing 1D gestures such as lines,
4. providing bounded rotation invariance.

$N precomputes all possible permutations of stroke orders and directions in a multistroke gesture. At runtime, when a multistroke gesture is drawn, its component strokes are connected in the order they were drawn to make a unistroke. The unistroke is then compared against the unistroke templates using Euclidean distance, as in $1. The best-matching template is picked.
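The precomputation step can be sketched with itertools. This is a simplified illustration with a hypothetical function name: every stroke order is combined with every forward/reversed choice per stroke, and the strokes are concatenated into one unistroke.

```python
from itertools import permutations, product

def unistroke_permutations(strokes):
    """All unistrokes from every stroke order and every per-stroke direction."""
    result = []
    for order in permutations(strokes):
        for flips in product((False, True), repeat=len(order)):
            unistroke = []
            for stroke, flip in zip(order, flips):
                unistroke.extend(reversed(stroke) if flip else stroke)
            result.append(unistroke)
    return result

# An "X" drawn as two strokes yields 2! * 2^2 = 8 unistroke templates.
x_gesture = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
templates = unistroke_permutations(x_gesture)
```

The count grows as n! * 2^n, so a three-stroke gesture already produces 48 templates; that is the price paid up front so that matching at runtime stays as simple as in $1.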

Reading #7 Sketch Based Interfaces

Summary
In this paper, the authors intend to bring the traditional pen-and-paper pipeline to user interfaces, particularly in design software. Their approach consists of three phases:

Approximation: fits lines and curves to a given set of noisy pixel data. The paper talks about how they combined speed and curvature data for corner finding.

Beautification: modifies the approximation and makes it more visually pleasing.

Basic Recognition: recognizes basic primitives such as circles, rectangles, and triangles.

The system helped users draw beautified shapes by freehand sketching alone, without the need to switch between different design tools (i.e., line tool, rectangle tool, circle tool).

Discussion
In my understanding, this paper serves as a good starting point for scholars outside the field. It covers most of the major aspects in sketch recognition and also introduces common problems in the field.

Wednesday, September 22, 2010

Reading #6 Protractor

Summary
Protractor, by Yang Li, introduces a template-based recognizer just like the $1 recognizer, but with a twist: the angle of the vector from the centroid to each sample point is used for error computation instead of distance. Protractor consumes less memory and time than the $1 recognizer. The algorithm is stated to be suitable for mobile devices.
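As far as I understand it, the angle in question is the angle between two gestures treated as long vectors, minimized over rotation in closed form. Here is a minimal sketch under that reading; resampling is skipped (both inputs are assumed to have the same number of points), and I use atan2 instead of a plain arctangent to avoid dividing by zero.

```python
import math

def vectorize(pts):
    """Center the points on their centroid and flatten to a unit-length vector."""
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    vec = [v for x, y in pts for v in (x - cx, y - cy)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def optimal_angular_distance(t, g):
    """Smallest angle between vectors t and g over all rotations of t."""
    a = b = 0.0
    for i in range(0, len(t), 2):
        a += t[i] * g[i] + t[i + 1] * g[i + 1]
        b += t[i] * g[i + 1] - t[i + 1] * g[i]
    theta = math.atan2(b, a)  # rotation that aligns t with g best
    cos_sim = a * math.cos(theta) + b * math.sin(theta)
    return math.acos(max(-1.0, min(1.0, cos_sim)))

shape = [(0, 0), (1, 0), (1, 1), (0, 2)]
rotated = [(x * math.cos(0.5) - y * math.sin(0.5),
            x * math.sin(0.5) + y * math.cos(0.5)) for x, y in shape]
other = [(0, 0), (1, 0), (2, 0), (3, 1)]
d_same = optimal_angular_distance(vectorize(shape), vectorize(rotated))
d_other = optimal_angular_distance(vectorize(shape), vectorize(other))
```

The closed-form rotation is what saves time compared to $1, which searches for the best angle iteratively.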

Discussion
Angle might seem to be a better metric than position at first sight, but any stroke point along the same vector gives the same angle regardless of its position, which may cause recognition errors.

Monday, September 20, 2010

Reading #5 $1 Recognizer

Summary
As the title suggests, this paper introduces a cheap way of doing gesture recognition. Due to its simplicity, the algorithm can be implemented on lightweight platforms such as browsers and mobile devices, and it was actually designed with that purpose in mind.

The recognizer has the following steps:
-Resample the stroke points to get a more uniform distribution of them along the stroke.
-Reset the orientation of the stroke by rotating it based on the indicative angle, the angle between the first point and the centroid.
-Scale and translate.
-Compute the total distance to each template and pick the template that minimizes the distance.
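The steps above can be sketched roughly as follows. This is a simplified version under my own assumptions: 32 resampled points, a 100-unit reference square, and no fine-tuning search over the rotation angle, which the full recognizer adds on top. It also does not handle 1D strokes such as straight lines.

```python
import math

def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def resample(pts, n=32):
    """Step 1: resample the stroke into n roughly equidistant points."""
    pts = list(pts)
    interval = path_length(pts) / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # rounding can leave the result one point short
        out.append(pts[-1])
    return out[:n]

def normalize(pts, n=32, size=100.0):
    """Steps 2 and 3: rotate by the indicative angle, then scale and translate."""
    pts = resample(pts, n)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    pts = [(x - cx, y - cy) for x, y in pts]      # centroid to the origin
    theta = math.atan2(pts[0][1], pts[0][0])      # indicative angle
    c, s = math.cos(-theta), math.sin(-theta)
    pts = [(x * c - y * s, x * s + y * c) for x, y in pts]
    w = max(x for x, _ in pts) - min(x for x, _ in pts)
    h = max(y for _, y in pts) - min(y for _, y in pts)
    return [(x * size / w if w else x, y * size / h if h else y) for x, y in pts]

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(stroke, templates):
    """Step 4: return the name of the template closest to the stroke."""
    cand = normalize(stroke)
    return min(templates, key=lambda name: path_distance(cand, normalize(templates[name])))

templates = {
    "vee": [(0, 0), (50, 100), (100, 0)],
    "square": [(0, 0), (100, 0), (100, 100), (0, 100), (0, 0)],
}
small_vee = [(10, 10), (35, 60), (60, 10)]       # scaled and shifted
turned_vee = [(0, 0), (-100, 50), (0, 100)]      # rotated by 90 degrees
```

Calling `recognize(small_vee, templates)` or `recognize(turned_vee, templates)` picks "vee", since normalization removes position, scale, and orientation before the point-by-point comparison.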

Discussion
I think this paper presents the most straightforward approach to sketch recognition. It is so straightforward that I could almost get the same type of answer if I asked my mum to describe a recognition algorithm. But I guess someone has to publish that paper and it should be there in the literature regardless of its complexity. But it works.

Monday, September 13, 2010

Reading #4 Ivan's Sketchpad

Summary
In this paper, Ivan Sutherland, one of the most influential names in CG, talks about an HCI device that predates even the mouse. Sketchpad is a pen-based system used to draw and define geometric shapes and to apply constraints to them.

The first part focuses on drawing the primitives and the overall design of the system. The second part explains the data structure of the system and the well-defined format in which geometries are stored for other applications. The ring structure allowed efficient operations on the shapes.

The rest of the paper talks about how the light pen is tracked and how the shapes are drawn to the screen. It also talks about the constraints that can be placed on shapes.

Discussion
I think that this paper is a foundation stone for CAD/CAM systems. It addresses the main issues of CAD systems, from user interface to data structures. Almost 50 years ago, the material was very innovative. Today, I feel like any decent computer scientist could solve it without much surprise.

Wednesday, September 8, 2010

Reading #1 Gesture Recognition (Hammond)

The paper is pretty much gesture recognition GREC 101. The first thing it stresses is that each gesture must be a single-stroke pen path and must be drawn in the same manner each time to be properly recognized.

Each stroke is represented as a vector in a gesture recognition system. The paper introduces two gesture recognition methods:

Rubine's method, which is the most popular one, computes 13 features of the stroke and uses a trained linear classifier to recognize a gesture. Rubine's method is reported to be 95% accurate.
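To make the feature-plus-linear-classifier idea concrete, here is a small sketch using only five of Rubine's features. The weights are invented for the example; in the real method they come from training on labeled gestures. The stroke needs at least three points.

```python
import math

def rubine_subset(pts):
    """Five of Rubine's stroke features (a subset, for illustration only)."""
    (x0, y0), (x2, y2) = pts[0], pts[2]
    d02 = math.hypot(x2 - x0, y2 - y0) or 1.0
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return [
        (x2 - x0) / d02,                                      # cosine of the initial angle
        (y2 - y0) / d02,                                      # sine of the initial angle
        math.hypot(max(xs) - min(xs), max(ys) - min(ys)),     # bounding-box diagonal
        math.dist(pts[0], pts[-1]),                           # first-to-last distance
        sum(math.dist(a, b) for a, b in zip(pts, pts[1:])),   # total stroke length
    ]

def classify(features, weights):
    """Pick the class with the highest linear score: bias + sum(w_i * f_i)."""
    def score(name):
        bias, w = weights[name]
        return bias + sum(wi * fi for wi, fi in zip(w, features))
    return max(weights, key=score)

stroke = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 4)]
features = rubine_subset(stroke)

# Hypothetical weights: "long" rewards stroke length, "short" is a constant.
weights = {
    "long": (0.0, [0, 0, 0, 0, 1.0]),
    "short": (5.0, [0, 0, 0, 0, 0]),
}
label = classify(features, weights)
```

At run time the classifier is just one dot product per class, which is why it is so fast once trained.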

Next comes Long's gesture recognizer, which extended Rubine's 13 features to 22 (it took 11 from Rubine's and added 11 more).

The paper finally introduces Wobbrock's $1 gesture recognizer, an easy-to-implement method that is slower than a linear classifier at run time. Not that you can find it in a dollar store :P

Discussion:
My personal scientific (?) opinion is that this paper is a great 101 to the subject matter. It is an easy read and popular gesture recognizers are clearly outlined. Maybe, I'd expect to see some more on Long's recognizer and its comparison to Rubine's.

Monday, September 6, 2010

Me, Myself, etc

Contact: mozgurgonen@yahoo.com

Standing: 1st year PhD w/masters.

Why 624: My research is on sketch based modeling & 2.5D sketching.

Can Bring: I have done a fair amount of work with graph theory; topology construction/manipulation.

In 10 Years: In front of the computer.

Biggest advancement in CS: Machine Learning will advance. Not skynet, but smarter software that customizes itself based on user needs.

Favorite Undergrad Course: Data Structures.

Favorite Movie: Le Fabuleux Destin d'Amélie Poulain. Because watching it always makes you feel better.

Time Travel: Larry Page & Sergey Brin, when they launched Google in a garage. It'd be fun to ask them where they saw themselves in ten years.

Interesting Fact: I thought of a Facebook-like college-networking site when I was in undergrad, but I was discouraged by some friends who claimed that there were gazillions of similar websites :/