Monday, 17 October 2016

MM 7152 DIGITAL IMAGE PROCESSING AND PATTERN RECOGNITION
M.E. / M.Tech. DEGREE EXAMINATION
FIRST ASSESSMENT TEST
PART - A  (5 X 2 = 10 Marks)
1. How many pixels are there in a 128 X 128 grey-scale image? How long would it take to transmit this image over a 256 Kbps modem?
2. List two imaging applications in the visible-light and microwave regions of the EM spectrum.
3. What are spatial convolution and correlation? What are the convolution and correlation of the one-dimensional image [1 3 5] with the mask [1 1]?
4. What is a basis image? What is a mathematical transform, and what are its advantages?
5. List the differences between image enhancement and image restoration.
PART - B  (2 X 5 = 10 Marks)
Answer any two questions


1. What is the DFT of the sequence {1, 3, 3, 1}? Verify that the inverse Fourier transform recovers the original sequence.
2. List the stages of an image-processing application with a neat diagram, noting the important points of every stage.
3. Consider the following 8 X 8 image intensity frequency distribution and apply histogram equalization and specification.

Intensity:  0   1   2   3   4   5   6   7
Pixels:     2   2   8   8  10  10  12  12

The target histogram is:

Intensity:  0   1   2   3   4   5   6   7
Pixels:     0   0  10  10  10  20  10   4

Plot the histograms.
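For reference, here is a minimal numpy sketch of how the equalization mapping for the given distribution can be checked (histogram specification would additionally map each equalized level through the inverse CDF of the target histogram). The variable names are mine, not part of the question:

    import numpy as np

    # Intensities 0..7 with the given frequencies (64 pixels in the 8 X 8 image).
    counts = np.array([2, 2, 8, 8, 10, 10, 12, 12], dtype=float)
    L = 8                                             # number of grey levels

    cdf = np.cumsum(counts) / counts.sum()            # cumulative distribution of the input
    equalized = np.round((L - 1) * cdf).astype(int)   # s_k = round((L - 1) * CDF(r_k))
    for r, s in enumerate(equalized):
        print('input level %d -> equalized level %d' % (r, s))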

Sunday, 17 May 2015

Computer Vision Resources

Textbooks
Conferences
The main computer vision conferences:
Other regular conferences of interest:
Journals and E-publishing
Technical papers
Trade publications
Newsgroups
At UCSB
Human and biological vision, perception
Other links of interest
Video, etc.
Tools
Programming
Matlab
Mathematics
Probability and Statistics

Thursday, 30 April 2015

Python - A Powerful Friend


"Remarkable power with very clear syntax"" 
- That is Python if explained in simple words.

Are you tired of writing long programs full of the extra symbols that C and C++ demand? Switch to Python. Everything is simpler there.

Python is an interpreted, general-purpose high-level programming language whose design philosophy emphasizes code readability. Its standard library is large and comprehensive.

Guido van Rossum is Python's principal author, and his continuing central role in deciding the direction of Python is reflected in the title given to him by the Python community, Benevolent Dictator for Life (BDFL).

The Python implementation is under an open source license that makes it freely usable and distributable, even for commercial use. The Python license is administered by the Python Software Foundation.

Some of its key distinguishing features (a few of them illustrated in the short sketch after this list) include:
  • very clear, readable syntax
  • strong introspection capabilities
  • intuitive object orientation
  • natural expression of procedural code
  • full modularity, supporting hierarchical packages
  • exception-based error handling
  • very high level dynamic data types
  • extensive standard libraries and third party modules for virtually every task
  • extensions and modules easily written in C, C++
  • embeddable within applications as a scripting interface
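A tiny, purely illustrative sketch touching a few of the features above (high-level dynamic types, exception-based error handling, and a standard-library module):

    import json                                # one of the many standard-library modules

    scores = {'alice': 91, 'bob': 78}          # a high-level dynamic dict, no type declarations

    def lookup(name):
        # exception-based error handling instead of error codes
        try:
            return scores[name]
        except KeyError:
            return None

    print(lookup('alice'))                     # prints 91
    print(json.dumps(scores))                  # the dict serialized as JSON text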
Large organizations that make use of Python include YouTube, BitTorrent, Google, Yahoo!, CERN, NASA and ITA. Most of the Sugar software for the One Laptop per Child XO, now developed at Sugar Labs, is written in Python.

At present, Python 3.2 has been released, but Linux distributions such as Fedora and Ubuntu still ship Python 2.7 by default.
(NB: in this tutorial we will be using Python 2.7.)

References:

Official Website:    http://www.python.org/
Documentation for Python 2.7:    http://docs.python.org/tutorial/

Recommended Books:


" A Byte of Python" by Swaroop C. H.
       Simplest tutorial on Python. A good location to start, if you are new to Python. Read the book here, http://www.ibiblio.org/swaroopch/byteofpython/read/

Dive into Python" by Mark Pilgrim.
        Next step after "A byte of Python". Read the book here,http://www.diveintopython.net/toc/index.html

Learning Python" by Mark Lutz.
       A really big book with more than 1000 pages. Have a look at the book here: "Learning Python"

Or, even better, just Google "Python tutorials"; you will get thousands of results. But I think the above three will be more than enough.
OpenCV (Open Source Computer Vision Library) is a library of programming functions aimed mainly at real-time computer vision and image processing, originally developed by Intel and now supported by Willow Garage. It is free for use under the open-source BSD license, is cross-platform, and contains more than 2000 optimized algorithms.

History:

OpenCV was started at Intel in 1999 by Gary Bradski for the purposes of accelerating research in and commercial applications of computer vision in the world and, for Intel, creating a demand for ever more powerful computers by such applications. Vadim Pisarevsky joined Gary to manage Intel's Russian software OpenCV team. Over time the OpenCV team moved on to other companies and other research. Several of the original team eventually ended up working in robotics and found their way to Willow Garage. In 2008, Willow Garage saw the need to rapidly advance robotic perception capabilities in an open way that leverages the entire research and commercial community and began actively supporting OpenCV, with Gary and Vadim once again leading the effort.

OpenCV's application areas include:
  • 2D and 3D feature toolkits
  • Egomotion estimation
  • Facial recognition system
  • Gesture recognition
  • Human–computer interaction (HCI)
  • Mobile robotics
  • Motion understanding
  • Object identification
  • Segmentation and recognition
  • Stereopsis (stereo vision): depth perception from two cameras
  • Structure from motion (SFM)
  • Motion tracking
To support some of the above areas, OpenCV includes a statistical machine learning library (see the short kNN sketch after this list) that contains:

  • Boosting
  • Decision tree learning
  • Gradient boosting trees
  • Expectation-maximization algorithm
  • k-nearest neighbor algorithm
  • Naive Bayes classifier
  • Artificial neural networks
  • Random forest
  • Support vector machine (SVM)
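As a small taste of that library, here is a minimal k-nearest-neighbour sketch. It assumes the OpenCV 3.x Python bindings (the cv2.ml module; older 2.4 releases expose a slightly different API) and uses randomly generated toy data rather than a real dataset:

    import cv2
    import numpy as np

    # Toy training set: 25 random 2-D points labelled 0 or 1 (illustrative data only).
    train_data = np.random.randint(0, 100, (25, 2)).astype(np.float32)
    labels = np.random.randint(0, 2, (25, 1)).astype(np.float32)

    knn = cv2.ml.KNearest_create()                    # OpenCV 3.x ML module
    knn.train(train_data, cv2.ml.ROW_SAMPLE, labels)

    sample = np.random.randint(0, 100, (1, 2)).astype(np.float32)
    ret, result, neighbours, dist = knn.findNearest(sample, 3)
    print('predicted label: %d' % int(result[0][0]))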
(All details extracted from Wikipedia.)

References:

OpenCV official page:     http://opencv.org/ 
OpenCV Documentation:    http://docs.opencv.org/
OpenCV Q&A Forum : http://answers.opencv.org/questions/
OpenCV Developer Zone : http://code.opencv.org/

Installing OpenCV from prebuilt binaries

  1. The Python packages below are to be downloaded and installed to their default locations.
    1.1. Python-2.7.x.
    1.2. Numpy.
    1.3. Matplotlib (Matplotlib is optional, but recommended since we use it a lot in our tutorials).
  2. Install all packages into their default locations. Python will be installed to C:/Python27/.
  3. After installation, open Python IDLE. Enter import numpy and make sure Numpy is working fine.
  4. Download the latest OpenCV release from the SourceForge site and double-click to extract it.
  5. Go to the opencv/build/python/2.7 folder.
  6. Copy cv2.pyd to C:/Python27/lib/site-packages.
  7. Open Python IDLE and type the following code in the Python terminal.
    >>> import cv2
    >>> print cv2.__version__
    
If the result is printed without any errors, congratulations! You have installed OpenCV-Python successfully.
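Once the import works, a quick smoke test is to read and display an image. This is only a sketch: 'test.jpg' is a placeholder for any image file on your disk, not a file that ships with OpenCV.

    import cv2

    img = cv2.imread('test.jpg')            # replace 'test.jpg' with any image on your disk
    if img is None:
        print('Could not read the image - check the file name and path')
    else:
        print('Image size: %d x %d' % (img.shape[1], img.shape[0]))   # width x height
        cv2.imshow('loaded image', img)     # show the image in a window
        cv2.waitKey(0)                      # wait for a key press
        cv2.destroyAllWindows()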

Wednesday, 29 April 2015

Toward practical compressed sensing


(Courtesy Larry Hardesty)
The last 10 years have seen a flurry of research on an emerging technology called compressed sensing. Compressed sensing does something that seems miraculous: It extracts more information from a signal than the signal would appear to contain. One of the most celebrated demonstrations of the technology came in 2006, when Rice University researchers produced images with a resolution of tens of thousands of pixels using a camera whose sensor had only one pixel.
Compressed sensing promises dramatic reductions in the cost and power consumption of a wide range of imaging and signal-processing applications. But it’s been slow to catch on commercially, in part because of a general skepticism that sophisticated math ever works as well in practice as it does in theory. Researchers at MIT’s Research Laboratory of Electronics (RLE) hope to change that, with a new mathematical framework for evaluating compressed-sensing schemes that factors in the real-world performance of hardware components.
“The people who are working on the theory side make some assumptions that circuits are ideal, when in reality, they are not,” says Omid Abari, a doctoral student in the Department of Electrical Engineering and Computer Science (EECS) who led the new work. “On the other hand, it’s very costly to build a circuit, in terms of time and also money. So this work is a bridge between these two worlds. Theory people could improve algorithms by considering circuit nonidealities, and the people who are building a chip could use this framework and methodology to evaluate the performance of those algorithms or systems. And if they see their potential, they can build a circuit.”
Mixed reviews
In a series of recent papers, four members of associate professor Vladimir Stojanovic’s Integrated Systems Group at RLE — Abari, Stojanovic, postdoc Fabian Lim and recent graduate Fred Chen — applied their methodology to two applications where compressed sensing appeared to promise significant power savings. The first was spectrum sensing, in which wireless devices would scan the airwaves to detect unused frequencies that they could use to increase their data rates. The second was the transmission of data from wireless sensors — such as electrocardiogram (EKG) leads — to wired base stations.
At last year’s International Conference on Acoustics, Speech, and Signal Processing, the researchers showed that, alas, in spectrum detection, compressed sensing can provide only a relatively coarse-grained picture of spectrum allocation; even then, the power savings are fairly meager.
But in other work, they argue that encoding data from wireless sensors may be a more natural application of the technique. In a forthcoming paper in the journal IEEE Transactions on Circuits and Systems, they show that, indeed, in the case of EKG monitoring, it can provide a 90 percent reduction in the power consumed by battery-powered wireless leads.
The reason the Rice camera could get away with a single-pixel sensor is that, before striking the sensor, incoming light — the optical signal — bounced off an array of micromirrors, some of which were tilted to reflect the signal and some of which weren’t. The pattern of “on” and “off” mirrors was random and changed hundreds or even thousands of times, and the sensor measured the corresponding changes in total light intensity. Software could then use information about the sequence of patterns to reconstruct the original signal.
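As a toy illustration of the same idea in one dimension, the sketch below measures a sparse signal through random 0/1 "mirror patterns" and reconstructs it with a simple greedy solver (orthogonal matching pursuit, used here only as a stand-in for the more sophisticated reconstruction algorithms such systems actually use). All sizes and names are illustrative:

    import numpy as np

    def omp(A, y, k):
        # Greedy orthogonal matching pursuit: recover a k-sparse x from y = A x.
        residual = y.copy()
        support = []
        x_hat = np.zeros(A.shape[1])
        for _ in range(k):
            j = int(np.argmax(np.abs(A.T.dot(residual))))    # column most correlated with residual
            if j not in support:
                support.append(j)
            coeffs = np.linalg.pinv(A[:, support]).dot(y)    # least-squares fit on selected columns
            x_hat[:] = 0.0
            x_hat[support] = coeffs
            residual = y - A.dot(x_hat)
        return x_hat

    np.random.seed(0)
    n, m, k = 256, 80, 5                          # signal length, measurements (m << n), sparsity
    x = np.zeros(n)
    x[np.random.choice(n, k, replace=False)] = np.random.randn(k)   # sparse "scene"
    patterns = np.random.randint(0, 2, (m, n)).astype(float)        # random on/off mirror patterns
    A = (patterns - 0.5) / np.sqrt(m)                               # centred, scaled measurement matrix
    y = A.dot(x)                                                    # m single-number measurements
    x_rec = omp(A, y, k)
    print('relative reconstruction error: %.2e' % (np.linalg.norm(x - x_rec) / np.linalg.norm(x)))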
Ups and downs
The applications the RLE researchers investigated do something similar, but rather than using mirrors to modify a signal, they use another signal, one that alternates between two values — high and low — in a random pattern. In the case of spectrum sensing, the frequency of the input signal is so high that mixing it with the second signal eats up much of the power savings that compressed sensing affords.
Moreover, the time intervals during which the second signal is high or low should be of precisely equal duration, and the transition from high to low, or vice versa, should be instantaneous. In practice, neither is true, and the result is the steady accumulation of tiny errors that, in aggregate, diminish the precision with which occupied frequencies can be identified.
An EKG signal, however, is mostly silence, punctuated by spikes every second or so, when the heart contracts. As a consequence, the circuitry that mixes it with the second signal can operate at a much lower frequency, so it consumes less power.
Abari, however, says he hasn’t given up on applying compressed sensing to spectrum sensing. A new algorithm called the sparse Fast Fourier Transform, developed at MIT, would modify the signal in the spectrum-sensing application in a way that offsets both the loss of resolution and the increase in power consumption. Abari is currently working with EECS professor Dina Katabi, one of the new algorithm’s inventors, to build a chip that implements that algorithm and could be integrated into future compressed-sensing systems.

What is DFT?

(Courtesy Larry Hardesty - MIT)

Science and technology journalists pride themselves on the ability to explain complicated ideas in accessible ways, but there are some technical principles that we encounter so often in our reporting that paraphrasing them or writing around them begins to feel like missing a big part of the story. So in a new series of articles called "Explained," MIT News Office staff will explain some of the core ideas in the areas they cover, as reference points for future reporting on MIT research.
In 1811, Joseph Fourier, the 43-year-old prefect of the French district of Isère, entered a competition in heat research sponsored by the French Academy of Sciences. The paper he submitted described a novel analytical technique that we today call the Fourier transform, and it won the competition; but the prize jury declined to publish it, criticizing the sloppiness of Fourier’s reasoning. According to Jean-Pierre Kahane, a French mathematician and current member of the academy, as late as the early 1970s, Fourier’s name still didn’t turn up in the major French encyclopedia the Encyclopædia Universalis.
Now, however, his name is everywhere. The Fourier transform is a way to decompose a signal into its constituent frequencies, and versions of it are used to generate and filter cell-phone and Wi-Fi transmissions, to compress audio, image, and video files so that they take up less bandwidth, and to solve differential equations, among other things. It’s so ubiquitous that “you don’t really study the Fourier transform for what it is,” says Laurent Demanet, an assistant professor of applied mathematics at MIT. “You take a class in signal processing, and there it is. You don’t have any choice.”
The Fourier transform comes in three varieties: the plain old Fourier transform, the Fourier series, and the discrete Fourier transform. But it’s the discrete Fourier transform, or DFT, that accounts for the Fourier revival. In 1965, the computer scientists James Cooley and John Tukey described an algorithm called the fast Fourier transform, which made it much easier to calculate DFTs on a computer. All of a sudden, the DFT became a practical way to process digital signals.
To get a sense of what the DFT does, consider an MP3 player plugged into a loudspeaker. The MP3 player sends the speaker audio information as fluctuations in the voltage of an electrical signal. Those fluctuations cause the speaker drum to vibrate, which in turn causes air particles to move, producing sound.
An audio signal’s fluctuations over time can be depicted as a graph: the x-axis is time, and the y-axis is the voltage of the electrical signal, or perhaps the movement of the speaker drum or air particles. Either way, the signal ends up looking like an erratic wavelike squiggle. But when you listen to the sound produced from that squiggle, you can clearly distinguish all the instruments in a symphony orchestra, playing discrete notes at the same time.
That’s because the erratic squiggle is, effectively, the sum of a number of much more regular squiggles, which represent different frequencies of sound. “Frequency” just means the rate at which air molecules go back and forth, or a voltage fluctuates, and it can be represented as the rate at which a regular squiggle goes up and down. When you add two frequencies together, the resulting squiggle goes up where both the component frequencies go up, goes down where they both go down, and does something in between where they’re going in different directions.
The DFT does mathematically what the human ear does physically: decompose a signal into its component frequencies. Unlike the analog signal from, say, a record player, the digital signal from an MP3 player is just a series of numbers, each representing a point on a squiggle. Collect enough such points, and you produce a reasonable facsimile of a continuous signal: CD-quality digital audio recording, for instance, collects 44,100 samples a second. If you extract some number of consecutive values from a digital signal — 8, or 128, or 1,000 — the DFT represents them as the weighted sum of an equivalent number of frequencies. (“Weighted” just means that some of the frequencies count more than others toward the total.)
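A small numpy illustration of that last point: build a "squiggle" from two known tones, hand it to the DFT, and read off the frequencies with the largest weights (the sample rate and tone frequencies are arbitrary choices for the example):

    import numpy as np

    fs = 1000                                    # samples per second (arbitrary for the example)
    t = np.arange(fs) / float(fs)                # one second of sample times
    squiggle = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)   # 50 Hz + 120 Hz

    weights = np.abs(np.fft.rfft(squiggle))      # DFT magnitudes (the "weights")
    freqs = np.fft.rfftfreq(len(squiggle), d=1.0 / fs)

    top_two = np.sort(freqs[np.argsort(weights)[-2:]])
    print('dominant frequencies (Hz): %s' % top_two)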
The application of the DFT to wireless technologies is fairly straightforward: the ability to break a signal into its constituent frequencies lets cell-phone towers, for instance, disentangle transmissions from different users, allowing more of them to share the air.
The application to data compression is less intuitive. But if you extract an eight-by-eight block of pixels from an image, each row or column is simply a sequence of eight numbers — like a digital signal with eight samples. The whole block can thus be represented as the weighted sum of 64 frequencies. If there’s little variation in color across the block, the weights of most of those frequencies will be zero or near zero. Throwing out the frequencies with low weights allows the block to be represented with fewer bits but little loss of fidelity.
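A rough numpy sketch of that idea on a single 8 x 8 block, using the 2-D DFT for illustration (image codecs such as JPEG actually use the closely related discrete cosine transform): compute the 64 frequency weights, discard the small ones, and transform back.

    import numpy as np

    np.random.seed(1)
    # A smooth 8x8 block: a gentle gradient plus a little noise (illustrative data).
    block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 4.0 + np.random.randn(8, 8)

    weights = np.fft.fft2(block)                                # 64 complex frequency weights
    threshold = 0.05 * np.abs(weights).max()
    kept = np.where(np.abs(weights) >= threshold, weights, 0)   # throw out the small weights

    approx = np.real(np.fft.ifft2(kept))                        # rebuild the block from what survived
    print('coefficients kept: %d of 64' % np.count_nonzero(kept))
    print('max pixel error:   %.2f' % np.abs(block - approx).max())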
Demanet points out that the DFT has plenty of other applications, in areas like spectroscopy, magnetic resonance imaging, and quantum computing. But ultimately, he says, “It’s hard to explain what sort of impact Fourier’s had,” because the Fourier transform is such a fundamental concept that by now, “it’s part of the language.”