Image Annotation: Upgrading a Picture from Pixels to Knowledge
2. Image Annotation Approaches
Approaches to image annotation can be divided into three groups:
- Text-over-image annotation. Textual descriptions of individual elements or parts of the image are placed directly on the image, so it is clear which object each description refers to.
- Referencing system. Textual descriptions are moved out of the image. Various kinds of references link the descriptions to the corresponding elements.
- Captions. Textual descriptions are moved out of the image, and no reference marks are placed on the image. Instead, the description itself contains verbal references.
Clear identification can be provided by labeling an image with textual notes placed directly on it. This ensures a visual connection between an object and its label, so it can be considered a good way to provide semantic metadata about individual elements of the image.
The text-over-image group combines approaches that differ in where the labels are stored.
The first approach treats annotation labels placed on the image as an integral part of the picture. This can be done in many graphics programs that support text in images.
Advantages:
- Clear identification. As annotation labels are placed on the image, they appear next to the individual elements so it is easy to understand to which object each label refers.
- Transferability. Labels are visible in any program. If the image is shared, the annotation is never lost because it has become part of the image.
Disadvantages:
- Lack of searchability. Annotation is worth nothing if it cannot be processed in some way. Text in the annotation labels cannot be indexed and used by search engines in order to improve search results.
- Original image is irretrievably spoiled. This is not acceptable for most business applications. The only ways to hide the text are to edit the image or to keep the original picture alongside its annotated copy.
One way to keep the original view intact is to store the annotation separately from the image in a proprietary database. Special tools then retrieve the annotation and display it over the picture without merging it into the original image. This approach is implemented, for example, in Flickr, a popular photo sharing service owned by Yahoo.
Advantages:
- Clear identification. Flickr lets users annotate parts of the image and define the borders of the annotated region. This makes it clear exactly which object or area the annotation refers to.
- Original view remains intact. The annotation is stored in the proprietary database and displayed over the image. The markup can therefore stay invisible, and object-specific annotation is shown only when the mouse pointer rolls over the corresponding element.
Disadvantages:
- Lack of transferability. Annotations can be accessed only within the specific application. For example, an image stored on the Flickr site and accompanied by Flickr notes loses all its notes when it is downloaded or sent by email.
- Limited searchability. Because annotation and search are tied to a specific application or Web site, an image cannot be found by an annotation that refers to a specific element once the picture is located outside the proprietary environment. Search engine companies are trying to meet this challenge by integrating search results with images hosted on the photo sharing services that they own.
However, improvements to image search remain limited to the social networking sites that belong to the search engine company. (In today's market landscape, there is little doubt that Google will someday provide a mechanism for finding images on Flickr.) The rest of the images located throughout the Web have to be reached by relying on file characteristics rather than on the content of the image. (Read about the efficiency of image search in "Comprehensive and Relevant Image Search in the Web is about Precise Ad Targeting.")
- Inability to gain the "big picture". Since the annotation is displayed on demand, there is no way to know in advance which objects are annotated or to gain preliminary knowledge about the content of the image.
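The overlay approach described above can be illustrated with a minimal sketch. It assumes that each note is a record holding an image identifier, a rectangular region, and a text, and that the viewer looks up the note under the mouse pointer. The names `RegionNote` and `note_at` and the sample data are hypothetical; they do not describe how Flickr actually stores or serves its notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RegionNote:
    """A note bound to a rectangular region, stored apart from the image."""
    image_id: str
    x: int        # left edge of the region, in pixels
    y: int        # top edge of the region, in pixels
    width: int
    height: int
    text: str

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) lies inside the region."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def note_at(notes: List[RegionNote], image_id: str,
            px: int, py: int) -> Optional[RegionNote]:
    """Return the first note on the given image whose region covers the pointer."""
    for note in notes:
        if note.image_id == image_id and note.contains(px, py):
            return note
    return None


# Hypothetical records standing in for the service's proprietary database.
notes = [
    RegionNote("photo-1", 40, 30, 120, 80, "Chipset"),
    RegionNote("photo-1", 200, 150, 60, 40, "Memory"),
]

hit = note_at(notes, "photo-1", 50, 50)   # pointer inside the first region
miss = note_at(notes, "photo-1", 5, 5)    # pointer outside every region
```

The sketch also shows why the notes do not travel with the picture: the image file itself carries nothing, and everything depends on the `notes` store being reachable.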
The referencing-system group combines approaches that try to keep the original view intact by moving the annotation out of the image. To provide clear identification, these approaches offer different referencing systems that link descriptions to the corresponding objects on the image.
One of the widespread techniques used in various fields is marking individual elements with numbers which refer to descriptions placed next to the image.
1. Chipset
2. VGA Output
3. Accelerated Graphics Port interface
4. IDE interface
5. Memory
6. Interface Protocol Card Interface with standard VGA feature 26-pin connector
On the one hand, this approach is useful if the image contains many figures and the textual descriptions, especially lengthy ones, would obscure the view. Placing the descriptions outside the image solves this problem and helps identify even small objects. The approach is widely used on geographic maps to identify small countries when there is no room for on-map captions.
On the other hand, the annotation is now stored separately from the image, which makes sharing or moving the image to another location harder and requires creating a whole distribution package. The package should include not only the original image but also the information that links descriptions to the corresponding objects. If the reference numbers are merged with the picture while the descriptions are stored in a separate file, the original image is irretrievably spoiled.
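As a rough illustration of such a distribution package, the sketch below stores the numbered references in a sidecar manifest that would travel next to the image file. The file name `board.jpg`, the manifest layout, and the marker coordinates are assumptions made for this example; only the descriptions come from the legend above.

```python
import json

# Hypothetical sidecar manifest. Marker positions (x, y) are invented
# for illustration; the descriptions are taken from the legend.
manifest = {
    "image": "board.jpg",
    "markers": [
        {"number": 1, "x": 120, "y": 80, "description": "Chipset"},
        {"number": 2, "x": 310, "y": 40, "description": "VGA Output"},
        {"number": 5, "x": 260, "y": 200, "description": "Memory"},
    ],
}


def describe(manifest: dict, number: int) -> str:
    """Look up the description that a numbered marker refers to."""
    for marker in manifest["markers"]:
        if marker["number"] == number:
            return marker["description"]
    raise KeyError(f"no marker numbered {number}")


# The manifest is serialized to text so it can be shipped with the image,
# then parsed again by whoever receives the package.
sidecar_text = json.dumps(manifest, indent=2)
restored = json.loads(sidecar_text)
```

Because the marker positions live in the manifest rather than in the pixels, the original image stays untouched, but the image and the manifest must always be distributed together.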
Facebook offers a similar solution by providing an ability to highlight individual elements on the photo when a user rolls the mouse pointer over one of the descriptions appearing below the picture.
Advantages:
- Clear identification. As each object has a reference to the corresponding description, it is relatively easy to understand to which element the description refers.
- Original view remains intact. This approach keeps the original image intact which makes it similar to the technology implemented in Flickr.
- “Big picture” is available. Unlike Flickr, the Facebook approach helps gain the “big picture” as a user can view all descriptions and gain preliminary knowledge about content of the image.
Disadvantages:
- Lack of transferability. Since the annotation is stored in the proprietary database, it is lost when the image is published on the Web, sent by email, or shared by other means.
- Limited searchability. Like Flickr, it uses a proprietary database for storing annotation, so annotated photos can be found only within the specific Web site.
Another approach, based on a schematic copy, can help keep the original image intact, though it faces many other challenges. Unlike the approaches described above, the original picture does not contain any references. A schematic copy is created instead. It contains the contours of the figures with references to the descriptions.
See how the original image and its copy with contours can look.
Advantages:
- The original image remains intact.
Disadvantages:
- Limited transferability. Two images (the original and the schematic copy) now have to be distributed.
- Limited search. The image cannot be searched effectively: even if the text is indexed by a search engine, a search result will include the schematic copy rather than the original image.
- Problematic identification. A viewer has to switch attention constantly between the original image, the schematic copy, and the annotation. This makes working with image content less convenient, especially if the image contains many individual elements that should be annotated.
Another way to provide object-specific information without placing text directly on an image is creating fragments that contain a part of the image. Annotation is then written to each fragment individually.
See how the original image and its fragments can look.
Advantages:
- The original image remains intact.
- Clear identification.
Disadvantages:
- Limited transferability. Since the comments are detached from the image, they have to be distributed together with the original image. Moreover, the fragments must now be included in the distribution package as well, which makes this approach impractical for sharing.
- Limited search. The approach does not provide an effective solution for search either. Since the descriptions annotate specific fragments stored as separate entities, another referencing system is required to link the fragments to the original image. Otherwise, search results will include only the fragments, with no way to access the original picture.
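The extra referencing system mentioned in the last point can be sketched as follows. The sketch assumes that every fragment record keeps the name of its source image and the crop box it was cut from, so that a text search over fragment annotations can be resolved back to the original picture. The `Fragment` type, the file names, and the coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in the source image


@dataclass(frozen=True)
class Fragment:
    """An annotated crop that remembers where it came from."""
    fragment_file: str
    source_file: str
    box: Box
    annotation: str


def search(fragments: List[Fragment], query: str) -> List[Tuple[str, Box]]:
    """Return (original image, region) pairs whose annotation mentions the query."""
    needle = query.lower()
    return [(f.source_file, f.box)
            for f in fragments
            if needle in f.annotation.lower()]


# Hypothetical fragments cut from one original picture.
fragments = [
    Fragment("board_part1.jpg", "board.jpg", (40, 30, 160, 110), "Chipset"),
    Fragment("board_part2.jpg", "board.jpg", (200, 150, 260, 190), "Memory"),
]

results = search(fragments, "memory")
```

With the back-reference in place, a hit on a fragment leads to the original image and the region inside it; without it, the search result stops at the fragment file.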
One of the most widespread approaches to record object-specific information is writing comments appearing next to the image. These comments describe the objects in the same order as corresponding objects appear on the picture.
From left to right: Laura Bush (wife of U.S. President George Bush), Bernadette Choron de Courcel (wife of French President Jacques Chirac), Sousa Uva Barroso (wife of European Commission President Jose Manuel Barroso), Flavia Franzoni (wife of Italian Prime Minister Romano Prodi), Lyudmila Putin (wife of Russian President Vladimir Putin), Laureen Harper (wife of Canadian Prime Minister Stephen Harper), Cherie Booth (wife of British Prime Minister Tony Blair).
Photographing political leaders standing in one line is probably not just a matter of diplomatic protocol. It also ensures easy identification in "who is who" notes, because a reporter only has to list the names in the same order as the people stand in the photo.
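The ordering convention can be expressed as a small sketch. It assumes that the horizontal position of each person in the photo is known; the pixel positions below are invented for illustration, and the function name `left_to_right_caption` is hypothetical.

```python
from typing import List, Tuple


def left_to_right_caption(people: List[Tuple[str, int]]) -> str:
    """Build a caption that lists names by horizontal position in the photo."""
    ordered = sorted(people, key=lambda person: person[1])
    names = ", ".join(name for name, _ in ordered)
    return "From left to right: " + names + "."


# (name, x position of the person in pixels); positions are hypothetical.
people = [
    ("Cherie Booth", 910),
    ("Laura Bush", 60),
    ("Flavia Franzoni", 480),
]

caption = left_to_right_caption(people)
```

The caption identifies people only by their order. Once the image is cropped or mirrored, the order no longer matches, which is one reason this kind of identification is fragile.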
Technically, this approach is implemented in various tools that embed a description into the JPEG file.
Advantages:
- The original image remains intact. The original view is not spoiled as annotations are stored separately, but still in the same file.
- Transferability. Since such tools store the annotation in the JPEG file using recognized standards such as EXIF or XMP, the descriptions are available in many tools that support these formats and travel with the file wherever it goes.
Disadvantages:
- Unclear identification. This approach still treats the image as the minimum unit of content, as there is no convenient way to associate a description clearly with an individual object. While it can be used for storing an image name or a general description, it is less useful for providing information about specific elements of the image.
- Limited search. Although the annotation can be searched by standard search engines, a search result will include the whole image, with no possibility of referring to an individual element in it.
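To show what embedding a description into a JPEG file involves, here is a minimal sketch that writes an XMP packet with a `dc:description` property into an APP1 segment. It is a simplified illustration, not a replacement for a metadata library: it handles a single description, does not update or remove an XMP packet that is already present, and the function names `build_xmp_packet` and `embed_xmp` are hypothetical.

```python
import struct
from xml.sax.saxutils import escape

# Namespace header that identifies an APP1 segment as XMP.
XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"


def build_xmp_packet(description: str) -> bytes:
    """Wrap a description in a minimal XMP packet (Dublin Core dc:description)."""
    xml = (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
        '<rdf:Description rdf:about="" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:description><rdf:Alt>'
        f'<rdf:li xml:lang="x-default">{escape(description)}</rdf:li>'
        '</rdf:Alt></dc:description>'
        '</rdf:Description></rdf:RDF></x:xmpmeta>'
        '<?xpacket end="w"?>'
    )
    return xml.encode("utf-8")


def embed_xmp(jpeg: bytes, description: str) -> bytes:
    """Return a copy of the JPEG bytes with an XMP APP1 segment inserted."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    payload = XMP_HEADER + build_xmp_packet(description)
    length = len(payload) + 2  # the 2-byte length field counts itself
    if length > 0xFFFF:
        raise ValueError("description too long for a single APP1 segment")
    segment = b"\xff\xe1" + struct.pack(">H", length) + payload
    insert_at = 2  # right after the SOI marker
    if jpeg[2:4] == b"\xff\xe0":
        # Keep a JFIF APP0 segment first and insert the XMP segment after it.
        insert_at = 4 + struct.unpack(">H", jpeg[4:6])[0]
    return jpeg[:insert_at] + segment + jpeg[insert_at:]


# Structural stand-in for a real file: only the SOI and EOI markers.
tiny_jpeg = b"\xff\xd8\xff\xd9"
annotated = embed_xmp(tiny_jpeg, "Laura Bush & others")
```

The packet describes the file as a whole. Nothing in it says where in the picture a given person or object appears, which is the identification problem listed among the disadvantages.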