Training ARToolKit NFT to a new surface
From ARToolworks support library
About surfaces we can use with ARToolKit NFT
ARToolKit NFT tracks from natural features of planar textured surfaces. The current implementation of the tracking algorithm requires that the visual appearance of the surface is known in advance. We therefore have to "train" the system beforehand to the appearance of the particular surface we want to use for tracking. The output of this training is a set of data which can be used for realtime tracking in our application.
The following constraints apply to surfaces which can be used with ARToolKit NFT.
- The surfaces to be tracked must be supplied as a rectangular image. Currently only jpeg images are supported.
- The surface must be textured and have a reasonable amount of fine detail (i.e. it must have a low degree of self-similarity and high spatial frequency). Images with large areas of a single flat colour will not track well (if at all), since no unique features can be identified in the interior of those areas. Images that are blurred or have soft detail will similarly not track well.
- Larger or higher resolution images (more pixels) will allow the extraction of feature points at higher levels of detail, and thus will track better when the camera is closer to the image, or when a higher resolution camera is used.
The ARToolKit NFT 2.0 tracker does not impose any additional constraints.
Producing a digital image to be supplied to the training tools
The inputs to the NFT tracking process are
- a live image stream from a camera
- data (produced by the training tools) about the features of the tracked surface
- a digital image of the tracked surface itself
This section will help you in producing the third of these, the digital image.
Summary: A typical workflow for producing NFT markers proceeds thus:
- A high-resolution image which is to form the basis of the marker is obtained. If the source texture is on paper, it must be scanned. The image is saved in jpeg format.
- The resulting image is fed into the NFT training applications.
- If the image is to be printed, print on a good-quality colour printer, on low-gloss paper, to produce the final image which will be tracked.
Beginning with a pre-existing physical print
In many cases, it is simplest to start with a physical print of the tracked surface. This might also be best if
- you are augmenting the pages of a book, magazine, or other printed material for which you do not have the design artwork
- you have digital artwork, but the physical print differs considerably in brightness, colour or tone.
How big, and what resolution? During photography or scanning of pre-existing artwork, a natural question arises: what settings (scanner resolution or camera image size) to use? To answer this question, we must first ask two other questions:
- What is the physical size of the printed material? I.e., what is the width and height in inches or millimetres? Measure this accurately with a ruler. Common paper sizes include A4 (210 mm x 297 mm) and US Letter (8.5 inches x 11 inches, or 215.9 mm x 279.4mm).
- How close to the camera will the printed image be when in use? This relates to the required resolution, commonly expressed as pixels or dots per inch (DPI). To help answer this, use the "checkResolution" tool supplied with ARToolKit NFT. Click here for the checkResolution tutorial, then once you have determined the maximum resolution required, return to this page.
- If using a scanner or camera which needs a "resolution" setting, you can just directly use the maximum resolution calculated by the checkResolution tool.
- If using a scanner or camera which calculates in terms of image width and height, multiply the resolution calculated by the physical width and height:
width in pixels = width in inches x dots per inch
height in pixels = height in inches x dots per inch
If you have the measurements in millimetres, you can convert to inches by dividing by 25.4 (i.e. there are 25.4 millimetres in one inch).
inches = millimetres / 25.4
- If using a scanner or camera which requires a "megapixels" setting, calculate the required width and height in pixels (above) and then multiply these together and divide by 1 000 000. E.g. 640x480 = 0.3 megapixels.
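The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and not part of ARToolKit NFT, and the 150 dpi figure is an example value rather than one measured with checkResolution.

```python
MM_PER_INCH = 25.4


def pixel_size(width_mm, height_mm, dpi):
    """Return (width, height) in pixels for a surface of the given physical size."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


def megapixels(width_px, height_px):
    """Return the pixel count expressed in millions of pixels."""
    return width_px * height_px / 1_000_000


if __name__ == "__main__":
    # Example values only: an A4 page scanned at 150 dpi.
    w, h = pixel_size(210, 297, 150)
    print(w, h, round(megapixels(w, h), 1))
```

Use whichever of the three results (dpi, pixel size, or megapixels) matches the setting your scanner or camera exposes.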
Checking the result After scanning or photography is completed, check that the resulting digital image is not blurred and has sufficient contrast. Washed-out blurry images work very poorly in the NFT training process.
Beginning with digital artwork
Producing the digital image for tracking from pre-existing digital artwork is simple. Care must be taken however to ensure that the image supplied to the training tools is not too big (wasteful of memory, disk and CPU during tracking) and not too small (of insufficient detail to allow tracking when the camera is close to the image.)
How big, and what resolution? Similarly to beginning from printed artwork, answering the question of what size image to use must take into account the output factors:
- What is the physical size of the printed material? I.e., what is the width and height in inches or millimetres? If the image is to be a page in a book, then the size of the pages might determine this factor. Common sizes include A4 (210 mm x 297 mm) and US Letter (8.5 inches x 11 inches, or 215.9 mm x 279.4mm), although you might choose to track only a portion of the page.
- How close to the camera will the printed image be when in use? This relates to the required resolution, commonly expressed as pixels or dots per inch (DPI). Consider also the physical limit of your printer, as this imposes an upper limit on the required resolution. Most laser printers produce 300dpi black and white images, while colour printers usually use a dot-screen at 150 dpi (although they may advertise higher resolutions, almost all use a 150dpi resolution). To help answer the resolution question, use the "checkResolution" tool supplied with ARToolKit NFT. Click here for the checkResolution tutorial, then once you have determined the maximum resolution required, return to this page.
- Multiply the required maximum printed resolution by the physical width and height of the printed image to calculate the width and height in pixels (the "pixel size" of an image as reported in your image editing application).
width in pixels = width in inches x dots per inch
height in pixels = height in inches x dots per inch
If you have the measurements in millimetres, you can convert to inches by dividing by 25.4 (i.e. there are 25.4 millimetres in one inch).
inches = millimetres / 25.4
For example, borderless A4 at 150dpi is 1240 pixels wide and 1754 pixels tall. Borderless US Letter at 150dpi is 1275 pixels wide and 1650 pixels tall.
Checking the print After printing your digital artwork, check that the print is the correct size. A scaled print will still track, but will give scaled (and potentially misleading) tracking results (distance from camera etc.). Also, check that the print matches the artwork in terms of contrast, absence of print defects etc. Differences between the digital artwork and the physical print will reduce the robustness of the tracking, as some of the trained features may not be present on the print.
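To see why a scaled print gives scaled tracking results, consider a simple pinhole-camera argument: the tracker assumes the print has the size implied by the artwork, so if the real print is larger or smaller by some factor, reported distances are off by that same factor. The sketch below illustrates this under that assumption only. The function names are hypothetical, and this is not how ARToolKit NFT itself reports distances.

```python
def print_scale(measured_width_mm, expected_width_mm):
    """Ratio of the real print width to the width the artwork implies."""
    return measured_width_mm / expected_width_mm


def true_distance(reported_distance_mm, scale):
    """Correct a reported distance for a print that came out scaled.

    Assumption: a pinhole camera, so a print that is `scale` times the
    expected size appears at 1/scale of its true distance.
    """
    return reported_distance_mm * scale


if __name__ == "__main__":
    # Example values only: artwork meant to be 210 mm wide, printed 200 mm wide.
    s = print_scale(200, 210)
    print(round(s, 3), round(true_distance(500, s), 1))
```

Measuring the print with a ruler and comparing against the intended width is usually enough to catch a "fit to page" setting that silently rescaled the output.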
You can download the sample image "pinball.jpg" used in this tutorial here.
Physical print properties
Whether working from supplied printed material or a print from digital artwork, eventually the user needs an actual surface to hold in front of the camera.
It is important that the physical print is kept as flat as possible. Small amounts of curvature can be coped with by the tracker to some degree, but flat is best. Where possible, the print should be on or affixed to a physical prop that keeps it flat.
- If you were (for example) printing a label to be attached to a product, the label should be applied to a flat area of the product. The curved surface of a bottle or can would not be suitable, and alternatives could include the packaging holding the bottle or can, or on a flat label or tag attached to the product.
- If mounting in a book, surfaces should be printed on heavy card and bound with board-book, ring or spiral binding. If used as an unbound card, affix to the card with a dry glue (e.g. a glue stick or an industrial dry adhesive).
Decide on the image set resolutions
Most of the training procedure proceeds without much input from the user, but one important decision is required before starting the training utilities: selecting the resolutions at which features of the image will be extracted. (Generally, features are extracted at three or more resolutions to cope with the fact that dots in the image will appear at different resolutions to the software depending on how close or far away the camera is from the image.)
For a typical webcam operating at VGA (640x480) resolution and tracking at handheld-distance from the surface, a range of resolutions between 20 dpi and 120 dpi is a good starting point. If using a higher-resolution webcam or tracking much closer to the surface, higher resolutions will be required. Note that there is no point in using resolutions higher than the actual resolution of the final printed surface.
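One way to pick intermediate values between a minimum and a maximum resolution is to space them by a constant ratio, so each step covers a similar change in camera distance. The sketch below does this and also caps the maximum at the print resolution, as noted above. The geometric spacing and the function name are assumptions made for illustration; the training tools do not require any particular spacing.

```python
def resolution_steps(min_dpi, max_dpi, count=3, print_dpi=None):
    """Return `count` resolutions from min_dpi to max_dpi, geometrically spaced.

    If print_dpi is given, the maximum is capped at it, since resolutions
    above the printed resolution add nothing.
    """
    if count < 2:
        raise ValueError("need at least two resolutions")
    if print_dpi is not None:
        max_dpi = min(max_dpi, print_dpi)
    if min_dpi <= 0 or min_dpi >= max_dpi:
        raise ValueError("need 0 < min_dpi < max_dpi")
    ratio = max_dpi / min_dpi
    return [round(min_dpi * ratio ** (i / (count - 1)), 1) for i in range(count)]


if __name__ == "__main__":
    # The 20-120 dpi starting range suggested for a VGA webcam.
    print(resolution_steps(20, 120, count=3))
```

The values it prints are only a starting point for the prompts in the image set step; experimenting after a first training pass is still worthwhile.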
The utility program "checkResolution" can help with the decision of what values to use as minimum and maximum resolutions. Click here to see the usage instructions for checkResolution.
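For a rough feel of how camera distance relates to resolution, a pinhole-camera estimate can help: at a distance d from a surface viewed head-on, one millimetre of the surface spans about f / d camera pixels, where f is the focal length in pixels from the camera calibration. The sketch below applies that relation. It is not the algorithm used by checkResolution, and the focal length of 700 pixels is a hypothetical value chosen only for the example.

```python
MM_PER_INCH = 25.4


def apparent_dpi(focal_length_px, distance_mm):
    """Surface resolution (dpi) at which one printed dot covers one camera pixel.

    Assumes an ideal pinhole camera looking straight at the surface.
    """
    return MM_PER_INCH * focal_length_px / distance_mm


if __name__ == "__main__":
    focal_length_px = 700  # hypothetical; use the value from your own calibration
    for distance_mm in (150, 900):
        print(distance_mm, round(apparent_dpi(focal_length_px, distance_mm), 1))
```

The closest working distance suggests a maximum resolution and the farthest a minimum; treat the results as a cross-check on what checkResolution reports rather than a replacement for it.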
After completing a training pass, it will pay to come back to the choice of image set resolutions and experiment with different minimum and maximum resolutions. The choice depends greatly on the way in which you intend to use ARToolKit NFT for tracking, and your source images.
If you have further questions, it would pay to ask questions of the ARToolworks support staff, and/or other users of ARToolKit NFT, on the support forum.
Generating an ARToolKit NFT dataset from the digital image
As mentioned above, the inputs to the NFT tracking process are
- a live image stream from a camera
- data (produced by the training tools) about the features of the tracked surface
- a digital image of the tracked surface itself
This section will help you in producing the second of these, the trained data sets.
Surface training uses a set of utilities included in the ARToolKit NFT package. These utilities must be run from the command line. On Windows, this means you must open a “cmd” console and cd to the ARToolKitNFT\bin directory. On Unix systems (Linux and Mac OS X), open a terminal window and cd to the ARToolKitNFT/bin directory.
ARToolKit NFT for iOS uses a new simplified training procedure in which steps 1-3 and 5 below are combined into a single utility application, genTexData. Click here to read the usage instructions for genTexData.
1. Create an image set
In the first step, the source image is resampled at multiple resolutions, generating an image set (.iset) file. This contains raw uncompressed data which will be loaded into the app at runtime for tracking.
Run genImageSet.exe, providing the image as a command line argument. E.g.:
- Windows:
genImageSet.exe mycoolimage.jpg
- Linux / Mac OS X:
./genImageSet mycoolimage.jpg
You will be prompted for the resolutions you wish to use. (See the preceding section for advice on how to choose a good set of resolutions to use.) For each resolution, enter the value using the keyboard, then press return. The system will then prompt again for the next resolution. When all resolutions have been entered, just press return to end and move onto the image set generation.
Once the image set has been generated, the various image resolutions will be displayed on screen (shrunk/zoomed as necessary to fit on screen). Press spacebar to view the images, or esc when you're done.
2. Train the system to image features
In this step, the system trains itself to the features of the image at the various resolutions. This is the most time-consuming step in the process, and may take up to an hour for larger images with multiple resolutions. The output of this step is a set of featuremap (.fmap-xx) files.
Run genFeatureMap.exe, providing the image set as a command line argument. E.g.:
- Windows:
genFeatureMap.exe mycoolimage.iset
- Linux / Mac OS X:
./genFeatureMap mycoolimage.iset
3. Combine trained features into a set
In this step, a configuration file is generated combining the feature maps generated in the previous step. The output of this step is a feature set (.fset) file.
Run genFeatureSet.exe, providing the image set as a command line argument. E.g.:
- Windows:
genFeatureSet.exe mycoolimage.iset
- Linux / Mac OS X:
./genFeatureSet mycoolimage.iset
This application selects and saves good features for tracking. The result is saved in filename.fset. The output window displays features extracted from different image sizes. All selected features are shown inside red squares. Press space to view next image sizes.
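The three commands used so far follow a simple pattern, which the sketch below writes out for a given source image. It only builds and prints the command lines; it does not run them, because the tools prompt for input and open display windows. The function name is hypothetical, and deriving the .iset name by swapping the image's extension is an assumption based on the mycoolimage example above.

```python
from pathlib import PurePath


def training_commands(image_name, windows=False):
    """Return the command lines for steps 1-3 as lists of arguments."""
    # Assumption: the image set file is named after the image, with .iset
    # in place of the original extension.
    iset_name = str(PurePath(image_name).with_suffix(".iset"))

    def tool(name):
        return name + ".exe" if windows else "./" + name

    return [
        [tool("genImageSet"), image_name],
        [tool("genFeatureMap"), iset_name],
        [tool("genFeatureSet"), iset_name],
    ]


if __name__ == "__main__":
    for command in training_commands("mycoolimage.jpg"):
        print(" ".join(command))
```

Run each printed command in turn from the bin directory, answering the prompts as described in each step.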
4. Write a config.dat file to specify position, orientation and scale of the image's coordinate system
In this step, a config.dat file is created in a text editor. This file specifies the number of images per coordinate system (usually one, although there may be more than one if using NFT images on a cube or paddle), their image sets, and the transformation between the image and the coordinate system used for graphics to be overlaid.
The file format is very simple.
- The first line should be the number of textures to track.
- Then follow groups of lines, one per texture.
- The relative path to the iset file.
- A matrix, specifying the homogeneous coordinate transform (HCT) matrix to apply to go from the image coordinates to world (graphics overlay) coordinates.
A config file for one image set can be made in any text editor by copying the text below:
1
mycoolimage.iset
1.0000 0.0000 0.0000 0.0000
0.0000 1.0000 0.0000 0.0000
0.0000 0.0000 1.0000 0.0000
This will use the image set, feature maps, and feature set you have just generated.
If you wish to move these files from the bin directory, be sure to edit the pathnames in config.dat. Look at the pinball sample for an example.
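If you prefer to generate config.dat rather than type it, the format described above is simple enough to produce from a short script. The sketch below builds the file text for one or more textures. The function name is hypothetical, and writing the 3x4 matrix as one row per line with four decimal places is an assumption about layout; compare the output with the pinball sample before relying on it.

```python
IDENTITY_3X4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]


def config_text(textures):
    """Build config.dat text from a list of (iset_path, 3x4 matrix) pairs."""
    lines = [str(len(textures))]  # first line: number of textures to track
    for iset_path, matrix in textures:
        if len(matrix) != 3 or any(len(row) != 4 for row in matrix):
            raise ValueError("expected a 3x4 matrix")
        lines.append(iset_path)  # relative path to the .iset file
        for row in matrix:
            lines.append(" ".join(f"{value:.4f}" for value in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(config_text([("mycoolimage.iset", IDENTITY_3X4)]), end="")
```

With the identity matrix shown, the overlay coordinate system coincides with the image's own coordinate system.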
5. Interactively generate KPM template files
As well as the feature set and feature map training, an additional key point matching (KPM) data set must be interactively generated.
Before you begin
- Connect your webcam.
- Work out any webcam configuration required. Generally, if your webcam produces images greater than 800x600 resolution, it is recommended to either use the dialog box (where applicable) or set an ARToolKit video configuration environment variable to choose a resolution no greater than 800x600. A resolution of 640x480 is perfectly acceptable for NFT, and the greater frame rate achievable by using this resolution rather than a higher resolution is of more advantage than a larger frame size. You can see an example of setting the environment variable to adjust the webcam resolution in the example below.
- Carefully calibrate your camera (see the procedure here) and copy the calibration data into the ARToolKitNFT/bin/Data directory, overwriting the file camera_para.dat.
- Obtain a physical print of the surface you intend to use for tracking. See "Physical print properties" above.
Launching the makeKpmTemplate tool
Like the other NFT toolkit tools, the makeKpmTemplate tool is a utility which should be run from a command line window (cmd on Windows, terminal on Mac OS X/Linux).
Open a command line window, and navigate to the bin directory of your ARToolKitNFT install.
After setting any required video configuration, launch the makeKpmTemplate tool:
- Windows:
makeKpmTemplate.exe
- Mac OS X/Linux:
./makeKpmTemplate
Entering image size
The first prompt issued by the tool asks for the size (in millimetres) of the area to be tracked. You can measure this with a ruler, or if you printed at 1:1 scale from digital artwork at a known resolution (dpi), you can use the values from that artwork.
See "Producing a digital image to be supplied to the training tools" above.
Type the values in at the terminal prompt. You can enter decimal values (numbers with a '.').
Once you have entered both width and height, the program will load your camera calibration data and open the video window. It is worth emphasising here: from this point on, it is important that you are using a calibrated camera. If you are not, the data produced won't show errors, but it potentially won't work with other cameras, even if those cameras are calibrated.
Capturing images
In the next few steps, you will acquire images of your tracked surface with your webcam, and identify the corners of it. As much as possible, you should aim to capture the images of the surface from the same distances and angles as will later occur when using the finished data. This will help the tracking software perform better even when lighting, camera focus, and surface properties of the printed image are not ideal.
Place your image flat on a neutral-coloured surface, and point the webcam at it, using a typical distance and angle as you would later use for tracking.
The program's main window shows the live video and/or captured video in the left-half, and in the right half, up to four captured and trained images.
Aim the webcam so that you have the whole surface in the camera frame. Feature points in the image are marked with green "X" figures. The number of these points is shown (in green writing) at the top of the window, along with a value called the "Harris threshold". The Harris threshold is a value that when higher, selects fewer better-quality feature points. The maximum number of feature points per frame is 2000, but you should aim to have half or less than this number. Use the '1' and '2' keys on the keyboard to increase or decrease the Harris threshold value until the number of points is low enough.
You can immediately see a number of things about the image used:
- Flat areas with no texture provide no features to track. You should use source material with plenty of edges and fine surface detail.
- Blurry areas (such as the blurry face at the bottom of the printed image) are also poor areas for tracking.
- Some areas with fine detail but low contrast will be quite "noisy", with the green crosses flickering in and out; make sure you're doing this procedure in a well-lit area, and with a nice still webcam (use a tripod if available), and adjust the Harris threshold to produce the least amount of flickering from the largest number of points.
When you are happy with the camera image, click the left mouse button. The current video frame is captured and frozen. If you are not happy with the capture, you can press the right mouse button to return to the interactive display.
Clicking right mouse from the interactive capture display quits the program (at this stage, without saving any data thus far captured).
Marking the corners
The next phase requires you to carefully and precisely click on the corners of the rectangular area to be tracked. This tells the program which parts of the video image contain meaningful data and which parts (outside the rectangle) are irrelevant background texture.
Move the mouse to the top-left of your image, and click the left mouse button once. Aim as close as possible to the corner. If it is hard to precisely identify the corner, aim one or two pixels inside the corner rather than outside. A blue cross will appear at the point clicked.
In the console window, you will see the coordinates of the clicked point, and a prompt to click the next corner.
It is critical that the corners are identified in the exact same order every time. The correct order begins with the top-left, and proceeds anti-clockwise to bottom-left, bottom-right, and finally top-right.
Top / bottom / left / right refers to the actual printed image, not the position onscreen. It might help to write "top-left" etc. on the actual printed image (outside the border of course!) so that you don't inadvertently make a mistake when the onscreen image is rotated.
If at any stage you are unhappy with a corner placement, you can click the right mouse button to cancel all corners placed so far and return to the capture screen.
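Because the required order (top-left, bottom-left, bottom-right, top-right) always runs anti-clockwise around the print as seen by the camera, the coordinates printed in the console can be sanity-checked with the shoelace formula. The sketch below is a hypothetical helper, not part of makeKpmTemplate, and it assumes window coordinates with the origin at the top-left and y increasing downwards. It can only confirm the direction of the clicks; it cannot tell whether you started at the correct corner.

```python
def signed_area(points):
    """Shoelace formula: signed area of the polygon through `points` in order."""
    total = 0.0
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_anticlockwise_on_screen(points):
    """True if the points run anti-clockwise as seen on screen.

    With y increasing downwards, anti-clockwise order gives a negative
    shoelace area.
    """
    return signed_area(points) < 0


if __name__ == "__main__":
    # Example clicks: top-left, bottom-left, bottom-right, top-right.
    clicks = [(100, 80), (110, 400), (520, 390), (500, 90)]
    print(is_anticlockwise_on_screen(clicks))
```

A result of False means the corners were clicked clockwise, in which case cancel with the right mouse button and mark them again.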
Entering the page number
Once the last corner has been identified, the data inside the rectangular area you have identified will be processed. If the data is internally-consistent, you will be prompted to enter a page number. If the data is not internally consistent (e.g. the corners you have clicked cannot be mapped with sufficient accuracy to a plane, or your camera calibration data is vastly different from the camera in use) then the data set will not be added, and you will be returned to the capture screen. Try again.
The page number must be the same for all images from the same printed page. So if producing a KPM dataset for use with an application which tracks only one printed page, you would enter 0 each time you finish capturing an image, and when you have finished, save the resulting dataset.
If you are training pages for a multi-page book, e.g. for use with the mrDemo application, you should enter a different page number corresponding to the number of the printed page you are training, beginning with 0.
E.g., suppose you were training 3 pages for a multi-page book, and for each page, you want to capture from 6 different camera angles. To do this, you would run makeKpmTemplate 3 times (once for each printed page).
- During the first run, you would capture six images, and enter the page number 0 for all six images, and then save the dataset.
- Then you would swap the printed page and begin the second run, this time entering 1 for the page number for each of the six images.
- Finally, you would swap to the third printed page, run makeKpmTemplate a third time, entering 2 for the page number.
Onscreen display of dataset matching
Once the data set has been processed, the display changes. The captured image is placed in the right-hand side of the window, and a live camera image in the left. Lines are drawn between features identified in the live image, and matching features in the saved data set. The lines are green for no match, and red for a good match. So long as 4 good matches can be made, a reference frame will be found and tracking will run -- in this case, a set of red cubes is overlaid over the image, and a green rectangle of A4-paper size (210 mm wide and 297 mm tall) is also drawn.
This view allows you to see how well the data set you've acquired tracks when using actual live camera images. Move the webcam and the printed image around, and you will see that the tracking works better from some angles and distances than others. Try to identify a relative position of camera and print where tracking is poor.
This would then be a good relative position to acquire another image for a second lot of data in the set.
Adding more images
Once you have had enough of examining the tracking, click the right mouse button to return to the online capture screen. You should now continue to add more images to the dataset. It is recommended that you add at least 6 and up to 10 images to each data set. The more images you add, the more robust the tracking will be, at the expense of speed and data set size in memory when running live.
Be sure to try to get images from a number of orientations and angles.
Once you have mapped out the corners of the second image, it will be added to the second space on the right-hand side of the window. Now you can compare the tracking performance of the two different data sets.
The first four images will be displayed in this way.
Saving the final result
Once you have acquired several good images, it's time to save them. Press the s key to save the dataset.
In the console window, you will be prompted to enter a filename.
The suffix ".kpm" will be added to the name you enter, and the dataset saved into the resulting filename in the current working directory (usually the same directory as the makeKpmTemplate application).
From here you are ready to use the .kpm file with the other data to run a complete tracking example.
Testing the completed dataset
The easiest means of testing NFT datasets you have trained is to run them using the simpleNFT2 example program. Open a console window and change to the ARToolKitNFT bin directory.
Run simpleNFT2.exe, providing the relative path to the datasets as a command line argument. E.g., to launch simpleNFT2 with the pinball sample dataset:
- Windows:
simpleNFT2.exe Data/pinball
- Mac OS X:
./simpleNFT2.app/Contents/MacOS/simpleNFT2 Data/pinball
- Linux:
./simpleNFT2 Data/pinball
The tracking in this application is initialized by the KPM dataset. Once the reference frame has been detected, tracking is switched to feature-based mode and red 3D boxes are drawn on the images. If feature tracking fails, it is changed back to KPM-based tracking and yellow 3D boxes are drawn.
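The switching behaviour described above can be pictured as a two-state machine. The sketch below is purely illustrative: the mode names, function name and arguments are hypothetical, and it is not taken from the simpleNFT2 source.

```python
# Box colour drawn in each mode, as described for simpleNFT2.
BOX_COLOUR = {"feature": "red", "kpm": "yellow"}


def next_mode(mode, kpm_detected, features_tracked):
    """Return the tracking mode for the next frame.

    "kpm":     waiting for KPM matching to establish the reference frame.
    "feature": feature-based tracking, used until it fails.
    """
    if mode == "kpm":
        return "feature" if kpm_detected else "kpm"
    if mode == "feature":
        return "feature" if features_tracked else "kpm"
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    mode = "kpm"
    # Hypothetical per-frame results: (kpm_detected, features_tracked).
    for frame in [(False, False), (True, False), (False, True), (False, False)]:
        mode = next_mode(mode, *frame)
        print(mode, BOX_COLOUR[mode])
```

Watching the box colour while moving the camera therefore gives a quick indication of how often feature tracking is being lost.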
Moving on
Once you have generated a few marker sets, and seen the tracking response, you're ready to gain a deeper understanding of NFT tracking. You can read the reference documentation for more information.









