Tell Me All You Know and Even More: Advanced Attribute Extraction, Part 2 (Learning Curve AutoCAD Tutorial)

31 Dec, 2007 By: Bill Fane

A page-by-page guide to data extraction using AutoCAD.

It was a surprisingly clear October day, pleasantly warm with a light breeze. Captain LearnCurve and his gorgeous wife were riding in a pedicab through the Hutong neighborhood of Beijing . . .

Wait a minute! Weren't you on special assignment to China in June?

Yes, the June trip was business, but the current trip is a vacation.

The Hutong neighborhood was definitely different from Wangfujing shopping street, where a Rolex is a Rolex and the Captain's hotel was next to St. Joseph's Cathedral. China is indeed a land of contrasts. Anyway, as the pedicab driver wound his way through the confusing maze of narrow, twisty lanes and alleys, the Captain wondered how they would ever extract themselves from the Hutong if something happened to the driver . . .

Extract! That's it! This month's topic!

In last month's "Learning Curve" column, "Tell Me All You Know: Attribute Extraction, Part 1," I started the topic of querying a drawing and extracting the information contained in attributes that are attached to blocks. I showed how this process has been improved significantly in three main steps over the years, until the AutoCAD 2006 version is actually quite useable. In particular, I showed how easy it was to use the Data Extraction wizard to extract the names and their x,y locations in an organization chart.

This month, let's see much more of the magic that is available to us. Open a large, complex drawing, such as c:\Program files\AutoCAD 200n\Sample\Sheet Sets\Manufacturing\VW252-02-0142.dwg, where n is your AutoCAD release number.

Now start the Data Extraction command (Tools | Data Extraction . . . ). We skipped through it pretty quickly last month, taking the defaults in most cases or just making selections without a lot of explanation. This time, let's look at the pages in a little more detail.

Basically, each page covers a specific segment of the extraction process.

Page 1: Action. This page is pretty simple, so I won't bother with an illustration. (Besides, I showed one last month.) It lets us create a new extraction definition or use or edit an existing one.

Page 2: Data source. The significant point to this page is that we can extract the data from the current drawing, a complete sheet set, or all the drawings in a complete folder with or without subfolders, or we can manually select specific objects from within the current drawing, more than one drawing, sheet sets, complete folders, or any multiple combination of all of the foregoing. Whew!

All extracted data ends up in one big extract file. For example, this file could be used to create a single door schedule for a large project that consisted of several multistory buildings, with each floor being shown on separate xrefed drawings.

Page 3: Select objects. Note that this page is not limited to just attribute values and a few of their properties - such as block name, layer, and x,y coordinates - as it was in earlier releases. In fact, that is why it's now called data extraction instead of attribute extraction, as in previous versions.

figure Page 3 of the Data Extraction wizard lets us specify the type of object from which we want to extract data.

Let's start with the Display Options section. The default is to Display All Object Types. We want to shorten the list a bit, so we need to uncheck the square Display All Object Types button and check the round Display Non-Blocks Only button. This selection shortens the list to show just the lines, circles, arcs, and so on. While we're at it, let's uncheck all the object types except Circle.

Wait a minute! I thought this was about attribute extraction! You haven't selected any blocks with attributes!

My point exactly. It's now about data extraction. Click Next to see what I mean.

Page 4: Select properties. This page shows a list of all the possible data that can be extracted from our selected objects.

figure We can select many different types of data to extract from our selected objects.

I have stretched the window in the illustration to show all the possibilities. Pretty impressive, isn't it?

Let's simultaneously look at Column 3 and at the Category Filter window. Initially, all categories are checked, so they all display. Next, uncheck the filter categories one at a time from the top down. As you do so, those items whose entry in the Category column matches the filter name will disappear from the Property list. Try turning various combinations on and off and watch the results.

Three of the categories are pretty obvious. For example, Geometry lists things such as x,y coordinates, radius, and diameter. On the other hand, where did the Drawing properties come from? We'll come back to this topic later. For our present purposes, turn off all filters except Geometry and then turn off all properties except Center X, Center Y, and Diameter.

figure This page of the Data Extraction wizard shows our short list of properties to be extracted.

Click Next to move on.

Page 5: Refine Data. At this point, we have fully defined the raw data to be extracted. Next, let's fine-tune how the output will appear.

figure Page 5 lets us sort and fine-tune the data.

By now, AutoCAD has actually performed the full extraction and is listing all of the resulting data. In our case, it's too much to show in the single window. But we can use the scroll bar on the right-hand edge to scroll up and down the list. Full Preview displays all of the data without the surrounding command options. If you invoke this view, click on the red X in the upper right corner to close it and to return to page 5.

The operation of the three buttons in the lower-left corner should be obvious, except for one minor point: to combine identical rows, every entry in the row must be exactly identical. Two circles with the same Diameter and Center X values but different Center Y values are not identical.
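For readers who like to see the logic spelled out, here is a minimal Python sketch of the "combine identical rows" idea. It is only an illustration, not how AutoCAD does it internally; the sample values and the combine_identical_rows name are made up for the example.

```python
from collections import Counter

# Hypothetical extracted rows: (diameter, center_x, center_y).
rows = [
    (10.0, 50.0, 20.0),
    (10.0, 50.0, 20.0),  # exact duplicate of the first row
    (10.0, 50.0, 35.0),  # same Diameter and Center X, different Center Y
]


def combine_identical_rows(rows):
    """Collapse rows that match in every column, keeping a count of each."""
    counts = Counter(rows)
    # Counter keeps keys in first-seen order (Python 3.7+).
    return [(count, *row) for row, count in counts.items()]


combined = combine_identical_rows(rows)
for row in combined:
    print(row)
```

Only the first two rows merge, because the third differs in a single column.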

The Sort Columns Options . . . button is used to (surprise!) set the sorting options. It brings up the following dialog box:

figure This dialog box is used to define the sorting order for the columns.

Initially, the box is empty except for the Select column entry. This drop-down list displays the column names. Click a column name to select it, click either Ascending or Descending from the drop-down list, and then click Add to add it to the sort order list. Note that the precedence of sorting is from the top down, so that in our case the circles will be sorted by diameter. If diameters are duplicated, then each group of duplicates will be sorted by Center X. And if any duplicates remain, the final sorting for each subgroup will be by Center Y. You can add and remove sort specifications and move a selected specification up and down the priority list. Click OK to return to Page 5 of the wizard.
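The top-down precedence described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not AutoCAD's own code: the circle values, the sort_specs list, and the sort_rows function are all hypothetical.

```python
# Hypothetical extracted circles, keyed by column name.
circles = [
    {"Diameter": 12.0, "Center X": 40.0, "Center Y": 10.0},
    {"Diameter": 8.0, "Center X": 90.0, "Center Y": 25.0},
    {"Diameter": 12.0, "Center X": 40.0, "Center Y": 5.0},
    {"Diameter": 8.0, "Center X": 30.0, "Center Y": 25.0},
]

# Sort specifications in priority order: (column name, descending?).
sort_specs = [("Diameter", False), ("Center X", False), ("Center Y", False)]


def sort_rows(rows, specs):
    """Sort by several columns, with the first spec taking precedence."""
    result = list(rows)
    # Python's sort is stable, so sorting by the lowest-priority column
    # first and the highest-priority column last gives the combined order.
    for column, descending in reversed(specs):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


sorted_circles = sort_rows(circles, sort_specs)
for circle in sorted_circles:
    print(circle)
```

Moving a specification up or down the priority list corresponds to reordering the entries in sort_specs.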

Next, we come to two very subtle fine-tuning specifications.

The first specification is not directly obvious, but it can be inferred from normal Windows dialog box behavior. We can change the left-to-right column order simply by clicking and dragging the column heading name and then dropping it in the new location. In the figure below, I have turned off the Name column and moved the Diameter column to the left of Center X and Center Y. Note also the sort order within the groups.

figure I have revised the column sequence and specified sort orders and priorities for each column.

The other fine-tuning specification is so subtle as to be completely invisible, and yet it can be a very powerful and useful one.

Right-click on the Diameter column heading to bring up the context menu at right.

figure Right-clicking on a column heading name brings up this context menu.

Many of the selections are just alternate ways of accessing some of the settings we have already studied, but there are also a few unique ones.

Once again, all the options aren't obvious until you explore a bit. For example, Insert Totals Footer adds an extra row at the bottom that can display the minimum, the maximum, the sum total, or the average of all the values in the selected column.

Similarly, Insert Formula Column lets us add a column that displays the result of simple arithmetic calculations (+, -, *, and /) on other column values.
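To make the difference between the two concrete: a totals footer produces one summary value for a whole column, while a formula column produces one new value per row. Here is a rough Python sketch with made-up diameters; the variable names are hypothetical, and the radius formula is just one example of simple arithmetic on a column.

```python
from statistics import mean

# Hypothetical Diameter column from an extraction.
diameters = [8.0, 12.0, 12.0, 20.0]

# Totals footer: a single summary value for the whole column.
footer = {
    "min": min(diameters),
    "max": max(diameters),
    "sum": sum(diameters),
    "average": mean(diameters),
}

# Formula column: one computed value per row, here Diameter / 2.
radius_column = [d / 2 for d in diameters]

print(footer)
print(radius_column)
```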

Here's another interesting one. Right-click on the Diameter column header and then click on Filter Options. This selection brings up a dialog box (below) that lists all the different diameters of every circle in the drawing!

We can uncheck certain values to have them drop out of our extracted data list, or we can select an option from the Filter Based On . . . drop-down list. This list lets us specify conditions and values so that we only get values greater than, less than, greater than or equal to, less than or equal to, equal to, and not equal to a specified value. Or we can filter for everything between or outside two specified values.

figure The Filter list lets us narrow the range of our selected output values.

There is a bit of a quirk in the latter two choices; neither "between" nor "outside" includes the filter values. For example, filtering "between 10 and 25" or "outside 10 and 25" will not include values of 10 or 25. To get these items, you must enter something like "between 9.999 and 25.0001" to bracket the desired range.

Note that different filters can be applied simultaneously to different columns, where they act as an AND function. For example, we can filter all diameters less than 10 and all Center X values greater than 300. An entry has to meet both conditions to be accepted.
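Both the "between" quirk and the AND behavior are easy to mimic in a short Python sketch. Everything here is hypothetical (the between, outside, and apply_filters helpers and the sample rows); it simply models the behavior described above rather than anything inside AutoCAD.

```python
def between(low, high):
    """Strict 'between': the two end values themselves are excluded."""
    return lambda value: low < value < high


def outside(low, high):
    """Strict 'outside': the two end values themselves are excluded."""
    return lambda value: value < low or value > high


def apply_filters(rows, filters):
    """Keep a row only if it passes the filter on every listed column."""
    return [
        row for row in rows
        if all(test(row[column]) for column, test in filters.items())
    ]


# Hypothetical extracted rows.
rows = [
    {"Diameter": 10.0, "Center X": 350.0},
    {"Diameter": 12.0, "Center X": 350.0},
    {"Diameter": 12.0, "Center X": 100.0},
    {"Diameter": 25.0, "Center X": 400.0},
]

# The end values 10 and 25 drop out of a strict "between 10 and 25" ...
strict = apply_filters(rows, {"Diameter": between(10, 25)})
# ... so the range is widened slightly to bracket them.
widened = apply_filters(rows, {"Diameter": between(9.999, 25.0001)})
# Filters on two columns act together as AND.
both = apply_filters(
    rows,
    {"Diameter": between(10, 25), "Center X": lambda value: value > 300},
)

print(len(strict), len(widened), len(both))
```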

Who would have guessed that all this was hidden under a right-click . . .

Moving right along, click on Next to get to the next page.

Page 6: Choose output. Last month, we saw how our extracted data can be plugged back into the drawing in the form of a standard table object. This page also lets us specify that we want the extracted data written to an external file. These two functions are not mutually exclusive in that we can choose to do both at once.

If we choose the File option, then AutoCAD brings up a standard Windows file dialog. It defaults to a Microsoft Excel spreadsheet format (XLS), but we can also choose Microsoft Access (MDB), a standard comma-separated file (CSV) that can be read by most database or spreadsheet programs, or a simple text (TXT) file.
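Once the data is in a CSV file, any scripting language can pick it up. The Python sketch below reads such a file with the standard csv module. The header names and values are assumptions standing in for a real export (the actual layout AutoCAD writes may differ), and load_extract is a hypothetical helper.

```python
import csv
import io

# Stand-in for the contents of an exported CSV file. The header names and
# values are assumptions, not actual AutoCAD output.
sample_csv = """Diameter,Center X,Center Y
8.0,30.0,25.0
12.0,40.0,5.0
"""


def load_extract(text_stream):
    """Read extracted rows, converting every field to a float."""
    reader = csv.DictReader(text_stream)
    return [
        {name: float(value) for name, value in row.items()}
        for row in reader
    ]


rows = load_extract(io.StringIO(sample_csv))
print(rows[0]["Diameter"])
```

For a real file, replace the io.StringIO object with something like open(path, newline=""), where path is wherever you saved the extract.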

If we choose the table option, then page 7 brings up the table specification dialog we saw last month. If we do not choose the AutoCAD table option, then the wizard jumps to page 8, which says, "All done!"

Go Back! Go Back!
And now, as threatened earlier, we'll go back and take a look at the Drawing filter on page 4 of the wizard. It seems to show some strange entries unrelated to the drawing objects. Where did they come from?

Here's where most of them originate. While at the Command prompt, click File | Drawing Properties.

This selection brings up a multi-tabbed dialog box.

figure Click File | Drawing Properties to see and change certain drawing file properties.

The various tabs display relevant information about the file. Some are completed automatically, and others can be filled manually. We can even create custom fields such as client information, vendor, price, and inventory location.

If we select one of these properties, then its current value will appear in an appropriate column within each extracted row.

So what can we do with this property information? Well, the possibilities are pretty much endless. For example, extract a block name and file name from a set of drawings or even several folders, and bingo! We have a where-used list for a standard detail block. Here's another one: extract the total editing time for a title-block insertion from a set of drawings, include a total footer, and you have your billing time for those drawings.
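As a rough illustration of the where-used idea, here is a minimal Python sketch that groups drawing file names by block name. The block names, file names, and the where_used function are all invented for the example; a real list would come from the extracted data.

```python
from collections import defaultdict

# Hypothetical extracted rows: one (block name, drawing file name) pair
# per block insertion found.
extracted = [
    ("DETAIL-A", "floor1.dwg"),
    ("DETAIL-B", "floor1.dwg"),
    ("DETAIL-A", "floor2.dwg"),
    ("DETAIL-A", "floor1.dwg"),
]


def where_used(rows):
    """Map each block name to the sorted list of drawings that contain it."""
    usage = defaultdict(set)
    for block_name, file_name in rows:
        usage[block_name].add(file_name)
    return {name: sorted(files) for name, files in usage.items()}


usage = where_used(extracted)
for block_name, files in usage.items():
    print(block_name, files)
```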

The Gnarly Bit
The Data Extraction command seems to have just one gnarly flaw. It extracts the desired data from all specified objects in the drawing or drawings. In spite of all the other filtering that can be done, there doesn't seem to be a way to filter for model space versus paper space layouts, nor even any way to indicate where the objects reside. This inability is strange, because when AutoLISP is used to retrieve object data, one of the returned flags indicates model space or paper space.

Ah yes, data extraction has come a long way from attribute extraction.

And Now for Something Completely Different . . .
When traveling through China, don't buy from the street vendors. If you do, look out for counterfeits. The $5 "Rolex" watches are probably fakes because they have electronic movements. A genuine Rolex Oyster is a mechanical, self-winding watch. On the other hand, a $12 mechanical, self-winding Rolex Oyster is also probably a fake, because genuine Rolex watches are only sold through authorized dealers.

Also, when buying from street vendors, always carry small-denomination bills and after you have haggled the price down from $150 to $12 (yes, that is a typical range), then pay for it using bills as close to the exact price as possible. Street vendors have been known to give counterfeit change for larger denominations.
