Bimshow Live 2020 Presentation

Bimshow Live 2020 Presentation
If Data Is The New Oil, Then BIM Is the Combustion Engine...and it's about to seize up!
How Data Science, Artificial Intelligence and Machine Learning can enable quality BIM.

In this article, Ian Yeo kindly shares his Bimshow Live 2020 script on how Data Science, Artifical Intelligence (AI), Machine Learning and Deep Learning can enable quality BIM.

Ian provides real in-depth insight into how he uses coding and data science tools, along with a link to the presentation itself via Slideshare and numerous links to the various resorces and platforms we use at both Operance and Bimsense to create, collate, and provide quality BIM data for the benefit of Operance users.

Reccomended Tools

Part 1 - Data

Global Data

Data and information has always been a big part of our lives, but the ease in generating new information and at which we can access this information is making data ever more important (and in many cases essential);

  • Data is or should be driving decision making in business and government. 
  • The amount of data we are producing has never been greater than it is now. 
  • The forecast for the volume of data that we are due to create this year is 50.5 zetabytes (from, and one zetabyte is 1021 bytes or a 1 with 21 zeros after it. Or 1 trillion Gigabytes.
  • The 50.5 zetabytes from this year alone, is equivalent to all the data that we ever produced before 2016.
  • This all means that the digital universe has doubled every 2 ½ years. It’s predicted that in the future the doubling of data will reduce to every 2 years, so in 2022 we will generate over 100 zetabytes (100 trillion gigabytes).
  • This doubling of data is giving us exponential growth.

This means that if we want to take control of data, there is never a better time to do it and taking control will enable you to gain from the available benefits.

BIM Data

So how does the exponential growth in the data universe directly impact upon us our work and BIM? Firstly our everyday work and our processes are becoming increasingly more digital and increasingly online by using new tools.

As an industry we are finding and generating new sources of information, for example in a relatively short time we have spawned a whole new industry of scanning and modelling existing buildings, which is all based upon gathering millions of geometric data points.

BIM clients want increasingly more data than ever in onerous EIRs. Long gone are the days when the physical building and a couple of ring binders of paperwork were enough to complete a project, we now also handover a whole suite of digital information, often including a digital version (or model representation) of the completed building.

If you carry out repetetive workflows that can be defined in clear steps then it’s almost certain that you can automate it.

Coding can be really beneficial. Coding, when you break it down is just a process of providing a clear sequence of instructions. The instructions are provided in a predefined format and that format is the language. I believe that coding and just simple coding when combined with your expertise can be your superpower!

Part 2 - Our Approach


It’s common now for EIR’s to require KPI reporting on either model graphical quality or data quality. As an example, an EIR requires that a model uses uniclass classifications and requires a percentage of the completeness of objects and spaces with classifications.

On the surface this appears to be an easy problem, but classifications mean different things to different people and to the application that generated the IFC file, it can be a classification in the strict IFC format. Or what about classifications from the the revit COBie plugin or some other bespoke approach? If an EIR isn’t explicit then it could be that all of those are acceptable.

We wanted our solution to be repeatable and automated, so we use Jupyter Notebooks with Python and Pandas;

  1. We produce a Pandas data frame from an IFC file (this does require some adjustments, to make it work, but again, once you have this setup this one process can be reused for all other IFC file work).
  2. Then, by a series of Pandas filters we obtain a list and a count of the number of model objects.
  3. We then look at objects that have IFC Uniclass classifications and combine this with objects that have classifications through a property or attribute (these steps are not simple, it’s not one to jump into as a trial project) and if a designer uses a different method for classification we just add a couple more rules or instructions (and over time we have a robust method of that covers most method for classifying objects). 

We have an approach where we want to be as flexible as possible, we think there are project benefits in letting teams use the efficient processes that they have evolved rather than change, because we want data in a certain way or format.

It’s actually far easier to setup and run a script than to change a bespoke data type or format into a client or end-user-specific format that is provided, that the data being generated using tried and tested techniques is consistent, and reliable. We use this approach with the BIM data for Operance, afterall, why would we want to exclude models, just because they don’t follow a precise (but arbitrary) format?

What this means is that we have an automated approach where we link to a model, run the script and receive an output of the progress of the classifications. Which goes into a report which again is automated. But as this method uses keyword searches to obtain classifications as properties or attributes, it’s also extendable for obtaining other data such as manufacturer. And then extendable again to search for data for say maintainable assets which have a predefined classifications.


There are so many automations that we have that involve COBie, most using pandas within a Jupyter notebook where each sheet of a COBie file is imported as a separate data frame. Some of the automated checks are simple such as checking for and removing duplicates, adding documents for asset types and checking for the correct data formats. One that is slightly more complex involves manufacturers and obtaining contact information.

We take a manufacturer name and use various automated searches to find contact information for each manufacturer, their address, a standard contact email and telephone number. This uses a couple of APIs (to find a business address and telephone number), a bit of web crawling (for a standard contact email address), various checks to validate the data, combining all this information together and then saving it back into a COBie file contact sheet.

This doesn’t always produce the correct information, I would say on average 1 in 20 contacts have to be amended, but that means that the other 19 are fully automated. This includes adding the correct manufacturer contact email address to the individual assets types.

We also have also automated the generation of the COBie documents for maintainable assets, all we ask is that the file name that we receive from the contractors has a reference to the asset model number. The script then looks into the folder with the documents and populates the cobie.documents sheet. With other simple scripts for zones and systems.

The more that we use these automations the more valuable they become, every time we are increasing our return on investment.


We use Solibri as our preferred tool for model checking, from which we export the checks as BCF issues, which allows the project team, with the BIMcollab plugin) to view them in Revit or Archicad. We also provide an Excel export for those that only wish to view a summary of the checks. This is a really useful workflow and an effective way to collaborate.

But, how often do you find that this happens? Say the models just don’t federate correctly (there are various reasons why this doesn’t happen and I’ll not get into the ins and outs of federation) and that the best solution is for the models to always federate correctly. What I will explain is how we manipulate the BCF issues to work collaboratively even if we have different elevations. We want to find solutions to keep the team doing what they are good at.

The different model heights can be solved by establishing the offset error and adjusting the location in Solibri. We can then do the model check and generate all the BCF issues.

The problem comes when the designer with the incorrect model height attempts to view the BCF issues in their model. In the example the models were adjusted by 81.507m which means that the BIMcollab BCF will try to view the issue 81.507m away from the position of the model.

Now BCF issues are just just text files inside various folders within a zip file, and given a .bcf file extension rather than .zip. So with a jupyter notebook running python (and a couple of useful python packages) we have written rules to unzip the bcf file, list and open each file, read the content and then adjust all elevations references by the 81.507m. The content is then saved and repacked into a zip file

All of this is done within a saved jupyter notebook and after linking to the bcf file and providing the height adjustment it’s all fully automated. Of course, once we have this setup it can be extended, to add other useful data to the BCF, for example we have added revit BATIDs of the objects for designers who are not permitted to use the BIMcollab plugin, to enable them to view the objects involved in the issue.


As an industry we are contributing to the exponential increase in data and we we need to use our data more efficiently, but more importantly we need harness the data to gain not just incremental improvements but the step changes that this data can enable

If you want to find out more in your own time, we have provided a webpage where you can find out more and start your journey to turbo charge your BIM engine. This includes some of the best available learning resources (many of which are free) starting from learning the basics at code academy through to learning the cutting edge deep learning techniques from fastai. Plus links to enable you to install all the software.

Imagine if each of you could combine your unique expertise with the power of deep learning, and this expertise is available to other experts,  in no time at all we will have something far more valuable than the sum of the parts, something that will enable us to make real changes to real people.

Thank you for reading this article, my email address is, I welcome your conversation!

Link to Presentation

If Data is the New Oil, BIM is the Combustion Engine: Slideshare

Similar posts

New Interface

The Operance community wanted new light and dark modes, so we built them!

New Release: Keyword Search

Bypass filter searches with the direct keyword search feature.

News: V1.1 is Now Live!

Thanks to Alpha Group feedback, the Beta version is now ready for download.

The latest news, views and updates from Operance.
Start your free trial
Download and register for free, no credit card required.
Full access including two example projects and data to browse!
Press Contacts