We have split the implementation into several stages (in order of priority).
The first and most important stage was recognizing and automatically tagging photos with the highest possible accuracy. We intended to use the information obtained through tags in the following ways:
- Make a list of inappropriate tags. If an image is tagged with one of them, it cannot be used as a cover or even added to the property slideshow.
- Create a list of priority tags for choosing a photo as the main one (cover).
- Set up an image slideshow order based on tag information (e.g., if an image is marked as a bathroom, it should go after a kitchen or living room).
- Automatically apply features based on tags.
- Prepare a self-generated description (as a template basis for the detailed description).
- Discard unacceptable content (blurred photos, third-party logos, phones, advertising, etc.).
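The rules above can be sketched in code. This is a minimal illustration, not our production logic: the tag names, the blocked-tag list, and the priority orders below are placeholder examples.

```python
# Illustrative rule sets; real values would come from the revised tag lists.
BLOCKED_TAGS = {"blurred", "logo", "advertising", "phone"}
COVER_PRIORITY = ["facade", "living_room", "kitchen", "bedroom"]
SLIDESHOW_ORDER = ["living_room", "kitchen", "bedroom", "bathroom", "yard"]

def is_usable(tags):
    """A photo carrying any blocked tag is excluded entirely."""
    return not BLOCKED_TAGS.intersection(tags)

def pick_cover(photos):
    """photos: list of (photo_id, set_of_tags). Return the best cover candidate."""
    usable = [(pid, tags) for pid, tags in photos if is_usable(tags)]
    for tag in COVER_PRIORITY:
        for pid, tags in usable:
            if tag in tags:
                return pid
    return usable[0][0] if usable else None

def order_slideshow(photos):
    """Sort usable photos so that, e.g., a bathroom follows a kitchen or living room."""
    def rank(tags):
        return min((SLIDESHOW_ORDER.index(t) for t in tags if t in SLIDESHOW_ORDER),
                   default=len(SLIDESHOW_ORDER))
    return sorted(((pid, tags) for pid, tags in photos if is_usable(tags)),
                  key=lambda p: rank(p[1]))
```

A photo tagged with both `living_room` and `logo` would be dropped by `is_usable`, so it can never win the cover selection even though `living_room` is a priority tag.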
The first step was to collect a sufficient number of photos to train our model. According to the service provider’s recommendations, we needed to collect as many images as possible to get the most accurate results.
For each tag, we needed at least 20 photos (in general, that was no problem for us). It was also not necessary to strive for the best-quality photos: the sources should be as close as possible to what we work with every day.
The next step was to prepare the data for automatic image tagging. We used a pre-trained service, so we did not have to do the primary tagging manually. After uploading the photos and processing them, we received tag lists for review.
We received about 20 tags of different types per image. The most accurate data (an 80-100% match probability) included 2-4 tags per image.
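Filtering the service's output down to that high-confidence band is straightforward. The response shape below is a hypothetical example; actual field names vary by provider.

```python
import json

# Illustrative response shape for one photo; field names are assumptions.
sample_response = json.loads("""
{"photo_id": "p1",
 "tags": [{"name": "kitchen", "confidence": 0.97},
          {"name": "wooden_floor", "confidence": 0.84},
          {"name": "minimalism", "confidence": 0.55}]}
""")

def confident_tags(response, threshold=0.80):
    """Keep only tags in the 80-100% confidence band used as primary data."""
    return [t["name"] for t in response["tags"] if t["confidence"] >= threshold]
```

With the sample above, only `kitchen` and `wooden_floor` survive the 80% cutoff, matching the 2-4 high-confidence tags we typically saw per image.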
We classified the obtained tags by type: primary (property type - living room, bedroom, yard, bathroom; condition - excellent, good, bad, terrible) and secondary (style, materials, interior items, etc.).
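That classification can be expressed as a small lookup. The taxonomy below mirrors the groups named above; the exact tag vocabulary is illustrative.

```python
# Hypothetical taxonomy following the primary/secondary split described above.
PRIMARY = {
    "property_type": {"living_room", "bedroom", "yard", "bathroom"},
    "condition": {"excellent", "good", "bad", "terrible"},
}

def classify(tag):
    """Return ("primary", group) for taxonomy tags, ("secondary", None) otherwise."""
    for group, members in PRIMARY.items():
        if tag in members:
            return ("primary", group)
    return ("secondary", None)  # style, materials, interior items, etc.
```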
Another task was to identify rendered images. Since new buildings make up about 30% of our catalog, this has always been a challenge.
It is assumed that rendered photos inspire less confidence. If a property has only just come up for sale, the developer has nothing to show except renderings and visualizations. But if there is a show house and live photos of the property are available, it is always better to use them.
During initial processing, the service could identify only 32% of renderings, so we had to involve human resources and tag more than 3,000 photos manually. After the secondary processing, accuracy increased to 67% (in the final test, we managed to reach 78%). Now, if a photo is marked as a rendering (and the listing has live photos), we do not use it as the cover but place it at the end of the slideshow.
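The demotion rule amounts to a stable partition of the photo list. A minimal sketch, assuming each photo has already been flagged as a rendering or not:

```python
def arrange_slideshow(photos):
    """photos: ordered list of (photo_id, is_rendering).
    If any live photos exist, renderings move to the end and the cover
    is taken from the live photos; otherwise the order is unchanged."""
    live = [p for p in photos if not p[1]]
    renders = [p for p in photos if p[1]]
    ordered = live + renders if live else photos
    cover = ordered[0][0] if ordered else None
    return cover, ordered
```

For a brand-new building with renderings only, `live` is empty and the original order (and cover) is kept, matching the fallback described above.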
We also managed to detect whether an image is a layout, a floor plan, a screenshot of the location, or even a price list.