By analyzing photos, PIGEON can usually identify the country where an image was taken and often place it within a few tens of kilometres of the true location. The model outperforms human players in the popular online game GeoGuessr.
PIGEON is an image geolocation model built on StreetCLIP, the Stanford research team’s own pre-trained variant of CLIP, and it relies on a set of semantic geocells. These geocells represent bounded areas of land, accounting for region-specific details like road signs and infrastructure quality. The model recently participated in a DeepMind competition, triumphing over GeoGuessr’s top-ranked player, Trevor Rainbolt.
One key factor contributing to PIGEON’s performance is its reliance on OpenAI’s CLIP. Because CLIP was pre-trained on a very wide range of images, PIGEON can pick up finer visual details and perform better than geolocation models trained from scratch or based on ImageNet.
To optimize the accuracy of the model, the research team devoted significant effort to refining the semantic geocells. This approach resolved issues where previous models tended to incorrectly identify locations at sea, even when utilizing CLIP. The geocells were scaled proportionally to population density and respected different levels of administrative boundaries.
Additionally, the researchers developed a loss function that reduces the penalty when the predicted geocell is close to the actual location, so that a near miss costs less than a guess on the wrong continent. They also applied a meta-learning algorithm to refine the predicted position within a given geocell, further improving accuracy.
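To make the idea of a distance-aware loss concrete, here is a minimal sketch in Python. It is not the researchers' implementation: the function names, the exponential weighting, the temperature `tau_km`, and the toy geocell centroids are all assumptions chosen for illustration. Instead of a one-hot target, each geocell receives a target weight that decays with the great-circle distance from its centroid to the true location, and the loss is an ordinary cross-entropy against those soft targets.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def smoothed_targets(true_latlon, centroids, tau_km=75.0):
    """Soft labels: cells whose centroid is nearer the true location get more weight.

    tau_km is a hypothetical temperature; smaller values approach one-hot labels.
    """
    weights = [math.exp(-haversine_km(*true_latlon, *c) / tau_km) for c in centroids]
    total = sum(weights)
    return [w / total for w in weights]


def soft_cross_entropy(pred_probs, targets):
    """Cross-entropy between predicted cell probabilities and soft targets."""
    return -sum(t * math.log(p) for t, p in zip(targets, pred_probs))


# Toy geocell centroids (assumed): central Paris, Versailles, Sydney.
centroids = [(48.86, 2.35), (48.80, 2.13), (-33.87, 151.21)]
targets = smoothed_targets((48.85, 2.35), centroids)

near_miss = [0.1, 0.8, 0.1]  # most probability on the neighbouring cell
far_miss = [0.1, 0.1, 0.8]   # most probability on the other side of the world

loss_near = soft_cross_entropy(near_miss, targets)
loss_far = soft_cross_entropy(far_miss, targets)
print(loss_near < loss_far)
```

With a plain one-hot target both wrong guesses would be penalized identically; with the distance-based targets the neighbouring-cell guess incurs the smaller loss.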
The PIGEON model correctly identifies the country in 92% of cases, with a median distance error of 44 km. This corresponds to a GeoGuessr score of 4,525. Notably, around 40% of PIGEON’s guesses are within a 25 km radius of the target, as highlighted in the research paper.
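The three headline metrics (country accuracy, median distance error, and the share of guesses within 25 km) can be computed from a list of predictions as in the following sketch. The `evaluate` helper and the toy data are hypothetical and are not taken from the paper; the GeoGuessr score is left out because the text does not give its formula.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def evaluate(pred_coords, true_coords, pred_countries, true_countries, radius_km=25.0):
    """Return country accuracy, median error (km), and fraction of guesses within radius_km."""
    errors = [
        haversine_km(p[0], p[1], t[0], t[1])
        for p, t in zip(pred_coords, true_coords)
    ]
    correct = sum(p == t for p, t in zip(pred_countries, true_countries))
    return {
        "country_accuracy": correct / len(true_countries),
        "median_error_km": median(errors),
        "within_radius": sum(e <= radius_km for e in errors) / len(errors),
    }


# Toy data (assumed): an exact hit, a guess one degree of latitude off,
# and a guess in the wrong country.
true_coords = [(48.85, 2.35), (0.0, 0.0), (52.52, 13.40)]
pred_coords = [(48.85, 2.35), (1.0, 0.0), (48.21, 16.37)]
true_countries = ["FR", "XX", "DE"]
pred_countries = ["FR", "XX", "AT"]

metrics = evaluate(pred_coords, true_coords, pred_countries, true_countries)
print(metrics)
```

Note that the median error and the within-radius share describe different parts of the error distribution, which is why a 44 km median can coexist with roughly 40% of guesses landing inside 25 km.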
With its remarkable capabilities in location extraction, PIGEON has attracted interest from open-source intelligence platforms intrigued by its geolocation technology. The Stanford researchers foresee broader applications for the model, enabling easier localization of diverse outdoor imagery.