New Computer Vision Model (v2.13) with 1,656 new taxa

We released a new computer vision model today. It has 88,517 taxa up from 86,861. This new model (v2.13) was trained on data exported on March 31, 2024.

Here's a graph of the model release schedule since early 2022 (segments extend from data export date to model release date) and how the number of species included in each model has increased over time.

Our goal is to maintain or improve accuracy while adding more taxa to the model. The graph below shows model accuracy estimates using 1,000 random Research Grade observations in each group that were not seen during training. The paired bars compare the average accuracy of model 2.12 with the new model 2.13. Each bar shows the accuracy from Computer Vision alone (dark green) and Computer Vision + Geo (green). Overall, the average accuracy of 2.13 is 88.2%, statistically the same as 2.12 at 88.1% (as described here, we expect roughly 2% variance among experiments, all other things being equal).
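As a rough check on why a 0.1-point difference counts as "statistically the same", here is a minimal sketch. It assumes each accuracy figure behaves like a simple proportion measured on a single sample of 1,000 observations and uses the normal approximation to the binomial; the function name `accuracy_margin` is made up for illustration, and this is not necessarily how the variance estimate in the post was derived.

```python
import math


def accuracy_margin(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% interval for a proportion p
    measured on n independent samples (normal approximation)."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Figures from the post: 88.2% (v2.13) vs 88.1% (v2.12), 1,000 observations.
# Treating the overall average as one sample of 1,000 is an assumption.
margin = accuracy_margin(0.882, 1000)
diff = abs(0.882 - 0.881)

print(f"sampling margin: {margin:.3f}")   # 0.020
print(f"observed difference: {diff:.3f}")  # 0.001
```

Under these assumptions the sampling margin is about 2 percentage points, so a 0.1-point difference between the two models is well within the noise.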

Here is a sample of new species added to v2.13:

Posted on May 16, 2024 09:47 PM by loarie


Wonderful!! Thank you for more amazing updates to the computer vision.

Posted by pinefrog 2 months ago

"Computer Vision + Geo (geen)." ...I assume you mean "green" -- or "light green".

Posted by astra_the_dragon 2 months ago

I see many new Chironomid species added. Thanks to all members of the 'non-biting task force' 🤗

Posted by carnifex 2 months ago

Is there any way to check if any of my unidentified observations can be identified using this new computer model?

Posted by stariplativky 2 months ago

It is a bit hard to believe that Geo adds so little to the result. Are the original values, such as the day of the month of the observation (± 45 days), not used anymore?

Posted by ahospers 2 months ago

Thanks for the fantastic work and for the transparency with which you release data and results on each modeling iteration. Out of curiosity, is there any promising general-purpose foundational model (not trained specifically on iNat data) that achieves similar performance?

Posted by radrat 2 months ago

Do we have a sense of how many species we can cram into this model?

Posted by mmulqueen 2 months ago

Great work folks!

Posted by susanhewitt 2 months ago

I've noticed that since the new model is out, there are suddenly a lot more Mercenaria mercenaria being CV-identified as Rangia cuneata in the mid-Atlantic coast of the USA. Do these models take feedback?

Posted by amr_mn 2 months ago

Nice, some of the spiders I have been working on made it to the new model. Thanks for the regular update 👍

Posted by ajott about 2 months ago

Is there any way to request species be added to the CV? There's a particular recently described species that is very region-specific that could benefit from CV in a very bad way because CV is still erroneously suggesting the European species name for North American observations, despite the proper species already having 400+ RG observations.

Posted by lothlin about 2 months ago

@lothlin I suspect the species is already included in the model if it has 400+ RG obs. What is it?

Posted by loarie about 2 months ago

Macrolepiota macilenta. It is not (as far as I can tell) - the paper was only published recently and I checked and the species wasn't added in this most recent update.

Posted by lothlin about 2 months ago

Looks like Macrolepiota macilenta was created on April 26 of this year so it wouldn't be in v2.13 which was based on a data export from March 31, 2024. It will be included in the next model, v2.14, which is based on a data export made on May 12, 2024.

Posted by loarie about 2 months ago

How many fungi species are there in total, please?

Posted by jonasgruska about 2 months ago

v2.13 includes 3,530 fungi taxa

Posted by loarie about 2 months ago

Awesome, thanks Loarie!

Posted by lothlin about 2 months ago

Question, Fragaria × ananassa was originally only in there because it was ranked as a species, not a hybrid. It has just been changed to a hybrid, so is it no longer in the CV?

Posted by leytonjfreid about 2 months ago

correct - hybrids are not included in the model taxonomy

Posted by loarie about 2 months ago

What happened with the cropping change (the light green) that was mentioned in the previous update?

Posted by rudolphous about 2 months ago

What is the cropping change? And is it not possible to add models 2.10 and 2.11 into the same evaluation with the same 1000 photos?

2% is quite a lot (the last 2% is the hardest part).
"Cropping Change" is a slight modification to the way images are prepared before they are sent to the CV model that resulted in an average 2.1% improvement.

Posted by ahospers about 2 months ago

The cropping change resulted from some method improvements to how images are processed before they are sent to the computer vision model that we made between v2.11 and v2.12. We didn't have capacity to make additional method improvements between v2.12 to v2.13, but that cropping improvement is still in place and is reflected in the accuracy of v2.13 (compared to what it would have been had we not made those changes).

The cropping change stems from the fact that the computer vision model needs to examine a square image, which means when dealing with non-square images we have options like squeezing and clipping. Based on some experiments, we made some small changes to this processing pipeline which yielded the ~2% improvement.
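To make the squeezing versus clipping trade-off concrete, here is a small geometric sketch. It is only an illustration of the two generic options, not the actual iNaturalist preprocessing pipeline: the function names and the 300-pixel target size are assumptions chosen for the example.

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Clipping: keep the largest centered square and discard the rest.
    Returns (left, top, right, bottom) in pixel coordinates."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def squeeze_scale(width: int, height: int, target: int) -> tuple[float, float]:
    """Squeezing: keep every pixel but scale each axis independently
    so the result is target x target, distorting the aspect ratio."""
    return (target / width, target / height)


def clipped_fraction(width: int, height: int) -> float:
    """Share of the original image area lost when clipping to a square."""
    return 1.0 - min(width, height) / max(width, height)


# A hypothetical 4000 x 3000 landscape photo and an assumed 300 px model input.
print(center_crop_box(4000, 3000))     # (500, 0, 3500, 3000)
print(squeeze_scale(4000, 3000, 300))  # horizontal and vertical scale factors
print(clipped_fraction(4000, 3000))    # 0.25
```

Clipping preserves shapes but can cut off an organism near the edge of the frame, while squeezing keeps the whole scene but stretches it; intermediate choices between the two are also possible.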

We agree there would be certain advantages to using the same 1000 photos to evaluate all models, but we're not currently doing that because of the complexity involved in holding out that test set given the dynamic nature of iNat. There's also significant taxonomic drift between data/model versions, which adds complexity and is why we're currently focused on just comparing 2 models (the previous model to the new model) rather than trying to track improvements across multiple models, though we agree that would be ideal.

Posted by loarie about 2 months ago

Thanks for the clear explanation.

Posted by rudolphous about 2 months ago

Just curious. Is the next update of the computer vision model a bit delayed?

Posted by rudolphous 6 days ago

Yes - there was an issue with v2.14 which delayed it - it should be ready soon

Posted by loarie 3 days ago

Great you fixed the issue 👍👍👍

Posted by rudolphous 3 days ago

