Dark Mode

Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fix false positive detection heuristics#4009

Merged
AyanSinhaMahapatra merged 1 commit intoaboutcode-org:developfrom
alexzurbonsen:fix-regression-false-positive-detection
Apr 16, 2025
Merged

Fix false positive detection heuristics#4009
AyanSinhaMahapatra merged 1 commit intoaboutcode-org:developfrom
alexzurbonsen:fix-regression-false-positive-detection

Conversation

Copy link
Contributor

alexzurbonsen commented Dec 9, 2024 *
edited
Loading

Context

The newly introduced conditions on rule relevance and copyrights for false positive detections, see f9863e61, contain a bug that makes them detect copyrights and rule relevance if matches do not even have a matched_text or the rule doesn't even have a relevance attribute.

Fixes #4005 (at least for the example I am looking at)

Summary

Correct the any and all conditions to correctly detect copyrights and relevance.

Test plan

I reran the scan of the file from #4005

scancode -l /boost/boost/assign/ptr_list_of.hpp --json scancode.json

and get a scancode.json that has the same contents as on v32.2.1.

scancode_fix.json

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled and links the original issue above
  • Tests pass -- look for a green checkbox a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Copy link
Contributor Author

alexzurbonsen commented Dec 9, 2024 *
edited
Loading

@AyanSinhaMahapatra This should fix the issue with the false positive license detection heuristics. #4005

It feels like there should be a test for this. If so, can you help me write one? It seemed a bit more involved and I wanted to get this out quickly.

alexzurbonsen force-pushed the fix-regression-false-positive-detection branch from 7212179 to 35029f3 Compare December 9, 2024 10:54
alexzurbonsen marked this pull request as ready for review December 9, 2024 10:55
Copy link
Contributor Author

alexzurbonsen commented Dec 9, 2024

There is a failing test tests/licensedcode/test_plugin_license_detection.py::test_complicated_license_text_from_ffmpeg It seems that it is relying on the buggy logic. I would be glad for some help on this one @AyanSinhaMahapatra, since I am not sure yet, what the appropriate change is.

AyanSinhaMahapatra reacted with thumbs up emoji

alexzurbonsen force-pushed the fix-regression-false-positive-detection branch from 35029f3 to ee32157 Compare December 10, 2024 17:54
Copy link
Contributor Author

alexzurbonsen commented Dec 10, 2024 *
edited
Loading

Update: I scanned plugin_license/scan/ffmpeg-LICENSE.md with the scancode version from this branch and pasted the result into the expected file. The changes seem reasonable to me.

Copy link
Contributor Author

alexzurbonsen commented Dec 10, 2024

@AyanSinhaMahapatra and @pombredanne Would be great to get your feedback on this one. The bug currently blocking us from updating. It would seem that it basically renders the is_false_positive function useless, since it always returns false, and thus torpedoes false positive detection.

Copy link
Contributor Author

alexzurbonsen commented Mar 17, 2025

Friendly ping @AyanSinhaMahapatra. It has been a few months, would be nice to get your feedback :)

Copy link
Member

AyanSinhaMahapatra commented Mar 17, 2025

@alexzurbonsen apologies for the quite large delay on this and other PR reviews, looking at this now

AyanSinhaMahapatra approved these changes Mar 17, 2025
Copy link
Member

AyanSinhaMahapatra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexzurbonsen Thanks++ for spotting and fixing this bug, even though we had the right approach, we were not detecting false positives correctly because of this bug.

For now a simple test with a stripped version of the file at https://github.com/steinwurf/boost/blob/ade3189e2c03fd975dbfa667a4f49e98a49d2fdf/boost/assign/ptr_list_of.hpp#L196 (where gpl one word matches are correctly detected as clues) should suffice as a test to protect against any further regressions. Your test expectation updates also correct one such case from

are incompatible with the GPLv2 and v3. We do not know for certain if their
licenses are compatible with the LGPL.

In future we probably need to add one test case per case in the implementation of the is_false_positive function, i.e. better unit tests, which the detection.py lacks I will open a follow up issue on this.

This fixes #4005 and #3270 too (thanks @alok1304 for pointing this out.)

@@ -1,4 +1,41 @@
{
"headers": [
Copy link
Member

AyanSinhaMahapatra Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header portion is not required, we remove this to reduce test expectation updates.

AyanSinhaMahapatra mentioned this pull request Mar 17, 2025
Due to wrong handling of any and all functions license matches are
categorized as having full relevance or copyrights, even if they
do not. This leads to a regression in false positive detection.

Correct the any and all conditions to correctly detect copyrights
and relevance.

Signed-off-by: alexzurbonsen
alexzurbonsen force-pushed the fix-regression-false-positive-detection branch from ee32157 to 9872108 Compare March 17, 2025 19:14
Copy link
Contributor Author

alexzurbonsen commented Mar 17, 2025

Hey @AyanSinhaMahapatra, thanks for the prompt review. I removed the header.

Do you want me to add the regression test against the boost file you mention?

Copy link
Member

AyanSinhaMahapatra commented Apr 16, 2025

@alexzurbonsen we have this tested allright, so no worries. Merging!

AyanSinhaMahapatra merged commit 467449a into aboutcode-org:develop Apr 16, 2025
36 of 38 checks passed
Copy link
Member

AyanSinhaMahapatra commented Apr 16, 2025

Thanks++ for adding this.

alexzurbonsen reacted with heart emoji

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

AyanSinhaMahapatra AyanSinhaMahapatra approved these changes

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Regression: GPL false positive license detections with v32.3.0

2 participants