-
-
Notifications
You must be signed in to change notification settings - Fork 711
Fix false positive detection heuristics#4009
Fix false positive detection heuristics#4009AyanSinhaMahapatra merged 1 commit intoaboutcode-org:developfrom
Conversation
Context
The newly introduced conditions on rule relevance and copyrights for false positive detections, see f9863e61, contain a bug that makes them detect copyrights and rule relevance if matches do not even have a matched_text or the rule doesn't even have a relevance attribute.
Fixes #4005 (at least for the example I am looking at)
Summary
Correct the any and all conditions to correctly detect copyrights and relevance.
Test plan
I reran the scan of the file from #4005
scancode -l /boost/boost/assign/ptr_list_of.hpp --json scancode.json
and get a scancode.json that has the same contents as on v32.2.1.
Tasks
- Reviewed contribution guidelines
- PR is descriptively titled and links the original issue above
- Tests pass -- look for a green checkbox a few minutes after opening your PR
Run tests locally to check for errors. - Commits are in uniquely-named feature branch and has no merge conflicts
- Updated documentation pages (if applicable)
- Updated CHANGELOG.rst (if applicable)
|
@AyanSinhaMahapatra This should fix the issue with the false positive license detection heuristics. #4005 It feels like there should be a test for this. If so, can you help me write one? It seemed a bit more involved and I wanted to get this out quickly. |
7212179 to
35029f3
Compare
|
There is a failing test |
35029f3 to
ee32157
Compare
|
Update: I scanned |
|
@AyanSinhaMahapatra and @pombredanne Would be great to get your feedback on this one. The bug currently blocking us from updating. It would seem that it basically renders the |
|
Friendly ping @AyanSinhaMahapatra. It has been a few months, would be nice to get your feedback :) |
|
@alexzurbonsen apologies for the quite large delay on this and other PR reviews, looking at this now |
AyanSinhaMahapatra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexzurbonsen Thanks++ for spotting and fixing this bug, even though we had the right approach, we were not detecting false positives correctly because of this bug.
For now a simple test with a stripped version of the file at https://github.com/steinwurf/boost/blob/ade3189e2c03fd975dbfa667a4f49e98a49d2fdf/boost/assign/ptr_list_of.hpp#L196 (where gpl one word matches are correctly detected as clues) should suffice as a test to protect against any further regressions. Your test expectation updates also correct one such case from
scancode-toolkit/tests/licensedcode/data/plugin_license/scan/ffmpeg-LICENSE.md
Lines 100 to 101 in ee32157
| are incompatible with the GPLv2 and v3. We do not know for certain if their | |
| licenses are compatible with the LGPL. |
In future we probably need to add one test case per case in the implementation of the is_false_positive function, i.e. better unit tests, which the detection.py lacks I will open a follow up issue on this.
This fixes #4005 and #3270 too (thanks @alok1304 for pointing this out.)
| @@ -1,4 +1,41 @@ | |||
| { | |||
| "headers": [ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header portion is not required, we remove this to reduce test expectation updates.
categorized as having full relevance or copyrights, even if they
do not. This leads to a regression in false positive detection.
Correct the any and all conditions to correctly detect copyrights
and relevance.
Signed-off-by: alexzurbonsen
ee32157 to
9872108
Compare
|
Hey @AyanSinhaMahapatra, thanks for the prompt review. I removed the header. Do you want me to add the regression test against the boost file you mention? |
|
@alexzurbonsen we have this tested allright, so no worries. Merging! |
467449a
into
aboutcode-org:develop
|
Thanks++ for adding this. |