Dark Mode

Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Refactor artifacts detection + add sturation detection#4297

Open
samuelgarcia wants to merge 11 commits intoSpikeInterface:mainfrom
samuelgarcia:artifact_detector_refactor
Open

Refactor artifacts detection + add sturation detection#4297
samuelgarcia wants to merge 11 commits intoSpikeInterface:mainfrom
samuelgarcia:artifact_detector_refactor

Conversation

Copy link
Member

samuelgarcia commented Jan 6, 2026 *
edited
Loading

@JoeZiminski @oliche

This refactor the artifact detection and blanking

  • with a main multi method function : detect_artifact_periods()
  • return periods with the new base_period_dtype instead list of list
  • SilencedPeriodsRecording also use this new dtype
  • DetectArtifactAndSilentPeriodsRecording (replace SilencedArtifactsRecording) do detect and silence at the same
    time to be used in preprocess dict (@chrishalcrow this is for you)
  • refactor the method of Pierre based on envleope
  • incorporate the method of @oliche and @JoeZiminski to detect saturation

Could also be done in this PR:

  • add a margin_ms option in SilentPeriodsRecording to ewxtend the zeroing with a mragin : do we need this ?

To be done in 2 futures PR:

  • handle margin and taper on border of SilentPeriod (more tricky that it appear when period is at border...)
  • rewrite RemoveArtifactsRecording based on SilencedPeriodsRecording and the new dtype

chrishalcrow reacted with heart emoji
alejoe91 added the preprocessing Related to preprocessing module label Jan 6, 2026
alejoe91 changed the title Refactor aftifacts detection. Refactor artifacts detection Jan 7, 2026
Copy link
Collaborator

yger commented Jan 12, 2026

@samuelgarcia do not forget, in this PR, to change DetectThresholdCrossing(recording, ...) to DetectThresholdCrossing(envelope, ...) in the detec_period_artefacts_by_envelope

JoeZiminski mentioned this pull request Jan 15, 2026
Copy link
Member Author

samuelgarcia commented Jan 21, 2026

Hi @oliche @JoeZiminski
I copy paste and clean a bit your PR #4301 into this one.
Have a look and tell if this is OK.

I think we do not need to scaled traces at each chunk we could only could the unscaled threhold in the init and then we could avoid the costly convertion to float32 or float64.

Also I put the unit in uV to be more SI friendly.

samuelgarcia changed the title Refactor artifacts detection Refactor artifacts detection + add sturation detection Jan 21, 2026
samuelgarcia marked this pull request as ready for review January 21, 2026 13:21
Copy link
Member Author

samuelgarcia commented Jan 21, 2026

@alejoe91 @chrishalcrow ready to review on my side

yger reviewed Jan 21, 2026
samuelgarcia commented Jan 22, 2026
self, recording, periods=artifacts, mode=mode, noise_levels=noise_levels, seed=seed, **noise_levels_kwargs
)
# note self._kwargs["periods"] is done by SilencedPeriodsRecording and so the computaion is done once

Copy link
Member Author

samuelgarcia Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add detect_artifact_method detect_artifact_kwargs

Comment on lines +36 to +37
* 'saturation' using detect_artifact_periods_by_envelope()
* 'envelope' using detect_saturation_periods()
Copy link
Member

chrishalcrow Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* 'saturation' using detect_artifact_periods_by_envelope()
* 'envelope' using detect_saturation_periods()
* 'saturation' using detect_saturation_periods()
* 'envelope' using detect_artifact_periods_by_envelope()

Comment on lines +6 to +7
# from spikeinterface.core.core_tools import define_function_handling_dict_from_class
# from spikeinterface.preprocessing.silence_periods import SilencedPeriodsRecording
Copy link
Member

chrishalcrow Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# from spikeinterface.core.core_tools import define_function_handling_dict_from_class
# from spikeinterface.preprocessing.silence_periods import SilencedPeriodsRecording

Copy link
Member

chrishalcrow commented Jan 22, 2026

Hello, trying this out on real data with saturation. I get this output

artifact_periods=array([(0, 637721, 638769), (0, 638955, 639206), (0, 923219, 923602),
(0, 923605, 923606), (0, 923607, 923835)],

Are these periods inclusive? And if they are, should the last pair be merged?

alejoe91 reacted with thumbs up emoji


def detect_saturation_periods(
recording,
saturation_threshold_uV, # 1200 uV
Copy link
Member

chrishalcrow Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it assumed that saturation_threshold_uV is symmetric around 0uV? E.g. you'll get saturation at $\pm 1200$uV? Is this always the case?

Copy link
Member

alejoe91 Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's how saturation works, at least for int16. Maybe we can make a symmetric arguments?

Copy link
Collaborator

JoeZiminski Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, @oliche? (we have a plus-minus saturation and voltage derivative threshold?

Copy link
Collaborator

JoeZiminski commented Jan 22, 2026

Hey @samuelgarcia thanks for this! will check this out tomorrow. I accidentally did not push two tidy-up commits to #4301, do you mind if I push to your branch to add some changes? I think it is mostly superficial stuff like docstrings etc.

Copy link
Member Author

samuelgarcia commented Jan 23, 2026

Hello, trying this out on real data with saturation. I get this output

artifact_periods=array([(0, 637721, 638769), (0, 638955, 639206), (0, 923219, 923602),
(0, 923605, 923606), (0, 923607, 923835)],

Are these periods inclusive? And if they are, should the last pair be merged?

Yes, the last pair should be merged.
Let us check. the _collapse_events() is done for this purpose.
@JoeZiminski can you look at this ?

Copy link
Member Author

samuelgarcia commented Jan 23, 2026

Hey @samuelgarcia thanks for this! will check this out tomorrow. I accidentally did not push two tidy-up commits to #4301, do you mind if I push to your branch to add some changes? I think it is mostly superficial stuff like docstrings etc.

yes of course. push anything you want

Copy link
Collaborator

JoeZiminski commented Jan 30, 2026

Hey @samuelgarcia thanks for this, I added the content of those commits I missed. I ran into some issues with the (now renamed) uV_per_second_threshold . This is a threshold for the first derivative in units uV/s so it can be a very small number (e.g. IBL default in volts, converted to uV is 1e-2; for example, with the test data the threshold is around 0.0004 which goes negative after the round trip to int and back. Please correct me if wrong, the dynamic range of NP probes pm 5mV at int16 the bit resolution is 0.153 mV. I think this might be why @chrishalcrow is seeing those missed samples in his data.

If possible I'd prefer to cast and scale the data to avoid such issues, then we always know we are solid whatever the user inputs. I get 0.047298s to cast, offset and scale 10 million samples so I don't think this will be a bottleneck. WDYT?

max_ = np.max(np.r_[data_seg_1.flatten(), data_seg_2.flatten()])
gain = max_ / 2**15
offset = 0
offset = 50
Copy link
Member Author

samuelgarcia Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoeZiminski : why did you put 50 as offset here ?

Copy link
Collaborator

JoeZiminski Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies forgot to say, just to ensure it is tested (if it was zero and was accidentally not applied in the function, the test would not flag it)

Copy link
Collaborator

JoeZiminski Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(that was just a quick test, see update below)

Copy link
Member Author

samuelgarcia commented Feb 2, 2026

Please correct me if wrong, the dynamic range of NP probes pm 5mV at int16 the bit resolution is 0.153 mV. I think this might be why @chrishalcrow is seeing those missed samples in his data.

NP1 is 12bits, so I think that the lsb is 5000 / 2**11 = 2.4uV
Am I wrong ?

Copy link
Collaborator

JoeZiminski commented Feb 3, 2026 *
edited
Loading

Hey @samuelgarcia I didn't realize the bit depth of the ADC is not 16-bit! I see here ADC resolution is given as 10 bits. In any case, the resolution from the gains/offsets used to scale the data won't be able to represent the values used for the uV_per_second_threshold?

samuelgarcia added this to the 0.104.0 milestone Feb 3, 2026
Copy link
Contributor

oliche commented Feb 3, 2026

@JoeZiminski the sample to volts conversion is a probe dependent variable. To get a feel of the complexity of the matter, here are the Neuropixel possible configs for all probes released to date: https://docs.google.com/spreadsheets/d/1UZycmxwwaJGCJ4AknyXrHkDXWOUO24TtAnBbe1mQ9u4/edit?gid=0#gid=0

As you can see there are potentially multiple conversion factors depending on the probe vintage.

It seems the SpikeInterface canon is to leave this up to the user. Our design in https://github.com/int-brain-lab/ibl-neuropixel has been to read it from the metadata, making sure we are compatible with the oldest probe versions we observed at IBL.

Copy link
Collaborator

JoeZiminski commented Feb 3, 2026

Thanks @oliche, in this case @samuelgarcia I think we should definitely scale the data rather than the parameters, as we cannot be sure the gains / offsets used to scale the data will not destroy information when converting float-to-int e.g. uV_per_second_threshold. We could play around with applying the division by fs at another point but it seems a little brittle and not worth the very minor speed-up.

alejoe91 reviewed Feb 5, 2026
# 0.98 is empirically determined as the true saturating point is
# slightly lower than the documented saturation point of the probe
self.saturation_threshold_unscaled = (thresh - offsets) / gains * 0.98
self.uV_per_sec_threshold = (uV_per_sec_threshold - offsets) / gains
Copy link
Member

alejoe91 Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be easy to transform this in uv_per_sample_threshold as:

self.uv_per_sample_threshold = (uV_per_sec_threshold * fs - offsets) / gains

alejoe91 reviewed Feb 5, 2026
self,
recording,
saturation_threshold_uV,
uV_per_sec_threshold,
Copy link
Member

alejoe91 Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to define uV_per_ms_threshold (or uv_per_sample as gfix does). The value in per seconds is very hard to grasp!

Copy link
Collaborator

JoeZiminski commented Feb 6, 2026 *
edited
Loading

Hey @samuelgarcia @alejoe91 thanks for the feedback, I'm running into the below issue calculating the derivative on int16 data. I don't think this is an artefact of the test data but would be good to check, @oliche too.

See below for the test data in float and int. When taking the derivative, information is lost due to overflow.

The above is the data converted to int, each timeseries is the data from one channel. In many instances, the data after the saturation is negative in int space, resulting in overflow

e.g. by definition the saturation point is at the max of the int16 range, 32767. Often, when the saturation goes back to baseline, the datapoint is negative in int space e.g. -10000. When you take the diff e.g. -10000 - 32767 you get overflow and the information is lost. It seems like this would happen in real data as well?

oliche reacted with confused emoji

Copy link
Contributor

oliche commented Feb 7, 2026

@JoeZiminski I think the test data is not realistic. Usually the maximum int recorded on neuropixel is 512 on NP1 and 8192 on NP2 so you wouldn't have the issues above as the max diff would be 8192 * 2 (2 ** 14).

However in general I think it is a poor practice to compute in int16 for this kind of issues and patchy support of the numpy suite. Users may unwittingly create weird situations, exactly like this one: #4175.

Anyways for this specific saturation detection problem, you just have to fix your test data by staying in the maxint ranges above.

Copy link
Collaborator

JoeZiminski commented Feb 7, 2026

Cheers @oliche! That's good to know for NP probes. For other providers like Nexus, run their ADC at 16 bit-depth (e.g. here), in this case it might be a problem? That being said I don't know how often these probes saturate and whether people are likely to run this function for non-NP1 probes. But yes I think the easiest solution to avoid these worries is to perform computation in float.

Copy link
Contributor

oliche commented Feb 8, 2026

If you think you may exceed half the the range of int16 (ie higher than 2 ** 14 or lower than -2 ** 14), then you can check for the max(abs()) values. If they are above or below, you may throw a warning and only perform the absolute saturation detection and skip the one based on the derivative ?

That would prevent too many false positives.

Copy link
Collaborator

JoeZiminski commented Feb 9, 2026

Thanks @oliche, yes in this case I would definitely suggest @samuelgarcia we perform operations in float here to avoid having to worry about the unpredictable bit-depth of the data.

alejoe91 mentioned this pull request Mar 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

yger yger left review comments

alejoe91 alejoe91 left review comments

JoeZiminski JoeZiminski left review comments

chrishalcrow chrishalcrow left review comments

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

preprocessing Related to preprocessing module

Projects

None yet

Milestone

0.104.0

Development

Successfully merging this pull request may close these issues.

6 participants