How To Change Speed Of Audio File

Audio Processing and Remove Silence using Python

Audio Processing Techniques like Play an Audio, Plot the Audio Signals, Merge and Split Audio, Change the Frame Rate, Sample Width and Channel, Silence Remove in Audio, Slow down and Speed up audio

Image Source: https://gsshawnee.org/soundtraining

Why this tutorial ???

Many people are working on projects such as Speech to Text conversion, and they need audio processing techniques like:

Play an audio
Plot the Audio Signals
Merge and Separate Audio Contents
Slow down and Speed up the Audio (Speed Changer)
Change the Frame Rate, Channels and Sample Width
Silence Remove

Even after exploring many articles on Silence Removal and Audio Processing, I couldn't find an article that explained it in detail, which is why I am writing this article. I hope this article will help you with tasks such as data collection and other work.

Why Python ???

Python is a general purpose programming language. Hence, you can use it for developing both desktop and web applications. You can also use Python for developing complex scientific and numeric applications.

Python is designed with features to facilitate data analysis and visualization. You can take advantage of the data analysis features of Python to create custom big data solutions without much extra time and effort. At the same time, the data visualization libraries and APIs provided by Python help you to visualize and present data in a more appealing and effective way.

Many Python developers even use Python for Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), Computer Vision (CV) and Natural Language Processing (NLP) tasks.

Requirements and Installation

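Of course, we need Python 3.6 or above (the split example below uses an f-string)
Install the pydub, simpleaudio and webrtcvad packages, plus NumPy, SciPy and Matplotlib for plotting. The wave module is part of the Python standard library and does not need to be installed.

    pip install webrtcvad==2.0.10 pydub simpleaudio numpy scipy matplotlib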
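
Before going further, it can help to confirm that the packages are importable. Here is a minimal sketch using only the standard library. The helper name `missing_modules` and the `REQUIRED` list are my own assumptions for illustration (the list simply mirrors the modules imported in this article), so adjust it to what you actually use.

```python
import importlib.util

# Assumed list: the third-party modules imported by the scripts in this article.
REQUIRED = ["pydub", "simpleaudio", "webrtcvad", "numpy", "scipy", "matplotlib"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required modules are available")
```

Note that pydub also relies on ffmpeg (or libav) being installed on the system for formats other than WAV.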

Let's Start the Audio Manipulation . . .

Listen Audio

This is for people who want to listen to their audio without using tools like VLC or Windows Media Player

Create a file named "listenaudio.py" and paste the below contents in that file

    # Import packages
    from pydub import AudioSegment
    from pydub.playback import play

    # Play
    playaudio = AudioSegment.from_file("<Paste your File Name Here>", format="<File format Eg. WAV>")
    play(playaudio)

Plot Audio Signal

Plotting the audio signal lets you visualize the waveform (amplitude over time). This helps you decide where to cut the audio and where the silences are in the signal

    # Loading the Libraries
    from scipy.io.wavfile import read
    import numpy as np
    import matplotlib.pyplot as plt

    # Read the Audiofile
    samplerate, data = read('6TU5302374.wav')
    # Frame rate for the Audio
    print(samplerate)

    # Duration of the audio in Seconds
    duration = len(data) / samplerate
    print("Duration of Audio in Seconds", duration)
    print("Duration of Audio in Minutes", duration / 60)

    # One time stamp per sample; dividing the sample index avoids the
    # off-by-one lengths that np.arange with a float step can produce
    time = np.arange(len(data)) / samplerate

    # Plotting the Graph using Matplotlib
    plt.plot(time, data)
    plt.xlabel('Time [s]')
    plt.ylabel('Amplitude')
    plt.title('6TU5302374.wav')
    plt.show()

In the Graph, the horizontal straight lines are the silences in Audio
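
The flat stretches in the plot can also be located numerically instead of by eye. The following is only a rough sketch, not the method used later in this article: it assumes the samples are plain integers (for example 16-bit mono PCM), and the helper name `find_silences`, the 20 ms window and the RMS threshold of 500 are illustrative guesses that would need tuning for a real recording.

```python
import math


def find_silences(samples, samplerate, window_ms=20, rms_threshold=500.0):
    """Return (start_sec, end_sec) pairs where the windowed RMS stays below the threshold."""
    window = max(1, int(samplerate * window_ms / 1000))
    silences = []
    silence_start = None
    for begin in range(0, len(samples), window):
        block = samples[begin:begin + window]
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        if rms < rms_threshold:
            # Quiet window: open a silent stretch if none is open yet
            if silence_start is None:
                silence_start = begin
        elif silence_start is not None:
            # Loud window: close the silent stretch that was open
            silences.append((silence_start / samplerate, begin / samplerate))
            silence_start = None
    if silence_start is not None:
        silences.append((silence_start / samplerate, len(samples) / samplerate))
    return silences


# Synthetic signal: 1 s of a 440 Hz tone, 1 s of silence, 1 s of tone at 8 kHz
rate = 8000
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]
signal = tone + [0] * rate + tone
print(find_silences(signal, rate))  # [(1.0, 2.0)]
```

A simple energy threshold like this is easily fooled by background noise, which is why the Silence Remove section uses a Voice Activity Detector instead.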

Separate Audio Files

This helps you to split audio files based on the duration that you set.

The threshold value is in milliseconds (1 sec = 1000 milliseconds). By adjusting the threshold value in the code, you can split the audio as you wish.

Here I am splitting the audio into 10-second chunks.

    from pydub import AudioSegment
    import os

    # Create the output folder if it does not exist
    if not os.path.isdir("splitaudio"):
        os.mkdir("splitaudio")

    audio = AudioSegment.from_file("<filenamewithextension>")
    lengthaudio = len(audio)
    print("Length of Audio File", lengthaudio)

    start = 0
    # In Milliseconds, this will cut 10 Sec of audio
    threshold = 10000
    end = 0
    counter = 0

    while start < len(audio):
        end += threshold
        print(start, end)
        chunk = audio[start:end]
        filename = f'splitaudio/chunk{counter}.wav'
        chunk.export(filename, format="wav")
        counter += 1
        start += threshold

You can get the audio files as chunks in the "splitaudio" folder.
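
To see which chunks a given threshold will produce before touching any audio, the boundary arithmetic can be tried on its own. This is a small sketch with a hypothetical helper, `chunk_bounds`, that is not part of pydub. It takes the total length and the chunk length in milliseconds and clamps the last chunk to the end of the audio.

```python
def chunk_bounds(total_ms, chunk_ms):
    """Yield (start, end) millisecond pairs that cover 0..total_ms."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    for start in range(0, total_ms, chunk_ms):
        # The final chunk is shorter when total_ms is not a multiple of chunk_ms
        yield start, min(start + chunk_ms, total_ms)


# A 25 second file split with a 10 second threshold
print(list(chunk_bounds(25000, 10000)))
# [(0, 10000), (10000, 20000), (20000, 25000)]
```

The last chunk here is only 5 seconds long, so do not assume every exported file has the same duration.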

Merge Audio File

This helps you to merge audio from different audio files . . .

    import os
    from pydub import AudioSegment
    import glob

    # If the "audio" folder does not exist, create it
    if not os.path.isdir("audio"):
        os.mkdir("audio")

    # Grab the audio files in the "audio" folder
    # (sorted, because glob does not guarantee any order)
    wavfiles = sorted(glob.glob("./audio/*.wav"))
    print(wavfiles)

    # Load each file as an AudioSegment
    wavs = [AudioSegment.from_wav(wav) for wav in wavfiles]
    combined = wavs[0]

    # Append all the audio files; append() crossfades by 100 ms by default,
    # pass crossfade=0 if you want a hard join
    for wav in wavs[1:]:
        combined = combined.append(wav)

    # Export Merged Audio File
    combined.export("Mergedaudio.wav", format="wav")

You can find the merged audio in the "Mergedaudio.wav" file
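
If you prefer a plain concatenation with no crossfade and no pydub, the same idea can be sketched with the standard library wave module. This is an alternative I am adding for illustration, not the approach used above. The helper names `merge_wavs` and `write_silence` are hypothetical, and the sketch assumes all inputs share the same channels, sample width and frame rate (it raises an error otherwise).

```python
import os
import tempfile
import wave


def merge_wavs(paths, out_path):
    """Concatenate WAV files that share channels, sample width and frame rate."""
    if not paths:
        raise ValueError("no input files")
    params = None
    frames = []
    for path in paths:
        with wave.open(path, "rb") as wf:
            current = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate())
            if params is None:
                params = current
            elif current != params:
                raise ValueError(f"{path} has a different format: {current} != {params}")
            frames.append(wf.readframes(wf.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        out.writeframes(b"".join(frames))


def write_silence(path, seconds, rate=8000):
    """Demo helper: write `seconds` of 16-bit mono silence."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))


# Demo with two generated files so the sketch runs without any real audio
tmpdir = tempfile.mkdtemp()
first = os.path.join(tmpdir, "a.wav")
second = os.path.join(tmpdir, "b.wav")
merged = os.path.join(tmpdir, "merged.wav")
write_silence(first, 1)
write_silence(second, 2)
merge_wavs([first, second], merged)
with wave.open(merged, "rb") as wf:
    merged_seconds = wf.getnframes() / wf.getframerate()
print(merged_seconds)  # 3.0
```

The format check matters: joining raw frames from files with different frame rates or channel counts would produce audio that plays at the wrong speed or as noise.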

Speed Changer-Slow down and Speed up

Change the Speed of the Audio: Slow Down or Speed Up

Create a file named "speedchangeaudio.py" and copy the below content

    from pydub import AudioSegment

    sound = AudioSegment.from_file("chunk.wav")

    def speed_change(sound, speed):
        # Override the frame rate so the same samples play faster or slower
        sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
            "frame_rate": int(sound.frame_rate * speed)
        })
        filename = 'changed_speed.wav'
        sound_with_altered_frame_rate.export(filename, format="wav")
        # Return the new segment so the caller gets it back instead of None
        return sound_with_altered_frame_rate

    # To Slow down audio
    slow_sound = speed_change(sound, 0.8)

    # To Speed up the audio
    #fast_sound = speed_change(sound, 1.2)

The normal speed of any audio is 1.0. To slow down the audio, use a value below 1.0, and to speed up the audio, use a value above 1.0

Adjust the speed as much as you want through the speed parameter of the "speed_change" function

You can find the speed-changed audio in "changed_speed.wav"
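
Since this method only overrides the frame rate, the number of samples stays the same and the duration scales by 1/speed (the pitch shifts along with it). A small sketch of that arithmetic follows. The helper name `speed_change_params` is hypothetical and just mirrors the `int(frame_rate * speed)` calculation from the script above.

```python
def speed_change_params(frame_rate, n_frames, speed):
    """Return (new_frame_rate, new_duration_seconds) after a frame-rate override."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    new_rate = int(frame_rate * speed)
    # Same number of frames, played back at a different rate
    return new_rate, n_frames / new_rate


rate, frames = 16000, 160000  # a 10 second clip at 16 kHz
print(speed_change_params(rate, frames, 0.8))   # (12800, 12.5)  slower and longer
print(speed_change_params(rate, frames, 1.25))  # (20000, 8.0)   faster and shorter
```

Keep in mind that the exported file then has an unusual frame rate (for example 12800 Hz), so you may want to convert it back to a standard rate with the technique in the next section.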

Adjust the Frame Rate, Channels and Sample Width in Sound

This helps you to preprocess the audio file when preparing training data for "Speech to Text" projects etc . . .

    from pydub import AudioSegment

    sound = AudioSegment.from_file("chunk.wav")

    print("----------Before Conversion--------")
    print("Frame Rate", sound.frame_rate)
    print("Channel", sound.channels)
    print("Sample Width", sound.sample_width)

    # Change Frame Rate
    sound = sound.set_frame_rate(16000)

    # Change Channel
    sound = sound.set_channels(1)

    # Change Sample Width
    sound = sound.set_sample_width(2)

    # Export the Audio to get the changed content
    sound.export("convertedrate.wav", format="wav")

Set Frame Rate : 8 kHz as 8000, 16 kHz as 16000, 44.1 kHz as 44100

Set Channel : 1 is Mono and 2 is Stereo

Set Sample Width (in bytes per sample)

1 : "8 bit Signed Integer PCM",
2 : "16 bit Signed Integer PCM",
3 : "24 bit Signed Integer PCM",
4 : "32 bit Signed Integer PCM"

You can check the Frame Rate, Channels and Sample Width of the audio in "convertedrate.wav"
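
These three settings together determine how much raw data one second of uncompressed PCM audio takes: frame rate x channels x sample width in bytes. Here is a small sketch of that arithmetic. The helper name `pcm_stats` is hypothetical, and the result ignores the small WAV header.

```python
def pcm_stats(frame_rate, channels, sample_width, seconds):
    """Return bit depth, bytes per second and raw size for uncompressed PCM."""
    bit_depth = sample_width * 8  # sample width is given in bytes
    bytes_per_second = frame_rate * channels * sample_width
    return {
        "bit_depth": bit_depth,
        "bytes_per_second": bytes_per_second,
        "raw_bytes": int(bytes_per_second * seconds),
    }


# One minute of 16 kHz, mono, 16-bit audio (the settings used above)
print(pcm_stats(16000, 1, 2, 60))
# {'bit_depth': 16, 'bytes_per_second': 32000, 'raw_bytes': 1920000}

# One minute of 44.1 kHz, stereo, 16-bit audio
print(pcm_stats(44100, 2, 2, 60))
# {'bit_depth': 16, 'bytes_per_second': 176400, 'raw_bytes': 10584000}
```

Converting to 16 kHz mono therefore cuts the data to less than a fifth of CD-quality stereo, which is one reason speech datasets are often stored this way.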

Silence Remove

Here we will remove the silence using a Voice Activity Detector (VAD) algorithm.

Basically, the silence removal code reads the audio file, converts it into frames, and then runs the VAD on each set of frames using a sliding window technique. The frames that contain voice are collected in a separate list and the non-voiced frames (silences) are removed. Finally, all the voiced frames in the list are joined and written out as an audio file.
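
The sliding window idea is easier to follow without the audio handling around it. The sketch below is a simplified stand-in, not the script itself: it takes a list of True/False flags in place of real VAD decisions, and the helper name `voiced_runs`, the window of 10 frames and the 0.9 ratio are assumptions chosen for illustration. It starts a run when more than 90% of the window is voiced and ends it when more than 90% of the window is unvoiced.

```python
import collections


def voiced_runs(flags, window=10, ratio=0.9):
    """Group per-frame speech flags into (start, end) frame index ranges."""
    ring = collections.deque(maxlen=window)
    triggered = False
    runs = []
    start = None
    for index, is_speech in enumerate(flags):
        ring.append((index, is_speech))
        if not triggered:
            voiced = sum(1 for _, s in ring if s)
            if voiced > ratio * ring.maxlen:
                triggered = True
                start = ring[0][0]  # keep the frames already in the window
                ring.clear()
        else:
            unvoiced = sum(1 for _, s in ring if not s)
            if unvoiced > ratio * ring.maxlen:
                triggered = False
                runs.append((start, index + 1))
                ring.clear()
    if triggered:
        runs.append((start, len(flags)))
    return runs


# 20 silent frames, 30 voiced frames, 20 silent frames
flags = [False] * 20 + [True] * 30 + [False] * 20
print(voiced_runs(flags))  # [(20, 60)]
```

The run ends at frame 60 rather than 50 because the detector only switches off once the whole window is silent, so a little trailing silence is kept as padding around the speech.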

Create a file named "silenceremove.py" and copy the below contents

    import collections
    import contextlib
    import sys
    import wave
    import webrtcvad


    def read_wave(path):
        """Reads a .wav file.
        Takes the path, and returns (PCM audio data, sample rate).
        """
        with contextlib.closing(wave.open(path, 'rb')) as wf:
            num_channels = wf.getnchannels()
            assert num_channels == 1
            sample_width = wf.getsampwidth()
            assert sample_width == 2
            sample_rate = wf.getframerate()
            assert sample_rate in (8000, 16000, 32000, 48000)
            pcm_data = wf.readframes(wf.getnframes())
            return pcm_data, sample_rate


    def write_wave(path, audio, sample_rate):
        """Writes a .wav file.
        Takes path, PCM audio data, and sample rate.
        """
        with contextlib.closing(wave.open(path, 'wb')) as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(sample_rate)
            wf.writeframes(audio)


    class Frame(object):
        """Represents a "frame" of audio data."""
        def __init__(self, bytes, timestamp, duration):
            self.bytes = bytes
            self.timestamp = timestamp
            self.duration = duration


    def frame_generator(frame_duration_ms, audio, sample_rate):
        """Generates audio frames from PCM audio data.
        Takes the desired frame duration in milliseconds, the PCM data, and
        the sample rate.
        Yields Frames of the requested duration.
        """
        n = int(sample_rate * (frame_duration_ms / 1000.0) * 2)
        offset = 0
        timestamp = 0.0
        duration = (float(n) / sample_rate) / 2.0
        while offset + n < len(audio):
            yield Frame(audio[offset:offset + n], timestamp, duration)
            timestamp += duration
            offset += n


    def vad_collector(sample_rate, frame_duration_ms,
                      padding_duration_ms, vad, frames):
        """Filters out non-voiced audio frames.
        Given a webrtcvad.Vad and a source of audio frames, yields only
        the voiced audio.
        Uses a padded, sliding window algorithm over the audio frames.
        When more than 90% of the frames in the window are voiced (as
        reported by the VAD), the collector triggers and begins yielding
        audio frames. Then the collector waits until 90% of the frames in
        the window are unvoiced to detrigger.
        The window is padded at the front and back to provide a small
        amount of silence or the beginnings/endings of speech around the
        voiced frames.
        Arguments:
        sample_rate - The audio sample rate, in Hz.
        frame_duration_ms - The frame duration in milliseconds.
        padding_duration_ms - The amount to pad the window, in milliseconds.
        vad - An instance of webrtcvad.Vad.
        frames - a source of audio frames (sequence or generator).
        Returns: A generator that yields PCM audio data.
        """
        num_padding_frames = int(padding_duration_ms / frame_duration_ms)
        # We use a deque for our sliding window/ring buffer.
        ring_buffer = collections.deque(maxlen=num_padding_frames)
        # We have two states: TRIGGERED and NOTTRIGGERED. We start in the
        # NOTTRIGGERED state.
        triggered = False

        voiced_frames = []
        for frame in frames:
            is_speech = vad.is_speech(frame.bytes, sample_rate)

            sys.stdout.write('1' if is_speech else '0')
            if not triggered:
                ring_buffer.append((frame, is_speech))
                num_voiced = len([f for f, speech in ring_buffer if speech])
                # If we're NOTTRIGGERED and more than 90% of the frames in
                # the ring buffer are voiced frames, then enter the
                # TRIGGERED state.
                if num_voiced > 0.9 * ring_buffer.maxlen:
                    triggered = True
                    sys.stdout.write('+(%s)' % (ring_buffer[0][0].timestamp,))
                    # We want to yield all the audio we see from now until
                    # we are NOTTRIGGERED, but we have to start with the
                    # audio that's already in the ring buffer.
                    for f, s in ring_buffer:
                        voiced_frames.append(f)
                    ring_buffer.clear()
            else:
                # We're in the TRIGGERED state, so collect the audio data
                # and add it to the ring buffer.
                voiced_frames.append(frame)
                ring_buffer.append((frame, is_speech))
                num_unvoiced = len([f for f, speech in ring_buffer if not speech])
                # If more than 90% of the frames in the ring buffer are
                # unvoiced, then enter NOTTRIGGERED and yield whatever
                # audio we've collected.
                if num_unvoiced > 0.9 * ring_buffer.maxlen:
                    sys.stdout.write('-(%s)' % (frame.timestamp + frame.duration))
                    triggered = False
                    yield b''.join([f.bytes for f in voiced_frames])
                    ring_buffer.clear()
                    voiced_frames = []
        if triggered:
            sys.stdout.write('-(%s)' % (frame.timestamp + frame.duration))
        sys.stdout.write('\n')
        # If we have any leftover voiced audio when we run out of input,
        # yield it.
        if voiced_frames:
            yield b''.join([f.bytes for f in voiced_frames])


    def main(args):
        if len(args) != 2:
            sys.stderr.write(
                'Usage: silenceremove.py <aggressiveness> <path to wav file>\n')
            sys.exit(1)
        audio, sample_rate = read_wave(args[1])
        vad = webrtcvad.Vad(int(args[0]))
        frames = frame_generator(30, audio, sample_rate)
        frames = list(frames)
        segments = vad_collector(sample_rate, 30, 300, vad, frames)

        # Segmenting the Voice audio and save it in list as bytes
        concataudio = [segment for segment in segments]

        joinedaudio = b"".join(concataudio)

        write_wave("Non-Silenced-Sound.wav", joinedaudio, sample_rate)


    if __name__ == '__main__':
        main(sys.argv[1:])

Set the aggressiveness mode, which is an integer between 0 and 3. 0 is the least aggressive about filtering out non-speech, 3 is the most aggressive.

Run "python silenceremove.py <aggressiveness> <inputfile.wav>" in the command prompt (for example, "python silenceremove.py 3 abc.wav"). The input must be a mono, 16-bit WAV file sampled at 8000, 16000, 32000 or 48000 Hz, as checked by read_wave.

You will get the non-silenced audio as "Non-Silenced-Sound.wav".
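
The script feeds the VAD 30 ms frames, and webrtcvad only accepts frames of 10, 20 or 30 ms at the sample rates checked in read_wave. The number of bytes per frame follows directly from those values, which is what `n` computes inside frame_generator. Here is a small sketch of that calculation. The helper name `frame_bytes` and the two constants are my own, added for illustration.

```python
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)


def frame_bytes(sample_rate, frame_duration_ms, sample_width=2):
    """Bytes in one mono PCM frame of the given duration."""
    if sample_rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_duration_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_duration_ms} ms")
    samples_per_frame = int(sample_rate * frame_duration_ms / 1000)
    return samples_per_frame * sample_width


print(frame_bytes(16000, 30))  # 960
print(frame_bytes(8000, 10))   # 160
```

If your recording uses another format (for example 44.1 kHz stereo), convert it first with the frame rate, channel and sample width settings shown earlier.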

If you want to Split the audio using Silence, check this

The complete code is uploaded on GitHub

Conclusion

This article is a summary of how to remove silence from an audio file, along with some audio processing techniques in Python

Thanks,

Bala Murugan N G

Source: https://ngbala6.medium.com/audio-processing-and-remove-silence-using-python-a7fe1552007a

Posted by: martinhignisfat.blogspot.com
