Audio Processing and Remove Silence using Python
Audio processing techniques like playing audio, plotting audio signals, merging and splitting audio, changing the frame rate, sample width and channels, removing silence, and slowing down and speeding up audio.
Why this tutorial ???
Many people working on projects like Speech to Text conversion need audio processing techniques such as:
- Play an audio
- Plot the Audio Signals
- Merge and Separate Audio Contents
- Slow down and Speed up the Audio (Speed Changer)
- Change the Frame Rate, Channels and Sample Width
- Silence Remove
Even after exploring many articles on Silence Removal and Audio Processing, I couldn't find an article that explained things in detail; that's why I am writing this article. I hope it will help you with tasks like data collection and other work.
Why Python ???
Python is a general-purpose programming language. Hence, you can use it for developing both desktop and web applications. You can also use Python for developing complex scientific and numeric applications.
Python is designed with features to facilitate data analysis and visualization. You can take advantage of the data analysis features of Python to create custom big data solutions without putting in extra time and effort. At the same time, the data visualization libraries and APIs provided by Python help you to visualize and present data in a more appealing and effective way.
Many Python developers even use Python to accomplish Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), Computer Vision (CV) and Natural Language Processing (NLP) tasks.
Requirements and Installation
- Of course, we need Python 3.5 or above
- Install the Pydub, Wave, SimpleAudio and webrtcvad packages
pip install webrtcvad==2.0.10 wave pydub simpleaudio numpy matplotlib
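Before going further, you can sanity-check the installation. This small helper is my addition, not part of the original gists; it only checks that each package is importable, without actually loading it:

```python
# Quick check that the required packages are importable.
# Note: these are import names, which can differ from pip package names.
from importlib.util import find_spec

def is_importable(module_name):
    """Return True if the module can be found on this system."""
    return find_spec(module_name) is not None

for module in ["webrtcvad", "wave", "pydub", "simpleaudio", "numpy", "matplotlib"]:
    print(module, "OK" if is_importable(module) else "MISSING")
```

If anything prints MISSING, re-run the pip command above.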
Let's Start the Audio Manipulation . . . . . .
Listen Audio
For people who want to listen to their audio and play it without using tools like VLC or Windows Media Player
Create a file named "listenaudio.py" and paste the below contents in that file
# Import packages
from pydub import AudioSegment
from pydub.playback import play # Play
playaudio = AudioSegment.from_file("<Paste your File Name Here>", format="<File format Eg. WAV>")
play(playaudio)
Here is the gist for Listen Audio . . .
Plot Audio Signal
Plotting the audio signal helps you to visualize it. This will help you to decide where to cut the audio and where the silences are in the audio signal.
# Loading the Libraries
from scipy.io.wavfile import read
import numpy as np
import matplotlib.pyplot as plt

# Read the audio file
samplerate, data = read('6TU5302374.wav')
# Frame rate for the Audio
print(samplerate)
# Duration of the audio in Seconds
duration = len(data)/samplerate
print("Duration of Audio in Seconds", duration)
print("Duration of Audio in Minutes", duration/60)
time = np.arange(0, duration, 1/samplerate)
# Plotting the Graph using Matplotlib
plt.plot(time, data)
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')
plt.title('6TU5302374.wav')
plt.show()
Here is the gist for plotting the Audio Signal . . . . . .
In the graph, the horizontal straight lines are the silences in the audio
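If you only need the duration rather than the full sample array, the standard-library wave module can read it from the header alone, without scipy. A self-contained sketch (the file name demo_silence.wav is made up for the demo):

```python
import wave

def wav_duration_seconds(path):
    """Return the duration of a .wav file in seconds, read from its header."""
    with wave.open(path, 'rb') as wf:
        return wf.getnframes() / wf.getframerate()

# Demo: write one second of silence at 16 kHz, then measure it.
with wave.open("demo_silence.wav", 'wb') as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit PCM
    wf.setframerate(16000)
    wf.writeframes(b'\x00\x00' * 16000)  # 16000 frames = 1 second

print(wav_duration_seconds("demo_silence.wav"))  # → 1.0
```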
Split Audio Files
This helps you to split audio files based on the duration that you set.
The threshold value is normally in milliseconds (1 sec = 1000 milliseconds). By adjusting the threshold value in the code, you can split the audio as you wish.
Here I am splitting the audio by 10 Seconds.
from pydub import AudioSegment
import os

if not os.path.isdir("splitaudio"):
    os.mkdir("splitaudio")
audio = AudioSegment.from_file("<filenamewithextension>")
lengthaudio = len(audio)
print("Length of Audio File", lengthaudio)
start = 0
# In milliseconds, this will cut 10 sec of audio
threshold = 10000
end = 0
counter = 0
while start < len(audio):
    end += threshold
    print(start, end)
    chunk = audio[start:end]
    filename = f'splitaudio/chunk{counter}.wav'
    chunk.export(filename, format="wav")
    counter += 1
    start += threshold
Here is the gist for Split Audio Files . . .
You can get the audio files as chunks in the "splitaudio" folder.
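The start/end arithmetic of the splitting loop can be checked in isolation, with no pydub at all. A pure-Python sketch (the helper name chunk_spans is my own); note it clamps the last span explicitly, whereas pydub slicing clamps for you:

```python
def chunk_spans(total_ms, threshold_ms):
    """Return (start, end) millisecond spans covering total_ms."""
    spans = []
    start = 0
    while start < total_ms:
        end = min(start + threshold_ms, total_ms)
        spans.append((start, end))
        start += threshold_ms
    return spans

# A 25-second file cut every 10 seconds: two full chunks and a 5-second tail.
print(chunk_spans(25000, 10000))  # → [(0, 10000), (10000, 20000), (20000, 25000)]
```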
Merge Audio File
This helps you to merge audio from different audio files . . .
import os
from pydub import AudioSegment
import glob

# If the "audio" folder does not exist, create it
if not os.path.isdir("audio"):
    os.mkdir("audio")
# Grab the audio files in the "audio" folder
wavfiles = glob.glob("./audio/*.wav")
print(wavfiles)
# Loop over each file and load it into an AudioSegment
wavs = [AudioSegment.from_wav(wav) for wav in wavfiles]
combined = wavs[0]
# Append all the audio files
for wav in wavs[1:]:
    combined = combined.append(wav)
# Export Merged Audio File
combined.export("Mergedaudio.wav", format="wav")
Here is the gist for Merge Audio content . . .
You can hear the merged audio in the "Mergedaudio.wav" file
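When every input file already shares the same channels, sample width and frame rate, you can also merge WAV files without pydub: just concatenate their PCM frames with the standard-library wave module. A sketch under that same-format assumption (file names a.wav, b.wav, merged.wav are examples):

```python
import wave

def merge_wavs(paths, out_path):
    """Concatenate same-format .wav files into out_path."""
    params = None
    frames = []
    for path in paths:
        with wave.open(path, 'rb') as wf:
            if params is None:
                params = wf.getparams()  # take format from the first file
            frames.append(wf.readframes(wf.getnframes()))
    with wave.open(out_path, 'wb') as wf:
        wf.setparams(params)             # nframes is fixed up on close
        wf.writeframes(b''.join(frames))

# Demo: two half-second silent clips merge into one one-second clip.
for name in ("a.wav", "b.wav"):
    with wave.open(name, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(8000)
        wf.writeframes(b'\x00\x00' * 4000)

merge_wavs(["a.wav", "b.wav"], "merged.wav")
with wave.open("merged.wav", 'rb') as wf:
    print(wf.getnframes())  # → 8000
```

The pydub version above is more forgiving because append resamples mismatched formats for you; this sketch does not.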
Speed Changer - Slow down and Speed up
Change the Speed of the Audio - Slow down or Speed Up
Create a file named "speedchangeaudio.py" and copy the below content
from pydub import AudioSegment

sound = AudioSegment.from_file("chunk.wav")

def speed_change(sound, speed):
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })
    filename = 'changed_speed.wav'
    sound_with_altered_frame_rate.export(filename, format="wav")

# To slow down audio
slow_sound = speed_change(sound, 0.8)
# To speed up the audio
# fast_sound = speed_change(sound, 1.2)
The normal speed of every audio is 1.0. To slow down the audio, set the value below 1.0, and to speed it up, set the value above 1.0.
Adjust the speed as much as you want via the "speed_change" function's speed parameter.
Here is the gist for Slow down and Speed Up the Audio
You can hear the speed-changed audio in "changed_speed.wav"
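The same frame-rate trick works with only the standard library: keep the samples unchanged and rewrite the header's frame rate. A sketch (file names tone.wav and slow.wav are examples); like the pydub version, pitch shifts along with speed:

```python
import wave

def change_speed(in_path, out_path, speed):
    """Copy in_path to out_path with the header frame rate scaled by speed."""
    with wave.open(in_path, 'rb') as wf:
        params = wf.getparams()
        frames = wf.readframes(wf.getnframes())
    with wave.open(out_path, 'wb') as wf:
        wf.setnchannels(params.nchannels)
        wf.setsampwidth(params.sampwidth)
        wf.setframerate(int(params.framerate * speed))  # <1.0 slower, >1.0 faster
        wf.writeframes(frames)

# Demo: halving the frame rate doubles the playback time.
with wave.open("tone.wav", 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b'\x00\x00' * 16000)

change_speed("tone.wav", "slow.wav", 0.5)
with wave.open("slow.wav", 'rb') as wf:
    print(wf.getframerate())  # → 8000
```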
Adjust the Frame Rate, Channels and Sample Width in Audio
This helps you to preprocess the audio file while preparing data for "Speech to Text" projects etc . . .
from pydub import AudioSegment

sound = AudioSegment.from_file("chunk.wav")

print("----------Before Conversion--------")
print("Frame Rate", sound.frame_rate)
print("Channel", sound.channels)
print("Sample Width", sound.sample_width)

# Change Frame Rate
sound = sound.set_frame_rate(16000)
# Change Channels
sound = sound.set_channels(1)
# Change Sample Width
sound = sound.set_sample_width(2)
# Export the audio to get the changed content
sound.export("convertedrate.wav", format="wav")
Set the frame rate: 8KHz as 8000, 16KHz as 16000, 44.1KHz as 44100
Set the channels: 1 is Mono and 2 is Stereo
Set the sample width (in bytes per sample):
1 : "8 bit Signed Integer PCM",
2 : "16 bit Signed Integer PCM",
3 : "24 bit Signed Integer PCM",
4 : "32 bit Signed Integer PCM"
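These three settings combine directly into the raw data rate: frame_rate * channels * sample_width bytes per second. A small illustration of the arithmetic (my addition, not from the original article's code):

```python
def bytes_per_second(frame_rate, channels, sample_width):
    """Raw PCM data rate in bytes per second."""
    return frame_rate * channels * sample_width

# 16 kHz, mono, 16-bit (2 bytes): the format set above.
print(bytes_per_second(16000, 1, 2))   # → 32000
# CD-quality stereo for comparison.
print(bytes_per_second(44100, 2, 2))   # → 176400
```

This is why downsampling to 16 kHz mono is popular for Speech to Text training data: it shrinks files considerably while keeping the speech band.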
Here is the gist for Irresolute the Frame Rate, Channels and Sample Width
You can see the Frame Charge per unit, Channels and Sample Width of Sound in "convertedrate.wav"
Silence Remove
Here we will remove the silence using the Voice Activity Detector (VAD) algorithm.
Basically, the silence removal code reads the audio file and converts it into frames, then runs VAD over each set of frames using a sliding window technique. The frames containing voice are collected in a separate list and the non-voice (silence) frames are removed. Finally, all the voiced frames in the list are written out as an audio file.
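The sliding-window trigger logic can be sketched in pure Python with a stand-in for the VAD. This is a simplified illustration of the state machine only (function name collect_voiced and the True/False flags are my own; the full code below uses webrtcvad on real 30 ms PCM frames):

```python
from collections import deque

def collect_voiced(flags, window=10, ratio=0.9):
    """flags is a per-frame True/False speech sequence;
    returns the indices of the frames kept as voiced."""
    ring = deque(maxlen=window)
    triggered = False
    kept = []
    for i, is_speech in enumerate(flags):
        ring.append((i, is_speech))
        if not triggered:
            # Trigger once >90% of the window is voiced; keep the whole window.
            if sum(1 for _, s in ring if s) > ratio * ring.maxlen:
                triggered = True
                kept.extend(idx for idx, _ in ring)
                ring.clear()
        else:
            kept.append(i)
            # Detrigger once >90% of the window is unvoiced.
            if sum(1 for _, s in ring if not s) > ratio * ring.maxlen:
                triggered = False
                ring.clear()
    return kept

# A burst of speech surrounded by silence: only the burst (plus a short
# trailing pad) survives.
flags = [False] * 20 + [True] * 20 + [False] * 20
print(collect_voiced(flags))  # keeps indices 20 through 49
```

The padding window is what keeps a little context around each voiced region instead of cutting exactly at the speech boundary.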
Create a file named "silenceremove.py" and copy the below contents
import collections
import contextlib
import sys
import wave
import webrtcvad

def read_wave(path):
    """Reads a .wav file.
    Takes the path, and returns (PCM audio data, sample rate).
    """
    with contextlib.closing(wave.open(path, 'rb')) as wf:
        num_channels = wf.getnchannels()
        assert num_channels == 1
        sample_width = wf.getsampwidth()
        assert sample_width == 2
        sample_rate = wf.getframerate()
        assert sample_rate in (8000, 16000, 32000, 48000)
        pcm_data = wf.readframes(wf.getnframes())
        return pcm_data, sample_rate
def write_wave(path, audio, sample_rate):
    """Writes a .wav file.
    Takes path, PCM audio data, and sample rate.
    """
    with contextlib.closing(wave.open(path, 'wb')) as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(audio)
class Frame(object):
    """Represents a "frame" of audio data."""
    def __init__(self, bytes, timestamp, duration):
        self.bytes = bytes
        self.timestamp = timestamp
        self.duration = duration
def frame_generator(frame_duration_ms, audio, sample_rate):
    """Generates audio frames from PCM audio data.
    Takes the desired frame duration in milliseconds, the PCM data, and
    the sample rate.
    Yields Frames of the requested duration.
    """
    n = int(sample_rate * (frame_duration_ms / 1000.0) * 2)
    offset = 0
    timestamp = 0.0
    duration = (float(n) / sample_rate) / 2.0
    while offset + n < len(audio):
        yield Frame(audio[offset:offset + n], timestamp, duration)
        timestamp += duration
        offset += n
def vad_collector(sample_rate, frame_duration_ms,
                  padding_duration_ms, vad, frames):
    """Filters out non-voiced audio frames.
    Given a webrtcvad.Vad and a source of audio frames, yields only
    the voiced audio.
    Uses a padded, sliding window algorithm over the audio frames.
    When more than 90% of the frames in the window are voiced (as
    reported by the VAD), the collector triggers and begins yielding
    audio frames. Then the collector waits until 90% of the frames in
    the window are unvoiced to detrigger.
    The window is padded at the front and back to provide a small
    amount of silence or the beginnings/endings of speech around the
    voiced frames.
    Arguments:
    sample_rate - The audio sample rate, in Hz.
    frame_duration_ms - The frame duration in milliseconds.
    padding_duration_ms - The amount to pad the window, in milliseconds.
    vad - An instance of webrtcvad.Vad.
    frames - a source of audio frames (sequence or generator).
    Returns: A generator that yields PCM audio data.
    """
    num_padding_frames = int(padding_duration_ms / frame_duration_ms)
    # We use a deque for our sliding window/ring buffer.
    ring_buffer = collections.deque(maxlen=num_padding_frames)
    # We have two states: TRIGGERED and NOTTRIGGERED. We start in the
    # NOTTRIGGERED state.
    triggered = False
    voiced_frames = []
    for frame in frames:
        is_speech = vad.is_speech(frame.bytes, sample_rate)
        sys.stdout.write('1' if is_speech else '0')
        if not triggered:
            ring_buffer.append((frame, is_speech))
            num_voiced = len([f for f, speech in ring_buffer if speech])
            # If we're NOTTRIGGERED and more than 90% of the frames in
            # the ring buffer are voiced frames, then enter the
            # TRIGGERED state.
            if num_voiced > 0.9 * ring_buffer.maxlen:
                triggered = True
                sys.stdout.write('+(%s)' % (ring_buffer[0][0].timestamp,))
                # We want to yield all the audio we see from now until
                # we are NOTTRIGGERED, but we have to start with the
                # audio that's already in the ring buffer.
                for f, s in ring_buffer:
                    voiced_frames.append(f)
                ring_buffer.clear()
        else:
            # We're in the TRIGGERED state, so collect the audio data
            # and add it to the ring buffer.
            voiced_frames.append(frame)
            ring_buffer.append((frame, is_speech))
            num_unvoiced = len([f for f, speech in ring_buffer if not speech])
            # If more than 90% of the frames in the ring buffer are
            # unvoiced, then enter NOTTRIGGERED and yield whatever
            # audio we've collected.
            if num_unvoiced > 0.9 * ring_buffer.maxlen:
                sys.stdout.write('-(%s)' % (frame.timestamp + frame.duration))
                triggered = False
                yield b''.join([f.bytes for f in voiced_frames])
                ring_buffer.clear()
                voiced_frames = []
    if triggered:
        sys.stdout.write('-(%s)' % (frame.timestamp + frame.duration))
    sys.stdout.write('\n')
    # If we have any leftover voiced audio when we run out of input,
    # yield it.
    if voiced_frames:
        yield b''.join([f.bytes for f in voiced_frames])
def main(args):
    if len(args) != 2:
        sys.stderr.write(
            'Usage: silenceremove.py <aggressiveness> <path to wav file>\n')
        sys.exit(1)
    audio, sample_rate = read_wave(args[1])
    vad = webrtcvad.Vad(int(args[0]))
    frames = frame_generator(30, audio, sample_rate)
    frames = list(frames)
    segments = vad_collector(sample_rate, 30, 300, vad, frames)
    # Segment the voiced audio and save it in a list as bytes
    concataudio = [segment for segment in segments]
    joinedaudio = b"".join(concataudio)
    write_wave("Non-Silenced-Audio.wav", joinedaudio, sample_rate)

if __name__ == '__main__':
    main(sys.argv[1:])
Set the aggressiveness mode, which is an integer between 0 and 3: 0 is the least aggressive about filtering out non-speech, 3 is the most aggressive.
Run "python silenceremove.py <aggressiveness> <inputfile.wav>" in the command prompt (for example, "python silenceremove.py 3 abc.wav").
Here is the gist for Silence Removal of the Audio . . . . . .
You will get the non-silenced audio as "Non-Silenced-Audio.wav".
If you want to Split the audio using Silence, check this
The complete code is uploaded to GitHub
Conclusion
This article is a summary of how to remove silence in an audio file and of some audio processing techniques in Python.
Thanks,
Bala Murugan N G
Source: https://ngbala6.medium.com/audio-processing-and-remove-silence-using-python-a7fe1552007a