Initial working version

Samuel Kent
2022-12-22 20:22:22 +11:00
parent ce9675a1cc
commit ced7fa5092
902 changed files with 150252 additions and 0 deletions
+177
@@ -0,0 +1,177 @@
# audio-metadata
[![Build Status](https://travis-ci.org/tmont/audio-metadata.png)](https://travis-ci.org/tmont/audio-metadata)
[![NPM version](https://badge.fury.io/js/audio-metadata.png)](http://badge.fury.io/js/audio-metadata)
This is a tinyish (2.1K gzipped) library to extract metadata from audio files.
Specifically, it can extract [ID3v1](http://en.wikipedia.org/wiki/ID3#ID3v1),
[ID3v2](http://en.wikipedia.org/wiki/ID3#ID3v2) and
[Vorbis comments](http://www.xiph.org/vorbis/doc/v-comment.html)
(i.e. metadata in [OGG containers](http://en.wikipedia.org/wiki/Ogg)).
Licensed under the [WTFPL](http://www.wtfpl.net/).
## What is this good for?
The purpose of this library is to be very fast and small. It's suitable
for server-side or client-side use: really, any platform that supports
`ArrayBuffer` and its ilk (`Uint8Array`, etc.).
I wrote it because the other libraries were large and very robust; I just
needed something that could extract the metadata out without requiring
30KB of JavaScript. `audio-metadata.min.js` comes in at 6.1K/2.1K
minified/gzipped.
To accomplish the small size and speed, it sacrifices several things.
1. It's very naive. For example, the OGG format stipulates that the comment
header must come second, after the identification header. This library
assumes that's always true and ignores the header type byte.
2. Text encoding is for losers. ID3v2 in particular has a lot of flexibility in
terms of the encoding of text for ID3 frames. This library will handle UTF-8
properly, but everything else is just spit out as ASCII.
3. It assumes that ID3v2 tags are always the very first thing in the file (as they
should be). The spec is mum on whether that's *required*, but this library
assumes it is.
4. Extended ID3v1 tags (the "TAG+" block) are not supported; Wikipedia suggests they
aren't really well-supported in media players anyway.
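
To make the first point a bit more concrete, here is a minimal sketch of how a Vorbis
comment packet is laid out, assuming the bytes that follow the `\x03vorbis` packet
prefix are already in hand. `parseVorbisComments` and `buildPacket` are hypothetical
names used for illustration only; they are not part of this library, and the sketch
leans on `TextDecoder`/`TextEncoder`, which the library itself does not use.

```javascript
// Hypothetical helper, not part of audio-metadata: decodes the body of a
// Vorbis comment packet (the bytes after the "\x03vorbis" prefix).
function parseVorbisComments(bytes) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const decoder = new TextDecoder('utf-8');
    const vendorLength = view.getUint32(0, true); // all lengths are little-endian
    let offset = 4 + vendorLength;                // skip the vendor string
    const count = view.getUint32(offset, true);
    offset += 4;
    const comments = {};
    for (let i = 0; i < count; i++) {
        const length = view.getUint32(offset, true);
        const text = decoder.decode(bytes.subarray(offset + 4, offset + 4 + length));
        const equals = text.indexOf('=');
        if (equals > 0) {
            // keys are case-insensitive, so lowercase them
            comments[text.slice(0, equals).toLowerCase()] = text.slice(equals + 1);
        }
        offset += 4 + length;
    }
    return comments;
}

// Builds a tiny packet body in memory so the parser has something to chew on.
function buildPacket(vendor, entries) {
    const encoder = new TextEncoder();
    const chunks = [];
    function pushString(s) {
        const encoded = encoder.encode(s);
        const len = new Uint8Array(4);
        new DataView(len.buffer).setUint32(0, encoded.length, true);
        chunks.push(len, encoded);
    }
    pushString(vendor);
    const count = new Uint8Array(4);
    new DataView(count.buffer).setUint32(0, entries.length, true);
    chunks.push(count);
    entries.forEach(function (entry) { pushString(entry); });
    const out = new Uint8Array(chunks.reduce((sum, c) => sum + c.length, 0));
    let pos = 0;
    for (const c of chunks) {
        out.set(c, pos);
        pos += c.length;
    }
    return out;
}

const packet = buildPacket('Lavf53.21.1', ['TITLE=Contra Base Snippet', 'TRACKNUMBER=1']);
const comments = parseVorbisComments(packet);
console.log(comments);
```

Finding that packet in a real file still means walking the OGG pages first, which is
where the "comment header comes second" assumption kicks in.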
As such, the code is a bit abstruse, in that you'll see some magic numbers, like
`offset += 94` where it's ignoring a bunch of header data to get to the good stuff.
Don't judge me based on this code. It works and it's tested; it's just hard to
read.
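
For the curious, the `94` is not arbitrary. The sketch below derives it from the
fixed ID3v1 field widths; `ID3V1_FIELDS` and `fieldOffsets` are hypothetical names
made up for this example and do not appear in the library.

```javascript
// Field widths of the 128-byte ID3v1 tag, in file order.
const ID3V1_FIELDS = [
    ['magic', 3],    // always "TAG"
    ['title', 30],
    ['artist', 30],
    ['album', 30],
    ['year', 4],
    ['comment', 30], // ID3v1.1 steals the last two bytes for a track number
    ['genre', 1]
];

// Hypothetical helper: turns the width table into byte offsets within the tag.
function fieldOffsets(fields) {
    const offsets = {};
    let position = 0;
    for (const [name, width] of fields) {
        offsets[name] = position;
        position += width;
    }
    offsets.total = position;
    return offsets;
}

const offsets = fieldOffsets(ID3V1_FIELDS);
// Distance from the start of the title to the start of the comment:
// title (30) + artist (30) + album (30) + year (4).
const skip = offsets.comment - offsets.title;
console.log(offsets, skip);
```

A table like this costs a few more bytes than a bare number, which is the trade the
library declines to make.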
Of course, since this isn't an actual parser, invalid files will also work. This
means, for example, that you could read only the first couple hundred bytes of an MP3
file and still extract the metadata from it, rather than requiring actual valid
MP3 data.
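
If you want to know how many bytes "enough" is for an ID3v2 tag, the 10-byte header
says so. The following is a rough sketch under the assumption that the tag starts at
byte 0; `id3v2TagLength` is a hypothetical helper, not something this library exports,
and it ignores the optional ID3v2.4 footer.

```javascript
// Hypothetical helper: inspects the 10-byte ID3v2 header at the start of a buffer
// and returns the number of bytes the whole tag occupies, or null if there is none.
function id3v2TagLength(bytes) {
    if (bytes.length < 10) {
        return null;
    }
    // "ID3" magic
    if (bytes[0] !== 0x49 || bytes[1] !== 0x44 || bytes[2] !== 0x33) {
        return null;
    }
    // bytes 3-4 are the version, byte 5 the flags;
    // bytes 6-9 hold a synchsafe integer with 7 useful bits per byte
    const size = ((bytes[6] & 0x7f) << 21) |
        ((bytes[7] & 0x7f) << 14) |
        ((bytes[8] & 0x7f) << 7) |
        (bytes[9] & 0x7f);
    return 10 + size; // header plus tag body
}

const header = Uint8Array.from([0x49, 0x44, 0x33, 4, 0, 0, 0, 0, 0x02, 0x01]);
const needed = id3v2TagLength(header);
console.log(needed);
```

Reading `needed` bytes from the start of the file should then cover the whole tag,
however much audio follows it.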
## Usage
The library operates solely on `ArrayBuffer`s, or `Buffer`s for Node's convenience.
So you'll need to preload your audio data before using this library.
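
One way to do the preloading in Node, sketched with only the standard library: read
the head of the file into a `Buffer`, then (if you need one) copy it into a standalone
`ArrayBuffer`. `readHead` and the temp-file demo are hypothetical and exist only for
this example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: reads at most `length` bytes from the start of a file.
function readHead(filePath, length) {
    const fd = fs.openSync(filePath, 'r');
    try {
        const buffer = Buffer.alloc(length);
        const bytesRead = fs.readSync(fd, buffer, 0, length, 0);
        return buffer.subarray(0, bytesRead); // drop the unread tail
    } finally {
        fs.closeSync(fd);
    }
}

// Copies a Node Buffer into an ArrayBuffer of exactly the same length.
function toArrayBuffer(buffer) {
    return buffer.buffer.slice(buffer.byteOffset, buffer.byteOffset + buffer.byteLength);
}

// Demo with a throwaway file standing in for real audio data.
const demoPath = path.join(os.tmpdir(), 'audio-metadata-demo-' + process.pid + '.bin');
fs.writeFileSync(demoPath, Buffer.from('ID3 and then a lot of audio data'));
const head = readHead(demoPath, 8);
const arrayBuffer = toArrayBuffer(head);
fs.unlinkSync(demoPath);
console.log(arrayBuffer.byteLength);
```

Either `head` or `arrayBuffer` could then be handed to one of the methods below.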
The library defines three methods:
```javascript
// extract comments from OGG container
AudioMetadata.ogg(buffer);
// extract ID3v2 tags
AudioMetadata.id3v2(buffer);
// extract ID3v1 tags
AudioMetadata.id3v1(buffer);
```
The result is an object with the metadata. It attempts to normalize common keys:
* *title* (`TIT1` and `TIT2` in id3v2)
* *artist* (`TPE1` in id3v2)
* *composer* (`TCOM` in id3v2)
* *album* (`TALB` in id3v2)
* *track* (`TRCK` in id3v2, commonly `TRACKNUMBER` in vorbis comments)
* *year* (`TDRC` (date recorded) is used in id3v2)
* *encoder* (`TSSE` in id3v2)
* *genre* (`TCON` in id3v2)
Everything else will be keyed by its original name. For id3v2,
any frame that is not a text identifier (i.e. whose ID does not start with a
"T") is ignored, so comments (`COMM`) are dropped too.
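
The normalization amounts to storing each text frame twice: once under its raw ID and
once under the friendly alias, if it has one. Below is a simplified sketch of that
idea; `normalizeFrames` and the `{ id, content }` input shape are assumptions made for
this example, the alias table mirrors the list above (with `TPE1` as the artist frame),
and the special handling of `TXXX` frames is left out.

```javascript
// Raw id3v2 frame IDs on the left, normalized keys on the right.
const ID_MAP = {
    TALB: 'album',
    TCOM: 'composer',
    TIT1: 'title',
    TIT2: 'title',
    TPE1: 'artist',
    TRCK: 'track',
    TSSE: 'encoder',
    TDRC: 'year',
    TCON: 'genre'
};

// Hypothetical helper: `frames` is a list of { id, content } pairs.
function normalizeFrames(frames) {
    const result = {};
    for (const frame of frames) {
        if (frame.id[0] !== 'T') {
            continue; // non-text frames (e.g. COMM) are ignored
        }
        result[frame.id] = frame.content; // keep the original name
        const alias = ID_MAP[frame.id];
        if (alias) {
            result[alias] = frame.content; // and add the normalized one
        }
    }
    return result;
}

const normalized = normalizeFrames([
    { id: 'TIT2', content: 'Foobar' },
    { id: 'TPE1', content: 'The Foobars' },
    { id: 'COMM', content: 'ignored' },
    { id: 'TBPM', content: '120' }
]);
console.log(normalized);
```

Frames without an alias, like `TBPM` here, only show up under their raw ID.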
### Node
Install it using NPM: `npm install audio-metadata` or `npm install -g audio-metadata`
if you want to use it from the shell.
```javascript
var audioMetaData = require('audio-metadata'),
    fs = require('fs');

var oggData = fs.readFileSync('/path/to/my.ogg');
var metadata = audioMetaData.ogg(oggData);
/*
{
  "title": "Contra Base Snippet",
  "artist": "Konami",
  "album": "Bill and Lance's Excellent Adventure",
  "year": "1988",
  "tracknumber": "1",
  "track": "1",
  "encoder": "Lavf53.21.1"
}
*/
```
#### From the Shell
```
Extract metadata from audio files

USAGE
  audio-metadata --type <type> [options] file1 [file2...]

OPTIONS
  --help,-h
    This help
  --type,-t <type>
    One of "id3v1", "id3v2" or "ogg"
  --chunk-size,-c <size>
    Read the file in chunks of <size>; default is 512
  --quit-after,-q <length>
    Stop searching for metadata if nothing is found after
    <length> bytes; default is 512
  --no-colors,-z
    Don't colorize the output

EXAMPLE
  Search for metadata in the first 300 bytes in 100 byte increments
    audio-metadata -t id3v2 -c 100 -q 300 keepitoffmy.wav
```
### Browser
This library has been tested on current versions of Firefox and Chrome. IE
might work, since it apparently supports `ArrayBuffer`. Safari/Opera are
probably okayish since they're webkit. Your mileage may vary.
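
If you would rather check than guess, a feature test along these lines is one option.
It is only a sketch: `supportsBinaryApis` is a hypothetical helper, and it merely
checks that the constructors exist, not that they behave.

```javascript
// Hypothetical check: reports whether the typed-array APIs this library
// leans on are defined on the given global object.
function supportsBinaryApis(scope) {
    return ['ArrayBuffer', 'DataView', 'Uint8Array'].every(function (name) {
        return typeof scope[name] !== 'undefined';
    });
}

// `globalThis` is missing in older browsers, so fall back to `window` there.
const supported = supportsBinaryApis(typeof globalThis !== 'undefined' ? globalThis : window);
console.log(supported);
```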
Loading `audio-metadata.min.js` will define the `AudioMetadata` global variable.
```html
<script type="text/javascript" src="audio-metadata.min.js"></script>
<script type="text/javascript">
  var req = new XMLHttpRequest();
  req.open('GET', 'http://example.com/sofine.mp3', true);
  req.responseType = 'arraybuffer';

  req.onload = function() {
    var metadata = AudioMetadata.id3v2(req.response);
    /*
    {
      "TIT2": "Foobar",
      "title": "Foobar",
      "TPE1": "The Foobars",
      "artist": "The Foobars",
      "TALB": "FUBAR",
      "album": "FUBAR",
      "year": "2014",
      "TRCK": "9",
      "track": "9",
      "TSSE": "Lavf53.21.1",
      "encoder": "Lavf53.21.1"
    }
    */
  };

  req.send(null);
</script>
```
## Development
```bash
git clone git@github.com:tmont/audio-metadata.git
cd audio-metadata
npm install
npm test
```
There's a "test" (yeah, yeah) for browsers, which you can view
by running `npm start` and then pointing your browser at
[http://localhost:24578/tests/browser/](http://localhost:24578/tests/browser/).
To build the minified browserified file, run `npm run minify`.
File diff suppressed because one or more lines are too long
+140
@@ -0,0 +1,140 @@
#!/usr/bin/env node

var fs = require('fs'),
    audioMetadata = require('../'),
    util = require('util'),
    args = process.argv.slice(2),
    type = 'id3v2',
    chunkSize = 512,
    quitAfter = chunkSize,
    colorize = true,
    files = [],
    i;

function usage() {
    console.log('Extract metadata from audio files');
    console.log();
    console.log('USAGE');
    console.log('audio-metadata --type <type> [options] file1 [file2...]');
    console.log();
    console.log('OPTIONS');
    console.log('--help,-h');
    console.log(' This help');
    console.log('--type,-t <type>');
    console.log(' One of "id3v1", "id3v2" or "ogg"');
    console.log('--chunk-size,-c <size>');
    console.log(' Read the file in chunks of <size>; default is 512');
    console.log('--quit-after,-q <length>');
    console.log(' Stop searching for metadata if nothing is found after ');
    console.log(' <length> bytes; default is 512');
    console.log('--no-colors,-z');
    console.log(' Don\'t colorize the output');
    console.log();
    console.log('EXAMPLE');
    console.log('Search for metadata in the first 300 bytes in 100 byte increments');
    console.log(' audio-metadata -t id3v2 -c 100 -q 300 keepitoffmy.wav');
}

for (i = 0; i < args.length; i++) {
    switch (args[i]) {
        case '-t':
        case '--type':
            type = args[++i];
            break;
        case '-h':
        case '--help':
            usage();
            process.exit(0);
            break;
        case '-c':
        case '--chunk-size':
            //always pass a radix so the value is parsed as decimal
            chunkSize = parseInt(args[++i], 10);
            break;
        case '-q':
        case '--quit-after':
            quitAfter = parseInt(args[++i], 10);
            break;
        case '-z':
        case '--no-colors':
            colorize = false;
            break;
        default:
            files.push(args[i]);
            break;
    }
}

if (!type) {
    console.error('--type is required');
    process.exit(1);
}
if (!(type in { ogg: 1, id3v1: 1, id3v2: 1 })) {
    console.error('Unrecognized type: ' + type);
    process.exit(1);
}
if (!files.length) {
    console.error('At least one file must be specified');
    process.exit(1);
}
if (isNaN(chunkSize) || chunkSize < 64) {
    console.error('Invalid chunk size');
    process.exit(1);
}
if (isNaN(quitAfter)) {
    console.error('Invalid --quit-after value');
    process.exit(1);
}
if (chunkSize > quitAfter) {
    console.error('chunk size cannot be greater than quit after value');
    process.exit(1);
}

try {
    for (i = 0; i < files.length; i++) {
        //everything's done synchronously so things are printed in the expected order
        var fd = fs.openSync(files[i], 'r'),
            //Buffer.alloc zero-fills and replaces the deprecated new Buffer(size)
            buffer = Buffer.alloc(quitAfter),
            metadata = null,
            offset = 0;

        while (!metadata) {
            var toRead = offset + chunkSize > quitAfter ? quitAfter - offset : chunkSize;
            if (!toRead) {
                break;
            }

            var bytesRead = fs.readSync(fd, buffer, offset, toRead, offset);
            if (bytesRead === 0) {
                //EOF
                break;
            }

            offset += bytesRead;
            metadata = audioMetadata[type](buffer);
        }

        fs.closeSync(fd);

        if (files.length > 1) {
            console.log(files[i] + ':');
        }

        if (metadata) {
            if (colorize) {
                console.log(util.inspect(metadata, false, null, true));
            } else {
                console.log(JSON.stringify(metadata, null, ' '));
            }
        } else {
            console.log('no metadata found');
        }

        console.log();
    }

    process.exit(0);
} catch (e) {
    console.error('An error occurred trying to read from a file');
    console.error(' ' + e.message);
    process.exit(1);
}
+5
@@ -0,0 +1,5 @@
module.exports = {
    ogg: require('./src/ogg'),
    id3v1: require('./src/id3v1'),
    id3v2: require('./src/id3v2')
};
+76
@@ -0,0 +1,76 @@
{
  "_from": "audio-metadata@^0.3.0",
  "_id": "audio-metadata@0.3.0",
  "_inBundle": false,
  "_integrity": "sha1-fVVAMfDCRO4pYjGhpV4A/3iNbOs=",
  "_location": "/audio-metadata",
  "_phantomChildren": {},
  "_requested": {
    "type": "range",
    "registry": true,
    "raw": "audio-metadata@^0.3.0",
    "name": "audio-metadata",
    "escapedName": "audio-metadata",
    "rawSpec": "^0.3.0",
    "saveSpec": null,
    "fetchSpec": "^0.3.0"
  },
  "_requiredBy": [
    "#USER",
    "/"
  ],
  "_resolved": "https://registry.npmjs.org/audio-metadata/-/audio-metadata-0.3.0.tgz",
  "_shasum": "7d554031f0c244ee296231a1a55e00ff788d6ceb",
  "_spec": "audio-metadata@^0.3.0",
  "_where": "C:\\Users\\Maspenguin\\Documents\\Programming\\MasSite",
  "author": {
    "name": "Tommy Montgomery",
    "email": "tmont@tmont.com",
    "url": "http://tmont.com/"
  },
  "bin": {
    "audio-metadata": "bin/audio-metadata.js"
  },
  "bugs": {
    "url": "https://github.com/tmont/audio-metadata/issues"
  },
  "bundleDependencies": false,
  "deprecated": false,
  "description": "Extract metadata from audio files",
  "devDependencies": {
    "browserify": "3.19.1",
    "mocha": "1.16.2",
    "serve": "1.3.0",
    "should": "2.1.1",
    "uglify-js": "2.4.8"
  },
  "files": [
    "index.js",
    "audio-metadata.min.js",
    "src",
    "bin",
    "README.md"
  ],
  "homepage": "https://github.com/tmont/audio-metadata#readme",
  "keywords": [
    "id3",
    "metadata",
    "mp3",
    "ogg",
    "wav",
    "audio"
  ],
  "license": "WTFPL",
  "name": "audio-metadata",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/tmont/audio-metadata.git"
  },
  "scripts": {
    "build": "browserify -s AudioMetadata -e index.js --bare > audio-metadata.js",
    "minify": "npm run build && ./node_modules/.bin/uglifyjs audio-metadata.js > audio-metadata.min.js && rm audio-metadata.js",
    "start": "serve -p 24578 .",
    "test": "mocha -R spec tests"
  },
  "version": "0.3.0"
}
+54
@@ -0,0 +1,54 @@
var utils = require('./utils');

function checkMagicId3v1(view) {
    //an ID3v1 tag is exactly 128 bytes, so anything shorter can't hold one
    if (view.byteLength < 128) {
        return false;
    }

    var id3Magic = utils.readBytes(view, view.byteLength - 128, 3);
    //"TAG"
    return id3Magic[0] === 84 && id3Magic[1] === 65 && id3Magic[2] === 71;
}

module.exports = function(buffer) {
    //read last 128 bytes
    var view = utils.createView(buffer);
    if (!checkMagicId3v1(view)) {
        return null;
    }

    function trim(value) {
        return value.replace(/[\s\u0000]+$/, '');
    }

    try {
        //skip the 3-byte "TAG" magic
        var offset = view.byteLength - 128 + 3,
            readAscii = utils.readAscii;

        var title = readAscii(view, offset, 30),
            artist = readAscii(view, offset + 30, 30),
            album = readAscii(view, offset + 60, 30),
            year = readAscii(view, offset + 90, 4);

        //title + artist + album + year
        offset += 94;

        var comment = readAscii(view, offset, 28),
            track = null;
        offset += 28;
        if (view.getUint8(offset) === 0) {
            //next byte is the track
            track = view.getUint8(offset + 1);
        } else {
            comment += readAscii(view, offset, 2);
        }

        offset += 2;
        var genre = view.getUint8(offset);
        return {
            title: trim(title),
            artist: trim(artist),
            album: trim(album),
            year: trim(year),
            comment: trim(comment),
            track: track,
            genre: genre
        };
    } catch (e) {
        return null;
    }
};
+124
@@ -0,0 +1,124 @@
var utils = require('./utils');

function checkMagicId3(view, offset) {
    var id3Magic = utils.readBytes(view, offset, 3);
    //"ID3"
    return id3Magic[0] === 73 && id3Magic[1] === 68 && id3Magic[2] === 51;
}

function getUint28(view, offset) {
    var sizeBytes = utils.readBytes(view, offset, 4);
    //note: this mask leaves each byte untouched; a strict synchsafe decode
    //would use 0x7f to drop the high bit of every byte
    var mask = 0xfffffff;
    return ((sizeBytes[0] & mask) << 21) |
        ((sizeBytes[1] & mask) << 14) |
        ((sizeBytes[2] & mask) << 7) |
        (sizeBytes[3] & mask);
}

//http://id3.org/id3v2.3.0
//http://id3.org/id3v2.4.0-structure
//http://id3.org/id3v2.4.0-frames
module.exports = function(buffer) {
    var view = utils.createView(buffer);
    if (!checkMagicId3(view, 0)) {
        return null;
    }

    var offset = 3;
    //var majorVersion = view.getUint8(offset);
    offset += 2;
    var flags = view.getUint8(offset);
    offset++;
    var size = getUint28(view, offset);
    offset += 4;

    var extendedHeader = (flags & 128) > 0;
    if (extendedHeader) {
        offset += getUint28(view, offset);
    }

    function readFrame(offset) {
        try {
            var id = utils.readAscii(view, offset, 4);
            var size = getUint28(view, offset + 4);
            offset += 10; //+2 more for flags we don't care about

            if (id[0] !== 'T') {
                return {
                    id: id,
                    size: size + 10
                };
            }

            var encoding = view.getUint8(offset),
                data = '';

            if (encoding <= 3) {
                offset++;
                if (encoding === 3) {
                    //UTF8 - null terminated
                    data = utils.readUtf8(view, offset, size - 1);
                } else {
                    //ISO-8859-1, UTF-16, UTF-16BE
                    //UTF-16 and UTF-16BE are $FF $00 terminated
                    //ISO is null terminated

                    //screw these encodings, read it as ascii
                    data = utils.readAscii(view, offset, size - 1);
                }
            } else {
                //no encoding info, read it as ascii
                data = utils.readAscii(view, offset, size);
            }

            //id3v2.4 is supposed to have encoding terminations, but sometimes
            //they don't? meh.
            data = utils.trimNull(data);

            return {
                id: id,
                size: size + 10,
                content: data
            };
        } catch (e) {
            return null;
        }
    }

    var idMap = {
        TALB: 'album',
        TCOM: 'composer',
        TIT1: 'title',
        TIT2: 'title',
        TPE1: 'artist',
        TRCK: 'track',
        TSSE: 'encoder',
        TDRC: 'year',
        TCON: 'genre'
    };

    var endOfTags = offset + size,
        frames = {};
    while (offset < endOfTags) {
        var frame = readFrame(offset);
        if (!frame) {
            break;
        }

        offset += frame.size;
        if (!frame.content) {
            continue;
        }

        var id = idMap[frame.id] || frame.id;
        if (id === 'TXXX') {
            //user-defined text frame: "description\0value"
            var nullByte = frame.content.indexOf('\u0000');
            id = frame.content.substring(0, nullByte);
            frames[id] = frame.content.substring(nullByte + 1);
        } else {
            frames[id] = frames[frame.id] = frame.content;
        }
    }

    return frames;
};
+79
@@ -0,0 +1,79 @@
var utils = require('./utils');

/**
 * See http://www.ietf.org/rfc/rfc3533.txt
 * @param {Buffer|ArrayBuffer} buffer
 */
module.exports = function(buffer) {
    var view = utils.createView(buffer);

    function parsePage(offset, withPacket) {
        if (view.byteLength < offset + 27) {
            return null;
        }

        var numPageSegments = view.getUint8(offset + 26),
            segmentTable = utils.readBytes(view, offset + 27, numPageSegments),
            headerSize = 27 + numPageSegments;

        if (!segmentTable.length) {
            return null;
        }

        var
            pageSize = headerSize + segmentTable.reduce(function(cur, next) {
                return cur + next;
            }),
            length = headerSize + 1 + 'vorbis'.length,
            packetView = null;

        if (withPacket) {
            packetView = utils.createView(new ArrayBuffer(pageSize - length));
            utils.readBytes(view, offset + length, pageSize - length, packetView);
        }

        return {
            pageSize: pageSize,
            packet: packetView
        };
    }

    function parseComments(packet) {
        try {
            var vendorLength = packet.getUint32(0, true),
                commentListLength = packet.getUint32(4 + vendorLength, true),
                comments = {},
                offset = 8 + vendorLength,
                map = {
                    tracknumber: 'track'
                };

            for (var i = 0; i < commentListLength; i++) {
                var commentLength = packet.getUint32(offset, true),
                    comment = utils.readUtf8(packet, offset + 4, commentLength),
                    equals = comment.indexOf('='),
                    key = comment.substring(0, equals).toLowerCase();

                comments[map[key] || key] = comments[key] = utils.trimNull(comment.substring(equals + 1));
                offset += 4 + commentLength;
            }

            return comments;
        } catch (e) {
            //all exceptions are just malformed/truncated data, so we just ignore them
            return null;
        }
    }

    var id = parsePage(0);
    if (!id) {
        return null;
    }

    var commentHeader = parsePage(id.pageSize, true);
    if (!commentHeader) {
        return null;
    }

    return parseComments(commentHeader.packet);
};
+69
@@ -0,0 +1,69 @@
function toArrayBuffer(buffer) {
    var arrayBuffer = new ArrayBuffer(buffer.length);
    var view = new Uint8Array(arrayBuffer);
    for (var i = 0; i < buffer.length; ++i) {
        view[i] = buffer[i];
    }
    return arrayBuffer;
}

module.exports = {
    trimNull: function(s) {
        return s.replace(/\u0000+$/, '');
    },

    createView: function(buffer) {
        if (typeof(Buffer) !== 'undefined' && buffer instanceof Buffer) {
            //convert nodejs buffers to ArrayBuffer
            buffer = toArrayBuffer(buffer);
        }

        if (!(buffer instanceof ArrayBuffer)) {
            throw new Error('Expected instance of Buffer or ArrayBuffer');
        }

        return new DataView(buffer);
    },

    readBytes: function(view, offset, length, target) {
        //a negative offset would make getUint8() throw, so bail out early
        if (offset < 0 || offset + length < 0) {
            return [];
        }

        var bytes = [];
        var max = Math.min(offset + length, view.byteLength);
        for (var i = offset; i < max; i++) {
            var value = view.getUint8(i);
            bytes.push(value);
            if (target) {
                target.setUint8(i - offset, value);
            }
        }

        return bytes;
    },

    readAscii: function(view, offset, length) {
        if (view.byteLength < offset + length) {
            return '';
        }

        var s = '';
        for (var i = 0; i < length; i++) {
            s += String.fromCharCode(view.getUint8(offset + i));
        }

        return s;
    },

    readUtf8: function(view, offset, length) {
        if (view.byteLength < offset + length) {
            return '';
        }

        var buffer = view.buffer.slice(offset, offset + length);

        //http://stackoverflow.com/a/17192845 - convert byte array to UTF8 string
        var encodedString = String.fromCharCode.apply(null, new Uint8Array(buffer));
        return decodeURIComponent(escape(encodedString));
    }
};