DOSBox-X
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Defines
src/libs/decoders/dr_flac.h
00001 /*
00002 FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
00003 dr_flac - v0.12.13 - 2020-05-16
00004 
00005 David Reid - mackron@gmail.com
00006 
00007 GitHub: https://github.com/mackron/dr_libs
00008 */
00009 
00010 /*
00011 RELEASE NOTES - v0.12.0
00012 =======================
00013 Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.
00014 
00015 
00016 Improved Client-Defined Memory Allocation
00017 -----------------------------------------
00018 The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
00019 existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE are still in place and will be used by default when no custom
00020 allocation callbacks are specified.
00021 
00022 To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:
00023 
00024     void* my_malloc(size_t sz, void* pUserData)
00025     {
00026         return malloc(sz);
00027     }
00028     void* my_realloc(void* p, size_t sz, void* pUserData)
00029     {
00030         return realloc(p, sz);
00031     }
00032     void my_free(void* p, void* pUserData)
00033     {
00034         free(p);
00035     }
00036 
00037     ...
00038 
00039     drflac_allocation_callbacks allocationCallbacks;
00040     allocationCallbacks.pUserData = &myData;
00041     allocationCallbacks.onMalloc  = my_malloc;
00042     allocationCallbacks.onRealloc = my_realloc;
00043     allocationCallbacks.onFree    = my_free;
00044     drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);
00045 
00046 The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
00047 
00048 Passing in null for the allocation callbacks object will cause dr_flac to use defaults which is the same as DRFLAC_MALLOC,
00049 DRFLAC_REALLOC and DRFLAC_FREE and the equivalent of how it worked in previous versions.
00050 
00051 Every API that opens a drflac object now takes this extra parameter. These include the following:
00052 
00053     drflac_open()
00054     drflac_open_relaxed()
00055     drflac_open_with_metadata()
00056     drflac_open_with_metadata_relaxed()
00057     drflac_open_file()
00058     drflac_open_file_with_metadata()
00059     drflac_open_memory()
00060     drflac_open_memory_with_metadata()
00061     drflac_open_and_read_pcm_frames_s32()
00062     drflac_open_and_read_pcm_frames_s16()
00063     drflac_open_and_read_pcm_frames_f32()
00064     drflac_open_file_and_read_pcm_frames_s32()
00065     drflac_open_file_and_read_pcm_frames_s16()
00066     drflac_open_file_and_read_pcm_frames_f32()
00067     drflac_open_memory_and_read_pcm_frames_s32()
00068     drflac_open_memory_and_read_pcm_frames_s16()
00069     drflac_open_memory_and_read_pcm_frames_f32()
00070 
00071 
00072 
00073 Optimizations
00074 -------------
00075 Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
00076 improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
00077 advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
00078 means it will be disabled when DR_FLAC_NO_CRC is used.
00079 
00080 The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
00081 particular. 16-bit streams should also see some improvement.
00082 
00083 drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed it's s32
00084 to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
00085 
00086 A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
00087 channel reconstruction which is the last part of the decoding process.
00088 
00089 The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
00090 compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
00091 compile time and the REV instruction requires ARM architecture version 6.
00092 
00093 An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.
00094 
00095 
00096 Removed APIs
00097 ------------
00098 The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:
00099 
00100     drflac_read_s32()                   -> drflac_read_pcm_frames_s32()
00101     drflac_read_s16()                   -> drflac_read_pcm_frames_s16()
00102     drflac_read_f32()                   -> drflac_read_pcm_frames_f32()
00103     drflac_seek_to_sample()             -> drflac_seek_to_pcm_frame()
00104     drflac_open_and_decode_s32()        -> drflac_open_and_read_pcm_frames_s32()
00105     drflac_open_and_decode_s16()        -> drflac_open_and_read_pcm_frames_s16()
00106     drflac_open_and_decode_f32()        -> drflac_open_and_read_pcm_frames_f32()
00107     drflac_open_and_decode_file_s32()   -> drflac_open_file_and_read_pcm_frames_s32()
00108     drflac_open_and_decode_file_s16()   -> drflac_open_file_and_read_pcm_frames_s16()
00109     drflac_open_and_decode_file_f32()   -> drflac_open_file_and_read_pcm_frames_f32()
00110     drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
00111     drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
00112     drflac_open_and_decode_memory_f32() -> drflac_open_memroy_and_read_pcm_frames_f32()
00113 
00114 Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
00115 to the old per-sample APIs. You now need to use the "pcm_frame" versions.
00116 */
00117 
00118 
00119 /*
00120 Introduction
00121 ============
00122 dr_flac is a single file library. To use it, do something like the following in one .c file.
00123 
00124     ```c
00125     #define DR_FLAC_IMPLEMENTATION
00126     #include "dr_flac.h"
00127     ```
00128 
00129 You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
00130 
00131     ```c
00132     drflac* pFlac = drflac_open_file("MySong.flac", NULL);
00133     if (pFlac == NULL) {
00134         // Failed to open FLAC file
00135     }
00136 
00137     drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
00138     drflac_uint64 numberOfInterleavedSamplesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
00139     ```
00140 
00141 The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
00142 should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
00143 a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
00144 
00145 You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
00146 samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:
00147 
00148     ```c
00149     while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
00150         do_something();
00151     }
00152     ```
00153 
00154 You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
00155 
00156 If you just want to quickly decode an entire FLAC file in one go you can do something like this:
00157 
00158     ```c
00159     unsigned int channels;
00160     unsigned int sampleRate;
00161     drflac_uint64 totalPCMFrameCount;
00162     drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
00163     if (pSampleData == NULL) {
00164         // Failed to open and decode FLAC file.
00165     }
00166 
00167     ...
00168 
00169     drflac_free(pSampleData);
00170     ```
00171 
00172 You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
00173 should be considered lossy.
00174 
00175 
00176 If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metdata()` or `drflac_open_memory_with_metadata()`.
00177 The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
00178 reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metdata()` returns.
00179 
00180 The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. The presents a problem in certain scenarios such as broadcast style
00181 streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:
00182     
00183     `drflac_open_relaxed()`
00184     `drflac_open_with_metadata_relaxed()`
00185 
00186 It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
00187 APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
00188 
00189 
00190 
00191 Build Options
00192 =============
00193 #define these options before including this file.
00194 
00195 #define DR_FLAC_NO_STDIO
00196   Disable `drflac_open_file()` and family.
00197 
00198 #define DR_FLAC_NO_OGG
00199   Disables support for Ogg/FLAC streams.
00200 
00201 #define DR_FLAC_BUFFER_SIZE <number>
00202   Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
00203   Larger values means more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
00204   you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.
00205 
00206 #define DR_FLAC_NO_CRC
00207   Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
00208   be used if available. Otherwise the seek will be performed using brute force.
00209 
00210 #define DR_FLAC_NO_SIMD
00211   Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.
00212 
00213 
00214 
00215 Notes
00216 =====
00217 - dr_flac does not support changing the sample rate nor channel count mid stream.
00218 - dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
00219 - When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
00220   to differences in corrupted stream recorvery logic between the two APIs.
00221 */
00222 
00223 #ifndef dr_flac_h
00224 #define dr_flac_h
00225 
00226 #ifdef __cplusplus
00227 extern "C" {
00228 #endif
00229 
00230 #define DRFLAC_STRINGIFY(x)      #x
00231 #define DRFLAC_XSTRINGIFY(x)     DRFLAC_STRINGIFY(x)
00232 
00233 #define DRFLAC_VERSION_MAJOR     0
00234 #define DRFLAC_VERSION_MINOR     12
00235 #define DRFLAC_VERSION_REVISION  13
00236 #define DRFLAC_VERSION_STRING    DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
00237 
00238 #include <stddef.h> /* For size_t. */
00239 
00240 /* Sized types. Prefer built-in types. Fall back to stdint. */
00241 #ifdef _MSC_VER
00242     #if defined(__clang__)
00243         #pragma GCC diagnostic push
00244         #pragma GCC diagnostic ignored "-Wlanguage-extension-token"
00245         #pragma GCC diagnostic ignored "-Wlong-long"        
00246         #pragma GCC diagnostic ignored "-Wc++11-long-long"
00247     #endif
00248     typedef   signed __int8  drflac_int8;
00249     typedef unsigned __int8  drflac_uint8;
00250     typedef   signed __int16 drflac_int16;
00251     typedef unsigned __int16 drflac_uint16;
00252     typedef   signed __int32 drflac_int32;
00253     typedef unsigned __int32 drflac_uint32;
00254     typedef   signed __int64 drflac_int64;
00255     typedef unsigned __int64 drflac_uint64;
00256     #if defined(__clang__)
00257         #pragma GCC diagnostic pop
00258     #endif
00259 #else
00260     #include <stdint.h>
00261     typedef int8_t           drflac_int8;
00262     typedef uint8_t          drflac_uint8;
00263     typedef int16_t          drflac_int16;
00264     typedef uint16_t         drflac_uint16;
00265     typedef int32_t          drflac_int32;
00266     typedef uint32_t         drflac_uint32;
00267     typedef int64_t          drflac_int64;
00268     typedef uint64_t         drflac_uint64;
00269 #endif
00270 typedef drflac_uint8         drflac_bool8;
00271 typedef drflac_uint32        drflac_bool32;
00272 #define DRFLAC_TRUE          1
00273 #define DRFLAC_FALSE         0
00274 
00275 #if !defined(DRFLAC_API)
00276     #if defined(DRFLAC_DLL)
00277         #if defined(_WIN32)
00278             #define DRFLAC_DLL_IMPORT  __declspec(dllimport)
00279             #define DRFLAC_DLL_EXPORT  __declspec(dllexport)
00280             #define DRFLAC_DLL_PRIVATE static
00281         #else
00282             #if defined(__GNUC__) && __GNUC__ >= 4
00283                 #define DRFLAC_DLL_IMPORT  __attribute__((visibility("default")))
00284                 #define DRFLAC_DLL_EXPORT  __attribute__((visibility("default")))
00285                 #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
00286             #else
00287                 #define DRFLAC_DLL_IMPORT
00288                 #define DRFLAC_DLL_EXPORT
00289                 #define DRFLAC_DLL_PRIVATE static
00290             #endif
00291         #endif
00292 
00293         #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
00294             #define DRFLAC_API  DRFLAC_DLL_EXPORT
00295         #else
00296             #define DRFLAC_API  DRFLAC_DLL_IMPORT
00297         #endif
00298         #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
00299     #else
00300         #define DRFLAC_API extern
00301         #define DRFLAC_PRIVATE static
00302     #endif
00303 #endif
00304 
00305 #if defined(_MSC_VER) && _MSC_VER >= 1700   /* Visual Studio 2012 */
00306     #define DRFLAC_DEPRECATED       __declspec(deprecated)
00307 #elif (defined(__GNUC__) && __GNUC__ >= 4)  /* GCC 4 */
00308     #define DRFLAC_DEPRECATED       __attribute__((deprecated))
00309 #elif defined(__has_feature)                /* Clang */
00310     #if __has_feature(attribute_deprecated)
00311         #define DRFLAC_DEPRECATED   __attribute__((deprecated))
00312     #else
00313         #define DRFLAC_DEPRECATED
00314     #endif
00315 #else
00316     #define DRFLAC_DEPRECATED
00317 #endif
00318 
00319 DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
00320 DRFLAC_API const char* drflac_version_string();
00321 
00322 /*
00323 As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values means more speed,
00324 but also more memory. In my testing there is diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
00325 */
00326 #ifndef DR_FLAC_BUFFER_SIZE
00327 #define DR_FLAC_BUFFER_SIZE   4096
00328 #endif
00329 
00330 /* Check if we can enable 64-bit optimizations. */
00331 #if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
00332 #define DRFLAC_64BIT
00333 #endif
00334 
00335 #ifdef DRFLAC_64BIT
00336 typedef drflac_uint64 drflac_cache_t;
00337 #else
00338 typedef drflac_uint32 drflac_cache_t;
00339 #endif
00340 
00341 /* The various metadata block types. */
00342 #define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO       0
00343 #define DRFLAC_METADATA_BLOCK_TYPE_PADDING          1
00344 #define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION      2
00345 #define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE        3
00346 #define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT   4
00347 #define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET         5
00348 #define DRFLAC_METADATA_BLOCK_TYPE_PICTURE          6
00349 #define DRFLAC_METADATA_BLOCK_TYPE_INVALID          127
00350 
00351 /* The various picture types specified in the PICTURE block. */
00352 #define DRFLAC_PICTURE_TYPE_OTHER                   0
00353 #define DRFLAC_PICTURE_TYPE_FILE_ICON               1
00354 #define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON         2
00355 #define DRFLAC_PICTURE_TYPE_COVER_FRONT             3
00356 #define DRFLAC_PICTURE_TYPE_COVER_BACK              4
00357 #define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE            5
00358 #define DRFLAC_PICTURE_TYPE_MEDIA                   6
00359 #define DRFLAC_PICTURE_TYPE_LEAD_ARTIST             7
00360 #define DRFLAC_PICTURE_TYPE_ARTIST                  8
00361 #define DRFLAC_PICTURE_TYPE_CONDUCTOR               9
00362 #define DRFLAC_PICTURE_TYPE_BAND                    10
00363 #define DRFLAC_PICTURE_TYPE_COMPOSER                11
00364 #define DRFLAC_PICTURE_TYPE_LYRICIST                12
00365 #define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION      13
00366 #define DRFLAC_PICTURE_TYPE_DURING_RECORDING        14
00367 #define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE      15
00368 #define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE          16
00369 #define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH     17
00370 #define DRFLAC_PICTURE_TYPE_ILLUSTRATION            18
00371 #define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE           19
00372 #define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE      20
00373 
00374 typedef enum
00375 {
00376     drflac_container_native,
00377     drflac_container_ogg,
00378     drflac_container_unknown
00379 } drflac_container;
00380 
00381 typedef enum
00382 {
00383     drflac_seek_origin_start,
00384     drflac_seek_origin_current
00385 } drflac_seek_origin;
00386 
00387 /* Packing is important on this structure because we map this directly to the raw data within the SEEKTABLE metadata block. */
00388 #pragma pack(2)
00389 typedef struct
00390 {
00391     drflac_uint64 firstPCMFrame;
00392     drflac_uint64 flacFrameOffset;   /* The offset from the first byte of the header of the first frame. */
00393     drflac_uint16 pcmFrameCount;
00394 } drflac_seekpoint;
00395 #pragma pack()
00396 
00397 typedef struct
00398 {
00399     drflac_uint16 minBlockSizeInPCMFrames;
00400     drflac_uint16 maxBlockSizeInPCMFrames;
00401     drflac_uint32 minFrameSizeInPCMFrames;
00402     drflac_uint32 maxFrameSizeInPCMFrames;
00403     drflac_uint32 sampleRate;
00404     drflac_uint8  channels;
00405     drflac_uint8  bitsPerSample;
00406     drflac_uint64 totalPCMFrameCount;
00407     drflac_uint8  md5[16];
00408 } drflac_streaminfo;
00409 
00410 typedef struct
00411 {
00412     /* The metadata type. Use this to know how to interpret the data below. */
00413     drflac_uint32 type;
00414 
00415     /*
00416     A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
00417     not modify the contents of this buffer. Use the structures below for more meaningful and structured
00418     information about the metadata. It's possible for this to be null.
00419     */
00420     const void* pRawData;
00421 
00422     /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
00423     drflac_uint32 rawDataSize;
00424 
00425     union
00426     {
00427         drflac_streaminfo streaminfo;
00428 
00429         struct
00430         {
00431             int unused;
00432         } padding;
00433 
00434         struct
00435         {
00436             drflac_uint32 id;
00437             const void* pData;
00438             drflac_uint32 dataSize;
00439         } application;
00440 
00441         struct
00442         {
00443             drflac_uint32 seekpointCount;
00444             const drflac_seekpoint* pSeekpoints;
00445         } seektable;
00446 
00447         struct
00448         {
00449             drflac_uint32 vendorLength;
00450             const char* vendor;
00451             drflac_uint32 commentCount;
00452             const void* pComments;
00453         } vorbis_comment;
00454 
00455         struct
00456         {
00457             char catalog[128];
00458             drflac_uint64 leadInSampleCount;
00459             drflac_bool32 isCD;
00460             drflac_uint8 trackCount;
00461             const void* pTrackData;
00462         } cuesheet;
00463 
00464         struct
00465         {
00466             drflac_uint32 type;
00467             drflac_uint32 mimeLength;
00468             const char* mime;
00469             drflac_uint32 descriptionLength;
00470             const char* description;
00471             drflac_uint32 width;
00472             drflac_uint32 height;
00473             drflac_uint32 colorDepth;
00474             drflac_uint32 indexColorCount;
00475             drflac_uint32 pictureDataSize;
00476             const drflac_uint8* pPictureData;
00477         } picture;
00478     } data;
00479 } drflac_metadata;
00480 
00481 
00482 /*
00483 Callback for when data needs to be read from the client.
00484 
00485 
00486 Parameters
00487 ----------
00488 pUserData (in)
00489     The user data that was passed to drflac_open() and family.
00490 
00491 pBufferOut (out)
00492     The output buffer.
00493 
00494 bytesToRead (in)
00495     The number of bytes to read.
00496 
00497 
00498 Return Value
00499 ------------
00500 The number of bytes actually read.
00501 
00502 
00503 Remarks
00504 -------
00505 A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
00506 you have reached the end of the stream.
00507 */
00508 typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
00509 
00510 /*
00511 Callback for when data needs to be seeked.
00512 
00513 
00514 Parameters
00515 ----------
00516 pUserData (in)
00517     The user data that was passed to drflac_open() and family.
00518 
00519 offset (in)
00520     The number of bytes to move, relative to the origin. Will never be negative.
00521 
00522 origin (in)
00523     The origin of the seek - the current position or the start of the stream.
00524 
00525 
00526 Return Value
00527 ------------
00528 Whether or not the seek was successful.
00529 
00530 
00531 Remarks
00532 -------
00533 The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
00534 either drflac_seek_origin_start or drflac_seek_origin_current.
00535 
00536 When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
00537 and handled by returning DRFLAC_FALSE.
00538 */
00539 typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);
00540 
00541 /*
00542 Callback for when a metadata block is read.
00543 
00544 
00545 Parameters
00546 ----------
00547 pUserData (in)
00548     The user data that was passed to drflac_open() and family.
00549 
00550 pMetadata (in)
00551     A pointer to a structure containing the data of the metadata block.
00552 
00553 
00554 Remarks
00555 -------
00556 Use pMetadata->type to determine which metadata block is being handled and how to read the data.
00557 */
00558 typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);
00559 
00560 
00561 typedef struct
00562 {
00563     void* pUserData;
00564     void* (* onMalloc)(size_t sz, void* pUserData);
00565     void* (* onRealloc)(void* p, size_t sz, void* pUserData);
00566     void  (* onFree)(void* p, void* pUserData);
00567 } drflac_allocation_callbacks;
00568 
00569 /* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
00570 typedef struct
00571 {
00572     const drflac_uint8* data;
00573     size_t dataSize;
00574     size_t currentReadPos;
00575 } drflac__memory_stream;
00576 
00577 /* Structure for internal use. Used for bit streaming. */
00578 typedef struct
00579 {
00580     /* The function to call when more data needs to be read. */
00581     drflac_read_proc onRead;
00582 
00583     /* The function to call when the current read position needs to be moved. */
00584     drflac_seek_proc onSeek;
00585 
00586     /* The user data to pass around to onRead and onSeek. */
00587     void* pUserData;
00588 
00589 
00590     /*
00591     The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
00592     stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
00593     or not the bistreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
00594     */
00595     size_t unalignedByteCount;
00596 
00597     /* The content of the unaligned bytes. */
00598     drflac_cache_t unalignedCache;
00599 
00600     /* The index of the next valid cache line in the "L2" cache. */
00601     drflac_uint32 nextL2Line;
00602 
00603     /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
00604     drflac_uint32 consumedBits;
00605 
00606     /*
00607     The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
00608     Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
00609     */
00610     drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
00611     drflac_cache_t cache;
00612 
00613     /*
00614     CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
00615     is reset to 0 at the beginning of each frame.
00616     */
00617     drflac_uint16 crc16;
00618     drflac_cache_t crc16Cache;              /* A cache for optimizing CRC calculations. This is filled when when the L1 cache is reloaded. */
00619     drflac_uint32 crc16CacheIgnoredBytes;   /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
00620 } drflac_bs;
00621 
00622 typedef struct
00623 {
00624     /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
00625     drflac_uint8 subframeType;
00626 
00627     /* The number of wasted bits per sample as specified by the sub-frame header. */
00628     drflac_uint8 wastedBitsPerSample;
00629 
00630     /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
00631     drflac_uint8 lpcOrder;
00632 
00633     /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
00634     drflac_int32* pSamplesS32;
00635 } drflac_subframe;
00636 
00637 typedef struct
00638 {
00639     /*
00640     If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
00641     always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
00642     */
00643     drflac_uint64 pcmFrameNumber;
00644 
00645     /*
00646     If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
00647     is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
00648     */
00649     drflac_uint32 flacFrameNumber;
00650 
00651     /* The sample rate of this frame. */
00652     drflac_uint32 sampleRate;
00653 
00654     /* The number of PCM frames in each sub-frame within this frame. */
00655     drflac_uint16 blockSizeInPCMFrames;
00656 
00657     /*
00658     The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
00659     will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
00660     */
00661     drflac_uint8 channelAssignment;
00662 
00663     /* The number of bits per sample within this frame. */
00664     drflac_uint8 bitsPerSample;
00665 
00666     /* The frame's CRC. */
00667     drflac_uint8 crc8;
00668 } drflac_frame_header;
00669 
00670 typedef struct
00671 {
00672     /* The header. */
00673     drflac_frame_header header;
00674 
00675     /*
00676     The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
00677     this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
00678     */
00679     drflac_uint32 pcmFramesRemaining;
00680 
00681     /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
00682     drflac_subframe subframes[8];
00683 } drflac_frame;
00684 
00685 typedef struct
00686 {
00687     /* The function to call when a metadata block is read. */
00688     drflac_meta_proc onMeta;
00689 
00690     /* The user data posted to the metadata callback function. */
00691     void* pUserDataMD;
00692 
00693     /* Memory allocation callbacks. */
00694     drflac_allocation_callbacks allocationCallbacks;
00695 
00696 
00697     /* The sample rate. Will be set to something like 44100. */
00698     drflac_uint32 sampleRate;
00699 
00700     /*
00701     The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
00702     value specified in the STREAMINFO block.
00703     */
00704     drflac_uint8 channels;
00705 
00706     /* The bits per sample. Will be set to something like 16, 24, etc. */
00707     drflac_uint8 bitsPerSample;
00708 
00709     /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
00710     drflac_uint16 maxBlockSizeInPCMFrames;
00711 
00712     /*
00713     The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
00714     the total PCM frame count is unknown. Likely the case with streams like internet radio.
00715     */
00716     drflac_uint64 totalPCMFrameCount;
00717 
00718 
00719     /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
00720     drflac_container container;
00721 
00722     /* The number of seekpoints in the seektable. */
00723     drflac_uint32 seekpointCount;
00724 
00725 
00726     /* Information about the frame the decoder is currently sitting on. */
00727     drflac_frame currentFLACFrame;
00728 
00729 
00730     /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
00731     drflac_uint64 currentPCMFrame;
00732 
00733     /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
00734     drflac_uint64 firstFLACFramePosInBytes;
00735 
00736 
00737     /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
00738     drflac__memory_stream memoryStream;
00739 
00740 
00741     /* A pointer to the decoded sample data. This is an offset of pExtraData. */
00742     drflac_int32* pDecodedSamples;
00743 
00744     /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
00745     drflac_seekpoint* pSeekpoints;
00746 
00747     /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
00748     void* _oggbs;
00749 
00750     /* Internal use only. Used for profiling and testing different seeking modes. */
00751     drflac_bool32 _noSeekTableSeek    : 1;
00752     drflac_bool32 _noBinarySearchSeek : 1;
00753     drflac_bool32 _noBruteForceSeek   : 1;
00754 
00755     /* The bit streamer. The raw FLAC data is fed through this object. */
00756     drflac_bs bs;
00757 
00758     /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
00759     drflac_uint8 pExtraData[1];
00760 } drflac;
00761 
00762 
00763 /*
00764 Opens a FLAC decoder.
00765 
00766 
00767 Parameters
00768 ----------
00769 onRead (in)
00770     The function to call when data needs to be read from the client.
00771 
00772 onSeek (in)
00773     The function to call when the read position of the client data needs to move.
00774 
00775 pUserData (in, optional)
00776     A pointer to application defined data that will be passed to onRead and onSeek.
00777 
00778 pAllocationCallbacks (in, optional)
00779     A pointer to application defined callbacks for managing memory allocations.
00780 
00781 
00782 Return Value
00783 ------------
00784 Returns a pointer to an object representing the decoder.
00785 
00786 
00787 Remarks
00788 -------
00789 Close the decoder with `drflac_close()`.
00790 
00791 `pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
00792 
00793 This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
00794 without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
00795 
00796 This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
00797 from a block of memory respectively.
00798 
00799 The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
00800 
00801 
00802 Seek Also
00803 ---------
00804 drflac_open_file()
00805 drflac_open_memory()
00806 drflac_open_with_metadata()
00807 drflac_close()
00808 */
00809 DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
00810 
00811 /*
00812 Opens a FLAC stream with relaxed validation of the header block.
00813 
00814 
00815 Parameters
00816 ----------
00817 onRead (in)
00818     The function to call when data needs to be read from the client.
00819 
00820 onSeek (in)
00821     The function to call when the read position of the client data needs to move.
00822 
00823 container (in)
00824     Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
00825 
00826 pUserData (in, optional)
00827     A pointer to application defined data that will be passed to onRead and onSeek.
00828 
00829 pAllocationCallbacks (in, optional)
00830     A pointer to application defined callbacks for managing memory allocations.
00831 
00832 
00833 Return Value
00834 ------------
00835 A pointer to an object representing the decoder.
00836 
00837 
00838 Remarks
00839 -------
00840 The same as drflac_open(), except attempts to open the stream even when a header block is not present.
00841 
00842 Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
00843 as that is for internal use only.
00844 
00845 Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
00846 force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
00847 */
00848 DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
00849 
00850 /*
00851 Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
00852 
00853 
00854 Parameters
00855 ----------
00856 onRead (in)
00857     The function to call when data needs to be read from the client.
00858 
00859 onSeek (in)
00860     The function to call when the read position of the client data needs to move.
00861 
00862 onMeta (in)
00863     The function to call for every metadata block.
00864 
00865 pUserData (in, optional)
00866     A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
00867 
00868 pAllocationCallbacks (in, optional)
00869     A pointer to application defined callbacks for managing memory allocations.
00870 
00871 
00872 Return Value
00873 ------------
00874 A pointer to an object representing the decoder.
00875 
00876 
00877 Remarks
00878 -------
00879 Close the decoder with `drflac_close()`.
00880 
00881 `pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
00882 
00883 This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
00884 metadata block except for STREAMINFO and PADDING blocks.
00885 
00886 The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns.
00887 
00888 The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
00889 
00890 Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
00891 the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
00892 metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
00893 returned depending on whether or not the stream is being opened with metadata.
00894 
00895 
00896 Seek Also
00897 ---------
00898 drflac_open_file_with_metadata()
00899 drflac_open_memory_with_metadata()
00900 drflac_open()
00901 drflac_close()
00902 */
00903 DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
00904 
00905 /*
00906 The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
00907 
00908 See Also
00909 --------
00910 drflac_open_with_metadata()
00911 drflac_open_relaxed()
00912 */
00913 DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
00914 
00915 /*
00916 Closes the given FLAC decoder.
00917 
00918 
00919 Parameters
00920 ----------
00921 pFlac (in)
00922     The decoder to close.
00923 
00924 
00925 Remarks
00926 -------
00927 This will destroy the decoder object.
00928 
00929 
00930 See Also
00931 --------
00932 drflac_open()
00933 drflac_open_with_metadata()
00934 drflac_open_file()
00935 drflac_open_file_w()
00936 drflac_open_file_with_metadata()
00937 drflac_open_file_with_metadata_w()
00938 drflac_open_memory()
00939 drflac_open_memory_with_metadata()
00940 */
00941 DRFLAC_API void drflac_close(drflac* pFlac);
00942 
00943 
00944 /*
00945 Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
00946 
00947 
00948 Parameters
00949 ----------
00950 pFlac (in)
00951     The decoder.
00952 
00953 framesToRead (in)
00954     The number of PCM frames to read.
00955 
00956 pBufferOut (out, optional)
00957     A pointer to the buffer that will receive the decoded samples.
00958 
00959 
00960 Return Value
00961 ------------
00962 Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
00963 
00964 
00965 Remarks
00966 -------
00967 pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
00968 */
00969 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
00970 
00971 
00972 /*
00973 Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
00974 
00975 
00976 Parameters
00977 ----------
00978 pFlac (in)
00979     The decoder.
00980 
00981 framesToRead (in)
00982     The number of PCM frames to read.
00983 
00984 pBufferOut (out, optional)
00985     A pointer to the buffer that will receive the decoded samples.
00986 
00987 
00988 Return Value
00989 ------------
00990 Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
00991 
00992 
00993 Remarks
00994 -------
00995 pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
00996 
00997 Note that this is lossy for streams where the bits per sample is larger than 16.
00998 */
00999 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
01000 
01001 /*
01002 Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
01003 
01004 
01005 Parameters
01006 ----------
01007 pFlac (in)
01008     The decoder.
01009 
01010 framesToRead (in)
01011     The number of PCM frames to read.
01012 
01013 pBufferOut (out, optional)
01014     A pointer to the buffer that will receive the decoded samples.
01015 
01016 
01017 Return Value
01018 ------------
01019 Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
01020 
01021 
01022 Remarks
01023 -------
01024 pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
01025 
01026 Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
01027 */
01028 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
01029 
01030 /*
01031 Seeks to the PCM frame at the given index.
01032 
01033 
01034 Parameters
01035 ----------
01036 pFlac (in)
01037     The decoder.
01038 
01039 pcmFrameIndex (in)
01040     The index of the PCM frame to seek to. See notes below.
01041 
01042 
01043 Return Value
01044 -------------
01045 `DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
01046 */
01047 DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
01048 
01049 
01050 
01051 #ifndef DR_FLAC_NO_STDIO
01052 /*
01053 Opens a FLAC decoder from the file at the given path.
01054 
01055 
01056 Parameters
01057 ----------
01058 pFileName (in)
01059     The path of the file to open, either absolute or relative to the current directory.
01060 
01061 pAllocationCallbacks (in, optional)
01062     A pointer to application defined callbacks for managing memory allocations.
01063 
01064 
01065 Return Value
01066 ------------
01067 A pointer to an object representing the decoder.
01068 
01069 
01070 Remarks
01071 -------
01072 Close the decoder with drflac_close().
01073 
01074 
01075 Remarks
01076 -------
01077 This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
01078 at any given time, so keep this mind if you have many decoders open at the same time.
01079 
01080 
01081 See Also
01082 --------
01083 drflac_open_file_with_metadata()
01084 drflac_open()
01085 drflac_close()
01086 */
01087 DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
01088 DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
01089 
01090 /*
01091 Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
01092 
01093 
01094 Parameters
01095 ----------
01096 pFileName (in)
01097     The path of the file to open, either absolute or relative to the current directory.
01098 
01099 pAllocationCallbacks (in, optional)
01100     A pointer to application defined callbacks for managing memory allocations.
01101 
01102 onMeta (in)
01103     The callback to fire for each metadata block.
01104 
01105 pUserData (in)
01106     A pointer to the user data to pass to the metadata callback.
01107 
01108 pAllocationCallbacks (in)
01109     A pointer to application defined callbacks for managing memory allocations.
01110 
01111 
01112 Remarks
01113 -------
01114 Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
01115 
01116 
01117 See Also
01118 --------
01119 drflac_open_with_metadata()
01120 drflac_open()
01121 drflac_close()
01122 */
01123 DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
01124 DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
01125 #endif
01126 
01127 /*
01128 Opens a FLAC decoder from a pre-allocated block of memory
01129 
01130 
01131 Parameters
01132 ----------
01133 pData (in)
01134     A pointer to the raw encoded FLAC data.
01135 
01136 dataSize (in)
01137     The size in bytes of `data`.
01138 
01139 pAllocationCallbacks (in)
01140     A pointer to application defined callbacks for managing memory allocations.
01141 
01142 
01143 Return Value
01144 ------------
01145 A pointer to an object representing the decoder.
01146 
01147 
01148 Remarks
01149 -------
01150 This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
01151 
01152 
01153 See Also
01154 --------
01155 drflac_open()
01156 drflac_close()
01157 */
01158 DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
01159 
01160 /*
01161 Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
01162 
01163 
01164 Parameters
01165 ----------
01166 pData (in)
01167     A pointer to the raw encoded FLAC data.
01168 
01169 dataSize (in)
01170     The size in bytes of `data`.
01171 
01172 onMeta (in)
01173     The callback to fire for each metadata block.
01174 
01175 pUserData (in)
01176     A pointer to the user data to pass to the metadata callback.
01177 
01178 pAllocationCallbacks (in)
01179     A pointer to application defined callbacks for managing memory allocations.
01180 
01181 
01182 Remarks
01183 -------
01184 Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
01185 
01186 
01187 See Also
01188 -------
01189 drflac_open_with_metadata()
01190 drflac_open()
01191 drflac_close()
01192 */
01193 DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
01194 
01195 
01196 
01197 /* High Level APIs */
01198 
01199 /*
01200 Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
01201 pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
01202 
01203 You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
01204 case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
01205 
01206 Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
01207 read samples into a dynamically sized buffer on the heap until no samples are left.
01208 
01209 Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
01210 */
01211 DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
01212 
01213 /* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
01214 DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
01215 
01216 /* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
01217 DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
01218 
01219 #ifndef DR_FLAC_NO_STDIO
01220 /* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
01221 DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
01222 
01223 /* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
01224 DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
01225 
01226 /* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
01227 DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
01228 #endif
01229 
01230 /* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
01231 DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
01232 
01233 /* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
01234 DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
01235 
01236 /* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
01237 DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
01238 
01239 /*
01240 Frees memory that was allocated internally by dr_flac.
01241 
01242 Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
01243 */
01244 DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
01245 
01246 
01247 /* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
01248 typedef struct
01249 {
01250     drflac_uint32 countRemaining;
01251     const char* pRunningData;
01252 } drflac_vorbis_comment_iterator;
01253 
01254 /*
01255 Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
01256 metadata block.
01257 */
01258 DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
01259 
01260 /*
01261 Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
01262 returned string is NOT null terminated.
01263 */
01264 DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
01265 
01266 
01267 /* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
01268 typedef struct
01269 {
01270     drflac_uint32 countRemaining;
01271     const char* pRunningData;
01272 } drflac_cuesheet_track_iterator;
01273 
01274 /* Packing is important on this structure because we map this directly to the raw data within the CUESHEET metadata block. */
01275 #pragma pack(4)
01276 typedef struct
01277 {
01278     drflac_uint64 offset;
01279     drflac_uint8 index;
01280     drflac_uint8 reserved[3];
01281 } drflac_cuesheet_track_index;
01282 #pragma pack()
01283 
01284 typedef struct
01285 {
01286     drflac_uint64 offset;
01287     drflac_uint8 trackNumber;
01288     char ISRC[12];
01289     drflac_bool8 isAudio;
01290     drflac_bool8 preEmphasis;
01291     drflac_uint8 indexCount;
01292     const drflac_cuesheet_track_index* pIndexPoints;
01293 } drflac_cuesheet_track;
01294 
01295 /*
01296 Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
01297 block.
01298 */
01299 DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
01300 
01301 /* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more comments. */
01302 DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
01303 
01304 
01305 #ifdef __cplusplus
01306 }
01307 #endif
01308 #endif  /* dr_flac_h */
01309 
01310 
01311 /************************************************************************************************************************************************************
01312  ************************************************************************************************************************************************************
01313 
01314  IMPLEMENTATION
01315 
01316  ************************************************************************************************************************************************************
01317  ************************************************************************************************************************************************************/
01318 #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
01319 
01320 /* Disable some annoying warnings. */
01321 #if defined(__GNUC__)
01322     #pragma GCC diagnostic push
01323     #if __GNUC__ >= 7
01324     #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
01325     #endif
01326 #endif
01327 
01328 #ifdef __linux__
01329     #ifndef _BSD_SOURCE
01330         #define _BSD_SOURCE
01331     #endif
01332     #ifndef __USE_BSD
01333         #define __USE_BSD
01334     #endif
01335     #include <endian.h>
01336 #endif
01337 
01338 #include <stdlib.h>
01339 #include <string.h>
01340 
01341 #ifdef _MSC_VER
01342     #define DRFLAC_INLINE __forceinline
01343 #elif defined(__GNUC__)
01344     /*
01345     I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
01346     the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
01347     case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
01348     command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
01349     I am using "__inline__" only when we're compiling in strict ANSI mode.
01350     */
01351     #if defined(__STRICT_ANSI__)
01352         #define DRFLAC_INLINE __inline__ __attribute__((always_inline))
01353     #else
01354         #define DRFLAC_INLINE inline __attribute__((always_inline))
01355     #endif
01356 #else
01357     #define DRFLAC_INLINE
01358 #endif
01359 
01360 /* CPU architecture. */
01361 #if defined(__x86_64__) || defined(_M_X64)
01362     #define DRFLAC_X64
01363 #elif defined(__i386) || defined(_M_IX86)
01364     #define DRFLAC_X86
01365 #elif defined(__arm__) || defined(_M_ARM)
01366     #define DRFLAC_ARM
01367 #endif
01368 
01369 /* Intrinsics Support */
01370 #if !defined(DR_FLAC_NO_SIMD)
01371     #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
01372         #if defined(_MSC_VER) && !defined(__clang__)
01373             /* MSVC. */
01374             #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2)    /* 2005 */
01375                 #define DRFLAC_SUPPORT_SSE2
01376             #endif
01377             #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41)   /* 2010 */
01378                 #define DRFLAC_SUPPORT_SSE41
01379             #endif
01380         #else
01381             /* Assume GNUC-style. */
01382             #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
01383                 #define DRFLAC_SUPPORT_SSE2
01384             #endif
01385             #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
01386                 #define DRFLAC_SUPPORT_SSE41
01387             #endif
01388         #endif
01389 
01390         /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
01391         #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
01392             #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
01393                 #define DRFLAC_SUPPORT_SSE2
01394             #endif
01395             #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
01396                 #define DRFLAC_SUPPORT_SSE41
01397             #endif
01398         #endif
01399 
01400         #if defined(DRFLAC_SUPPORT_SSE41)
01401             #include <smmintrin.h>
01402         #elif defined(DRFLAC_SUPPORT_SSE2)
01403             #include <emmintrin.h>
01404         #endif
01405     #endif
01406 
01407     #if defined(DRFLAC_ARM)
01408         #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
01409             #define DRFLAC_SUPPORT_NEON
01410         #endif
01411 
01412         /* Fall back to looking for the #include file. */
01413         #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
01414             #if !defined(DRFLAC_SUPPORT_NEON) && !defined(DRFLAC_NO_NEON) && __has_include(<arm_neon.h>)
01415                 #define DRFLAC_SUPPORT_NEON
01416             #endif
01417         #endif
01418 
01419         #if defined(DRFLAC_SUPPORT_NEON)
01420             #include <arm_neon.h>
01421         #endif
01422     #endif
01423 #endif
01424 
01425 /* Compile-time CPU feature support. */
01426 #if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
01427     #if defined(_MSC_VER) && !defined(__clang__)
01428         #if _MSC_VER >= 1400
01429             #include <intrin.h>
01430             static void drflac__cpuid(int info[4], int fid)
01431             {
01432                 __cpuid(info, fid);
01433             }
01434         #else
01435             #define DRFLAC_NO_CPUID
01436         #endif
01437     #else
01438         #if defined(__GNUC__) || defined(__clang__)
01439             static void drflac__cpuid(int info[4], int fid)
01440             {
01441                 /*
01442                 It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
01443                 specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
01444                 supporting different assembly dialects.
01445 
01446                 What's basically happening is that we're saving and restoring the ebx register manually.
01447                 */
01448                 #if defined(DRFLAC_X86) && defined(__PIC__)
01449                     __asm__ __volatile__ (
01450                         "xchg{l} {%%}ebx, %k1;"
01451                         "cpuid;"
01452                         "xchg{l} {%%}ebx, %k1;"
01453                         : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
01454                     );
01455                 #else
01456                     __asm__ __volatile__ (
01457                         "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
01458                     );
01459                 #endif
01460             }
01461         #else
01462             #define DRFLAC_NO_CPUID
01463         #endif
01464     #endif
01465 #else
01466     #define DRFLAC_NO_CPUID
01467 #endif
01468 
01469 static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
01470 {
01471 #if defined(DRFLAC_SUPPORT_SSE2)
01472     #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
01473         #if defined(DRFLAC_X64)
01474             return DRFLAC_TRUE;    /* 64-bit targets always support SSE2. */
01475         #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
01476             return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
01477         #else
01478             #if defined(DRFLAC_NO_CPUID)
01479                 return DRFLAC_FALSE;
01480             #else
01481                 int info[4];
01482                 drflac__cpuid(info, 1);
01483                 return (info[3] & (1 << 26)) != 0;
01484             #endif
01485         #endif
01486     #else
01487         return DRFLAC_FALSE;       /* SSE2 is only supported on x86 and x64 architectures. */
01488     #endif
01489 #else
01490     return DRFLAC_FALSE;           /* No compiler support. */
01491 #endif
01492 }
01493 
01494 static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
01495 {
01496 #if defined(DRFLAC_SUPPORT_SSE41)
01497     #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
01498         #if defined(DRFLAC_X64)
01499             return DRFLAC_TRUE;    /* 64-bit targets always support SSE4.1. */
01500         #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE4_1__)
01501             return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
01502         #else
01503             #if defined(DRFLAC_NO_CPUID)
01504                 return DRFLAC_FALSE;
01505             #else
01506                 int info[4];
01507                 drflac__cpuid(info, 1);
01508                 return (info[2] & (1 << 19)) != 0;
01509             #endif
01510         #endif
01511     #else
01512         return DRFLAC_FALSE;       /* SSE41 is only supported on x86 and x64 architectures. */
01513     #endif
01514 #else
01515     return DRFLAC_FALSE;           /* No compiler support. */
01516 #endif
01517 }
01518 
01519 
01520 #if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
01521     #define DRFLAC_HAS_LZCNT_INTRINSIC
01522 #elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
01523     #define DRFLAC_HAS_LZCNT_INTRINSIC
01524 #elif defined(__clang__)
01525     #if defined(__has_builtin)
01526         #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
01527             #define DRFLAC_HAS_LZCNT_INTRINSIC
01528         #endif
01529     #endif
01530 #endif
01531 
01532 #if defined(_MSC_VER) && _MSC_VER >= 1400
01533     #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
01534     #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
01535     #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
01536 #elif defined(__clang__)
01537     #if defined(__has_builtin)
01538         #if __has_builtin(__builtin_bswap16)
01539             #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
01540         #endif
01541         #if __has_builtin(__builtin_bswap32)
01542             #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
01543         #endif
01544         #if __has_builtin(__builtin_bswap64)
01545             #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
01546         #endif
01547     #endif
01548 #elif defined(__GNUC__)
01549     #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
01550         #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
01551         #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
01552     #endif
01553     #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
01554         #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
01555     #endif
01556 #endif
01557 
01558 
01559 /* Standard library stuff. */
01560 #ifndef DRFLAC_ASSERT
01561 #include <assert.h>
01562 #define DRFLAC_ASSERT(expression)           assert(expression)
01563 #endif
01564 #ifndef DRFLAC_MALLOC
01565 #define DRFLAC_MALLOC(sz)                   malloc((sz))
01566 #endif
01567 #ifndef DRFLAC_REALLOC
01568 #define DRFLAC_REALLOC(p, sz)               realloc((p), (sz))
01569 #endif
01570 #ifndef DRFLAC_FREE
01571 #define DRFLAC_FREE(p)                      free((p))
01572 #endif
01573 #ifndef DRFLAC_COPY_MEMORY
01574 #define DRFLAC_COPY_MEMORY(dst, src, sz)    memcpy((dst), (src), (sz))
01575 #endif
01576 #ifndef DRFLAC_ZERO_MEMORY
01577 #define DRFLAC_ZERO_MEMORY(p, sz)           memset((p), 0, (sz))
01578 #endif
01579 #ifndef DRFLAC_ZERO_OBJECT
01580 #define DRFLAC_ZERO_OBJECT(p)               DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
01581 #endif
01582 
01583 #define DRFLAC_MAX_SIMD_VECTOR_SIZE                     64  /* 64 for AVX-512 in the future. */
01584 
01585 typedef drflac_int32 drflac_result;
01586 #define DRFLAC_SUCCESS                                   0
01587 #define DRFLAC_ERROR                                    -1   /* A generic error. */
01588 #define DRFLAC_INVALID_ARGS                             -2
01589 #define DRFLAC_INVALID_OPERATION                        -3
01590 #define DRFLAC_OUT_OF_MEMORY                            -4
01591 #define DRFLAC_OUT_OF_RANGE                             -5
01592 #define DRFLAC_ACCESS_DENIED                            -6
01593 #define DRFLAC_DOES_NOT_EXIST                           -7
01594 #define DRFLAC_ALREADY_EXISTS                           -8
01595 #define DRFLAC_TOO_MANY_OPEN_FILES                      -9
01596 #define DRFLAC_INVALID_FILE                             -10
01597 #define DRFLAC_TOO_BIG                                  -11
01598 #define DRFLAC_PATH_TOO_LONG                            -12
01599 #define DRFLAC_NAME_TOO_LONG                            -13
01600 #define DRFLAC_NOT_DIRECTORY                            -14
01601 #define DRFLAC_IS_DIRECTORY                             -15
01602 #define DRFLAC_DIRECTORY_NOT_EMPTY                      -16
01603 #define DRFLAC_END_OF_FILE                              -17
01604 #define DRFLAC_NO_SPACE                                 -18
01605 #define DRFLAC_BUSY                                     -19
01606 #define DRFLAC_IO_ERROR                                 -20
01607 #define DRFLAC_INTERRUPT                                -21
01608 #define DRFLAC_UNAVAILABLE                              -22
01609 #define DRFLAC_ALREADY_IN_USE                           -23
01610 #define DRFLAC_BAD_ADDRESS                              -24
01611 #define DRFLAC_BAD_SEEK                                 -25
01612 #define DRFLAC_BAD_PIPE                                 -26
01613 #define DRFLAC_DEADLOCK                                 -27
01614 #define DRFLAC_TOO_MANY_LINKS                           -28
01615 #define DRFLAC_NOT_IMPLEMENTED                          -29
01616 #define DRFLAC_NO_MESSAGE                               -30
01617 #define DRFLAC_BAD_MESSAGE                              -31
01618 #define DRFLAC_NO_DATA_AVAILABLE                        -32
01619 #define DRFLAC_INVALID_DATA                             -33
01620 #define DRFLAC_TIMEOUT                                  -34
01621 #define DRFLAC_NO_NETWORK                               -35
01622 #define DRFLAC_NOT_UNIQUE                               -36
01623 #define DRFLAC_NOT_SOCKET                               -37
01624 #define DRFLAC_NO_ADDRESS                               -38
01625 #define DRFLAC_BAD_PROTOCOL                             -39
01626 #define DRFLAC_PROTOCOL_UNAVAILABLE                     -40
01627 #define DRFLAC_PROTOCOL_NOT_SUPPORTED                   -41
01628 #define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED            -42
01629 #define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED             -43
01630 #define DRFLAC_SOCKET_NOT_SUPPORTED                     -44
01631 #define DRFLAC_CONNECTION_RESET                         -45
01632 #define DRFLAC_ALREADY_CONNECTED                        -46
01633 #define DRFLAC_NOT_CONNECTED                            -47
01634 #define DRFLAC_CONNECTION_REFUSED                       -48
01635 #define DRFLAC_NO_HOST                                  -49
01636 #define DRFLAC_IN_PROGRESS                              -50
01637 #define DRFLAC_CANCELLED                                -51
01638 #define DRFLAC_MEMORY_ALREADY_MAPPED                    -52
01639 #define DRFLAC_AT_END                                   -53
01640 #define DRFLAC_CRC_MISMATCH                             -128
01641 
01642 #define DRFLAC_SUBFRAME_CONSTANT                        0
01643 #define DRFLAC_SUBFRAME_VERBATIM                        1
01644 #define DRFLAC_SUBFRAME_FIXED                           8
01645 #define DRFLAC_SUBFRAME_LPC                             32
01646 #define DRFLAC_SUBFRAME_RESERVED                        255
01647 
01648 #define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE  0
01649 #define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
01650 
01651 #define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT           0
01652 #define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE             8
01653 #define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE            9
01654 #define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE              10
01655 
01656 #define drflac_align(x, a)                              ((((x) + (a) - 1) / (a)) * (a))
01657 
01658 
01659 DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
01660 {
01661     if (pMajor) {
01662         *pMajor = DRFLAC_VERSION_MAJOR;
01663     }
01664 
01665     if (pMinor) {
01666         *pMinor = DRFLAC_VERSION_MINOR;
01667     }
01668 
01669     if (pRevision) {
01670         *pRevision = DRFLAC_VERSION_REVISION;
01671     }
01672 }
01673 
01674 DRFLAC_API const char* drflac_version_string()
01675 {
01676     return DRFLAC_VERSION_STRING;
01677 }
01678 
01679 
01680 /* CPU caps. */
01681 #if defined(__has_feature)
01682     #if __has_feature(thread_sanitizer)
01683         #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
01684     #else
01685         #define DRFLAC_NO_THREAD_SANITIZE
01686     #endif
01687 #else
01688     #define DRFLAC_NO_THREAD_SANITIZE
01689 #endif
01690 
01691 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
01692 static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
01693 #endif
01694 
01695 #ifndef DRFLAC_NO_CPUID
01696 static drflac_bool32 drflac__gIsSSE2Supported  = DRFLAC_FALSE;
01697 static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
01698 
01699 /*
01700 I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
01701 actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
01702 complicating internal API's by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
01703 just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
01704 */
01705 DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
01706 {
01707     static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
01708 
01709     if (!isCPUCapsInitialized) {
01710         /* LZCNT */
01711 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
01712         int info[4] = {0};
01713         drflac__cpuid(info, 0x80000001);
01714         drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
01715 #endif
01716 
01717         /* SSE2 */
01718         drflac__gIsSSE2Supported = drflac_has_sse2();
01719 
01720         /* SSE4.1 */
01721         drflac__gIsSSE41Supported = drflac_has_sse41();
01722 
01723         /* Initialized. */
01724         isCPUCapsInitialized = DRFLAC_TRUE;
01725     }
01726 }
01727 #else
01728 static drflac_bool32 drflac__gIsNEONSupported  = DRFLAC_FALSE;
01729 
01730 static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
01731 {
01732 #if defined(DRFLAC_SUPPORT_NEON)
01733     #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
01734         #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
01735             return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate NEON code we can assume support. */
01736         #else
01737             /* TODO: Runtime check. */
01738             return DRFLAC_FALSE;
01739         #endif
01740     #else
01741         return DRFLAC_FALSE;       /* NEON is only supported on ARM architectures. */
01742     #endif
01743 #else
01744     return DRFLAC_FALSE;           /* No compiler support. */
01745 #endif
01746 }
01747 
01748 DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
01749 {
01750     drflac__gIsNEONSupported = drflac__has_neon();
01751 
01752 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
01753     drflac__gIsLZCNTSupported = DRFLAC_TRUE;
01754 #endif
01755 }
01756 #endif
01757 
01758 
01759 /* Endian Management */
01760 static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
01761 {
01762 #if defined(DRFLAC_X86) || defined(DRFLAC_X64)
01763     return DRFLAC_TRUE;
01764 #elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
01765     return DRFLAC_TRUE;
01766 #else
01767     int n = 1;
01768     return (*(char*)&n) == 1;
01769 #endif
01770 }
01771 
01772 static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
01773 {
01774 #ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
01775     #if defined(_MSC_VER)
01776         return _byteswap_ushort(n);
01777     #elif defined(__GNUC__) || defined(__clang__)
01778         return __builtin_bswap16(n);
01779     #else
01780         #error "This compiler does not support the byte swap intrinsic."
01781     #endif
01782 #else
01783     return ((n & 0xFF00) >> 8) |
01784            ((n & 0x00FF) << 8);
01785 #endif
01786 }
01787 
01788 static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
01789 {
01790 #ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
01791     #if defined(_MSC_VER)
01792         return _byteswap_ulong(n);
01793     #elif defined(__GNUC__) || defined(__clang__)
01794         #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(DRFLAC_64BIT)   /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
01795             /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
01796             drflac_uint32 r;
01797             __asm__ __volatile__ (
01798             #if defined(DRFLAC_64BIT)
01799                 "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n)   /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
01800             #else
01801                 "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
01802             #endif
01803             );
01804             return r;
01805         #else
01806             return __builtin_bswap32(n);
01807         #endif
01808     #else
01809         #error "This compiler does not support the byte swap intrinsic."
01810     #endif
01811 #else
01812     return ((n & 0xFF000000) >> 24) |
01813            ((n & 0x00FF0000) >>  8) |
01814            ((n & 0x0000FF00) <<  8) |
01815            ((n & 0x000000FF) << 24);
01816 #endif
01817 }
01818 
01819 static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
01820 {
01821 #ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
01822     #if defined(_MSC_VER)
01823         return _byteswap_uint64(n);
01824     #elif defined(__GNUC__) || defined(__clang__)
01825         return __builtin_bswap64(n);
01826     #else
01827         #error "This compiler does not support the byte swap intrinsic."
01828     #endif
01829 #else
01830     return ((n & (drflac_uint64)0xFF00000000000000) >> 56) |
01831            ((n & (drflac_uint64)0x00FF000000000000) >> 40) |
01832            ((n & (drflac_uint64)0x0000FF0000000000) >> 24) |
01833            ((n & (drflac_uint64)0x000000FF00000000) >>  8) |
01834            ((n & (drflac_uint64)0x00000000FF000000) <<  8) |
01835            ((n & (drflac_uint64)0x0000000000FF0000) << 24) |
01836            ((n & (drflac_uint64)0x000000000000FF00) << 40) |
01837            ((n & (drflac_uint64)0x00000000000000FF) << 56);
01838 #endif
01839 }
01840 
01841 
01842 static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
01843 {
01844     if (drflac__is_little_endian()) {
01845         return drflac__swap_endian_uint16(n);
01846     }
01847 
01848     return n;
01849 }
01850 
01851 static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
01852 {
01853     if (drflac__is_little_endian()) {
01854         return drflac__swap_endian_uint32(n);
01855     }
01856 
01857     return n;
01858 }
01859 
01860 static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
01861 {
01862     if (drflac__is_little_endian()) {
01863         return drflac__swap_endian_uint64(n);
01864     }
01865 
01866     return n;
01867 }
01868 
01869 
01870 static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
01871 {
01872     if (!drflac__is_little_endian()) {
01873         return drflac__swap_endian_uint32(n);
01874     }
01875 
01876     return n;
01877 }
01878 
01879 
01880 static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
01881 {
01882     drflac_uint32 result = 0;
01883     result |= (n & 0x7F000000) >> 3;
01884     result |= (n & 0x007F0000) >> 2;
01885     result |= (n & 0x00007F00) >> 1;
01886     result |= (n & 0x0000007F) >> 0;
01887 
01888     return result;
01889 }
01890 
01891 
01892 
01893 /* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
01894 static drflac_uint8 drflac__crc8_table[] = {
01895     0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
01896     0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
01897     0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
01898     0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
01899     0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
01900     0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
01901     0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
01902     0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
01903     0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
01904     0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
01905     0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
01906     0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
01907     0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
01908     0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
01909     0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
01910     0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
01911 };
01912 
01913 static drflac_uint16 drflac__crc16_table[] = {
01914     0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
01915     0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
01916     0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
01917     0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
01918     0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
01919     0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
01920     0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
01921     0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
01922     0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
01923     0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
01924     0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
01925     0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
01926     0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
01927     0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
01928     0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
01929     0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
01930     0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
01931     0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
01932     0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
01933     0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
01934     0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
01935     0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
01936     0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
01937     0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
01938     0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
01939     0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
01940     0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
01941     0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
01942     0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
01943     0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
01944     0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
01945     0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
01946 };
01947 
01948 static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
01949 {
01950     return drflac__crc8_table[crc ^ data];
01951 }
01952 
01953 static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
01954 {
01955 #ifdef DR_FLAC_NO_CRC
01956     (void)crc;
01957     (void)data;
01958     (void)count;
01959     return 0;
01960 #else
01961 #if 0
01962     /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
01963     drflac_uint8 p = 0x07;
01964     for (int i = count-1; i >= 0; --i) {
01965         drflac_uint8 bit = (data & (1 << i)) >> i;
01966         if (crc & 0x80) {
01967             crc = ((crc << 1) | bit) ^ p;
01968         } else {
01969             crc = ((crc << 1) | bit);
01970         }
01971     }
01972     return crc;
01973 #else
01974     drflac_uint32 wholeBytes;
01975     drflac_uint32 leftoverBits;
01976     drflac_uint64 leftoverDataMask;
01977 
01978     static drflac_uint64 leftoverDataMaskTable[8] = {
01979         0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
01980     };
01981 
01982     DRFLAC_ASSERT(count <= 32);
01983 
01984     wholeBytes = count >> 3;
01985     leftoverBits = count - (wholeBytes*8);
01986     leftoverDataMask = leftoverDataMaskTable[leftoverBits];
01987 
01988     switch (wholeBytes) {
01989         case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
01990         case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
01991         case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
01992         case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
01993         case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
01994     }
01995     return crc;
01996 #endif
01997 #endif
01998 }
01999 
02000 static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
02001 {
02002     return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
02003 }
02004 
02005 static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
02006 {
02007 #ifdef DRFLAC_64BIT
02008     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
02009     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
02010     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
02011     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
02012 #endif
02013     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
02014     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
02015     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
02016     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));
02017 
02018     return crc;
02019 }
02020 
02021 static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
02022 {
02023     switch (byteCount)
02024     {
02025 #ifdef DRFLAC_64BIT
02026     case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
02027     case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
02028     case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
02029     case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
02030 #endif
02031     case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
02032     case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
02033     case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
02034     case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));
02035     }
02036 
02037     return crc;
02038 }
02039 
02040 #if 0
02041 static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
02042 {
02043 #ifdef DR_FLAC_NO_CRC
02044     (void)crc;
02045     (void)data;
02046     (void)count;
02047     return 0;
02048 #else
02049 #if 0
02050     /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
02051     drflac_uint16 p = 0x8005;
02052     for (int i = count-1; i >= 0; --i) {
02053         drflac_uint16 bit = (data & (1ULL << i)) >> i;
02054         if (r & 0x8000) {
02055             r = ((r << 1) | bit) ^ p;
02056         } else {
02057             r = ((r << 1) | bit);
02058         }
02059     }
02060 
02061     return crc;
02062 #else
02063     drflac_uint32 wholeBytes;
02064     drflac_uint32 leftoverBits;
02065     drflac_uint64 leftoverDataMask;
02066 
02067     static drflac_uint64 leftoverDataMaskTable[8] = {
02068         0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
02069     };
02070 
02071     DRFLAC_ASSERT(count <= 64);
02072 
02073     wholeBytes = count >> 3;
02074     leftoverBits = count & 7;
02075     leftoverDataMask = leftoverDataMaskTable[leftoverBits];
02076 
02077     switch (wholeBytes) {
02078         default:
02079         case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
02080         case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
02081         case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
02082         case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
02083         case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
02084     }
02085     return crc;
02086 #endif
02087 #endif
02088 }
02089 
02090 static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
02091 {
02092 #ifdef DR_FLAC_NO_CRC
02093     (void)crc;
02094     (void)data;
02095     (void)count;
02096     return 0;
02097 #else
02098     drflac_uint32 wholeBytes;
02099     drflac_uint32 leftoverBits;
02100     drflac_uint64 leftoverDataMask;
02101 
02102     static drflac_uint64 leftoverDataMaskTable[8] = {
02103         0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
02104     };
02105 
02106     DRFLAC_ASSERT(count <= 64);
02107 
02108     wholeBytes = count >> 3;
02109     leftoverBits = count & 7;
02110     leftoverDataMask = leftoverDataMaskTable[leftoverBits];
02111 
02112     switch (wholeBytes) {
02113         default:
02114         case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits)));    /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
02115         case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
02116         case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
02117         case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
02118         case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000      ) << leftoverBits)) >> (24 + leftoverBits)));
02119         case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000      ) << leftoverBits)) >> (16 + leftoverBits)));
02120         case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00      ) << leftoverBits)) >> ( 8 + leftoverBits)));
02121         case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF      ) << leftoverBits)) >> ( 0 + leftoverBits)));
02122         case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
02123     }
02124     return crc;
02125 #endif
02126 }
02127 
02128 
02129 static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
02130 {
02131 #ifdef DRFLAC_64BIT
02132     return drflac_crc16__64bit(crc, data, count);
02133 #else
02134     return drflac_crc16__32bit(crc, data, count);
02135 #endif
02136 }
02137 #endif
02138 
02139 
02140 #ifdef DRFLAC_64BIT
02141 #define drflac__be2host__cache_line drflac__be2host_64
02142 #else
02143 #define drflac__be2host__cache_line drflac__be2host_32
02144 #endif
02145 
02146 /*
02147 BIT READING ATTEMPT #2
02148 
02149 This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
02150 on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
02151 is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
02152 array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
02153 from onRead() is read into.
02154 */
02155 #define DRFLAC_CACHE_L1_SIZE_BYTES(bs)                      (sizeof((bs)->cache))
02156 #define DRFLAC_CACHE_L1_SIZE_BITS(bs)                       (sizeof((bs)->cache)*8)
02157 #define DRFLAC_CACHE_L1_BITS_REMAINING(bs)                  (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
02158 #define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount)           (~((~(drflac_cache_t)0) >> (_bitCount)))
02159 #define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount)      (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
02160 #define DRFLAC_CACHE_L1_SELECT(bs, _bitCount)               (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
02161 #define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount)     (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >>  DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
02162 #define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
02163 #define DRFLAC_CACHE_L2_SIZE_BYTES(bs)                      (sizeof((bs)->cacheL2))
02164 #define DRFLAC_CACHE_L2_LINE_COUNT(bs)                      (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
02165 #define DRFLAC_CACHE_L2_LINES_REMAINING(bs)                 (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
02166 
02167 
02168 #ifndef DR_FLAC_NO_CRC
02169 static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
02170 {
02171     bs->crc16 = 0;
02172     bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
02173 }
02174 
02175 static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
02176 {
02177     if (bs->crc16CacheIgnoredBytes == 0) {
02178         bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
02179     } else {
02180         bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
02181         bs->crc16CacheIgnoredBytes = 0;
02182     }
02183 }
02184 
02185 static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
02186 {
02187     /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
02188     DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
02189 
02190     /*
02191     The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
02192     by the number of bits that have been consumed.
02193     */
02194     if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
02195         drflac__update_crc16(bs);
02196     } else {
02197         /* We only accumulate the consumed bits. */
02198         bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
02199 
02200         /*
02201         The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
02202         so we can handle that later.
02203         */
02204         bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
02205     }
02206 
02207     return bs->crc16;
02208 }
02209 #endif
02210 
02211 static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
02212 {
02213     size_t bytesRead;
02214     size_t alignedL1LineCount;
02215 
02216     /* Fast path. Try loading straight from L2. */
02217     if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
02218         bs->cache = bs->cacheL2[bs->nextL2Line++];
02219         return DRFLAC_TRUE;
02220     }
02221 
02222     /*
02223     If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
02224     any left.
02225     */
02226     if (bs->unalignedByteCount > 0) {
02227         return DRFLAC_FALSE;   /* If we have any unaligned bytes it means there's no more aligned bytes left in the client. */
02228     }
02229 
02230     bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
02231 
02232     bs->nextL2Line = 0;
02233     if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
02234         bs->cache = bs->cacheL2[bs->nextL2Line++];
02235         return DRFLAC_TRUE;
02236     }
02237 
02238 
02239     /*
02240     If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
02241     means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
02242     and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
02243     the size of the L1 so we'll need to seek backwards by any misaligned bytes.
02244     */
02245     alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
02246 
02247     /* We need to keep track of any unaligned bytes for later use. */
02248     bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
02249     if (bs->unalignedByteCount > 0) {
02250         bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
02251     }
02252 
02253     if (alignedL1LineCount > 0) {
02254         size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
02255         size_t i;
02256         for (i = alignedL1LineCount; i > 0; --i) {
02257             bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
02258         }
02259 
02260         bs->nextL2Line = (drflac_uint32)offset;
02261         bs->cache = bs->cacheL2[bs->nextL2Line++];
02262         return DRFLAC_TRUE;
02263     } else {
02264         /* If we get into this branch it means we weren't able to load any L1-aligned data. */
02265         bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
02266         return DRFLAC_FALSE;
02267     }
02268 }
02269 
02270 static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
02271 {
02272     size_t bytesRead;
02273 
02274 #ifndef DR_FLAC_NO_CRC
02275     drflac__update_crc16(bs);
02276 #endif
02277 
02278     /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
02279     if (drflac__reload_l1_cache_from_l2(bs)) {
02280         bs->cache = drflac__be2host__cache_line(bs->cache);
02281         bs->consumedBits = 0;
02282 #ifndef DR_FLAC_NO_CRC
02283         bs->crc16Cache = bs->cache;
02284 #endif
02285         return DRFLAC_TRUE;
02286     }
02287 
02288     /* Slow path. */
02289 
02290     /*
02291     If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
02292     few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
02293     data from the unaligned cache.
02294     */
02295     bytesRead = bs->unalignedByteCount;
02296     if (bytesRead == 0) {
02297         bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- The stream has been exhausted, so marked the bits as consumed. */
02298         return DRFLAC_FALSE;
02299     }
02300 
02301     DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
02302     bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
02303 
02304     bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
02305     bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs));    /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
02306     bs->unalignedByteCount = 0;     /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
02307 
02308 #ifndef DR_FLAC_NO_CRC
02309     bs->crc16Cache = bs->cache >> bs->consumedBits;
02310     bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
02311 #endif
02312     return DRFLAC_TRUE;
02313 }
02314 
02315 static void drflac__reset_cache(drflac_bs* bs)
02316 {
02317     bs->nextL2Line   = DRFLAC_CACHE_L2_LINE_COUNT(bs);  /* <-- This clears the L2 cache. */
02318     bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- This clears the L1 cache. */
02319     bs->cache = 0;
02320     bs->unalignedByteCount = 0;                         /* <-- This clears the trailing unaligned bytes. */
02321     bs->unalignedCache = 0;
02322 
02323 #ifndef DR_FLAC_NO_CRC
02324     bs->crc16Cache = 0;
02325     bs->crc16CacheIgnoredBytes = 0;
02326 #endif
02327 }
02328 
02329 
02330 static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
02331 {
02332     DRFLAC_ASSERT(bs != NULL);
02333     DRFLAC_ASSERT(pResultOut != NULL);
02334     DRFLAC_ASSERT(bitCount > 0);
02335     DRFLAC_ASSERT(bitCount <= 32);
02336 
02337     if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
02338         if (!drflac__reload_cache(bs)) {
02339             return DRFLAC_FALSE;
02340         }
02341     }
02342 
02343     if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
02344         /*
02345         If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
02346         a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
02347         more optimal solution for this.
02348         */
02349 #ifdef DRFLAC_64BIT
02350         *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
02351         bs->consumedBits += bitCount;
02352         bs->cache <<= bitCount;
02353 #else
02354         if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
02355             *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
02356             bs->consumedBits += bitCount;
02357             bs->cache <<= bitCount;
02358         } else {
02359             /* Cannot shift by 32-bits, so need to do it differently. */
02360             *pResultOut = (drflac_uint32)bs->cache;
02361             bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
02362             bs->cache = 0;
02363         }
02364 #endif
02365 
02366         return DRFLAC_TRUE;
02367     } else {
02368         /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
02369         drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
02370         drflac_uint32 bitCountLo = bitCount - bitCountHi;
02371         drflac_uint32 resultHi;
02372 
02373         DRFLAC_ASSERT(bitCountHi > 0);
02374         DRFLAC_ASSERT(bitCountHi < 32);
02375         resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
02376 
02377         if (!drflac__reload_cache(bs)) {
02378             return DRFLAC_FALSE;
02379         }
02380 
02381         *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
02382         bs->consumedBits += bitCountLo;
02383         bs->cache <<= bitCountLo;
02384         return DRFLAC_TRUE;
02385     }
02386 }
02387 
02388 static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
02389 {
02390     drflac_uint32 result;
02391     drflac_uint32 signbit;
02392 
02393     DRFLAC_ASSERT(bs != NULL);
02394     DRFLAC_ASSERT(pResult != NULL);
02395     DRFLAC_ASSERT(bitCount > 0);
02396     DRFLAC_ASSERT(bitCount <= 32);
02397 
02398     if (!drflac__read_uint32(bs, bitCount, &result)) {
02399         return DRFLAC_FALSE;
02400     }
02401 
02402     signbit = ((result >> (bitCount-1)) & 0x01);
02403     result |= (~signbit + 1) << bitCount;
02404 
02405     *pResult = (drflac_int32)result;
02406     return DRFLAC_TRUE;
02407 }
02408 
02409 #ifdef DRFLAC_64BIT
02410 static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
02411 {
02412     drflac_uint32 resultHi;
02413     drflac_uint32 resultLo;
02414 
02415     DRFLAC_ASSERT(bitCount <= 64);
02416     DRFLAC_ASSERT(bitCount >  32);
02417 
02418     if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
02419         return DRFLAC_FALSE;
02420     }
02421 
02422     if (!drflac__read_uint32(bs, 32, &resultLo)) {
02423         return DRFLAC_FALSE;
02424     }
02425 
02426     *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
02427     return DRFLAC_TRUE;
02428 }
02429 #endif
02430 
02431 /* Function below is unused, but leaving it here in case I need to quickly add it again. */
02432 #if 0
02433 static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
02434 {
02435     drflac_uint64 result;
02436     drflac_uint64 signbit;
02437 
02438     DRFLAC_ASSERT(bitCount <= 64);
02439 
02440     if (!drflac__read_uint64(bs, bitCount, &result)) {
02441         return DRFLAC_FALSE;
02442     }
02443 
02444     signbit = ((result >> (bitCount-1)) & 0x01);
02445     result |= (~signbit + 1) << bitCount;
02446 
02447     *pResultOut = (drflac_int64)result;
02448     return DRFLAC_TRUE;
02449 }
02450 #endif
02451 
02452 static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
02453 {
02454     drflac_uint32 result;
02455 
02456     DRFLAC_ASSERT(bs != NULL);
02457     DRFLAC_ASSERT(pResult != NULL);
02458     DRFLAC_ASSERT(bitCount > 0);
02459     DRFLAC_ASSERT(bitCount <= 16);
02460 
02461     if (!drflac__read_uint32(bs, bitCount, &result)) {
02462         return DRFLAC_FALSE;
02463     }
02464 
02465     *pResult = (drflac_uint16)result;
02466     return DRFLAC_TRUE;
02467 }
02468 
02469 #if 0
02470 static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
02471 {
02472     drflac_int32 result;
02473 
02474     DRFLAC_ASSERT(bs != NULL);
02475     DRFLAC_ASSERT(pResult != NULL);
02476     DRFLAC_ASSERT(bitCount > 0);
02477     DRFLAC_ASSERT(bitCount <= 16);
02478 
02479     if (!drflac__read_int32(bs, bitCount, &result)) {
02480         return DRFLAC_FALSE;
02481     }
02482 
02483     *pResult = (drflac_int16)result;
02484     return DRFLAC_TRUE;
02485 }
02486 #endif
02487 
02488 static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
02489 {
02490     drflac_uint32 result;
02491 
02492     DRFLAC_ASSERT(bs != NULL);
02493     DRFLAC_ASSERT(pResult != NULL);
02494     DRFLAC_ASSERT(bitCount > 0);
02495     DRFLAC_ASSERT(bitCount <= 8);
02496 
02497     if (!drflac__read_uint32(bs, bitCount, &result)) {
02498         return DRFLAC_FALSE;
02499     }
02500 
02501     *pResult = (drflac_uint8)result;
02502     return DRFLAC_TRUE;
02503 }
02504 
02505 static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
02506 {
02507     drflac_int32 result;
02508 
02509     DRFLAC_ASSERT(bs != NULL);
02510     DRFLAC_ASSERT(pResult != NULL);
02511     DRFLAC_ASSERT(bitCount > 0);
02512     DRFLAC_ASSERT(bitCount <= 8);
02513 
02514     if (!drflac__read_int32(bs, bitCount, &result)) {
02515         return DRFLAC_FALSE;
02516     }
02517 
02518     *pResult = (drflac_int8)result;
02519     return DRFLAC_TRUE;
02520 }
02521 
02522 
02523 static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
02524 {
02525     if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
02526         bs->consumedBits += (drflac_uint32)bitsToSeek;
02527         bs->cache <<= bitsToSeek;
02528         return DRFLAC_TRUE;
02529     } else {
02530         /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
02531         bitsToSeek       -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
02532         bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
02533         bs->cache         = 0;
02534 
02535         /* Simple case. Seek in groups of the same number as bits that fit within a cache line. */
02536 #ifdef DRFLAC_64BIT
02537         while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
02538             drflac_uint64 bin;
02539             if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
02540                 return DRFLAC_FALSE;
02541             }
02542             bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
02543         }
02544 #else
02545         while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
02546             drflac_uint32 bin;
02547             if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
02548                 return DRFLAC_FALSE;
02549             }
02550             bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
02551         }
02552 #endif
02553 
02554         /* Whole leftover bytes. */
02555         while (bitsToSeek >= 8) {
02556             drflac_uint8 bin;
02557             if (!drflac__read_uint8(bs, 8, &bin)) {
02558                 return DRFLAC_FALSE;
02559             }
02560             bitsToSeek -= 8;
02561         }
02562 
02563         /* Leftover bits. */
02564         if (bitsToSeek > 0) {
02565             drflac_uint8 bin;
02566             if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
02567                 return DRFLAC_FALSE;
02568             }
02569             bitsToSeek = 0; /* <-- Necessary for the assert below. */
02570         }
02571 
02572         DRFLAC_ASSERT(bitsToSeek == 0);
02573         return DRFLAC_TRUE;
02574     }
02575 }
02576 
02577 
02578 /* This function moves the bit streamer to the first bit after the sync code (bit 15 of the of the frame header). It will also update the CRC-16. */
02579 static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
02580 {
02581     DRFLAC_ASSERT(bs != NULL);
02582 
02583     /*
02584     The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
02585     thing to do is align to the next byte.
02586     */
02587     if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
02588         return DRFLAC_FALSE;
02589     }
02590 
02591     for (;;) {
02592         drflac_uint8 hi;
02593 
02594 #ifndef DR_FLAC_NO_CRC
02595         drflac__reset_crc16(bs);
02596 #endif
02597 
02598         if (!drflac__read_uint8(bs, 8, &hi)) {
02599             return DRFLAC_FALSE;
02600         }
02601 
02602         if (hi == 0xFF) {
02603             drflac_uint8 lo;
02604             if (!drflac__read_uint8(bs, 6, &lo)) {
02605                 return DRFLAC_FALSE;
02606             }
02607 
02608             if (lo == 0x3E) {
02609                 return DRFLAC_TRUE;
02610             } else {
02611                 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
02612                     return DRFLAC_FALSE;
02613                 }
02614             }
02615         }
02616     }
02617 
02618     /* Should never get here. */
02619     /*return DRFLAC_FALSE;*/
02620 }
02621 
02622 
02623 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
02624 #define DRFLAC_IMPLEMENT_CLZ_LZCNT
02625 #endif
02626 #if  defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86))
02627 #define DRFLAC_IMPLEMENT_CLZ_MSVC
02628 #endif
02629 
02630 static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
02631 {
02632     drflac_uint32 n;
02633     static drflac_uint32 clz_table_4[] = {
02634         0,
02635         4,
02636         3, 3,
02637         2, 2, 2, 2,
02638         1, 1, 1, 1, 1, 1, 1, 1
02639     };
02640 
02641     if (x == 0) {
02642         return sizeof(x)*8;
02643     }
02644 
02645     n = clz_table_4[x >> (sizeof(x)*8 - 4)];
02646     if (n == 0) {
02647 #ifdef DRFLAC_64BIT
02648         if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n  = 32; x <<= 32; }
02649         if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
02650         if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8;  x <<= 8;  }
02651         if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4;  x <<= 4;  }
02652 #else
02653         if ((x & 0xFFFF0000) == 0) { n  = 16; x <<= 16; }
02654         if ((x & 0xFF000000) == 0) { n += 8;  x <<= 8;  }
02655         if ((x & 0xF0000000) == 0) { n += 4;  x <<= 4;  }
02656 #endif
02657         n += clz_table_4[x >> (sizeof(x)*8 - 4)];
02658     }
02659 
02660     return n - 1;
02661 }
02662 
02663 #ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
02664 static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
02665 {
02666     /* Fast compile time check for ARM. */
02667 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
02668     return DRFLAC_TRUE;
02669 #else
02670     /* If the compiler itself does not support the intrinsic then we'll need to return false. */
02671     #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
02672         return drflac__gIsLZCNTSupported;
02673     #else
02674         return DRFLAC_FALSE;
02675     #endif
02676 #endif
02677 }
02678 
02679 static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
02680 {
02681 #if defined(_MSC_VER) && !defined(__clang__)
02682     #ifdef DRFLAC_64BIT
02683         return (drflac_uint32)__lzcnt64(x);
02684     #else
02685         return (drflac_uint32)__lzcnt(x);
02686     #endif
02687 #else
02688     #if defined(__GNUC__) || defined(__clang__)
02689         #if defined(DRFLAC_X64)
02690             {
02691                 drflac_uint64 r;
02692                 __asm__ __volatile__ (
02693                     "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x)
02694                 );
02695 
02696                 return (drflac_uint32)r;
02697             }
02698         #elif defined(DRFLAC_X86)
02699             {
02700                 drflac_uint32 r;
02701                 __asm__ __volatile__ (
02702                     "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x)
02703                 );
02704 
02705                 return r;
02706             }
02707         #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(DRFLAC_64BIT)   /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
02708             {
02709                 unsigned int r;
02710                 __asm__ __volatile__ (
02711                 #if defined(DRFLAC_64BIT)
02712                     "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x)   /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
02713                 #else
02714                     "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
02715                 #endif
02716                 );
02717 
02718                 return r;
02719             }
02720         #else
02721             if (x == 0) {
02722                 return sizeof(x)*8;
02723             }
02724             #ifdef DRFLAC_64BIT
02725                 return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
02726             #else
02727                 return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
02728             #endif
02729         #endif
02730     #else
02731         /* Unsupported compiler. */
02732         #error "This compiler does not support the lzcnt intrinsic."
02733     #endif
02734 #endif
02735 }
02736 #endif
02737 
02738 #ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
02739 #include <intrin.h> /* For BitScanReverse(). */
02740 
02741 static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
02742 {
02743     drflac_uint32 n;
02744 
02745     if (x == 0) {
02746         return sizeof(x)*8;
02747     }
02748 
02749 #ifdef DRFLAC_64BIT
02750     _BitScanReverse64((unsigned long*)&n, x);
02751 #else
02752     _BitScanReverse((unsigned long*)&n, x);
02753 #endif
02754     return sizeof(x)*8 - n - 1;
02755 }
02756 #endif
02757 
02758 static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
02759 {
02760 #ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
02761     if (drflac__is_lzcnt_supported()) {
02762         return drflac__clz_lzcnt(x);
02763     } else
02764 #endif
02765     {
02766 #ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
02767         return drflac__clz_msvc(x);
02768 #else
02769         return drflac__clz_software(x);
02770 #endif
02771     }
02772 }
02773 
02774 
02775 static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
02776 {
02777     drflac_uint32 zeroCounter = 0;
02778     drflac_uint32 setBitOffsetPlus1;
02779 
02780     while (bs->cache == 0) {
02781         zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
02782         if (!drflac__reload_cache(bs)) {
02783             return DRFLAC_FALSE;
02784         }
02785     }
02786 
02787     setBitOffsetPlus1 = drflac__clz(bs->cache);
02788     setBitOffsetPlus1 += 1;
02789 
02790     bs->consumedBits += setBitOffsetPlus1;
02791     bs->cache <<= setBitOffsetPlus1;
02792 
02793     *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
02794     return DRFLAC_TRUE;
02795 }
02796 
02797 
02798 
02799 static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
02800 {
02801     DRFLAC_ASSERT(bs != NULL);
02802     DRFLAC_ASSERT(offsetFromStart > 0);
02803 
02804     /*
02805     Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
02806     is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
02807     To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
02808     */
02809     if (offsetFromStart > 0x7FFFFFFF) {
02810         drflac_uint64 bytesRemaining = offsetFromStart;
02811         if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
02812             return DRFLAC_FALSE;
02813         }
02814         bytesRemaining -= 0x7FFFFFFF;
02815 
02816         while (bytesRemaining > 0x7FFFFFFF) {
02817             if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
02818                 return DRFLAC_FALSE;
02819             }
02820             bytesRemaining -= 0x7FFFFFFF;
02821         }
02822 
02823         if (bytesRemaining > 0) {
02824             if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
02825                 return DRFLAC_FALSE;
02826             }
02827         }
02828     } else {
02829         if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
02830             return DRFLAC_FALSE;
02831         }
02832     }
02833 
02834     /* The cache should be reset to force a reload of fresh data from the client. */
02835     drflac__reset_cache(bs);
02836     return DRFLAC_TRUE;
02837 }
02838 
02839 
02840 static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
02841 {
02842     drflac_uint8 crc;
02843     drflac_uint64 result;
02844     drflac_uint8 utf8[7] = {0};
02845     int byteCount;
02846     int i;
02847 
02848     DRFLAC_ASSERT(bs != NULL);
02849     DRFLAC_ASSERT(pNumberOut != NULL);
02850     DRFLAC_ASSERT(pCRCOut != NULL);
02851 
02852     crc = *pCRCOut;
02853 
02854     if (!drflac__read_uint8(bs, 8, utf8)) {
02855         *pNumberOut = 0;
02856         return DRFLAC_AT_END;
02857     }
02858     crc = drflac_crc8(crc, utf8[0], 8);
02859 
02860     if ((utf8[0] & 0x80) == 0) {
02861         *pNumberOut = utf8[0];
02862         *pCRCOut = crc;
02863         return DRFLAC_SUCCESS;
02864     }
02865 
02866     /*byteCount = 1;*/
02867     if ((utf8[0] & 0xE0) == 0xC0) {
02868         byteCount = 2;
02869     } else if ((utf8[0] & 0xF0) == 0xE0) {
02870         byteCount = 3;
02871     } else if ((utf8[0] & 0xF8) == 0xF0) {
02872         byteCount = 4;
02873     } else if ((utf8[0] & 0xFC) == 0xF8) {
02874         byteCount = 5;
02875     } else if ((utf8[0] & 0xFE) == 0xFC) {
02876         byteCount = 6;
02877     } else if ((utf8[0] & 0xFF) == 0xFE) {
02878         byteCount = 7;
02879     } else {
02880         *pNumberOut = 0;
02881         return DRFLAC_CRC_MISMATCH;     /* Bad UTF-8 encoding. */
02882     }
02883 
02884     /* Read extra bytes. */
02885     DRFLAC_ASSERT(byteCount > 1);
02886 
02887     result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
02888     for (i = 1; i < byteCount; ++i) {
02889         if (!drflac__read_uint8(bs, 8, utf8 + i)) {
02890             *pNumberOut = 0;
02891             return DRFLAC_AT_END;
02892         }
02893         crc = drflac_crc8(crc, utf8[i], 8);
02894 
02895         result = (result << 6) | (utf8[i] & 0x3F);
02896     }
02897 
02898     *pNumberOut = result;
02899     *pCRCOut = crc;
02900     return DRFLAC_SUCCESS;
02901 }
02902 
02903 
02904 
02905 /*
02906 The next two functions are responsible for calculating the prediction.
02907 
02908 When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
02909 safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
02910 */
02911 static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
02912 {
02913     drflac_int32 prediction = 0;
02914 
02915     DRFLAC_ASSERT(order <= 32);
02916 
02917     /* 32-bit version. */
02918 
02919     /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
02920     switch (order)
02921     {
02922     case 32: prediction += coefficients[31] * pDecodedSamples[-32];
02923     case 31: prediction += coefficients[30] * pDecodedSamples[-31];
02924     case 30: prediction += coefficients[29] * pDecodedSamples[-30];
02925     case 29: prediction += coefficients[28] * pDecodedSamples[-29];
02926     case 28: prediction += coefficients[27] * pDecodedSamples[-28];
02927     case 27: prediction += coefficients[26] * pDecodedSamples[-27];
02928     case 26: prediction += coefficients[25] * pDecodedSamples[-26];
02929     case 25: prediction += coefficients[24] * pDecodedSamples[-25];
02930     case 24: prediction += coefficients[23] * pDecodedSamples[-24];
02931     case 23: prediction += coefficients[22] * pDecodedSamples[-23];
02932     case 22: prediction += coefficients[21] * pDecodedSamples[-22];
02933     case 21: prediction += coefficients[20] * pDecodedSamples[-21];
02934     case 20: prediction += coefficients[19] * pDecodedSamples[-20];
02935     case 19: prediction += coefficients[18] * pDecodedSamples[-19];
02936     case 18: prediction += coefficients[17] * pDecodedSamples[-18];
02937     case 17: prediction += coefficients[16] * pDecodedSamples[-17];
02938     case 16: prediction += coefficients[15] * pDecodedSamples[-16];
02939     case 15: prediction += coefficients[14] * pDecodedSamples[-15];
02940     case 14: prediction += coefficients[13] * pDecodedSamples[-14];
02941     case 13: prediction += coefficients[12] * pDecodedSamples[-13];
02942     case 12: prediction += coefficients[11] * pDecodedSamples[-12];
02943     case 11: prediction += coefficients[10] * pDecodedSamples[-11];
02944     case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
02945     case  9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
02946     case  8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
02947     case  7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
02948     case  6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
02949     case  5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
02950     case  4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
02951     case  3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
02952     case  2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
02953     case  1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
02954     }
02955 
02956     return (drflac_int32)(prediction >> shift);
02957 }
02958 
02959 static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
02960 {
02961     drflac_int64 prediction;
02962 
02963     DRFLAC_ASSERT(order <= 32);
02964 
02965     /* 64-bit version. */
02966 
02967     /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
02968 #ifndef DRFLAC_64BIT
02969     if (order == 8)
02970     {
02971         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
02972         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
02973         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
02974         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
02975         prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
02976         prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
02977         prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
02978         prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
02979     }
02980     else if (order == 7)
02981     {
02982         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
02983         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
02984         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
02985         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
02986         prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
02987         prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
02988         prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
02989     }
02990     else if (order == 3)
02991     {
02992         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
02993         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
02994         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
02995     }
02996     else if (order == 6)
02997     {
02998         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
02999         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
03000         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
03001         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
03002         prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
03003         prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
03004     }
03005     else if (order == 5)
03006     {
03007         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
03008         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
03009         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
03010         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
03011         prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
03012     }
03013     else if (order == 4)
03014     {
03015         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
03016         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
03017         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
03018         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
03019     }
03020     else if (order == 12)
03021     {
03022         prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
03023         prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
03024         prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
03025         prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
03026         prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
03027         prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
03028         prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
03029         prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
03030         prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
03031         prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
03032         prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
03033         prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
03034     }
03035     else if (order == 2)
03036     {
03037         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
03038         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
03039     }
03040     else if (order == 1)
03041     {
03042         prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
03043     }
03044     else if (order == 10)
03045     {
03046         prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
03047         prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
03048         prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
03049         prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
03050         prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
03051         prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
03052         prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
03053         prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
03054         prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
03055         prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
03056     }
03057     else if (order == 9)
03058     {
03059         prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
03060         prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
03061         prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
03062         prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
03063         prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
03064         prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
03065         prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
03066         prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
03067         prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
03068     }
03069     else if (order == 11)
03070     {
03071         prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
03072         prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
03073         prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
03074         prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
03075         prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
03076         prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
03077         prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
03078         prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
03079         prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
03080         prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
03081         prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
03082     }
03083     else
03084     {
03085         int j;
03086 
03087         prediction = 0;
03088         for (j = 0; j < (int)order; ++j) {
03089             prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
03090         }
03091     }
03092 #endif
03093 
03094     /*
03095     VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
03096     reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
03097     */
03098 #ifdef DRFLAC_64BIT
03099     prediction = 0;
03100     switch (order)
03101     {
03102     case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
03103     case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
03104     case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
03105     case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
03106     case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
03107     case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
03108     case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
03109     case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
03110     case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
03111     case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
03112     case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
03113     case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
03114     case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
03115     case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
03116     case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
03117     case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
03118     case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
03119     case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
03120     case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
03121     case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
03122     case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
03123     case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
03124     case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
03125     case  9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
03126     case  8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
03127     case  7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
03128     case  6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
03129     case  5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
03130     case  4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
03131     case  3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
03132     case  2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
03133     case  1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
03134     }
03135 #endif
03136 
03137     return (drflac_int32)(prediction >> shift);
03138 }
03139 
03140 
03141 #if 0
03142 /*
03143 Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
03144 sake of readability and should only be used as a reference.
03145 */
03146 static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
03147 {
03148     drflac_uint32 i;
03149 
03150     DRFLAC_ASSERT(bs != NULL);
03151     DRFLAC_ASSERT(count > 0);
03152     DRFLAC_ASSERT(pSamplesOut != NULL);
03153 
03154     for (i = 0; i < count; ++i) {
03155         drflac_uint32 zeroCounter = 0;
03156         for (;;) {
03157             drflac_uint8 bit;
03158             if (!drflac__read_uint8(bs, 1, &bit)) {
03159                 return DRFLAC_FALSE;
03160             }
03161 
03162             if (bit == 0) {
03163                 zeroCounter += 1;
03164             } else {
03165                 break;
03166             }
03167         }
03168 
03169         drflac_uint32 decodedRice;
03170         if (riceParam > 0) {
03171             if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
03172                 return DRFLAC_FALSE;
03173             }
03174         } else {
03175             decodedRice = 0;
03176         }
03177 
03178         decodedRice |= (zeroCounter << riceParam);
03179         if ((decodedRice & 0x01)) {
03180             decodedRice = ~(decodedRice >> 1);
03181         } else {
03182             decodedRice =  (decodedRice >> 1);
03183         }
03184 
03185 
03186         if (bitsPerSample+shift >= 32) {
03187             pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
03188         } else {
03189             pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
03190         }
03191     }
03192 
03193     return DRFLAC_TRUE;
03194 }
03195 #endif
03196 
03197 #if 0
03198 static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
03199 {
03200     drflac_uint32 zeroCounter = 0;
03201     drflac_uint32 decodedRice;
03202 
03203     for (;;) {
03204         drflac_uint8 bit;
03205         if (!drflac__read_uint8(bs, 1, &bit)) {
03206             return DRFLAC_FALSE;
03207         }
03208 
03209         if (bit == 0) {
03210             zeroCounter += 1;
03211         } else {
03212             break;
03213         }
03214     }
03215 
03216     if (riceParam > 0) {
03217         if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
03218             return DRFLAC_FALSE;
03219         }
03220     } else {
03221         decodedRice = 0;
03222     }
03223 
03224     *pZeroCounterOut = zeroCounter;
03225     *pRiceParamPartOut = decodedRice;
03226     return DRFLAC_TRUE;
03227 }
03228 #endif
03229 
03230 #if 0
03231 static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
03232 {
03233     drflac_cache_t riceParamMask;
03234     drflac_uint32 zeroCounter;
03235     drflac_uint32 setBitOffsetPlus1;
03236     drflac_uint32 riceParamPart;
03237     drflac_uint32 riceLength;
03238 
03239     DRFLAC_ASSERT(riceParam > 0);   /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
03240 
03241     riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
03242 
03243     zeroCounter = 0;
03244     while (bs->cache == 0) {
03245         zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
03246         if (!drflac__reload_cache(bs)) {
03247             return DRFLAC_FALSE;
03248         }
03249     }
03250 
03251     setBitOffsetPlus1 = drflac__clz(bs->cache);
03252     zeroCounter += setBitOffsetPlus1;
03253     setBitOffsetPlus1 += 1;
03254 
03255     riceLength = setBitOffsetPlus1 + riceParam;
03256     if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
03257         riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
03258 
03259         bs->consumedBits += riceLength;
03260         bs->cache <<= riceLength;
03261     } else {
03262         drflac_uint32 bitCountLo;
03263         drflac_cache_t resultHi;
03264 
03265         bs->consumedBits += riceLength;
03266         bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1);    /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
03267 
03268         /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
03269         bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
03270         resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam);  /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
03271 
03272         if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
03273 #ifndef DR_FLAC_NO_CRC
03274             drflac__update_crc16(bs);
03275 #endif
03276             bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
03277             bs->consumedBits = 0;
03278 #ifndef DR_FLAC_NO_CRC
03279             bs->crc16Cache = bs->cache;
03280 #endif
03281         } else {
03282             /* Slow path. We need to fetch more data from the client. */
03283             if (!drflac__reload_cache(bs)) {
03284                 return DRFLAC_FALSE;
03285             }
03286         }
03287 
03288         riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
03289 
03290         bs->consumedBits += bitCountLo;
03291         bs->cache <<= bitCountLo;
03292     }
03293 
03294     pZeroCounterOut[0] = zeroCounter;
03295     pRiceParamPartOut[0] = riceParamPart;
03296 
03297     return DRFLAC_TRUE;
03298 }
03299 #endif
03300 
03301 static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
03302 {
03303     drflac_uint32  riceParamPlus1 = riceParam + 1;
03304     /*drflac_cache_t riceParamPlus1Mask  = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
03305     drflac_uint32  riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
03306     drflac_uint32  riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
03307 
03308     /*
03309     The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
03310     no idea how this will work in practice...
03311     */
03312     drflac_cache_t bs_cache = bs->cache;
03313     drflac_uint32  bs_consumedBits = bs->consumedBits;
03314 
03315     /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
03316     drflac_uint32  lzcount = drflac__clz(bs_cache);
03317     if (lzcount < sizeof(bs_cache)*8) {
03318         pZeroCounterOut[0] = lzcount;
03319 
03320         /*
03321         It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
03322         this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
03323         outside of this function at a higher level.
03324         */
03325     extract_rice_param_part:
03326         bs_cache       <<= lzcount;
03327         bs_consumedBits += lzcount;
03328 
03329         if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
03330             /* Getting here means the rice parameter part is wholly contained within the current cache line. */
03331             pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
03332             bs_cache       <<= riceParamPlus1;
03333             bs_consumedBits += riceParamPlus1;
03334         } else {
03335             drflac_uint32 riceParamPartHi;
03336             drflac_uint32 riceParamPartLo;
03337             drflac_uint32 riceParamPartLoBitCount;
03338 
03339             /*
03340             Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
03341             line, reload the cache, and then combine it with the head of the next cache line.
03342             */
03343 
03344             /* Grab the high part of the rice parameter part. */
03345             riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
03346 
03347             /* Before reloading the cache we need to grab the size in bits of the low part. */
03348             riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
03349             DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
03350 
03351             /* Now reload the cache. */
03352             if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
03353             #ifndef DR_FLAC_NO_CRC
03354                 drflac__update_crc16(bs);
03355             #endif
03356                 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
03357                 bs_consumedBits = riceParamPartLoBitCount;
03358             #ifndef DR_FLAC_NO_CRC
03359                 bs->crc16Cache = bs_cache;
03360             #endif
03361             } else {
03362                 /* Slow path. We need to fetch more data from the client. */
03363                 if (!drflac__reload_cache(bs)) {
03364                     return DRFLAC_FALSE;
03365                 }
03366 
03367                 bs_cache = bs->cache;
03368                 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
03369             }
03370 
03371             /* We should now have enough information to construct the rice parameter part. */
03372             riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
03373             pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
03374 
03375             bs_cache <<= riceParamPartLoBitCount;
03376         }
03377     } else {
03378         /*
03379         Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
03380         to drflac__clz() and we need to reload the cache.
03381         */
03382         drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
03383         for (;;) {
03384             if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
03385             #ifndef DR_FLAC_NO_CRC
03386                 drflac__update_crc16(bs);
03387             #endif
03388                 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
03389                 bs_consumedBits = 0;
03390             #ifndef DR_FLAC_NO_CRC
03391                 bs->crc16Cache = bs_cache;
03392             #endif
03393             } else {
03394                 /* Slow path. We need to fetch more data from the client. */
03395                 if (!drflac__reload_cache(bs)) {
03396                     return DRFLAC_FALSE;
03397                 }
03398 
03399                 bs_cache = bs->cache;
03400                 bs_consumedBits = bs->consumedBits;
03401             }
03402 
03403             lzcount = drflac__clz(bs_cache);
03404             zeroCounter += lzcount;
03405 
03406             if (lzcount < sizeof(bs_cache)*8) {
03407                 break;
03408             }
03409         }
03410 
03411         pZeroCounterOut[0] = zeroCounter;
03412         goto extract_rice_param_part;
03413     }
03414 
03415     /* Make sure the cache is restored at the end of it all. */
03416     bs->cache = bs_cache;
03417     bs->consumedBits = bs_consumedBits;
03418 
03419     return DRFLAC_TRUE;
03420 }
03421 
03422 static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
03423 {
03424     drflac_uint32  riceParamPlus1 = riceParam + 1;
03425     drflac_uint32  riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
03426 
03427     /*
03428     The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
03429     no idea how this will work in practice...
03430     */
03431     drflac_cache_t bs_cache = bs->cache;
03432     drflac_uint32  bs_consumedBits = bs->consumedBits;
03433 
03434     /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
03435     drflac_uint32  lzcount = drflac__clz(bs_cache);
03436     if (lzcount < sizeof(bs_cache)*8) {
03437         /*
03438         It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
03439         this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
03440         outside of this function at a higher level.
03441         */
03442     extract_rice_param_part:
03443         bs_cache       <<= lzcount;
03444         bs_consumedBits += lzcount;
03445 
03446         if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
03447             /* Getting here means the rice parameter part is wholly contained within the current cache line. */
03448             bs_cache       <<= riceParamPlus1;
03449             bs_consumedBits += riceParamPlus1;
03450         } else {
03451             /*
03452             Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
03453             line, reload the cache, and then combine it with the head of the next cache line.
03454             */
03455 
03456             /* Before reloading the cache we need to grab the size in bits of the low part. */
03457             drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
03458             DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
03459 
03460             /* Now reload the cache. */
03461             if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
03462             #ifndef DR_FLAC_NO_CRC
03463                 drflac__update_crc16(bs);
03464             #endif
03465                 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
03466                 bs_consumedBits = riceParamPartLoBitCount;
03467             #ifndef DR_FLAC_NO_CRC
03468                 bs->crc16Cache = bs_cache;
03469             #endif
03470             } else {
03471                 /* Slow path. We need to fetch more data from the client. */
03472                 if (!drflac__reload_cache(bs)) {
03473                     return DRFLAC_FALSE;
03474                 }
03475 
03476                 bs_cache = bs->cache;
03477                 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
03478             }
03479 
03480             bs_cache <<= riceParamPartLoBitCount;
03481         }
03482     } else {
03483         /*
03484         Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
03485         to drflac__clz() and we need to reload the cache.
03486         */
03487         for (;;) {
03488             if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
03489             #ifndef DR_FLAC_NO_CRC
03490                 drflac__update_crc16(bs);
03491             #endif
03492                 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
03493                 bs_consumedBits = 0;
03494             #ifndef DR_FLAC_NO_CRC
03495                 bs->crc16Cache = bs_cache;
03496             #endif
03497             } else {
03498                 /* Slow path. We need to fetch more data from the client. */
03499                 if (!drflac__reload_cache(bs)) {
03500                     return DRFLAC_FALSE;
03501                 }
03502 
03503                 bs_cache = bs->cache;
03504                 bs_consumedBits = bs->consumedBits;
03505             }
03506 
03507             lzcount = drflac__clz(bs_cache);
03508             if (lzcount < sizeof(bs_cache)*8) {
03509                 break;
03510             }
03511         }
03512 
03513         goto extract_rice_param_part;
03514     }
03515 
03516     /* Make sure the cache is restored at the end of it all. */
03517     bs->cache = bs_cache;
03518     bs->consumedBits = bs_consumedBits;
03519 
03520     return DRFLAC_TRUE;
03521 }
03522 
03523 
03524 static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
03525 {
03526     drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
03527     drflac_uint32 zeroCountPart0;
03528     drflac_uint32 riceParamPart0;
03529     drflac_uint32 riceParamMask;
03530     drflac_uint32 i;
03531 
03532     DRFLAC_ASSERT(bs != NULL);
03533     DRFLAC_ASSERT(count > 0);
03534     DRFLAC_ASSERT(pSamplesOut != NULL);
03535 
03536     (void)bitsPerSample;
03537     (void)order;
03538     (void)shift;
03539     (void)coefficients;
03540 
03541     riceParamMask  = (drflac_uint32)~((~0UL) << riceParam);
03542 
03543     i = 0;
03544     while (i < count) {
03545         /* Rice extraction. */
03546         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
03547             return DRFLAC_FALSE;
03548         }
03549 
03550         /* Rice reconstruction. */
03551         riceParamPart0 &= riceParamMask;
03552         riceParamPart0 |= (zeroCountPart0 << riceParam);
03553         riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
03554 
03555         pSamplesOut[i] = riceParamPart0;
03556 
03557         i += 1;
03558     }
03559 
03560     return DRFLAC_TRUE;
03561 }
03562 
03563 static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
03564 {
03565     drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
03566     drflac_uint32 zeroCountPart0 = 0;
03567     drflac_uint32 zeroCountPart1 = 0;
03568     drflac_uint32 zeroCountPart2 = 0;
03569     drflac_uint32 zeroCountPart3 = 0;
03570     drflac_uint32 riceParamPart0 = 0;
03571     drflac_uint32 riceParamPart1 = 0;
03572     drflac_uint32 riceParamPart2 = 0;
03573     drflac_uint32 riceParamPart3 = 0;
03574     drflac_uint32 riceParamMask;
03575     const drflac_int32* pSamplesOutEnd;
03576     drflac_uint32 i;
03577 
03578     DRFLAC_ASSERT(bs != NULL);
03579     DRFLAC_ASSERT(count > 0);
03580     DRFLAC_ASSERT(pSamplesOut != NULL);
03581 
03582     if (order == 0) {
03583         return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
03584     }
03585 
03586     riceParamMask  = (drflac_uint32)~((~0UL) << riceParam);
03587     pSamplesOutEnd = pSamplesOut + (count & ~3);
03588 
03589     if (bitsPerSample+shift > 32) {
03590         while (pSamplesOut < pSamplesOutEnd) {
03591             /*
03592             Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
03593             against an array. Not sure why, but perhaps it's making more efficient use of registers?
03594             */
03595             if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
03596                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
03597                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
03598                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
03599                 return DRFLAC_FALSE;
03600             }
03601 
03602             riceParamPart0 &= riceParamMask;
03603             riceParamPart1 &= riceParamMask;
03604             riceParamPart2 &= riceParamMask;
03605             riceParamPart3 &= riceParamMask;
03606 
03607             riceParamPart0 |= (zeroCountPart0 << riceParam);
03608             riceParamPart1 |= (zeroCountPart1 << riceParam);
03609             riceParamPart2 |= (zeroCountPart2 << riceParam);
03610             riceParamPart3 |= (zeroCountPart3 << riceParam);
03611 
03612             riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
03613             riceParamPart1  = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
03614             riceParamPart2  = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
03615             riceParamPart3  = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
03616 
03617             pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 0);
03618             pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 1);
03619             pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 2);
03620             pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 3);
03621 
03622             pSamplesOut += 4;
03623         }
03624     } else {
03625         while (pSamplesOut < pSamplesOutEnd) {
03626             if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
03627                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
03628                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
03629                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
03630                 return DRFLAC_FALSE;
03631             }
03632 
03633             riceParamPart0 &= riceParamMask;
03634             riceParamPart1 &= riceParamMask;
03635             riceParamPart2 &= riceParamMask;
03636             riceParamPart3 &= riceParamMask;
03637 
03638             riceParamPart0 |= (zeroCountPart0 << riceParam);
03639             riceParamPart1 |= (zeroCountPart1 << riceParam);
03640             riceParamPart2 |= (zeroCountPart2 << riceParam);
03641             riceParamPart3 |= (zeroCountPart3 << riceParam);
03642 
03643             riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
03644             riceParamPart1  = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
03645             riceParamPart2  = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
03646             riceParamPart3  = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
03647 
03648             pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 0);
03649             pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 1);
03650             pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 2);
03651             pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 3);
03652 
03653             pSamplesOut += 4;
03654         }
03655     }
03656 
03657     i = (count & ~3);
03658     while (i < count) {
03659         /* Rice extraction. */
03660         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
03661             return DRFLAC_FALSE;
03662         }
03663 
03664         /* Rice reconstruction. */
03665         riceParamPart0 &= riceParamMask;
03666         riceParamPart0 |= (zeroCountPart0 << riceParam);
03667         riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
03668         /*riceParamPart0  = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
03669 
03670         /* Sample reconstruction. */
03671         if (bitsPerSample+shift > 32) {
03672             pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 0);
03673         } else {
03674             pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 0);
03675         }
03676 
03677         i += 1;
03678         pSamplesOut += 1;
03679     }
03680 
03681     return DRFLAC_TRUE;
03682 }
03683 
03684 #if defined(DRFLAC_SUPPORT_SSE2)
03685 static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
03686 {
03687     __m128i r;
03688 
03689     /* Pack. */
03690     r = _mm_packs_epi32(a, b);
03691 
03692     /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
03693     r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
03694 
03695     /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
03696     r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
03697     r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
03698 
03699     return r;
03700 }
03701 #endif
03702 
03703 #if defined(DRFLAC_SUPPORT_SSE41)
03704 static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
03705 {
03706     return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
03707 }
03708 
03709 static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
03710 {
03711     __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
03712     __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
03713     return _mm_add_epi32(x64, x32);
03714 }
03715 
03716 static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
03717 {
03718     return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
03719 }
03720 
03721 static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
03722 {
03723     /*
03724     To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
03725     is shifted with zero bits, whereas the right side is shifted with sign bits.
03726     */
03727     __m128i lo = _mm_srli_epi64(x, count);
03728     __m128i hi = _mm_srai_epi32(x, count);
03729 
03730     hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0));    /* The high part needs to have the low part cleared. */
03731 
03732     return _mm_or_si128(lo, hi);
03733 }
03734 
03735 static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
03736 {
03737     int i;
03738     drflac_uint32 riceParamMask;
03739     drflac_int32* pDecodedSamples    = pSamplesOut;
03740     drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
03741     drflac_uint32 zeroCountParts0 = 0;
03742     drflac_uint32 zeroCountParts1 = 0;
03743     drflac_uint32 zeroCountParts2 = 0;
03744     drflac_uint32 zeroCountParts3 = 0;
03745     drflac_uint32 riceParamParts0 = 0;
03746     drflac_uint32 riceParamParts1 = 0;
03747     drflac_uint32 riceParamParts2 = 0;
03748     drflac_uint32 riceParamParts3 = 0;
03749     __m128i coefficients128_0;
03750     __m128i coefficients128_4;
03751     __m128i coefficients128_8;
03752     __m128i samples128_0;
03753     __m128i samples128_4;
03754     __m128i samples128_8;
03755     __m128i riceParamMask128;
03756 
03757     const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
03758 
03759     riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
03760     riceParamMask128 = _mm_set1_epi32(riceParamMask);
03761 
03762     /* Pre-load. */
03763     coefficients128_0 = _mm_setzero_si128();
03764     coefficients128_4 = _mm_setzero_si128();
03765     coefficients128_8 = _mm_setzero_si128();
03766 
03767     samples128_0 = _mm_setzero_si128();
03768     samples128_4 = _mm_setzero_si128();
03769     samples128_8 = _mm_setzero_si128();
03770 
03771     /*
03772     Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
03773     what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
03774     in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
03775     so I think there's opportunity for this to be simplified.
03776     */
03777 #if 1
03778     {
03779         int runningOrder = order;
03780 
03781         /* 0 - 3. */
03782         if (runningOrder >= 4) {
03783             coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
03784             samples128_0      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 4));
03785             runningOrder -= 4;
03786         } else {
03787             switch (runningOrder) {
03788                 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
03789                 case 2: coefficients128_0 = _mm_set_epi32(0, 0,               coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0,               0); break;
03790                 case 1: coefficients128_0 = _mm_set_epi32(0, 0,               0,               coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0,               0,               0); break;
03791             }
03792             runningOrder = 0;
03793         }
03794 
03795         /* 4 - 7 */
03796         if (runningOrder >= 4) {
03797             coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
03798             samples128_4      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 8));
03799             runningOrder -= 4;
03800         } else {
03801             switch (runningOrder) {
03802                 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
03803                 case 2: coefficients128_4 = _mm_set_epi32(0, 0,               coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0,               0); break;
03804                 case 1: coefficients128_4 = _mm_set_epi32(0, 0,               0,               coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0,               0,               0); break;
03805             }
03806             runningOrder = 0;
03807         }
03808 
03809         /* 8 - 11 */
03810         if (runningOrder == 4) {
03811             coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
03812             samples128_8      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 12));
03813             runningOrder -= 4;
03814         } else {
03815             switch (runningOrder) {
03816                 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
03817                 case 2: coefficients128_8 = _mm_set_epi32(0, 0,                coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0,                0); break;
03818                 case 1: coefficients128_8 = _mm_set_epi32(0, 0,                0,               coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0,                0,                0); break;
03819             }
03820             runningOrder = 0;
03821         }
03822 
03823         /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
03824         coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
03825         coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
03826         coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
03827     }
03828 #else
03829     /* This causes strict-aliasing warnings with GCC. */
03830     switch (order)
03831     {
03832     case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
03833     case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
03834     case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
03835     case 9:  ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
03836     case 8:  ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
03837     case 7:  ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
03838     case 6:  ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
03839     case 5:  ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
03840     case 4:  ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
03841     case 3:  ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
03842     case 2:  ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
03843     case 1:  ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
03844     }
03845 #endif
03846 
03847     /* For this version we are doing one sample at a time. */
03848     while (pDecodedSamples < pDecodedSamplesEnd) {
03849         __m128i prediction128;
03850         __m128i zeroCountPart128;
03851         __m128i riceParamPart128;
03852 
03853         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
03854             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
03855             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
03856             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
03857             return DRFLAC_FALSE;
03858         }
03859 
03860         zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
03861         riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
03862 
03863         riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
03864         riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
03865         riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01)));  /* <-- SSE2 compatible */
03866         /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/   /* <-- Only supported from SSE4.1 and is slower in my testing... */
03867 
03868         if (order <= 4) {
03869             for (i = 0; i < 4; i += 1) {
03870                 prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
03871 
03872                 /* Horizontal add and shift. */
03873                 prediction128 = drflac__mm_hadd_epi32(prediction128);
03874                 prediction128 = _mm_srai_epi32(prediction128, shift);
03875                 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
03876 
03877                 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
03878                 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
03879             }
03880         } else if (order <= 8) {
03881             for (i = 0; i < 4; i += 1) {
03882                 prediction128 =                              _mm_mullo_epi32(coefficients128_4, samples128_4);
03883                 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
03884 
03885                 /* Horizontal add and shift. */
03886                 prediction128 = drflac__mm_hadd_epi32(prediction128);
03887                 prediction128 = _mm_srai_epi32(prediction128, shift);
03888                 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
03889 
03890                 samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
03891                 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
03892                 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
03893             }
03894         } else {
03895             for (i = 0; i < 4; i += 1) {
03896                 prediction128 =                              _mm_mullo_epi32(coefficients128_8, samples128_8);
03897                 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
03898                 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
03899 
03900                 /* Horizontal add and shift. */
03901                 prediction128 = drflac__mm_hadd_epi32(prediction128);
03902                 prediction128 = _mm_srai_epi32(prediction128, shift);
03903                 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
03904 
03905                 samples128_8 = _mm_alignr_epi8(samples128_4,  samples128_8, 4);
03906                 samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
03907                 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
03908                 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
03909             }
03910         }
03911 
03912         /* We store samples in groups of 4. */
03913         _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
03914         pDecodedSamples += 4;
03915     }
03916 
03917     /* Make sure we process the last few samples. */
03918     i = (count & ~3);
03919     while (i < (int)count) {
03920         /* Rice extraction. */
03921         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
03922             return DRFLAC_FALSE;
03923         }
03924 
03925         /* Rice reconstruction. */
03926         riceParamParts0 &= riceParamMask;
03927         riceParamParts0 |= (zeroCountParts0 << riceParam);
03928         riceParamParts0  = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
03929 
03930         /* Sample reconstruction. */
03931         pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
03932 
03933         i += 1;
03934         pDecodedSamples += 1;
03935     }
03936 
03937     return DRFLAC_TRUE;
03938 }
03939 
03940 static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
03941 {
03942     int i;
03943     drflac_uint32 riceParamMask;
03944     drflac_int32* pDecodedSamples    = pSamplesOut;
03945     drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
03946     drflac_uint32 zeroCountParts0 = 0;
03947     drflac_uint32 zeroCountParts1 = 0;
03948     drflac_uint32 zeroCountParts2 = 0;
03949     drflac_uint32 zeroCountParts3 = 0;
03950     drflac_uint32 riceParamParts0 = 0;
03951     drflac_uint32 riceParamParts1 = 0;
03952     drflac_uint32 riceParamParts2 = 0;
03953     drflac_uint32 riceParamParts3 = 0;
03954     __m128i coefficients128_0;
03955     __m128i coefficients128_4;
03956     __m128i coefficients128_8;
03957     __m128i samples128_0;
03958     __m128i samples128_4;
03959     __m128i samples128_8;
03960     __m128i prediction128;
03961     __m128i riceParamMask128;
03962 
03963     const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
03964 
03965     DRFLAC_ASSERT(order <= 12);
03966 
03967     riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
03968     riceParamMask128 = _mm_set1_epi32(riceParamMask);
03969 
03970     prediction128 = _mm_setzero_si128();
03971 
03972     /* Pre-load. */
03973     coefficients128_0  = _mm_setzero_si128();
03974     coefficients128_4  = _mm_setzero_si128();
03975     coefficients128_8  = _mm_setzero_si128();
03976 
03977     samples128_0  = _mm_setzero_si128();
03978     samples128_4  = _mm_setzero_si128();
03979     samples128_8  = _mm_setzero_si128();
03980 
03981 #if 1
03982     {
03983         int runningOrder = order;
03984 
03985         /* 0 - 3. */
03986         if (runningOrder >= 4) {
03987             coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
03988             samples128_0      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 4));
03989             runningOrder -= 4;
03990         } else {
03991             switch (runningOrder) {
03992                 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
03993                 case 2: coefficients128_0 = _mm_set_epi32(0, 0,               coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0,               0); break;
03994                 case 1: coefficients128_0 = _mm_set_epi32(0, 0,               0,               coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0,               0,               0); break;
03995             }
03996             runningOrder = 0;
03997         }
03998 
03999         /* 4 - 7 */
04000         if (runningOrder >= 4) {
04001             coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
04002             samples128_4      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 8));
04003             runningOrder -= 4;
04004         } else {
04005             switch (runningOrder) {
04006                 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
04007                 case 2: coefficients128_4 = _mm_set_epi32(0, 0,               coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0,               0); break;
04008                 case 1: coefficients128_4 = _mm_set_epi32(0, 0,               0,               coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0,               0,               0); break;
04009             }
04010             runningOrder = 0;
04011         }
04012 
04013         /* 8 - 11 */
04014         if (runningOrder == 4) {
04015             coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
04016             samples128_8      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 12));
04017             runningOrder -= 4;
04018         } else {
04019             switch (runningOrder) {
04020                 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
04021                 case 2: coefficients128_8 = _mm_set_epi32(0, 0,                coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0,                0); break;
04022                 case 1: coefficients128_8 = _mm_set_epi32(0, 0,                0,               coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0,                0,                0); break;
04023             }
04024             runningOrder = 0;
04025         }
04026 
04027         /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
04028         coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
04029         coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
04030         coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
04031     }
04032 #else
04033     switch (order)
04034     {
04035     case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
04036     case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
04037     case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
04038     case 9:  ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
04039     case 8:  ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
04040     case 7:  ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
04041     case 6:  ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
04042     case 5:  ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
04043     case 4:  ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
04044     case 3:  ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
04045     case 2:  ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
04046     case 1:  ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
04047     }
04048 #endif
04049 
04050     /* For this version we are doing one sample at a time. */
04051     while (pDecodedSamples < pDecodedSamplesEnd) {
04052         __m128i zeroCountPart128;
04053         __m128i riceParamPart128;
04054 
04055         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
04056             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
04057             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
04058             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
04059             return DRFLAC_FALSE;
04060         }
04061 
04062         zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
04063         riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
04064 
04065         riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
04066         riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
04067         riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
04068 
04069         for (i = 0; i < 4; i += 1) {
04070             prediction128 = _mm_xor_si128(prediction128, prediction128);    /* Reset to 0. */
04071 
04072             switch (order)
04073             {
04074             case 12:
04075             case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
04076             case 10:
04077             case  9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
04078             case  8:
04079             case  7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
04080             case  6:
04081             case  5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
04082             case  4:
04083             case  3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
04084             case  2:
04085             case  1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
04086             }
04087 
04088             /* Horizontal add and shift. */
04089             prediction128 = drflac__mm_hadd_epi64(prediction128);
04090             prediction128 = drflac__mm_srai_epi64(prediction128, shift);
04091             prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
04092 
04093             /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
04094             samples128_8 = _mm_alignr_epi8(samples128_4,  samples128_8, 4);
04095             samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
04096             samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
04097 
04098             /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
04099             riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
04100         }
04101 
04102         /* We store samples in groups of 4. */
04103         _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
04104         pDecodedSamples += 4;
04105     }
04106 
04107     /* Make sure we process the last few samples. */
04108     i = (count & ~3);
04109     while (i < (int)count) {
04110         /* Rice extraction. */
04111         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
04112             return DRFLAC_FALSE;
04113         }
04114 
04115         /* Rice reconstruction. */
04116         riceParamParts0 &= riceParamMask;
04117         riceParamParts0 |= (zeroCountParts0 << riceParam);
04118         riceParamParts0  = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
04119 
04120         /* Sample reconstruction. */
04121         pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
04122 
04123         i += 1;
04124         pDecodedSamples += 1;
04125     }
04126 
04127     return DRFLAC_TRUE;
04128 }
04129 
04130 static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
04131 {
04132     DRFLAC_ASSERT(bs != NULL);
04133     DRFLAC_ASSERT(count > 0);
04134     DRFLAC_ASSERT(pSamplesOut != NULL);
04135 
04136     /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
04137     if (order > 0 && order <= 12) {
04138         if (bitsPerSample+shift > 32) {
04139             return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
04140         } else {
04141             return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
04142         }
04143     } else {
04144         return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
04145     }
04146 }
04147 #endif
04148 
04149 #if defined(DRFLAC_SUPPORT_NEON)
04150 static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
04151 {
04152     vst1q_s32(p+0, x.val[0]);
04153     vst1q_s32(p+4, x.val[1]);
04154 }
04155 
04156 static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
04157 {
04158     vst1q_u32(p+0, x.val[0]);
04159     vst1q_u32(p+4, x.val[1]);
04160 }
04161 
04162 static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
04163 {
04164     vst1q_f32(p+0, x.val[0]);
04165     vst1q_f32(p+4, x.val[1]);
04166 }
04167 
04168 static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
04169 {
04170     vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
04171 }
04172 
04173 static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
04174 {
04175     vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
04176 }
04177 
04178 static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
04179 {
04180     drflac_int32 x[4];
04181     x[3] = x3;
04182     x[2] = x2;
04183     x[1] = x1;
04184     x[0] = x0;
04185     return vld1q_s32(x);
04186 }
04187 
04188 static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
04189 {
04190     /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
04191 
04192     /* Reference */
04193     /*return drflac__vdupq_n_s32x4(
04194         vgetq_lane_s32(a, 0),
04195         vgetq_lane_s32(b, 3),
04196         vgetq_lane_s32(b, 2),
04197         vgetq_lane_s32(b, 1)
04198     );*/
04199 
04200     return vextq_s32(b, a, 1);
04201 }
04202 
04203 static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
04204 {
04205     /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
04206 
04207     /* Reference */
04208     /*return drflac__vdupq_n_s32x4(
04209         vgetq_lane_s32(a, 0),
04210         vgetq_lane_s32(b, 3),
04211         vgetq_lane_s32(b, 2),
04212         vgetq_lane_s32(b, 1)
04213     );*/
04214 
04215     return vextq_u32(b, a, 1);
04216 }
04217 
04218 static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
04219 {
04220     /* The sum must end up in position 0. */
04221 
04222     /* Reference */
04223     /*return vdupq_n_s32(
04224         vgetq_lane_s32(x, 3) +
04225         vgetq_lane_s32(x, 2) +
04226         vgetq_lane_s32(x, 1) +
04227         vgetq_lane_s32(x, 0)
04228     );*/
04229 
04230     int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
04231     return vpadd_s32(r, r);
04232 }
04233 
04234 static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
04235 {
04236     return vadd_s64(vget_high_s64(x), vget_low_s64(x));
04237 }
04238 
04239 static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
04240 {
04241     /* Reference */
04242     /*return drflac__vdupq_n_s32x4(
04243         vgetq_lane_s32(x, 0),
04244         vgetq_lane_s32(x, 1),
04245         vgetq_lane_s32(x, 2),
04246         vgetq_lane_s32(x, 3)
04247     );*/
04248 
04249     return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
04250 }
04251 
04252 static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
04253 {
04254     return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
04255 }
04256 
04257 static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
04258 {
04259     return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
04260 }
04261 
04262 static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
04263 {
04264     int i;
04265     drflac_uint32 riceParamMask;
04266     drflac_int32* pDecodedSamples    = pSamplesOut;
04267     drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
04268     drflac_uint32 zeroCountParts[4];
04269     drflac_uint32 riceParamParts[4];
04270     int32x4_t coefficients128_0;
04271     int32x4_t coefficients128_4;
04272     int32x4_t coefficients128_8;
04273     int32x4_t samples128_0;
04274     int32x4_t samples128_4;
04275     int32x4_t samples128_8;
04276     uint32x4_t riceParamMask128;
04277     int32x4_t riceParam128;
04278     int32x2_t shift64;
04279     uint32x4_t one128;
04280 
04281     const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
04282 
04283     riceParamMask    = ~((~0UL) << riceParam);
04284     riceParamMask128 = vdupq_n_u32(riceParamMask);
04285 
04286     riceParam128 = vdupq_n_s32(riceParam);
04287     shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
04288     one128 = vdupq_n_u32(1);
04289 
04290     /*
04291     Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
04292     what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
04293     in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
04294     so I think there's opportunity for this to be simplified.
04295     */
04296     {
04297         int runningOrder = order;
04298         drflac_int32 tempC[4] = {0, 0, 0, 0};
04299         drflac_int32 tempS[4] = {0, 0, 0, 0};
04300 
04301         /* 0 - 3. */
04302         if (runningOrder >= 4) {
04303             coefficients128_0 = vld1q_s32(coefficients + 0);
04304             samples128_0      = vld1q_s32(pSamplesOut  - 4);
04305             runningOrder -= 4;
04306         } else {
04307             switch (runningOrder) {
04308                 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
04309                 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
04310                 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
04311             }
04312 
04313             coefficients128_0 = vld1q_s32(tempC);
04314             samples128_0      = vld1q_s32(tempS);
04315             runningOrder = 0;
04316         }
04317 
04318         /* 4 - 7 */
04319         if (runningOrder >= 4) {
04320             coefficients128_4 = vld1q_s32(coefficients + 4);
04321             samples128_4      = vld1q_s32(pSamplesOut  - 8);
04322             runningOrder -= 4;
04323         } else {
04324             switch (runningOrder) {
04325                 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
04326                 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
04327                 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
04328             }
04329 
04330             coefficients128_4 = vld1q_s32(tempC);
04331             samples128_4      = vld1q_s32(tempS);
04332             runningOrder = 0;
04333         }
04334 
04335         /* 8 - 11 */
04336         if (runningOrder == 4) {
04337             coefficients128_8 = vld1q_s32(coefficients + 8);
04338             samples128_8      = vld1q_s32(pSamplesOut  - 12);
04339             runningOrder -= 4;
04340         } else {
04341             switch (runningOrder) {
04342                 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
04343                 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
04344                 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
04345             }
04346 
04347             coefficients128_8 = vld1q_s32(tempC);
04348             samples128_8      = vld1q_s32(tempS);
04349             runningOrder = 0;
04350         }
04351 
04352         /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
04353         coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
04354         coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
04355         coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
04356     }
04357 
04358     /* For this version we are doing one sample at a time. */
04359     while (pDecodedSamples < pDecodedSamplesEnd) {
04360         int32x4_t prediction128;
04361         int32x2_t prediction64;
04362         uint32x4_t zeroCountPart128;
04363         uint32x4_t riceParamPart128;
04364 
04365         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
04366             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
04367             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
04368             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
04369             return DRFLAC_FALSE;
04370         }
04371 
04372         zeroCountPart128 = vld1q_u32(zeroCountParts);
04373         riceParamPart128 = vld1q_u32(riceParamParts);
04374 
04375         riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
04376         riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
04377         riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
04378 
04379         if (order <= 4) {
04380             for (i = 0; i < 4; i += 1) {
04381                 prediction128 = vmulq_s32(coefficients128_0, samples128_0);
04382 
04383                 /* Horizontal add and shift. */
04384                 prediction64 = drflac__vhaddq_s32(prediction128);
04385                 prediction64 = vshl_s32(prediction64, shift64);
04386                 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
04387 
04388                 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
04389                 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
04390             }
04391         } else if (order <= 8) {
04392             for (i = 0; i < 4; i += 1) {
04393                 prediction128 =                vmulq_s32(coefficients128_4, samples128_4);
04394                 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
04395 
04396                 /* Horizontal add and shift. */
04397                 prediction64 = drflac__vhaddq_s32(prediction128);
04398                 prediction64 = vshl_s32(prediction64, shift64);
04399                 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
04400 
04401                 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
04402                 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
04403                 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
04404             }
04405         } else {
04406             for (i = 0; i < 4; i += 1) {
04407                 prediction128 =                vmulq_s32(coefficients128_8, samples128_8);
04408                 prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
04409                 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
04410 
04411                 /* Horizontal add and shift. */
04412                 prediction64 = drflac__vhaddq_s32(prediction128);
04413                 prediction64 = vshl_s32(prediction64, shift64);
04414                 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
04415 
04416                 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
04417                 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
04418                 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
04419                 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
04420             }
04421         }
04422 
04423         /* We store samples in groups of 4. */
04424         vst1q_s32(pDecodedSamples, samples128_0);
04425         pDecodedSamples += 4;
04426     }
04427 
04428     /* Make sure we process the last few samples. */
04429     i = (count & ~3);
04430     while (i < (int)count) {
04431         /* Rice extraction. */
04432         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
04433             return DRFLAC_FALSE;
04434         }
04435 
04436         /* Rice reconstruction. */
04437         riceParamParts[0] &= riceParamMask;
04438         riceParamParts[0] |= (zeroCountParts[0] << riceParam);
04439         riceParamParts[0]  = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
04440 
04441         /* Sample reconstruction. */
04442         pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
04443 
04444         i += 1;
04445         pDecodedSamples += 1;
04446     }
04447 
04448     return DRFLAC_TRUE;
04449 }
04450 
04451 static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
04452 {
04453     int i;
04454     drflac_uint32 riceParamMask;
04455     drflac_int32* pDecodedSamples    = pSamplesOut;
04456     drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
04457     drflac_uint32 zeroCountParts[4];
04458     drflac_uint32 riceParamParts[4];
04459     int32x4_t coefficients128_0;
04460     int32x4_t coefficients128_4;
04461     int32x4_t coefficients128_8;
04462     int32x4_t samples128_0;
04463     int32x4_t samples128_4;
04464     int32x4_t samples128_8;
04465     uint32x4_t riceParamMask128;
04466     int32x4_t riceParam128;
04467     int64x1_t shift64;
04468     uint32x4_t one128;
04469 
04470     const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
04471 
04472     riceParamMask    = ~((~0UL) << riceParam);
04473     riceParamMask128 = vdupq_n_u32(riceParamMask);
04474 
04475     riceParam128 = vdupq_n_s32(riceParam);
04476     shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
04477     one128 = vdupq_n_u32(1);
04478 
04479     /*
04480     Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
04481     what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
04482     in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
04483     so I think there's opportunity for this to be simplified.
04484     */
04485     {
04486         int runningOrder = order;
04487         drflac_int32 tempC[4] = {0, 0, 0, 0};
04488         drflac_int32 tempS[4] = {0, 0, 0, 0};
04489 
04490         /* 0 - 3. */
04491         if (runningOrder >= 4) {
04492             coefficients128_0 = vld1q_s32(coefficients + 0);
04493             samples128_0      = vld1q_s32(pSamplesOut  - 4);
04494             runningOrder -= 4;
04495         } else {
04496             switch (runningOrder) {
04497                 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
04498                 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
04499                 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
04500             }
04501 
04502             coefficients128_0 = vld1q_s32(tempC);
04503             samples128_0      = vld1q_s32(tempS);
04504             runningOrder = 0;
04505         }
04506 
04507         /* 4 - 7 */
04508         if (runningOrder >= 4) {
04509             coefficients128_4 = vld1q_s32(coefficients + 4);
04510             samples128_4      = vld1q_s32(pSamplesOut  - 8);
04511             runningOrder -= 4;
04512         } else {
04513             switch (runningOrder) {
04514                 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
04515                 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
04516                 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
04517             }
04518 
04519             coefficients128_4 = vld1q_s32(tempC);
04520             samples128_4      = vld1q_s32(tempS);
04521             runningOrder = 0;
04522         }
04523 
04524         /* 8 - 11 */
04525         if (runningOrder == 4) {
04526             coefficients128_8 = vld1q_s32(coefficients + 8);
04527             samples128_8      = vld1q_s32(pSamplesOut  - 12);
04528             runningOrder -= 4;
04529         } else {
04530             switch (runningOrder) {
04531                 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
04532                 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
04533                 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
04534             }
04535 
04536             coefficients128_8 = vld1q_s32(tempC);
04537             samples128_8      = vld1q_s32(tempS);
04538             runningOrder = 0;
04539         }
04540 
04541         /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
04542         coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
04543         coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
04544         coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
04545     }
04546 
04547     /* For this version we are doing one sample at a time. */
04548     while (pDecodedSamples < pDecodedSamplesEnd) {
04549         int64x2_t prediction128;
04550         uint32x4_t zeroCountPart128;
04551         uint32x4_t riceParamPart128;
04552 
04553         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
04554             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
04555             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
04556             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
04557             return DRFLAC_FALSE;
04558         }
04559 
04560         zeroCountPart128 = vld1q_u32(zeroCountParts);
04561         riceParamPart128 = vld1q_u32(riceParamParts);
04562 
04563         riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
04564         riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
04565         riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
04566 
04567         for (i = 0; i < 4; i += 1) {
04568             int64x1_t prediction64;
04569 
04570             prediction128 = veorq_s64(prediction128, prediction128);    /* Reset to 0. */
04571             switch (order)
04572             {
04573             case 12:
04574             case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
04575             case 10:
04576             case  9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
04577             case  8:
04578             case  7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
04579             case  6:
04580             case  5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
04581             case  4:
04582             case  3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
04583             case  2:
04584             case  1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
04585             }
04586 
04587             /* Horizontal add and shift. */
04588             prediction64 = drflac__vhaddq_s64(prediction128);
04589             prediction64 = vshl_s64(prediction64, shift64);
04590             prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
04591 
04592             /* Our value should be sitting in prediction64[0]. We need to combine this with our SSE samples. */
04593             samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
04594             samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
04595             samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
04596 
04597             /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
04598             riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
04599         }
04600 
04601         /* We store samples in groups of 4. */
04602         vst1q_s32(pDecodedSamples, samples128_0);
04603         pDecodedSamples += 4;
04604     }
04605 
04606     /* Make sure we process the last few samples. */
04607     i = (count & ~3);
04608     while (i < (int)count) {
04609         /* Rice extraction. */
04610         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
04611             return DRFLAC_FALSE;
04612         }
04613 
04614         /* Rice reconstruction. */
04615         riceParamParts[0] &= riceParamMask;
04616         riceParamParts[0] |= (zeroCountParts[0] << riceParam);
04617         riceParamParts[0]  = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
04618 
04619         /* Sample reconstruction. */
04620         pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
04621 
04622         i += 1;
04623         pDecodedSamples += 1;
04624     }
04625 
04626     return DRFLAC_TRUE;
04627 }
04628 
04629 static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
04630 {
04631     DRFLAC_ASSERT(bs != NULL);
04632     DRFLAC_ASSERT(count > 0);
04633     DRFLAC_ASSERT(pSamplesOut != NULL);
04634 
04635     /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
04636     if (order > 0 && order <= 12) {
04637         if (bitsPerSample+shift > 32) {
04638             return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
04639         } else {
04640             return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
04641         }
04642     } else {
04643         return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
04644     }
04645 }
04646 #endif
04647 
04648 static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
04649 {
04650 #if defined(DRFLAC_SUPPORT_SSE41)
04651     if (drflac__gIsSSE41Supported) {
04652         return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
04653     } else
04654 #elif defined(DRFLAC_SUPPORT_NEON)
04655     if (drflac__gIsNEONSupported) {
04656         return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
04657     } else
04658 #endif
04659     {
04660         /* Scalar fallback. */
04661     #if 0
04662         return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
04663     #else
04664         return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
04665     #endif
04666     }
04667 }
04668 
04669 /* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
04670 static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
04671 {
04672     drflac_uint32 i;
04673 
04674     DRFLAC_ASSERT(bs != NULL);
04675     DRFLAC_ASSERT(count > 0);
04676 
04677     for (i = 0; i < count; ++i) {
04678         if (!drflac__seek_rice_parts(bs, riceParam)) {
04679             return DRFLAC_FALSE;
04680         }
04681     }
04682 
04683     return DRFLAC_TRUE;
04684 }
04685 
04686 static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
04687 {
04688     drflac_uint32 i;
04689 
04690     DRFLAC_ASSERT(bs != NULL);
04691     DRFLAC_ASSERT(count > 0);
04692     DRFLAC_ASSERT(unencodedBitsPerSample <= 31);    /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
04693     DRFLAC_ASSERT(pSamplesOut != NULL);
04694 
04695     for (i = 0; i < count; ++i) {
04696         if (unencodedBitsPerSample > 0) {
04697             if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
04698                 return DRFLAC_FALSE;
04699             }
04700         } else {
04701             pSamplesOut[i] = 0;
04702         }
04703 
04704         if (bitsPerSample >= 24) {
04705             pSamplesOut[i] += drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
04706         } else {
04707             pSamplesOut[i] += drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
04708         }
04709     }
04710 
04711     return DRFLAC_TRUE;
04712 }
04713 
04714 
04715 /*
04716 Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
04717 when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
04718 <blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
04719 */
04720 static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
04721 {
04722     drflac_uint8 residualMethod;
04723     drflac_uint8 partitionOrder;
04724     drflac_uint32 samplesInPartition;
04725     drflac_uint32 partitionsRemaining;
04726 
04727     DRFLAC_ASSERT(bs != NULL);
04728     DRFLAC_ASSERT(blockSize != 0);
04729     DRFLAC_ASSERT(pDecodedSamples != NULL);       /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
04730 
04731     if (!drflac__read_uint8(bs, 2, &residualMethod)) {
04732         return DRFLAC_FALSE;
04733     }
04734 
04735     if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
04736         return DRFLAC_FALSE;    /* Unknown or unsupported residual coding method. */
04737     }
04738 
04739     /* Ignore the first <order> values. */
04740     pDecodedSamples += order;
04741 
04742     if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
04743         return DRFLAC_FALSE;
04744     }
04745 
04746     /*
04747     From the FLAC spec:
04748       The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
04749     */
04750     if (partitionOrder > 8) {
04751         return DRFLAC_FALSE;
04752     }
04753 
04754     /* Validation check. */
04755     if ((blockSize / (1 << partitionOrder)) <= order) {
04756         return DRFLAC_FALSE;
04757     }
04758 
04759     samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
04760     partitionsRemaining = (1 << partitionOrder);
04761     for (;;) {
04762         drflac_uint8 riceParam = 0;
04763         if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
04764             if (!drflac__read_uint8(bs, 4, &riceParam)) {
04765                 return DRFLAC_FALSE;
04766             }
04767             if (riceParam == 15) {
04768                 riceParam = 0xFF;
04769             }
04770         } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
04771             if (!drflac__read_uint8(bs, 5, &riceParam)) {
04772                 return DRFLAC_FALSE;
04773             }
04774             if (riceParam == 31) {
04775                 riceParam = 0xFF;
04776             }
04777         }
04778 
04779         if (riceParam != 0xFF) {
04780             if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, order, shift, coefficients, pDecodedSamples)) {
04781                 return DRFLAC_FALSE;
04782             }
04783         } else {
04784             drflac_uint8 unencodedBitsPerSample = 0;
04785             if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
04786                 return DRFLAC_FALSE;
04787             }
04788 
04789             if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, order, shift, coefficients, pDecodedSamples)) {
04790                 return DRFLAC_FALSE;
04791             }
04792         }
04793 
04794         pDecodedSamples += samplesInPartition;
04795 
04796         if (partitionsRemaining == 1) {
04797             break;
04798         }
04799 
04800         partitionsRemaining -= 1;
04801 
04802         if (partitionOrder != 0) {
04803             samplesInPartition = blockSize / (1 << partitionOrder);
04804         }
04805     }
04806 
04807     return DRFLAC_TRUE;
04808 }
04809 
04810 /*
04811 Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
04812 when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be set to 0. The
04813 <blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
04814 */
04815 static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
04816 {
04817     drflac_uint8 residualMethod;
04818     drflac_uint8 partitionOrder;
04819     drflac_uint32 samplesInPartition;
04820     drflac_uint32 partitionsRemaining;
04821 
04822     DRFLAC_ASSERT(bs != NULL);
04823     DRFLAC_ASSERT(blockSize != 0);
04824 
04825     if (!drflac__read_uint8(bs, 2, &residualMethod)) {
04826         return DRFLAC_FALSE;
04827     }
04828 
04829     if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
04830         return DRFLAC_FALSE;    /* Unknown or unsupported residual coding method. */
04831     }
04832 
04833     if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
04834         return DRFLAC_FALSE;
04835     }
04836 
04837     /*
04838     From the FLAC spec:
04839       The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
04840     */
04841     if (partitionOrder > 8) {
04842         return DRFLAC_FALSE;
04843     }
04844 
04845     /* Validation check. */
04846     if ((blockSize / (1 << partitionOrder)) <= order) {
04847         return DRFLAC_FALSE;
04848     }
04849 
04850     samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
04851     partitionsRemaining = (1 << partitionOrder);
04852     for (;;)
04853     {
04854         drflac_uint8 riceParam = 0;
04855         if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
04856             if (!drflac__read_uint8(bs, 4, &riceParam)) {
04857                 return DRFLAC_FALSE;
04858             }
04859             if (riceParam == 15) {
04860                 riceParam = 0xFF;
04861             }
04862         } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
04863             if (!drflac__read_uint8(bs, 5, &riceParam)) {
04864                 return DRFLAC_FALSE;
04865             }
04866             if (riceParam == 31) {
04867                 riceParam = 0xFF;
04868             }
04869         }
04870 
04871         if (riceParam != 0xFF) {
04872             if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
04873                 return DRFLAC_FALSE;
04874             }
04875         } else {
04876             drflac_uint8 unencodedBitsPerSample = 0;
04877             if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
04878                 return DRFLAC_FALSE;
04879             }
04880 
04881             if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
04882                 return DRFLAC_FALSE;
04883             }
04884         }
04885 
04886 
04887         if (partitionsRemaining == 1) {
04888             break;
04889         }
04890 
04891         partitionsRemaining -= 1;
04892         samplesInPartition = blockSize / (1 << partitionOrder);
04893     }
04894 
04895     return DRFLAC_TRUE;
04896 }
04897 
04898 
04899 static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
04900 {
04901     drflac_uint32 i;
04902 
04903     /* Only a single sample needs to be decoded here. */
04904     drflac_int32 sample;
04905     if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
04906         return DRFLAC_FALSE;
04907     }
04908 
04909     /*
04910     We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
04911     we'll want to look at a more efficient way.
04912     */
04913     for (i = 0; i < blockSize; ++i) {
04914         pDecodedSamples[i] = sample;
04915     }
04916 
04917     return DRFLAC_TRUE;
04918 }
04919 
04920 static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
04921 {
04922     drflac_uint32 i;
04923 
04924     for (i = 0; i < blockSize; ++i) {
04925         drflac_int32 sample;
04926         if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
04927             return DRFLAC_FALSE;
04928         }
04929 
04930         pDecodedSamples[i] = sample;
04931     }
04932 
04933     return DRFLAC_TRUE;
04934 }
04935 
04936 static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
04937 {
04938     drflac_uint32 i;
04939 
04940     static drflac_int32 lpcCoefficientsTable[5][4] = {
04941         {0,  0, 0,  0},
04942         {1,  0, 0,  0},
04943         {2, -1, 0,  0},
04944         {3, -3, 1,  0},
04945         {4, -6, 4, -1}
04946     };
04947 
04948     /* Warm up samples and coefficients. */
04949     for (i = 0; i < lpcOrder; ++i) {
04950         drflac_int32 sample;
04951         if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
04952             return DRFLAC_FALSE;
04953         }
04954 
04955         pDecodedSamples[i] = sample;
04956     }
04957 
04958     if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
04959         return DRFLAC_FALSE;
04960     }
04961 
04962     return DRFLAC_TRUE;
04963 }
04964 
04965 static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
04966 {
04967     drflac_uint8 i;
04968     drflac_uint8 lpcPrecision;
04969     drflac_int8 lpcShift;
04970     drflac_int32 coefficients[32];
04971 
04972     /* Warm up samples. */
04973     for (i = 0; i < lpcOrder; ++i) {
04974         drflac_int32 sample;
04975         if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
04976             return DRFLAC_FALSE;
04977         }
04978 
04979         pDecodedSamples[i] = sample;
04980     }
04981 
04982     if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
04983         return DRFLAC_FALSE;
04984     }
04985     if (lpcPrecision == 15) {
04986         return DRFLAC_FALSE;    /* Invalid. */
04987     }
04988     lpcPrecision += 1;
04989 
04990     if (!drflac__read_int8(bs, 5, &lpcShift)) {
04991         return DRFLAC_FALSE;
04992     }
04993 
04994     DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
04995     for (i = 0; i < lpcOrder; ++i) {
04996         if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
04997             return DRFLAC_FALSE;
04998         }
04999     }
05000 
05001     if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, coefficients, pDecodedSamples)) {
05002         return DRFLAC_FALSE;
05003     }
05004 
05005     return DRFLAC_TRUE;
05006 }
05007 
05008 
05009 static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
05010 {
05011     const drflac_uint32 sampleRateTable[12]  = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
05012     const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1};   /* -1 = reserved. */
05013 
05014     DRFLAC_ASSERT(bs != NULL);
05015     DRFLAC_ASSERT(header != NULL);
05016 
05017     /* Keep looping until we find a valid sync code. */
05018     for (;;) {
05019         drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
05020         drflac_uint8 reserved = 0;
05021         drflac_uint8 blockingStrategy = 0;
05022         drflac_uint8 blockSize = 0;
05023         drflac_uint8 sampleRate = 0;
05024         drflac_uint8 channelAssignment = 0;
05025         drflac_uint8 bitsPerSample = 0;
05026         drflac_bool32 isVariableBlockSize;
05027 
05028         if (!drflac__find_and_seek_to_next_sync_code(bs)) {
05029             return DRFLAC_FALSE;
05030         }
05031 
05032         if (!drflac__read_uint8(bs, 1, &reserved)) {
05033             return DRFLAC_FALSE;
05034         }
05035         if (reserved == 1) {
05036             continue;
05037         }
05038         crc8 = drflac_crc8(crc8, reserved, 1);
05039 
05040         if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
05041             return DRFLAC_FALSE;
05042         }
05043         crc8 = drflac_crc8(crc8, blockingStrategy, 1);
05044 
05045         if (!drflac__read_uint8(bs, 4, &blockSize)) {
05046             return DRFLAC_FALSE;
05047         }
05048         if (blockSize == 0) {
05049             continue;
05050         }
05051         crc8 = drflac_crc8(crc8, blockSize, 4);
05052 
05053         if (!drflac__read_uint8(bs, 4, &sampleRate)) {
05054             return DRFLAC_FALSE;
05055         }
05056         crc8 = drflac_crc8(crc8, sampleRate, 4);
05057 
05058         if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
05059             return DRFLAC_FALSE;
05060         }
05061         if (channelAssignment > 10) {
05062             continue;
05063         }
05064         crc8 = drflac_crc8(crc8, channelAssignment, 4);
05065 
05066         if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
05067             return DRFLAC_FALSE;
05068         }
05069         if (bitsPerSample == 3 || bitsPerSample == 7) {
05070             continue;
05071         }
05072         crc8 = drflac_crc8(crc8, bitsPerSample, 3);
05073 
05074 
05075         if (!drflac__read_uint8(bs, 1, &reserved)) {
05076             return DRFLAC_FALSE;
05077         }
05078         if (reserved == 1) {
05079             continue;
05080         }
05081         crc8 = drflac_crc8(crc8, reserved, 1);
05082 
05083 
05084         isVariableBlockSize = blockingStrategy == 1;
05085         if (isVariableBlockSize) {
05086             drflac_uint64 pcmFrameNumber;
05087             drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
05088             if (result != DRFLAC_SUCCESS) {
05089                 if (result == DRFLAC_AT_END) {
05090                     return DRFLAC_FALSE;
05091                 } else {
05092                     continue;
05093                 }
05094             }
05095             header->flacFrameNumber  = 0;
05096             header->pcmFrameNumber = pcmFrameNumber;
05097         } else {
05098             drflac_uint64 flacFrameNumber = 0;
05099             drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
05100             if (result != DRFLAC_SUCCESS) {
05101                 if (result == DRFLAC_AT_END) {
05102                     return DRFLAC_FALSE;
05103                 } else {
05104                     continue;
05105                 }
05106             }
05107             header->flacFrameNumber  = (drflac_uint32)flacFrameNumber;   /* <-- Safe cast. */
05108             header->pcmFrameNumber = 0;
05109         }
05110 
05111 
05112         DRFLAC_ASSERT(blockSize > 0);
05113         if (blockSize == 1) {
05114             header->blockSizeInPCMFrames = 192;
05115         } else if (blockSize >= 2 && blockSize <= 5) {
05116             header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
05117         } else if (blockSize == 6) {
05118             if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
05119                 return DRFLAC_FALSE;
05120             }
05121             crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
05122             header->blockSizeInPCMFrames += 1;
05123         } else if (blockSize == 7) {
05124             if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
05125                 return DRFLAC_FALSE;
05126             }
05127             crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
05128             header->blockSizeInPCMFrames += 1;
05129         } else {
05130             DRFLAC_ASSERT(blockSize >= 8);
05131             header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
05132         }
05133 
05134 
05135         if (sampleRate <= 11) {
05136             header->sampleRate = sampleRateTable[sampleRate];
05137         } else if (sampleRate == 12) {
05138             if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
05139                 return DRFLAC_FALSE;
05140             }
05141             crc8 = drflac_crc8(crc8, header->sampleRate, 8);
05142             header->sampleRate *= 1000;
05143         } else if (sampleRate == 13) {
05144             if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
05145                 return DRFLAC_FALSE;
05146             }
05147             crc8 = drflac_crc8(crc8, header->sampleRate, 16);
05148         } else if (sampleRate == 14) {
05149             if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
05150                 return DRFLAC_FALSE;
05151             }
05152             crc8 = drflac_crc8(crc8, header->sampleRate, 16);
05153             header->sampleRate *= 10;
05154         } else {
05155             continue;  /* Invalid. Assume an invalid block. */
05156         }
05157 
05158 
05159         header->channelAssignment = channelAssignment;
05160 
05161         header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
05162         if (header->bitsPerSample == 0) {
05163             header->bitsPerSample = streaminfoBitsPerSample;
05164         }
05165 
05166         if (!drflac__read_uint8(bs, 8, &header->crc8)) {
05167             return DRFLAC_FALSE;
05168         }
05169 
05170 #ifndef DR_FLAC_NO_CRC
05171         if (header->crc8 != crc8) {
05172             continue;    /* CRC mismatch. Loop back to the top and find the next sync code. */
05173         }
05174 #endif
05175         return DRFLAC_TRUE;
05176     }
05177 }
05178 
05179 static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
05180 {
05181     drflac_uint8 header;
05182     int type;
05183 
05184     if (!drflac__read_uint8(bs, 8, &header)) {
05185         return DRFLAC_FALSE;
05186     }
05187 
05188     /* First bit should always be 0. */
05189     if ((header & 0x80) != 0) {
05190         return DRFLAC_FALSE;
05191     }
05192 
05193     type = (header & 0x7E) >> 1;
05194     if (type == 0) {
05195         pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
05196     } else if (type == 1) {
05197         pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
05198     } else {
05199         if ((type & 0x20) != 0) {
05200             pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
05201             pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
05202         } else if ((type & 0x08) != 0) {
05203             pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
05204             pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
05205             if (pSubframe->lpcOrder > 4) {
05206                 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
05207                 pSubframe->lpcOrder = 0;
05208             }
05209         } else {
05210             pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
05211         }
05212     }
05213 
05214     if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
05215         return DRFLAC_FALSE;
05216     }
05217 
05218     /* Wasted bits per sample. */
05219     pSubframe->wastedBitsPerSample = 0;
05220     if ((header & 0x01) == 1) {
05221         unsigned int wastedBitsPerSample;
05222         if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
05223             return DRFLAC_FALSE;
05224         }
05225         pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
05226     }
05227 
05228     return DRFLAC_TRUE;
05229 }
05230 
05231 static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
05232 {
05233     drflac_subframe* pSubframe;
05234     drflac_uint32 subframeBitsPerSample;
05235 
05236     DRFLAC_ASSERT(bs != NULL);
05237     DRFLAC_ASSERT(frame != NULL);
05238 
05239     pSubframe = frame->subframes + subframeIndex;
05240     if (!drflac__read_subframe_header(bs, pSubframe)) {
05241         return DRFLAC_FALSE;
05242     }
05243 
05244     /* Side channels require an extra bit per sample. Took a while to figure that one out... */
05245     subframeBitsPerSample = frame->header.bitsPerSample;
05246     if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
05247         subframeBitsPerSample += 1;
05248     } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
05249         subframeBitsPerSample += 1;
05250     }
05251 
05252     /* Need to handle wasted bits per sample. */
05253     if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
05254         return DRFLAC_FALSE;
05255     }
05256     subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
05257 
05258     pSubframe->pSamplesS32 = pDecodedSamplesOut;
05259 
05260     switch (pSubframe->subframeType)
05261     {
05262         case DRFLAC_SUBFRAME_CONSTANT:
05263         {
05264             drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
05265         } break;
05266 
05267         case DRFLAC_SUBFRAME_VERBATIM:
05268         {
05269             drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
05270         } break;
05271 
05272         case DRFLAC_SUBFRAME_FIXED:
05273         {
05274             drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
05275         } break;
05276 
05277         case DRFLAC_SUBFRAME_LPC:
05278         {
05279             drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
05280         } break;
05281 
05282         default: return DRFLAC_FALSE;
05283     }
05284 
05285     return DRFLAC_TRUE;
05286 }
05287 
05288 static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
05289 {
05290     drflac_subframe* pSubframe;
05291     drflac_uint32 subframeBitsPerSample;
05292 
05293     DRFLAC_ASSERT(bs != NULL);
05294     DRFLAC_ASSERT(frame != NULL);
05295 
05296     pSubframe = frame->subframes + subframeIndex;
05297     if (!drflac__read_subframe_header(bs, pSubframe)) {
05298         return DRFLAC_FALSE;
05299     }
05300 
05301     /* Side channels require an extra bit per sample. Took a while to figure that one out... */
05302     subframeBitsPerSample = frame->header.bitsPerSample;
05303     if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
05304         subframeBitsPerSample += 1;
05305     } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
05306         subframeBitsPerSample += 1;
05307     }
05308 
05309     /* Need to handle wasted bits per sample. */
05310     if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
05311         return DRFLAC_FALSE;
05312     }
05313     subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
05314 
05315     pSubframe->pSamplesS32 = NULL;
05316 
05317     switch (pSubframe->subframeType)
05318     {
05319         case DRFLAC_SUBFRAME_CONSTANT:
05320         {
05321             if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
05322                 return DRFLAC_FALSE;
05323             }
05324         } break;
05325 
05326         case DRFLAC_SUBFRAME_VERBATIM:
05327         {
05328             unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
05329             if (!drflac__seek_bits(bs, bitsToSeek)) {
05330                 return DRFLAC_FALSE;
05331             }
05332         } break;
05333 
05334         case DRFLAC_SUBFRAME_FIXED:
05335         {
05336             unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
05337             if (!drflac__seek_bits(bs, bitsToSeek)) {
05338                 return DRFLAC_FALSE;
05339             }
05340 
05341             if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
05342                 return DRFLAC_FALSE;
05343             }
05344         } break;
05345 
05346         case DRFLAC_SUBFRAME_LPC:
05347         {
05348             drflac_uint8 lpcPrecision;
05349 
05350             unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
05351             if (!drflac__seek_bits(bs, bitsToSeek)) {
05352                 return DRFLAC_FALSE;
05353             }
05354 
05355             if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
05356                 return DRFLAC_FALSE;
05357             }
05358             if (lpcPrecision == 15) {
05359                 return DRFLAC_FALSE;    /* Invalid. */
05360             }
05361             lpcPrecision += 1;
05362 
05363 
05364             bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5;    /* +5 for shift. */
05365             if (!drflac__seek_bits(bs, bitsToSeek)) {
05366                 return DRFLAC_FALSE;
05367             }
05368 
05369             if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
05370                 return DRFLAC_FALSE;
05371             }
05372         } break;
05373 
05374         default: return DRFLAC_FALSE;
05375     }
05376 
05377     return DRFLAC_TRUE;
05378 }
05379 
05380 
05381 static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
05382 {
05383     drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
05384 
05385     DRFLAC_ASSERT(channelAssignment <= 10);
05386     return lookup[channelAssignment];
05387 }
05388 
05389 static drflac_result drflac__decode_flac_frame(drflac* pFlac)
05390 {
05391     int channelCount;
05392     int i;
05393     drflac_uint8 paddingSizeInBits;
05394     drflac_uint16 desiredCRC16;
05395 #ifndef DR_FLAC_NO_CRC
05396     drflac_uint16 actualCRC16;
05397 #endif
05398 
05399     /* This function should be called while the stream is sitting on the first byte after the frame header. */
05400     DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
05401 
05402     /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
05403     if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
05404         return DRFLAC_ERROR;
05405     }
05406 
05407     /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
05408     channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
05409     if (channelCount != (int)pFlac->channels) {
05410         return DRFLAC_ERROR;
05411     }
05412 
05413     for (i = 0; i < channelCount; ++i) {
05414         if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
05415             return DRFLAC_ERROR;
05416         }
05417     }
05418 
05419     paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
05420     if (paddingSizeInBits > 0) {
05421         drflac_uint8 padding = 0;
05422         if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
05423             return DRFLAC_AT_END;
05424         }
05425     }
05426 
05427 #ifndef DR_FLAC_NO_CRC
05428     actualCRC16 = drflac__flush_crc16(&pFlac->bs);
05429 #endif
05430     if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
05431         return DRFLAC_AT_END;
05432     }
05433 
05434 #ifndef DR_FLAC_NO_CRC
05435     if (actualCRC16 != desiredCRC16) {
05436         return DRFLAC_CRC_MISMATCH;    /* CRC mismatch. */
05437     }
05438 #endif
05439 
05440     pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
05441 
05442     return DRFLAC_SUCCESS;
05443 }
05444 
05445 static drflac_result drflac__seek_flac_frame(drflac* pFlac)
05446 {
05447     int channelCount;
05448     int i;
05449     drflac_uint16 desiredCRC16;
05450 #ifndef DR_FLAC_NO_CRC
05451     drflac_uint16 actualCRC16;
05452 #endif
05453 
05454     channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
05455     for (i = 0; i < channelCount; ++i) {
05456         if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
05457             return DRFLAC_ERROR;
05458         }
05459     }
05460 
05461     /* Padding. */
05462     if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
05463         return DRFLAC_ERROR;
05464     }
05465 
05466     /* CRC. */
05467 #ifndef DR_FLAC_NO_CRC
05468     actualCRC16 = drflac__flush_crc16(&pFlac->bs);
05469 #endif
05470     if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
05471         return DRFLAC_AT_END;
05472     }
05473 
05474 #ifndef DR_FLAC_NO_CRC
05475     if (actualCRC16 != desiredCRC16) {
05476         return DRFLAC_CRC_MISMATCH;    /* CRC mismatch. */
05477     }
05478 #endif
05479 
05480     return DRFLAC_SUCCESS;
05481 }
05482 
05483 static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
05484 {
05485     DRFLAC_ASSERT(pFlac != NULL);
05486 
05487     for (;;) {
05488         drflac_result result;
05489 
05490         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
05491             return DRFLAC_FALSE;
05492         }
05493 
05494         result = drflac__decode_flac_frame(pFlac);
05495         if (result != DRFLAC_SUCCESS) {
05496             if (result == DRFLAC_CRC_MISMATCH) {
05497                 continue;   /* CRC mismatch. Skip to the next frame. */
05498             } else {
05499                 return DRFLAC_FALSE;
05500             }
05501         }
05502 
05503         return DRFLAC_TRUE;
05504     }
05505 }
05506 
05507 static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
05508 {
05509     drflac_uint64 firstPCMFrame;
05510     drflac_uint64 lastPCMFrame;
05511 
05512     DRFLAC_ASSERT(pFlac != NULL);
05513 
05514     firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
05515     if (firstPCMFrame == 0) {
05516         firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
05517     }
05518 
05519     lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
05520     if (lastPCMFrame > 0) {
05521         lastPCMFrame -= 1; /* Needs to be zero based. */
05522     }
05523 
05524     if (pFirstPCMFrame) {
05525         *pFirstPCMFrame = firstPCMFrame;
05526     }
05527     if (pLastPCMFrame) {
05528         *pLastPCMFrame = lastPCMFrame;
05529     }
05530 }
05531 
05532 static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
05533 {
05534     drflac_bool32 result;
05535 
05536     DRFLAC_ASSERT(pFlac != NULL);
05537 
05538     result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
05539 
05540     DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
05541     pFlac->currentPCMFrame = 0;
05542 
05543     return result;
05544 }
05545 
05546 static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
05547 {
05548     /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
05549     DRFLAC_ASSERT(pFlac != NULL);
05550     return drflac__seek_flac_frame(pFlac);
05551 }
05552 
05553 
05554 static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
05555 {
05556     drflac_uint64 pcmFramesRead = 0;
05557     while (pcmFramesToSeek > 0) {
05558         if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
05559             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
05560                 break;  /* Couldn't read the next frame, so just break from the loop and return. */
05561             }
05562         } else {
05563             if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
05564                 pcmFramesRead   += pcmFramesToSeek;
05565                 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek;   /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
05566                 pcmFramesToSeek  = 0;
05567             } else {
05568                 pcmFramesRead   += pFlac->currentFLACFrame.pcmFramesRemaining;
05569                 pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
05570                 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
05571             }
05572         }
05573     }
05574 
05575     pFlac->currentPCMFrame += pcmFramesRead;
05576     return pcmFramesRead;
05577 }
05578 
05579 
05580 static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
05581 {
05582     drflac_bool32 isMidFrame = DRFLAC_FALSE;
05583     drflac_uint64 runningPCMFrameCount;
05584 
05585     DRFLAC_ASSERT(pFlac != NULL);
05586 
05587     /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
05588     if (pcmFrameIndex >= pFlac->currentPCMFrame) {
05589         /* Seeking forward. Need to seek from the current position. */
05590         runningPCMFrameCount = pFlac->currentPCMFrame;
05591 
05592         /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
05593         if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
05594             if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
05595                 return DRFLAC_FALSE;
05596             }
05597         } else {
05598             isMidFrame = DRFLAC_TRUE;
05599         }
05600     } else {
05601         /* Seeking backwards. Need to seek from the start of the file. */
05602         runningPCMFrameCount = 0;
05603 
05604         /* Move back to the start. */
05605         if (!drflac__seek_to_first_frame(pFlac)) {
05606             return DRFLAC_FALSE;
05607         }
05608 
05609         /* Decode the first frame in preparation for sample-exact seeking below. */
05610         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
05611             return DRFLAC_FALSE;
05612         }
05613     }
05614 
05615     /*
05616     We need to as quickly as possible find the frame that contains the target sample. To do this, we iterate over each frame and inspect its
05617     header. If based on the header we can determine that the frame contains the sample, we do a full decode of that frame.
05618     */
05619     for (;;) {
05620         drflac_uint64 pcmFrameCountInThisFLACFrame;
05621         drflac_uint64 firstPCMFrameInFLACFrame = 0;
05622         drflac_uint64 lastPCMFrameInFLACFrame = 0;
05623 
05624         drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
05625 
05626         pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
05627         if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
05628             /*
05629             The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
05630             it never existed and keep iterating.
05631             */
05632             drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
05633 
05634             if (!isMidFrame) {
05635                 drflac_result result = drflac__decode_flac_frame(pFlac);
05636                 if (result == DRFLAC_SUCCESS) {
05637                     /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
05638                     return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
05639                 } else {
05640                     if (result == DRFLAC_CRC_MISMATCH) {
05641                         goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
05642                     } else {
05643                         return DRFLAC_FALSE;
05644                     }
05645                 }
05646             } else {
05647                 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
05648                 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
05649             }
05650         } else {
05651             /*
05652             It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
05653             frame never existed and leave the running sample count untouched.
05654             */
05655             if (!isMidFrame) {
05656                 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
05657                 if (result == DRFLAC_SUCCESS) {
05658                     runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
05659                 } else {
05660                     if (result == DRFLAC_CRC_MISMATCH) {
05661                         goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
05662                     } else {
05663                         return DRFLAC_FALSE;
05664                     }
05665                 }
05666             } else {
05667                 /*
05668                 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
05669                 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
05670                 */
05671                 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
05672                 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
05673                 isMidFrame = DRFLAC_FALSE;
05674             }
05675 
05676             /* If we are seeking to the end of the file and we've just hit it, we're done. */
05677             if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
05678                 return DRFLAC_TRUE;
05679             }
05680         }
05681 
05682     next_iteration:
05683         /* Grab the next frame in preparation for the next iteration. */
05684         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
05685             return DRFLAC_FALSE;
05686         }
05687     }
05688 }
05689 
05690 
05691 #if !defined(DR_FLAC_NO_CRC)
05692 /*
05693 We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
05694 uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
05695 location.
05696 */
05697 #define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
05698 
05699 static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
05700 {
05701     DRFLAC_ASSERT(pFlac != NULL);
05702     DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
05703     DRFLAC_ASSERT(targetByte >= rangeLo);
05704     DRFLAC_ASSERT(targetByte <= rangeHi);
05705 
05706     *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
05707 
05708     for (;;) {
05709         /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
05710         if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
05711             /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
05712             if (targetByte == 0) {
05713                 drflac__seek_to_first_frame(pFlac); /* Try to recover. */
05714                 return DRFLAC_FALSE;
05715             }
05716 
05717             /* Halve the byte location and continue. */
05718             targetByte = rangeLo + ((rangeHi - rangeLo)/2);
05719             rangeHi = targetByte;
05720         } else {
05721             /* Getting here should mean that we have seeked to an appropriate byte. */
05722 
05723             /* Clear the details of the FLAC frame so we don't misreport data. */
05724             DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
05725 
05726             /*
05727             Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
05728             CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
05729             so it needs to stay this way for now.
05730             */
05731 #if 1
05732             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
05733                 /* Halve the byte location and continue. */
05734                 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
05735                 rangeHi = targetByte;
05736             } else {
05737                 break;
05738             }
05739 #else
05740             if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
05741                 /* Halve the byte location and continue. */
05742                 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
05743                 rangeHi = targetByte;
05744             } else {
05745                 break;
05746             }
05747 #endif
05748         }
05749     }
05750 
05751     /* The current PCM frame needs to be updated based on the frame we just seeked to. */
05752     drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
05753 
05754     DRFLAC_ASSERT(targetByte <= rangeHi);
05755 
05756     *pLastSuccessfulSeekOffset = targetByte;
05757     return DRFLAC_TRUE;
05758 }
05759 
05760 static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
05761 {
05762     /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
05763 #if 0
05764     if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
05765         /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
05766         if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
05767             return DRFLAC_FALSE;
05768         }
05769     }
05770 #endif
05771 
05772     return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
05773 }
05774 
05775 
05776 static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
05777 {
05778     /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
05779 
05780     drflac_uint64 targetByte;
05781     drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
05782     drflac_uint64 pcmRangeHi = 0;
05783     drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
05784     drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
05785     drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
05786 
05787     targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
05788     if (targetByte > byteRangeHi) {
05789         targetByte = byteRangeHi;
05790     }
05791 
05792     for (;;) {
05793         if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
05794             /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
05795             drflac_uint64 newPCMRangeLo;
05796             drflac_uint64 newPCMRangeHi;
05797             drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
05798 
05799             /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
05800             if (pcmRangeLo == newPCMRangeLo) {
05801                 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
05802                     break;  /* Failed to seek to closest frame. */
05803                 }
05804 
05805                 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
05806                     return DRFLAC_TRUE;
05807                 } else {
05808                     break;  /* Failed to seek forward. */
05809                 }
05810             }
05811 
05812             pcmRangeLo = newPCMRangeLo;
05813             pcmRangeHi = newPCMRangeHi;
05814 
05815             if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
05816                 /* The target PCM frame is in this FLAC frame. */
05817                 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
05818                     return DRFLAC_TRUE;
05819                 } else {
05820                     break;  /* Failed to seek to FLAC frame. */
05821                 }
05822             } else {
05823                 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
05824 
05825                 if (pcmRangeLo > pcmFrameIndex) {
05826                     /* We seeked too far forward. We need to move our target byte backward and try again. */
05827                     byteRangeHi = lastSuccessfulSeekOffset;
05828                     if (byteRangeLo > byteRangeHi) {
05829                         byteRangeLo = byteRangeHi;
05830                     }
05831 
05832                     targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
05833                     if (targetByte < byteRangeLo) {
05834                         targetByte = byteRangeLo;
05835                     }
05836                 } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
05837                     /* We didn't seek far enough. We need to move our target byte forward and try again. */
05838 
05839                     /* If we're close enough we can just seek forward. */
05840                     if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
05841                         if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
05842                             return DRFLAC_TRUE;
05843                         } else {
05844                             break;  /* Failed to seek to FLAC frame. */
05845                         }
05846                     } else {
05847                         byteRangeLo = lastSuccessfulSeekOffset;
05848                         if (byteRangeHi < byteRangeLo) {
05849                             byteRangeHi = byteRangeLo;
05850                         }
05851 
05852                         targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
05853                         if (targetByte > byteRangeHi) {
05854                             targetByte = byteRangeHi;
05855                         }
05856 
05857                         if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
05858                             closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
05859                         }
05860                     }
05861                 }
05862             }
05863         } else {
05864             /* Getting here is really bad. We just recover as best we can, but moving to the first frame in the stream, and then abort. */
05865             break;
05866         }
05867     }
05868 
05869     drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
05870     return DRFLAC_FALSE;
05871 }
05872 
05873 static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
05874 {
05875     drflac_uint64 byteRangeLo;
05876     drflac_uint64 byteRangeHi;
05877     drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
05878 
05879     /* Our algorithm currently assumes the FLAC stream is currently sitting at the start. */
05880     if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
05881         return DRFLAC_FALSE;
05882     }
05883 
05884     /* If we're close enough to the start, just move to the start and seek forward. */
05885     if (pcmFrameIndex < seekForwardThreshold) {
05886         return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
05887     }
05888 
05889     /*
05890     Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
05891     the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
05892     */
05893     byteRangeLo = pFlac->firstFLACFramePosInBytes;
05894     byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
05895 
05896     return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
05897 }
05898 #endif  /* !DR_FLAC_NO_CRC */
05899 
05900 static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
05901 {
05902     drflac_uint32 iClosestSeekpoint = 0;
05903     drflac_bool32 isMidFrame = DRFLAC_FALSE;
05904     drflac_uint64 runningPCMFrameCount;
05905     drflac_uint32 iSeekpoint;
05906 
05907 
05908     DRFLAC_ASSERT(pFlac != NULL);
05909 
05910     if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
05911         return DRFLAC_FALSE;
05912     }
05913 
05914     for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
05915         if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
05916             break;
05917         }
05918 
05919         iClosestSeekpoint = iSeekpoint;
05920     }
05921 
05922     /* There's been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
05923     if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
05924         return DRFLAC_FALSE;
05925     }
05926     if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
05927         return DRFLAC_FALSE;
05928     }
05929 
05930 #if !defined(DR_FLAC_NO_CRC)
05931     /* At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this. */
05932     if (pFlac->totalPCMFrameCount > 0) {
05933         drflac_uint64 byteRangeLo;
05934         drflac_uint64 byteRangeHi;
05935 
05936         byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
05937         byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
05938 
05939         /*
05940         If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
05941         value for byteRangeHi which will clamp it appropriately.
05942 
05943         Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
05944         have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
05945         */
05946         if (iClosestSeekpoint < pFlac->seekpointCount-1) {
05947             drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
05948 
05949             /* Basic validation on the seekpoints to ensure they're usable. */
05950             if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
05951                 return DRFLAC_FALSE;    /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
05952             }
05953 
05954             if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
05955                 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
05956             }
05957         }
05958 
05959         if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
05960             if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
05961                 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
05962 
05963                 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
05964                     return DRFLAC_TRUE;
05965                 }
05966             }
05967         }
05968     }
05969 #endif  /* !DR_FLAC_NO_CRC */
05970 
05971     /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
05972 
05973     /*
05974     If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
05975     from the seekpoint's first sample.
05976     */
05977     if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
05978         /* Optimized case. Just seek forward from where we are. */
05979         runningPCMFrameCount = pFlac->currentPCMFrame;
05980 
05981         /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
05982         if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
05983             if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
05984                 return DRFLAC_FALSE;
05985             }
05986         } else {
05987             isMidFrame = DRFLAC_TRUE;
05988         }
05989     } else {
05990         /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
05991         runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
05992 
05993         if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
05994             return DRFLAC_FALSE;
05995         }
05996 
05997         /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
05998         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
05999             return DRFLAC_FALSE;
06000         }
06001     }
06002 
06003     for (;;) {
06004         drflac_uint64 pcmFrameCountInThisFLACFrame;
06005         drflac_uint64 firstPCMFrameInFLACFrame = 0;
06006         drflac_uint64 lastPCMFrameInFLACFrame = 0;
06007 
06008         drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
06009 
06010         pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
06011         if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
06012             /*
06013             The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
06014             it never existed and keep iterating.
06015             */
06016             drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
06017 
06018             if (!isMidFrame) {
06019                 drflac_result result = drflac__decode_flac_frame(pFlac);
06020                 if (result == DRFLAC_SUCCESS) {
06021                     /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
06022                     return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
06023                 } else {
06024                     if (result == DRFLAC_CRC_MISMATCH) {
06025                         goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
06026                     } else {
06027                         return DRFLAC_FALSE;
06028                     }
06029                 }
06030             } else {
06031                 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
06032                 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
06033             }
06034         } else {
06035             /*
06036             It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
06037             frame never existed and leave the running sample count untouched.
06038             */
06039             if (!isMidFrame) {
06040                 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
06041                 if (result == DRFLAC_SUCCESS) {
06042                     runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
06043                 } else {
06044                     if (result == DRFLAC_CRC_MISMATCH) {
06045                         goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
06046                     } else {
06047                         return DRFLAC_FALSE;
06048                     }
06049                 }
06050             } else {
06051                 /*
06052                 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
06053                 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
06054                 */
06055                 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
06056                 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
06057                 isMidFrame = DRFLAC_FALSE;
06058             }
06059 
06060             /* If we are seeking to the end of the file and we've just hit it, we're done. */
06061             if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
06062                 return DRFLAC_TRUE;
06063             }
06064         }
06065 
06066     next_iteration:
06067         /* Grab the next frame in preparation for the next iteration. */
06068         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
06069             return DRFLAC_FALSE;
06070         }
06071     }
06072 }
06073 
06074 
06075 #ifndef DR_FLAC_NO_OGG
06076 typedef struct
06077 {
06078     drflac_uint8 capturePattern[4];  /* Should be "OggS" */
06079     drflac_uint8 structureVersion;   /* Always 0. */
06080     drflac_uint8 headerType;
06081     drflac_uint64 granulePosition;
06082     drflac_uint32 serialNumber;
06083     drflac_uint32 sequenceNumber;
06084     drflac_uint32 checksum;
06085     drflac_uint8 segmentCount;
06086     drflac_uint8 segmentTable[255];
06087 } drflac_ogg_page_header;
06088 #endif
06089 
06090 typedef struct
06091 {
06092     drflac_read_proc onRead;
06093     drflac_seek_proc onSeek;
06094     drflac_meta_proc onMeta;
06095     drflac_container container;
06096     void* pUserData;
06097     void* pUserDataMD;
06098     drflac_uint32 sampleRate;
06099     drflac_uint8  channels;
06100     drflac_uint8  bitsPerSample;
06101     drflac_uint64 totalPCMFrameCount;
06102     drflac_uint16 maxBlockSizeInPCMFrames;
06103     drflac_uint64 runningFilePos;
06104     drflac_bool32 hasStreamInfoBlock;
06105     drflac_bool32 hasMetadataBlocks;
06106     drflac_bs bs;                           /* <-- A bit streamer is required for loading data during initialization. */
06107     drflac_frame_header firstFrameHeader;   /* <-- The header of the first frame that was read during relaxed initalization. Only set if there is no STREAMINFO block. */
06108 
06109 #ifndef DR_FLAC_NO_OGG
06110     drflac_uint32 oggSerial;
06111     drflac_uint64 oggFirstBytePos;
06112     drflac_ogg_page_header oggBosHeader;
06113 #endif
06114 } drflac_init_info;
06115 
06116 static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
06117 {
06118     blockHeader = drflac__be2host_32(blockHeader);
06119     *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
06120     *blockType   = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
06121     *blockSize   =                (blockHeader & 0x00FFFFFFUL);
06122 }
06123 
06124 static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
06125 {
06126     drflac_uint32 blockHeader;
06127 
06128     *blockSize = 0;
06129     if (onRead(pUserData, &blockHeader, 4) != 4) {
06130         return DRFLAC_FALSE;
06131     }
06132 
06133     drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
06134     return DRFLAC_TRUE;
06135 }
06136 
06137 static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
06138 {
06139     drflac_uint32 blockSizes;
06140     drflac_uint64 frameSizes = 0;
06141     drflac_uint64 importantProps;
06142     drflac_uint8 md5[16];
06143 
06144     /* min/max block size. */
06145     if (onRead(pUserData, &blockSizes, 4) != 4) {
06146         return DRFLAC_FALSE;
06147     }
06148 
06149     /* min/max frame size. */
06150     if (onRead(pUserData, &frameSizes, 6) != 6) {
06151         return DRFLAC_FALSE;
06152     }
06153 
06154     /* Sample rate, channels, bits per sample and total sample count. */
06155     if (onRead(pUserData, &importantProps, 8) != 8) {
06156         return DRFLAC_FALSE;
06157     }
06158 
06159     /* MD5 */
06160     if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
06161         return DRFLAC_FALSE;
06162     }
06163 
06164     blockSizes     = drflac__be2host_32(blockSizes);
06165     frameSizes     = drflac__be2host_64(frameSizes);
06166     importantProps = drflac__be2host_64(importantProps);
06167 
06168     pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
06169     pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
06170     pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes     &  (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
06171     pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes     &  (((drflac_uint64)0x00FFFFFF << 16) <<  0)) >> 16);
06172     pStreamInfo->sampleRate              = (drflac_uint32)((importantProps &  (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
06173     pStreamInfo->channels                = (drflac_uint8 )((importantProps &  (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
06174     pStreamInfo->bitsPerSample           = (drflac_uint8 )((importantProps &  (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
06175     pStreamInfo->totalPCMFrameCount      =                ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
06176     DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
06177 
06178     return DRFLAC_TRUE;
06179 }
06180 
06181 
06182 static void* drflac__malloc_default(size_t sz, void* pUserData)
06183 {
06184     (void)pUserData;
06185     return DRFLAC_MALLOC(sz);
06186 }
06187 
06188 static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
06189 {
06190     (void)pUserData;
06191     return DRFLAC_REALLOC(p, sz);
06192 }
06193 
06194 static void drflac__free_default(void* p, void* pUserData)
06195 {
06196     (void)pUserData;
06197     DRFLAC_FREE(p);
06198 }
06199 
06200 
06201 static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
06202 {
06203     if (pAllocationCallbacks == NULL) {
06204         return NULL;
06205     }
06206 
06207     if (pAllocationCallbacks->onMalloc != NULL) {
06208         return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
06209     }
06210 
06211     /* Try using realloc(). */
06212     if (pAllocationCallbacks->onRealloc != NULL) {
06213         return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
06214     }
06215 
06216     return NULL;
06217 }
06218 
06219 static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
06220 {
06221     if (pAllocationCallbacks == NULL) {
06222         return NULL;
06223     }
06224 
06225     if (pAllocationCallbacks->onRealloc != NULL) {
06226         return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
06227     }
06228 
06229     /* Try emulating realloc() in terms of malloc()/free(). */
06230     if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
06231         void* p2;
06232 
06233         p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
06234         if (p2 == NULL) {
06235             return NULL;
06236         }
06237 
06238         if (p != NULL) {
06239             DRFLAC_COPY_MEMORY(p2, p, szOld);
06240             pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
06241         }
06242 
06243         return p2;
06244     }
06245 
06246     return NULL;
06247 }
06248 
06249 static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
06250 {
06251     if (p == NULL || pAllocationCallbacks == NULL) {
06252         return;
06253     }
06254 
06255     if (pAllocationCallbacks->onFree != NULL) {
06256         pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
06257     }
06258 }
06259 
06260 
06261 static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeektableSize, drflac_allocation_callbacks* pAllocationCallbacks)
06262 {
06263     /*
06264     We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
06265     we'll be sitting on byte 42.
06266     */
06267     drflac_uint64 runningFilePos = 42;
06268     drflac_uint64 seektablePos   = 0;
06269     drflac_uint32 seektableSize  = 0;
06270 
06271     for (;;) {
06272         drflac_metadata metadata;
06273         drflac_uint8 isLastBlock = 0;
06274         drflac_uint8 blockType;
06275         drflac_uint32 blockSize;
06276         if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
06277             return DRFLAC_FALSE;
06278         }
06279         runningFilePos += 4;
06280 
06281         metadata.type = blockType;
06282         metadata.pRawData = NULL;
06283         metadata.rawDataSize = 0;
06284 
06285         switch (blockType)
06286         {
06287             case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
06288             {
06289                 if (blockSize < 4) {
06290                     return DRFLAC_FALSE;
06291                 }
06292 
06293                 if (onMeta) {
06294                     void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
06295                     if (pRawData == NULL) {
06296                         return DRFLAC_FALSE;
06297                     }
06298 
06299                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
06300                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06301                         return DRFLAC_FALSE;
06302                     }
06303 
06304                     metadata.pRawData = pRawData;
06305                     metadata.rawDataSize = blockSize;
06306                     metadata.data.application.id       = drflac__be2host_32(*(drflac_uint32*)pRawData);
06307                     metadata.data.application.pData    = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
06308                     metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
06309                     onMeta(pUserDataMD, &metadata);
06310 
06311                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06312                 }
06313             } break;
06314 
06315             case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
06316             {
06317                 seektablePos  = runningFilePos;
06318                 seektableSize = blockSize;
06319 
06320                 if (onMeta) {
06321                     drflac_uint32 iSeekpoint;
06322                     void* pRawData;
06323 
06324                     pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
06325                     if (pRawData == NULL) {
06326                         return DRFLAC_FALSE;
06327                     }
06328 
06329                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
06330                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06331                         return DRFLAC_FALSE;
06332                     }
06333 
06334                     metadata.pRawData = pRawData;
06335                     metadata.rawDataSize = blockSize;
06336                     metadata.data.seektable.seekpointCount = blockSize/sizeof(drflac_seekpoint);
06337                     metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
06338 
06339                     /* Endian swap. */
06340                     for (iSeekpoint = 0; iSeekpoint < metadata.data.seektable.seekpointCount; ++iSeekpoint) {
06341                         drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
06342                         pSeekpoint->firstPCMFrame   = drflac__be2host_64(pSeekpoint->firstPCMFrame);
06343                         pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
06344                         pSeekpoint->pcmFrameCount   = drflac__be2host_16(pSeekpoint->pcmFrameCount);
06345                     }
06346 
06347                     onMeta(pUserDataMD, &metadata);
06348 
06349                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06350                 }
06351             } break;
06352 
06353             case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
06354             {
06355                 if (blockSize < 8) {
06356                     return DRFLAC_FALSE;
06357                 }
06358 
06359                 if (onMeta) {
06360                     void* pRawData;
06361                     const char* pRunningData;
06362                     const char* pRunningDataEnd;
06363                     drflac_uint32 i;
06364 
06365                     pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
06366                     if (pRawData == NULL) {
06367                         return DRFLAC_FALSE;
06368                     }
06369 
06370                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
06371                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06372                         return DRFLAC_FALSE;
06373                     }
06374 
06375                     metadata.pRawData = pRawData;
06376                     metadata.rawDataSize = blockSize;
06377 
06378                     pRunningData    = (const char*)pRawData;
06379                     pRunningDataEnd = (const char*)pRawData + blockSize;
06380 
06381                     metadata.data.vorbis_comment.vendorLength = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06382 
06383                     /* Need space for the rest of the block */
06384                     if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
06385                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06386                         return DRFLAC_FALSE;
06387                     }
06388                     metadata.data.vorbis_comment.vendor       = pRunningData;                                            pRunningData += metadata.data.vorbis_comment.vendorLength;
06389                     metadata.data.vorbis_comment.commentCount = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06390 
06391                     /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
06392                     if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
06393                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06394                         return DRFLAC_FALSE;
06395                     }
06396                     metadata.data.vorbis_comment.pComments    = pRunningData;
06397 
06398                     /* Check that the comments section is valid before passing it to the callback */
06399                     for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
06400                         drflac_uint32 commentLength;
06401 
06402                         if (pRunningDataEnd - pRunningData < 4) {
06403                             drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06404                             return DRFLAC_FALSE;
06405                         }
06406 
06407                         commentLength = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06408                         if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
06409                             drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06410                             return DRFLAC_FALSE;
06411                         }
06412                         pRunningData += commentLength;
06413                     }
06414 
06415                     onMeta(pUserDataMD, &metadata);
06416 
06417                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06418                 }
06419             } break;
06420 
06421             case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
06422             {
06423                 if (blockSize < 396) {
06424                     return DRFLAC_FALSE;
06425                 }
06426 
06427                 if (onMeta) {
06428                     void* pRawData;
06429                     const char* pRunningData;
06430                     const char* pRunningDataEnd;
06431                     drflac_uint8 iTrack;
06432                     drflac_uint8 iIndex;
06433 
06434                     pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
06435                     if (pRawData == NULL) {
06436                         return DRFLAC_FALSE;
06437                     }
06438 
06439                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
06440                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06441                         return DRFLAC_FALSE;
06442                     }
06443 
06444                     metadata.pRawData = pRawData;
06445                     metadata.rawDataSize = blockSize;
06446 
06447                     pRunningData    = (const char*)pRawData;
06448                     pRunningDataEnd = (const char*)pRawData + blockSize;
06449 
06450                     DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128);                              pRunningData += 128;
06451                     metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
06452                     metadata.data.cuesheet.isCD              = (pRunningData[0] & 0x80) != 0;                           pRunningData += 259;
06453                     metadata.data.cuesheet.trackCount        = pRunningData[0];                                         pRunningData += 1;
06454                     metadata.data.cuesheet.pTrackData        = pRunningData;
06455 
06456                     /* Check that the cuesheet tracks are valid before passing it to the callback */
06457                     for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
06458                         drflac_uint8 indexCount;
06459                         drflac_uint32 indexPointSize;
06460 
06461                         if (pRunningDataEnd - pRunningData < 36) {
06462                             drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06463                             return DRFLAC_FALSE;
06464                         }
06465 
06466                         /* Skip to the index point count */
06467                         pRunningData += 35;
06468                         indexCount = pRunningData[0]; pRunningData += 1;
06469                         indexPointSize = indexCount * sizeof(drflac_cuesheet_track_index);
06470                         if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
06471                             drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06472                             return DRFLAC_FALSE;
06473                         }
06474 
06475                         /* Endian swap. */
06476                         for (iIndex = 0; iIndex < indexCount; ++iIndex) {
06477                             drflac_cuesheet_track_index* pTrack = (drflac_cuesheet_track_index*)pRunningData;
06478                             pRunningData += sizeof(drflac_cuesheet_track_index);
06479                             pTrack->offset = drflac__be2host_64(pTrack->offset);
06480                         }
06481                     }
06482 
06483                     onMeta(pUserDataMD, &metadata);
06484 
06485                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06486                 }
06487             } break;
06488 
06489             case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
06490             {
06491                 if (blockSize < 32) {
06492                     return DRFLAC_FALSE;
06493                 }
06494 
06495                 if (onMeta) {
06496                     void* pRawData;
06497                     const char* pRunningData;
06498                     const char* pRunningDataEnd;
06499 
06500                     pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
06501                     if (pRawData == NULL) {
06502                         return DRFLAC_FALSE;
06503                     }
06504 
06505                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
06506                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06507                         return DRFLAC_FALSE;
06508                     }
06509 
06510                     metadata.pRawData = pRawData;
06511                     metadata.rawDataSize = blockSize;
06512 
06513                     pRunningData    = (const char*)pRawData;
06514                     pRunningDataEnd = (const char*)pRawData + blockSize;
06515 
06516                     metadata.data.picture.type       = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06517                     metadata.data.picture.mimeLength = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06518 
06519                     /* Need space for the rest of the block */
06520                     if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
06521                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06522                         return DRFLAC_FALSE;
06523                     }
06524                     metadata.data.picture.mime              = pRunningData;                                            pRunningData += metadata.data.picture.mimeLength;
06525                     metadata.data.picture.descriptionLength = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06526 
06527                     /* Need space for the rest of the block */
06528                     if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
06529                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06530                         return DRFLAC_FALSE;
06531                     }
06532                     metadata.data.picture.description     = pRunningData;                                            pRunningData += metadata.data.picture.descriptionLength;
06533                     metadata.data.picture.width           = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06534                     metadata.data.picture.height          = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06535                     metadata.data.picture.colorDepth      = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06536                     metadata.data.picture.indexColorCount = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06537                     metadata.data.picture.pictureDataSize = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
06538                     metadata.data.picture.pPictureData    = (const drflac_uint8*)pRunningData;
06539 
06540                     /* Need space for the picture after the block */
06541                     if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
06542                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06543                         return DRFLAC_FALSE;
06544                     }
06545 
06546                     onMeta(pUserDataMD, &metadata);
06547 
06548                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06549                 }
06550             } break;
06551 
06552             case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
06553             {
06554                 if (onMeta) {
06555                     metadata.data.padding.unused = 0;
06556 
06557                     /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
06558                     if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
06559                         isLastBlock = DRFLAC_TRUE;  /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
06560                     } else {
06561                         onMeta(pUserDataMD, &metadata);
06562                     }
06563                 }
06564             } break;
06565 
06566             case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
06567             {
06568                 /* Invalid chunk. Just skip over this one. */
06569                 if (onMeta) {
06570                     if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
06571                         isLastBlock = DRFLAC_TRUE;  /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
06572                     }
06573                 }
06574             } break;
06575 
06576             default:
06577             {
06578                 /*
06579                 It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
06580                 can at the very least report the chunk to the application and let it look at the raw data.
06581                 */
06582                 if (onMeta) {
06583                     void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
06584                     if (pRawData == NULL) {
06585                         return DRFLAC_FALSE;
06586                     }
06587 
06588                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
06589                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06590                         return DRFLAC_FALSE;
06591                     }
06592 
06593                     metadata.pRawData = pRawData;
06594                     metadata.rawDataSize = blockSize;
06595                     onMeta(pUserDataMD, &metadata);
06596 
06597                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
06598                 }
06599             } break;
06600         }
06601 
06602         /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
06603         if (onMeta == NULL && blockSize > 0) {
06604             if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
06605                 isLastBlock = DRFLAC_TRUE;
06606             }
06607         }
06608 
06609         runningFilePos += blockSize;
06610         if (isLastBlock) {
06611             break;
06612         }
06613     }
06614 
06615     *pSeektablePos = seektablePos;
06616     *pSeektableSize = seektableSize;
06617     *pFirstFramePos = runningFilePos;
06618 
06619     return DRFLAC_TRUE;
06620 }
06621 
06622 static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
06623 {
06624     /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
06625 
06626     drflac_uint8 isLastBlock;
06627     drflac_uint8 blockType;
06628     drflac_uint32 blockSize;
06629 
06630     (void)onSeek;
06631 
06632     pInit->container = drflac_container_native;
06633 
06634     /* The first metadata block should be the STREAMINFO block. */
06635     if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
06636         return DRFLAC_FALSE;
06637     }
06638 
06639     if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
06640         if (!relaxed) {
06641             /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
06642             return DRFLAC_FALSE;
06643         } else {
06644             /*
06645             Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
06646             for that frame.
06647             */
06648             pInit->hasStreamInfoBlock = DRFLAC_FALSE;
06649             pInit->hasMetadataBlocks  = DRFLAC_FALSE;
06650 
06651             if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
06652                 return DRFLAC_FALSE;    /* Couldn't find a frame. */
06653             }
06654 
06655             if (pInit->firstFrameHeader.bitsPerSample == 0) {
06656                 return DRFLAC_FALSE;    /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
06657             }
06658 
06659             pInit->sampleRate              = pInit->firstFrameHeader.sampleRate;
06660             pInit->channels                = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
06661             pInit->bitsPerSample           = pInit->firstFrameHeader.bitsPerSample;
06662             pInit->maxBlockSizeInPCMFrames = 65535;   /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
06663             return DRFLAC_TRUE;
06664         }
06665     } else {
06666         drflac_streaminfo streaminfo;
06667         if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
06668             return DRFLAC_FALSE;
06669         }
06670 
06671         pInit->hasStreamInfoBlock      = DRFLAC_TRUE;
06672         pInit->sampleRate              = streaminfo.sampleRate;
06673         pInit->channels                = streaminfo.channels;
06674         pInit->bitsPerSample           = streaminfo.bitsPerSample;
06675         pInit->totalPCMFrameCount      = streaminfo.totalPCMFrameCount;
06676         pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;    /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
06677         pInit->hasMetadataBlocks       = !isLastBlock;
06678 
06679         if (onMeta) {
06680             drflac_metadata metadata;
06681             metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
06682             metadata.pRawData = NULL;
06683             metadata.rawDataSize = 0;
06684             metadata.data.streaminfo = streaminfo;
06685             onMeta(pUserDataMD, &metadata);
06686         }
06687 
06688         return DRFLAC_TRUE;
06689     }
06690 }
06691 
06692 #ifndef DR_FLAC_NO_OGG
06693 #define DRFLAC_OGG_MAX_PAGE_SIZE            65307
06694 #define DRFLAC_OGG_CAPTURE_PATTERN_CRC32    1605413199  /* CRC-32 of "OggS". */
06695 
06696 typedef enum
06697 {
06698     drflac_ogg_recover_on_crc_mismatch,
06699     drflac_ogg_fail_on_crc_mismatch
06700 } drflac_ogg_crc_mismatch_recovery;
06701 
06702 #ifndef DR_FLAC_NO_CRC
06703 static drflac_uint32 drflac__crc32_table[] = {
06704     0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
06705     0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
06706     0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
06707     0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
06708     0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
06709     0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
06710     0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
06711     0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
06712     0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
06713     0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
06714     0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
06715     0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
06716     0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
06717     0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
06718     0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
06719     0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
06720     0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
06721     0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
06722     0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
06723     0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
06724     0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
06725     0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
06726     0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
06727     0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
06728     0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
06729     0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
06730     0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
06731     0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
06732     0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
06733     0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
06734     0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
06735     0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
06736     0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
06737     0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
06738     0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
06739     0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
06740     0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
06741     0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
06742     0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
06743     0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
06744     0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
06745     0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
06746     0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
06747     0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
06748     0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
06749     0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
06750     0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
06751     0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
06752     0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
06753     0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
06754     0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
06755     0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
06756     0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
06757     0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
06758     0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
06759     0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
06760     0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
06761     0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
06762     0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
06763     0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
06764     0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
06765     0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
06766     0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
06767     0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
06768 };
06769 #endif
06770 
06771 static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
06772 {
06773 #ifndef DR_FLAC_NO_CRC
06774     return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
06775 #else
06776     (void)data;
06777     return crc32;
06778 #endif
06779 }
06780 
06781 #if 0
06782 static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
06783 {
06784     crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
06785     crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
06786     crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >>  8) & 0xFF));
06787     crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >>  0) & 0xFF));
06788     return crc32;
06789 }
06790 
06791 static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
06792 {
06793     crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
06794     crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >>  0) & 0xFFFFFFFF));
06795     return crc32;
06796 }
06797 #endif
06798 
06799 static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
06800 {
06801     /* This can be optimized. */
06802     drflac_uint32 i;
06803     for (i = 0; i < dataSize; ++i) {
06804         crc32 = drflac_crc32_byte(crc32, pData[i]);
06805     }
06806     return crc32;
06807 }
06808 
06809 
06810 static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
06811 {
06812     return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
06813 }
06814 
06815 static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
06816 {
06817     return 27 + pHeader->segmentCount;
06818 }
06819 
06820 static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
06821 {
06822     drflac_uint32 pageBodySize = 0;
06823     int i;
06824 
06825     for (i = 0; i < pHeader->segmentCount; ++i) {
06826         pageBodySize += pHeader->segmentTable[i];
06827     }
06828 
06829     return pageBodySize;
06830 }
06831 
06832 static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
06833 {
06834     drflac_uint8 data[23];
06835     drflac_uint32 i;
06836 
06837     DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
06838 
06839     if (onRead(pUserData, data, 23) != 23) {
06840         return DRFLAC_AT_END;
06841     }
06842     *pBytesRead += 23;
06843 
06844     /*
06845     It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
06846     us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
06847     like to have it map to the structure of the underlying data.
06848     */
06849     pHeader->capturePattern[0] = 'O';
06850     pHeader->capturePattern[1] = 'g';
06851     pHeader->capturePattern[2] = 'g';
06852     pHeader->capturePattern[3] = 'S';
06853 
06854     pHeader->structureVersion = data[0];
06855     pHeader->headerType       = data[1];
06856     DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
06857     DRFLAC_COPY_MEMORY(&pHeader->serialNumber,    &data[10], 4);
06858     DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber,  &data[14], 4);
06859     DRFLAC_COPY_MEMORY(&pHeader->checksum,        &data[18], 4);
06860     pHeader->segmentCount     = data[22];
06861 
06862     /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
06863     data[18] = 0;
06864     data[19] = 0;
06865     data[20] = 0;
06866     data[21] = 0;
06867 
06868     for (i = 0; i < 23; ++i) {
06869         *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
06870     }
06871 
06872 
06873     if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
06874         return DRFLAC_AT_END;
06875     }
06876     *pBytesRead += pHeader->segmentCount;
06877 
06878     for (i = 0; i < pHeader->segmentCount; ++i) {
06879         *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
06880     }
06881 
06882     return DRFLAC_SUCCESS;
06883 }
06884 
06885 static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
06886 {
06887     drflac_uint8 id[4];
06888 
06889     *pBytesRead = 0;
06890 
06891     if (onRead(pUserData, id, 4) != 4) {
06892         return DRFLAC_AT_END;
06893     }
06894     *pBytesRead += 4;
06895 
06896     /* We need to read byte-by-byte until we find the OggS capture pattern. */
06897     for (;;) {
06898         if (drflac_ogg__is_capture_pattern(id)) {
06899             drflac_result result;
06900 
06901             *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
06902 
06903             result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
06904             if (result == DRFLAC_SUCCESS) {
06905                 return DRFLAC_SUCCESS;
06906             } else {
06907                 if (result == DRFLAC_CRC_MISMATCH) {
06908                     continue;
06909                 } else {
06910                     return result;
06911                 }
06912             }
06913         } else {
06914             /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
06915             id[0] = id[1];
06916             id[1] = id[2];
06917             id[2] = id[3];
06918             if (onRead(pUserData, &id[3], 1) != 1) {
06919                 return DRFLAC_AT_END;
06920             }
06921             *pBytesRead += 1;
06922         }
06923     }
06924 }
06925 
06926 
06927 /*
06928 The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
06929 in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
06930 in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
06931 dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
06932 the physical Ogg bitstream are converted and delivered in native FLAC format.
06933 */
06934 typedef struct
06935 {
06936     drflac_read_proc onRead;                /* The original onRead callback from drflac_open() and family. */
06937     drflac_seek_proc onSeek;                /* The original onSeek callback from drflac_open() and family. */
06938     void* pUserData;                        /* The user data passed on onRead and onSeek. This is the user data that was passed on drflac_open() and family. */
06939     drflac_uint64 currentBytePos;           /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
06940     drflac_uint64 firstBytePos;             /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
06941     drflac_uint32 serialNumber;             /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
06942     drflac_ogg_page_header bosPageHeader;   /* Used for seeking. */
06943     drflac_ogg_page_header currentPageHeader;
06944     drflac_uint32 bytesRemainingInPage;
06945     drflac_uint32 pageDataSize;
06946     drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
06947 } drflac_oggbs; /* oggbs = Ogg Bitstream */
06948 
06949 static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
06950 {
06951     size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
06952     oggbs->currentBytePos += bytesActuallyRead;
06953 
06954     return bytesActuallyRead;
06955 }
06956 
06957 static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
06958 {
06959     if (origin == drflac_seek_origin_start) {
06960         if (offset <= 0x7FFFFFFF) {
06961             if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
06962                 return DRFLAC_FALSE;
06963             }
06964             oggbs->currentBytePos = offset;
06965 
06966             return DRFLAC_TRUE;
06967         } else {
06968             if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
06969                 return DRFLAC_FALSE;
06970             }
06971             oggbs->currentBytePos = offset;
06972 
06973             return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
06974         }
06975     } else {
06976         while (offset > 0x7FFFFFFF) {
06977             if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
06978                 return DRFLAC_FALSE;
06979             }
06980             oggbs->currentBytePos += 0x7FFFFFFF;
06981             offset -= 0x7FFFFFFF;
06982         }
06983 
06984         if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) {    /* <-- Safe cast thanks to the loop above. */
06985             return DRFLAC_FALSE;
06986         }
06987         oggbs->currentBytePos += offset;
06988 
06989         return DRFLAC_TRUE;
06990     }
06991 }
06992 
06993 static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
06994 {
06995     drflac_ogg_page_header header;
06996     for (;;) {
06997         drflac_uint32 crc32 = 0;
06998         drflac_uint32 bytesRead;
06999         drflac_uint32 pageBodySize;
07000 #ifndef DR_FLAC_NO_CRC
07001         drflac_uint32 actualCRC32;
07002 #endif
07003 
07004         if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
07005             return DRFLAC_FALSE;
07006         }
07007         oggbs->currentBytePos += bytesRead;
07008 
07009         pageBodySize = drflac_ogg__get_page_body_size(&header);
07010         if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
07011             continue;   /* Invalid page size. Assume it's corrupted and just move to the next page. */
07012         }
07013 
07014         if (header.serialNumber != oggbs->serialNumber) {
07015             /* It's not a FLAC page. Skip it. */
07016             if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
07017                 return DRFLAC_FALSE;
07018             }
07019             continue;
07020         }
07021 
07022 
07023         /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
07024         if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
07025             return DRFLAC_FALSE;
07026         }
07027         oggbs->pageDataSize = pageBodySize;
07028 
07029 #ifndef DR_FLAC_NO_CRC
07030         actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
07031         if (actualCRC32 != header.checksum) {
07032             if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
07033                 continue;   /* CRC mismatch. Skip this page. */
07034             } else {
07035                 /*
07036                 Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
07037                 go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
07038                 seek did not fully complete.
07039                 */
07040                 drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
07041                 return DRFLAC_FALSE;
07042             }
07043         }
07044 #else
07045         (void)recoveryMethod;   /* <-- Silence a warning. */
07046 #endif
07047 
07048         oggbs->currentPageHeader = header;
07049         oggbs->bytesRemainingInPage = pageBodySize;
07050         return DRFLAC_TRUE;
07051     }
07052 }
07053 
07054 /* Function below is unused at the moment, but I might be re-adding it later. */
07055 #if 0
07056 static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
07057 {
07058     drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
07059     drflac_uint8 iSeg = 0;
07060     drflac_uint32 iByte = 0;
07061     while (iByte < bytesConsumedInPage) {
07062         drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
07063         if (iByte + segmentSize > bytesConsumedInPage) {
07064             break;
07065         } else {
07066             iSeg += 1;
07067             iByte += segmentSize;
07068         }
07069     }
07070 
07071     *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
07072     return iSeg;
07073 }
07074 
07075 static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
07076 {
07077     /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
07078     for (;;) {
07079         drflac_bool32 atEndOfPage = DRFLAC_FALSE;
07080 
07081         drflac_uint8 bytesRemainingInSeg;
07082         drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
07083 
07084         drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
07085         for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
07086             drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
07087             if (segmentSize < 255) {
07088                 if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
07089                     atEndOfPage = DRFLAC_TRUE;
07090                 }
07091 
07092                 break;
07093             }
07094 
07095             bytesToEndOfPacketOrPage += segmentSize;
07096         }
07097 
07098         /*
07099         At this point we will have found either the packet or the end of the page. If were at the end of the page we'll
07100         want to load the next page and keep searching for the end of the packet.
07101         */
07102         drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
07103         oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
07104 
07105         if (atEndOfPage) {
07106             /*
07107             We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
07108             straddle pages.
07109             */
07110             if (!drflac_oggbs__goto_next_page(oggbs)) {
07111                 return DRFLAC_FALSE;
07112             }
07113 
07114             /* If it's a fresh packet it most likely means we're at the next packet. */
07115             if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
07116                 return DRFLAC_TRUE;
07117             }
07118         } else {
07119             /* We're at the next packet. */
07120             return DRFLAC_TRUE;
07121         }
07122     }
07123 }
07124 
07125 static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
07126 {
07127     /* The bitstream should be sitting on the first byte just after the header of the frame. */
07128 
07129     /* What we're actually doing here is seeking to the start of the next packet. */
07130     return drflac_oggbs__seek_to_next_packet(oggbs);
07131 }
07132 #endif
07133 
07134 static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
07135 {
07136     drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
07137     drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
07138     size_t bytesRead = 0;
07139 
07140     DRFLAC_ASSERT(oggbs != NULL);
07141     DRFLAC_ASSERT(pRunningBufferOut != NULL);
07142 
07143     /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
07144     while (bytesRead < bytesToRead) {
07145         size_t bytesRemainingToRead = bytesToRead - bytesRead;
07146 
07147         if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
07148             DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
07149             bytesRead += bytesRemainingToRead;
07150             oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
07151             break;
07152         }
07153 
07154         /* If we get here it means some of the requested data is contained in the next pages. */
07155         if (oggbs->bytesRemainingInPage > 0) {
07156             DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
07157             bytesRead += oggbs->bytesRemainingInPage;
07158             pRunningBufferOut += oggbs->bytesRemainingInPage;
07159             oggbs->bytesRemainingInPage = 0;
07160         }
07161 
07162         DRFLAC_ASSERT(bytesRemainingToRead > 0);
07163         if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
07164             break;  /* Failed to go to the next page. Might have simply hit the end of the stream. */
07165         }
07166     }
07167 
07168     return bytesRead;
07169 }
07170 
07171 static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
07172 {
07173     drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
07174     int bytesSeeked = 0;
07175 
07176     DRFLAC_ASSERT(oggbs != NULL);
07177     DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */
07178 
07179     /* Seeking is always forward which makes things a lot simpler. */
07180     if (origin == drflac_seek_origin_start) {
07181         if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, drflac_seek_origin_start)) {
07182             return DRFLAC_FALSE;
07183         }
07184 
07185         if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
07186             return DRFLAC_FALSE;
07187         }
07188 
07189         return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
07190     }
07191 
07192     DRFLAC_ASSERT(origin == drflac_seek_origin_current);
07193 
07194     while (bytesSeeked < offset) {
07195         int bytesRemainingToSeek = offset - bytesSeeked;
07196         DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
07197 
07198         if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
07199             bytesSeeked += bytesRemainingToSeek;
07200             (void)bytesSeeked;  /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
07201             oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
07202             break;
07203         }
07204 
07205         /* If we get here it means some of the requested data is contained in the next pages. */
07206         if (oggbs->bytesRemainingInPage > 0) {
07207             bytesSeeked += (int)oggbs->bytesRemainingInPage;
07208             oggbs->bytesRemainingInPage = 0;
07209         }
07210 
07211         DRFLAC_ASSERT(bytesRemainingToSeek > 0);
07212         if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
07213             /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
07214             return DRFLAC_FALSE;
07215         }
07216     }
07217 
07218     return DRFLAC_TRUE;
07219 }
07220 
07221 
07222 static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
07223 {
07224     drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
07225     drflac_uint64 originalBytePos;
07226     drflac_uint64 runningGranulePosition;
07227     drflac_uint64 runningFrameBytePos;
07228     drflac_uint64 runningPCMFrameCount;
07229 
07230     DRFLAC_ASSERT(oggbs != NULL);
07231 
07232     originalBytePos = oggbs->currentBytePos;   /* For recovery. Points to the OggS identifier. */
07233 
07234     /* First seek to the first frame. */
07235     if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
07236         return DRFLAC_FALSE;
07237     }
07238     oggbs->bytesRemainingInPage = 0;
07239 
07240     runningGranulePosition = 0;
07241     for (;;) {
07242         if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
07243             drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
07244             return DRFLAC_FALSE;   /* Never did find that sample... */
07245         }
07246 
07247         runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
07248         if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
07249             break; /* The sample is somewhere in the previous page. */
07250         }
07251 
07252         /*
07253         At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
07254         disregard any pages that do not begin a fresh packet.
07255         */
07256         if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {    /* <-- Is it a fresh page? */
07257             if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
07258                 drflac_uint8 firstBytesInPage[2];
07259                 firstBytesInPage[0] = oggbs->pageData[0];
07260                 firstBytesInPage[1] = oggbs->pageData[1];
07261 
07262                 if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) {    /* <-- Does the page begin with a frame's sync code? */
07263                     runningGranulePosition = oggbs->currentPageHeader.granulePosition;
07264                 }
07265 
07266                 continue;
07267             }
07268         }
07269     }
07270 
07271     /*
07272     We found the page that that is closest to the sample, so now we need to find it. The first thing to do is seek to the
07273     start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
07274     a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
07275     we find the one containing the target sample.
07276     */
07277     if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
07278         return DRFLAC_FALSE;
07279     }
07280     if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
07281         return DRFLAC_FALSE;
07282     }
07283 
07284     /*
07285     At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
07286     looping over these frames until we find the one containing the sample we're after.
07287     */
07288     runningPCMFrameCount = runningGranulePosition;
07289     for (;;) {
07290         /*
07291         There are two ways to find the sample and seek past irrelevant frames:
07292           1) Use the native FLAC decoder.
07293           2) Use Ogg's framing system.
07294 
07295         Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
07296         do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
07297         duplication for the decoding of frame headers.
07298 
07299         Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
07300         bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
07301         standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
07302         the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
07303         using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
07304         avoid the use of the drflac_bs object.
07305 
07306         Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
07307           1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
07308           2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
07309           3) Simplicity.
07310         */
07311         drflac_uint64 firstPCMFrameInFLACFrame = 0;
07312         drflac_uint64 lastPCMFrameInFLACFrame = 0;
07313         drflac_uint64 pcmFrameCountInThisFrame;
07314 
07315         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
07316             return DRFLAC_FALSE;
07317         }
07318 
07319         drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
07320 
07321         pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
07322 
07323         /* If we are seeking to the end of the file and we've just hit it, we're done. */
07324         if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
07325             drflac_result result = drflac__decode_flac_frame(pFlac);
07326             if (result == DRFLAC_SUCCESS) {
07327                 pFlac->currentPCMFrame = pcmFrameIndex;
07328                 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
07329                 return DRFLAC_TRUE;
07330             } else {
07331                 return DRFLAC_FALSE;
07332             }
07333         }
07334 
07335         if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
07336             /*
07337             The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
07338             it never existed and keep iterating.
07339             */
07340             drflac_result result = drflac__decode_flac_frame(pFlac);
07341             if (result == DRFLAC_SUCCESS) {
07342                 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
07343                 drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount);    /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
07344                 if (pcmFramesToDecode == 0) {
07345                     return DRFLAC_TRUE;
07346                 }
07347 
07348                 pFlac->currentPCMFrame = runningPCMFrameCount;
07349 
07350                 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
07351             } else {
07352                 if (result == DRFLAC_CRC_MISMATCH) {
07353                     continue;   /* CRC mismatch. Pretend this frame never existed. */
07354                 } else {
07355                     return DRFLAC_FALSE;
07356                 }
07357             }
07358         } else {
07359             /*
07360             It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
07361             frame never existed and leave the running sample count untouched.
07362             */
07363             drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
07364             if (result == DRFLAC_SUCCESS) {
07365                 runningPCMFrameCount += pcmFrameCountInThisFrame;
07366             } else {
07367                 if (result == DRFLAC_CRC_MISMATCH) {
07368                     continue;   /* CRC mismatch. Pretend this frame never existed. */
07369                 } else {
07370                     return DRFLAC_FALSE;
07371                 }
07372             }
07373         }
07374     }
07375 }
07376 
07377 
07378 
07379 static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
07380 {
07381     drflac_ogg_page_header header;
07382     drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
07383     drflac_uint32 bytesRead = 0;
07384 
07385     /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
07386     (void)relaxed;
07387 
07388     pInit->container = drflac_container_ogg;
07389     pInit->oggFirstBytePos = 0;
07390 
07391     /*
07392     We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
07393     stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
07394     any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
07395     */
07396     if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
07397         return DRFLAC_FALSE;
07398     }
07399     pInit->runningFilePos += bytesRead;
07400 
07401     for (;;) {
07402         int pageBodySize;
07403 
07404         /* Break if we're past the beginning of stream page. */
07405         if ((header.headerType & 0x02) == 0) {
07406             return DRFLAC_FALSE;
07407         }
07408 
07409         /* Check if it's a FLAC header. */
07410         pageBodySize = drflac_ogg__get_page_body_size(&header);
07411         if (pageBodySize == 51) {   /* 51 = the lacing value of the FLAC header packet. */
07412             /* It could be a FLAC page... */
07413             drflac_uint32 bytesRemainingInPage = pageBodySize;
07414             drflac_uint8 packetType;
07415 
07416             if (onRead(pUserData, &packetType, 1) != 1) {
07417                 return DRFLAC_FALSE;
07418             }
07419 
07420             bytesRemainingInPage -= 1;
07421             if (packetType == 0x7F) {
07422                 /* Increasingly more likely to be a FLAC page... */
07423                 drflac_uint8 sig[4];
07424                 if (onRead(pUserData, sig, 4) != 4) {
07425                     return DRFLAC_FALSE;
07426                 }
07427 
07428                 bytesRemainingInPage -= 4;
07429                 if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
07430                     /* Almost certainly a FLAC page... */
07431                     drflac_uint8 mappingVersion[2];
07432                     if (onRead(pUserData, mappingVersion, 2) != 2) {
07433                         return DRFLAC_FALSE;
07434                     }
07435 
07436                     if (mappingVersion[0] != 1) {
07437                         return DRFLAC_FALSE;   /* Only supporting version 1.x of the Ogg mapping. */
07438                     }
07439 
07440                     /*
07441                     The next 2 bytes are the non-audio packets, not including this one. We don't care about this because we're going to
07442                     be handling it in a generic way based on the serial number and packet types.
07443                     */
07444                     if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
07445                         return DRFLAC_FALSE;
07446                     }
07447 
07448                     /* Expecting the native FLAC signature "fLaC". */
07449                     if (onRead(pUserData, sig, 4) != 4) {
07450                         return DRFLAC_FALSE;
07451                     }
07452 
07453                     if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
07454                         /* The remaining data in the page should be the STREAMINFO block. */
07455                         drflac_streaminfo streaminfo;
07456                         drflac_uint8 isLastBlock;
07457                         drflac_uint8 blockType;
07458                         drflac_uint32 blockSize;
07459                         if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
07460                             return DRFLAC_FALSE;
07461                         }
07462 
07463                         if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
07464                             return DRFLAC_FALSE;    /* Invalid block type. First block must be the STREAMINFO block. */
07465                         }
07466 
07467                         if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
07468                             /* Success! */
07469                             pInit->hasStreamInfoBlock      = DRFLAC_TRUE;
07470                             pInit->sampleRate              = streaminfo.sampleRate;
07471                             pInit->channels                = streaminfo.channels;
07472                             pInit->bitsPerSample           = streaminfo.bitsPerSample;
07473                             pInit->totalPCMFrameCount      = streaminfo.totalPCMFrameCount;
07474                             pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
07475                             pInit->hasMetadataBlocks       = !isLastBlock;
07476 
07477                             if (onMeta) {
07478                                 drflac_metadata metadata;
07479                                 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
07480                                 metadata.pRawData = NULL;
07481                                 metadata.rawDataSize = 0;
07482                                 metadata.data.streaminfo = streaminfo;
07483                                 onMeta(pUserDataMD, &metadata);
07484                             }
07485 
07486                             pInit->runningFilePos  += pageBodySize;
07487                             pInit->oggFirstBytePos  = pInit->runningFilePos - 79;   /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
07488                             pInit->oggSerial        = header.serialNumber;
07489                             pInit->oggBosHeader     = header;
07490                             break;
07491                         } else {
07492                             /* Failed to read STREAMINFO block. Aww, so close... */
07493                             return DRFLAC_FALSE;
07494                         }
07495                     } else {
07496                         /* Invalid file. */
07497                         return DRFLAC_FALSE;
07498                     }
07499                 } else {
07500                     /* Not a FLAC header. Skip it. */
07501                     if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
07502                         return DRFLAC_FALSE;
07503                     }
07504                 }
07505             } else {
07506                 /* Not a FLAC header. Seek past the entire page and move on to the next. */
07507                 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
07508                     return DRFLAC_FALSE;
07509                 }
07510             }
07511         } else {
07512             if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
07513                 return DRFLAC_FALSE;
07514             }
07515         }
07516 
07517         pInit->runningFilePos += pageBodySize;
07518 
07519 
07520         /* Read the header of the next page. */
07521         if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
07522             return DRFLAC_FALSE;
07523         }
07524         pInit->runningFilePos += bytesRead;
07525     }
07526 
07527     /*
07528     If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
07529     packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
07530     Ogg bistream object.
07531     */
07532     pInit->hasMetadataBlocks = DRFLAC_TRUE;    /* <-- Always have at least VORBIS_COMMENT metadata block. */
07533     return DRFLAC_TRUE;
07534 }
07535 #endif
07536 
07537 static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
07538 {
07539     drflac_bool32 relaxed;
07540     drflac_uint8 id[4];
07541 
07542     if (pInit == NULL || onRead == NULL || onSeek == NULL) {
07543         return DRFLAC_FALSE;
07544     }
07545 
07546     DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
07547     pInit->onRead       = onRead;
07548     pInit->onSeek       = onSeek;
07549     pInit->onMeta       = onMeta;
07550     pInit->container    = container;
07551     pInit->pUserData    = pUserData;
07552     pInit->pUserDataMD  = pUserDataMD;
07553 
07554     pInit->bs.onRead    = onRead;
07555     pInit->bs.onSeek    = onSeek;
07556     pInit->bs.pUserData = pUserData;
07557     drflac__reset_cache(&pInit->bs);
07558 
07559 
07560     /* If the container is explicitly defined then we can try opening in relaxed mode. */
07561     relaxed = container != drflac_container_unknown;
07562 
07563     /* Skip over any ID3 tags. */
07564     for (;;) {
07565         if (onRead(pUserData, id, 4) != 4) {
07566             return DRFLAC_FALSE;    /* Ran out of data. */
07567         }
07568         pInit->runningFilePos += 4;
07569 
07570         if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
07571             drflac_uint8 header[6];
07572             drflac_uint8 flags;
07573             drflac_uint32 headerSize;
07574 
07575             if (onRead(pUserData, header, 6) != 6) {
07576                 return DRFLAC_FALSE;    /* Ran out of data. */
07577             }
07578             pInit->runningFilePos += 6;
07579 
07580             flags = header[1];
07581 
07582             DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
07583             headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
07584             if (flags & 0x10) {
07585                 headerSize += 10;
07586             }
07587 
07588             if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
07589                 return DRFLAC_FALSE;    /* Failed to seek past the tag. */
07590             }
07591             pInit->runningFilePos += headerSize;
07592         } else {
07593             break;
07594         }
07595     }
07596 
07597     if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
07598         return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
07599     }
07600 #ifndef DR_FLAC_NO_OGG
07601     if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
07602         return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
07603     }
07604 #endif
07605 
07606     /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
07607     if (relaxed) {
07608         if (container == drflac_container_native) {
07609             return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
07610         }
07611 #ifndef DR_FLAC_NO_OGG
07612         if (container == drflac_container_ogg) {
07613             return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
07614         }
07615 #endif
07616     }
07617 
07618     /* Unsupported container. */
07619     return DRFLAC_FALSE;
07620 }
07621 
07622 static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
07623 {
07624     DRFLAC_ASSERT(pFlac != NULL);
07625     DRFLAC_ASSERT(pInit != NULL);
07626 
07627     DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
07628     pFlac->bs                      = pInit->bs;
07629     pFlac->onMeta                  = pInit->onMeta;
07630     pFlac->pUserDataMD             = pInit->pUserDataMD;
07631     pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
07632     pFlac->sampleRate              = pInit->sampleRate;
07633     pFlac->channels                = (drflac_uint8)pInit->channels;
07634     pFlac->bitsPerSample           = (drflac_uint8)pInit->bitsPerSample;
07635     pFlac->totalPCMFrameCount      = pInit->totalPCMFrameCount;
07636     pFlac->container               = pInit->container;
07637 }
07638 
07639 
07640 static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
07641 {
07642     drflac_init_info init;
07643     drflac_uint32 allocationSize;
07644     drflac_uint32 wholeSIMDVectorCountPerChannel;
07645     drflac_uint32 decodedSamplesAllocationSize;
07646 #ifndef DR_FLAC_NO_OGG
07647     drflac_oggbs oggbs;
07648 #endif
07649     drflac_uint64 firstFramePos;
07650     drflac_uint64 seektablePos;
07651     drflac_uint32 seektableSize;
07652     drflac_allocation_callbacks allocationCallbacks;
07653     drflac* pFlac;
07654 
07655     /* CPU support first. */
07656     drflac__init_cpu_caps();
07657 
07658     if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
07659         return NULL;
07660     }
07661 
07662     if (pAllocationCallbacks != NULL) {
07663         allocationCallbacks = *pAllocationCallbacks;
07664         if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
07665             return NULL;    /* Invalid allocation callbacks. */
07666         }
07667     } else {
07668         allocationCallbacks.pUserData = NULL;
07669         allocationCallbacks.onMalloc  = drflac__malloc_default;
07670         allocationCallbacks.onRealloc = drflac__realloc_default;
07671         allocationCallbacks.onFree    = drflac__free_default;
07672     }
07673 
07674 
07675     /*
07676     The size of the allocation for the drflac object needs to be large enough to fit the following:
07677       1) The main members of the drflac structure
07678       2) A block of memory large enough to store the decoded samples of the largest frame in the stream
07679       3) If the container is Ogg, a drflac_oggbs object
07680 
07681     The complicated part of the allocation is making sure there's enough room the decoded samples, taking into consideration
07682     the different SIMD instruction sets.
07683     */
07684     allocationSize = sizeof(drflac);
07685 
07686     /*
07687     The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
07688     we are supporting.
07689     */
07690     if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
07691         wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
07692     } else {
07693         wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
07694     }
07695 
07696     decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
07697 
07698     allocationSize += decodedSamplesAllocationSize;
07699     allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE;  /* Allocate extra bytes to ensure we have enough for alignment. */
07700 
07701 #ifndef DR_FLAC_NO_OGG
07702     /* There's additional data required for Ogg streams. */
07703     if (init.container == drflac_container_ogg) {
07704         allocationSize += sizeof(drflac_oggbs);
07705     }
07706 
07707     DRFLAC_ZERO_MEMORY(&oggbs, sizeof(oggbs));
07708     if (init.container == drflac_container_ogg) {
07709         oggbs.onRead = onRead;
07710         oggbs.onSeek = onSeek;
07711         oggbs.pUserData = pUserData;
07712         oggbs.currentBytePos = init.oggFirstBytePos;
07713         oggbs.firstBytePos = init.oggFirstBytePos;
07714         oggbs.serialNumber = init.oggSerial;
07715         oggbs.bosPageHeader = init.oggBosHeader;
07716         oggbs.bytesRemainingInPage = 0;
07717     }
07718 #endif
07719 
07720     /*
07721     This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
07722     consist of only a single heap allocation. To this, the size of the seek table needs to be known, which we determine when reading
07723     and decoding the metadata.
07724     */
07725     firstFramePos = 42;   /* <-- We know we are at byte 42 at this point. */
07726     seektablePos  = 0;
07727     seektableSize = 0;
07728     if (init.hasMetadataBlocks) {
07729         drflac_read_proc onReadOverride = onRead;
07730         drflac_seek_proc onSeekOverride = onSeek;
07731         void* pUserDataOverride = pUserData;
07732 
07733 #ifndef DR_FLAC_NO_OGG
07734         if (init.container == drflac_container_ogg) {
07735             onReadOverride = drflac__on_read_ogg;
07736             onSeekOverride = drflac__on_seek_ogg;
07737             pUserDataOverride = (void*)&oggbs;
07738         }
07739 #endif
07740 
07741         if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seektableSize, &allocationCallbacks)) {
07742             return NULL;
07743         }
07744 
07745         allocationSize += seektableSize;
07746     }
07747 
07748 
07749     pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
07750     if (pFlac == NULL) {
07751         return NULL;
07752     }
07753 
07754     drflac__init_from_info(pFlac, &init);
07755     pFlac->allocationCallbacks = allocationCallbacks;
07756     pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
07757 
07758 #ifndef DR_FLAC_NO_OGG
07759     if (init.container == drflac_container_ogg) {
07760         drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + seektableSize);
07761         *pInternalOggbs = oggbs;
07762 
07763         /* The Ogg bistream needs to be layered on top of the original bitstream. */
07764         pFlac->bs.onRead = drflac__on_read_ogg;
07765         pFlac->bs.onSeek = drflac__on_seek_ogg;
07766         pFlac->bs.pUserData = (void*)pInternalOggbs;
07767         pFlac->_oggbs = (void*)pInternalOggbs;
07768     }
07769 #endif
07770 
07771     pFlac->firstFLACFramePosInBytes = firstFramePos;
07772 
07773     /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
07774 #ifndef DR_FLAC_NO_OGG
07775     if (init.container == drflac_container_ogg)
07776     {
07777         pFlac->pSeekpoints = NULL;
07778         pFlac->seekpointCount = 0;
07779     }
07780     else
07781 #endif
07782     {
07783         /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
07784         if (seektablePos != 0) {
07785             pFlac->seekpointCount = seektableSize / sizeof(*pFlac->pSeekpoints);
07786             pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
07787 
07788             DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
07789             DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
07790 
07791             /* Seek to the seektable, then just read directly into our seektable buffer. */
07792             if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
07793                 if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints, seektableSize) == seektableSize) {
07794                     /* Endian swap. */
07795                     drflac_uint32 iSeekpoint;
07796                     for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
07797                         pFlac->pSeekpoints[iSeekpoint].firstPCMFrame   = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
07798                         pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
07799                         pFlac->pSeekpoints[iSeekpoint].pcmFrameCount   = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
07800                     }
07801                 } else {
07802                     /* Failed to read the seektable. Pretend we don't have one. */
07803                     pFlac->pSeekpoints = NULL;
07804                     pFlac->seekpointCount = 0;
07805                 }
07806 
07807                 /* We need to seek back to where we were. If this fails it's a critical error. */
07808                 if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
07809                     drflac__free_from_callbacks(pFlac, &allocationCallbacks);
07810                     return NULL;
07811                 }
07812             } else {
07813                 /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
07814                 pFlac->pSeekpoints = NULL;
07815                 pFlac->seekpointCount = 0;
07816             }
07817         }
07818     }
07819 
07820 
07821     /*
07822     If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
07823     the first frame.
07824     */
07825     if (!init.hasStreamInfoBlock) {
07826         pFlac->currentFLACFrame.header = init.firstFrameHeader;
07827         for (;;) {
07828             drflac_result result = drflac__decode_flac_frame(pFlac);
07829             if (result == DRFLAC_SUCCESS) {
07830                 break;
07831             } else {
07832                 if (result == DRFLAC_CRC_MISMATCH) {
07833                     if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
07834                         drflac__free_from_callbacks(pFlac, &allocationCallbacks);
07835                         return NULL;
07836                     }
07837                     continue;
07838                 } else {
07839                     drflac__free_from_callbacks(pFlac, &allocationCallbacks);
07840                     return NULL;
07841                 }
07842             }
07843         }
07844     }
07845 
07846     return pFlac;
07847 }
07848 
07849 
07850 
07851 #ifndef DR_FLAC_NO_STDIO
07852 #include <stdio.h>
07853 #include <wchar.h>      /* For wcslen(), wcsrtombs() */
07854 
07855 /* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_WAV_NO_STDIO for now. If something else needs this later we can move it out. */
07856 #include <errno.h>
07857 static drflac_result drflac_result_from_errno(int e)
07858 {
07859     switch (e)
07860     {
07861         case 0: return DRFLAC_SUCCESS;
07862     #ifdef EPERM
07863         case EPERM: return DRFLAC_INVALID_OPERATION;
07864     #endif
07865     #ifdef ENOENT
07866         case ENOENT: return DRFLAC_DOES_NOT_EXIST;
07867     #endif
07868     #ifdef ESRCH
07869         case ESRCH: return DRFLAC_DOES_NOT_EXIST;
07870     #endif
07871     #ifdef EINTR
07872         case EINTR: return DRFLAC_INTERRUPT;
07873     #endif
07874     #ifdef EIO
07875         case EIO: return DRFLAC_IO_ERROR;
07876     #endif
07877     #ifdef ENXIO
07878         case ENXIO: return DRFLAC_DOES_NOT_EXIST;
07879     #endif
07880     #ifdef E2BIG
07881         case E2BIG: return DRFLAC_INVALID_ARGS;
07882     #endif
07883     #ifdef ENOEXEC
07884         case ENOEXEC: return DRFLAC_INVALID_FILE;
07885     #endif
07886     #ifdef EBADF
07887         case EBADF: return DRFLAC_INVALID_FILE;
07888     #endif
07889     #ifdef ECHILD
07890         case ECHILD: return DRFLAC_ERROR;
07891     #endif
07892     #ifdef EAGAIN
07893         case EAGAIN: return DRFLAC_UNAVAILABLE;
07894     #endif
07895     #ifdef ENOMEM
07896         case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
07897     #endif
07898     #ifdef EACCES
07899         case EACCES: return DRFLAC_ACCESS_DENIED;
07900     #endif
07901     #ifdef EFAULT
07902         case EFAULT: return DRFLAC_BAD_ADDRESS;
07903     #endif
07904     #ifdef ENOTBLK
07905         case ENOTBLK: return DRFLAC_ERROR;
07906     #endif
07907     #ifdef EBUSY
07908         case EBUSY: return DRFLAC_BUSY;
07909     #endif
07910     #ifdef EEXIST
07911         case EEXIST: return DRFLAC_ALREADY_EXISTS;
07912     #endif
07913     #ifdef EXDEV
07914         case EXDEV: return DRFLAC_ERROR;
07915     #endif
07916     #ifdef ENODEV
07917         case ENODEV: return DRFLAC_DOES_NOT_EXIST;
07918     #endif
07919     #ifdef ENOTDIR
07920         case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
07921     #endif
07922     #ifdef EISDIR
07923         case EISDIR: return DRFLAC_IS_DIRECTORY;
07924     #endif
07925     #ifdef EINVAL
07926         case EINVAL: return DRFLAC_INVALID_ARGS;
07927     #endif
07928     #ifdef ENFILE
07929         case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
07930     #endif
07931     #ifdef EMFILE
07932         case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
07933     #endif
07934     #ifdef ENOTTY
07935         case ENOTTY: return DRFLAC_INVALID_OPERATION;
07936     #endif
07937     #ifdef ETXTBSY
07938         case ETXTBSY: return DRFLAC_BUSY;
07939     #endif
07940     #ifdef EFBIG
07941         case EFBIG: return DRFLAC_TOO_BIG;
07942     #endif
07943     #ifdef ENOSPC
07944         case ENOSPC: return DRFLAC_NO_SPACE;
07945     #endif
07946     #ifdef ESPIPE
07947         case ESPIPE: return DRFLAC_BAD_SEEK;
07948     #endif
07949     #ifdef EROFS
07950         case EROFS: return DRFLAC_ACCESS_DENIED;
07951     #endif
07952     #ifdef EMLINK
07953         case EMLINK: return DRFLAC_TOO_MANY_LINKS;
07954     #endif
07955     #ifdef EPIPE
07956         case EPIPE: return DRFLAC_BAD_PIPE;
07957     #endif
07958     #ifdef EDOM
07959         case EDOM: return DRFLAC_OUT_OF_RANGE;
07960     #endif
07961     #ifdef ERANGE
07962         case ERANGE: return DRFLAC_OUT_OF_RANGE;
07963     #endif
07964     #ifdef EDEADLK
07965         case EDEADLK: return DRFLAC_DEADLOCK;
07966     #endif
07967     #ifdef ENAMETOOLONG
07968         case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
07969     #endif
07970     #ifdef ENOLCK
07971         case ENOLCK: return DRFLAC_ERROR;
07972     #endif
07973     #ifdef ENOSYS
07974         case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
07975     #endif
07976     #ifdef ENOTEMPTY
07977         case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
07978     #endif
07979     #ifdef ELOOP
07980         case ELOOP: return DRFLAC_TOO_MANY_LINKS;
07981     #endif
07982     #ifdef ENOMSG
07983         case ENOMSG: return DRFLAC_NO_MESSAGE;
07984     #endif
07985     #ifdef EIDRM
07986         case EIDRM: return DRFLAC_ERROR;
07987     #endif
07988     #ifdef ECHRNG
07989         case ECHRNG: return DRFLAC_ERROR;
07990     #endif
07991     #ifdef EL2NSYNC
07992         case EL2NSYNC: return DRFLAC_ERROR;
07993     #endif
07994     #ifdef EL3HLT
07995         case EL3HLT: return DRFLAC_ERROR;
07996     #endif
07997     #ifdef EL3RST
07998         case EL3RST: return DRFLAC_ERROR;
07999     #endif
08000     #ifdef ELNRNG
08001         case ELNRNG: return DRFLAC_OUT_OF_RANGE;
08002     #endif
08003     #ifdef EUNATCH
08004         case EUNATCH: return DRFLAC_ERROR;
08005     #endif
08006     #ifdef ENOCSI
08007         case ENOCSI: return DRFLAC_ERROR;
08008     #endif
08009     #ifdef EL2HLT
08010         case EL2HLT: return DRFLAC_ERROR;
08011     #endif
08012     #ifdef EBADE
08013         case EBADE: return DRFLAC_ERROR;
08014     #endif
08015     #ifdef EBADR
08016         case EBADR: return DRFLAC_ERROR;
08017     #endif
08018     #ifdef EXFULL
08019         case EXFULL: return DRFLAC_ERROR;
08020     #endif
08021     #ifdef ENOANO
08022         case ENOANO: return DRFLAC_ERROR;
08023     #endif
08024     #ifdef EBADRQC
08025         case EBADRQC: return DRFLAC_ERROR;
08026     #endif
08027     #ifdef EBADSLT
08028         case EBADSLT: return DRFLAC_ERROR;
08029     #endif
08030     #ifdef EBFONT
08031         case EBFONT: return DRFLAC_INVALID_FILE;
08032     #endif
08033     #ifdef ENOSTR
08034         case ENOSTR: return DRFLAC_ERROR;
08035     #endif
08036     #ifdef ENODATA
08037         case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
08038     #endif
08039     #ifdef ETIME
08040         case ETIME: return DRFLAC_TIMEOUT;
08041     #endif
08042     #ifdef ENOSR
08043         case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
08044     #endif
08045     #ifdef ENONET
08046         case ENONET: return DRFLAC_NO_NETWORK;
08047     #endif
08048     #ifdef ENOPKG
08049         case ENOPKG: return DRFLAC_ERROR;
08050     #endif
08051     #ifdef EREMOTE
08052         case EREMOTE: return DRFLAC_ERROR;
08053     #endif
08054     #ifdef ENOLINK
08055         case ENOLINK: return DRFLAC_ERROR;
08056     #endif
08057     #ifdef EADV
08058         case EADV: return DRFLAC_ERROR;
08059     #endif
08060     #ifdef ESRMNT
08061         case ESRMNT: return DRFLAC_ERROR;
08062     #endif
08063     #ifdef ECOMM
08064         case ECOMM: return DRFLAC_ERROR;
08065     #endif
08066     #ifdef EPROTO
08067         case EPROTO: return DRFLAC_ERROR;
08068     #endif
08069     #ifdef EMULTIHOP
08070         case EMULTIHOP: return DRFLAC_ERROR;
08071     #endif
08072     #ifdef EDOTDOT
08073         case EDOTDOT: return DRFLAC_ERROR;
08074     #endif
08075     #ifdef EBADMSG
08076         case EBADMSG: return DRFLAC_BAD_MESSAGE;
08077     #endif
08078     #ifdef EOVERFLOW
08079         case EOVERFLOW: return DRFLAC_TOO_BIG;
08080     #endif
08081     #ifdef ENOTUNIQ
08082         case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
08083     #endif
08084     #ifdef EBADFD
08085         case EBADFD: return DRFLAC_ERROR;
08086     #endif
08087     #ifdef EREMCHG
08088         case EREMCHG: return DRFLAC_ERROR;
08089     #endif
08090     #ifdef ELIBACC
08091         case ELIBACC: return DRFLAC_ACCESS_DENIED;
08092     #endif
08093     #ifdef ELIBBAD
08094         case ELIBBAD: return DRFLAC_INVALID_FILE;
08095     #endif
08096     #ifdef ELIBSCN
08097         case ELIBSCN: return DRFLAC_INVALID_FILE;
08098     #endif
08099     #ifdef ELIBMAX
08100         case ELIBMAX: return DRFLAC_ERROR;
08101     #endif
08102     #ifdef ELIBEXEC
08103         case ELIBEXEC: return DRFLAC_ERROR;
08104     #endif
08105     #ifdef EILSEQ
08106         case EILSEQ: return DRFLAC_INVALID_DATA;
08107     #endif
08108     #ifdef ERESTART
08109         case ERESTART: return DRFLAC_ERROR;
08110     #endif
08111     #ifdef ESTRPIPE
08112         case ESTRPIPE: return DRFLAC_ERROR;
08113     #endif
08114     #ifdef EUSERS
08115         case EUSERS: return DRFLAC_ERROR;
08116     #endif
08117     #ifdef ENOTSOCK
08118         case ENOTSOCK: return DRFLAC_NOT_SOCKET;
08119     #endif
08120     #ifdef EDESTADDRREQ
08121         case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
08122     #endif
08123     #ifdef EMSGSIZE
08124         case EMSGSIZE: return DRFLAC_TOO_BIG;
08125     #endif
08126     #ifdef EPROTOTYPE
08127         case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
08128     #endif
08129     #ifdef ENOPROTOOPT
08130         case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
08131     #endif
08132     #ifdef EPROTONOSUPPORT
08133         case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
08134     #endif
08135     #ifdef ESOCKTNOSUPPORT
08136         case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
08137     #endif
08138     #ifdef EOPNOTSUPP
08139         case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
08140     #endif
08141     #ifdef EPFNOSUPPORT
08142         case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
08143     #endif
08144     #ifdef EAFNOSUPPORT
08145         case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
08146     #endif
08147     #ifdef EADDRINUSE
08148         case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
08149     #endif
08150     #ifdef EADDRNOTAVAIL
08151         case EADDRNOTAVAIL: return DRFLAC_ERROR;
08152     #endif
08153     #ifdef ENETDOWN
08154         case ENETDOWN: return DRFLAC_NO_NETWORK;
08155     #endif
08156     #ifdef ENETUNREACH
08157         case ENETUNREACH: return DRFLAC_NO_NETWORK;
08158     #endif
08159     #ifdef ENETRESET
08160         case ENETRESET: return DRFLAC_NO_NETWORK;
08161     #endif
08162     #ifdef ECONNABORTED
08163         case ECONNABORTED: return DRFLAC_NO_NETWORK;
08164     #endif
08165     #ifdef ECONNRESET
08166         case ECONNRESET: return DRFLAC_CONNECTION_RESET;
08167     #endif
08168     #ifdef ENOBUFS
08169         case ENOBUFS: return DRFLAC_NO_SPACE;
08170     #endif
08171     #ifdef EISCONN
08172         case EISCONN: return DRFLAC_ALREADY_CONNECTED;
08173     #endif
08174     #ifdef ENOTCONN
08175         case ENOTCONN: return DRFLAC_NOT_CONNECTED;
08176     #endif
08177     #ifdef ESHUTDOWN
08178         case ESHUTDOWN: return DRFLAC_ERROR;
08179     #endif
08180     #ifdef ETOOMANYREFS
08181         case ETOOMANYREFS: return DRFLAC_ERROR;
08182     #endif
08183     #ifdef ETIMEDOUT
08184         case ETIMEDOUT: return DRFLAC_TIMEOUT;
08185     #endif
08186     #ifdef ECONNREFUSED
08187         case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
08188     #endif
08189     #ifdef EHOSTDOWN
08190         case EHOSTDOWN: return DRFLAC_NO_HOST;
08191     #endif
08192     #ifdef EHOSTUNREACH
08193         case EHOSTUNREACH: return DRFLAC_NO_HOST;
08194     #endif
08195     #ifdef EALREADY
08196         case EALREADY: return DRFLAC_IN_PROGRESS;
08197     #endif
08198     #ifdef EINPROGRESS
08199         case EINPROGRESS: return DRFLAC_IN_PROGRESS;
08200     #endif
08201     #ifdef ESTALE
08202         case ESTALE: return DRFLAC_INVALID_FILE;
08203     #endif
08204     #ifdef EUCLEAN
08205         case EUCLEAN: return DRFLAC_ERROR;
08206     #endif
08207     #ifdef ENOTNAM
08208         case ENOTNAM: return DRFLAC_ERROR;
08209     #endif
08210     #ifdef ENAVAIL
08211         case ENAVAIL: return DRFLAC_ERROR;
08212     #endif
08213     #ifdef EISNAM
08214         case EISNAM: return DRFLAC_ERROR;
08215     #endif
08216     #ifdef EREMOTEIO
08217         case EREMOTEIO: return DRFLAC_IO_ERROR;
08218     #endif
08219     #ifdef EDQUOT
08220         case EDQUOT: return DRFLAC_NO_SPACE;
08221     #endif
08222     #ifdef ENOMEDIUM
08223         case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
08224     #endif
08225     #ifdef EMEDIUMTYPE
08226         case EMEDIUMTYPE: return DRFLAC_ERROR;
08227     #endif
08228     #ifdef ECANCELED
08229         case ECANCELED: return DRFLAC_CANCELLED;
08230     #endif
08231     #ifdef ENOKEY
08232         case ENOKEY: return DRFLAC_ERROR;
08233     #endif
08234     #ifdef EKEYEXPIRED
08235         case EKEYEXPIRED: return DRFLAC_ERROR;
08236     #endif
08237     #ifdef EKEYREVOKED
08238         case EKEYREVOKED: return DRFLAC_ERROR;
08239     #endif
08240     #ifdef EKEYREJECTED
08241         case EKEYREJECTED: return DRFLAC_ERROR;
08242     #endif
08243     #ifdef EOWNERDEAD
08244         case EOWNERDEAD: return DRFLAC_ERROR;
08245     #endif
08246     #ifdef ENOTRECOVERABLE
08247         case ENOTRECOVERABLE: return DRFLAC_ERROR;
08248     #endif
08249     #ifdef ERFKILL
08250         case ERFKILL: return DRFLAC_ERROR;
08251     #endif
08252     #ifdef EHWPOISON
08253         case EHWPOISON: return DRFLAC_ERROR;
08254     #endif
08255         default: return DRFLAC_ERROR;
08256     }
08257 }
08258 
08259 static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
08260 {
08261 #if _MSC_VER && _MSC_VER >= 1400
08262     errno_t err;
08263 #endif
08264 
08265     if (ppFile != NULL) {
08266         *ppFile = NULL;  /* Safety. */
08267     }
08268 
08269     if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
08270         return DRFLAC_INVALID_ARGS;
08271     }
08272 
08273 #if _MSC_VER && _MSC_VER >= 1400
08274     err = fopen_s(ppFile, pFilePath, pOpenMode);
08275     if (err != 0) {
08276         return drflac_result_from_errno(err);
08277     }
08278 #else
08279 #if defined(_WIN32) || defined(__APPLE__)
08280     *ppFile = fopen(pFilePath, pOpenMode);
08281 #else
08282     #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
08283         *ppFile = fopen64(pFilePath, pOpenMode);
08284     #else
08285         *ppFile = fopen(pFilePath, pOpenMode);
08286     #endif
08287 #endif
08288     if (*ppFile == NULL) {
08289         drflac_result result = drflac_result_from_errno(errno);
08290         if (result == DRFLAC_SUCCESS) {
08291             result = DRFLAC_ERROR;   /* Just a safety check to make sure we never ever return success when pFile == NULL. */
08292         }
08293 
08294         return result;
08295     }
08296 #endif
08297 
08298     return DRFLAC_SUCCESS;
08299 }
08300 
08301 /*
08302 _wfopen() isn't always available in all compilation environments.
08303 
08304     * Windows only.
08305     * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
08306     * MinGW-64 (both 32- and 64-bit) seems to support it.
08307     * MinGW wraps it in !defined(__STRICT_ANSI__).
08308 
08309 This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
08310 fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
08311 */
08312 #if defined(_WIN32)
08313     #if defined(_MSC_VER) || defined(__MINGW64__) || !defined(__STRICT_ANSI__)
08314         #define DRFLAC_HAS_WFOPEN
08315     #endif
08316 #endif
08317 
08318 static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
08319 {
08320     if (ppFile != NULL) {
08321         *ppFile = NULL;  /* Safety. */
08322     }
08323 
08324     if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
08325         return DRFLAC_INVALID_ARGS;
08326     }
08327 
08328 #if defined(DRFLAC_HAS_WFOPEN)
08329     {
08330         /* Use _wfopen() on Windows. */
08331     #if defined(_MSC_VER) && _MSC_VER >= 1400
08332         errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
08333         if (err != 0) {
08334             return drflac_result_from_errno(err);
08335         }
08336     #else
08337         *ppFile = _wfopen(pFilePath, pOpenMode);
08338         if (*ppFile == NULL) {
08339             return drflac_result_from_errno(errno);
08340         }
08341     #endif
08342         (void)pAllocationCallbacks;
08343     }
08344 #else
08345     /*
08346     Use fopen() on anything other than Windows. Requires a conversion. This is annoying because fopen() is locale specific. The only real way I can
08347     think of to do this is with wcsrtombs(). Note that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
08348     maintaining state. I've checked this with -std=c89 and it works, but if somebody get's a compiler error I'll look into improving compatibility.
08349     */
08350     {
08351         mbstate_t mbs;
08352         size_t lenMB;
08353         const wchar_t* pFilePathTemp = pFilePath;
08354         char* pFilePathMB = NULL;
08355         char pOpenModeMB[32] = {0};
08356 
08357         /* Get the length first. */
08358         DRFLAC_ZERO_OBJECT(&mbs);
08359         lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
08360         if (lenMB == (size_t)-1) {
08361             return drflac_result_from_errno(errno);
08362         }
08363 
08364         pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
08365         if (pFilePathMB == NULL) {
08366             return DRFLAC_OUT_OF_MEMORY;
08367         }
08368 
08369         pFilePathTemp = pFilePath;
08370         DRFLAC_ZERO_OBJECT(&mbs);
08371         wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
08372 
08373         /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
08374         {
08375             size_t i = 0;
08376             for (;;) {
08377                 if (pOpenMode[i] == 0) {
08378                     pOpenModeMB[i] = '\0';
08379                     break;
08380                 }
08381 
08382                 pOpenModeMB[i] = (char)pOpenMode[i];
08383                 i += 1;
08384             }
08385         }
08386 
08387         *ppFile = fopen(pFilePathMB, pOpenModeMB);
08388 
08389         drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
08390     }
08391 
08392     if (*ppFile == NULL) {
08393         return DRFLAC_ERROR;
08394     }
08395 #endif
08396 
08397     return DRFLAC_SUCCESS;
08398 }
08399 
08400 static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
08401 {
08402     return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
08403 }
08404 
08405 static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
08406 {
08407     DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */
08408 
08409     return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
08410 }
08411 
08412 
08413 DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
08414 {
08415     drflac* pFlac;
08416     FILE* pFile;
08417 
08418     if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
08419         return NULL;
08420     }
08421 
08422     pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
08423     if (pFlac == NULL) {
08424         fclose(pFile);
08425         return NULL;
08426     }
08427 
08428     return pFlac;
08429 }
08430 
08431 DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
08432 {
08433     drflac* pFlac;
08434     FILE* pFile;
08435 
08436     if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
08437         return NULL;
08438     }
08439 
08440     pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
08441     if (pFlac == NULL) {
08442         fclose(pFile);
08443         return NULL;
08444     }
08445 
08446     return pFlac;
08447 }
08448 
08449 DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
08450 {
08451     drflac* pFlac;
08452     FILE* pFile;
08453 
08454     if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
08455         return NULL;
08456     }
08457 
08458     pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
08459     if (pFlac == NULL) {
08460         fclose(pFile);
08461         return pFlac;
08462     }
08463 
08464     return pFlac;
08465 }
08466 
08467 DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
08468 {
08469     drflac* pFlac;
08470     FILE* pFile;
08471 
08472     if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
08473         return NULL;
08474     }
08475 
08476     pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
08477     if (pFlac == NULL) {
08478         fclose(pFile);
08479         return pFlac;
08480     }
08481 
08482     return pFlac;
08483 }
08484 #endif  /* DR_FLAC_NO_STDIO */
08485 
08486 static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
08487 {
08488     drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
08489     size_t bytesRemaining;
08490 
08491     DRFLAC_ASSERT(memoryStream != NULL);
08492     DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
08493 
08494     bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
08495     if (bytesToRead > bytesRemaining) {
08496         bytesToRead = bytesRemaining;
08497     }
08498 
08499     if (bytesToRead > 0) {
08500         DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
08501         memoryStream->currentReadPos += bytesToRead;
08502     }
08503 
08504     return bytesToRead;
08505 }
08506 
08507 static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
08508 {
08509     drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
08510 
08511     DRFLAC_ASSERT(memoryStream != NULL);
08512     DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
08513 
08514     if (offset > (drflac_int64)memoryStream->dataSize) {
08515         return DRFLAC_FALSE;
08516     }
08517 
08518     if (origin == drflac_seek_origin_current) {
08519         if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
08520             memoryStream->currentReadPos += offset;
08521         } else {
08522             return DRFLAC_FALSE;  /* Trying to seek too far forward. */
08523         }
08524     } else {
08525         if ((drflac_uint32)offset <= memoryStream->dataSize) {
08526             memoryStream->currentReadPos = offset;
08527         } else {
08528             return DRFLAC_FALSE;  /* Trying to seek too far forward. */
08529         }
08530     }
08531 
08532     return DRFLAC_TRUE;
08533 }
08534 
08535 DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
08536 {
08537     drflac__memory_stream memoryStream;
08538     drflac* pFlac;
08539 
08540     memoryStream.data = (const drflac_uint8*)pData;
08541     memoryStream.dataSize = dataSize;
08542     memoryStream.currentReadPos = 0;
08543     pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
08544     if (pFlac == NULL) {
08545         return NULL;
08546     }
08547 
08548     pFlac->memoryStream = memoryStream;
08549 
08550     /* This is an awful hack... */
08551 #ifndef DR_FLAC_NO_OGG
08552     if (pFlac->container == drflac_container_ogg)
08553     {
08554         drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
08555         oggbs->pUserData = &pFlac->memoryStream;
08556     }
08557     else
08558 #endif
08559     {
08560         pFlac->bs.pUserData = &pFlac->memoryStream;
08561     }
08562 
08563     return pFlac;
08564 }
08565 
08566 DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
08567 {
08568     drflac__memory_stream memoryStream;
08569     drflac* pFlac;
08570 
08571     memoryStream.data = (const drflac_uint8*)pData;
08572     memoryStream.dataSize = dataSize;
08573     memoryStream.currentReadPos = 0;
08574     pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
08575     if (pFlac == NULL) {
08576         return NULL;
08577     }
08578 
08579     pFlac->memoryStream = memoryStream;
08580 
08581     /* This is an awful hack... */
08582 #ifndef DR_FLAC_NO_OGG
08583     if (pFlac->container == drflac_container_ogg)
08584     {
08585         drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
08586         oggbs->pUserData = &pFlac->memoryStream;
08587     }
08588     else
08589 #endif
08590     {
08591         pFlac->bs.pUserData = &pFlac->memoryStream;
08592     }
08593 
08594     return pFlac;
08595 }
08596 
08597 
08598 
08599 DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
08600 {
08601     return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
08602 }
08603 DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
08604 {
08605     return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
08606 }
08607 
08608 DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
08609 {
08610     return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
08611 }
08612 DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
08613 {
08614     return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
08615 }
08616 
08617 DRFLAC_API void drflac_close(drflac* pFlac)
08618 {
08619     if (pFlac == NULL) {
08620         return;
08621     }
08622 
08623 #ifndef DR_FLAC_NO_STDIO
08624     /*
08625     If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
08626     was used by looking at the callbacks.
08627     */
08628     if (pFlac->bs.onRead == drflac__on_read_stdio) {
08629         fclose((FILE*)pFlac->bs.pUserData);
08630     }
08631 
08632 #ifndef DR_FLAC_NO_OGG
08633     /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
08634     if (pFlac->container == drflac_container_ogg) {
08635         drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
08636         DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
08637 
08638         if (oggbs->onRead == drflac__on_read_stdio) {
08639             fclose((FILE*)oggbs->pUserData);
08640         }
08641     }
08642 #endif
08643 #endif
08644 
08645     drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
08646 }
08647 
08648 
08649 #if 0
08650 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08651 {
08652     drflac_uint64 i;
08653     for (i = 0; i < frameCount; ++i) {
08654         drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
08655         drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
08656         drflac_uint32 right = left - side;
08657 
08658         pOutputSamples[i*2+0] = (drflac_int32)left;
08659         pOutputSamples[i*2+1] = (drflac_int32)right;
08660     }
08661 }
08662 #endif
08663 
08664 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08665 {
08666     drflac_uint64 i;
08667     drflac_uint64 frameCount4 = frameCount >> 2;
08668     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
08669     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
08670     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08671     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
08672 
08673     for (i = 0; i < frameCount4; ++i) {
08674         drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
08675         drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
08676         drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
08677         drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
08678 
08679         drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
08680         drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
08681         drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
08682         drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
08683 
08684         drflac_uint32 right0 = left0 - side0;
08685         drflac_uint32 right1 = left1 - side1;
08686         drflac_uint32 right2 = left2 - side2;
08687         drflac_uint32 right3 = left3 - side3;
08688 
08689         pOutputSamples[i*8+0] = (drflac_int32)left0;
08690         pOutputSamples[i*8+1] = (drflac_int32)right0;
08691         pOutputSamples[i*8+2] = (drflac_int32)left1;
08692         pOutputSamples[i*8+3] = (drflac_int32)right1;
08693         pOutputSamples[i*8+4] = (drflac_int32)left2;
08694         pOutputSamples[i*8+5] = (drflac_int32)right2;
08695         pOutputSamples[i*8+6] = (drflac_int32)left3;
08696         pOutputSamples[i*8+7] = (drflac_int32)right3;
08697     }
08698 
08699     for (i = (frameCount4 << 2); i < frameCount; ++i) {
08700         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
08701         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
08702         drflac_uint32 right = left - side;
08703 
08704         pOutputSamples[i*2+0] = (drflac_int32)left;
08705         pOutputSamples[i*2+1] = (drflac_int32)right;
08706     }
08707 }
08708 
08709 #if defined(DRFLAC_SUPPORT_SSE2)
08710 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08711 {
08712     drflac_uint64 i;
08713     drflac_uint64 frameCount4 = frameCount >> 2;
08714     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
08715     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
08716     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08717     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
08718 
08719     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
08720 
08721     for (i = 0; i < frameCount4; ++i) {
08722         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
08723         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
08724         __m128i right = _mm_sub_epi32(left, side);
08725 
08726         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
08727         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
08728     }
08729 
08730     for (i = (frameCount4 << 2); i < frameCount; ++i) {
08731         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
08732         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
08733         drflac_uint32 right = left - side;
08734 
08735         pOutputSamples[i*2+0] = (drflac_int32)left;
08736         pOutputSamples[i*2+1] = (drflac_int32)right;
08737     }
08738 }
08739 #endif
08740 
08741 #if defined(DRFLAC_SUPPORT_NEON)
08742 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08743 {
08744     drflac_uint64 i;
08745     drflac_uint64 frameCount4 = frameCount >> 2;
08746     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
08747     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
08748     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08749     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
08750     int32x4_t shift0_4;
08751     int32x4_t shift1_4;
08752 
08753     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
08754 
08755     shift0_4 = vdupq_n_s32(shift0);
08756     shift1_4 = vdupq_n_s32(shift1);
08757 
08758     for (i = 0; i < frameCount4; ++i) {
08759         uint32x4_t left;
08760         uint32x4_t side;
08761         uint32x4_t right;
08762 
08763         left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
08764         side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
08765         right = vsubq_u32(left, side);
08766 
08767         drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
08768     }
08769 
08770     for (i = (frameCount4 << 2); i < frameCount; ++i) {
08771         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
08772         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
08773         drflac_uint32 right = left - side;
08774 
08775         pOutputSamples[i*2+0] = (drflac_int32)left;
08776         pOutputSamples[i*2+1] = (drflac_int32)right;
08777     }
08778 }
08779 #endif
08780 
08781 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08782 {
08783 #if defined(DRFLAC_SUPPORT_SSE2)
08784     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
08785         drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
08786     } else
08787 #elif defined(DRFLAC_SUPPORT_NEON)
08788     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
08789         drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
08790     } else
08791 #endif
08792     {
08793         /* Scalar fallback. */
08794 #if 0
08795         drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
08796 #else
08797         drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
08798 #endif
08799     }
08800 }
08801 
08802 
08803 #if 0
08804 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08805 {
08806     drflac_uint64 i;
08807     for (i = 0; i < frameCount; ++i) {
08808         drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
08809         drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
08810         drflac_uint32 left  = right + side;
08811 
08812         pOutputSamples[i*2+0] = (drflac_int32)left;
08813         pOutputSamples[i*2+1] = (drflac_int32)right;
08814     }
08815 }
08816 #endif
08817 
08818 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08819 {
08820     drflac_uint64 i;
08821     drflac_uint64 frameCount4 = frameCount >> 2;
08822     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
08823     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
08824     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08825     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
08826 
08827     for (i = 0; i < frameCount4; ++i) {
08828         drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
08829         drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
08830         drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
08831         drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;
08832 
08833         drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
08834         drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
08835         drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
08836         drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
08837 
08838         drflac_uint32 left0 = right0 + side0;
08839         drflac_uint32 left1 = right1 + side1;
08840         drflac_uint32 left2 = right2 + side2;
08841         drflac_uint32 left3 = right3 + side3;
08842 
08843         pOutputSamples[i*8+0] = (drflac_int32)left0;
08844         pOutputSamples[i*8+1] = (drflac_int32)right0;
08845         pOutputSamples[i*8+2] = (drflac_int32)left1;
08846         pOutputSamples[i*8+3] = (drflac_int32)right1;
08847         pOutputSamples[i*8+4] = (drflac_int32)left2;
08848         pOutputSamples[i*8+5] = (drflac_int32)right2;
08849         pOutputSamples[i*8+6] = (drflac_int32)left3;
08850         pOutputSamples[i*8+7] = (drflac_int32)right3;
08851     }
08852 
08853     for (i = (frameCount4 << 2); i < frameCount; ++i) {
08854         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
08855         drflac_uint32 right = pInputSamples1U32[i] << shift1;
08856         drflac_uint32 left  = right + side;
08857 
08858         pOutputSamples[i*2+0] = (drflac_int32)left;
08859         pOutputSamples[i*2+1] = (drflac_int32)right;
08860     }
08861 }
08862 
08863 #if defined(DRFLAC_SUPPORT_SSE2)
08864 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08865 {
08866     drflac_uint64 i;
08867     drflac_uint64 frameCount4 = frameCount >> 2;
08868     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
08869     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
08870     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08871     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
08872 
08873     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
08874 
08875     for (i = 0; i < frameCount4; ++i) {
08876         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
08877         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
08878         __m128i left  = _mm_add_epi32(right, side);
08879 
08880         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
08881         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
08882     }
08883 
08884     for (i = (frameCount4 << 2); i < frameCount; ++i) {
08885         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
08886         drflac_uint32 right = pInputSamples1U32[i] << shift1;
08887         drflac_uint32 left  = right + side;
08888 
08889         pOutputSamples[i*2+0] = (drflac_int32)left;
08890         pOutputSamples[i*2+1] = (drflac_int32)right;
08891     }
08892 }
08893 #endif
08894 
08895 #if defined(DRFLAC_SUPPORT_NEON)
08896 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08897 {
08898     drflac_uint64 i;
08899     drflac_uint64 frameCount4 = frameCount >> 2;
08900     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
08901     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
08902     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08903     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
08904     int32x4_t shift0_4;
08905     int32x4_t shift1_4;
08906 
08907     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
08908 
08909     shift0_4 = vdupq_n_s32(shift0);
08910     shift1_4 = vdupq_n_s32(shift1);
08911 
08912     for (i = 0; i < frameCount4; ++i) {
08913         uint32x4_t side;
08914         uint32x4_t right;
08915         uint32x4_t left;
08916 
08917         side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
08918         right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
08919         left  = vaddq_u32(right, side);
08920 
08921         drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
08922     }
08923 
08924     for (i = (frameCount4 << 2); i < frameCount; ++i) {
08925         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
08926         drflac_uint32 right = pInputSamples1U32[i] << shift1;
08927         drflac_uint32 left  = right + side;
08928 
08929         pOutputSamples[i*2+0] = (drflac_int32)left;
08930         pOutputSamples[i*2+1] = (drflac_int32)right;
08931     }
08932 }
08933 #endif
08934 
08935 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08936 {
08937 #if defined(DRFLAC_SUPPORT_SSE2)
08938     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
08939         drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
08940     } else
08941 #elif defined(DRFLAC_SUPPORT_NEON)
08942     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
08943         drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
08944     } else
08945 #endif
08946     {
08947         /* Scalar fallback. */
08948 #if 0
08949         drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
08950 #else
08951         drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
08952 #endif
08953     }
08954 }
08955 
08956 
08957 #if 0
08958 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08959 {
08960     for (drflac_uint64 i = 0; i < frameCount; ++i) {
08961         drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08962         drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
08963 
08964         mid = (mid << 1) | (side & 0x01);
08965 
08966         pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
08967         pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
08968     }
08969 }
08970 #endif
08971 
08972 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
08973 {
08974     drflac_uint64 i;
08975     drflac_uint64 frameCount4 = frameCount >> 2;
08976     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
08977     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
08978     drflac_int32 shift = unusedBitsPerSample;
08979 
08980     if (shift > 0) {
08981         shift -= 1;
08982         for (i = 0; i < frameCount4; ++i) {
08983             drflac_uint32 temp0L;
08984             drflac_uint32 temp1L;
08985             drflac_uint32 temp2L;
08986             drflac_uint32 temp3L;
08987             drflac_uint32 temp0R;
08988             drflac_uint32 temp1R;
08989             drflac_uint32 temp2R;
08990             drflac_uint32 temp3R;
08991 
08992             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08993             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08994             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08995             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
08996 
08997             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
08998             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
08999             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09000             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09001 
09002             mid0 = (mid0 << 1) | (side0 & 0x01);
09003             mid1 = (mid1 << 1) | (side1 & 0x01);
09004             mid2 = (mid2 << 1) | (side2 & 0x01);
09005             mid3 = (mid3 << 1) | (side3 & 0x01);
09006 
09007             temp0L = (mid0 + side0) << shift;
09008             temp1L = (mid1 + side1) << shift;
09009             temp2L = (mid2 + side2) << shift;
09010             temp3L = (mid3 + side3) << shift;
09011 
09012             temp0R = (mid0 - side0) << shift;
09013             temp1R = (mid1 - side1) << shift;
09014             temp2R = (mid2 - side2) << shift;
09015             temp3R = (mid3 - side3) << shift;
09016 
09017             pOutputSamples[i*8+0] = (drflac_int32)temp0L;
09018             pOutputSamples[i*8+1] = (drflac_int32)temp0R;
09019             pOutputSamples[i*8+2] = (drflac_int32)temp1L;
09020             pOutputSamples[i*8+3] = (drflac_int32)temp1R;
09021             pOutputSamples[i*8+4] = (drflac_int32)temp2L;
09022             pOutputSamples[i*8+5] = (drflac_int32)temp2R;
09023             pOutputSamples[i*8+6] = (drflac_int32)temp3L;
09024             pOutputSamples[i*8+7] = (drflac_int32)temp3R;
09025         }
09026     } else {
09027         for (i = 0; i < frameCount4; ++i) {
09028             drflac_uint32 temp0L;
09029             drflac_uint32 temp1L;
09030             drflac_uint32 temp2L;
09031             drflac_uint32 temp3L;
09032             drflac_uint32 temp0R;
09033             drflac_uint32 temp1R;
09034             drflac_uint32 temp2R;
09035             drflac_uint32 temp3R;
09036 
09037             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09038             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09039             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09040             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09041 
09042             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09043             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09044             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09045             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09046 
09047             mid0 = (mid0 << 1) | (side0 & 0x01);
09048             mid1 = (mid1 << 1) | (side1 & 0x01);
09049             mid2 = (mid2 << 1) | (side2 & 0x01);
09050             mid3 = (mid3 << 1) | (side3 & 0x01);
09051 
09052             temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
09053             temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
09054             temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
09055             temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
09056 
09057             temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
09058             temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
09059             temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
09060             temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
09061 
09062             pOutputSamples[i*8+0] = (drflac_int32)temp0L;
09063             pOutputSamples[i*8+1] = (drflac_int32)temp0R;
09064             pOutputSamples[i*8+2] = (drflac_int32)temp1L;
09065             pOutputSamples[i*8+3] = (drflac_int32)temp1R;
09066             pOutputSamples[i*8+4] = (drflac_int32)temp2L;
09067             pOutputSamples[i*8+5] = (drflac_int32)temp2R;
09068             pOutputSamples[i*8+6] = (drflac_int32)temp3L;
09069             pOutputSamples[i*8+7] = (drflac_int32)temp3R;
09070         }
09071     }
09072 
09073     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09074         drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09075         drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09076 
09077         mid = (mid << 1) | (side & 0x01);
09078 
09079         pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
09080         pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
09081     }
09082 }
09083 
09084 #if defined(DRFLAC_SUPPORT_SSE2)
09085 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
09086 {
09087     drflac_uint64 i;
09088     drflac_uint64 frameCount4 = frameCount >> 2;
09089     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09090     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09091     drflac_int32 shift = unusedBitsPerSample;
09092 
09093     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
09094 
09095     if (shift == 0) {
09096         for (i = 0; i < frameCount4; ++i) {
09097             __m128i mid;
09098             __m128i side;
09099             __m128i left;
09100             __m128i right;
09101 
09102             mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
09103             side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
09104 
09105             mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
09106 
09107             left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
09108             right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
09109 
09110             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
09111             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
09112         }
09113 
09114         for (i = (frameCount4 << 2); i < frameCount; ++i) {
09115             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09116             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09117 
09118             mid = (mid << 1) | (side & 0x01);
09119 
09120             pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
09121             pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
09122         }
09123     } else {
09124         shift -= 1;
09125         for (i = 0; i < frameCount4; ++i) {
09126             __m128i mid;
09127             __m128i side;
09128             __m128i left;
09129             __m128i right;
09130 
09131             mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
09132             side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
09133 
09134             mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
09135 
09136             left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
09137             right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
09138 
09139             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
09140             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
09141         }
09142 
09143         for (i = (frameCount4 << 2); i < frameCount; ++i) {
09144             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09145             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09146 
09147             mid = (mid << 1) | (side & 0x01);
09148 
09149             pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
09150             pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
09151         }
09152     }
09153 }
09154 #endif
09155 
09156 #if defined(DRFLAC_SUPPORT_NEON)
09157 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
09158 {
09159     drflac_uint64 i;
09160     drflac_uint64 frameCount4 = frameCount >> 2;
09161     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09162     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09163     drflac_int32 shift = unusedBitsPerSample;
09164     int32x4_t  wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
09165     int32x4_t  wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
09166     uint32x4_t one4;
09167 
09168     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
09169 
09170     wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
09171     wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
09172     one4         = vdupq_n_u32(1);
09173 
09174     if (shift == 0) {
09175         for (i = 0; i < frameCount4; ++i) {
09176             uint32x4_t mid;
09177             uint32x4_t side;
09178             int32x4_t left;
09179             int32x4_t right;
09180 
09181             mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
09182             side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
09183 
09184             mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
09185 
09186             left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
09187             right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
09188 
09189             drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
09190         }
09191 
09192         for (i = (frameCount4 << 2); i < frameCount; ++i) {
09193             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09194             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09195 
09196             mid = (mid << 1) | (side & 0x01);
09197 
09198             pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
09199             pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
09200         }
09201     } else {
09202         int32x4_t shift4;
09203 
09204         shift -= 1;
09205         shift4 = vdupq_n_s32(shift);
09206 
09207         for (i = 0; i < frameCount4; ++i) {
09208             uint32x4_t mid;
09209             uint32x4_t side;
09210             int32x4_t left;
09211             int32x4_t right;
09212 
09213             mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
09214             side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
09215 
09216             mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
09217 
09218             left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
09219             right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
09220 
09221             drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
09222         }
09223 
09224         for (i = (frameCount4 << 2); i < frameCount; ++i) {
09225             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09226             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09227 
09228             mid = (mid << 1) | (side & 0x01);
09229 
09230             pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
09231             pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
09232         }
09233     }
09234 }
09235 #endif
09236 
09237 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
09238 {
09239 #if defined(DRFLAC_SUPPORT_SSE2)
09240     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
09241         drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09242     } else
09243 #elif defined(DRFLAC_SUPPORT_NEON)
09244     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
09245         drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09246     } else
09247 #endif
09248     {
09249         /* Scalar fallback. */
09250 #if 0
09251         drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09252 #else
09253         drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09254 #endif
09255     }
09256 }
09257 
09258 
09259 #if 0
09260 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
09261 {
09262     for (drflac_uint64 i = 0; i < frameCount; ++i) {
09263         pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
09264         pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
09265     }
09266 }
09267 #endif
09268 
09269 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
09270 {
09271     drflac_uint64 i;
09272     drflac_uint64 frameCount4 = frameCount >> 2;
09273     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09274     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09275     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09276     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09277 
09278     for (i = 0; i < frameCount4; ++i) {
09279         drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
09280         drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
09281         drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
09282         drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
09283 
09284         drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
09285         drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
09286         drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
09287         drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
09288 
09289         pOutputSamples[i*8+0] = (drflac_int32)tempL0;
09290         pOutputSamples[i*8+1] = (drflac_int32)tempR0;
09291         pOutputSamples[i*8+2] = (drflac_int32)tempL1;
09292         pOutputSamples[i*8+3] = (drflac_int32)tempR1;
09293         pOutputSamples[i*8+4] = (drflac_int32)tempL2;
09294         pOutputSamples[i*8+5] = (drflac_int32)tempR2;
09295         pOutputSamples[i*8+6] = (drflac_int32)tempL3;
09296         pOutputSamples[i*8+7] = (drflac_int32)tempR3;
09297     }
09298 
09299     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09300         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
09301         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
09302     }
09303 }
09304 
09305 #if defined(DRFLAC_SUPPORT_SSE2)
09306 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
09307 {
09308     drflac_uint64 i;
09309     drflac_uint64 frameCount4 = frameCount >> 2;
09310     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09311     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09312     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09313     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09314 
09315     for (i = 0; i < frameCount4; ++i) {
09316         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
09317         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
09318 
09319         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
09320         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
09321     }
09322 
09323     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09324         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
09325         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
09326     }
09327 }
09328 #endif
09329 
09330 #if defined(DRFLAC_SUPPORT_NEON)
09331 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
09332 {
09333     drflac_uint64 i;
09334     drflac_uint64 frameCount4 = frameCount >> 2;
09335     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09336     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09337     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09338     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09339 
09340     int32x4_t shift4_0 = vdupq_n_s32(shift0);
09341     int32x4_t shift4_1 = vdupq_n_s32(shift1);
09342 
09343     for (i = 0; i < frameCount4; ++i) {
09344         int32x4_t left;
09345         int32x4_t right;
09346 
09347         left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
09348         right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));
09349 
09350         drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
09351     }
09352 
09353     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09354         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
09355         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
09356     }
09357 }
09358 #endif
09359 
09360 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
09361 {
09362 #if defined(DRFLAC_SUPPORT_SSE2)
09363     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
09364         drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09365     } else
09366 #elif defined(DRFLAC_SUPPORT_NEON)
09367     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
09368         drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09369     } else
09370 #endif
09371     {
09372         /* Scalar fallback. */
09373 #if 0
09374         drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09375 #else
09376         drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09377 #endif
09378     }
09379 }
09380 
09381 
09382 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
09383 {
09384     drflac_uint64 framesRead;
09385     drflac_uint32 unusedBitsPerSample;
09386 
09387     if (pFlac == NULL || framesToRead == 0) {
09388         return 0;
09389     }
09390 
09391     if (pBufferOut == NULL) {
09392         return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
09393     }
09394 
09395     DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
09396     unusedBitsPerSample = 32 - pFlac->bitsPerSample;
09397 
09398     framesRead = 0;
09399     while (framesToRead > 0) {
09400         /* If we've run out of samples in this frame, go to the next. */
09401         if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
09402             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
09403                 break;  /* Couldn't read the next frame, so just break from the loop and return. */
09404             }
09405         } else {
09406             unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
09407             drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
09408             drflac_uint64 frameCountThisIteration = framesToRead;
09409 
09410             if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
09411                 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
09412             }
09413 
09414             if (channelCount == 2) {
09415                 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
09416                 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
09417 
09418                 switch (pFlac->currentFLACFrame.header.channelAssignment)
09419                 {
09420                     case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
09421                     {
09422                         drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
09423                     } break;
09424 
09425                     case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
09426                     {
09427                         drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
09428                     } break;
09429 
09430                     case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
09431                     {
09432                         drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
09433                     } break;
09434 
09435                     case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
09436                     default:
09437                     {
09438                         drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
09439                     } break;
09440                 }
09441             } else {
09442                 /* Generic interleaving. */
09443                 drflac_uint64 i;
09444                 for (i = 0; i < frameCountThisIteration; ++i) {
09445                     unsigned int j;
09446                     for (j = 0; j < channelCount; ++j) {
09447                         pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
09448                     }
09449                 }
09450             }
09451 
09452             framesRead                += frameCountThisIteration;
09453             pBufferOut                += frameCountThisIteration * channelCount;
09454             framesToRead              -= frameCountThisIteration;
09455             pFlac->currentPCMFrame    += frameCountThisIteration;
09456             pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
09457         }
09458     }
09459 
09460     return framesRead;
09461 }
09462 
09463 
09464 #if 0
09465 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09466 {
09467     drflac_uint64 i;
09468     for (i = 0; i < frameCount; ++i) {
09469         drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
09470         drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
09471         drflac_uint32 right = left - side;
09472 
09473         left  >>= 16;
09474         right >>= 16;
09475 
09476         pOutputSamples[i*2+0] = (drflac_int16)left;
09477         pOutputSamples[i*2+1] = (drflac_int16)right;
09478     }
09479 }
09480 #endif
09481 
09482 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09483 {
09484     drflac_uint64 i;
09485     drflac_uint64 frameCount4 = frameCount >> 2;
09486     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09487     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09488     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09489     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09490 
09491     for (i = 0; i < frameCount4; ++i) {
09492         drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
09493         drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
09494         drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
09495         drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
09496 
09497         drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
09498         drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
09499         drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
09500         drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
09501 
09502         drflac_uint32 right0 = left0 - side0;
09503         drflac_uint32 right1 = left1 - side1;
09504         drflac_uint32 right2 = left2 - side2;
09505         drflac_uint32 right3 = left3 - side3;
09506 
09507         left0  >>= 16;
09508         left1  >>= 16;
09509         left2  >>= 16;
09510         left3  >>= 16;
09511 
09512         right0 >>= 16;
09513         right1 >>= 16;
09514         right2 >>= 16;
09515         right3 >>= 16;
09516 
09517         pOutputSamples[i*8+0] = (drflac_int16)left0;
09518         pOutputSamples[i*8+1] = (drflac_int16)right0;
09519         pOutputSamples[i*8+2] = (drflac_int16)left1;
09520         pOutputSamples[i*8+3] = (drflac_int16)right1;
09521         pOutputSamples[i*8+4] = (drflac_int16)left2;
09522         pOutputSamples[i*8+5] = (drflac_int16)right2;
09523         pOutputSamples[i*8+6] = (drflac_int16)left3;
09524         pOutputSamples[i*8+7] = (drflac_int16)right3;
09525     }
09526 
09527     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09528         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
09529         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
09530         drflac_uint32 right = left - side;
09531 
09532         left  >>= 16;
09533         right >>= 16;
09534 
09535         pOutputSamples[i*2+0] = (drflac_int16)left;
09536         pOutputSamples[i*2+1] = (drflac_int16)right;
09537     }
09538 }
09539 
09540 #if defined(DRFLAC_SUPPORT_SSE2)
09541 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09542 {
09543     drflac_uint64 i;
09544     drflac_uint64 frameCount4 = frameCount >> 2;
09545     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09546     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09547     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09548     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09549 
09550     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
09551 
09552     for (i = 0; i < frameCount4; ++i) {
09553         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
09554         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
09555         __m128i right = _mm_sub_epi32(left, side);
09556 
09557         left  = _mm_srai_epi32(left,  16);
09558         right = _mm_srai_epi32(right, 16);
09559 
09560         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
09561     }
09562 
09563     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09564         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
09565         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
09566         drflac_uint32 right = left - side;
09567 
09568         left  >>= 16;
09569         right >>= 16;
09570 
09571         pOutputSamples[i*2+0] = (drflac_int16)left;
09572         pOutputSamples[i*2+1] = (drflac_int16)right;
09573     }
09574 }
09575 #endif
09576 
09577 #if defined(DRFLAC_SUPPORT_NEON)
09578 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09579 {
09580     drflac_uint64 i;
09581     drflac_uint64 frameCount4 = frameCount >> 2;
09582     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09583     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09584     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09585     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09586     int32x4_t shift0_4;
09587     int32x4_t shift1_4;
09588 
09589     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
09590 
09591     shift0_4 = vdupq_n_s32(shift0);
09592     shift1_4 = vdupq_n_s32(shift1);
09593 
09594     for (i = 0; i < frameCount4; ++i) {
09595         uint32x4_t left;
09596         uint32x4_t side;
09597         uint32x4_t right;
09598 
09599         left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
09600         side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
09601         right = vsubq_u32(left, side);
09602 
09603         left  = vshrq_n_u32(left,  16);
09604         right = vshrq_n_u32(right, 16);
09605 
09606         drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
09607     }
09608 
09609     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09610         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
09611         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
09612         drflac_uint32 right = left - side;
09613 
09614         left  >>= 16;
09615         right >>= 16;
09616 
09617         pOutputSamples[i*2+0] = (drflac_int16)left;
09618         pOutputSamples[i*2+1] = (drflac_int16)right;
09619     }
09620 }
09621 #endif
09622 
09623 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09624 {
09625 #if defined(DRFLAC_SUPPORT_SSE2)
09626     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
09627         drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09628     } else
09629 #elif defined(DRFLAC_SUPPORT_NEON)
09630     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
09631         drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09632     } else
09633 #endif
09634     {
09635         /* Scalar fallback. */
09636 #if 0
09637         drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09638 #else
09639         drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09640 #endif
09641     }
09642 }
09643 
09644 
09645 #if 0
09646 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09647 {
09648     drflac_uint64 i;
09649     for (i = 0; i < frameCount; ++i) {
09650         drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
09651         drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
09652         drflac_uint32 left  = right + side;
09653 
09654         left  >>= 16;
09655         right >>= 16;
09656 
09657         pOutputSamples[i*2+0] = (drflac_int16)left;
09658         pOutputSamples[i*2+1] = (drflac_int16)right;
09659     }
09660 }
09661 #endif
09662 
09663 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09664 {
09665     drflac_uint64 i;
09666     drflac_uint64 frameCount4 = frameCount >> 2;
09667     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09668     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09669     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09670     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09671 
09672     for (i = 0; i < frameCount4; ++i) {
09673         drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
09674         drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
09675         drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
09676         drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;
09677 
09678         drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
09679         drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
09680         drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
09681         drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
09682 
09683         drflac_uint32 left0 = right0 + side0;
09684         drflac_uint32 left1 = right1 + side1;
09685         drflac_uint32 left2 = right2 + side2;
09686         drflac_uint32 left3 = right3 + side3;
09687 
09688         left0  >>= 16;
09689         left1  >>= 16;
09690         left2  >>= 16;
09691         left3  >>= 16;
09692 
09693         right0 >>= 16;
09694         right1 >>= 16;
09695         right2 >>= 16;
09696         right3 >>= 16;
09697 
09698         pOutputSamples[i*8+0] = (drflac_int16)left0;
09699         pOutputSamples[i*8+1] = (drflac_int16)right0;
09700         pOutputSamples[i*8+2] = (drflac_int16)left1;
09701         pOutputSamples[i*8+3] = (drflac_int16)right1;
09702         pOutputSamples[i*8+4] = (drflac_int16)left2;
09703         pOutputSamples[i*8+5] = (drflac_int16)right2;
09704         pOutputSamples[i*8+6] = (drflac_int16)left3;
09705         pOutputSamples[i*8+7] = (drflac_int16)right3;
09706     }
09707 
09708     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09709         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
09710         drflac_uint32 right = pInputSamples1U32[i] << shift1;
09711         drflac_uint32 left  = right + side;
09712 
09713         left  >>= 16;
09714         right >>= 16;
09715 
09716         pOutputSamples[i*2+0] = (drflac_int16)left;
09717         pOutputSamples[i*2+1] = (drflac_int16)right;
09718     }
09719 }
09720 
09721 #if defined(DRFLAC_SUPPORT_SSE2)
09722 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09723 {
09724     drflac_uint64 i;
09725     drflac_uint64 frameCount4 = frameCount >> 2;
09726     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09727     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09728     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09729     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09730 
09731     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
09732 
09733     for (i = 0; i < frameCount4; ++i) {
09734         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
09735         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
09736         __m128i left  = _mm_add_epi32(right, side);
09737 
09738         left  = _mm_srai_epi32(left,  16);
09739         right = _mm_srai_epi32(right, 16);
09740 
09741         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
09742     }
09743 
09744     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09745         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
09746         drflac_uint32 right = pInputSamples1U32[i] << shift1;
09747         drflac_uint32 left  = right + side;
09748 
09749         left  >>= 16;
09750         right >>= 16;
09751 
09752         pOutputSamples[i*2+0] = (drflac_int16)left;
09753         pOutputSamples[i*2+1] = (drflac_int16)right;
09754     }
09755 }
09756 #endif
09757 
09758 #if defined(DRFLAC_SUPPORT_NEON)
09759 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09760 {
09761     drflac_uint64 i;
09762     drflac_uint64 frameCount4 = frameCount >> 2;
09763     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09764     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09765     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09766     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09767     int32x4_t shift0_4;
09768     int32x4_t shift1_4;
09769 
09770     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
09771 
09772     shift0_4 = vdupq_n_s32(shift0);
09773     shift1_4 = vdupq_n_s32(shift1);
09774 
09775     for (i = 0; i < frameCount4; ++i) {
09776         uint32x4_t side;
09777         uint32x4_t right;
09778         uint32x4_t left;
09779 
09780         side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
09781         right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
09782         left  = vaddq_u32(right, side);
09783 
09784         left  = vshrq_n_u32(left,  16);
09785         right = vshrq_n_u32(right, 16);
09786 
09787         drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
09788     }
09789 
09790     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09791         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
09792         drflac_uint32 right = pInputSamples1U32[i] << shift1;
09793         drflac_uint32 left  = right + side;
09794 
09795         left  >>= 16;
09796         right >>= 16;
09797 
09798         pOutputSamples[i*2+0] = (drflac_int16)left;
09799         pOutputSamples[i*2+1] = (drflac_int16)right;
09800     }
09801 }
09802 #endif
09803 
09804 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09805 {
09806 #if defined(DRFLAC_SUPPORT_SSE2)
09807     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
09808         drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09809     } else
09810 #elif defined(DRFLAC_SUPPORT_NEON)
09811     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
09812         drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09813     } else
09814 #endif
09815     {
09816         /* Scalar fallback. */
09817 #if 0
09818         drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09819 #else
09820         drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
09821 #endif
09822     }
09823 }
09824 
09825 
09826 #if 0
09827 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09828 {
09829     for (drflac_uint64 i = 0; i < frameCount; ++i) {
09830         drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09831         drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09832 
09833         mid = (mid << 1) | (side & 0x01);
09834 
09835         pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
09836         pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
09837     }
09838 }
09839 #endif
09840 
09841 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09842 {
09843     drflac_uint64 i;
09844     drflac_uint64 frameCount4 = frameCount >> 2;
09845     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09846     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09847     drflac_uint32 shift = unusedBitsPerSample;
09848 
09849     if (shift > 0) {
09850         shift -= 1;
09851         for (i = 0; i < frameCount4; ++i) {
09852             drflac_uint32 temp0L;
09853             drflac_uint32 temp1L;
09854             drflac_uint32 temp2L;
09855             drflac_uint32 temp3L;
09856             drflac_uint32 temp0R;
09857             drflac_uint32 temp1R;
09858             drflac_uint32 temp2R;
09859             drflac_uint32 temp3R;
09860 
09861             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09862             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09863             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09864             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09865 
09866             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09867             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09868             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09869             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09870 
09871             mid0 = (mid0 << 1) | (side0 & 0x01);
09872             mid1 = (mid1 << 1) | (side1 & 0x01);
09873             mid2 = (mid2 << 1) | (side2 & 0x01);
09874             mid3 = (mid3 << 1) | (side3 & 0x01);
09875 
09876             temp0L = (mid0 + side0) << shift;
09877             temp1L = (mid1 + side1) << shift;
09878             temp2L = (mid2 + side2) << shift;
09879             temp3L = (mid3 + side3) << shift;
09880 
09881             temp0R = (mid0 - side0) << shift;
09882             temp1R = (mid1 - side1) << shift;
09883             temp2R = (mid2 - side2) << shift;
09884             temp3R = (mid3 - side3) << shift;
09885 
09886             temp0L >>= 16;
09887             temp1L >>= 16;
09888             temp2L >>= 16;
09889             temp3L >>= 16;
09890 
09891             temp0R >>= 16;
09892             temp1R >>= 16;
09893             temp2R >>= 16;
09894             temp3R >>= 16;
09895 
09896             pOutputSamples[i*8+0] = (drflac_int16)temp0L;
09897             pOutputSamples[i*8+1] = (drflac_int16)temp0R;
09898             pOutputSamples[i*8+2] = (drflac_int16)temp1L;
09899             pOutputSamples[i*8+3] = (drflac_int16)temp1R;
09900             pOutputSamples[i*8+4] = (drflac_int16)temp2L;
09901             pOutputSamples[i*8+5] = (drflac_int16)temp2R;
09902             pOutputSamples[i*8+6] = (drflac_int16)temp3L;
09903             pOutputSamples[i*8+7] = (drflac_int16)temp3R;
09904         }
09905     } else {
09906         for (i = 0; i < frameCount4; ++i) {
09907             drflac_uint32 temp0L;
09908             drflac_uint32 temp1L;
09909             drflac_uint32 temp2L;
09910             drflac_uint32 temp3L;
09911             drflac_uint32 temp0R;
09912             drflac_uint32 temp1R;
09913             drflac_uint32 temp2R;
09914             drflac_uint32 temp3R;
09915 
09916             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09917             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09918             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09919             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09920 
09921             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09922             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09923             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09924             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09925 
09926             mid0 = (mid0 << 1) | (side0 & 0x01);
09927             mid1 = (mid1 << 1) | (side1 & 0x01);
09928             mid2 = (mid2 << 1) | (side2 & 0x01);
09929             mid3 = (mid3 << 1) | (side3 & 0x01);
09930 
09931             temp0L = ((drflac_int32)(mid0 + side0) >> 1);
09932             temp1L = ((drflac_int32)(mid1 + side1) >> 1);
09933             temp2L = ((drflac_int32)(mid2 + side2) >> 1);
09934             temp3L = ((drflac_int32)(mid3 + side3) >> 1);
09935 
09936             temp0R = ((drflac_int32)(mid0 - side0) >> 1);
09937             temp1R = ((drflac_int32)(mid1 - side1) >> 1);
09938             temp2R = ((drflac_int32)(mid2 - side2) >> 1);
09939             temp3R = ((drflac_int32)(mid3 - side3) >> 1);
09940 
09941             temp0L >>= 16;
09942             temp1L >>= 16;
09943             temp2L >>= 16;
09944             temp3L >>= 16;
09945 
09946             temp0R >>= 16;
09947             temp1R >>= 16;
09948             temp2R >>= 16;
09949             temp3R >>= 16;
09950 
09951             pOutputSamples[i*8+0] = (drflac_int16)temp0L;
09952             pOutputSamples[i*8+1] = (drflac_int16)temp0R;
09953             pOutputSamples[i*8+2] = (drflac_int16)temp1L;
09954             pOutputSamples[i*8+3] = (drflac_int16)temp1R;
09955             pOutputSamples[i*8+4] = (drflac_int16)temp2L;
09956             pOutputSamples[i*8+5] = (drflac_int16)temp2R;
09957             pOutputSamples[i*8+6] = (drflac_int16)temp3L;
09958             pOutputSamples[i*8+7] = (drflac_int16)temp3R;
09959         }
09960     }
09961 
09962     for (i = (frameCount4 << 2); i < frameCount; ++i) {
09963         drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
09964         drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
09965 
09966         mid = (mid << 1) | (side & 0x01);
09967 
09968         pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
09969         pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
09970     }
09971 }
09972 
09973 #if defined(DRFLAC_SUPPORT_SSE2)
09974 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
09975 {
09976     drflac_uint64 i;
09977     drflac_uint64 frameCount4 = frameCount >> 2;
09978     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
09979     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
09980     drflac_uint32 shift = unusedBitsPerSample;
09981 
09982     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
09983 
09984     if (shift == 0) {
09985         for (i = 0; i < frameCount4; ++i) {
09986             __m128i mid;
09987             __m128i side;
09988             __m128i left;
09989             __m128i right;
09990 
09991             mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
09992             side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
09993 
09994             mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
09995 
09996             left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
09997             right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
09998 
09999             left  = _mm_srai_epi32(left,  16);
10000             right = _mm_srai_epi32(right, 16);
10001 
10002             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10003         }
10004 
10005         for (i = (frameCount4 << 2); i < frameCount; ++i) {
10006             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10007             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10008 
10009             mid = (mid << 1) | (side & 0x01);
10010 
10011             pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10012             pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10013         }
10014     } else {
10015         shift -= 1;
10016         for (i = 0; i < frameCount4; ++i) {
10017             __m128i mid;
10018             __m128i side;
10019             __m128i left;
10020             __m128i right;
10021 
10022             mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10023             side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10024 
10025             mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10026 
10027             left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
10028             right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
10029 
10030             left  = _mm_srai_epi32(left,  16);
10031             right = _mm_srai_epi32(right, 16);
10032 
10033             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10034         }
10035 
10036         for (i = (frameCount4 << 2); i < frameCount; ++i) {
10037             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10038             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10039 
10040             mid = (mid << 1) | (side & 0x01);
10041 
10042             pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10043             pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
10044         }
10045     }
10046 }
10047 #endif
10048 
10049 #if defined(DRFLAC_SUPPORT_NEON)
10050 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10051 {
10052     drflac_uint64 i;
10053     drflac_uint64 frameCount4 = frameCount >> 2;
10054     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10055     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10056     drflac_uint32 shift = unusedBitsPerSample;
10057     int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
10058     int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
10059 
10060     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10061 
10062     wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10063     wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10064 
10065     if (shift == 0) {
10066         for (i = 0; i < frameCount4; ++i) {
10067             uint32x4_t mid;
10068             uint32x4_t side;
10069             int32x4_t left;
10070             int32x4_t right;
10071 
10072             mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10073             side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10074 
10075             mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10076 
10077             left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
10078             right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
10079 
10080             left  = vshrq_n_s32(left,  16);
10081             right = vshrq_n_s32(right, 16);
10082 
10083             drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10084         }
10085 
10086         for (i = (frameCount4 << 2); i < frameCount; ++i) {
10087             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10088             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10089 
10090             mid = (mid << 1) | (side & 0x01);
10091 
10092             pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10093             pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10094         }
10095     } else {
10096         int32x4_t shift4;
10097 
10098         shift -= 1;
10099         shift4 = vdupq_n_s32(shift);
10100 
10101         for (i = 0; i < frameCount4; ++i) {
10102             uint32x4_t mid;
10103             uint32x4_t side;
10104             int32x4_t left;
10105             int32x4_t right;
10106 
10107             mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10108             side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10109 
10110             mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10111 
10112             left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
10113             right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
10114 
10115             left  = vshrq_n_s32(left,  16);
10116             right = vshrq_n_s32(right, 16);
10117 
10118             drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10119         }
10120 
10121         for (i = (frameCount4 << 2); i < frameCount; ++i) {
10122             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10123             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10124 
10125             mid = (mid << 1) | (side & 0x01);
10126 
10127             pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10128             pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
10129         }
10130     }
10131 }
10132 #endif
10133 
10134 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10135 {
10136 #if defined(DRFLAC_SUPPORT_SSE2)
10137     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10138         drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10139     } else
10140 #elif defined(DRFLAC_SUPPORT_NEON)
10141     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10142         drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10143     } else
10144 #endif
10145     {
10146         /* Scalar fallback. */
10147 #if 0
10148         drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10149 #else
10150         drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10151 #endif
10152     }
10153 }
10154 
10155 
10156 #if 0
10157 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10158 {
10159     for (drflac_uint64 i = 0; i < frameCount; ++i) {
10160         pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
10161         pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
10162     }
10163 }
10164 #endif
10165 
10166 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10167 {
10168     drflac_uint64 i;
10169     drflac_uint64 frameCount4 = frameCount >> 2;
10170     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10171     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10172     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10173     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10174 
10175     for (i = 0; i < frameCount4; ++i) {
10176         drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
10177         drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
10178         drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
10179         drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
10180 
10181         drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
10182         drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
10183         drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
10184         drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
10185 
10186         tempL0 >>= 16;
10187         tempL1 >>= 16;
10188         tempL2 >>= 16;
10189         tempL3 >>= 16;
10190 
10191         tempR0 >>= 16;
10192         tempR1 >>= 16;
10193         tempR2 >>= 16;
10194         tempR3 >>= 16;
10195 
10196         pOutputSamples[i*8+0] = (drflac_int16)tempL0;
10197         pOutputSamples[i*8+1] = (drflac_int16)tempR0;
10198         pOutputSamples[i*8+2] = (drflac_int16)tempL1;
10199         pOutputSamples[i*8+3] = (drflac_int16)tempR1;
10200         pOutputSamples[i*8+4] = (drflac_int16)tempL2;
10201         pOutputSamples[i*8+5] = (drflac_int16)tempR2;
10202         pOutputSamples[i*8+6] = (drflac_int16)tempL3;
10203         pOutputSamples[i*8+7] = (drflac_int16)tempR3;
10204     }
10205 
10206     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10207         pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10208         pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10209     }
10210 }
10211 
10212 #if defined(DRFLAC_SUPPORT_SSE2)
10213 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10214 {
10215     drflac_uint64 i;
10216     drflac_uint64 frameCount4 = frameCount >> 2;
10217     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10218     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10219     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10220     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10221 
10222     for (i = 0; i < frameCount4; ++i) {
10223         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10224         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10225 
10226         left  = _mm_srai_epi32(left,  16);
10227         right = _mm_srai_epi32(right, 16);
10228 
10229         /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store the in the output buffer. */
10230         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10231     }
10232 
10233     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10234         pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10235         pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10236     }
10237 }
10238 #endif
10239 
10240 #if defined(DRFLAC_SUPPORT_NEON)
10241 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10242 {
10243     drflac_uint64 i;
10244     drflac_uint64 frameCount4 = frameCount >> 2;
10245     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10246     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10247     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10248     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10249 
10250     int32x4_t shift0_4 = vdupq_n_s32(shift0);
10251     int32x4_t shift1_4 = vdupq_n_s32(shift1);
10252 
10253     for (i = 0; i < frameCount4; ++i) {
10254         int32x4_t left;
10255         int32x4_t right;
10256 
10257         left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
10258         right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
10259 
10260         left  = vshrq_n_s32(left,  16);
10261         right = vshrq_n_s32(right, 16);
10262 
10263         drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10264     }
10265 
10266     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10267         pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10268         pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10269     }
10270 }
10271 #endif
10272 
10273 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10274 {
10275 #if defined(DRFLAC_SUPPORT_SSE2)
10276     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10277         drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10278     } else
10279 #elif defined(DRFLAC_SUPPORT_NEON)
10280     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10281         drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10282     } else
10283 #endif
10284     {
10285         /* Scalar fallback. */
10286 #if 0
10287         drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10288 #else
10289         drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10290 #endif
10291     }
10292 }
10293 
10294 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
10295 {
10296     drflac_uint64 framesRead;
10297     drflac_uint32 unusedBitsPerSample;
10298 
10299     if (pFlac == NULL || framesToRead == 0) {
10300         return 0;
10301     }
10302 
10303     if (pBufferOut == NULL) {
10304         return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
10305     }
10306 
10307     DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
10308     unusedBitsPerSample = 32 - pFlac->bitsPerSample;
10309 
10310     framesRead = 0;
10311     while (framesToRead > 0) {
10312         /* If we've run out of samples in this frame, go to the next. */
10313         if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
10314             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
10315                 break;  /* Couldn't read the next frame, so just break from the loop and return. */
10316             }
10317         } else {
10318             unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
10319             drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
10320             drflac_uint64 frameCountThisIteration = framesToRead;
10321 
10322             if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
10323                 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
10324             }
10325 
10326             if (channelCount == 2) {
10327                 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
10328                 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
10329 
10330                 switch (pFlac->currentFLACFrame.header.channelAssignment)
10331                 {
10332                     case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
10333                     {
10334                         drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10335                     } break;
10336 
10337                     case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
10338                     {
10339                         drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10340                     } break;
10341 
10342                     case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
10343                     {
10344                         drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10345                     } break;
10346 
10347                     case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
10348                     default:
10349                     {
10350                         drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10351                     } break;
10352                 }
10353             } else {
10354                 /* Generic interleaving. */
10355                 drflac_uint64 i;
10356                 for (i = 0; i < frameCountThisIteration; ++i) {
10357                     unsigned int j;
10358                     for (j = 0; j < channelCount; ++j) {
10359                         drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
10360                         pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
10361                     }
10362                 }
10363             }
10364 
10365             framesRead                += frameCountThisIteration;
10366             pBufferOut                += frameCountThisIteration * channelCount;
10367             framesToRead              -= frameCountThisIteration;
10368             pFlac->currentPCMFrame    += frameCountThisIteration;
10369             pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
10370         }
10371     }
10372 
10373     return framesRead;
10374 }
10375 
10376 
10377 #if 0
10378 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10379 {
10380     drflac_uint64 i;
10381     for (i = 0; i < frameCount; ++i) {
10382         drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10383         drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10384         drflac_uint32 right = left - side;
10385 
10386         pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
10387         pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10388     }
10389 }
10390 #endif
10391 
10392 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10393 {
10394     drflac_uint64 i;
10395     drflac_uint64 frameCount4 = frameCount >> 2;
10396     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10397     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10398     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10399     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10400 
10401     float factor = 1 / 2147483648.0;
10402 
10403     for (i = 0; i < frameCount4; ++i) {
10404         drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
10405         drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
10406         drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
10407         drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
10408 
10409         drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
10410         drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
10411         drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
10412         drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
10413 
10414         drflac_uint32 right0 = left0 - side0;
10415         drflac_uint32 right1 = left1 - side1;
10416         drflac_uint32 right2 = left2 - side2;
10417         drflac_uint32 right3 = left3 - side3;
10418 
10419         pOutputSamples[i*8+0] = (drflac_int32)left0  * factor;
10420         pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10421         pOutputSamples[i*8+2] = (drflac_int32)left1  * factor;
10422         pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10423         pOutputSamples[i*8+4] = (drflac_int32)left2  * factor;
10424         pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10425         pOutputSamples[i*8+6] = (drflac_int32)left3  * factor;
10426         pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10427     }
10428 
10429     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10430         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
10431         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
10432         drflac_uint32 right = left - side;
10433 
10434         pOutputSamples[i*2+0] = (drflac_int32)left  * factor;
10435         pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10436     }
10437 }
10438 
10439 #if defined(DRFLAC_SUPPORT_SSE2)
10440 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10441 {
10442     drflac_uint64 i;
10443     drflac_uint64 frameCount4 = frameCount >> 2;
10444     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10445     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10446     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10447     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10448     __m128 factor;
10449 
10450     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10451 
10452     factor = _mm_set1_ps(1.0f / 8388608.0f);
10453 
10454     for (i = 0; i < frameCount4; ++i) {
10455         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10456         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10457         __m128i right = _mm_sub_epi32(left, side);
10458         __m128 leftf  = _mm_mul_ps(_mm_cvtepi32_ps(left),  factor);
10459         __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10460 
10461         _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10462         _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10463     }
10464 
10465     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10466         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
10467         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
10468         drflac_uint32 right = left - side;
10469 
10470         pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
10471         pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10472     }
10473 }
10474 #endif
10475 
10476 #if defined(DRFLAC_SUPPORT_NEON)
10477 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10478 {
10479     drflac_uint64 i;
10480     drflac_uint64 frameCount4 = frameCount >> 2;
10481     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10482     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10483     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10484     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10485     float32x4_t factor4;
10486     int32x4_t shift0_4;
10487     int32x4_t shift1_4;
10488 
10489     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10490 
10491     factor4  = vdupq_n_f32(1.0f / 8388608.0f);
10492     shift0_4 = vdupq_n_s32(shift0);
10493     shift1_4 = vdupq_n_s32(shift1);
10494 
10495     for (i = 0; i < frameCount4; ++i) {
10496         uint32x4_t left;
10497         uint32x4_t side;
10498         uint32x4_t right;
10499         float32x4_t leftf;
10500         float32x4_t rightf;
10501 
10502         left   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10503         side   = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10504         right  = vsubq_u32(left, side);
10505         leftf  = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)),  factor4);
10506         rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10507 
10508         drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10509     }
10510 
10511     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10512         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
10513         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
10514         drflac_uint32 right = left - side;
10515 
10516         pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
10517         pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10518     }
10519 }
10520 #endif
10521 
10522 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10523 {
10524 #if defined(DRFLAC_SUPPORT_SSE2)
10525     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10526         drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10527     } else
10528 #elif defined(DRFLAC_SUPPORT_NEON)
10529     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10530         drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10531     } else
10532 #endif
10533     {
10534         /* Scalar fallback. */
10535 #if 0
10536         drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10537 #else
10538         drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10539 #endif
10540     }
10541 }
10542 
10543 
10544 #if 0
10545 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10546 {
10547     drflac_uint64 i;
10548     for (i = 0; i < frameCount; ++i) {
10549         drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10550         drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10551         drflac_uint32 left  = right + side;
10552 
10553         pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
10554         pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10555     }
10556 }
10557 #endif
10558 
10559 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10560 {
10561     drflac_uint64 i;
10562     drflac_uint64 frameCount4 = frameCount >> 2;
10563     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10564     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10565     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10566     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10567     float factor = 1 / 2147483648.0;
10568 
10569     for (i = 0; i < frameCount4; ++i) {
10570         drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
10571         drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
10572         drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
10573         drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;
10574 
10575         drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
10576         drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
10577         drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
10578         drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
10579 
10580         drflac_uint32 left0 = right0 + side0;
10581         drflac_uint32 left1 = right1 + side1;
10582         drflac_uint32 left2 = right2 + side2;
10583         drflac_uint32 left3 = right3 + side3;
10584 
10585         pOutputSamples[i*8+0] = (drflac_int32)left0  * factor;
10586         pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10587         pOutputSamples[i*8+2] = (drflac_int32)left1  * factor;
10588         pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10589         pOutputSamples[i*8+4] = (drflac_int32)left2  * factor;
10590         pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10591         pOutputSamples[i*8+6] = (drflac_int32)left3  * factor;
10592         pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10593     }
10594 
10595     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10596         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
10597         drflac_uint32 right = pInputSamples1U32[i] << shift1;
10598         drflac_uint32 left  = right + side;
10599 
10600         pOutputSamples[i*2+0] = (drflac_int32)left  * factor;
10601         pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10602     }
10603 }
10604 
10605 #if defined(DRFLAC_SUPPORT_SSE2)
10606 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10607 {
10608     drflac_uint64 i;
10609     drflac_uint64 frameCount4 = frameCount >> 2;
10610     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10611     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10612     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10613     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10614     __m128 factor;
10615 
10616     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10617 
10618     factor = _mm_set1_ps(1.0f / 8388608.0f);
10619 
10620     for (i = 0; i < frameCount4; ++i) {
10621         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10622         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10623         __m128i left  = _mm_add_epi32(right, side);
10624         __m128 leftf  = _mm_mul_ps(_mm_cvtepi32_ps(left),  factor);
10625         __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10626 
10627         _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10628         _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10629     }
10630 
10631     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10632         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
10633         drflac_uint32 right = pInputSamples1U32[i] << shift1;
10634         drflac_uint32 left  = right + side;
10635 
10636         pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
10637         pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10638     }
10639 }
10640 #endif
10641 
10642 #if defined(DRFLAC_SUPPORT_NEON)
10643 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10644 {
10645     drflac_uint64 i;
10646     drflac_uint64 frameCount4 = frameCount >> 2;
10647     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10648     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10649     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10650     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10651     float32x4_t factor4;
10652     int32x4_t shift0_4;
10653     int32x4_t shift1_4;
10654 
10655     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10656 
10657     factor4  = vdupq_n_f32(1.0f / 8388608.0f);
10658     shift0_4 = vdupq_n_s32(shift0);
10659     shift1_4 = vdupq_n_s32(shift1);
10660 
10661     for (i = 0; i < frameCount4; ++i) {
10662         uint32x4_t side;
10663         uint32x4_t right;
10664         uint32x4_t left;
10665         float32x4_t leftf;
10666         float32x4_t rightf;
10667 
10668         side   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10669         right  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10670         left   = vaddq_u32(right, side);
10671         leftf  = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)),  factor4);
10672         rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10673 
10674         drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10675     }
10676 
10677     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10678         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
10679         drflac_uint32 right = pInputSamples1U32[i] << shift1;
10680         drflac_uint32 left  = right + side;
10681 
10682         pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
10683         pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10684     }
10685 }
10686 #endif
10687 
10688 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10689 {
10690 #if defined(DRFLAC_SUPPORT_SSE2)
10691     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10692         drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10693     } else
10694 #elif defined(DRFLAC_SUPPORT_NEON)
10695     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10696         drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10697     } else
10698 #endif
10699     {
10700         /* Scalar fallback. */
10701 #if 0
10702         drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10703 #else
10704         drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10705 #endif
10706     }
10707 }
10708 
10709 
10710 #if 0
10711 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10712 {
10713     for (drflac_uint64 i = 0; i < frameCount; ++i) {
10714         drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10715         drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10716 
10717         mid = (mid << 1) | (side & 0x01);
10718 
10719         pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
10720         pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
10721     }
10722 }
10723 #endif
10724 
10725 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10726 {
10727     drflac_uint64 i;
10728     drflac_uint64 frameCount4 = frameCount >> 2;
10729     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10730     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10731     drflac_uint32 shift = unusedBitsPerSample;
10732     float factor = 1 / 2147483648.0;
10733 
10734     if (shift > 0) {
10735         shift -= 1;
10736         for (i = 0; i < frameCount4; ++i) {
10737             drflac_uint32 temp0L;
10738             drflac_uint32 temp1L;
10739             drflac_uint32 temp2L;
10740             drflac_uint32 temp3L;
10741             drflac_uint32 temp0R;
10742             drflac_uint32 temp1R;
10743             drflac_uint32 temp2R;
10744             drflac_uint32 temp3R;
10745 
10746             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10747             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10748             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10749             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10750 
10751             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10752             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10753             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10754             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10755 
10756             mid0 = (mid0 << 1) | (side0 & 0x01);
10757             mid1 = (mid1 << 1) | (side1 & 0x01);
10758             mid2 = (mid2 << 1) | (side2 & 0x01);
10759             mid3 = (mid3 << 1) | (side3 & 0x01);
10760 
10761             temp0L = (mid0 + side0) << shift;
10762             temp1L = (mid1 + side1) << shift;
10763             temp2L = (mid2 + side2) << shift;
10764             temp3L = (mid3 + side3) << shift;
10765 
10766             temp0R = (mid0 - side0) << shift;
10767             temp1R = (mid1 - side1) << shift;
10768             temp2R = (mid2 - side2) << shift;
10769             temp3R = (mid3 - side3) << shift;
10770 
10771             pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
10772             pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
10773             pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
10774             pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
10775             pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
10776             pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
10777             pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
10778             pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
10779         }
10780     } else {
10781         for (i = 0; i < frameCount4; ++i) {
10782             drflac_uint32 temp0L;
10783             drflac_uint32 temp1L;
10784             drflac_uint32 temp2L;
10785             drflac_uint32 temp3L;
10786             drflac_uint32 temp0R;
10787             drflac_uint32 temp1R;
10788             drflac_uint32 temp2R;
10789             drflac_uint32 temp3R;
10790 
10791             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10792             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10793             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10794             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10795 
10796             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10797             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10798             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10799             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10800 
10801             mid0 = (mid0 << 1) | (side0 & 0x01);
10802             mid1 = (mid1 << 1) | (side1 & 0x01);
10803             mid2 = (mid2 << 1) | (side2 & 0x01);
10804             mid3 = (mid3 << 1) | (side3 & 0x01);
10805 
10806             temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
10807             temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
10808             temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
10809             temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
10810 
10811             temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
10812             temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
10813             temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
10814             temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
10815 
10816             pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
10817             pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
10818             pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
10819             pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
10820             pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
10821             pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
10822             pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
10823             pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
10824         }
10825     }
10826 
10827     for (i = (frameCount4 << 2); i < frameCount; ++i) {
10828         drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10829         drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10830 
10831         mid = (mid << 1) | (side & 0x01);
10832 
10833         pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
10834         pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
10835     }
10836 }
10837 
10838 #if defined(DRFLAC_SUPPORT_SSE2)
10839 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10840 {
10841     drflac_uint64 i;
10842     drflac_uint64 frameCount4 = frameCount >> 2;
10843     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10844     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10845     drflac_uint32 shift = unusedBitsPerSample - 8;
10846     float factor;
10847     __m128 factor128;
10848 
10849     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10850 
10851     factor = 1.0f / 8388608.0f;
10852     factor128 = _mm_set1_ps(factor);
10853 
10854     if (shift == 0) {
10855         for (i = 0; i < frameCount4; ++i) {
10856             __m128i mid;
10857             __m128i side;
10858             __m128i tempL;
10859             __m128i tempR;
10860             __m128  leftf;
10861             __m128  rightf;
10862 
10863             mid    = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10864             side   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10865 
10866             mid    = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10867 
10868             tempL  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
10869             tempR  = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
10870 
10871             leftf  = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
10872             rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
10873 
10874             _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10875             _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10876         }
10877 
10878         for (i = (frameCount4 << 2); i < frameCount; ++i) {
10879             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10880             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10881 
10882             mid = (mid << 1) | (side & 0x01);
10883 
10884             pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
10885             pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
10886         }
10887     } else {
10888         shift -= 1;
10889         for (i = 0; i < frameCount4; ++i) {
10890             __m128i mid;
10891             __m128i side;
10892             __m128i tempL;
10893             __m128i tempR;
10894             __m128 leftf;
10895             __m128 rightf;
10896 
10897             mid    = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10898             side   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10899 
10900             mid    = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10901 
10902             tempL  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
10903             tempR  = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
10904 
10905             leftf  = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
10906             rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
10907 
10908             _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10909             _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10910         }
10911 
10912         for (i = (frameCount4 << 2); i < frameCount; ++i) {
10913             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10914             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10915 
10916             mid = (mid << 1) | (side & 0x01);
10917 
10918             pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
10919             pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
10920         }
10921     }
10922 }
10923 #endif
10924 
10925 #if defined(DRFLAC_SUPPORT_NEON)
10926 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10927 {
10928     drflac_uint64 i;
10929     drflac_uint64 frameCount4 = frameCount >> 2;
10930     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10931     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10932     drflac_uint32 shift = unusedBitsPerSample - 8;
10933     float factor;
10934     float32x4_t factor4;
10935     int32x4_t shift4;
10936     int32x4_t wbps0_4;  /* Wasted Bits Per Sample */
10937     int32x4_t wbps1_4;  /* Wasted Bits Per Sample */
10938 
10939     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10940 
10941     factor  = 1.0f / 8388608.0f;
10942     factor4 = vdupq_n_f32(factor);
10943     wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10944     wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10945 
10946     if (shift == 0) {
10947         for (i = 0; i < frameCount4; ++i) {
10948             int32x4_t lefti;
10949             int32x4_t righti;
10950             float32x4_t leftf;
10951             float32x4_t rightf;
10952 
10953             uint32x4_t mid  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
10954             uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
10955 
10956             mid    = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10957 
10958             lefti  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
10959             righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
10960 
10961             leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
10962             rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
10963 
10964             drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10965         }
10966 
10967         for (i = (frameCount4 << 2); i < frameCount; ++i) {
10968             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10969             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10970 
10971             mid = (mid << 1) | (side & 0x01);
10972 
10973             pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
10974             pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
10975         }
10976     } else {
10977         shift -= 1;
10978         shift4 = vdupq_n_s32(shift);
10979         for (i = 0; i < frameCount4; ++i) {
10980             uint32x4_t mid;
10981             uint32x4_t side;
10982             int32x4_t lefti;
10983             int32x4_t righti;
10984             float32x4_t leftf;
10985             float32x4_t rightf;
10986 
10987             mid    = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
10988             side   = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
10989 
10990             mid    = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10991 
10992             lefti  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
10993             righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
10994 
10995             leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
10996             rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
10997 
10998             drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10999         }
11000 
11001         for (i = (frameCount4 << 2); i < frameCount; ++i) {
11002             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11003             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11004 
11005             mid = (mid << 1) | (side & 0x01);
11006 
11007             pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11008             pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11009         }
11010     }
11011 }
11012 #endif
11013 
11014 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11015 {
11016 #if defined(DRFLAC_SUPPORT_SSE2)
11017     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11018         drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11019     } else
11020 #elif defined(DRFLAC_SUPPORT_NEON)
11021     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11022         drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11023     } else
11024 #endif
11025     {
11026         /* Scalar fallback. */
11027 #if 0
11028         drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11029 #else
11030         drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11031 #endif
11032     }
11033 }
11034 
11035 #if 0
11036 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11037 {
11038     for (drflac_uint64 i = 0; i < frameCount; ++i) {
11039         pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
11040         pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
11041     }
11042 }
11043 #endif
11044 
11045 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11046 {
11047     drflac_uint64 i;
11048     drflac_uint64 frameCount4 = frameCount >> 2;
11049     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11050     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11051     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11052     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11053     float factor = 1 / 2147483648.0;
11054 
11055     for (i = 0; i < frameCount4; ++i) {
11056         drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
11057         drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
11058         drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
11059         drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
11060 
11061         drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
11062         drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
11063         drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
11064         drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
11065 
11066         pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
11067         pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
11068         pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
11069         pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
11070         pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
11071         pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
11072         pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
11073         pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
11074     }
11075 
11076     for (i = (frameCount4 << 2); i < frameCount; ++i) {
11077         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11078         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11079     }
11080 }
11081 
11082 #if defined(DRFLAC_SUPPORT_SSE2)
11083 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11084 {
11085     drflac_uint64 i;
11086     drflac_uint64 frameCount4 = frameCount >> 2;
11087     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11088     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11089     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11090     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11091 
11092     float factor = 1.0f / 8388608.0f;
11093     __m128 factor128 = _mm_set1_ps(factor);
11094 
11095     for (i = 0; i < frameCount4; ++i) {
11096         __m128i lefti;
11097         __m128i righti;
11098         __m128 leftf;
11099         __m128 rightf;
11100 
11101         lefti  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
11102         righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
11103 
11104         leftf  = _mm_mul_ps(_mm_cvtepi32_ps(lefti),  factor128);
11105         rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
11106 
11107         _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11108         _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11109     }
11110 
11111     for (i = (frameCount4 << 2); i < frameCount; ++i) {
11112         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11113         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11114     }
11115 }
11116 #endif
11117 
11118 #if defined(DRFLAC_SUPPORT_NEON)
11119 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11120 {
11121     drflac_uint64 i;
11122     drflac_uint64 frameCount4 = frameCount >> 2;
11123     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11124     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11125     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11126     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11127 
11128     float factor = 1.0f / 8388608.0f;
11129     float32x4_t factor4 = vdupq_n_f32(factor);
11130     int32x4_t shift0_4  = vdupq_n_s32(shift0);
11131     int32x4_t shift1_4  = vdupq_n_s32(shift1);
11132 
11133     for (i = 0; i < frameCount4; ++i) {
11134         int32x4_t lefti;
11135         int32x4_t righti;
11136         float32x4_t leftf;
11137         float32x4_t rightf;
11138 
11139         lefti  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
11140         righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
11141 
11142         leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
11143         rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11144 
11145         drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11146     }
11147 
11148     for (i = (frameCount4 << 2); i < frameCount; ++i) {
11149         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11150         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11151     }
11152 }
11153 #endif
11154 
11155 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11156 {
11157 #if defined(DRFLAC_SUPPORT_SSE2)
11158     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11159         drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11160     } else
11161 #elif defined(DRFLAC_SUPPORT_NEON)
11162     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11163         drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11164     } else
11165 #endif
11166     {
11167         /* Scalar fallback. */
11168 #if 0
11169         drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11170 #else
11171         drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11172 #endif
11173     }
11174 }
11175 
11176 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
11177 {
11178     drflac_uint64 framesRead;
11179     drflac_uint32 unusedBitsPerSample;
11180 
11181     if (pFlac == NULL || framesToRead == 0) {
11182         return 0;
11183     }
11184 
11185     if (pBufferOut == NULL) {
11186         return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
11187     }
11188 
11189     DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
11190     unusedBitsPerSample = 32 - pFlac->bitsPerSample;
11191 
11192     framesRead = 0;
11193     while (framesToRead > 0) {
11194         /* If we've run out of samples in this frame, go to the next. */
11195         if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
11196             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
11197                 break;  /* Couldn't read the next frame, so just break from the loop and return. */
11198             }
11199         } else {
11200             unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
11201             drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
11202             drflac_uint64 frameCountThisIteration = framesToRead;
11203 
11204             if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
11205                 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
11206             }
11207 
11208             if (channelCount == 2) {
11209                 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
11210                 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
11211 
11212                 switch (pFlac->currentFLACFrame.header.channelAssignment)
11213                 {
11214                     case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
11215                     {
11216                         drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11217                     } break;
11218 
11219                     case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
11220                     {
11221                         drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11222                     } break;
11223 
11224                     case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
11225                     {
11226                         drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11227                     } break;
11228 
11229                     case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
11230                     default:
11231                     {
11232                         drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11233                     } break;
11234                 }
11235             } else {
11236                 /* Generic interleaving. */
11237                 drflac_uint64 i;
11238                 for (i = 0; i < frameCountThisIteration; ++i) {
11239                     unsigned int j;
11240                     for (j = 0; j < channelCount; ++j) {
11241                         drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
11242                         pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
11243                     }
11244                 }
11245             }
11246 
11247             framesRead                += frameCountThisIteration;
11248             pBufferOut                += frameCountThisIteration * channelCount;
11249             framesToRead              -= frameCountThisIteration;
11250             pFlac->currentPCMFrame    += frameCountThisIteration;
11251             pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
11252         }
11253     }
11254 
11255     return framesRead;
11256 }
11257 
11258 
11259 DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
11260 {
11261     if (pFlac == NULL) {
11262         return DRFLAC_FALSE;
11263     }
11264 
11265     /* Don't do anything if we're already on the seek point. */
11266     if (pFlac->currentPCMFrame == pcmFrameIndex) {
11267         return DRFLAC_TRUE;
11268     }
11269 
11270     /*
11271     If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
11272     when the decoder was opened.
11273     */
11274     if (pFlac->firstFLACFramePosInBytes == 0) {
11275         return DRFLAC_FALSE;
11276     }
11277 
11278     if (pcmFrameIndex == 0) {
11279         pFlac->currentPCMFrame = 0;
11280         return drflac__seek_to_first_frame(pFlac);
11281     } else {
11282         drflac_bool32 wasSuccessful = DRFLAC_FALSE;
11283 
11284         /* Clamp the sample to the end. */
11285         if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
11286             pcmFrameIndex = pFlac->totalPCMFrameCount;
11287         }
11288 
11289         /* If the target sample and the current sample are in the same frame we just move the position forward. */
11290         if (pcmFrameIndex > pFlac->currentPCMFrame) {
11291             /* Forward. */
11292             drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
11293             if (pFlac->currentFLACFrame.pcmFramesRemaining >  offset) {
11294                 pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
11295                 pFlac->currentPCMFrame = pcmFrameIndex;
11296                 return DRFLAC_TRUE;
11297             }
11298         } else {
11299             /* Backward. */
11300             drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
11301             drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
11302             drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
11303             if (currentFLACFramePCMFramesConsumed > offsetAbs) {
11304                 pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
11305                 pFlac->currentPCMFrame = pcmFrameIndex;
11306                 return DRFLAC_TRUE;
11307             }
11308         }
11309 
11310         /*
11311         Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
11312         we'll instead use Ogg's natural seeking facility.
11313         */
11314 #ifndef DR_FLAC_NO_OGG
11315         if (pFlac->container == drflac_container_ogg)
11316         {
11317             wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
11318         }
11319         else
11320 #endif
11321         {
11322             /* First try seeking via the seek table. If this fails, fall back to a brute force seek which is much slower. */
11323             if (!pFlac->_noSeekTableSeek) {
11324                 wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
11325             }
11326 
11327 #if !defined(DR_FLAC_NO_CRC)
11328             /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
11329             if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
11330                 wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
11331             }
11332 #endif
11333 
11334             /* Fall back to brute force if all else fails. */
11335             if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
11336                 wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
11337             }
11338         }
11339 
11340         pFlac->currentPCMFrame = pcmFrameIndex;
11341         return wasSuccessful;
11342     }
11343 }
11344 
11345 
11346 
11347 /* High Level APIs */
11348 
11349 #if defined(SIZE_MAX)
11350     #define DRFLAC_SIZE_MAX  SIZE_MAX
11351 #else
11352     #if defined(DRFLAC_64BIT)
11353         #define DRFLAC_SIZE_MAX  ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
11354     #else
11355         #define DRFLAC_SIZE_MAX  0xFFFFFFFF
11356     #endif
11357 #endif
11358 
11359 
11360 /* Using a macro as the definition of the drflac__full_decode_and_close_*() API family. Sue me. */
11361 #define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
11362 static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
11363 {                                                                                                                                                                   \
11364     type* pSampleData = NULL;                                                                                                                                       \
11365     drflac_uint64 totalPCMFrameCount;                                                                                                                               \
11366                                                                                                                                                                     \
11367     DRFLAC_ASSERT(pFlac != NULL);                                                                                                                                   \
11368                                                                                                                                                                     \
11369     totalPCMFrameCount = pFlac->totalPCMFrameCount;                                                                                                                 \
11370                                                                                                                                                                     \
11371     if (totalPCMFrameCount == 0) {                                                                                                                                  \
11372         type buffer[4096];                                                                                                                                          \
11373         drflac_uint64 pcmFramesRead;                                                                                                                                \
11374         size_t sampleDataBufferSize = sizeof(buffer);                                                                                                               \
11375                                                                                                                                                                     \
11376         pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks);                                                      \
11377         if (pSampleData == NULL) {                                                                                                                                  \
11378             goto on_error;                                                                                                                                          \
11379         }                                                                                                                                                           \
11380                                                                                                                                                                     \
11381         while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) {          \
11382             if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) {                                                   \
11383                 type* pNewSampleData;                                                                                                                               \
11384                 size_t newSampleDataBufferSize;                                                                                                                     \
11385                                                                                                                                                                     \
11386                 newSampleDataBufferSize = sampleDataBufferSize * 2;                                                                                                 \
11387                 pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks);    \
11388                 if (pNewSampleData == NULL) {                                                                                                                       \
11389                     drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks);                                                                          \
11390                     goto on_error;                                                                                                                                  \
11391                 }                                                                                                                                                   \
11392                                                                                                                                                                     \
11393                 sampleDataBufferSize = newSampleDataBufferSize;                                                                                                     \
11394                 pSampleData = pNewSampleData;                                                                                                                       \
11395             }                                                                                                                                                       \
11396                                                                                                                                                                     \
11397             DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type)));                   \
11398             totalPCMFrameCount += pcmFramesRead;                                                                                                                    \
11399         }                                                                                                                                                           \
11400                                                                                                                                                                     \
11401         /* At this point everything should be decoded, but we just want to fill the unused part buffer with silence - need to                                       \
11402            protect those ears from random noise! */                                                                                                                 \
11403         DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type)));   \
11404     } else {                                                                                                                                                        \
11405         drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type);                                                                                   \
11406         if (dataSize > DRFLAC_SIZE_MAX) {                                                                                                                           \
11407             goto on_error;  /* The decoded data is too big. */                                                                                                      \
11408         }                                                                                                                                                           \
11409                                                                                                                                                                     \
11410         pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks);    /* <-- Safe cast as per the check above. */           \
11411         if (pSampleData == NULL) {                                                                                                                                  \
11412             goto on_error;                                                                                                                                          \
11413         }                                                                                                                                                           \
11414                                                                                                                                                                     \
11415         totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData);                                                     \
11416     }                                                                                                                                                               \
11417                                                                                                                                                                     \
11418     if (sampleRateOut) *sampleRateOut = pFlac->sampleRate;                                                                                                          \
11419     if (channelsOut) *channelsOut = pFlac->channels;                                                                                                                \
11420     if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount;                                                                                         \
11421                                                                                                                                                                     \
11422     drflac_close(pFlac);                                                                                                                                            \
11423     return pSampleData;                                                                                                                                             \
11424                                                                                                                                                                     \
11425 on_error:                                                                                                                                                           \
11426     drflac_close(pFlac);                                                                                                                                            \
11427     return NULL;                                                                                                                                                    \
11428 }
11429 
11430 DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
11431 DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
11432 DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
11433 
11434 DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11435 {
11436     drflac* pFlac;
11437 
11438     if (channelsOut) {
11439         *channelsOut = 0;
11440     }
11441     if (sampleRateOut) {
11442         *sampleRateOut = 0;
11443     }
11444     if (totalPCMFrameCountOut) {
11445         *totalPCMFrameCountOut = 0;
11446     }
11447 
11448     pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11449     if (pFlac == NULL) {
11450         return NULL;
11451     }
11452 
11453     return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11454 }
11455 
11456 DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11457 {
11458     drflac* pFlac;
11459 
11460     if (channelsOut) {
11461         *channelsOut = 0;
11462     }
11463     if (sampleRateOut) {
11464         *sampleRateOut = 0;
11465     }
11466     if (totalPCMFrameCountOut) {
11467         *totalPCMFrameCountOut = 0;
11468     }
11469 
11470     pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11471     if (pFlac == NULL) {
11472         return NULL;
11473     }
11474 
11475     return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11476 }
11477 
11478 DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11479 {
11480     drflac* pFlac;
11481 
11482     if (channelsOut) {
11483         *channelsOut = 0;
11484     }
11485     if (sampleRateOut) {
11486         *sampleRateOut = 0;
11487     }
11488     if (totalPCMFrameCountOut) {
11489         *totalPCMFrameCountOut = 0;
11490     }
11491 
11492     pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11493     if (pFlac == NULL) {
11494         return NULL;
11495     }
11496 
11497     return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11498 }
11499 
11500 #ifndef DR_FLAC_NO_STDIO
11501 DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11502 {
11503     drflac* pFlac;
11504 
11505     if (sampleRate) {
11506         *sampleRate = 0;
11507     }
11508     if (channels) {
11509         *channels = 0;
11510     }
11511     if (totalPCMFrameCount) {
11512         *totalPCMFrameCount = 0;
11513     }
11514 
11515     pFlac = drflac_open_file(filename, pAllocationCallbacks);
11516     if (pFlac == NULL) {
11517         return NULL;
11518     }
11519 
11520     return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11521 }
11522 
11523 DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11524 {
11525     drflac* pFlac;
11526 
11527     if (sampleRate) {
11528         *sampleRate = 0;
11529     }
11530     if (channels) {
11531         *channels = 0;
11532     }
11533     if (totalPCMFrameCount) {
11534         *totalPCMFrameCount = 0;
11535     }
11536 
11537     pFlac = drflac_open_file(filename, pAllocationCallbacks);
11538     if (pFlac == NULL) {
11539         return NULL;
11540     }
11541 
11542     return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11543 }
11544 
11545 DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11546 {
11547     drflac* pFlac;
11548 
11549     if (sampleRate) {
11550         *sampleRate = 0;
11551     }
11552     if (channels) {
11553         *channels = 0;
11554     }
11555     if (totalPCMFrameCount) {
11556         *totalPCMFrameCount = 0;
11557     }
11558 
11559     pFlac = drflac_open_file(filename, pAllocationCallbacks);
11560     if (pFlac == NULL) {
11561         return NULL;
11562     }
11563 
11564     return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11565 }
11566 #endif
11567 
11568 DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11569 {
11570     drflac* pFlac;
11571 
11572     if (sampleRate) {
11573         *sampleRate = 0;
11574     }
11575     if (channels) {
11576         *channels = 0;
11577     }
11578     if (totalPCMFrameCount) {
11579         *totalPCMFrameCount = 0;
11580     }
11581 
11582     pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11583     if (pFlac == NULL) {
11584         return NULL;
11585     }
11586 
11587     return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11588 }
11589 
11590 DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11591 {
11592     drflac* pFlac;
11593 
11594     if (sampleRate) {
11595         *sampleRate = 0;
11596     }
11597     if (channels) {
11598         *channels = 0;
11599     }
11600     if (totalPCMFrameCount) {
11601         *totalPCMFrameCount = 0;
11602     }
11603 
11604     pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11605     if (pFlac == NULL) {
11606         return NULL;
11607     }
11608 
11609     return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11610 }
11611 
11612 DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11613 {
11614     drflac* pFlac;
11615 
11616     if (sampleRate) {
11617         *sampleRate = 0;
11618     }
11619     if (channels) {
11620         *channels = 0;
11621     }
11622     if (totalPCMFrameCount) {
11623         *totalPCMFrameCount = 0;
11624     }
11625 
11626     pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11627     if (pFlac == NULL) {
11628         return NULL;
11629     }
11630 
11631     return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11632 }
11633 
11634 
11635 DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
11636 {
11637     if (pAllocationCallbacks != NULL) {
11638         drflac__free_from_callbacks(p, pAllocationCallbacks);
11639     } else {
11640         drflac__free_default(p, NULL);
11641     }
11642 }
11643 
11644 
11645 
11646 
11647 DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
11648 {
11649     if (pIter == NULL) {
11650         return;
11651     }
11652 
11653     pIter->countRemaining = commentCount;
11654     pIter->pRunningData   = (const char*)pComments;
11655 }
11656 
11657 DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
11658 {
11659     drflac_int32 length;
11660     const char* pComment;
11661 
11662     /* Safety. */
11663     if (pCommentLengthOut) {
11664         *pCommentLengthOut = 0;
11665     }
11666 
11667     if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
11668         return NULL;
11669     }
11670 
11671     length = drflac__le2host_32(*(const drflac_uint32*)pIter->pRunningData);
11672     pIter->pRunningData += 4;
11673 
11674     pComment = pIter->pRunningData;
11675     pIter->pRunningData += length;
11676     pIter->countRemaining -= 1;
11677 
11678     if (pCommentLengthOut) {
11679         *pCommentLengthOut = length;
11680     }
11681 
11682     return pComment;
11683 }
11684 
11685 
11686 
11687 
11688 DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
11689 {
11690     if (pIter == NULL) {
11691         return;
11692     }
11693 
11694     pIter->countRemaining = trackCount;
11695     pIter->pRunningData   = (const char*)pTrackData;
11696 }
11697 
11698 DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
11699 {
11700     drflac_cuesheet_track cuesheetTrack;
11701     const char* pRunningData;
11702     drflac_uint64 offsetHi;
11703     drflac_uint64 offsetLo;
11704 
11705     if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
11706         return DRFLAC_FALSE;
11707     }
11708 
11709     pRunningData = pIter->pRunningData;
11710 
11711     offsetHi                   = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
11712     offsetLo                   = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
11713     cuesheetTrack.offset       = offsetLo | (offsetHi << 32);
11714     cuesheetTrack.trackNumber  = pRunningData[0];                                         pRunningData += 1;
11715     DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC));     pRunningData += 12;
11716     cuesheetTrack.isAudio      = (pRunningData[0] & 0x80) != 0;
11717     cuesheetTrack.preEmphasis  = (pRunningData[0] & 0x40) != 0;                           pRunningData += 14;
11718     cuesheetTrack.indexCount   = pRunningData[0];                                         pRunningData += 1;
11719     cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData;        pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);
11720 
11721     pIter->pRunningData = pRunningData;
11722     pIter->countRemaining -= 1;
11723 
11724     if (pCuesheetTrack) {
11725         *pCuesheetTrack = cuesheetTrack;
11726     }
11727 
11728     return DRFLAC_TRUE;
11729 }
11730 
11731 #if defined(__GNUC__)
11732     #pragma GCC diagnostic pop
11733 #endif
11734 #endif  /* DR_FLAC_IMPLEMENTATION */
11735 
11736 
11737 /*
11738 REVISION HISTORY
11739 ================
11740 v0.12.13 - 2020-05-16
11741   - Add compile-time and run-time version querying.
11742     - DRFLAC_VERSION_MINOR
11743     - DRFLAC_VERSION_MAJOR
11744     - DRFLAC_VERSION_REVISION
11745     - DRFLAC_VERSION_STRING
11746     - drflac_version()
11747     - drflac_version_string()
11748 
11749 v0.12.12 - 2020-04-30
11750   - Fix compilation errors with VC6.
11751 
11752 v0.12.11 - 2020-04-19
11753   - Fix some pedantic warnings.
11754   - Fix some undefined behaviour warnings.
11755 
11756 v0.12.10 - 2020-04-10
11757   - Fix some bugs when trying to seek with an invalid seek table.
11758 
11759 v0.12.9 - 2020-04-05
11760   - Fix warnings.
11761 
11762 v0.12.8 - 2020-04-04
11763   - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
11764   - Fix some static analysis warnings.
11765   - Minor documentation updates.
11766 
11767 v0.12.7 - 2020-03-14
11768   - Fix compilation errors with VC6.
11769 
11770 v0.12.6 - 2020-03-07
11771   - Fix compilation error with Visual Studio .NET 2003.
11772 
11773 v0.12.5 - 2020-01-30
11774   - Silence some static analysis warnings.
11775 
11776 v0.12.4 - 2020-01-29
11777   - Silence some static analysis warnings.
11778 
11779 v0.12.3 - 2019-12-02
11780   - Fix some warnings when compiling with GCC and the -Og flag.
11781   - Fix a crash in out-of-memory situations.
11782   - Fix potential integer overflow bug.
11783   - Fix some static analysis warnings.
11784   - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
11785   - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
11786 
11787 v0.12.2 - 2019-10-07
11788   - Internal code clean up.
11789 
11790 v0.12.1 - 2019-09-29
11791   - Fix some Clang Static Analyzer warnings.
11792   - Fix an unused variable warning.
11793 
11794 v0.12.0 - 2019-09-23
11795   - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify their own memory allocation
11796     routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
11797     - drflac_open()
11798     - drflac_open_relaxed()
11799     - drflac_open_with_metadata()
11800     - drflac_open_with_metadata_relaxed()
11801     - drflac_open_file()
11802     - drflac_open_file_with_metadata()
11803     - drflac_open_memory()
11804     - drflac_open_memory_with_metadata()
11805     - drflac_open_and_read_pcm_frames_s32()
11806     - drflac_open_and_read_pcm_frames_s16()
11807     - drflac_open_and_read_pcm_frames_f32()
11808     - drflac_open_file_and_read_pcm_frames_s32()
11809     - drflac_open_file_and_read_pcm_frames_s16()
11810     - drflac_open_file_and_read_pcm_frames_f32()
11811     - drflac_open_memory_and_read_pcm_frames_s32()
11812     - drflac_open_memory_and_read_pcm_frames_s16()
11813     - drflac_open_memory_and_read_pcm_frames_f32()
11814     Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this NULL will use
11815     DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
11816   - Remove deprecated APIs:
11817     - drflac_read_s32()
11818     - drflac_read_s16()
11819     - drflac_read_f32()
11820     - drflac_seek_to_sample()
11821     - drflac_open_and_decode_s32()
11822     - drflac_open_and_decode_s16()
11823     - drflac_open_and_decode_f32()
11824     - drflac_open_and_decode_file_s32()
11825     - drflac_open_and_decode_file_s16()
11826     - drflac_open_and_decode_file_f32()
11827     - drflac_open_and_decode_memory_s32()
11828     - drflac_open_and_decode_memory_s16()
11829     - drflac_open_and_decode_memory_f32()
11830   - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
11831     by doing pFlac->totalPCMFrameCount*pFlac->channels.
11832   - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
11833   - Fix errors when seeking to the end of a stream.
11834   - Optimizations to seeking.
11835   - SSE improvements and optimizations.
11836   - ARM NEON optimizations.
11837   - Optimizations to drflac_read_pcm_frames_s16().
11838   - Optimizations to drflac_read_pcm_frames_s32().
11839 
11840 v0.11.10 - 2019-06-26
11841   - Fix a compiler error.
11842 
11843 v0.11.9 - 2019-06-16
11844   - Silence some ThreadSanitizer warnings.
11845 
11846 v0.11.8 - 2019-05-21
11847   - Fix warnings.
11848 
11849 v0.11.7 - 2019-05-06
11850   - C89 fixes.
11851 
11852 v0.11.6 - 2019-05-05
11853   - Add support for C89.
11854   - Fix a compiler warning when CRC is disabled.
11855   - Change license to choice of public domain or MIT-0.
11856 
11857 v0.11.5 - 2019-04-19
11858   - Fix a compiler error with GCC.
11859 
11860 v0.11.4 - 2019-04-17
11861   - Fix some warnings with GCC when compiling with -std=c99.
11862 
11863 v0.11.3 - 2019-04-07
11864   - Silence warnings with GCC.
11865 
11866 v0.11.2 - 2019-03-10
11867   - Fix a warning.
11868 
11869 v0.11.1 - 2019-02-17
11870   - Fix a potential bug with seeking.
11871 
11872 v0.11.0 - 2018-12-16
11873   - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
11874     drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
11875     and return PCM frame counts instead of sample counts. To upgrade you will need to change the input count by
11876     dividing it by the channel count, and then do the same with the return value.
11877   - API_CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
11878     the changes to drflac_read_*() apply.
11879   - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
11880     the changes to drflac_read_*() apply.
11881   - Optimizations.
11882 
11883 v0.10.0 - 2018-09-11
11884   - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
11885     need to do it yourself via the callback API.
11886   - Fix the clang build.
11887   - Fix undefined behavior.
11888   - Fix errors with CUESHEET metdata blocks.
11889   - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
11890     Vorbis comment API.
11891   - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
11892   - Minor optimizations.
11893 
11894 v0.9.11 - 2018-08-29
11895   - Fix a bug with sample reconstruction.
11896 
11897 v0.9.10 - 2018-08-07
11898   - Improve 64-bit detection.
11899 
11900 v0.9.9 - 2018-08-05
11901   - Fix C++ build on older versions of GCC.
11902 
11903 v0.9.8 - 2018-07-24
11904   - Fix compilation errors.
11905 
11906 v0.9.7 - 2018-07-05
11907   - Fix a warning.
11908 
11909 v0.9.6 - 2018-06-29
11910   - Fix some typos.
11911 
11912 v0.9.5 - 2018-06-23
11913   - Fix some warnings.
11914 
11915 v0.9.4 - 2018-06-14
11916   - Optimizations to seeking.
11917   - Clean up.
11918 
11919 v0.9.3 - 2018-05-22
11920   - Bug fix.
11921 
11922 v0.9.2 - 2018-05-12
11923   - Fix a compilation error due to a missing break statement.
11924 
11925 v0.9.1 - 2018-04-29
11926   - Fix compilation error with Clang.
11927 
11928 v0.9 - 2018-04-24
11929   - Fix Clang build.
11930   - Start using major.minor.revision versioning.
11931 
11932 v0.8g - 2018-04-19
11933   - Fix build on non-x86/x64 architectures.
11934 
11935 v0.8f - 2018-02-02
11936   - Stop pretending to support changing rate/channels mid stream.
11937 
11938 v0.8e - 2018-02-01
11939   - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
11940   - Fix a crash the the Rice partition order is invalid.
11941 
11942 v0.8d - 2017-09-22
11943   - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
11944 
11945 v0.8c - 2017-09-07
11946   - Fix warning on non-x86/x64 architectures.
11947 
11948 v0.8b - 2017-08-19
11949   - Fix build on non-x86/x64 architectures.
11950 
11951 v0.8a - 2017-08-13
11952   - A small optimization for the Clang build.
11953 
11954 v0.8 - 2017-08-12
11955   - API CHANGE: Rename dr_* types to drflac_*.
11956   - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
11957   - Add support for custom implementations of malloc(), realloc(), etc.
11958   - Add CRC checking to Ogg encapsulated streams.
11959   - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
11960   - Bug fixes.
11961 
11962 v0.7 - 2017-07-23
11963   - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().
11964 
11965 v0.6 - 2017-07-22
11966   - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
11967     never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
11968 
11969 v0.5 - 2017-07-16
11970   - Fix typos.
11971   - Change drflac_bool* types to unsigned.
11972   - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.
11973 
11974 v0.4f - 2017-03-10
11975   - Fix a couple of bugs with the bitstreaming code.
11976 
11977 v0.4e - 2017-02-17
11978   - Fix some warnings.
11979 
11980 v0.4d - 2016-12-26
11981   - Add support for 32-bit floating-point PCM decoding.
11982   - Use drflac_int* and drflac_uint* sized types to improve compiler support.
11983   - Minor improvements to documentation.
11984 
11985 v0.4c - 2016-12-26
11986   - Add support for signed 16-bit integer PCM decoding.
11987 
11988 v0.4b - 2016-10-23
11989   - A minor change to drflac_bool8 and drflac_bool32 types.
11990 
11991 v0.4a - 2016-10-11
11992   - Rename drBool32 to drflac_bool32 for styling consistency.
11993 
11994 v0.4 - 2016-09-29
11995   - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
11996   - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
11997   - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
11998     keep it consistent with drflac_audio.
11999 
12000 v0.3f - 2016-09-21
12001   - Fix a warning with GCC.
12002 
12003 v0.3e - 2016-09-18
12004   - Fixed a bug where GCC 4.3+ was not getting properly identified.
12005   - Fixed a few typos.
12006   - Changed date formats to ISO 8601 (YYYY-MM-DD).
12007 
12008 v0.3d - 2016-06-11
12009   - Minor clean up.
12010 
12011 v0.3c - 2016-05-28
12012   - Fixed compilation error.
12013 
12014 v0.3b - 2016-05-16
12015   - Fixed Linux/GCC build.
12016   - Updated documentation.
12017 
12018 v0.3a - 2016-05-15
12019   - Minor fixes to documentation.
12020 
12021 v0.3 - 2016-05-11
12022   - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
12023   - Lots of clean up.
12024 
12025 v0.2b - 2016-05-10
12026   - Bug fixes.
12027 
12028 v0.2a - 2016-05-10
12029   - Made drflac_open_and_decode() more robust.
12030   - Removed an unused debugging variable
12031 
12032 v0.2 - 2016-05-09
12033   - Added support for Ogg encapsulation.
12034   - API CHANGE. Have the onSeek callback take a third argument which specifies whether or not the seek
12035     should be relative to the start or the current position. Also changes the seeking rules such that
12036     seeking offsets will never be negative.
12037   - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
12038 
12039 v0.1b - 2016-05-07
12040   - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
12041   - Removed a stale comment.
12042 
12043 v0.1a - 2016-05-05
12044   - Minor formatting changes.
12045   - Fixed a warning on the GCC build.
12046 
12047 v0.1 - 2016-05-03
12048   - Initial versioned release.
12049 */
12050 
12051 /*
12052 This software is available as a choice of the following licenses. Choose
12053 whichever you prefer.
12054 
12055 ===============================================================================
12056 ALTERNATIVE 1 - Public Domain (www.unlicense.org)
12057 ===============================================================================
12058 This is free and unencumbered software released into the public domain.
12059 
12060 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
12061 software, either in source code form or as a compiled binary, for any purpose,
12062 commercial or non-commercial, and by any means.
12063 
12064 In jurisdictions that recognize copyright laws, the author or authors of this
12065 software dedicate any and all copyright interest in the software to the public
12066 domain. We make this dedication for the benefit of the public at large and to
12067 the detriment of our heirs and successors. We intend this dedication to be an
12068 overt act of relinquishment in perpetuity of all present and future rights to
12069 this software under copyright law.
12070 
12071 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12072 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12073 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12074 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
12075 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
12076 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
12077 
12078 For more information, please refer to <http://unlicense.org/>
12079 
12080 ===============================================================================
12081 ALTERNATIVE 2 - MIT No Attribution
12082 ===============================================================================
12083 Copyright 2020 David Reid
12084 
12085 Permission is hereby granted, free of charge, to any person obtaining a copy of
12086 this software and associated documentation files (the "Software"), to deal in
12087 the Software without restriction, including without limitation the rights to
12088 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
12089 of the Software, and to permit persons to whom the Software is furnished to do
12090 so.
12091 
12092 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12093 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12094 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12095 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
12096 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
12097 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
12098 SOFTWARE.
12099 */