Comments on José Fonseca's Tech blog: Fast SSE2 pow: tables or polynomials?

José Fonseca (2013-11-29 14:34):

> What's the license of this code? Is it public domain/GPL(2/3)/BSD?

Consider the code in my post to be under the MIT license.

> I'd like to include it in my synthesizer (haruhi.mulabs.org), which is GPL-3 licensed.

This code has been incorporated into, and lives on in, the Mesa3D (http://www.mesa3d.org/) source code:

- SSE2 intrinsics: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/auxiliary/tgsi/tgsi_sse2.c?h=7.9#n727
- LLVM IR: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/auxiliary/gallivm/lp_bld_arit.c?h=10.0#n3161

The SSE2 intrinsic version has now been abandoned. We only use the LLVM IR JIT version, which has many more improvements (more accuracy, NaN handling, etc.). The principle is still the same, though.

Anonymous (2011-09-17 22:18):

@Anonymous: SSE2 does not mean double precision support. This code is indeed SSE2, solely because of the integer intrinsics that are used.
_mm_cvtepi32_ps, _mm_srli_epi32: none of these exist in the original SSE instruction set, and you need SSE2 support to execute them.

mcv (2011-02-06 12:22):

What's the license of this code? Is it public domain/GPL(2/3)/BSD? I'd like to include it in my synthesizer (haruhi.mulabs.org), which is GPL-3 licensed.

Pinheiro (2010-03-11 17:31):

Congratulations on your great work, it's fantastic to see more Portuguese on OSS :)

Anonymous (2010-02-15 15:10):

@Anonymous, many months later...
Actually, there ARE SSE2 instructions in there: the int32 arithmetic cannot be done on 4 values at once with pure SSE.
I guess the version processing doubles was left as an exercise for the reader.

RPG (2009-08-03 16:59):

BTW, did you use asciidoc for formatting your code?

RPG (2009-08-03 14:38):

Hi,

I am an occasional contributor to eigen (http://eigen.tuxfamily.org/index.php?title=Main_Page), a wonderful linear algebra library. I am looking to adapt the pow function presented here. For this, I need to be clear on the licensing of the code. What is its license?

Eigen's licensing FAQ is here: http://eigen.tuxfamily.org/index.php?title=FAQ#Licensing

Thanks for the nice example.

Unknown (2009-05-22 09:45):

Hi,

Can I use this code under a BSD license or something similar? I would use it in our research project software, which is currently not public, and we definitely cannot introduce GPL code there.

Anonymous (2009-01-03 16:29):

First of all, thanks for the code.

But the title 'Fast sse2..' is a bit misleading.
There's not a single sse2 intrinsic in your code; rather, it's just sse (float) that you're using.

It would be nice to have a true sse2 variant (double) of the code posted here.

José Fonseca (2008-09-30 17:26):

> Did you check if most of the pow's were in fact e^x?

It might be the case in some examples, but it was not the case for the particular application we were interested in.

> Maybe optimising for the e^x case might make things better also.

Optimizing the e^x case when compiling the shaders would be ideal, but I'm not sure whether it is implemented or even possible. Detecting it at run time might not pay off overall, but it is still worth a try.

Anonymous (2008-09-30 01:06):

As a general rule, on modern systems, lookup tables rarely help unless they replace a substantial amount of calculation. The overhead of memory access and the high speed of modern processors make it reasonable to spend many instructions to avoid a single memory access.

Dave Airlie (2008-09-28 23:30):

Did you check if most of the pow's were in fact e^x?

That was the case when I looked at this with idr before.

Maybe optimising for the e^x case might make things better also.

José Fonseca (2008-09-28 14:29):

> Do the tables need to be CONST in a shared library? Then multiple users could share the same copy in cache.

If the tables are const in a shared library and used by other applications, then the likelihood that the table is in the cache is indeed higher. But that does not help if, for example, preceding or subsequent computations need to look up a lot of texture data, which evicts the table from the cache.

The polynomial approach not only avoids that problem, it is also faster and more accurate than table lookups, at least for SSE.

Jon Smirl (2008-09-28 14:02):

Do the tables need to be CONST in a shared library? Then multiple users could share the same copy in cache.
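
For readers following the SSE2 and table-versus-polynomial discussion in the comments above, here is a minimal scalar C sketch of the general idea: compute pow(x, y) as 2^(y * log2(x)), pulling the exponent out of the float's bit pattern with integer operations and approximating the remainder with short polynomials instead of a lookup table. It is not the code from the post or from Mesa. The names `sketch_log2`, `sketch_exp2` and `sketch_pow` are hypothetical, the polynomials are plain truncated series rather than tuned minimax fits, and zero, negative, denormal, infinite and NaN inputs are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: log2(x) for normal, positive x. */
static float sketch_log2(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret the float as an integer */

    /* Integer shift and mask on the bit pattern. In a 4-wide version this
     * is the part that needs SSE2 (e.g. _mm_srli_epi32). */
    int32_t e = (int32_t)((bits >> 23) & 0xFFu) - 127;

    /* Force the exponent field to 0 so that m lies in [1, 2). */
    uint32_t mbits = (bits & 0x007FFFFFu) | 0x3F800000u;
    float m;
    memcpy(&m, &mbits, sizeof m);

    /* Truncated series ln(m) = 2*(s + s^3/3 + s^5/5 + s^7/7),
     * with s = (m - 1)/(m + 1). Assumption: chosen for clarity only. */
    float s  = (m - 1.0f) / (m + 1.0f);
    float s2 = s * s;
    float ln_m = 2.0f * s * (1.0f + s2 * (1.0f / 3.0f
                      + s2 * (1.0f / 5.0f + s2 * (1.0f / 7.0f))));

    /* Integer to float conversion (_mm_cvtepi32_ps in a 4-wide version). */
    return (float)e + ln_m * 1.44269504f;  /* 1/ln(2) */
}

/* Hypothetical helper: 2^z for moderate z. */
static float sketch_exp2(float z)
{
    /* Split z into a nearest integer i and a fraction f in [-0.5, 0.5]. */
    int32_t i = (int32_t)(z + (z >= 0.0f ? 0.5f : -0.5f));
    float f = z - (float)i;

    /* Clamp so that i + 127 stays a valid normal exponent field. */
    if (i < -126) i = -126;
    if (i > 127)  i = 127;

    /* Degree-5 Taylor polynomial for e^t with t = f * ln(2). */
    float t = f * 0.69314718f;
    float p = 1.0f + t * (1.0f + t * (0.5f + t * (1.0f / 6.0f
                   + t * (1.0f / 24.0f + t * (1.0f / 120.0f)))));

    /* Build 2^i by writing i + 127 straight into the exponent bits. */
    uint32_t sbits = (uint32_t)(i + 127) << 23;
    float scale;
    memcpy(&scale, &sbits, sizeof scale);
    return scale * p;
}

/* Hypothetical helper: pow(x, y) = 2^(y * log2(x)), for x > 0 only. */
static float sketch_pow(float x, float y)
{
    assert(x > 0.0f);
    return sketch_exp2(y * sketch_log2(x));
}
```

Calling `sketch_pow(3.0f, 2.5f)` should give a value close to 15.588. No table is read at any point, so nothing here can be evicted from the cache by surrounding texture fetches, which is the trade-off the comments describe.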