2013-04-12 21:45:41 < Skyler_> is AMD going to ream me for committing AVX2 with their opencl?
2013-04-12 21:48:13 < muggs> heh
2013-04-12 21:48:46 < muggs> no one can fault you for making x264 more efficient
2013-04-12 21:49:41 < Skyler_> unfortunately I can't tell anyone how much faster it is! people will probably assume it helps more than it really does though.
2013-04-12 22:05:31 < muggs> so.. you can't say how much AVX2 helps because you don't have a haswell, or because Intel has you under NDA?
2013-04-12 22:05:42 < Skyler_> Latter. I wouldn't commit this much stuff (or really be able to write it) without a haswell.
2013-04-12 22:06:10 < Skyler_> I don't think I'm actually under a specific NDA right now (?) but since I'm using someone's haswell and I might have a past NDA, I should err on the side of caution
2013-04-23 22:15:37 < kierank> how did you end up getting a haswell in the end?
2013-04-23 22:16:37 < Skyler_> kierank: ask gramner
2013-04-23 22:18:28 < kierank> Gramner: how did you get a haswell
2013-04-23 22:18:40 < Gramner> I asked Intel if they could give me one. Then lots of bureaucracy and paperwork
The hot new instruction turns out to be a disappointment
2013-05-13 19:28:56 < Skyler_> Gramner: also, vgather is so so so slow, I find it hard to believe
2013-05-13 19:29:13 < Skyler_> I did a vgather-based sad_x4_4x4 implementation and I was like 12->32 cycles or something
2013-05-13 19:29:23 < Skyler_> it's not just "okay it's no faster, and there's overhead in calculating addresses"
2013-05-13 19:29:26 < Skyler_> it's "wowow this is horrible"
2013-05-13 19:29:34 < Gramner> i know. i tried it once and it was kinda ridiculous
2013-05-13 19:29:53 < Skyler_> like I am amazed it was actually possible to make an implementation this terrible
2013-05-13 19:29:55 < Gramner> it's way, way, way slower than doing individual loads
Around the time of the first reviews
2013-06-01 15:10:17 < Skyler_> Here were my results from a bit back, since I suppose I can mention them now
2013-06-01 15:10:19 < Skyler_> This is with AVX2
2013-06-01 15:10:36 < Skyler_> Haswell: ~17% faster than Ivy Bridge, ~28% faster than Sandy Bridge, ~39% faster than Nehalem
2013-06-01 15:10:40 < Skyler_> on a clock for clock basis
2013-06-01 15:12:03 < Skyler_> (x264 of course)
2013-06-01 15:13:16 < BugMaster> Have you tested without AVX2 support also (i.e. old builds without special support)?
2013-06-01 15:21:01 < Skyler_> AVX2 gives ~5%, so you should be able to do the math there
2013-06-01 15:21:06 < Skyler_> (just pull ~5% off each of those)
2013-06-01 15:32:38 < boiled_sugar> can you try with --asm to disable avx2?
2013-06-01 15:33:58 < Skyler_> I did, but I don't have access to the haswell over the weekend
2013-06-01 15:34:02 < Skyler_> but I can confirm it's about 5% less
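The "pull ~5% off each of those" arithmetic can be sketched as follows. The log does not say whether the ~5% is meant to be subtracted as percentage points or divided out as a ratio, so this sketch shows both readings. The function names and the dictionary are hypothetical, introduced only for illustration; the input figures are the approximate ones quoted above.

```python
# Clock-for-clock x264 gains of Haswell (with AVX2) over older cores,
# as quoted in the log. All figures are approximate ("~").
GAIN_WITH_AVX2 = {
    "Ivy Bridge": 0.17,
    "Sandy Bridge": 0.28,
    "Nehalem": 0.39,
}
AVX2_GAIN = 0.05  # "AVX2 gives ~5%"


def without_avx2_subtract(gain, avx2=AVX2_GAIN):
    """Assumed reading 1: take 5 percentage points off the quoted gain."""
    return gain - avx2


def without_avx2_ratio(gain, avx2=AVX2_GAIN):
    """Assumed reading 2: treat gains as speed ratios and divide out AVX2."""
    return (1.0 + gain) / (1.0 + avx2) - 1.0


if __name__ == "__main__":
    for arch, gain in GAIN_WITH_AVX2.items():
        sub = without_avx2_subtract(gain)
        ratio = without_avx2_ratio(gain)
        print(f"Haswell without AVX2 vs {arch}: "
              f"{sub:.1%} (subtract) or {ratio:.1%} (ratio)")
```

Under either reading the estimates land close together: roughly 11-12% over Ivy Bridge, 22-23% over Sandy Bridge, and 32-34% over Nehalem. Given how approximate the quoted numbers are, the difference between the two readings is well within the noise.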