Ниже приведен алгоритм на C #, который должен быть намного быстрее (и использовать меньше памяти), чем алгоритм, который вы опубликовали. Он не использует аккуратный бинарный трюк, используемый вами, и, как следствие, код становится длиннее. У него на несколько for
циклов больше, чем у вас, и может потребоваться время или два, чтобы пройти его с отладчиком, чтобы полностью его прогнать. Но на самом деле это более простой подход, когда вы понимаете, что он делает.
В качестве бонуса возвращаемые наборы находятся в более «естественном» порядке. Он вернет подмножества набора {1 2 3} в том же порядке, в котором вы перечислили их в своем вопросе. Это не фокус, но побочный эффект используемого алгоритма.
В моих тестах я обнаружил, что этот алгоритм примерно в 4 раза быстрее, чем алгоритм, который вы опубликовали для большого набора из 22 элементов (который был настолько большим, насколько я мог использовать на своей машине, без чрезмерного разбивания диска и искажения результатов много). Одна ваша попытка заняла около 15,5 секунд, а моя - около 3,6 секунд.
Для небольших списков разница менее выражена. Для набора только из 10 предметов ваш бегал 10 000 раз за 7,8 секунды, а мой - около 3,2 секунды. Для наборов с 5 или менее предметами они бегут близко к одному и тому же времени. Со многими итерациями ваша работа выполняется немного быстрее.
В любом случае, вот код. Извини, что так долго; Я пытался убедиться, что я прокомментировал это хорошо.
/*
* Made it static, because it shouldn't really use or modify state data.
* Making it static also saves a tiny bit of call time, because it doesn't
* have to receive an extra "this" pointer. Also, accessing a local
* parameter is a tiny bit faster than accessing a class member, because
* dereferencing the "this" pointer is not free.
*
* Made it generic so that the same code can handle sets of any type.
*/
static IList<IList<T>> PowerSet<T>(IList<T> set){
if(set == null)
throw new ArgumentNullException("set");
/*
* Caveat:
* If set.Count > 30, this function pukes all over itself without so
* much as wiping up afterwards. Even for 30 elements, though, the
* result set is about 68 GB (if "set" is comprised of ints). 24 or
* 25 elements is a practical limit for current hardware.
*/
int setSize = set.Count;
int subsetCount = 1 << setSize; // MUCH faster than (int)Math.Pow(2, setSize)
T[][] rtn = new T[subsetCount][];
/*
* We don't really need dynamic list allocation. We can calculate
* in advance the number of subsets ("subsetCount" above), and
* the size of each subset (0 through setSize). The performance
* of List<> is pretty horrible when the initial size is not
* guessed well.
*/
int subsetIndex = 0;
for(int subsetSize = 0; subsetSize <= setSize; subsetSize++){
/*
* The "indices" array below is part of how we implement the
* "natural" ordering of the subsets. For a subset of size 3,
* for example, we initialize the indices array with {0, 1, 2};
* Later, we'll increment each index until we reach setSize,
* then carry over to the next index. So, assuming a set size
* of 5, the second iteration will have indices {0, 1, 3}, the
* third will have {0, 1, 4}, and the fifth will involve a carry,
* so we'll have {0, 2, 3}.
*/
int[] indices = new int[subsetSize];
for(int i = 1; i < subsetSize; i++)
indices[i] = i;
/*
* Now we'll iterate over all the subsets we need to make for the
* current subset size. The number of subsets of a given size
* is easily determined with combination (nCr). In other words,
* if I have 5 items in my set and I want all subsets of size 3,
* I need 5-pick-3, or 5C3 = 5! / 3!(5 - 3)! = 10.
*/
for(int i = Combination(setSize, subsetSize); i > 0; i--){
/*
* Copy the items from the input set according to the
* indices we've already set up. Alternatively, if you
* just wanted the indices in your output, you could
* just dup the index array here (but make sure you dup!
* Otherwise the setup step at the bottom of this for
* loop will mess up your output list! You'll also want
* to change the function's return type to
* IList<IList<int>> in that case.
*/
T[] subset = new T[subsetSize];
for(int j = 0; j < subsetSize; j++)
subset[j] = set[indices[j]];
/* Add the subset to the return */
rtn[subsetIndex++] = subset;
/*
* Set up indices for next subset. This looks a lot
* messier than it is. It simply increments the
* right-most index until it overflows, then carries
* over left as far as it needs to. I've made the
* logic as fast as I could, which is why it's hairy-
* looking. Note that the inner for loop won't
* actually run as long as a carry isn't required,
* and will run at most once in any case. The outer
* loop will go through as few iterations as required.
*
* You may notice that this logic doesn't check the
* end case (when the left-most digit overflows). It
* doesn't need to, since the loop up above won't
* execute again in that case, anyway. There's no
* reason to waste time checking that here.
*/
for(int j = subsetSize - 1; j >= 0; j--)
if(++indices[j] <= setSize - subsetSize + j){
for(int k = j + 1; k < subsetSize; k++)
indices[k] = indices[k - 1] + 1;
break;
}
}
}
return rtn;
}
static int Combination(int n, int r){
if(r == 0 || r == n)
return 1;
/*
* The formula for combination is:
*
* n!
* ----------
* r!(n - r)!
*
* We'll actually use a slightly modified version here. The above
* formula forces us to calculate (n - r)! twice. Instead, we only
* multiply for the numerator the factors of n! that aren't canceled
* out by (n - r)! in the denominator.
*/
/*
* nCr == nC(n - r)
* We can use this fact to reduce the number of multiplications we
* perform, as well as the incidence of overflow, where r > n / 2
*/
if(r > n / 2) /* We DO want integer truncation here (7 / 2 = 3) */
r = n - r;
/*
* I originally used all integer math below, with some complicated
* logic and another function to handle cases where the intermediate
* results overflowed a 32-bit int. It was pretty ugly. In later
* testing, I found that the more generalized double-precision
* floating-point approach was actually *faster*, so there was no
* need for the ugly code. But if you want to see a giant WTF, look
* at the edit history for this post!
*/
double denominator = Factorial(r);
double numerator = n;
while(--r > 0)
numerator *= --n;
return (int)(numerator / denominator + 0.1/* Deal with rounding errors. */);
}
/*
* The archetypical factorial implementation is recursive, and is perhaps
* the most often used demonstration of recursion in text books and other
* materials. It's unfortunate, however, that few texts point out that
* it's nearly as simple to write an iterative factorial function that
* will perform better (although tail-end recursion, if implemented by
* the compiler, will help to close the gap).
*/
static double Factorial(int x){
/*
* An all-purpose factorial function would handle negative numbers
* correctly - the result should be Sign(x) * Factorial(Abs(x)) -
* but since we don't need that functionality, we're better off
* saving the few extra clock cycles it would take.
*/
/*
* I originally used all integer math below, but found that the
* double-precision floating-point version is not only more
* general, but also *faster*!
*/
if(x < 2)
return 1;
double rtn = x;
while(--x > 1)
rtn *= x;
return rtn;
}