As noted, shift registers let you control any number of outputs using a small number of CPU pins. You don't say how many brightness levels you need, but you could probably achieve a reasonable number if you have the shift registers' "register latch" pin tied to an output-enable. Assume you want 128 brightness levels (7 bits per light, bits 0-6) at a 60Hz or better refresh rate, and that it takes 100us to clock out the bits to select and load a row.
Clock out bit 0 of the brightness for each light in row 0, then pulse the latch/output enable for 20us. Then clock out bit 1 of the brightness for each light in row 0 and pulse the latch/output enable for 40us. Then bit 2 and pulse for 80us. For bits 3-6, the pulse lengths will keep doubling, but you'll be able to clock in the next bit of data during the "on" part of the cycle (since the enable will be active for longer than it takes to shift through the bits). The first three bits will take about 100+20+100+40+100+80 microseconds (440us in total). The next four bits will take about 160+320+640+1280 (2400us), for a total of about 2840us per row. Doing that for all four rows will take under 12ms, comfortably inside the roughly 16.7ms frame period of a 60Hz refresh, so a 60Hz refresh rate should be no problem.
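The arithmetic above can be checked with a short script. This is only a sketch of the rough accounting used in this answer, not firmware: the names (`SHIFT_US`, `BASE_US`, `row_time_us` and so on) are made up for illustration, the 100us and 20us figures are the assumed values from the text, and a bit's shift time is treated as hidden whenever its own pulse is at least as long as the shift.

```python
# Timing estimate for binary-weighted (bit-by-bit) brightness scanning.
# All names are hypothetical; the numbers are the assumptions from the text.
SHIFT_US = 100  # assumed time to clock out one row's worth of bits
BASE_US = 20    # enable pulse length for bit 0; doubles for each later bit
BITS = 7        # 7 bits -> 128 brightness levels
ROWS = 4


def row_time_us(shift_us=SHIFT_US, base_us=BASE_US, bits=BITS):
    """Approximate time to display all bits of one row."""
    total = 0
    for bit in range(bits):
        pulse = base_us << bit  # 20, 40, 80, 160, ...
        if pulse < shift_us:
            # Short pulse: count the shift and the pulse separately.
            total += shift_us + pulse
        else:
            # Long pulse: count only the pulse, as the text does,
            # treating the shifting as overlapped with "on" time.
            total += pulse
    return total


def frame_time_us(rows=ROWS):
    """Approximate time to scan every row once."""
    return rows * row_time_us()


def refresh_hz():
    """Refresh rate implied by the frame time."""
    return 1_000_000 / frame_time_us()


if __name__ == "__main__":
    print(row_time_us(), "us per row")
    print(frame_time_us(), "us per frame")
    print(round(refresh_hz(), 1), "Hz")
```

Changing `SHIFT_US`, `BASE_US`, `BITS` or `ROWS` shows how much headroom is left for a different display or a slower shift clock.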
One slight limitation with this approach is that you should make sure you don't change the brightness of the lights on a row while that row is being processed. Otherwise, if e.g. a light's brightness changes from 63 to 64 between the times bits 5 and 6 are output, the light may be turned on during the first six bit times (since bits 0-5 of the old value are all set, even though bit 6 is clear), and then on again for the last bit time (since bit 6 of the new value is set, even though bits 0-5 are clear), thus causing it to appear briefly at full brightness. If you latch the brightness for the lights in a row before scanning that row, however, such difficulties should be avoided.
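To illustrate the 63-to-64 case, here is a small simulation of how long a light is lit during one pass over its row. It is a sketch with invented names (`lit_time_us`, `unlatched`, `latched`) and assumes the same 20us base pulse and 7 bits as above; a real implementation would copy the row's brightness values into a buffer before scanning.

```python
# Simulate the "on" time of one light during one scan of its row.
# Names are hypothetical; BASE_US and BITS are the assumptions from the text.
BASE_US = 20  # enable pulse for bit 0, doubling for each later bit
BITS = 7      # bits 0-6, i.e. 128 brightness levels


def lit_time_us(value_at_bit):
    """Total lit time, where value_at_bit(bit) is the brightness value
    that the scan code reads at the moment it outputs that bit."""
    total = 0
    for bit in range(BITS):
        if (value_at_bit(bit) >> bit) & 1:
            total += BASE_US << bit
    return total


def unlatched(bit):
    # Brightness changes from 63 to 64 after bit 5 has been output,
    # so bits 0-5 see the old value and bit 6 sees the new one.
    return 63 if bit <= 5 else 64


SNAPSHOT = 63  # value copied once, before the row scan starts


def latched(bit):
    # Every bit is read from the same snapshot.
    return SNAPSHOT


if __name__ == "__main__":
    print("unlatched:", lit_time_us(unlatched), "us")
    print("latched:  ", lit_time_us(latched), "us")
    print("full on:  ", lit_time_us(lambda bit: 127), "us")
```

In the unlatched case the lit time equals that of brightness 127, which is the brief full-brightness flash described above; with the snapshot, the light shows 63 for this pass and 64 on the next.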