In the worst case, 40 AABBs need 780 pairwise tests (40 × 39 / 2). That is not going to be a problem with a simple four-comparison test:
a_min.x <= b_max.x
b_min.x <= a_max.x
a_min.y <= b_max.y
b_min.y <= a_max.y
If all of these are true, you have an intersection.
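As a minimal sketch, the four comparisons can be wrapped in one function. The `AABB` class and its field names (`min_x`, `max_x`, and so on) are assumptions made for illustration; adapt them to whatever box type you already have. Because the comparisons use `<=`, boxes that only touch along an edge count as intersecting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AABB:
    # Hypothetical 2D box type; field names are illustrative only.
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def overlaps(a: AABB, b: AABB) -> bool:
    # All four comparisons must hold for the boxes to intersect.
    return (
        a.min_x <= b.max_x
        and b.min_x <= a.max_x
        and a.min_y <= b.max_y
        and b.min_y <= a.max_y
    )


print(overlaps(AABB(0, 0, 2, 2), AABB(1, 1, 3, 3)))  # True
print(overlaps(AABB(0, 0, 1, 1), AABB(2, 2, 3, 3)))  # False
```

If touching boxes should not count as a hit in your case, switching the comparisons to `<` would be the change to make.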
At this scale, partitioning structures (like AABB trees) are not necessary at all. You might only want to start thinking about them when the test count goes over 10 000, which happens once you have more than about 140 AABBs.
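The numbers above come from the pair count n(n - 1) / 2. Here is a rough sketch of that arithmetic together with a brute-force all-pairs loop. The function names and the `(min_x, min_y, max_x, max_y)` tuple layout are assumptions chosen for this example, not something from your code.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    # Number of unordered pairs among n boxes.
    return n * (n - 1) // 2


def brute_force_pairs(boxes):
    # boxes: list of (min_x, min_y, max_x, max_y) tuples (assumed layout).
    # Returns the index pairs (i, j), i < j, whose boxes intersect.
    hits = []
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        if a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]:
            hits.append((i, j))
    return hits


print(pair_count(40))   # 780
print(pair_count(141))  # 9870
print(pair_count(142))  # 10011
print(brute_force_pairs([(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]))  # [(0, 1)]
```

So the 10 000-test mark is crossed at 142 boxes, which matches the "more than about 140" estimate.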
It's best to avoid adding a broadphase for as long as you can, because it introduces a lot of complexity, hard design choices, and measurement work. None of the available structures will give a good speedup without serious effort and without keeping your AABBs within special constraints.