First of all, kHz and MHz are not data transfer rates. They are frequencies ("something per second"). Did you mean kilobits/second or kilobytes/second? (Both are, confusingly, abbreviated to kbps by various people.)
If you have not already done so, you can specify a bitrate for your communications: Serial.begin() takes a parameter that specifies the speed. Most examples seem to use 9600 bits/second, but you can crank this all the way up to 115,200 bits/second, which is 12 times the speed. I don't know if this will work with a Teensy and/or USB. You will probably at least need to set the same speed in your terminal program and/or pySerial.
I wonder what you are doing that requires 1 Mbyte/second communication with an Arduino, and whether you could do it a little smarter.
At that rate, the 64 kbytes of memory can be filled in about 1/16th of a second.
You can get a faster "information transfer" by increasing the information density. For example, the English word "True" takes up 4 characters = 32 bits, but a logical true value only takes up 1 bit.
If you are processing the data on the Arduino at 1 Mbyte/second, you have about 72 instructions (assuming a 72 MHz clock and roughly one instruction per cycle) to do something with each byte before the next one arrives, which is not much time.
You could increase the data rate by pushing the data in parallel over multiple pins, e.g. 32 bits in parallel, one on each of 32 pins, one data point per cycle. However, there would be no practical way for the board to process data arriving at that speed.
If you are sending this from a PC, perhaps some of the data could be processed on the PC before sending?
Another option would be to divide the work over multiple Teensies, with the PC combining the results. For example, if you have temperature sensors that output 4-byte values and you want to read 250 of them at 1000 samples per second (a total of 1 Mbyte/second), you might be able to run 25 Teensies, each sampling 10 sensors and pushing data at 40 kbytes/second, rather than one running at 1 Mbyte/second.
Quite frankly, these are all fairly contrived examples: most sensors can't be read that quickly. Even an Arduino's analog read is limited to about 10,000 reads per second, and that's if you do nothing else. This produces 100 kbits/second, or 20 kbytes/second (each 10-bit reading is returned as two bytes, with the remaining 6 bits being 0).